Tagged Content List
  • Wiki Page: Running HDInsight C# Hadoop Streaming Sample

    MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. Most MapReduce jobs are written in Java. Hadoop provides a streaming API to MapReduce that enables you to write map and reduce functions in languages...
  • Wiki Page: The Hadoop on Azure Pegasus Page Rank Sample

    Overview: This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the page rank for a simple 16-node graph. The rank calculated for a node is a measure of how well connected it is to the other nodes in the graph structure. A graph is a type of abstract mathematical structure...
  • Wiki Page: HDInsight Services for Windows Azure QuickStart: Running Hadoop Jobs

    This tutorial shows two ways in which Hadoop MapReduce programs can be run on a Hadoop Distributed File System (HDFS) using HDInsight Services for Windows Azure: use the Create Job UI to run MapReduce programs written in Java, contained in Hadoop jar files; use the Interactive JavaScript Console...
  • Wiki Page: Analyzing Twitter Data with Hive in HDInsight and StreamInsight

    In this tutorial you will query, explore, and analyze data from Twitter using Apache™ Hadoop™-based Services for Windows Azure and a Hive query in Excel. Social web sites are one of the major driving forces for Big Data adoption. Public APIs provided by sites like Twitter are a useful source of data...
  • Wiki Page: Simple recommendation engine using Apache Mahout

    Apache Mahout™ is a machine learning library built for use in scalable machine learning applications. Recommender engines are some of the most immediately recognizable machine learning applications in use today. In this tutorial you use the Million Song Dataset to create song recommendations for users...
  • Wiki Page: Working With Data in Windows Azure HDInsight Service

    This tutorial covers several techniques for storing and importing data for use in Hadoop MapReduce jobs run with Windows Azure HDInsight Service (formerly Apache™ Hadoop™-based Services for Windows Azure). Apache Hadoop is a software framework that supports data-intensive distributed applications...
  • Wiki Page: Introduction to HDInsight Services for Windows Azure

    Overview: HDInsight Services for Windows Azure is a service that deploys and provisions Apache™ Hadoop™ clusters in the cloud, providing a software framework designed to manage, analyze and report on big data. Data is described as "big data" to indicate that it is being collected in ever...
  • Wiki Page: Hadoop on Azure WordCount Sample Tutorial

    Overview: This tutorial shows two ways to use Hadoop on Azure to run a MapReduce program that counts word occurrences in a text. First, with a Hadoop .jar file by using the Create Job UI. Second, with a query by using the fluent API layered on Pig that is provided by the Interactive Console. The...
  • Wiki Page: Analyzing Twitter Movie Data with Hive in HDInsight

    In this tutorial you will query, explore, and analyze data from Twitter using Apache™ Hadoop™-based Services for Windows Azure and a Hive query in Excel. Social web sites are one of the major driving forces for Big Data adoption. Public APIs provided by sites like Twitter are a useful source of data...
  • Wiki Page: The Hadoop on Azure Sqoop Import Sample Tutorial

    Overview: This tutorial shows how to use Sqoop to import data from a SQL database on Windows Azure to a Hadoop on Azure HDFS cluster. While Hadoop is a natural choice for processing unstructured and semi-structured data, such as logs and files, there may also be a need to process structured data...
  • Wiki Page: The Hadoop on Azure Pi Estimator Sample Tutorial

    Overview: This tutorial shows how to deploy a MapReduce program that uses a statistical (quasi-Monte Carlo) method to estimate the value of Pi. Points placed at random inside of a unit square also fall within a circle inscribed within that square with a probability equal to the area of the circle...
  • Wiki Page: The Hadoop on Azure Pegasus Degree Distribution Sample Tutorial

    Overview: This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the degree of each node and the distribution of degrees for a simple 16-node graph. The degree distribution gives the number of nodes in the graph at each degree. The degree of a node in a network (or...
  • Wiki Page: The Hadoop on Azure Mahout Clustering Sample Tutorial

    Overview: This tutorial illustrates how to use Hadoop on Azure to do cluster analysis with Mahout. The various forms of cluster analysis attempt to solve the problem: given a collection of objects with values for a set of properties, devise a scheme for grouping them where similar ones are put...
  • Wiki Page: The Hadoop on Azure Mahout Classification Sample Tutorial

    Overview: This tutorial illustrates how to use Apache Mahout in Hadoop on Azure to do classification. Classification techniques attempt to answer the question: to what degree does some object belong to some type or category, or does it have some attribute? The sample used...
  • Wiki Page: Hadoop on Azure 10 GB GraySort Sample Tutorial

    Overview: This tutorial shows how to run a general-purpose GraySort on a 10 GB file using Hadoop on Azure. A GraySort is a benchmark sort whose metric is the sort rate (TB/minute) that is achieved while sorting a very large amount of data, usually a 100 TB minimum. This sample uses a more modest 10...
  • Wiki Page: The Hadoop on Azure Pegasus Page Rank Sample Tutorial

    Overview: This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the page rank for a simple 16-node graph. The rank calculated for a node is a measure of how well connected it is to the other nodes in the graph structure. A graph is a type of abstract mathematical structure...
  • Wiki Page: Interactive Javascript console on MDH

    By David Zhang. Getting Started: The Microsoft Distribution of Hadoop comes with a web-based interactive...
  • Wiki Page: Fluent Queries on the Interactive JavaScript Console

    By Alejandro Trigo. The interactive JavaScript console that...
  • Wiki Page: Introduction to Hadoop Services on Azure Interactive JavaScript Console (video)

    The Microsoft Azure deployment of Hadoop Services for Windows lets you set up a private Hadoop cluster on Azure. One of the included administration/deployment tools is an Interactive Console for JavaScript and Hive. This video introduces the Interactive JavaScript console. Tester David Zhang demonstrates...
  • Wiki Page: Running MapReduce Jobs on the Javascript Interactive Console

    By David Zhang. Javascript MapReduce: The interactive Javascript console supports running a MapReduce job defined in Javascript. For example, the canonical word count example can be written as follows: var map = function (key, value, context) { var words = value.split(/[^a-zA-Z]/); for...
  • Wiki Page: Data Visualization on the Interactive Javascript Console

    By Beth Mantey. Data Format: The graphing library will display data in the form of a bar, pie, or line chart. The library expects the data to contain an array of objects, each with at least two properties with the same property names and data types (at least one of which is numeric): var wordCounts...
  • Wiki Page: HDFS Operations on the Interactive Javascript Console

    By David Zhang. Shell Commands: The web console supports execution of HDFS commands via the Javascript function “fs”. For example, you can run: fs( "ls" ) This function exposes the same set of file system commands that is available from running “hadoop...
  • Wiki Page: Interactive Javascript Console Session Management

    By David Zhang. The Javascript console supports the concept of sessions. You can save the session with a name, close your browser, and later come back, reload the session and resume where you left off. When you save a session, the following pieces of information are saved: Console output ...