Tagged Content List
  • Wiki Page: How to Import Data to Hadoop on Windows Azure from Windows Azure Marketplace

    Before you use the Apache Hadoop on Windows Azure portal to import Windows Azure Marketplace data into Hadoop on Windows Azure, you must know the following information: User name: the Live ID used to sign in to the Marketplace. PassKey: sign in to http://datamarket.azure.com with your Live ID...
  • Wiki Page: Running HDInsight C# Hadoop Streaming Sample

    MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. Most MapReduce jobs are written in Java. Hadoop provides a streaming API to MapReduce that enables you to write map and reduce functions in languages...
  • Wiki Page: HDInsight Services For Windows

    This article is the main portal for technical information about HDInsight Services for Windows and related Microsoft technologies. It provides a brief overview of Apache Hadoop, as well as information for the HDInsight Services provided by Microsoft for deployment on both Windows and Windows Azure...
  • Wiki Page: Getting Started with the HDInsight Server Developer Preview

    Table of Contents: Introduction; Installation of Hadoop on Windows; The Apache™ Hadoop™-based services on Windows dashboard; Getting started with Microsoft Hadoop on Windows; Load Some Data; Running MapReduce Jobs; Running Pig Jobs; Running Hive Jobs; Additional Resources: Apache Hadoop, Hadoop on Windows, and...
  • Wiki Page: Installing the Developer Preview of HDInsight Services on Windows

    Introduction The HDInsight Server Developer Preview is an implementation of HDInsight on Windows. This Developer Preview of Apache™ Hadoop™-based services on Windows uses only a single node deployment. HDInsight Server provides a local development environment for the Windows Azure HDInsight Service...
  • Wiki Page: The Hadoop on Azure Pegasus Page Rank Sample

    Overview This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the page rank for a simple 16-node graph. The rank calculated for a node is a measure of how well connected it is to the other nodes in the graph structure. A graph is a type of abstract mathematical structure...
  • Wiki Page: HDInsight Services for Windows Azure QuickStart: Running Hadoop Jobs

    This tutorial shows two ways in which Hadoop MapReduce programs can be run on a Hadoop Distributed File System (HDFS) using HDInsight Services for Windows Azure. Use the Create Job UI to run MapReduce programs written in Java, contained in Hadoop jar files. Use the Interactive JavaScript Console...
  • Wiki Page: Analyzing Twitter Data with Hive in HDInsight and StreamInsight

    In this tutorial you will query, explore, and analyze data from Twitter using Apache™ Hadoop™-based Services for Windows Azure and a Hive query in Excel. Social web sites are one of the major driving forces for Big Data adoption. Public APIs provided by sites like Twitter are a useful source of data...
  • Wiki Page: Simple recommendation engine using Apache Mahout

    Apache Mahout™ is a machine learning library built for use in scalable machine learning applications. Recommender engines are some of the most immediately recognizable machine learning applications in use today. In this tutorial you use the Million Song Dataset to create song recommendations for users...
  • Wiki Page: Working With Data in Windows Azure HDInsight Service

    This tutorial covers several techniques for storing and importing data for use in Hadoop MapReduce jobs run with Windows Azure HDInsight Service (formerly Apache™ Hadoop™-based Services for Windows Azure). Apache Hadoop is a software framework that supports data-intensive distributed applications...
  • Wiki Page: Introduction to HDInsight Services for Windows Azure

    Overview HDInsight Services for Windows Azure is a service that deploys and provisions Apache™ Hadoop™ clusters in the cloud, providing a software framework designed to manage, analyze and report on big data. Data is described as "big data" to indicate that it is being collected in ever...
  • Wiki Page: Hadoop on Azure WordCount Sample Tutorial

    Overview This tutorial shows two ways to use Hadoop on Azure to run a MapReduce program that counts word occurrences in a text. First, with a Hadoop .jar file by using the Create Job UI. Second, with a query by using the fluent API layered on Pig that is provided by the Interactive Console. The...
  • Wiki Page: Analyzing Twitter Movie Data with Hive in HDInsight

    In this tutorial you will query, explore, and analyze data from Twitter using Apache™ Hadoop™-based Services for Windows Azure and a Hive query in Excel. Social web sites are one of the major driving forces for Big Data adoption. Public APIs provided by sites like Twitter are a useful source of data...
  • Wiki Page: The Hadoop on Azure Sqoop Import Sample Tutorial

    Overview This tutorial shows how to use Sqoop to import data from a SQL database on Windows Azure to a Hadoop on Azure HDFS cluster. While Hadoop is a natural choice for processing unstructured and semi-structured data, such as logs and files, there may also be a need to process structured data...
  • Wiki Page: The Hadoop on Azure Pi Estimator Sample Tutorial

    Overview This tutorial shows how to deploy a MapReduce program that uses a statistical (quasi-Monte Carlo) method to estimate the value of Pi. Points placed at random inside of a unit square also fall within a circle inscribed within that square with a probability equal to the area of the circle...
  • Wiki Page: The Hadoop on Azure Pegasus Degree Distribution Sample Tutorial

    Overview This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the degree of each node and the distribution of degrees for a simple 16-node graph. The degree distribution gives the number of nodes in the graph at each degree. The degree of a node in a network (or...
  • Wiki Page: The Hadoop on Azure Mahout Clustering Sample Tutorial

    Overview This tutorial illustrates how to use Hadoop on Azure to do cluster analysis with Mahout. The various forms of cluster analysis attempt to answer the problem: given a collection of objects with values for a set of properties, devise a scheme for grouping them where similar ones are put...
  • Wiki Page: The Hadoop on Azure Mahout Classification Sample Tutorial

    Overview This tutorial illustrates how to use Apache Mahout in Hadoop on Azure to do classification. Classification techniques attempt to answer the question: how much some object is or is not part of some type or category, or, whether it does or does not have some attribute. The sample used...
  • Wiki Page: Hadoop on Azure 10 GB GraySort Sample Tutorial

    Overview This tutorial shows how to run a general purpose GraySort on a 10 GB file using Hadoop on Azure. A GraySort is a benchmark sort whose metric is the sort rate (TB/minute) that is achieved while sorting a very large amount of data, usually a 100 TB minimum. This sample uses a more modest 10...
  • Wiki Page: The Hadoop on Azure Pegasus Page Rank Sample Tutorial

    Overview This tutorial shows how to deploy Pegasus from the Hadoop on Azure portal to compute the page rank for a simple 16-node graph. The rank calculated for a node is a measure of how well connected it is to the other nodes in the graph structure. A graph is a type of abstract mathematical structure...
  • Wiki Page: Restoring HDFS Metadata from a Backup

    Transcript In the event of a catastrophic failure, your Hadoop cluster can be restored from a backup. In a previous video, Brad Sarsfield demonstrated how to configure and use the namenode backup service to save a copy of your HDFS metadata to your Azure storage account. In this video, he shows...
Page 1 of 1 (21 items)