Hadoop-based Services for Windows Azure includes several samples you can use for learning and testing. In this video, Developer Brad Sarsfield demonstrates two different ways to upload data to Hadoop-based Services for Windows Azure. After he uploads the data, he uses the included WordCount sample to run a MapReduce program on the uploaded data.
Hi, my name is Brad Sarsfield. I’m a Developer on the Hadoop Services for Windows and Windows Azure team. Today I’m going to show you two different ways to upload data into a Hadoop cluster on Windows Azure. Once the data is uploaded to my cluster, I’ll use one of the samples – which are included with Hadoop Services on Windows Azure – to run a word count MapReduce job against the new data in my cluster. To upload the data, I have many options – I can use the Interactive JavaScript Console, secure FTP (FTPS), Azure Blob store, Amazon S3, or import data from Azure Data Market. Let’s start with the Interactive JavaScript Console, which I can access from the Hadoop Services on Azure web portal.
Another way to upload data into HDFS on Windows Azure is via secure FTP. The FTP server runs on the headnode inside Windows Azure. We chose secure FTP because regular FTP puts your credentials over the wire in cleartext. Another security requirement is that the FTP password must be MD5 hashed.
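The MD5 hashing step mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper name `md5_hex` is hypothetical, and the choice of UTF-8 encoding and a lowercase hexadecimal digest is an assumption, since the transcript does not say which exact form the FTP server expects.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as a lowercase hex string.

    Assumption: the password is encoded as UTF-8 before hashing and the
    server accepts the hex form of the digest. Check the service's own
    documentation for the format it actually requires.
    """
    return hashlib.md5(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # "hello" is a placeholder, not a real credential.
    print(md5_hex("hello"))
```

The resulting string is what you would supply as the password in an FTPS client, under the assumptions stated above.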
And there you have the second way to upload data to your Hadoop cluster on Windows Azure. Now it’s time to deploy the wordcount job.
Based on my parameters, the Final Command that will be executed is constructed below.
Each map process reads a line from the file and then parses all of the words. The output of the map is a key-value pair for each word and the number one. The reducers then sum up the counts for each of the words from all of the map outputs and in turn output each word and its total occurrences for the final output.
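The map and reduce steps just described can be mimicked in plain Python to make the data flow concrete. This is a single-process sketch, not the Java WordCount sample that ships with the service: the function names are hypothetical, and splitting on whitespace is an assumption about how words are parsed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one line.

    Assumption: words are separated by whitespace.
    """
    for word in line.split():
        yield (word, 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map step over every line, then reduce all of the pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["the quick the", "quick fox"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

In a real Hadoop job the map calls run in parallel across the cluster and the framework groups the pairs by word before handing them to the reducers; here both steps run in one process purely for illustration.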
The Job Page displays status. The Standard Errors section contains messages from Hadoop, things like status, statistics, and informational messages. The Output section contains messages generated by the wordcount Java code.
My job completed successfully. I see that a new file called DaVinciALLTopWords has been created.