The Microsoft Azure deployment of Hadoop Services for Windows lets you set up a private Hadoop cluster on Azure. One of the included administration/deployment tools is an Interactive Console for JavaScript and Hive. This video introduces the Interactive JavaScript console. Tester David Zhang demonstrates running several JavaScript commands against your Hadoop cluster.
Introduction to the Hadoop Services on Azure Interactive JavaScript Console (video)
Hi, my name is David Zhang and I'm a Tester on the Microsoft Hadoop Services for Windows team. In this video I'll introduce you to the Interactive JavaScript console on Windows Hadoop.
We have a bunch of commands in the browser and in this video I’m going to do a quick walkthrough of those features. I’ll use the WordCount sample to find the top 10 most-common words in the Gutenberg samples that come installed with Hadoop Services on Windows Azure.
Upload the Sample Files to the HDFS
So now the 3 files are uploaded. Let’s review what I’ve done so far.
The next thing I'll do is use the Gutenberg files as input and run the JavaScript MapReduce program on them. But I’m going to do this in such a way that the word counts come back sorted in descending order and only the top 10 are kept.
a) takes input from the gutenberg directory,
js> from("gutenberg").mapReduce("WordCount.js", "word, count:long").orderBy("count DESC").take(10).to("gbtop10")
As expected, "the" is the most-common word in the corpus, with 47,430 occurrences.
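To make the shape of that query concrete, here is a minimal plain-JavaScript sketch of the same three steps (count words, order by count descending, take the first N). It runs locally and does not call Hadoop. The helper names countWords and topN and the sample sentence are assumptions made for illustration; they are not console functions and this is not the contents of WordCount.js.

```javascript
// Local illustration only: countWords and topN are hypothetical helpers.

// Map + reduce in one pass: split the text into words and count each one.
function countWords(text) {
    var counts = Object.create(null); // no inherited keys such as "constructor"
    var words = text.toLowerCase().split(/[^a-z]+/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            counts[words[i]] = (counts[words[i]] || 0) + 1;
        }
    }
    return counts;
}

// Equivalent of orderBy("count DESC") followed by take(n).
function topN(counts, n) {
    var rows = Object.keys(counts).map(function (word) {
        return { word: word, count: counts[word] };
    });
    rows.sort(function (a, b) { return b.count - a.count; });
    return rows.slice(0, n);
}

// Made-up sample text, not the Gutenberg corpus.
var sample = "the cat and the dog and the bird";
var top2 = topN(countWords(sample), 2);
console.log(top2);
```

On the cluster, the counting is distributed across map and reduce tasks; the sketch only shows what the result rows look like after ordering and truncation.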
So now that I have these results in the HDFS, I can read them back out into my JavaScript console. fs.read is a function we provide that allows me to do that.
js> file = fs.read("gbtop10")
If I don't specify any files, it reads all the files in that directory and concatenates them into one string. And that data is now stored in the variable file.
js> data = parse(file.data, "word, count:long")
I tell it to parse the file data and I give it a schema string. Notice that it's the same schema string that I used before. It says that the data has 2 columns: the first column is called word and the second column is called count, of type long. I don't specify a type for the first column, so it defaults to string.
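As a rough illustration of how a schema string like this can drive parsing, the following sketch turns delimited text into an array of row objects. The function parseWithSchema is a hypothetical stand-in, not the console's parse function, and it assumes one record per line with tab-separated fields and rows returned as objects keyed by column name; none of that is confirmed by the transcript.

```javascript
// Hypothetical stand-in for the console's parse function.
// Assumes one record per line and tab-separated fields.
function parseWithSchema(text, schema) {
    // "word, count:long" -> [{name: "word", type: "string"}, {name: "count", type: "long"}]
    var columns = schema.split(",").map(function (part) {
        var pieces = part.trim().split(":");
        return { name: pieces[0], type: pieces[1] || "string" }; // untyped -> string
    });

    return text.split("\n").filter(function (line) {
        return line !== ""; // skip blank lines, e.g. after a trailing newline
    }).map(function (line) {
        var fields = line.split("\t");
        var row = {};
        columns.forEach(function (col, i) {
            row[col.name] = col.type === "long" ? parseInt(fields[i], 10) : fields[i];
        });
        return row;
    });
}

// First row uses the count from the demo; the second row is made up.
var rows = parseWithSchema("the\t47430\nfoo\t12\n", "word, count:long");
console.log(rows);
```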
This data is now a JavaScript array, so all the standard JavaScript array methods are available on it.
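For example, standard array methods such as map, filter and reduce work on rows of this kind. The sketch below assumes each row is an object with word and count properties, which the transcript does not confirm; only the first count comes from the demo and the other rows are made up.

```javascript
// Assumed row shape: { word: string, count: number }.
var data = [
    { word: "the", count: 47430 },  // count reported in the demo
    { word: "alpha", count: 300 },  // made-up row
    { word: "beta", count: 200 }    // made-up row
];

// Pull out one column.
var words = data.map(function (row) { return row.word; });

// Keep only rows above a threshold.
var frequent = data.filter(function (row) { return row.count > 250; });

// Sum a numeric column.
var total = data.reduce(function (sum, row) { return sum + row.count; }, 0);

console.log(words.join(", "));
console.log(frequent.length);
console.log(total);
```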
The good thing about this is that I can push this into our graphing function, because our graphing functions can take a JavaScript array with this schema and this data.
What I get is a bar graph of the top 10 words in the Gutenberg examples. This bar graph is made using SVG which is a new HTML5 feature. I can do all sorts of things with it.
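The transcript does not show the graphing call itself, so the following is only a sketch of how an array of word/count rows could be turned into an SVG bar chart. The helper barChartSvg, its parameters and the sample rows are all assumptions for illustration; it is not the console's graphing function.

```javascript
// Hypothetical helper: builds an SVG bar chart as a string.
// Assumes rows of the form { word, count } with purely alphabetic words
// (no XML escaping is done) and at least one positive count.
function barChartSvg(rows, width, height) {
    var max = Math.max.apply(null, rows.map(function (r) { return r.count; }));
    var barWidth = width / rows.length;

    var bars = rows.map(function (r, i) {
        var h = (r.count / max) * height; // tallest bar fills the full height
        return '<rect x="' + (i * barWidth) + '" y="' + (height - h) +
            '" width="' + (barWidth - 2) + '" height="' + h + '">' +
            '<title>' + r.word + '</title></rect>';
    });

    return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
        '" height="' + height + '">' + bars.join("") + '</svg>';
}

// First count is from the demo; the second row is made up.
var svg = barChartSvg([
    { word: "the", count: 47430 },
    { word: "foo", count: 23715 }
], 200, 100);
console.log(svg);
```

Pasting the resulting string into an HTML page renders the bars directly in the browser, since SVG markup can be embedded inline.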
That's the end of the demo. Thank you for watching.
Nice demo. I followed along with the Windows version. For some reason, that version requires "pig.from" instead of just "from" in the pig query/script. I also had to change my machine's name. I think that something in the java.net package does not like the way Windows names its machines. I just got rid of the hyphens and underscores.