Hadoop-based Services for Windows Azure includes several samples you can use for learning and testing. One sample is the 10GB GraySort, which is a scaled-down version of the Hadoop Terasort benchmark. There are three jobs to run, and in this video Developer Brad Sarsfield walks you through Teravalidate.
Hi, my name is Brad Sarsfield and I’m a Developer on the Hadoop-based Services for Windows and Windows Azure team.
This video is Part 3 in the 10GB GraySort series. In videos 1 and 2 we generated and sorted the data. In this video I will show you how to validate that the sort was correct.
So let’s get started.
Hadoop for Windows Azure constructs and displays the Final Command that will be executed on the headnode below.
Behind the scenes, Hadoop is validating that each part of the file has been sorted correctly, that is, that the records within it appear in the correct order.
While the job is running, I switch over to the terminal services view and review the 10GB-sort-output, the output from the terasort example. Here it is. I take a look at one of these files. There are 25 files – they correspond with the number of reduce tasks we requested.
I take a closer look at one of those files, part 9. The data in this file is sorted: it starts with AH and ends with AHt. So the teravalidate program is now using the 10GB-sort-output, the output of the terasort, to validate that this is in fact the correct sort order.
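To make the idea of validating a sort concrete, here is a minimal Python sketch of the kind of check described above. It is not the teravalidate source. It assumes the standard TeraGen record layout (100-byte records whose first 10 bytes are the key), and the function names check_part and check_parts are hypothetical. It checks two things: that keys never decrease within a part, and that each part starts at or after the point where the previous part ended.

```python
RECORD_SIZE = 100  # assumption: TeraGen-style records are 100 bytes long
KEY_SIZE = 10      # assumption: the first 10 bytes of each record are the sort key


def check_part(data: bytes):
    """Return (first_key, last_key, misordered_count) for one part's raw bytes."""
    first = last = None
    errors = 0
    # Walk the part one whole record at a time; a trailing partial record is ignored.
    for offset in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
        key = data[offset:offset + KEY_SIZE]
        if first is None:
            first = key
        elif key < last:  # bytes compare lexicographically, byte by byte
            errors += 1
        last = key
    return first, last, errors


def check_parts(parts):
    """Check a list of parts (bytes, in part-file order); return a list of problems."""
    problems = []
    prev_last = None
    for index, data in enumerate(parts):
        first, last, errors = check_part(data)
        if errors:
            problems.append(f"part {index}: {errors} misordered record(s)")
        # Each part must begin at or after the key where the previous part ended.
        if first is not None and prev_last is not None and first < prev_last:
            problems.append(f"part {index}: first key sorts before the previous part's last key")
        if last is not None:
            prev_last = last
    return problems
```

An empty list from check_parts means no ordering problems were found, which mirrors the way an empty validation log is read in this sample.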
But ‘success’ doesn’t mean that the sort is valid; it just means the task completed successfully. To see if the sort order is valid, take a look at the Exit Code and the Logs. Exit Code is 0 and the log file is empty – a zero byte file. This indicates the sort was correct.
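The same two-part check can be written down as a small Python sketch. It assumes you have noted the job's exit code and saved the log to a local file. The function name sort_is_valid and the report_path argument are hypothetical, not part of the service.

```python
import os


def sort_is_valid(exit_code: int, report_path: str) -> bool:
    """True only if the job exited with 0 AND the saved log is a zero-byte file."""
    # A zero exit code alone only shows that the task completed.
    # An empty log is what indicates that no misordered records were reported.
    return exit_code == 0 and os.path.getsize(report_path) == 0
```

Both conditions are required: a non-empty log with exit code 0 would still mean the validation found something to report.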
That concludes the 10GB GraySort sample video series. Thank you for watching, I hope you found it helpful.