MapReduce is a key method for the parallel analysis of business data. Unlike statistical methods, MapReduce is an umbrella term for the way data split data may be processed in several steps, and then the results may be combined. Each data analysis may need its own MapReduce implementation.

Read Chapter 2 of the Mining of Massive Dataset textbook.

Note that the book has supplementary materials for your convenience.

Pick one of the uses of MapReduce algorithms (2.3.4 – 2.3.10). In your initial post explain how MapReduce will work in this case. Provide a numerical example, demonstrating what mapper and reduciques will be doing in that case.

Leave a Reply