Map Reduce Explained Using SQL

6/2/2016

One of the key components of the Hadoop framework is Map Reduce. It is a distributed data processing model and execution environment that runs on large clusters of commodity machines. It uses the MapReduce algorithm, which breaks every operation down into Map, Shuffle and Sort, and Reduce steps. If you come from a non-programming background, this can be a difficult concept to grasp. A good way to understand Map Reduce is to relate it to SQL concepts. Many people who work with data are familiar with relational databases and SQL, so putting Map Reduce in that context gives it a tangible definition.

Below, I have broken out each component of Map Reduce and paired it with a SQL construct. It is important to note that Map Reduce jobs and SQL queries have similarities but can serve very different purposes. As such, there are times when the SQL analogy cannot be applied verbatim or does not align 100%.

Map (FROM, WHERE, UNION)

Maps are individual tasks that read the input data and emit the key-value pairs that match a criterion. Although Maps are functions, in the SQL analogy it is easiest to think of them as the FROM, WHERE and UNION operations of a query. Maps seek out data sets before any shuffling, sorting or reducing is performed. In SQL, the FROM, WHERE and UNION operations likewise seek out data sets before grouping or sorting (GROUP BY, ORDER BY) and reducing (SUM, COUNT, MIN, etc.) are performed.
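To make the Map analogy concrete, here is a minimal sketch in Python. The sales records, the `map_sale` function and the `sales` table are all hypothetical names invented for illustration; none of them come from Hadoop itself. The mapper filters records and emits key-value pairs, and the SQL query beside it selects the same rows using FROM and WHERE.

```python
import sqlite3

# Hypothetical input records: (region, amount). Invented for illustration.
SALES = [("east", 10), ("west", 5), ("east", 7), ("west", 0), ("north", 3)]


def map_sale(record):
    """Map step: emit a (key, value) pair for each record that passes the filter."""
    region, amount = record
    if amount > 0:  # plays the role of WHERE amount > 0
        yield (region, amount)


def run_map(records):
    """Apply the mapper to every record, one record at a time."""
    return [pair for record in records for pair in map_sale(record)]


def run_sql(records):
    """SQL counterpart: FROM names the source, WHERE filters the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, amount FROM sales WHERE amount > 0 ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_map(SALES))
    print(run_sql(SALES))
```

Both functions return the same list of pairs. In a real cluster the mapper would run as many independent tasks, each over its own slice of the input, which is something this single-process sketch does not attempt to show.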

Reduce (Aggregation)

Reduces are tasks that aggregate and summarize the shuffled and sorted outputs of the mappers. In the SQL analogy, the Reduce task is similar to aggregate functions such as SUM, COUNT, MIN, MAX and AVG, or to de-duplication with DISTINCT. Reducers condense a full data set into a smaller, more manageable output, as specified by the user or job. In SQL, the aggregate functions likewise reduce data drawn from one or more tables into a smaller output specified by the query or user.
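As a rough illustration of the Reduce step, the sketch below shows two reducers in plain Python. The grouped input and the function names `reduce_sum` and `reduce_stats` are assumptions made for this example. Each reducer receives one key together with all of the values collected for it, and returns a single summarized result, much as SUM, COUNT, MIN, MAX and AVG do in SQL.

```python
# Hypothetical reducer input: each key with every value the mappers emitted for it.
GROUPED = {"east": [10, 7], "north": [3], "west": [5]}


def reduce_sum(key, values):
    """Reduce step comparable to SUM(amount): one total per key."""
    return (key, sum(values))


def reduce_stats(key, values):
    """Reduce step comparable to COUNT, MIN, MAX and AVG computed together."""
    return (
        key,
        {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        },
    )


# One reducer call per key, as a Reduce task would make.
totals = [reduce_sum(key, values) for key, values in GROUPED.items()]

if __name__ == "__main__":
    print(totals)
    print(reduce_stats("east", GROUPED["east"]))
```

The output has one row per key no matter how many values went in, which is the "reducing" the name refers to.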

Shuffle & Sort (GROUP BY, ORDER BY)

Shuffles and Sorts are tasks that group, sort and organize the outputs of the mappers. In the SQL analogy, the Shuffle and Sort tasks closely resemble the GROUP BY and ORDER BY operations of a query. They are key to arranging the data in the format and order that make reduce functions easiest to perform: every value that shares a key is brought together, and the keys are sorted. In SQL, the GROUP BY and ORDER BY operations similarly organize and group data so that reducing functions (SUM, COUNT, MIN, etc.) can be easily performed.
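The sketch below shows how Shuffle and Sort sits between the mappers and the reducers, and checks the result against a SQL query that uses GROUP BY and ORDER BY. The mapper output in `PAIRS`, the `mapped` table and all function names are hypothetical, chosen only for this example. It is a single-process approximation; a real Hadoop job would move the pairs between machines during the shuffle.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapper output: (key, value) pairs in arrival order.
PAIRS = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]


def shuffle_and_sort(pairs):
    """Gather all values that share a key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # the GROUP BY part
    return sorted(groups.items())  # the ORDER BY part (keys are unique)


def reduce_sum(grouped):
    """Reduce each group to one total, comparable to SUM(amount)."""
    return [(key, sum(values)) for key, values in grouped]


def sql_equivalent(pairs):
    """The same result expressed as a single SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mapped (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO mapped VALUES (?, ?)", pairs)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM mapped GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    grouped = shuffle_and_sort(PAIRS)
    print(grouped)
    print(reduce_sum(grouped))
    print(sql_equivalent(PAIRS))
```

Because the grouping happens before the reducer runs, the reducer never has to search for related values; it simply receives them already collected under one key.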
