skotak2/Pasrsing-Text-with-MapReduce-programming-Paradigm-with-multithreading
Process text and count the occurrences of each word using the map-reduce concept, mimicking Hadoop infrastructure with parallel processing. Multithreading is used to execute two mapper and reducer functions.
Project is created with:
- Python - Multi-Threading
The data is made available here
Consider the following Text - "I am a human being. I am a Data Scientist"
MAP : Read Input and produce a set of key value pairs
(I,1)(am,1)(a,1)(human,1)(being,1)(I,1)(am,1)(a,1)(Data,1)(Scientist,1)
GroupBy : Collect all pairs with same key
(I,1),(I,1) | (am,1),(am,1) | (a,1),(a,1) | (human,1) | (being,1) | (Data,1) | (Scientist,1)
Reduce : Collect all values belonging to the key & output
(I,2) | (am,2) | (a,2) | (human,1) | (being,1) | (Data,1) | (Scientist,1)
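The three phases above can be sketched in plain, single-threaded Python. This is a minimal illustration rather than the repository's code: the function names `map_words`, `group_by_key`, and `reduce_counts` are hypothetical, and stripping trailing punctuation is an assumption made so that "being." counts as "being".

```python
from collections import defaultdict

text = "I am a human being. I am a Data Scientist"

def map_words(text):
    """MAP: emit a (word, 1) pair for every word (punctuation stripping is an assumption)."""
    return [(word.strip(".,!?"), 1) for word in text.split()]

def group_by_key(pairs):
    """GROUP BY: collect the values of all pairs that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_counts(grouped):
    """REDUCE: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

pairs = map_words(text)
grouped = group_by_key(pairs)
counts = reduce_counts(grouped)
print(counts)
# {'I': 2, 'am': 2, 'a': 2, 'human': 1, 'being': 1, 'Data': 1, 'Scientist': 1}
```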
Here we use multithreading to parallelize the process. The map-reduce job is divided into subtasks that run in parallel, and the subtotals they produce are aggregated into the final output. Both the mapping of keys to values and their aggregation by the reducers are carried out by threads.
With the above concept in place, we implement the setup in the following steps:
Step1 : Map the input to key value pairs with multiple mappers
Step2 : Sort the values and load them into the partition holder
Step3 : Multiple reducers pick from the partition and aggregate the values
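The three steps can be sketched with Python's `threading` module as follows. This is a rough sketch under stated assumptions, not the repository's implementation: the helper names (`mapper`, `partition`, `reducer`, `run_in_threads`, `word_count`), the use of two reducers, and the rule that assigns a key to a partition (sum of character codes modulo the number of reducers) are all hypothetical choices made for illustration.

```python
import threading
from collections import defaultdict

NUM_MAPPERS = 2   # two mappers, as described above
NUM_REDUCERS = 2  # assumed: one partition per reducer

def mapper(lines, results, index):
    """Step 1: emit (word, 1) pairs for one chunk of lines."""
    pairs = []
    for line in lines:
        for word in line.split():
            word = word.strip(".,!?")  # assumption: ignore trailing punctuation
            if word:
                pairs.append((word, 1))
    results[index] = pairs

def partition(mapped, num_reducers):
    """Step 2: sort all pairs and load them into the partition holder."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in sorted(pair for chunk in mapped for pair in chunk):
        # Hypothetical, deterministic rule: equal keys always share a partition.
        slot = sum(map(ord, key)) % num_reducers
        partitions[slot][key].append(value)
    return partitions

def reducer(part, results, index):
    """Step 3: aggregate the values of every key in one partition."""
    results[index] = [(key, sum(values)) for key, values in part.items()]

def run_in_threads(target, inputs):
    """Run target(item, results, i) in one thread per input and wait for all of them."""
    results = [None] * len(inputs)
    threads = [threading.Thread(target=target, args=(item, results, i))
               for i, item in enumerate(inputs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

def word_count(lines):
    chunks = [lines[i::NUM_MAPPERS] for i in range(NUM_MAPPERS)]
    mapped = run_in_threads(mapper, chunks)
    partitions = partition(mapped, NUM_REDUCERS)
    return run_in_threads(reducer, partitions)

reducer_outputs = word_count(["I am a human being.", "I am a Data Scientist"])
print(reducer_outputs)
```

Each thread writes only to its own slot of the shared `results` list, so no lock is needed. In standard CPython builds the global interpreter lock means these threads interleave rather than run CPU-bound work simultaneously, so the sketch shows the structure of the pipeline rather than a guaranteed speedup.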
The above steps yield a list of outputs from the reducers, which can be concatenated and loaded into a dataframe or a spreadsheet.
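As one possible way to do that last step, the sketch below concatenates per-reducer outputs and writes them as CSV text that a spreadsheet can open. The sample `reducer_outputs` values and the `word`/`count` column names are made up for illustration, and the standard-library `csv` module stands in for a dataframe library.

```python
import csv
import io
from itertools import chain

# Hypothetical sample: one list of (word, count) pairs per reducer.
reducer_outputs = [[("Data", 1), ("I", 2)], [("a", 2), ("am", 2)]]

# Concatenate the per-reducer lists into one sorted list of rows.
rows = sorted(chain.from_iterable(reducer_outputs))

# Write to an in-memory buffer; swap in open("counts.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["word", "count"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```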
The deployed model can be accessed from the URL on any system to translate Kannada sentences to English.
About
Understand how map-reduce works for parsing text data, with parallel processing of subtasks using multithreading

