Map Reduce

MapReduce is a design pattern suited to tasks that have either:

- Large input data (e.g., multiple files to process), or
- Large output data (e.g., multiple forms to fill),

and that can be logically broken into smaller, ideally independent parts.

You first break down the task using a BatchNode in the map phase, then aggregate the per-item results in the reduce phase.
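To make the two phases concrete before bringing in the framework, here is a minimal, framework-free sketch in plain Python. The names `map_phase` and `reduce_phase` and the word-count task are illustrative assumptions, not part of the library; the point is only that each item is processed independently and the results are then aggregated.

```python
def map_phase(files):
    """Map: process each file independently (here, count its words)."""
    return [(name, len(text.split())) for name, text in files.items()]


def reduce_phase(per_file_counts):
    """Reduce: aggregate the independent results into one value."""
    return sum(count for _, count in per_file_counts)


# Hypothetical input data, used only for illustration.
files = {"file1.txt": "a b c", "file2.txt": "d e"}

per_file_counts = map_phase(files)
total_words = reduce_phase(per_file_counts)

print(per_file_counts)
print(total_words)
```

Because no item depends on another in the map step, the items can be processed in any order, which is what makes the pattern easy to batch.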

Example: Document Summarization

    # Assumed import path (the Pocket Flow package); adjust to your installation.
    from pocketflow import BatchNode, Flow, Node

    # call_llm is a user-supplied helper that sends a prompt to an LLM
    # and returns the response text; it is not defined here.


    class SummarizeAllFiles(BatchNode):
        """Map phase: summarize each file independently."""

        def prep(self, shared):
            files_dict = shared["files"]  # e.g. 10 files
            # [("file1.txt", "aaa..."), ("file2.txt", "bbb..."), ...]
            return list(files_dict.items())

        def exec(self, one_file):
            # Called once per item returned by prep.
            filename, file_content = one_file
            summary_text = call_llm(f"Summarize the following file:\n{file_content}")
            return (filename, summary_text)

        def post(self, shared, prep_res, exec_res_list):
            shared["file_summaries"] = dict(exec_res_list)


    class CombineSummaries(Node):
        """Reduce phase: merge the per-file summaries into one."""

        def prep(self, shared):
            return shared["file_summaries"]

        def exec(self, file_summaries):
            # Format each entry as "<filename> summary:\n<summary>\n",
            # with entries separated by "---".
            text_list = []
            for fname, summ in file_summaries.items():
                text_list.append(f"{fname} summary:\n{summ}\n")
            big_text = "\n---\n".join(text_list)
            return call_llm(f"Combine these file summaries into one final summary:\n{big_text}")

        def post(self, shared, prep_res, final_summary):
            shared["all_files_summary"] = final_summary


    batch_node = SummarizeAllFiles()
    combine_node = CombineSummaries()
    batch_node >> combine_node  # run the reduce node after the map node

    flow = Flow(start=batch_node)

    shared = {
        "files": {
            "file1.txt": "Alice was beginning to get very tired of sitting by her sister...",
            "file2.txt": "Some other interesting text ...",
            # ...
        }
    }
    flow.run(shared)
    print("Individual Summaries:", shared["file_summaries"])
    print("\nFinal Summary:\n", shared["all_files_summary"])

Performance Tip: The example above runs sequentially. You can speed up the map phase by running it in parallel. See (Advanced) Parallel for more details.
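As a rough illustration of why a parallel map phase helps, the sketch below fans the per-file work out over a thread pool from Python's standard library. This is not the framework's own parallel mechanism (see the page referenced above for that); `summarize_stub` and `parallel_map` are hypothetical names, and the stub stands in for a slow, I/O-bound call such as an LLM request.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_stub(item):
    """Hypothetical stand-in for a slow, I/O-bound call (e.g. an LLM request)."""
    filename, content = item
    return filename, f"summary of {len(content)} chars"


def parallel_map(items, worker, max_workers=4):
    """Apply worker to every item concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map yields results in the same order as the inputs.
        return list(executor.map(worker, items))


# Hypothetical input data, used only for illustration.
files = {"file1.txt": "aaa", "file2.txt": "bbbbb"}

file_summaries = dict(parallel_map(list(files.items()), summarize_stub))
print(file_summaries)
```

Threads suit this case because the work is dominated by waiting on network calls; the reduce phase is unchanged, since it only needs the collected results.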
