MergeAnswersNode module understanding #712

Unanswered
prateekkohli21 asked this question in Q&A

I am trying to use SmartScraperMultiGraph and have a small question about its operation. When the answers from all the URLs are merged by the MergeAnswersNode module, they are all sent to the LLM at once. Could the combined size exceed the LLM's context window? I cannot see any chunking or other context-reduction logic in the MergeAnswersNode module. Is my understanding correct?

```python
smart_scraper_graph = SmartScraperMultiGraph(
    prompt=prompt,
    # also accepts a string with the already downloaded HTML code
    source=ref_url_list,
    config=graph_config,
)
result = smart_scraper_graph.run()
return result
```

Replies: 2 comments 3 replies

@VinciGit00

The chunking is done in the parse node; it should not cause a context window overflow.
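For intuition, a minimal sketch of that kind of chunking (my own illustrative approximation, not ScrapeGraphAI's actual parse-node code; the token heuristic is an assumption) could look like:

```python
# Illustrative sketch only: split text into pieces that each fit a token
# budget, approximating one token as roughly four characters of English text.
def chunk_text(text: str, max_tokens: int = 1000, chars_per_token: int = 4) -> list[str]:
    max_chars = max_tokens * chars_per_token
    # slice the text into consecutive windows of at most max_chars characters
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = chunk_text("x" * 10_000, max_tokens=1000)
print(len(chunks))  # 10,000 chars / 4,000 chars per chunk -> 3 chunks
```

Each chunk then fits the model's window on its own, which is why the parse step matters for per-page content.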

1 reply
@prateekkohli21

Thanks for the reply, @VinciGit00.

Suppose there are 50 URLs: MergeAnswersNode will combine the results from all of them and send them to the LLM at once. Is my understanding correct?

If yes, couldn't these combined answers potentially exceed the LLM's context size?


Not all at once; it depends on your LLM's token limit. If the content exceeds it, chunks will be used.

2 replies
@prateekkohli21

Sorry if the question seems very basic, but I couldn't find this behavior in the code of the MergeAnswersNode module. It simply merges all the answers into one string and sends that complete string to PromptTemplate.

Is it a feature of Langchain's PromptTemplate to create chunks based on the LLM's context size?

```python
# merge the answers in one string
answers_str = ""
for i, answer in enumerate(answers):
    answers_str += f"CONTENT WEBSITE {i+1}: {answer}\n"

output_parser = JsonOutputParser()
format_instructions = output_parser.get_format_instructions()

template_merge = """
You are a website scraper and you have just scraped some content from multiple websites.\n
You are now asked to provide an answer to a USER PROMPT based on the content you have scraped.\n
You need to merge the content from the different websites into a single answer without repetitions (if there are any).\n
The scraped contents are in a JSON format and you need to merge them based on the context and providing a correct JSON structure.\n
OUTPUT INSTRUCTIONS: {format_instructions}\n
You must format the output with the following schema, if not None:\n
SCHEMA: {schema}\n
USER PROMPT: {user_prompt}\n
WEBSITE CONTENT: {website_content}
"""

prompt_template = PromptTemplate(
    template=template_merge,
    input_variables=["user_prompt"],
    partial_variables={
        "format_instructions": format_instructions,
        "website_content": answers_str,
        "schema": self.node_config.get("schema", None),
    },
)

merge_chain = prompt_template | self.llm_model | output_parser
answer = merge_chain.invoke({"user_prompt": user_prompt})

# Update the state with the generated answer
state.update({self.output[0]: answer})
```
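To make the concern concrete, here is a self-contained sketch (my own reconstruction, not library code; the token heuristic and the 8192-token budget are assumptions) showing that the merge step only concatenates, so a rough token estimate grows linearly with the number of URLs and could be checked before invoking the LLM:

```python
# Sketch (assumption, not ScrapeGraphAI's code): the merge step concatenates
# every answer, so prompt size grows linearly with the number of URLs.
def merge_answers(answers: list[str]) -> str:
    answers_str = ""
    for i, answer in enumerate(answers):
        answers_str += f"CONTENT WEBSITE {i+1}: {answer}\n"
    return answers_str

def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    # crude heuristic: roughly four characters per token for English text
    return len(text) // chars_per_token

answers = ["{...scraped JSON answer...}"] * 50  # 50 URLs' worth of answers
merged = merge_answers(answers)

if estimate_tokens(merged) > 8192:  # hypothetical context budget
    print("merged answers may exceed the model's context window")
```

With large per-URL answers, such a check would trip well before the template is even filled in, which is the scenario the question is asking about.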
@VinciGit00

yes

