
r/UnstructuredIO


Community highlights

  • Welcome to r/UnstructuredIO 👋 Introduce yourself!

    Sara_Unstructured

  • We're open-sourcing our document parsing evaluation framework + a new benchmark dataset
    News

    We've been evaluating document parsing systems internally for a while now, and today we're releasing SCORE-Bench to the community: 224 real-world documents with expert annotations, plus our complete evaluation methodology and code.

    The dataset includes documents that challenge parsers in production: scanned forms with visual degradation, financial reports with deeply nested tables, multi-column layouts, mixed handwriting and printed text. Every document has been manually annotated by domain experts rather than algorithmically labeled.

    We built SCORE, our evaluation framework, to handle the tricky parts of comparing generative systems fairly, such as recognizing when different structural representations are semantically equivalent. We've already open-sourced the framework and methodology. Now, with SCORE-Bench, we hope you can benchmark your own systems using the same approach, reproduce our results, and track progress as document parsing evolves.
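    To make "semantically equivalent structural representations" concrete, here is a minimal sketch. It is not SCORE's actual implementation. It assumes a simple cell-level comparison: one parser emits a table as a grid with a header row, another emits one dict per row, and both are normalized into a set of (row key, column, value) triples before scoring. The function names, the whitespace/case normalization, and the use of the first column as the row key are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[str, str, str]  # (row key, column name, cell value)


def _norm(text: str) -> str:
    # Collapse whitespace and case so pure formatting differences don't count as errors
    return " ".join(text.split()).lower()


def grid_to_triples(grid: List[List[str]]) -> Set[Cell]:
    """Grid form: first row is the header, first column is the row key (assumption)."""
    header, *rows = grid
    triples = set()
    for row in rows:
        key = _norm(row[0])
        for col, value in zip(header[1:], row[1:]):
            triples.add((key, _norm(col), _norm(value)))
    return triples


def records_to_triples(records: List[Dict[str, str]], key_field: str) -> Set[Cell]:
    """Record form: one dict per row, keyed by column name; row order is irrelevant."""
    triples = set()
    for rec in records:
        key = _norm(rec[key_field])
        for col, value in rec.items():
            if col != key_field:
                triples.add((key, _norm(col), _norm(value)))
    return triples


def cell_f1(predicted: Set[Cell], expected: Set[Cell]) -> float:
    """F1 over cell triples; 1.0 means both tables carry the same content."""
    if not predicted and not expected:
        return 1.0
    hits = len(predicted & expected)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(expected)
    return 2 * precision * recall / (precision + recall)


# Same table, different structure: reordered rows/columns and stray whitespace
grid = [["Quarter", "Revenue", "Cost"],
        ["Q1", "10", "4"],
        ["Q2", "12", "5"]]
records = [{"Quarter": "Q2", "Cost": "5", "Revenue": "12"},
           {"Quarter": "Q1", "Revenue": "10", "Cost": " 4 "}]
print(cell_f1(grid_to_triples(grid), records_to_triples(records, "Quarter")))  # 1.0
```

    Scoring on normalized cells rather than on raw markup means a parser is not penalized for choosing a different but equally valid serialization. Real nested or merged-cell tables need a richer key than a single first column.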

    What we're releasing:

    We built this to solve our own evaluation needs, but figured the community might find it useful too. We'd love to hear which documents usually break your systems, what other evaluation scenarios you follow, and, most importantly, how we could make this more useful for your use case. Feel free to reach out to us!



    New Notebook Alert: RAG Over Evolving Documents
    Tutorial

    Connectors are soooo underrated. We take so many of their features for granted as we use them.
    In this notebook, I cover how change detection works and how your databases can be kept up to date with all the latest documents, all by setting a single boolean input :)

    Check it out here!
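    The idea behind change detection can be sketched in a few lines. This is a minimal illustration, not the Unstructured connector API: it assumes each document's raw bytes are fingerprinted with SHA-256 and compared against a manifest saved from the previous sync, so only new or edited files are sent to the (hypothetical) `process` step. The `reprocess_all` flag is a hypothetical stand-in for the single boolean switch mentioned in the notebook; real connectors may rely on storage metadata such as modification times or version IDs instead of hashing content.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: bytes) -> str:
    # SHA-256 of the raw bytes: any edit to the file changes the digest
    return hashlib.sha256(content).hexdigest()


def sync(
    documents: Dict[str, bytes],
    manifest: Dict[str, str],
    process: Callable[[str, bytes], None],
    reprocess_all: bool = False,
) -> List[str]:
    """Process only new or changed documents and return their ids.

    documents: doc id -> current raw content (what a connector would list)
    manifest: doc id -> digest recorded at the last sync; updated in place
    reprocess_all: hypothetical boolean switch; True forces a full rerun
    """
    processed = []
    for doc_id, content in documents.items():
        digest = fingerprint(content)
        if reprocess_all or manifest.get(doc_id) != digest:
            process(doc_id, content)  # e.g. partition, chunk, embed, write
            manifest[doc_id] = digest
            processed.append(doc_id)
    return processed


# Usage: the second run skips everything except the edited file
manifest: Dict[str, str] = {}
docs = {"policy.pdf": b"v1", "faq.md": b"hello"}


def log(doc_id: str, content: bytes) -> None:
    print("processing", doc_id)


print(sync(docs, manifest, log))  # ['policy.pdf', 'faq.md']
docs["policy.pdf"] = b"v2"
print(sync(docs, manifest, log))  # ['policy.pdf']
```

    Hashing content rather than trusting timestamps catches edits even when a file is re-uploaded with a fresh modification time, at the cost of reading every file once per sync.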


    Upcoming Webinar: RAG Over Evolving Enterprise Knowledge
    Event

    Your enterprise knowledge base is constantly changing: new document versions, updated policies, revised procedures. But here's the problem: most RAG pipelines handle this by reprocessing everything on every sync, which is expensive and slow. You need a way to track what has actually changed and process only those deltas without breaking your pipeline.

    What we're covering:

    • Understanding incremental processing: when to use delta-aware syncs vs. full reprocessing, and why it matters

    • How connectors detect changes across different storage systems and track evolving files

    • A practical example: a live walkthrough of evolving documents

    The core question: How do you keep your RAG system current with evolving knowledge without wasting compute and time reprocessing unchanged content?
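    A delta-aware sync has to handle more than new and edited files: deleted files and the old versions of edited ones must be removed from the index, or retrieval will keep surfacing stale content. The sketch below is a hypothetical illustration, not the webinar's code or any Unstructured API. It assumes SHA-256 digests of each document's text as change signals, a toy paragraph chunker, and a plain dict as the "vector index", and it shows how added, modified, and deleted documents can each be handled while unchanged ones are never touched.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Delta:
    added: Set[str] = field(default_factory=set)
    modified: Set[str] = field(default_factory=set)
    deleted: Set[str] = field(default_factory=set)


def snapshot(docs: Dict[str, str]) -> Dict[str, str]:
    # doc id -> SHA-256 of its text, standing in for connector-side change metadata
    return {doc_id: hashlib.sha256(text.encode()).hexdigest() for doc_id, text in docs.items()}


def diff(previous: Dict[str, str], current: Dict[str, str]) -> Delta:
    """Classify every doc id as added, modified, or deleted between two snapshots."""
    prev_ids, cur_ids = set(previous), set(current)
    return Delta(
        added=cur_ids - prev_ids,
        modified={d for d in prev_ids & cur_ids if previous[d] != current[d]},
        deleted=prev_ids - cur_ids,
    )


def chunk(text: str) -> List[str]:
    # Hypothetical chunker: one chunk per non-empty paragraph
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def apply_delta(index: Dict[str, List[str]], docs: Dict[str, str], delta: Delta) -> None:
    """Update an in-memory index (doc id -> chunks), touching only changed docs."""
    for doc_id in delta.deleted | delta.modified:
        index.pop(doc_id, None)  # drop stale chunks so old versions are never retrieved
    for doc_id in delta.added | delta.modified:
        index[doc_id] = chunk(docs[doc_id])


# Usage: one edited, one removed, one new, one unchanged document
old_docs = {"policy.md": "Rule A", "faq.md": "Q1\n\nQ2", "old.md": "Legacy"}
index = {doc_id: chunk(text) for doc_id, text in old_docs.items()}
new_docs = {"policy.md": "Rule A\n\nRule B", "faq.md": "Q1\n\nQ2", "new.md": "Fresh"}

delta = diff(snapshot(old_docs), snapshot(new_docs))
apply_delta(index, new_docs, delta)
print(sorted(delta.added), sorted(delta.modified), sorted(delta.deleted))
# ['new.md'] ['policy.md'] ['old.md']
print(sorted(index))  # ['faq.md', 'new.md', 'policy.md']
```

    Treating a modification as "delete old chunks, then insert new ones" keeps the update logic simple and guarantees no chunk from a superseded version survives; a full reprocess remains the fallback when, for example, the chunking strategy itself changes.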

    📺 Dec 4 at 10am PT / 1pm ET - Register here

    Stay for a live demo and Q&A session with the experts.


    Created Aug 28, 2025
    Public

    Anyone can view, post, and comment in this community




    ETL+ for GenAI Data


    r/UnstructuredIO Rules

    Keep discussions professional and courteous.

    Posts should focus on UnstructuredIO, unstructured data, or GenAI topics.

    Promotions or irrelevant links will be removed.

    Ask thoughtful questions and provide useful answers.

    Don’t share private or sensitive information.

