[MEDI] Allow Pipeline and Reader to work with any Source #7090


Open: adamsitnik wants to merge 9 commits into dotnet:main from adamsitnik:anySource

adamsitnik (Member) commented Nov 27, 2025 (edited)

The idea is to introduce a new interface, `IIngestionDocumentReader<TSource>`, whose generic type parameter specifies the source type. A source can be anything: `FileInfo` or `Stream`, but also `int`, `Guid`, or a custom user type. It's up to the reader to obtain the document for a given source, whether by parsing a file, reading it from a database, or fetching it from somewhere else.

    public interface IIngestionDocumentReader<TSource>
    {
        Task<IngestionDocument> ReadAsync(
            TSource source,
            string identifier,
            string? mediaType = null,
            CancellationToken cancellationToken = default);
    }
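As a rough illustration of what a non-file source could look like, here is a minimal sketch of a reader keyed by `Guid`. The interface is copied from the description above; everything else is an assumption made for the example: `InMemoryGuidReader` is a hypothetical name, and the `IngestionDocument` class below is a reduced stand-in declared only so the snippet compiles on its own, not the library's actual type.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the library's IngestionDocument, reduced to what this sketch needs.
public sealed class IngestionDocument
{
    public IngestionDocument(string identifier) => Identifier = identifier;

    public string Identifier { get; }

    public string? Text { get; set; }
}

// The interface as given in the PR description.
public interface IIngestionDocumentReader<TSource>
{
    Task<IngestionDocument> ReadAsync(
        TSource source,
        string identifier,
        string? mediaType = null,
        CancellationToken cancellationToken = default);
}

// Hypothetical reader whose source is a lookup key rather than a file.
// A dictionary plays the role of the database here.
public sealed class InMemoryGuidReader : IIngestionDocumentReader<Guid>
{
    private readonly IReadOnlyDictionary<Guid, string> _rows;

    public InMemoryGuidReader(IReadOnlyDictionary<Guid, string> rows) => _rows = rows;

    public Task<IngestionDocument> ReadAsync(
        Guid source,
        string identifier,
        string? mediaType = null,
        CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();

        if (!_rows.TryGetValue(source, out string? text))
        {
            throw new KeyNotFoundException($"No document stored for key {source}.");
        }

        // mediaType is ignored in this sketch; a real reader could use it to pick a parser.
        return Task.FromResult(new IngestionDocument(identifier) { Text = text });
    }
}
```

A caller would pass the key as the source, for example `await reader.ReadAsync(id, "doc-1")`, and the pipeline would only need to know that the reader implements `IIngestionDocumentReader<Guid>`.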

The `IngestionDocumentReader` class remains: we are still opinionated and believe that readers should implement `FileInfo` and `Stream` support. It implements the new interface.
But we also know that this is not enough for all scenarios, so the pipeline no longer accepts the old reader class; it accepts anything that implements the new interface.

Because of that, the pipeline now needs to specify two generic type arguments instead of one. This is a breaking change.
Moreover, the pipeline itself no longer defines methods for processing multiple files or a directory. This functionality was moved to extension methods.
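The pattern described here, a source-agnostic core with file system helpers layered on as extension methods, might look roughly like the sketch below. It does not reproduce the PR's actual signatures: `SketchPipeline<TSource>`, `SketchFileSystemExtensions`, and `ProcessDirectoryAsync` are hypothetical names, and the pipeline is reduced to a single delegate call.

```csharp
#nullable enable
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, heavily simplified pipeline: it only knows how to process one source.
public sealed class SketchPipeline<TSource>
{
    private readonly Func<TSource, string, CancellationToken, Task> _processOne;

    public SketchPipeline(Func<TSource, string, CancellationToken, Task> processOne)
        => _processOne = processOne;

    public Task ProcessAsync(TSource source, string identifier, CancellationToken cancellationToken = default)
        => _processOne(source, identifier, cancellationToken);
}

// Directory handling lives outside the pipeline and applies only when TSource is FileInfo.
public static class SketchFileSystemExtensions
{
    // Processes every matching file and returns how many files were handled.
    public static async Task<int> ProcessDirectoryAsync(
        this SketchPipeline<FileInfo> pipeline,
        DirectoryInfo directory,
        string searchPattern = "*.*",
        SearchOption searchOption = SearchOption.TopDirectoryOnly,
        CancellationToken cancellationToken = default)
    {
        int processed = 0;
        foreach (FileInfo file in directory.EnumerateFiles(searchPattern, searchOption))
        {
            // The full path doubles as the document identifier in this sketch.
            await pipeline.ProcessAsync(file, file.FullName, cancellationToken);
            processed++;
        }

        return processed;
    }
}
```

Because the extension method targets `SketchPipeline<FileInfo>` specifically, a pipeline built over `Guid` or another source type simply does not see it, which keeps the core type free of file system assumptions.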

Fixes #7082


Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the data ingestion pipeline to be generic over source types, allowing it to work with any source type beyond just `FileInfo` and `Stream`. The key change introduces the `IIngestionDocumentReader<TSource>` interface and converts `IngestionPipeline` to `IngestionPipeline<TSource, TChunk>`.

Key changes:

  • Introduces the `IIngestionDocumentReader<TSource>` interface to enable generic source type support
  • Refactors `IngestionPipeline<T>` to `IngestionPipeline<TSource, TChunk>` with two type parameters
  • Extracts file system-specific operations into the `FileSystemIngestionExtensions` class
  • Moves media type detection logic to the `MediaTypeProvider` utility class

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Summary per file:

| File | Description |
| --- | --- |
| IIngestionDocumentReader.cs | New generic interface for document readers with any source type |
| IngestionDocumentReader.cs | Updated abstract class to implement new interface and use `MediaTypeProvider` |
| IngestionPipeline.cs | Converted to generic over source type; file system operations moved to extensions |
| FileSystemIngestionExtensions.cs | New extension class with `FileInfo`-specific processing methods |
| MediaTypeProvider.cs | New utility class for media type detection, extracted from `IngestionDocumentReader` |
| MarkdownReader.cs, MarkItDownReader.cs, MarkItDownMcpReader.cs | Updated to make `mediaType` parameter nullable |
| Test files | Updated to use generic type parameters and new test cases |

Comments suppressed due to low confidence (1)

src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/IngestionDocumentReader.cs:1

  • The `.markdown` and `.md` extensions mapping is duplicated in both `IngestionDocumentReader.cs` (line 41) and `MediaTypeProvider.cs` (line 41). Since `MediaTypeProvider` is now the centralized location for media type mappings and is being linked into this project, this duplication should be removed.

adamsitnik requested a review from a team as a code owner on November 27, 2025 16:37

Development

Successfully merging this pull request may close these issues.

Using IngestionPipeline for content not originating from the file system
