Introduce Microsoft.Extensions.DataIngestion.Abstractions #6949
Pull Request Overview
This PR introduces the Microsoft.Extensions.DataIngestion.Abstractions library, implementing the APIs approved in GitHub issues #6893 and #6895. The library provides abstractions for reading documents in various formats and processing them into structured chunks suitable for data ingestion scenarios (e.g., RAG pipelines).
Key changes:
- Core document representation classes (IngestionDocument, IngestionDocumentElement and its specialized types)
- Processing pipeline abstractions (IngestionDocumentReader, IngestionDocumentProcessor, IngestionChunker, IngestionChunkProcessor, IngestionChunkWriter)
- Support classes (IngestionChunk) for representing processed content chunks
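The abstractions listed above imply a pipeline of the form reader → document processors → chunker → chunk processors → writer. The following is a minimal Python sketch of that shape under stated assumptions. It is not the library's C# API. The class names, the method names (`read`, `process`, `chunk`, `write`), the synchronous iteration, the `run_pipeline` driver and the toy stage implementations are all hypothetical, because the real signatures are not shown on this page.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Iterable, Iterator


# Hypothetical stand-ins for IngestionDocument and IngestionChunk.
@dataclass
class Document:
    identifier: str
    paragraphs: list[str] = field(default_factory=list)


@dataclass
class Chunk:
    content: str
    document_id: str
    metadata: dict[str, str] = field(default_factory=dict)


# One abstract base per pipeline stage (names and signatures are assumed).
class DocumentReader(ABC):
    @abstractmethod
    def read(self, source: str, identifier: str) -> Document: ...

class DocumentProcessor(ABC):
    @abstractmethod
    def process(self, document: Document) -> Document: ...

class Chunker(ABC):
    @abstractmethod
    def chunk(self, document: Document) -> Iterator[Chunk]: ...

class ChunkProcessor(ABC):
    @abstractmethod
    def process(self, chunks: Iterable[Chunk]) -> Iterator[Chunk]: ...

class ChunkWriter(ABC):
    @abstractmethod
    def write(self, chunks: Iterable[Chunk]) -> None: ...


def run_pipeline(source, identifier, reader, doc_processors, chunker, chunk_processors, writer):
    """Hypothetical driver: read -> process document -> chunk -> process chunks -> write."""
    document = reader.read(source, identifier)
    for processor in doc_processors:
        document = processor.process(document)
    chunks = chunker.chunk(document)
    for processor in chunk_processors:
        chunks = processor.process(chunks)  # lazily chained generators
    writer.write(chunks)


# Toy stage implementations, for illustration only.
class PlainTextReader(DocumentReader):
    def read(self, source, identifier):
        # Blank-line-separated blocks become paragraphs.
        blocks = (b.strip() for b in source.split("\n\n"))
        return Document(identifier, [b for b in blocks if b])

class DropShortParagraphs(DocumentProcessor):
    def __init__(self, min_length):
        self.min_length = min_length

    def process(self, document):
        kept = [p for p in document.paragraphs if len(p) >= self.min_length]
        return Document(document.identifier, kept)

class ParagraphChunker(Chunker):
    def chunk(self, document):
        for paragraph in document.paragraphs:
            yield Chunk(paragraph, document.identifier)

class AddLengthMetadata(ChunkProcessor):
    def process(self, chunks):
        for chunk in chunks:
            chunk.metadata["length"] = str(len(chunk.content))
            yield chunk

class ListWriter(ChunkWriter):
    def __init__(self):
        self.written = []

    def write(self, chunks):
        self.written.extend(chunks)


writer = ListWriter()
run_pipeline("Intro\n\nok\n\nSecond paragraph", "doc-1", PlainTextReader(),
             [DropShortParagraphs(3)], ParagraphChunker(), [AddLengthMetadata()], writer)
print([c.content for c in writer.written])  # ['Intro', 'Second paragraph']
```

Each stage passes chunks to the next as a lazily evaluated iterator. This lets every chunk processor stream over the previous stage's output without first building the whole list. The toy reader and processors exist only to make the sketch runnable.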
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| Microsoft.Extensions.DataIngestion.Abstractions.csproj | Project file defining multi-targeting (including netstandard2.0) and conditional package references |
| IngestionDocument.cs | Core document container with section management and content enumeration |
| IngestionDocumentElement.cs | Base class and specialized element types (Section, Paragraph, Header, Footer, Table, Image) |
| IngestionDocumentReader.cs | Abstract reader with file/stream overloads and extensive media type mapping |
| IngestionDocumentProcessor.cs | Abstract processor for document transformation pipeline |
| IngestionChunk.cs | Generic chunk representation with metadata support and validation |
| IngestionChunker.cs | Abstract chunker for splitting documents into chunks |
| IngestionChunkProcessor.cs | Abstract processor for chunk transformation pipeline |
| IngestionChunkWriter.cs | Abstract writer with disposable pattern for chunk output |
| Microsoft.Extensions.DataIngestion.Tests.csproj | Test project configuration with analyzer suppressions |
| IngestionDocumentTests.cs | Unit tests for document enumeration and validation |
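According to the table, a document is a container of sections holding typed elements that can be enumerated, and a chunk is a generic type that carries metadata and is validated. The following is a minimal Python sketch of that data model. Several details are assumptions for illustration and do not come from the C# API in this PR: the field names (`markdown`, `elements`, `document_id`), the depth-first enumeration order, the choice not to yield section containers themselves, and the specific validation rules.

```python
from dataclasses import dataclass, field
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


# Hypothetical element hierarchy mirroring some of the element kinds in the table.
@dataclass
class Element:
    markdown: str = ""


@dataclass
class Paragraph(Element):
    pass


@dataclass
class Header(Element):
    pass


@dataclass
class Section(Element):
    elements: list[Element] = field(default_factory=list)


@dataclass
class Document:
    identifier: str
    sections: list[Section] = field(default_factory=list)

    def enumerate_content(self) -> Iterator[Element]:
        # Depth-first walk in document order; nested sections are
        # descended into rather than yielded (an assumption of this sketch).
        def walk(elements):
            for element in elements:
                if isinstance(element, Section):
                    yield from walk(element.elements)
                else:
                    yield element
        yield from walk(self.sections)


@dataclass
class Chunk(Generic[T]):
    content: T
    document_id: str
    metadata: dict[str, object] = field(default_factory=dict)

    def __post_init__(self):
        # Assumed validation rules: content must be present (non-empty for
        # strings) and the chunk must point back to a document.
        if self.content is None or self.content == "":
            raise ValueError("content must not be null or empty")
        if not self.document_id:
            raise ValueError("document_id must not be empty")


doc = Document("doc-1", [
    Section(elements=[
        Header("# Title"),
        Paragraph("First paragraph."),
        Section(elements=[Paragraph("Nested paragraph.")]),
    ])
])
print([e.markdown for e in doc.enumerate_content()])
# ['# Title', 'First paragraph.', 'Nested paragraph.']

chunk = Chunk[str]("First paragraph.", doc.identifier)
try:
    Chunk[str]("", doc.identifier)
except ValueError as error:
    print(error)  # content must not be null or empty
```

Validating in the constructor means an invalid chunk is never created. This fits the kind of enumeration and validation behavior the test file is described as covering.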
Branch updated from db5d273 to 7bbb1ef. Merged commit 72f930a into dotnet:main.
The APIs were approved in #6893 (comment) and in #6895 (comment).