
v4; motivation and initial thoughts #951


Draft: mgravell wants to merge 114 commits into main from v4

mgravell (Member) commented Sep 6, 2022 (edited)

This PR covers some initial exploration into v4.

Key Motivations

1. improve AOT support
2. improve performance
3. support additional memory usage scenarios
4. smaller outputs

Items 2 and 3 are most likely achieved by way of a new reader/writer API with additional optimizations; item 1 is most likely achieved via new build tools which integrate with the outputs from 2 and 3.

Improve AOT Support

Currently the core engine is focused on runtime reflection-based IL emit. The library conceptually supports AOT scenarios, including library separation of the core and reflection-based aspects, and attribute-based annotation support for manually-written serializers, but none of the tools currently generate code-based serializers. We aim to provide both code-first and contract-first AOT scenarios, typically using Roslyn generators (based either on the discovered code model or on the parsed .proto files; the machinery ahead of these bits already exists).

Additionally:

- runtime reflection-based emit is slow on first use: it requires loading lots of additional system assemblies, lots of type discovery, working through a complicated system, and then the actual emit; this impacts cold-start performance, which is particularly relevant for serverless scenarios where the process is typically short-lived
- runtime reflection discovery and emit is not well supported on all platforms, in particular impacting Unity etc. (also: IL2CPP doesn't support all IL scenarios, and isn't perfect in some of the relevant cases)
- runtime reflection discovery and emit demands a wide graph of system assemblies; this impacts "pruning" (trimming), meaning either we need to retain a lot of libraries, or it won't work properly; this impacts Blazor in particular
- runtime reflection discovery and emit is hard to debug, maintain, and extend; if we want to add radical new features (a new core reader/writer API, async, etc.), it is prohibitively expensive to implement them in the existing design, and doing so demands very niche skills (reducing the ability of people to contribute)

Improve performance

Profiling has shown that the existing API is sub-optimal; discovery work has been done ahead of this PR to investigate a "from first principles" re-imagining of the core reader/writer API. It is fundamentally not possible to achieve all of the aims here without a new API, although it may be possible to reuse the new API from within the older API, with the older API acting as a wrapper layer.

These changes include:

- reworking the data buffer to remove unnecessary overheads
- using CPU primitives where profiling shows them to be useful
- using better generated serializer code to reduce operations
- exploiting framework features like list-span access
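The "CPU primitives" item targets hot loops such as varint decoding (the commit history below includes PEXT/TZCNT-based varint experiments). For reference, here is a minimal scalar decoder for the protobuf varint format; it is a sketch of the kind of baseline loop such intrinsics would aim to beat, not code from this PR, and `TryReadVarint64` is a hypothetical name rather than a protobuf-net API.

```csharp
using System;

// Decodes one protobuf varint (7 payload bits per byte, least significant
// group first; a set high bit means "more bytes follow") from the start of
// `source`. Returns false if the input ends mid-varint or runs past the
// 10 bytes that a 64-bit value can need. Hypothetical helper, not a library API.
bool TryReadVarint64(ReadOnlySpan<byte> source, out ulong value, out int bytesRead)
{
    value = 0;
    for (int i = 0; i < source.Length && i < 10; i++)
    {
        byte b = source[i];
        value |= (ulong)(b & 0x7F) << (7 * i); // place this group's 7 bits
        if ((b & 0x80) == 0)                  // high bit clear: final byte
        {
            bytesRead = i + 1;
            return true;
        }
    }
    value = 0;
    bytesRead = 0;
    return false;
}

// 300 is encoded on the wire as 0xAC 0x02.
byte[] wire = { 0xAC, 0x02 };
if (TryReadVarint64(wire, out ulong decoded, out int used))
{
    Console.WriteLine($"{decoded} ({used} bytes)"); // prints "300 (2 bytes)"
}
```

The loop branches once per byte, which is the cost that branch-free approaches built on bit-extract instructions try to remove.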

Support Additional Memory Usage Scenarios

Some models are inherently "allocatey"; consider, for example, a model with a `repeated` chunk of multiple sub-items, each of which has a `bytes` payload, resulting in large numbers of small `byte[]` chunks. The idea here is to facilitate more efficient handling of such scenarios; e.g. we could generate `ReadOnlyMemory<byte>` instead of `byte[]`, and allow multiple leaf levels to be slices of the same underlying oversized buffer. This PR explores this scenario. Note, however, that profiling results for this approach are mixed. We want to *enable* this, but as an option, allowing us to experiment with multiple options using real data.
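To illustrate the slicing idea, the sketch below reads a `repeated bytes` field (field number 1) and hands back each payload as a `ReadOnlyMemory<byte>` slice of the input buffer instead of copying it into a fresh `byte[]`. It is a hedged illustration, not the PR's implementation: `ReadPayloadSlices` is a hypothetical name, and it assumes every payload is shorter than 128 bytes so that each length prefix is a single-byte varint.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each payload of a `repeated bytes` field 1 as a
// slice of `buffer` (no per-item byte[] allocation). Simplifying assumption:
// every payload is shorter than 128 bytes, so each length prefix is one byte.
List<ReadOnlyMemory<byte>> ReadPayloadSlices(ReadOnlyMemory<byte> buffer)
{
    const byte Field1LengthDelimited = (1 << 3) | 2; // tag 0x0A: field 1, wire type 2
    var result = new List<ReadOnlyMemory<byte>>();
    ReadOnlySpan<byte> span = buffer.Span;
    int offset = 0;
    while (offset < span.Length)
    {
        if (span[offset] != Field1LengthDelimited)
            throw new FormatException($"Unexpected tag at offset {offset}.");
        if (offset + 1 >= span.Length)
            throw new FormatException("Truncated length prefix.");
        int length = span[offset + 1];
        if (length >= 0x80)
            throw new NotSupportedException("Multi-byte lengths are out of scope for this sketch.");
        int start = offset + 2;
        if (start + length > span.Length)
            throw new FormatException("Truncated payload.");
        // No copy: the slice refers to the same underlying array as `buffer`.
        result.Add(buffer.Slice(start, length));
        offset = start + length;
    }
    return result;
}

// Two records: [0x01, 0x02] and [0x03, 0x04, 0x05].
byte[] wire = { 0x0A, 0x02, 0x01, 0x02, 0x0A, 0x03, 0x03, 0x04, 0x05 };
List<ReadOnlyMemory<byte>> slices = ReadPayloadSlices(wire);
Console.WriteLine($"{slices.Count} payloads, lengths {slices[0].Length} and {slices[1].Length}"); // prints "2 payloads, lengths 2 and 3"
```

One trade-off worth measuring alongside allocation counts: any retained slice keeps the whole oversized buffer alive, which is one reason this makes sense as an opt-in rather than a default.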

Smaller Outputs

Right now the runtime library needs to contain chains for things it *might* need: niche, random code paths for obscure and esoteric models. Because this discovery is done via reflection, these edge cases are largely not trimmable (in the AOT sense), because determining whether they are reached or not is basically impossible. By moving to an AOT path, it is *very* clear at build time what code is reached; there *is* no reflection gunk. This means we don't need all the reflection dependencies, and we don't need the dependencies for all the features the model doesn't use. This saving can be significant.

Likely implementation

We need to consider code-first and contract-first separately here. Let's consider a simple scenario:

```protobuf
syntax = "proto3";
message Foo {
  repeated string bar = 1;
}
```

Currently, this can be used to generate something akin to the same contract, as seen from a code-first perspective:

```csharp
using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
public partial class Foo
{
    [ProtoMember(1)]
    public List<string> Bars { get; } = new();
}
```

What we want to achieve is that whether starting code-first or contract-first, we generate code that includes the actual serialization code, either at the same time as generating the code (contract-first), or in an additional partial-class (code-first). Typical output code is shown in the exploration work in the PR.
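Whatever API the generated serializer ends up targeting, for `Foo` it must produce the protobuf wire format sketched below. This is a hand-written approximation under stated assumptions: it writes to a plain `Stream` rather than guessing at the new writer API (which is still to be implemented), and `WriteFoo`/`WriteVarint` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Writes an unsigned varint: 7 payload bits per byte, least significant group
// first, with the high bit set on every byte except the last.
void WriteVarint(Stream output, ulong value)
{
    while (value >= 0x80)
    {
        output.WriteByte((byte)(value | 0x80));
        value >>= 7;
    }
    output.WriteByte((byte)value);
}

// Hypothetical hand-written equivalent of a generated serializer for
// `repeated string bar = 1;`. Strings are never packed, so each element is its
// own record: tag (field 1, wire type 2 = length-delimited), byte length, UTF-8.
void WriteFoo(Stream output, IReadOnlyList<string> bars)
{
    const uint Tag = (1 << 3) | 2; // 0x0A
    foreach (string bar in bars)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(bar);
        WriteVarint(output, Tag);
        WriteVarint(output, (ulong)utf8.Length);
        output.Write(utf8, 0, utf8.Length);
    }
}

var ms = new MemoryStream();
WriteFoo(ms, new List<string> { "hi", "pb" });
Console.WriteLine(Convert.ToHexString(ms.ToArray())); // 0A0268690A027062
```

Because the field number and wire type are known at build time, the tag collapses to a constant; that is the sort of work a code-based serializer can avoid repeating at runtime.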

The key point here, though, is that code-first and contract-first start from completely different code models: contract-first (and the existing code-gen) starts from the `FileDescriptorSet` view, whereas code-first starts from a Roslyn view. The actual code-gen should not have to contend with this, and we do not intend duplication, so the proposal instead is to create a new source-agnostic API that the new code-gen tools should use, and to populate that source-agnostic API from each specific scenario.

For example, we could have:

```
class CodeGenerationModel
  List<CodeGenerationFile> Files

class CodeGenerationFile
  string Name
  List<CodeGenerationType> Types

class CodeGenerationType
  string Name, OriginalName // takes Name when null
  string Namespace
  ReadOnlyMemory<string> ParentTypes
  List<CodeGenerationMember> Members
  // flags and other helpers; is it an enum? value-type?
  // what are we generating for this type? members? serializer?
  // note: we expect inbuilt primitives to exist as CodeGenerationType,
  // for example, maybe `static CodeGenerationType.String`

class CodeGenerationMember
  string Name, OriginalName // takes Name when null
  string BackingMember
  int FieldNumber
  CodeGenerationType Type
  // data format? wire-type?
  // repeated? if so, what kind? other flags?
```

So here, we would generate the equivalent of:

```csharp
var model = new CodeGenerationModel {
    Files = {
        new() {
            Name = "my.generated.cs",
            Types = {
                new() {
                    Name = "Foo",
                    Generate = /* serializer+members for contract-first; serializer for code-first */,
                    Members = {
                        new() {
                            FieldNumber = 1,
                            Name = "Bars", OriginalName = "bar",
                            Type = CodeGenerationType.String,
                            MemberType = Repeated
                        }
                    }
                }
            }
        }
    }
};
```
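Once populated, a model like this is what the emitters would walk. As a rough, hypothetical sketch of the simplest possible output (the DTO/members half only, no serializer), the code below turns a flattened member description into a `[ProtoContract]` class. The value-tuple member shape and `EmitDto` are assumptions made to keep the sketch in one file; they stand in for the `CodeGenerationType`/`CodeGenerationMember` classes above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical emitter: produces the DTO ("members") half of one type from a
// source-agnostic description. Members are value tuples only to keep this
// sketch self-contained; serializer emission is deliberately omitted.
string EmitDto(string typeName,
    IEnumerable<(string Name, int FieldNumber, string ClrType, bool Repeated)> members)
{
    var sb = new StringBuilder();
    sb.AppendLine("[global::ProtoBuf.ProtoContract]");
    sb.AppendLine($"public partial class {typeName}");
    sb.AppendLine("{");
    foreach (var m in members)
    {
        sb.AppendLine($"    [global::ProtoBuf.ProtoMember({m.FieldNumber})]");
        // Repeated members become get-only lists, as in the code-first Foo example.
        sb.AppendLine(m.Repeated
            ? $"    public global::System.Collections.Generic.List<{m.ClrType}> {m.Name} {{ get; }} = new();"
            : $"    public {m.ClrType} {m.Name} {{ get; set; }}");
    }
    sb.AppendLine("}");
    return sb.ToString();
}

// The Foo example: field 1, "Bars" in C# ("bar" in the .proto), repeated string.
Console.Write(EmitDto("Foo", new[] { ("Bars", 1, "string", true) }));
```

A `StringBuilder` keeps the sketch dependency-free; a real generator might well prefer an indented writer or Roslyn syntax APIs, and would also need to carry details such as `OriginalName` through to the output.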

So, the initial work items:

- define a rough skeleton model for the above new API
- parse the Roslyn code-first model to populate the new model
- parse the `FileDescriptorSet` contract-first model to populate the new model
- emit new model+serializer code from the new model, against the new serializer API
- implement the new serializer API

It is *not* a goal of the current stage to emit code for the *old* serializer API from the new model; while that might be a nice feature in the future, it is not seen as solving an immediate need, and would only add support costs.

High-level tasks

- setup test skeleton
  - parse .proto to `FileDescriptorSet`
  - parse C# to Roslyn model
- setup new working model
- populate working model from `FileDescriptorSet`
- populate working model from Roslyn model
- basic DTO output from working model
- serializer output from working model
- complete the reader/writer API

Test skeleton: somehow set up a multi-input test (folder-based?) that takes a corpus of examples.
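One way the folder-based corpus idea could be sketched, assuming a hypothetical layout where each input `name.proto` sits next to an expected output `name.expected.cs`: discover the pairs in a stable order and report inputs that lack an expected file. `DiscoverCases` and the naming convention are illustrative assumptions, not part of the repository.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical corpus layout: "name.proto" next to "name.expected.cs".
// Returns one (input, expected) pair per case in a stable (ordinal) order and
// reports inputs that have no expected file yet via `missing`.
List<(string Input, string Expected)> DiscoverCases(string folder, out List<string> missing)
{
    var found = new List<(string Input, string Expected)>();
    missing = new List<string>();
    foreach (string protoPath in Directory.EnumerateFiles(folder, "*.proto")
                                          .OrderBy(p => p, StringComparer.Ordinal))
    {
        // "a/foo.proto" -> "a/foo.expected.cs"
        string expectedPath = Path.ChangeExtension(protoPath, ".expected.cs");
        if (File.Exists(expectedPath))
            found.Add((protoPath, expectedPath));
        else
            missing.Add(protoPath);
    }
    return found;
}

string corpus = "corpus"; // hypothetical folder name
if (Directory.Exists(corpus))
{
    var pairs = DiscoverCases(corpus, out var unpaired);
    foreach (var (inPath, outPath) in pairs)
        Console.WriteLine($"{inPath} -> {outPath}");
    foreach (string orphan in unpaired)
        Console.WriteLine($"no expected output for {orphan}");
}
```

A data-driven test (for example an xUnit theory fed via `[MemberData]`) could then run the generator over each input and compare against the expected file; code-first `.cs` inputs could be discovered the same way with a different pattern.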

mgravell added 30 commits (May 30, 2022 11:34)

- nano (839a0b2)
- comments etc (3473280)
- clarify write API (8897ce6)
- benchmarks for Nano (048bdc0)
- nit (b5a2512)
- use collection size hint in nano benchmark list init; implement varint on down-level fx (bfd219e)
- test ref-counted slab vs simple slab (f2ff974)
- ignore unity test proj (f3b3be1)
- simple slab test (f1aae77)
- focus on deserialize tests; avoid the ref-count check (basically never useful) (e5e10a7)
- profile GC uninitialized arrays (e5a66c4)
- tidy (c799f95)
- update numbers (e227547)
- construction approaches (1452f8b)
- playing (baf4c4d)
  Signed-off-by: Marc Gravell <marc.gravell@gmail.com>
- wokring on alloc (a8d85eb)
- more hacking (a5dd269)
- tons more hacking on that damned slab (620a5f5)
- numbers (3e30520)
- add protobuf-net (d2b0f56)
- caveat geek (7f4431d)
- explore PEXT/TZCNT varint decode (6fc19ee)
- compare Unsafe.Add (ea5826c)
- add a version that uses 32-bit PEXT for small values, then 64-bit PEXT for larger values (7481b84)
- words (c007aa8)
- results (0059499)
- one last stab (da4e0c6)
- add sample gRPC client/server with timings (09bca9e)
- encode ideas (d807221)
- encode numbers (1fd8612)

mgravell added 4 commits (September 28, 2022 18:49)

- generalize diagnostic reporting (04dc939)
- fix broken gen test (345df7e)
- more generalized diagnostics (f600ce3)
- fix broken tests (cb10404)

charlicopter mentioned this pull request on Jan 13, 2023: What is the status of V3 for AOT environments? #997 (closed)

mgravell added 3 commits (February 25, 2023 18:29)

- Merge branch 'main' into v4 (264be1d)
  Conflicts:
  - protobuf-net.sln
  - src/Benchmark/Benchmark.csproj
  - src/BenchmarkBaseline/BenchmarkBaseline.csproj
  - src/BuildToolsUnitTests/BuildToolsUnitTests.csproj
  - src/Directory.Build.props
  - src/Examples/Examples.csproj
  - src/LongDataTests/LongDataTests.csproj
  - src/NativeGoogleTests/NativeGoogleTests.csproj
  - src/protobuf-net.AspNetCore/protobuf-net.AspNetCore.csproj
  - src/protobuf-net.BuildTools.Legacy/protobuf-net.BuildTools.Legacy.csproj
  - src/protobuf-net.BuildTools/protobuf-net.BuildTools.csproj
  - src/protobuf-net.Core/protobuf-net.Core.csproj
  - src/protobuf-net.FSharp.Test/protobuf-net.FSharp.Test.fsproj
  - src/protobuf-net.FSharp/protobuf-net.FSharp.csproj
  - src/protobuf-net.MSBuild.Test/protobuf-net.MSBuild.Test.csproj
  - src/protobuf-net.MSBuild/protobuf-net.MSBuild.csproj
  - src/protobuf-net.MessagePipes/protobuf-net.MessagePipes.csproj
  - src/protobuf-net.NodaTime/protobuf-net.NodaTime.csproj
  - src/protobuf-net.Protogen/protobuf-net.Protogen.csproj
  - src/protobuf-net.Reflection.Test/protobuf-net.Reflection.Test.csproj
  - src/protobuf-net.ServiceModel/protobuf-net.ServiceModel.csproj
  - src/protobuf-net.Test/protobuf-net.Test.csproj
  - src/protobuf-net/protobuf-net.csproj
  - src/protogen.site/protogen.site.csproj
  - src/protogen/protogen.csproj
- make intellisense less unhappy (de6a0b3)
- Merge branch 'main' into v4 (2e849b6)

listepo commented Aug 28, 2023

Hey @mgravell, thanks for your work; is there any news about it?

Dona278 commented Feb 23, 2024

Hi @mgravell, I know that you have a lot of work + family + combating criminals at night, but I think this is the best protobuf library for .NET, and since .NET 8 Microsoft has been pushing hard on performance + trimming + AOT + source generators, so I want to ask:

- After all these years, is there any ETA for this work?
- Is there any chance of getting help from Microsoft to support this project, as they already did with Grpc.AspNetCore?

Anyway, thank you for your work!

mgravell (Member, Author) commented Feb 23, 2024

Hi; no hard ETA, but it is definitely still in progress. I'm very aware of the AOT work, and the hope is for the Dapper.AOT learnings to lead into the protobuf-net work. There is an AOT branch for the analyzer pieces, but I think a lot of it will need significant rework. I'm also a little distracted by Google's recent discussion of "edition 2024" and the "group" changes, which I also want to integrate (the parser now works, so... yay!). This is relevant because the "editions" work and the "AOT" work need to interact, so understanding both pieces at the same time is essential.

As for MSFT time: my MSFT time is focused on cache work at the moment, but let's see how it goes a little later in the year.

michaldobrodenka commented Feb 23, 2024

About AOT: it seems that AssemblyBuilder.Save will work in .NET 9. I know generating C# code is the better solution, but would this be supported? For example, generating serializer assemblies for AOT in some "model.csproj" as a post-build step?

mgravell (Member, Author) commented Feb 23, 2024

@michaldobrodenka if AssemblyBuilder.Save starts working, I'll happily light up that API, and if that unblocks some scenarios: great! However, that will be unrelated and tangential to the intended AOT route, which I hope will be codegen-based.

tuga001-sme commented Oct 28, 2024

Any news?

PanzerFowst mentioned this pull request on Apr 15, 2025: Working with AOT - .NET 7 #1025 (closed)

PanzerFowst commented Apr 15, 2025

First off, thank you for your work! It is great!

I know this is not a rushed change (family, day job, etc.), but I was curious what could be done to help this PR along. Are there API improvements in code generators in .NET 9 that can be taken advantage of now?

mgravell (Member, Author) commented Apr 15, 2025

The APIs haven't changed hugely (I don't think interceptors give us much); but I do need to revisit this from the ground up, using our learnings here as a foundation - the object model needs a lot of rework based on my learnings from Roslyn incremental generators over the last few years; the approach here is naive. Doable: yes. But it needs dedicated time.

PanzerFowst commented Apr 15, 2025

Thanks for getting back so quickly, Marc!

Ah, I see. Would there be an issue or milestone with TODOs, etc., giving a roadmap of what needs to be done, so that we could help contribute where able?

michaldobrodenka commented Apr 16, 2025 (edited)

I started to play with generators and created a demo of generated protobuf serializers/deserializers driven by protobuf-net attributes.

https://github.com/michaldobrodenka/GProtobuf

It's far from usable: only deserialization is supported, with only a handful of types. Not tested/used; just a proof of concept. Maybe I'll return to it sometime. But when it does work, deserialization is crazy fast.

PanzerFowst commented Apr 16, 2025

That's neat, @michaldobrodenka!

I am working on converting some code to be NativeAoT-compliant and unfortunately haven't found a way to keep NativeAoT from trimming away protobuf-net. The only thing I have found so far is to use Google.Protobuf and manually create a .proto file for my DTOs, and it just ends up really messy...

But it did give me an idea (I haven't looked too deeply at this repo to see how feasible it is): what if the `[ProtoContract]` and `[ProtoMember(n)]` attributes could be used to create the .proto files automatically and add `<Protobuf Include="car.proto" />` to the .csproj, generating Google.Protobuf code that could then be used to automagically accomplish the same behavior in a NativeAoT context?

I am sure there are reasons why this wouldn't work, but with .NET 9 giving full NativeAoT support for iOS, I am seeing a lot of movement towards NativeAoT to get off of MonoAoT.

mgravell (Member, Author) commented Apr 16, 2025

Eesh, I should just dust this off and ship something, even if it is incomplete. My plans are wider than my calendar, it seems.

KybernetikGames commented Apr 26, 2025

Is there any chance v4 could bring back support for `AsReference`, which was in v2 and allowed a full object graph to be serialized with multiple fields referencing the same object?

I'm trying to find a good serializer for Unity, and ProtoBuf v2 is the only one I've found that meets all my needs, except that I can't seem to use it in Android builds because IL2CPP requires AOT compilation; so it would be a huge shame to find a solution to that problem only to lose such a useful feature.

Dona278 commented Apr 26, 2025

@KybernetikGames have you looked at the Cysharp repos? They develop games with Unity, and they are the creators of R3 (observables) and [Message/Memory]Pack (serializers), both developed to be compatible with Unity.

michaldobrodenka commented Apr 26, 2025 (edited)

> Is there any chance v4 could bring back support for `AsReference` that was in v2 which allowed a full object graph to be serialized with multiple fields referencing the same object?
> I'm trying to find a good serializer for Unity and ProtoBuf v2 is the only one I've found which meets all my needs except that I can't seem to use it in Android builds due to IL2CPP requiring AOT compilation so it would be a huge shame to find a solution to that problem only to lose such a useful feature.

If you need a solution now, you can check my protobuf-net 2 fork: with precompile, you can prepare the serializer as a DLL in a post-build step. I'm using it in production. And you don't need the old .NET Framework; it works with net6+: https://github.com/michaldobrodenka/protobuf-net

KybernetikGames commented Apr 26, 2025

@Dona278 I briefly tried MessagePack and MemoryPack but ran into issues with each of them (here and here) which would have required me to refactor quite a bit of my code base. ProtoBuf v2 seemed like a silver bullet that handled everything I need to do with it, right up until I tried to use it in a runtime build. But if I can't get it going, I'll definitely be revisiting the Cysharp systems.

@michaldobrodenka I found your repo earlier today and have been trying to get it to work in Unity, with no success so far, and there's no Issues page, so I wasn't sure how to contact you. Do you have a preferred contact method?

michaldobrodenka commented Apr 26, 2025

@KybernetikGames have you checked the aot-net6 branch?
I have enabled the Issues page on this project, but I don't plan to maintain it much further; I'm just using it until I find a replacement. It works in all my projects, and I'm looking for a more modern solution, using Span and generated code: something like my GProtobuf, which is only a proof of concept for now.

mgravell (Member, Author) commented Apr 26, 2025

I genuinely do have plans to revisit the AOT work. I just need the world to switch to a 36 hour day so I have enough hours in each...

PanzerFowst commented May 6, 2025

Well, I just wanted to ask whether you might have an outline of the work (known so far) that needs to be done, so that anyone who has the time could help contribute (I have been looking into `IIncrementalGenerator` and experimenting).

I know I am certainly interested in contributing.

EricGarnier mentioned this pull request on Jun 17, 2025: Support for AOT JKorf/CryptoClients.Net#9 (closed)

listepo commented Nov 2, 2025

@mgravell I understand you very well, but are there any deadlines?

Labels: None yet

9 participants
