v4; motivation and initial thoughts #951

mgravell · 2022-09-06T11:37:01Z

This PR covers some initial exploration into v4

Key Motivations

1. improve AOT support
2. improve performance
3. support additional memory usage scenarios
4. smaller outputs

2 and 3 are most likely by way of a new reader/writer API with additional optimizations; 1 is most likely via new build tools which integrate with the outputs from 2 and 3

Improve AOT Support

Currently the core engine is focused on runtime reflection-based IL emit. The library conceptually supports AOT scenarios, including library separation of the core and reflection-based aspects, and attribute based annotation support for manually-written serializers, but none of the tools currently generate code-based serializers. We aim to provide both code-first and contract-first AOT scenarios, typically using Roslyn generators (either based on the discovered code model, or the .proto files parsed - the machinery ahead of these bits already exists).

Additionally:

runtime reflection-based emit is slow at the initial usage, requiring lots of additional system assemblies, lots of type discovery, and consideration of a complicated system, and the actual emit; this impacts cold-start performance, particularly relevant for serverless scenarios where the process is typically short-lived
runtime reflection discovery and emit is not well supported on all platforms, in particular impacting "unity" etc (also: IL2CPP doesn't support all IL scenarios, and isn't perfect in some of the relevant cases)
runtime reflection discovery and emit demands a wide graph of system assemblies; this impacts "pruning", meaning either we need to retain a lot of libraries, or it won't work properly; this impacts "blazor" in particular
runtime reflection discovery and emit is hard to debug, maintain, and extend; if we want to add radical new features (a new core reader/writer API, async, etc) it is prohibitively expensive to implement this in the existing design, and demands very niche skills (reducing the ability of people to contribute)

Improve performance

Profiling has shown that the existing API is sub-optimal; discovery work has been done ahead of this PR to investigate a "from first principles" re-imagining of the core reader/writer API. It is fundamentally not possible to achieve all of the aims here without a new API, although it may be possible to reuse the new API from within the older API, with the older API acting as a wrapper layer.

These changes include:

reworking the data buffer to remove all unnecessary operations
using CPU primitives where profiling shows it to be useful
using better generated serializer code to reduce operations
exploiting framework features like list-span access
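As an illustration of the "list-span access" item, here is a minimal sketch using `CollectionsMarshal.AsSpan` (available from .NET 5). The `ListSpanSketch` class and its `Sum` method are hypothetical names for this example and are not taken from the PR; the point is only that a span over the list's backing array avoids going through the `List<T>` indexer for each element.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListSpanSketch
{
    // Sums a List<int> by iterating a span over its backing array.
    // The list must not be added to or removed from while the span is in use.
    public static long Sum(List<int> values)
    {
        ReadOnlySpan<int> span = CollectionsMarshal.AsSpan(values);
        long total = 0;
        foreach (int value in span)
        {
            total += value;
        }
        return total;
    }
}
```

Whether this actually helps for a given model is something only profiling can answer; generated serializer code for repeated fields is the sort of place where it might be tried.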

Support Additional Memory Usage Scenarios

Some models are inherently "allocatey"; consider, for example, a model with a repeated chunk of multiple sub-items, each of which has a bytes payload, resulting in large numbers of small byte[] chunks. The idea here is to facilitate more efficient scenarios here; e.g. we could generate ReadOnlyMemory<byte> instead of byte[], and allow multiple leaf levels to be slices of the same underlying oversized buffer. The existing PR explores this scenario. Note, however, that profiling is mixed on the outcome of this. We want to enable this, but as an option, allowing us to play with multiple options with real data.
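To make the shared-buffer idea concrete, here is a rough sketch in which several payloads live in one oversized array and are exposed as `ReadOnlyMemory<byte>` slices. `SharedBufferSketch.Pack` is a hypothetical helper written for this example; a real deserializer would presumably read directly from the input into the slab rather than copy from existing `byte[]` values.

```csharp
using System;
using System.Collections.Generic;

public static class SharedBufferSketch
{
    // Copies every payload into one oversized array and returns slices over it,
    // so N payloads cost one array allocation instead of N.
    public static List<ReadOnlyMemory<byte>> Pack(IReadOnlyList<byte[]> payloads)
    {
        int total = 0;
        foreach (byte[] payload in payloads)
        {
            total += payload.Length;
        }

        byte[] slab = new byte[total]; // single allocation shared by all leaves
        var slices = new List<ReadOnlyMemory<byte>>(payloads.Count);
        int offset = 0;
        foreach (byte[] payload in payloads)
        {
            payload.CopyTo(slab, offset);
            slices.Add(new ReadOnlyMemory<byte>(slab, offset, payload.Length));
            offset += payload.Length;
        }
        return slices;
    }
}
```

One trade-off to keep in mind: a single surviving slice keeps the whole slab alive, which may be part of why the outcome depends on the data being measured.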

Smaller Outputs

Right now the runtime library needs to contain chains for things it might need - niche random code paths for obscure and esoteric models. Because this discovery is done via reflection, these edge-cases are largely not trimmable (in the AOT sense), because discovering whether they are reached or not is basically impossible. By moving to an AOT path, without all the reflection gunk, it is very clear at build time what code is reached. This means we don't need all the reflection dependencies, and we don't need all the dependencies for all the stuff that isn't used by the model. This saving can be significant.

Likely implementation

We need to consider code-first and contract-first separately here. Let's consider a simple scenario:

syntax = "proto3";

message Foo {
    repeated string bar = 1;
}

Currently, this can be used to generate something akin to the same contract, as seen from a code-first perspective:

[ProtoContract]
public partial class Foo
{
    [ProtoMember(1)]
    public List<string> Bars { get; } = new();
}

What we want to achieve is that whether starting code-first or contract-first, we generate code that includes the actual serialization code, either at the same time as generating the code (contract-first), or in an additional partial-class (code-first). Typical output code is shown in the exploration work in the PR.
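For a feel of what "the actual serialization code" for `Foo` amounts to on the wire, here is a hand-written sketch. It is not the PR's output and does not use the new reader/writer API: `FooSerializerSketch` is a hypothetical name, the attributes are omitted so the snippet has no package dependency, and it writes to a plain `MemoryStream`. The only facts relied on are from the protobuf wire format: field 1 with wire type 2 (length-delimited) has tag byte 0x0A, followed by a varint length and the UTF-8 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public partial class Foo
{
    public List<string> Bars { get; } = new();
}

// Hypothetical stand-in for generated code; the name and shape are illustrative only.
public static class FooSerializerSketch
{
    public static byte[] Serialize(Foo value)
    {
        using var ms = new MemoryStream();
        foreach (string bar in value.Bars)
        {
            byte[] utf8 = Encoding.UTF8.GetBytes(bar);
            ms.WriteByte(0x0A);                  // tag: field 1, wire type 2 (length-delimited)
            WriteVarint(ms, (uint)utf8.Length);  // payload length
            ms.Write(utf8, 0, utf8.Length);      // payload
        }
        return ms.ToArray();
    }

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    private static void WriteVarint(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }
}
```

With this sketch, a `Foo` whose `Bars` contains the single string "hi" serializes to the four bytes 0A 02 68 69.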

The key point here, though, is that code-first and contract-first start from completely different code models - contract-first (and the existing code-gen) starts from the FileDescriptorSet view, whereas code-first starts from a Roslyn view. The actual code-gen should not have to contend with this, and we do not intend duplication, so: the proposal instead is to create a new source-agnostic API that the new code-gen tools should use, and populate the source-agnostic API from the specific scenarios.

For example, we could have:

public class CodeGenerationModel
{
    public List<CodeGenerationFile> Files { get; } = new();
}

public class CodeGenerationFile
{
    public string Name { get; set; }
    public List<CodeGenerationType> Types { get; } = new();
}

public class CodeGenerationType
{
    public string Name { get; set; }
    public string OriginalName { get; set; } // takes Name when null
    public string Namespace { get; set; }
    public ReadOnlyMemory<string> ParentTypes { get; set; }

    public List<CodeGenerationMember> Members { get; } = new();
    // flags and other helpers; is it an enum? value-type?
    // what are we generating for this type? members? serializer?

    // note: we expect inbuilt primitives to exist as CodeGenerationType,
    // for example, maybe `static CodeGenerationType.String`
}

public class CodeGenerationMember
{
    public string Name { get; set; }
    public string OriginalName { get; set; } // takes Name when null
    public string BackingMember { get; set; }
    public int FieldNumber { get; set; }
    public CodeGenerationType Type { get; set; }
    // data format? wire-type?
    // repeated? if so, what kind? other flags?
}

So here, we would generate the equivalent of

var model = new CodeGenerationModel {
  Files = {
    new() {
      Name = "my.generated.cs",
      Types = {
        new() {
          Name = "Foo",
          Generate = /* serializer+members for contract-first; serializer for code-first */,
          Members = {
            new() {
              FieldNumber = 1,
              Name = "Bars", OriginalName = "bar",
              Type = CodeGenerationType.String,
              MemberType = Repeated
            }
          }
        }
      }
    }
  }
};

So; the initial work items:

define a rough skeleton model for the above new API
parse the Roslyn code-first model to populate the new model
parse the FileDescriptorSet contract-first model to populate the new model
emit new model+serializer code from the new model, against the new serializer API
implement the new serializer API

It is not a goal of the current stage to emit code for the old serializer API from the new model; while that might be a nice feature in the future, it is not seen as solving an immediate need, and will only add support costs.

High level tasks

set up test skeleton
- parse .proto to FileDescriptorSet
- parse C# to Roslyn model
set up new working model
populate working model from FileDescriptorSet
populate working model from Roslyn model
basic DTO output from working model
serializer output from working model
complete the reader/writer API

Test skeleton: somehow set up a multi-input test (folder-based?) that takes a corpus of examples


listepo · 2023-08-28T11:11:09Z

Hey @mgravell thanks for your work, is there any news about it?

Dona278 · 2024-02-23T14:02:59Z

Hi @mgravell , I know that you have a lot of work + family + combat criminals at night but I think this is the best protobuf library for dotnet, and Microsoft since net8 pushes a lot on performance + trimming + AOT + source generator, so I wanna ask:

After all these years, is there any ETA for this work?
Is there any chance of getting help from Microsoft to support this project, as they already did with Grpc.AspNetCore?

Anyway thank you for your work!

mgravell · 2024-02-23T14:14:17Z

Hi; no hard ETA, but definitely still in progress; I'm very aware of the AOT work, and the hope is for the Dapper.AOT learnings to lead into the protobuf-net work; there exists an AOT branch for the analyzer pieces, but I think a lot of it will need some significant rework, but: I'm also a little distracted by Google's recent discussion of "edition 2024", and the "group" changes, which I also want to integrate (parser now works, so... yay!). This is relevant because the "editions" work and the "AOT" work need to interact, so understanding both pieces at the same time is essential.

As for MSFT time: my MSFT time is focused on cache work at the moment, but: let's see how it goes a little later in the year.

michaldobrodenka · 2024-02-23T17:20:34Z

About AOT: it seems that AssemblyBuilder.Save will work in .NET 9. I know generating C# code is the better solution, but would this be supported? For example, generating serializer assemblies for AOT in some "model.csproj" as an after-build step?

mgravell · 2024-02-23T18:53:13Z

@michaldobrodenka if AssemblyBuilder.Save starts working, I'll happily light up that API, and if that unblocks some scenarios: great! However, that will be unrelated to and tangential to the intended AOT route, which I hope to be codegen based

tuga001-sme · 2024-10-28T11:46:38Z

Any news?

PanzerFowst · 2025-04-15T01:04:32Z

First off, thank you for your work! It is great!

I know this is not a rushed change (family, day job, etc.), but I was curious what could be done to help this PR along? Are there API improvements of code generators in .NET 9 that can be taken advantage of now?

mgravell · 2025-04-15T10:26:01Z

The APIs haven't changed hugely (I don't think interceptors give us much); but I do need to revisit this from the ground up, using our learnings here as a foundation - the object model needs a lot of rework based on my learnings from Roslyn incremental generators over the last few years; the approach here is naive. Doable: yes. But it needs dedicated time.

PanzerFowst · 2025-04-15T13:00:38Z

Thanks for getting back so quickly, Marc!

Ah, I see. So then would there be an issue / milestone with TODOs etc. to give a roadmap of what needs to be done so that we could help contribute where able?

michaldobrodenka · 2025-04-16T10:30:52Z

I started to play with generators and created a demo for protobuf generated serializers/deserializers from protobuf-net attributes.

https://github.com/michaldobrodenka/GProtobuf

It's far from usable, only deserialization is supported with only handful of types. Not tested/used. Just a proof of concept. Maybe will return to it sometimes. But when it's working, deserialization is crazy fast.

PanzerFowst · 2025-04-16T17:05:59Z

That's neat, @michaldobrodenka!

I am working on converting some code to be NativeAoT compliant and unfortunately haven't found a way to keep the NativeAoT runtime from trimming away protobuf-net. The only thing I have found so far is to use Google.Protobuf and manually create a .proto file for my DTOs, and it just ends up really messy...

But it did give me the idea (I haven't looked too deeply at this repo to see how feasible it is)--what if the [ProtoContract] and [ProtoMember(n)] attributes could create the .proto files automatically and add the <Protobuf Include="car.proto" /> to the .csproj to generate the Google.Protobuf code that can then be used to automagically accomplish the same behavior in a NativeAoT context?

I am sure there are reasons that this wouldn't work, but with .NET 9 giving full NativeAoT support for iOS, I am seeing a lot of movement towards NativeAoT to get off of MonoAoT.

mgravell · 2025-04-16T19:23:24Z

Eesh, I should just dust this off and ship something, even if it is incomplete. My plans are wider than my calendar, it seems.

KybernetikGames · 2025-04-26T07:11:04Z

Is there any chance v4 could bring back support for AsReference that was in v2 which allowed a full object graph to be serialized with multiple fields referencing the same object?

I'm trying to find a good serializer for Unity and ProtoBuf v2 is the only one I've found which meets all my needs except that I can't seem to use it in Android builds due to IL2CPP requiring AOT compilation so it would be a huge shame to find a solution to that problem only to lose such a useful feature.

Dona278 · 2025-04-26T07:52:44Z

@KybernetikGames have you looked at the cysharp repos? They develop games with Unity and they are the creators of R3 (observables) and [Message/Memory]Pack (serializers), both developed to be compatible with Unity.

michaldobrodenka · 2025-04-26T08:00:52Z

> Is there any chance v4 could bring back support for AsReference that was in v2 which allowed a full object graph to be serialized with multiple fields referencing the same object?
>
> I'm trying to find a good serializer for Unity and ProtoBuf v2 is the only one I've found which meets all my needs except that I can't seem to use it in Android builds due to IL2CPP requiring AOT compilation so it would be a huge shame to find a solution to that problem only to lose such a useful feature.

If you need solution now, you can check my protobuf-net 2 fork - with precompile you can prepare serializer in post build step as a dll. I'm using it in production. And you don't need old net framework. It works with net6+ https://github.com/michaldobrodenka/protobuf-net

KybernetikGames · 2025-04-26T08:15:21Z

@Dona278 I briefly tried MessagePack and MemoryPack but ran into issues with each of them (here and here) which would have required me to refactor quite a bit of my code base. ProtoBuf v2 seemed like a silver bullet which handled everything I need to do with it right up until I tried to use it in a runtime build. But if I can't get it going then I'll definitely be revisiting the cysharp systems.

@michaldobrodenka I found your repo earlier today and have been trying to get it to work in Unity with no success so far and there's no Issues page so I wasn't sure how to contact you. Do you have a preferred contact method?

michaldobrodenka · 2025-04-26T08:33:41Z

@KybernetikGames have you checked aot-net6 branch?
I have added issues to this project, but I don't plan to maintain this project much further; I'm just using it until I find a replacement. It works on all my projects and I'm looking for more modern solution - using Span and code generated. Something like my GProtobuf which is only a proof of concept now.

mgravell · 2025-04-26T10:01:48Z

I genuinely do have plans to revisit the AOT work. I just need the world to switch to a 36 hour day so I have enough hours in each...

PanzerFowst · 2025-05-06T16:57:18Z

Well, I just wanted to ask whether you have an outline of the work (as far as you know it so far) that needs to be done, so that anyone who has the time could help contribute? I have been looking into IIncrementalGenerator and experimenting.

I know I am certainly interested in contributing.

mgravell added 30 commits May 30, 2022 11:34

nano

839a0b2

comments etc

3473280

clarify write API

8897ce6

benchmarks for Nano

048bdc0

nit

b5a2512

use collection size hint in nano benchmark list init; implement varint on down-level fx

bfd219e

test ref-counted slab vs simple slab

f2ff974

ignore unity test proj

f3b3be1

simple slab test

f1aae77

focus on deserialize tests; avoid the ref-count check (basically never useful)

e5e10a7

profile GC uninitialized arrays

e5a66c4

tidy

c799f95

update numbers

e227547

construction approaches

1452f8b

playing

baf4c4d

Signed-off-by: Marc Gravell <marc.gravell@gmail.com>

wokring on alloc

a8d85eb

more hacking

a5dd269

tons more hacking on that damned slab

620a5f5

numbers

3e30520

add protobuf-net

d2b0f56

caveat geek

7f4431d

explore PEXT/TZCNT varint decode

6fc19ee

compare Unsafe.Add

ea5826c

add a version that uses 32-bit PEXT for small values, then 64-bit PEXT for larger values

7481b84

words

c007aa8

results

0059499

one last stab

da4e0c6

add sample gRPC client/server with timings

09bca9e

encode ideas

d807221

encode numbers

1fd8612

mgravell added 5 commits September 28, 2022 17:21

avoid problems from parsing the same symbol from different trees

dc85cd9

generalize diagnostic reporting

04dc939

fix broken gen test

345df7e

more generalized diagnostics

f600ce3

fix broken tests

cb10404

charlicopter mentioned this pull request Jan 13, 2023

What is the status of V3 for AOT environments? #997

Closed

mgravell added 3 commits February 25, 2023 18:29

make intellisense less unhappy

de6a0b3

Merge branch 'main' into v4

2e849b6

PanzerFowst mentioned this pull request Apr 15, 2025

Working with AOT - .NET 7 #1025

Closed

EricGarnier mentioned this pull request Jun 17, 2025

Support for AOT JKorf/CryptoClients.Net#9

Closed
