Skip to content

hook a visitor into protogen to implement schema-aware payload parsing #919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

mgravell
Copy link
Member

@mgravell mgravell commented Jun 9, 2022

Scenarios:

  1. enable usage similar to protoc --decode to walk a payload+schema and understand the contents
  2. enable weak object-model usage, i.e. payload+schema to dynamic or similar, ideally in a way that is usable from other common serializers, for example as a transcoder
  3. act as a backbone for any other schema+payload scenarios without the implementor needing to do all the thinking

Currently this is implemented via:

  • DecodeVisitor - has the logic to iterate over a payload while tracking a schema
  • TextDecodeVisitor : DecodeVisitor - provides a protoc --decode style raw text dump of the interpreted payload contents
  • ObjectDecodeVisitor : DecodeVisitor - provides an ExpandoObject-based object-model from the interpreted payload

Work items:

  • basic command-line hooks into protogen (global tool)
  • implement base visitor that processes fields via a parser
  • implement text visitor (i.e. stdout) for output
  • implement object-model visitor
  • documentation (at a minimum, for -h and ///; a new page under /docs would be good too, though)
  • basic unit test
  • comprehensive unit tests
  • validation of command-line usage (rather than library usage)
  • comparison to protoc output
  • well known types (.google.protobuf.Duration, .google.protobuf.Timestamp, for example)
  • types with special expected formatting (.google.protobuf.Timestamp, for example) (note: this might not apply if we get the well-known-types support to map to regular .NET types that already have the correct handling)
    • in particular, struct.proto and wrappers.proto
  • how should enums be represented in ObjectDecodeVisitor? currently uses int, but: is that right? should it be an option on the object, i.e. new ObjectDecodeVisitor { Enums = EnumMode.Name } ? (which would presumably use the name when possible, and the integer as fallback) - Google lib writes names, reads either names or integers
  • support for packed primitive values (note this is based on reader.WireType being String for a primitive value (i.e. not string, bytes, group or message) and not on anything related to field)
  • consideration of extension fields; i.e. should we try to discover known extended field definitions, and present them sensibly?
  • what output (if any) for unknown fields?
  • need to detect and implement map scenarios
  • ObjectDecodeVisitor: default values and presence tracking; i.e. in proto3 a non-presence-tracked field with value zero/false/etc is not transmitted, but that is still the expected value (if it is presence-tracked, then it is transmitted if assigned)
  • ObjectDecodeVisitor: option to use json_name rather than name (when json_name specified)
  • naming; are the names correct? ObjectDecodeVisitor seems... weird

@SamiSadfa
Copy link

Can't understand quite clearly what does "validation of command-line usage (rather than library usage)" means. Is it for input validation when a user tries to do protogen.exe --decode command? Why rather than library usage?

@mgravell
Copy link
Member Author

@SamiSadfa I simply mean: I haven't tested that the command-line scenario works at all, let alone that it does the correct thing. The main logic is in the library and is validated via unit tests, but the console exe needs validation too. In particular, that protogen --decode {something} gives similar behaviour and output to protoc --decode {something}

@SamiSadfa
Copy link

SamiSadfa commented Jun 15, 2022

@SamiSadfa I simply mean: I haven't tested that the command-line scenario works at all, let alone that it does the correct thing. The main logic is in the library and is validated via unit tests, but the console exe needs validation too. In particular, that protogen --decode {something} gives similar behaviour and output to protoc --decode {something}

protogen.exe output:

image

protoc.exe output:
image

It seems like it's working

@SamiSadfa
Copy link

"support for packed primitive values (note this is based on reader.WireType being String for a primitive value (i.e. not string, bytes, group or message) and not on anything related to field)"

I am not sure where the support for packed primitives can be implemented. Also, is there a correlation between implementing packed primitives support and repeated fields? I am not really versed in Protobuf packed primitives. Can you please point me to some documentation/code I need to check through to implement this support?

@SamiSadfa
Copy link

"what output (if any) for unknown fields?"
In a perfect scenario, this should not happen. In case of corrupted data (the bytes), or when a user tries to decode a payload with the wrong schema using protogen.exe, we can just skip it or raise an exception. What do you think about that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants