A reimagined scala-pickling in the Scala 3 world
jsuereth/sauerkraut
The library for those cabbage lovers out there who want to send data over the wire.
A revitalization of Pickling in the Scala 3 world.
When defining over-the-wire messages, do this:
```scala
import sauerkraut.core.{Buildable, Writer, given}

case class MyMessage(field: String, data: Int) derives Buildable, Writer
```
Then, when you need to serialize, pick a format and go:
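```scala
import java.io.StringWriter
import sauerkraut.format.json.{Json, given}
import sauerkraut.{pickle, read, write}

// Serialize to an in-memory writer.
val out = StringWriter()
pickle(Json).to(out).write(MyMessage("test", 1))
println(out.toString())

// Read the message back from the serialized string.
val msg = pickle(Json).from(out.toString()).read[MyMessage]
```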
Here's a feature matrix for each format:
Format | Reader | Writer | All Types | Evolution Friendly | Notes |
---|---|---|---|---|---|
Json | Yes | Yes | Yes | Yes | Uses Jawn for parsing |
Protos | Yes | Yes | Yes | Yes | Evolution-friendly binary format |
NBT | Yes | Yes | Yes | | For the kids. |
XML | Yes | Yes | Yes | | Inefficient prototype. |
Pretty | No | Yes | No | | For pretty-printing strings |
See Compliance for more details on what this means.
Everyone's favorite non-YAML web data transfer format! This uses Jawn under the covers for parsing, but can write Json without any dependencies.
Example:
```scala
import sauerkraut.{pickle, read, write}
import sauerkraut.core.{Buildable, Writer, given}
import sauerkraut.format.json.Json

case class MyWebData(value: Int, someStuff: Array[String]) derives Buildable, Writer

def read(in: java.io.InputStream): MyWebData =
  pickle(Json).from(in).read[MyWebData]

def write(out: java.io.OutputStream): Unit =
  pickle(Json).to(out).write(MyWebData(1214, Array("this", "is", "a", "test")))
```
sbt build:
```scala
libraryDependencies += "com.jsuereth.sauerkraut" %% "json" % "<version>"
```
See json project for more information.
A new encoding for protocol buffers within Scala! This supports a subset of all possible protocol buffer messages but allows full definition of the message format within your Scala code.
Example:
```scala
import sauerkraut.{pickle, write, read, Field}
import sauerkraut.core.{Writer, Buildable, given}
import sauerkraut.format.pb.{Proto, given}

// @Field(n) assigns the protocol buffer field number for each member.
case class MyMessageData(value: Int @Field(3), someStuff: Array[String] @Field(2))
  derives Writer, Buildable

def write(out: java.io.OutputStream): Unit =
  pickle(Proto).to(out).write(MyMessageData(1214, Array("this", "is", "a", "test")))
```
This example serializes to the equivalent of the following protocol buffer message:
```protobuf
message MyMessageData {
  int32 value = 3;
  repeated string someStuff = 2;
}
```
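To make the field numbers concrete, here is a minimal sketch of how the standard protocol buffer wire format encodes the `value` field of the message above. It is plain Scala with no sauerkraut dependency. `encodeVarint` and `encodeInt32Field` are hypothetical helper names, not part of the library, and the sketch makes no claim about how the `pb` module is implemented internally.

```scala
// Hypothetical helpers illustrating the standard protobuf wire format.
// These are NOT part of sauerkraut.

/** Encodes a non-negative value as a base-128 varint, least significant group first. */
def encodeVarint(value: Long): Vector[Byte] =
  // Real protobuf also handles negative int32 values; this sketch does not.
  require(value >= 0, "this sketch only handles non-negative values")
  var remaining = value
  val out = Vector.newBuilder[Byte]
  while remaining >= 0x80 do
    // Emit the low 7 bits with the continuation bit set.
    out += ((remaining & 0x7f) | 0x80).toByte
    remaining = remaining >>> 7
  out += remaining.toByte
  out.result()

/** Encodes an int32 field: tag = (fieldNumber << 3) | wireType 0, then the varint value. */
def encodeInt32Field(fieldNumber: Int, value: Int): Vector[Byte] =
  encodeVarint(fieldNumber.toLong << 3) ++ encodeVarint(value.toLong)

@main def showEncoding(): Unit =
  val bytes = encodeInt32Field(fieldNumber = 3, value = 1214)
  // Prints: 18 be 09
  println(bytes.map(b => f"${b & 0xff}%02x").mkString(" "))
```

The tag byte `18` carries field number 3 with wire type 0 (varint), and `be 09` is 1214 split into 7-bit groups. Because the field number travels with every value, members can be reordered or added in the Scala class without breaking older readers, as long as the numbers stay stable.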
sbt build:
```scala
libraryDependencies += "com.jsuereth.sauerkraut" %% "pb" % "<version>"
```
See pb project for more information.
Named-Binary-Tags, a format popularized by Minecraft.
Example:
```scala
import sauerkraut.{pickle, read, write}
import sauerkraut.core.{Buildable, Writer, given}
import sauerkraut.format.nbt.Nbt

case class MyGameData(value: Int, someStuff: Array[String]) derives Buildable, Writer

def read(in: java.io.InputStream): MyGameData =
  pickle(Nbt).from(in).read[MyGameData]

def write(out: java.io.OutputStream): Unit =
  pickle(Nbt).to(out).write(MyGameData(1214, Array("this", "is", "a", "test")))
```
sbt build:
```scala
libraryDependencies += "com.jsuereth.sauerkraut" %% "nbt" % "<version>"
```
See nbt project for more information.
Everyone's favorite markup language for data transfer!
Example:
```scala
import sauerkraut.{pickle, read, write}
import sauerkraut.core.{Buildable, Writer, given}
import sauerkraut.format.xml.{Xml, given}

case class MySlowWebData(value: Int, someStuff: Array[String]) derives Buildable, Writer

def read(in: java.io.InputStream): MySlowWebData =
  pickle(Xml).from(in).read[MySlowWebData]

def write(out: java.io.Writer): Unit =
  pickle(Xml).to(out).write(MySlowWebData(1214, Array("this", "is", "a", "test")))
```
sbt build:
```scala
libraryDependencies += "com.jsuereth.sauerkraut" %% "xml" % "<version>"
```
See xml project for more information.
A format that is solely used to pretty-print object contents to strings. This does not have a [PickleReader], only a [PickleWriter].
Example:
```scala
import sauerkraut._, sauerkraut.core.{Writer, given}

case class MyAwesomeData(theBest: Int, theCoolest: String) derives Writer

scala> MyAwesomeData(1, "The Greatest").prettyPrint
val res0: String = Struct(rs$line$2.MyAwesomeData) {
  theBest: 1
  theCoolest: The Greatest
}
```
We split Serialization into three layers:
- The `source` layer. It is expected these are some kind of stream.
- The `Format` layer. This is responsible for reading a raw source and converting into the component types used in the `Shape` layer. See `PickleReader` and `PickleWriter`.
- The `Shape` layer. This is responsible for turning Primitives, Structs, Choices and Collections into component types.
It's the circle of data:
```
Source       => format       => shape      => memory => shape     => format       => Destination
[PickleData] => PickleReader => Builder[T] => T      => Writer[T] => PickleWriter => [PickleData]
```
This, hopefully, means we can reuse a lot of logic between various formats with little loss of efficiency.
Note: This library is not measuring performance yet.
The Shape layer is responsible for extracting Scala types into known shapes that can be used for serialization. These shapes are currently `Collection`, `Structure` and `Primitive`. Custom shapes can be created in terms of these three shapes.
The Shape layer defines these three classes:
- `sauerkraut.core.Writer[T]`: Can translate a value into write* calls of Primitive, Structure or Collection.
- `sauerkraut.core.Builder[T]`: Can accept an incoming stream of collections/structures/primitives and build a value of T from them.
- `sauerkraut.core.Buildable[T]`: Can provide a `Builder[T]` when asked.
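The division of labour between these three roles can be sketched in a few lines of standalone Scala 3. The trait names mirror the ones above, but every signature here (`write` taking a sink function, `putField`, `newBuilder`, `roundTrip`) is an assumption made for illustration and is much simpler than the real `sauerkraut.core` API. In practice the instances would come from `derives` rather than being written by hand.

```scala
// Simplified stand-ins for the Shape-layer roles. The signatures are
// assumptions for illustration, NOT sauerkraut's actual API.

/** Translates a value of T into a series of pushed (fieldName, primitive) pairs. */
trait Writer[T]:
  def write(value: T, sink: (String, Any) => Unit): Unit

/** Accepts pushed fields and eventually produces a T. */
trait Builder[T]:
  def putField(name: String, value: Any): Unit
  def result: T

/** Can provide a fresh Builder[T] when asked. */
trait Buildable[T]:
  def newBuilder: Builder[T]

final case class Point(x: Int, y: Int)

given Writer[Point] with
  def write(value: Point, sink: (String, Any) => Unit): Unit =
    sink("x", value.x)
    sink("y", value.y)

given Buildable[Point] with
  def newBuilder: Builder[Point] = new Builder[Point]:
    private var x = 0
    private var y = 0
    def putField(name: String, value: Any): Unit = (name, value) match
      case ("x", v: Int) => x = v
      case ("y", v: Int) => y = v
      case _             => () // fields this builder does not know are ignored
    def result: Point = Point(x, y)

/** Follows the path T => Writer[T] => Builder[T] => T with no format in between. */
def roundTrip[T](value: T)(using w: Writer[T], b: Buildable[T]): T =
  val builder = b.newBuilder
  w.write(value, builder.putField)
  builder.result

@main def shapeDemo(): Unit =
  println(roundTrip(Point(3, 4))) // prints Point(3,4)
```

The point of the push-based `Builder` is that the reader never needs to know the Scala type it is filling in. It only pushes what it finds, and the builder decides what to keep.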
The format layer is responsible for mapping sauerkraut shapes (`Collection`, `Structure`, `Primitive`, `Choice`) into the underlying format. Not all shapes in sauerkraut will map exactly to underlying formats, and so each format may need to adjust/tweak incoming data as appropriate.
The format layer has these primary classes:
- `sauerkraut.format.PickleReader`: Can load data and push it into a Builder of type T.
- `sauerkraut.format.PickleWriter`: Accepts pushed structures/collections/primitives and places them into a Pickle.
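As a rough illustration of the read side, the sketch below parses a toy `key=value;key=value` text format and pushes each entry into a builder. `KeyValueReader`, `pointBuilder`, `putField` and the toy format itself are all invented for this example and do not exist in sauerkraut. The real `PickleReader` works with typed shapes rather than raw strings.

```scala
// Hypothetical sketch of a format-layer reader. Nothing here is sauerkraut API.

/** Minimal builder contract: receives raw fields, then yields a T. */
trait Builder[T]:
  def putField(name: String, value: String): Unit
  def result: T

/** Loads data from a toy "key=value;key=value" format and pushes it into a Builder. */
final class KeyValueReader(input: String):
  def push[T](builder: Builder[T]): T =
    for entry <- input.split(';') if entry.nonEmpty do
      entry.split("=", 2) match
        case Array(key, value) => builder.putField(key.trim, value.trim)
        case _ => throw IllegalArgumentException(s"Malformed entry: $entry")
    builder.result

final case class Point(x: Int, y: Int)

/** Hand-written builder; it converts the format's strings into the types it needs. */
def pointBuilder(): Builder[Point] = new Builder[Point]:
  private var x = 0
  private var y = 0
  def putField(name: String, value: String): Unit = name match
    case "x" => x = value.toInt
    case "y" => y = value.toInt
    case _   => () // ignore fields this version does not know about
  def result: Point = Point(x, y)

@main def formatDemo(): Unit =
  println(KeyValueReader("x=3;y=4;z=9").push(pointBuilder())) // prints Point(3,4)
```

The reader knows only the format's syntax and the builder knows only the target type, which is the separation that would let several formats share one set of derived builders.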
The `source` layer is allowed to be any type that a format wishes to support. Inputs and outputs are provided to the API via these two classes:
- `sauerkraut.format.PickleReaderSupport[Input, Format]`: A given of this instance will allow the `PickleReader` to be constructed from a type of input.
- `sauerkraut.format.PickleWriterSupport[Output, Format]`: A given of this instance will allow `PickleWriter` to be constructed from a type of output.
This layer is designed to support any type of input and output, not just an in-memory store (like a Json Ast) or a streaming input. Formats can define what types of input/output (or execution environment) they allow.
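The given-based lookup can be modelled with a small standalone sketch. The trait names follow the ones above, but the members (`writerFor`, `putString`, `putInt`), the toy `Lines` format and the `writerTo` helper are assumptions for illustration rather than sauerkraut's actual signatures.

```scala
import java.io.StringWriter

// Hypothetical, simplified model of the source layer. Member names are
// assumptions for illustration, NOT sauerkraut's actual API.

/** Format-layer sink for primitives (greatly simplified). */
trait PickleWriter:
  def putString(value: String): Unit
  def putInt(value: Int): Unit

/** Evidence that `Format` knows how to write to an `Output`. */
trait PickleWriterSupport[Output, Format]:
  def writerFor(format: Format, output: Output): PickleWriter

/** A toy format that writes one primitive per line. */
object Lines

given PickleWriterSupport[StringWriter, Lines.type] with
  def writerFor(format: Lines.type, output: StringWriter): PickleWriter =
    new PickleWriter:
      def putString(value: String): Unit = output.write(value + "\n")
      def putInt(value: Int): Unit = output.write(value.toString + "\n")

/** Compiles only when a matching given exists for this output/format pair. */
def writerTo[O, F](format: F, output: O)(using s: PickleWriterSupport[O, F]): PickleWriter =
  s.writerFor(format, output)

@main def sourceDemo(): Unit =
  val out = StringWriter()
  val writer = writerTo(Lines, out)
  writer.putString("test")
  writer.putInt(1)
  print(out.toString)
```

With this shape, asking for a writer over an output type the format has no given for is a compile-time error rather than a runtime failure, and a format can add support for a new output type by supplying one more given.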
New formats are expected to provide the "format" + "source" layer implementations they require.
TODO - a bit more here.
There are a few major differences from the old scala pickling project.
- The core library is built for 100% static code generation. While we think that dynamic (i.e. runtime-reflection-based) pickling could be built using this library, it is a non-goal.
- Users are expected to rely on typeclass derivation to generate Reader/Writers, rather than using macros.
- The supported types that can be pickled are limited to those supported by typeclass derivation or those that can have hand-written `Writer[_]`/`Builder[_]` instances.
- Readers are no longer driven by the Scala type. Instead we use a new `Buildable[A]`/`Builder[A]` design to allow each `PickleReader` to push values into a `Builder[A]` that will then construct the Scala class.
- There have been no runtime performance optimisations around codegen. Those will come as we test the limits of Scala 3 / Dotty.
- Format implementations are separate libraries.
- The `PickleWriter` contract has been split into several types to avoid misuse. This puts more lambdas in play, but that cost may be offset by optimisations in modern versions of Scala/JVM.
- The name is more German.
Benchmarking is still being built out, and is pending the final design on Choice/Sum-Types within the Format/Shape layer.
You can see benchmark results via: `benchmarks/jmh:run -rf csv`.
Latest status/analysis can be found in the benchmarks directory.
- Basic comparison of all formats
- Size-of-Pickle measurement
- Well-thought-out dataset for reading/writing
- Isolated read vs. write testing
- Comparison against other frameworks.
  - Protos vs. protocol buffer java implementation
  - Json Reading vs. raw JAWN to AST (measure overhead)
  - Jackson
  - Kryo
  - Thrift
  - Circe
  - uPickle
- Automatic well-formatted graph dump in Markdown of results.
Thanks to everyone who contributed to the original pickling library for inspiration, with a few callouts.
- Heather Miller + Philipp Haller for the original idea, innovation and motivation for Scala.
- Havoc Pennington + Eugene Yokota for helping define what's important when pickling a protocol and evolving that protocol.