hamba/avro
A fast Go avro codec
Warning
This project is no longer maintained.
If you wish to update or extend `avro`, please do so in a fork.
I am grateful for the contributions and support from the community over the years.
This project was born out of necessity for a fast and reliable Avro codec for Go. It has been a labor of love, and I hope it has served you well in your projects, but I no longer have the time to maintain it.
Install with:
go get github.com/hamba/avro/v2
Note: This project has renamed the default branch from `master` to `main`. You will need to update your local environment.
    type SimpleRecord struct {
        A int64  `avro:"a"`
        B string `avro:"b"`
    }

    schema, err := avro.Parse(`{
        "type": "record",
        "name": "simple",
        "namespace": "org.hamba.avro",
        "fields" : [
            {"name": "a", "type": "long"},
            {"name": "b", "type": "string"}
        ]
    }`)
    if err != nil {
        log.Fatal(err)
    }

    in := SimpleRecord{A: 27, B: "foo"}

    data, err := avro.Marshal(schema, in)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(data)
    // Outputs: [54 6 102 111 111]

    out := SimpleRecord{}
    err = avro.Unmarshal(schema, data, &out)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(out)
    // Outputs: {27 foo}
More examples in the godoc.
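The bytes printed by the example above can be reproduced by hand. The sketch below uses only the Go standard library and relies on the Avro specification's binary encoding: `long` values are written as zig-zag variable-length integers, strings as a `long` byte length followed by the UTF-8 bytes, and record fields are concatenated in schema order. The helper names (`encodeLong`, `encodeString`, `encodeSimple`) are hypothetical and are not part of this library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLong writes v as a zig-zag variable-length integer.
// binary.PutVarint uses the same zig-zag scheme for signed values.
func encodeLong(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, v)
	return buf[:n]
}

// encodeString writes the byte length as a long, then the raw bytes.
func encodeString(s string) []byte {
	return append(encodeLong(int64(len(s))), s...)
}

// encodeSimple concatenates the two fields of the "simple" record in
// schema order; a record adds no framing of its own.
func encodeSimple(a int64, b string) []byte {
	return append(encodeLong(a), encodeString(b)...)
}

func main() {
	// 27 zig-zag encodes to 54, the length 3 to 6, and "foo" is 102 111 111.
	fmt.Println(encodeSimple(27, "foo")) // [54 6 102 111 111]
}
```

This is only meant to explain the output; real encoding should go through `avro.Marshal`.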
| Avro | Go Struct | Go Interface |
|---|---|---|
| null | nil | nil |
| boolean | bool | bool |
| bytes | []byte | []byte |
| float | float32 | float32 |
| double | float64 | float64 |
| long | int\*, int64, uint32\*\* | int, int64, uint32 |
| int | int\*, int32, int16, int8, uint8\*\*, uint16\*\* | int, uint8, uint16 |
| fixed | uint64 | uint64 |
| string | string | string |
| array | []T | []any |
| enum | string | string |
| fixed | [n]byte | [n]byte |
| map | map[string]T{} | map[string]any |
| record | struct | map[string]any |
| union | see below | see below |
| int.date | time.Time | time.Time |
| int.time-millis | time.Duration | time.Duration |
| long.time-micros | time.Duration | time.Duration |
| long.timestamp-millis | time.Time | time.Time |
| long.timestamp-micros | time.Time | time.Time |
| long.local-timestamp-millis | time.Time | time.Time |
| long.local-timestamp-micros | time.Time | time.Time |
| bytes.decimal | *big.Rat | *big.Rat |
| fixed.decimal | *big.Rat | *big.Rat |
| string.uuid | string | string |
\* Please note that the size of the Go type `int` is platform dependent. Decoding an Avro `long` into a Go `int` is only allowed on 64-bit platforms and will result in an error on 32-bit platforms. Similarly, be careful when encoding a Go `int` using Avro `int` on a 64-bit platform, as that can result in an integer overflow causing misinterpretation of the data.

\*\* Please note that when the Go type is an unsigned integer, care must be taken to ensure that information is not lost when converting between the Avro type and the Go type. For example, a negative number stored in Avro as `int = -100` would be interpreted as `uint16 = 65,436` in Go. Likewise, a number larger than the Go type can hold is truncated: Avro `int = 256` would be interpreted as `uint8 = 0`.
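The numbers in the two notes above follow directly from Go's integer conversion rules, which keep only the low-order bits. The sketch below is a standard-library illustration of those rules with hypothetical helper names; it does not call this library and is not its implementation.

```go
package main

import "fmt"

// asUint16 reinterprets a decoded Avro int as uint16 (low 16 bits only).
func asUint16(v int32) uint16 { return uint16(v) }

// asUint8 reinterprets a decoded Avro int as uint8 (low 8 bits only).
func asUint8(v int32) uint8 { return uint8(v) }

// asAvroInt narrows a 64-bit Go value to the 32 bits of an Avro int.
func asAvroInt(v int64) int32 { return int32(v) }

// fitsUint8 reports whether v survives a conversion to uint8 unchanged.
func fitsUint8(v int32) bool { return v >= 0 && v <= 255 }

func main() {
	fmt.Println(asUint16(-100))   // 65436
	fmt.Println(asUint8(256))     // 0
	fmt.Println(asAvroInt(1<<31)) // -2147483648
	fmt.Println(fitsUint8(256))   // false
}
```

A range check such as `fitsUint8` before encoding or after decoding is one way to catch these cases early.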
The following union types are accepted: `map[string]any`, `*T` and `any`.
- `map[string]any`: If the union value is `nil`, a `nil` map will be en/decoded. When a non-`nil` union value is encountered, a single key is en/decoded. The key is the Avro type name, or the schema full name in the case of a named schema (enum, fixed or record).
- `*T`: This is allowed in a "nullable" union. A nullable union is defined as a two-schema union with one of the types being `null` (i.e. `["null", "string"]` or `["string", "null"]`); in this case a `*T` is allowed, with `T` matching the conversion table above. In the case of a slice, the slice can be used directly.
- `*struct{}`: implementing the `UnionConverter` interface:
    // UnionConverter to handle Avro Union's in a type-safe way
    type UnionConverter interface {
        // FromAny payload decode into any of the mentioned types in the Union.
        FromAny(payload any) error
        // ToAny from the Union struct
        ToAny() (any, error)
    }

    // for example:
    const Schema = `{
        "name": "Payload",
        "type": "record",
        "fields": [
            {"name": "union", "type": ["int", {
                "type": "record",
                "name": "test",
                "fields" : [
                    {"name": "a", "type": "long"},
                    {"name": "b", "type": "string"}
                ]
            }]}
        ]
    }`

    type Payload struct {
        Union *UnionRecord `avro:"union"`
    }

    type UnionRecord struct {
        Int  *int
        Test *TestRecord
    }

    func (u *UnionRecord) ToAny() (any, error) {
        if u.Int != nil {
            return u.Int, nil
        } else if u.Test != nil {
            return u.Test, nil
        }
        return nil, errors.New("no value to encode")
    }

    func (u *UnionRecord) FromAny(payload any) error {
        switch t := payload.(type) {
        case int:
            u.Int = &t
        case TestRecord:
            u.Test = &t
        default:
            return errors.New("unknown type during decode of union")
        }
        return nil
    }

    type TestRecord struct {
        A int64  `avro:"a"`
        B string `avro:"b"`
    }
Note: because of the way Go checks whether a type implements an interface, the type used must be a pointer, as the interface methods must be implemented with pointer receivers.
- `any`: An `interface` can be provided and the type or name resolved. Primitive types are pre-registered, but named types, maps and slices will need to be registered with the `Register` function. In the case of arrays and maps, the enclosed schema type or name is postfixed to the type with a `:` separator, e.g. `"map:string"`. Behavior when a type cannot be resolved will depend on your chosen configuration options:
  - `!Config.UnionResolutionError && !Config.PartialUnionTypeResolution`: the map type above is used.
  - `Config.UnionResolutionError && !Config.PartialUnionTypeResolution`: an error is returned.
  - `!Config.UnionResolutionError && Config.PartialUnionTypeResolution`: any registered type will get resolved, while any unregistered type will fall back to the map type above.
  - `Config.UnionResolutionError && Config.PartialUnionTypeResolution`: any registered type will get resolved, while any unregistered type will return an error.
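When a union is decoded into the `map[string]any` form described above, the caller has to pull the single key and value back out. A minimal standard-library sketch of that step follows. The helper `unpackUnion` is hypothetical, and reporting the `nil` map as the branch name `"null"` is an assumption made for this example, not library behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// unpackUnion splits the single-key map form of a union value into the
// branch name (Avro type name or schema full name) and the branch value.
// A nil map is treated as the null branch (an assumption of this sketch).
func unpackUnion(u map[string]any) (string, any, error) {
	if u == nil {
		return "null", nil, nil
	}
	if len(u) != 1 {
		return "", nil, errors.New("union map must hold exactly one key")
	}
	for name, value := range u {
		return name, value, nil
	}
	return "", nil, errors.New("unreachable")
}

func main() {
	name, value, err := unpackUnion(map[string]any{"string": "foo"})
	fmt.Println(name, value, err) // string foo <nil>

	name, value, err = unpackUnion(nil)
	fmt.Println(name, value, err) // null <nil> <nil>
}
```

For a named schema the key would be the full name, for example `org.hamba.avro.simple` for the record used earlier in this README.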
The interfaces `TextMarshaler` and `TextUnmarshaler` are supported for a `string` schema type. The object will be tested first for implementation of these interfaces, in the case of a `string` schema, before trying regular encoding and decoding.
Enums may also implement `TextMarshaler` and `TextUnmarshaler`, and must resolve to valid symbols in the given enum schema.
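As a rough illustration of the enum case, the sketch below defines a hypothetical `Suit` type that implements the standard `encoding.TextMarshaler` and `encoding.TextUnmarshaler` interfaces. It assumes an enum schema whose symbols are `HEARTS` and `SPADES`; the type, the symbols and the schema are invented for this example, and only the interface methods are exercised, not the codec.

```go
package main

import (
	"encoding"
	"fmt"
)

// Suit is a hypothetical Go enum backed by an int.
type Suit int

const (
	Hearts Suit = iota
	Spades
)

// suitNames must match the symbols of the (assumed) Avro enum schema.
var suitNames = []string{"HEARTS", "SPADES"}

// MarshalText returns the enum symbol for s.
func (s Suit) MarshalText() ([]byte, error) {
	if s < 0 || int(s) >= len(suitNames) {
		return nil, fmt.Errorf("invalid suit %d", int(s))
	}
	return []byte(suitNames[s]), nil
}

// UnmarshalText sets s from an enum symbol and rejects unknown symbols.
func (s *Suit) UnmarshalText(text []byte) error {
	for i, name := range suitNames {
		if name == string(text) {
			*s = Suit(i)
			return nil
		}
	}
	return fmt.Errorf("unknown suit symbol %q", text)
}

// Compile-time checks that both interfaces are satisfied.
var (
	_ encoding.TextMarshaler   = Suit(0)
	_ encoding.TextUnmarshaler = (*Suit)(nil)
)

func main() {
	text, _ := Spades.MarshalText()
	fmt.Println(string(text)) // SPADES

	var s Suit
	err := s.UnmarshalText([]byte("HEARTS"))
	fmt.Println(s == Hearts, err) // true <nil>
}
```

Note that `UnmarshalText` needs a pointer receiver so that it can modify the value.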
One type can be `ConvertibleTo` another type if they have identical underlying types. A non-native type may be used if it is convertible to `time.Time`, `big.Rat` or `avro.LogicalDuration` for the particular logical type.
Ex.: `type Timestamp time.Time`
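The convertibility rule can be checked with the standard `reflect` package. The sketch below assumes the rule corresponds to `reflect.Type.ConvertibleTo`; the `Millis` type and the `convertibleToTime` helper are hypothetical and exist only to show a failing case.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// Timestamp has the same underlying struct type as time.Time.
type Timestamp time.Time

// Millis has underlying type int64, which cannot convert to time.Time.
type Millis int64

// convertibleToTime reports whether v's type can be converted to time.Time.
func convertibleToTime(v any) bool {
	return reflect.TypeOf(v).ConvertibleTo(reflect.TypeOf(time.Time{}))
}

func main() {
	fmt.Println(convertibleToTime(Timestamp{})) // true
	fmt.Println(convertibleToTime(Millis(0)))   // false

	// Methods are not inherited by the defined type, so convert back
	// explicitly to use the time.Time API.
	ts := Timestamp(time.Unix(0, 0))
	fmt.Println(time.Time(ts).Unix()) // 0
}
```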
In case of incompatible types, custom type conversion functions can be registered with the `RegisterTypeConverters` function. This requires the use of `map[string]any` or `[]any`.

The type conversion for encoding will receive the original value that is to be encoded, and must return a data type that is compatible with the schema, as specified in the table above.

The type conversion for decoding will receive the decoded value with a data type that is compatible with the schema, and its return value will be used as the final decoded value.
For security reasons, the configuration `Config.MaxByteSliceSize` restricts the maximum size of `bytes` and `string` types created by the `Reader`. The default maximum size is `1MiB` and is configurable. This is required to stop untrusted input from consuming all memory and crashing the application. Should this not be needed, setting a negative number will disable the behaviour.
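To see why such a limit matters, consider a decoder that trusts the length prefix of a `bytes` value: five input bytes can claim a payload of a gigabyte. The sketch below is a standard-library illustration of a guarded read, not the library's `Reader`; the names `readBytes`, `decode` and `errTooLarge` are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

var errTooLarge = errors.New("byte slice size exceeds limit")

// readBytes reads a zig-zag varint length and then that many bytes.
// The limit is checked before allocating; a negative max disables it.
func readBytes(r *bufio.Reader, max int64) ([]byte, error) {
	n, err := binary.ReadVarint(r)
	if err != nil {
		return nil, err
	}
	if n < 0 {
		return nil, errors.New("negative length")
	}
	if max >= 0 && n > max {
		return nil, errTooLarge
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// decode is a small convenience wrapper for in-memory input.
func decode(input []byte, max int64) ([]byte, error) {
	return readBytes(bufio.NewReader(bytes.NewReader(input)), max)
}

func main() {
	b, err := decode([]byte{6, 'f', 'o', 'o'}, 1<<20)
	fmt.Println(string(b), err) // foo <nil>

	// Five bytes that claim a 1 GiB payload are rejected up front.
	hostile := binary.AppendVarint(nil, 1<<30)
	_, err = decode(hostile, 1<<20)
	fmt.Println(err) // byte slice size exceeds limit
}
```

Without the check, the second call would try to allocate a gigabyte before discovering that the input is empty.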
Benchmark source code can be found at: https://github.com/nrwiersma/avro-benchmarks
    BenchmarkGoAvroDecode-8        788455    1505 ns/op     418 B/op    27 allocs/op
    BenchmarkGoAvroEncode-8        624343    1908 ns/op     806 B/op    63 allocs/op
    BenchmarkGoGenAvroDecode-8    1360375    876.4 ns/op    320 B/op    11 allocs/op
    BenchmarkGoGenAvroEncode-8    2801583    425.9 ns/op    240 B/op     3 allocs/op
    BenchmarkHambaDecode-8        5046832    238.7 ns/op     47 B/op     0 allocs/op
    BenchmarkHambaEncode-8        6017635    196.2 ns/op    112 B/op     1 allocs/op
    BenchmarkLinkedinDecode-8     1000000    1003 ns/op    1688 B/op    35 allocs/op
    BenchmarkLinkedinEncode-8     3170553    381.5 ns/op    248 B/op     5 allocs/op

Always benchmark with your own workload. The result depends heavily on the data input.
Go structs can be generated for you from the schema. The generated types follow the same logic as the type conversions table above.

You can use the `avrogen` command-line tool to generate the structs, or use it as a library in internal commands via the `gen` package.
Install the struct generator with:
go install github.com/hamba/avro/v2/cmd/avrogen@<version>
Example usage assuming there's a valid schema in `in.avsc`:
avrogen -pkg avro -o bla.go -tags json:snake,yaml:upper-camel in.avsc
Tip: Omit `-o FILE` to dump the generated Go structs to stdout instead of a file.
Check the options and usage with `-h`:
avrogen -h
You can register custom logical type mappings to be used during code generation.
The format of a custom logical type mapper is `avroLogicalType,goType[,importPath]`. For example, to map the logical type `uuid` to the Go type `github.com/google/uuid.UUID`, you would use:
avrogen -pkg avro -o bla.go -logical-type uuid,uuid.UUID,github.com/google/uuid in.avsc
If the type you are mapping to is a built-in Go type (e.g. `string`, `int`, etc.), you can omit the import path element in the mapping definition:
avrogen -pkg avro -o bla.go -logical-type date,int32 in.avsc
If you intend to use multiple custom logical type mappings, you can specify the `-logical-type` flag multiple times.
A small Avro schema validation command-line utility is also available. This simple tool leverages the schema parsing functionality of the library, showing validation errors or optionally dumping parsed schemas to the console. It can be used in CI/CD pipelines to validate schema changes in a repository.
Install the Avro schema validator with:
go install github.com/hamba/avro/v2/cmd/avrosv@<version>
Example usage assuming there's a valid schema in `in.avsc` (exit status code is `0`):
avrosv in.avsc
An invalid schema will result in a diagnostic output and a non-zero exit status code:
    avrosv bad-default-schema.avsc; echo $?
    Error: avro: invalid default for field someString. <nil> not a string
    2
Schemas referencing other schemas can also be validated by providing all of them (schemas are parsed in order):
avrosv base-schema.avsc schema-withref.avsc
Check the options and usage with `-h`:
avrosv -h
Avro names are validated according to the Avro specification.
However, the official Java library does not validate names accordingly, resulting in some files in the wild having invalid names. Thus, this library has a configuration option to allow these invalid names to be parsed.
avro.SkipNameValidation = true
Note that this variable is global, so ideally you'd need to unset it after you're done with the invalid schema.
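One way to keep a global switch like this from leaking is to set it, run the work, and restore the previous value with `defer`. The sketch below shows the pattern with a local stand-in variable so that it runs without the library; `skipNameValidation` and `withSkippedValidation` are hypothetical names, and in real code the variable would be `avro.SkipNameValidation`.

```go
package main

import "fmt"

// skipNameValidation stands in for avro.SkipNameValidation in this sketch.
var skipNameValidation bool

// withSkippedValidation enables the flag for the duration of fn and
// restores the previous value afterwards, even if fn panics.
func withSkippedValidation(fn func() error) error {
	prev := skipNameValidation
	skipNameValidation = true
	defer func() { skipNameValidation = prev }()
	return fn()
}

func main() {
	err := withSkippedValidation(func() error {
		// Parse the schema with invalid names here.
		fmt.Println("inside:", skipNameValidation) // inside: true
		return nil
	})
	fmt.Println("after:", skipNameValidation, err) // after: false <nil>
}
```

Because the flag is global, this pattern is not safe if other goroutines parse schemas at the same time.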
This library supports the last two versions of Go. While the minimum Go version is not guaranteed to increase alongside Go, it may jump from time to time to support additional features. This will not be considered a breaking change.