Developer(s) | Apache Software Foundation |
---|---|
Initial release | 2 November 2009; 15 years ago (2009-11-02)[1] |
Stable release | |
Repository | Avro Repository |
Written in | Java, C, C++, C#, Perl, Python, PHP, Ruby |
Type | Remote procedure call framework |
License | Apache License 2.0 |
Website | avro |
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services.
Avro uses a schema to structure the data that is being encoded. It has two different types of schema languages: one for human editing (Avro IDL) and another which is more machine-readable based on JSON.[3]
It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes (unless desired for statically-typed languages).
Apache Spark SQL can access Avro as a data source.[4]
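An Avro Object Container File consists of:[5]
- A file header, followed by
- one or more file data blocks.
A file header consists of:
- Four bytes: ASCII 'O', 'b', 'j', followed by the Avro version number, which is 1 (0x01).
- File metadata, including the schema definition.
- The 16-byte, randomly generated sync marker for this file.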
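The header layout can be sketched with the standard library alone. The following is a minimal illustration, not the Avro library's reader: the helper names (`read_long`, `read_bytes`, `read_header`) are hypothetical, the sample header is hand-built with a placeholder sync marker, and the sketch assumes the metadata is encoded as an Avro map of byte strings with zigzag variable-length integers for counts and lengths.

```python
import io

MAGIC = b"Obj\x01"  # ASCII "Obj" followed by the version byte 1
SYNC_SIZE = 16


def read_long(buf):
    """Decode one Avro long: a variable-length, zigzag-encoded integer."""
    shift = 0
    raw = 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("truncated varint")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo the zigzag mapping


def read_bytes(buf):
    """Decode a length-prefixed byte string."""
    return buf.read(read_long(buf))


def read_header(buf):
    """Return (metadata dict, sync marker) from the start of a container file."""
    if buf.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count: the block's byte size follows
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    sync = buf.read(SYNC_SIZE)
    if len(sync) != SYNC_SIZE:
        raise EOFError("truncated sync marker")
    return meta, sync


# Hand-built header for illustration only.
sample = (
    MAGIC
    + b"\x04"                                # map block with 2 entries
    + b"\x14avro.codec" + b"\x08null"
    + b"\x16avro.schema" + b'\x10"string"'  # a trivial schema, for brevity
    + b"\x00"                                # end of map
    + bytes(range(16))                       # placeholder sync marker
)
meta, sync = read_header(io.BytesIO(sample))
print(meta["avro.codec"], meta["avro.schema"], sync.hex())
```

Because the same sync marker is repeated after every data block, a reader that knows it can resynchronise at block boundaries without decoding the records in between.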
For data blocks Avro specifies two serialization encodings:[6] binary and JSON. Most applications will use the binary encoding, as it is smaller and faster. For debugging and web-based applications, the JSON encoding may sometimes be appropriate.
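To give a rough sense of the size difference, the sketch below encodes one record both ways using only the standard library. It is an illustration under stated assumptions, not the Avro library's encoder: the function names are hypothetical, the record is the three-field `User` record from this article's schema example, and the union branch order is assumed to be `["null", "int"]` and `["null", "string"]`.

```python
import json


def encode_long(n):
    """Avro int/long: zigzag-encode, then emit 7 bits per byte, low bits first."""
    raw = (n << 1) ^ (n >> 63)
    out = bytearray()
    while raw & ~0x7F:
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)


def encode_string(s):
    """Length prefix followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user_binary(user):
    """Binary encoding of one User record; no field names are written."""
    out = encode_string(user["name"])
    # Each union is written as the branch index followed by the value.
    # Assumed branch order: ["null", "int"] and ["null", "string"].
    number = user.get("favorite_number")
    out += encode_long(0) if number is None else encode_long(1) + encode_long(number)
    color = user.get("favorite_color")
    out += encode_long(0) if color is None else encode_long(1) + encode_string(color)
    return out


def encode_user_json(user):
    """JSON encoding: non-null union values are wrapped as {"<type>": value}."""
    number = user.get("favorite_number")
    color = user.get("favorite_color")
    doc = {
        "name": user["name"],
        "favorite_number": None if number is None else {"int": number},
        "favorite_color": None if color is None else {"string": color},
    }
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


ben = {"name": "Ben", "favorite_number": 8, "favorite_color": "red"}
binary = encode_user_binary(ben)
text = encode_user_json(ben)
print(len(binary), "bytes binary;", len(text), "bytes JSON")
```

The binary form omits field names entirely and relies on the schema to say what each byte means, which is where most of the saving comes from.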
Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).[7]
Simple schema example:
```json
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"]},
    {"name": "favorite_color", "type": ["null", "string"]}
  ]
}
```
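Since a schema is plain JSON, it can be inspected with ordinary tools. The sketch below loads the schema above and checks Python values against it. The `matches` helper is hypothetical and covers only primitives, unions and records; it treats a missing field as `None`, and it is not how the Avro library itself validates data.

```python
import json

SCHEMA = json.loads("""
{"namespace": "example.avro", "type": "record", "name": "User",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["null", "int"]},
   {"name": "favorite_color", "type": ["null", "string"]}
 ]}
""")


def _is_int(value, bits):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (isinstance(value, int) and not isinstance(value, bool)
            and -(2 ** (bits - 1)) <= value < 2 ** (bits - 1))


PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_int(v, 32),
    "long": lambda v: _is_int(v, 64),
    "float": lambda v: isinstance(v, float),
    "double": lambda v: isinstance(v, float),
    "bytes": lambda v: isinstance(v, bytes),
    "string": lambda v: isinstance(v, str),
}


def matches(schema, value):
    """Return True if value fits the (subset of) Avro schema given."""
    if isinstance(schema, list):  # union: any one branch may match
        return any(matches(branch, value) for branch in schema)
    if isinstance(schema, str):  # primitive type name
        return PRIMITIVES[schema](value)
    if schema["type"] == "record":
        return isinstance(value, dict) and all(
            matches(field["type"], value.get(field["name"]))
            for field in schema["fields"]
        )
    raise NotImplementedError(f"unsupported schema: {schema['type']}")


print(matches(SCHEMA, {"name": "Alyssa", "favorite_number": 256}))
print(matches(SCHEMA, {"name": "Ben", "favorite_number": "eight"}))
```

The unions with `"null"` are what make `favorite_number` and `favorite_color` effectively optional in this schema.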
Data in Avro might be stored with its corresponding schema, meaning a serialized item can be read without knowing the schema ahead of time.
Serialization:[8]
```python
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Need to know the schema to write. According to 1.8.2 of Apache Avro
schema = avro.schema.parse(open("user.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 8, "favorite_color": "red"})
writer.close()
```
File "users.avro" will contain the schema in JSON and a compact binary representation[9] of the data:
```console
$ od -v -t x1z users.avro
0000000 4f 62 6a 01 04 14 61 76 72 6f 2e 63 6f 64 65 63  >Obj...avro.codec<
0000020 08 6e 75 6c 6c 16 61 76 72 6f 2e 73 63 68 65 6d  >.null.avro.schem<
0000040 61 ba 03 7b 22 74 79 70 65 22 3a 20 22 72 65 63  >a..{"type": "rec<
0000060 6f 72 64 22 2c 20 22 6e 61 6d 65 22 3a 20 22 55  >ord", "name": "U<
0000100 73 65 72 22 2c 20 22 6e 61 6d 65 73 70 61 63 65  >ser", "namespace<
0000120 22 3a 20 22 65 78 61 6d 70 6c 65 2e 61 76 72 6f  >": "example.avro<
0000140 22 2c 20 22 66 69 65 6c 64 73 22 3a 20 5b 7b 22  >", "fields": [{"<
0000160 74 79 70 65 22 3a 20 22 73 74 72 69 6e 67 22 2c  >type": "string",<
0000200 20 22 6e 61 6d 65 22 3a 20 22 6e 61 6d 65 22 7d  > "name": "name"}<
0000220 2c 20 7b 22 74 79 70 65 22 3a 20 5b 22 69 6e 74  >, {"type": ["int<
0000240 22 2c 20 22 6e 75 6c 6c 22 5d 2c 20 22 6e 61 6d  >", "null"], "nam<
0000260 65 22 3a 20 22 66 61 76 6f 72 69 74 65 5f 6e 75  >e": "favorite_nu<
0000300 6d 62 65 72 22 7d 2c 20 7b 22 74 79 70 65 22 3a  >mber"}, {"type":<
0000320 20 5b 22 73 74 72 69 6e 67 22 2c 20 22 6e 75 6c  > ["string", "nul<
0000340 6c 22 5d 2c 20 22 6e 61 6d 65 22 3a 20 22 66 61  >l"], "name": "fa<
0000360 76 6f 72 69 74 65 5f 63 6f 6c 6f 72 22 7d 5d 7d  >vorite_color"}]}<
0000400 00 05 f9 a3 80 98 47 54 62 bf 68 95 a2 ab 42 ef  >......GTb.h...B.<
0000420 24 04 2c 0c 41 6c 79 73 73 61 00 80 04 02 06 42  >$.,.Alyssa.....B<
0000440 65 6e 00 10 00 06 72 65 64 05 f9 a3 80 98 47 54  >en....red.....GT<
0000460 62 bf 68 95 a2 ab 42 ef 24                       >b.h...B.$<
0000471
```
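The record bytes in this dump can be decoded by hand, which shows how the schema drives the binary format. The sketch below is illustrative only, with hypothetical helper names; it is not the Avro library's decoder. It takes the 22 bytes that follow the block's object count (`04`, meaning 2) and byte size (`2c`, meaning 22), and it uses the union branch order shown in the schema embedded in the dump, `["int", "null"]` and `["string", "null"]`.

```python
import io

# Field order and union branch order as they appear in the embedded schema.
FIELDS = [
    ("name", "string"),
    ("favorite_number", ["int", "null"]),
    ("favorite_color", ["string", "null"]),
]

# The 22 bytes of record data, copied from the dump.
BLOCK = bytes.fromhex(
    "0c 41 6c 79 73 73 61 00 80 04 02 "
    "06 42 65 6e 00 10 00 06 72 65 64"
)


def read_long(buf):
    """Decode a zigzag, variable-length integer."""
    shift = raw = 0
    while True:
        byte = buf.read(1)[0]
        raw |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this integer
            return (raw >> 1) ^ -(raw & 1)
        shift += 7


def read_value(buf, schema):
    """Decode one value; the schema decides how the next bytes are read."""
    if isinstance(schema, list):  # union: branch index, then the value
        return read_value(buf, schema[read_long(buf)])
    if schema == "null":
        return None  # null occupies zero bytes
    if schema == "int":
        return read_long(buf)
    if schema == "string":
        return buf.read(read_long(buf)).decode("utf-8")
    raise NotImplementedError(schema)


def read_records(data, count):
    """Decode `count` consecutive records laid out according to FIELDS."""
    buf = io.BytesIO(data)
    return [
        {name: read_value(buf, schema) for name, schema in FIELDS}
        for _ in range(count)
    ]


for user in read_records(BLOCK, 2):
    print(user)
```

Nothing in the bytes marks where one field ends and the next begins; only the schema tells the decoder that a string is followed by two unions, so each record takes just 11 bytes here.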
Deserialization:
```python
# The schema is embedded in the data file
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()
```
This outputs:
```
{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}
{'name': 'Ben', 'favorite_number': 8, 'favorite_color': 'red'}
```
Though theoretically any language could use Avro, the following languages have APIs written for them:[10][11]
In addition to supporting JSON for type and protocol definitions, Avro includes experimental[24] support for an alternative interface description language (IDL) syntax known as Avro IDL. Previously known as GenAvro, this format is designed to ease adoption by users familiar with more traditional IDLs and programming languages, with a syntax similar to C/C++, Protocol Buffers and others.
The original Apache Avro logo was from the defunct British aircraft manufacturer Avro (originally A.V. Roe and Company).[25]
The Apache Avro logo was updated to an original design in late 2023.[26]