
Apache Avro

From Wikipedia, the free encyclopedia
Open-source remote procedure call framework

Apache Avro
Developer(s): Apache Software Foundation
Initial release: 2 November 2009; 15 years ago (2009-11-02)[1]
Stable release: 1.11.3 / September 23, 2023; 17 months ago (2023-09-23)[2]
Repository: Avro Repository
Written in: Java, C, C++, C#, Perl, Python, PHP, Ruby
Type: Remote procedure call framework
License: Apache License 2.0
Website: avro.apache.org

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes and from client programs to the Hadoop services. Avro uses a schema to structure the data being encoded. It has two schema languages: Avro IDL, intended for human editing, and a more machine-readable one based on JSON.[3]

It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes (unless desired for statically-typed languages).

Apache Spark SQL can access Avro as a data source.[4]

Avro Object Container File


An Avro Object Container File consists of:[5]

  • A file header, followed by
  • one or more file data blocks.

A file header consists of:

  • Four bytes: ASCII 'O', 'b', 'j', followed by the Avro version number 1 (0x01), i.e. the byte values 0x4F 0x62 0x6A 0x01.
  • File metadata, including the schema definition.
  • The 16-byte, randomly-generated sync marker for this file.
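The following is a rough sketch of reading the header layout described above, using only the Python standard library. It reads the magic bytes, the metadata map and the sync marker. It assumes, as the specification describes, that the metadata is an Avro map with bytes values. Lengths and counts in that map are zigzag-encoded variable-length longs, and the map is split into blocks that end with a count of 0. The function names (read_long, read_bytes, read_header) are hypothetical, and no compression codec or schema parsing is handled.

```python
import io

MAGIC = b"Obj\x01"  # ASCII 'O', 'b', 'j' followed by format version 1


def read_long(stream):
    """Decode one zigzag-encoded variable-length long."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    # Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Decode a length-prefixed byte string."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated bytes")
    return data


def read_header(stream):
    """Return (metadata dict, 16-byte sync marker) from a container file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)  # number of entries in this map block
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:
            # A negative count is followed by the block's size in bytes.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    sync = stream.read(16)
    if len(sync) != 16:
        raise EOFError("truncated sync marker")
    return metadata, sync
```

In the users.avro dump further down, for instance, the 04 after the magic bytes announces two metadata entries (avro.codec and avro.schema), and the 00 at octal offset 0400 closes the map just before the 16-byte sync marker.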

For data blocks Avro specifies two serialization encodings:[6] binary and JSON. Most applications will use the binary encoding, as it is smaller and faster. For debugging and web-based applications, the JSON encoding may sometimes be appropriate.
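One reason the binary encoding is smaller can be seen in how it writes int and long values. Each value is zigzag-encoded and then written as a base-128 variable-length integer, where the high bit of each byte marks that more bytes follow. The sketch below shows this. The name encode_long is hypothetical, and the sketch assumes the value fits in a signed 64-bit range.

```python
def encode_long(n: int) -> bytes:
    """Encode an Avro int/long in the binary encoding (zigzag + varint).

    Sketch only: assumes -2**63 <= n < 2**63.
    """
    # Zigzag maps signed to unsigned so small magnitudes stay small:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Emit 7 bits at a time, low-order group first; the high bit of each
    # byte says whether more bytes follow.
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)
```

For example, 256 becomes the two bytes 80 04, the same bytes that follow "Alyssa" in the hex dump later in this article. Its JSON text "256" takes three bytes.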

Schema definition


Avro schemas are defined using JSON. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).[7]
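The type system can be made concrete with a small validator that checks a Python value against a schema already parsed from JSON. The mapping of Avro types to Python values here is an assumption of this sketch: None, bool, int, float, bytes, str, dict for records and maps, and list for arrays. Named-type references, default values and logical types are not handled, and is_valid is a hypothetical name.

```python
# Value ranges for Avro's 32-bit int and 64-bit long.
INT_RANGE = (-(2**31), 2**31 - 1)
LONG_RANGE = (-(2**63), 2**63 - 1)


def is_valid(schema, datum):
    """Check a Python value against an Avro schema already parsed from JSON.

    Sketch only: named-type references, defaults and logical types are
    not handled, and the Python representation of each type is assumed.
    """
    if isinstance(schema, list):
        # Union: the value must match at least one branch.
        return any(is_valid(branch, datum) for branch in schema)
    if isinstance(schema, str):
        schema = {"type": schema}  # "int" is shorthand for {"type": "int"}
    kind = schema["type"]
    if kind == "null":
        return datum is None
    if kind == "boolean":
        return isinstance(datum, bool)
    if kind in ("int", "long"):
        lo, hi = INT_RANGE if kind == "int" else LONG_RANGE
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool) and lo <= datum <= hi
    if kind in ("float", "double"):
        return isinstance(datum, (int, float)) and not isinstance(datum, bool)
    if kind == "bytes":
        return isinstance(datum, bytes)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "record":
        # A missing field is treated as None, so it only passes if the
        # field's type admits null.
        return isinstance(datum, dict) and all(
            is_valid(field["type"], datum.get(field["name"])) for field in schema["fields"]
        )
    if kind == "enum":
        return isinstance(datum, str) and datum in schema["symbols"]
    if kind == "array":
        return isinstance(datum, list) and all(is_valid(schema["items"], x) for x in datum)
    if kind == "map":
        # Avro map keys are always strings.
        return isinstance(datum, dict) and all(
            isinstance(k, str) and is_valid(schema["values"], v) for k, v in datum.items()
        )
    if kind == "fixed":
        return isinstance(datum, bytes) and len(datum) == schema["size"]
    raise ValueError(f"unsupported schema: {schema!r}")
```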

Simple schema example:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"]},
    {"name": "favorite_color", "type": ["null", "string"]}
  ]
}
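A record like this one can be laid out in the binary encoding as sketched below, covering only the types the User schema uses:

  • strings are a byte length followed by UTF-8 bytes;
  • ints are zigzag varints;
  • null takes no bytes;
  • a union is the zero-based index of the chosen branch followed by the value;
  • a record is just its fields concatenated in schema order.

The helper names and the simple first-match rule for picking a union branch are assumptions of this sketch, not the Avro library's API.

```python
def encode_long(n):
    """Avro int/long: zigzag, then base-128 varint (assumes a 64-bit value)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def matches(schema, datum):
    """Crude union-branch test covering only the types in the User example."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return datum is None
    if kind in ("int", "long"):
        return isinstance(datum, int) and not isinstance(datum, bool)
    if kind == "string":
        return isinstance(datum, str)
    return False


def encode(schema, datum):
    """Binary-encode a datum for a small subset of Avro schemas (sketch)."""
    if isinstance(schema, list):
        # Union: zero-based index of the first matching branch, then the value.
        for index, branch in enumerate(schema):
            if matches(branch, datum):
                return encode_long(index) + encode(branch, datum)
        raise ValueError("datum matches no union branch")
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return b""  # null occupies zero bytes
    if kind in ("int", "long"):
        return encode_long(datum)
    if kind == "string":
        raw = datum.encode("utf-8")
        return encode_long(len(raw)) + raw  # byte length, then UTF-8 bytes
    if kind == "record":
        # Fields are concatenated in schema order; no names or tags are written.
        # A field missing from the dict is encoded as None (null).
        return b"".join(encode(f["type"], datum.get(f["name"])) for f in schema["fields"])
    raise ValueError(f"type not covered by this sketch: {kind!r}")
```

With the schema above, encoding {"name": "Alyssa", "favorite_number": 256} yields 0c 41 6c 79 73 73 61 02 80 04 00. Here 0x02 selects branch 1 ("int") for favorite_number, and the final 0x00 selects branch 0 ("null") for the missing favorite_color. Because unions are encoded by position, reordering a union's branches changes the bytes written.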

Serializing and deserializing


Data in Avro may be stored together with its schema (as in an object container file), in which case a serialized item can be read without knowing the schema ahead of time.

Example serialization and deserialization code in Python


Serialization:[8]

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# The schema must be known in order to write (API as in Apache Avro 1.8.2).
# user.avsc holds the schema definition as JSON.
with open("user.avsc", "rb") as f:
    schema = avro.schema.parse(f.read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 8, "favorite_color": "red"})
writer.close()

File "users.avro" will contain the schema in JSON and a compact binary representation[9] of the data:

$ od -v -t x1z users.avro
0000000 4f 62 6a 01 04 14 61 76 72 6f 2e 63 6f 64 65 63  >Obj...avro.codec<
0000020 08 6e 75 6c 6c 16 61 76 72 6f 2e 73 63 68 65 6d  >.null.avro.schem<
0000040 61 ba 03 7b 22 74 79 70 65 22 3a 20 22 72 65 63  >a..{"type": "rec<
0000060 6f 72 64 22 2c 20 22 6e 61 6d 65 22 3a 20 22 55  >ord", "name": "U<
0000100 73 65 72 22 2c 20 22 6e 61 6d 65 73 70 61 63 65  >ser", "namespace<
0000120 22 3a 20 22 65 78 61 6d 70 6c 65 2e 61 76 72 6f  >": "example.avro<
0000140 22 2c 20 22 66 69 65 6c 64 73 22 3a 20 5b 7b 22  >", "fields": [{"<
0000160 74 79 70 65 22 3a 20 22 73 74 72 69 6e 67 22 2c  >type": "string",<
0000200 20 22 6e 61 6d 65 22 3a 20 22 6e 61 6d 65 22 7d  > "name": "name"}<
0000220 2c 20 7b 22 74 79 70 65 22 3a 20 5b 22 69 6e 74  >, {"type": ["int<
0000240 22 2c 20 22 6e 75 6c 6c 22 5d 2c 20 22 6e 61 6d  >", "null"], "nam<
0000260 65 22 3a 20 22 66 61 76 6f 72 69 74 65 5f 6e 75  >e": "favorite_nu<
0000300 6d 62 65 72 22 7d 2c 20 7b 22 74 79 70 65 22 3a  >mber"}, {"type":<
0000320 20 5b 22 73 74 72 69 6e 67 22 2c 20 22 6e 75 6c  > ["string", "nul<
0000340 6c 22 5d 2c 20 22 6e 61 6d 65 22 3a 20 22 66 61  >l"], "name": "fa<
0000360 76 6f 72 69 74 65 5f 63 6f 6c 6f 72 22 7d 5d 7d  >vorite_color"}]}<
0000400 00 05 f9 a3 80 98 47 54 62 bf 68 95 a2 ab 42 ef  >......GTb.h...B.<
0000420 24 04 2c 0c 41 6c 79 73 73 61 00 80 04 02 06 42  >$.,.Alyssa.....B<
0000440 65 6e 00 10 00 06 72 65 64 05 f9 a3 80 98 47 54  >en....red.....GT<
0000460 62 bf 68 95 a2 ab 42 ef 24                       >b.h...B.$<
0000471
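Each data block in a container file holds four parts: a count of objects, the size in bytes of the serialized objects, the objects themselves, and a copy of the file's sync marker. The sketch below decodes the block that starts at octal offset 0421 in this dump (bytes 04 2c ...). It uses the schema stored in the dump's header, whose unions list "int" and "string" before "null". It assumes the "null" codec (no compression), which is what the header's avro.codec entry shows, and it supports only the types this schema uses. The function names are hypothetical.

```python
import io


def read_long(stream):
    """Decode one zigzag varint (Avro int/long)."""
    shift = accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return (accum >> 1) ^ -(accum & 1)
        shift += 7


def decode(schema, stream):
    """Decode one datum for the schema subset used in this example (sketch)."""
    if isinstance(schema, list):
        # Union: read the branch index, then decode that branch.
        return decode(schema[read_long(stream)], stream)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return None
    if kind in ("int", "long"):
        return read_long(stream)
    if kind == "string":
        return stream.read(read_long(stream)).decode("utf-8")
    if kind == "record":
        # Fields appear in schema order with no names in the data.
        return {f["name"]: decode(f["type"], stream) for f in schema["fields"]}
    raise ValueError(f"type not covered by this sketch: {kind!r}")


def read_block(schema, stream, sync):
    """Read one data block: object count, byte size, objects, sync marker."""
    count = read_long(stream)
    size = read_long(stream)
    data = stream.read(size)  # only meaningful as-is for the "null" codec
    if len(data) != size:
        raise EOFError("truncated block")
    payload = io.BytesIO(data)
    records = [decode(schema, payload) for _ in range(count)]
    if stream.read(16) != sync:
        raise ValueError("sync marker mismatch")
    return records
```

The block holds two objects in 22 bytes. Run on it with the sync marker 05 f9 ... ef 24, the sketch returns the same two dictionaries that the DataFileReader example prints.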

Deserialization:

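from avro.datafile import DataFileReader
from avro.io import DatumReader

# The schema is embedded in the data file, so none is passed to the reader
reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()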

This outputs:

{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}
{'name': 'Ben', 'favorite_number': 8, 'favorite_color': 'red'}

Languages with APIs


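Though theoretically any language could use Avro, the following languages have APIs written for them:[10][11]

  • C
  • C++
  • C#[12][13][14]
  • Elixir[15][16]
  • Go[17][18]
  • Haskell[19]
  • Java
  • JavaScript[20]
  • Perl
  • PHP
  • Python[21][22]
  • Ruby
  • Rust[23]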

Avro IDL


In addition to supporting JSON for type and protocol definitions, Avro includes experimental[24] support for an alternative interface description language (IDL) syntax known as Avro IDL. Previously known as GenAvro, this format is designed to ease adoption by users familiar with more traditional IDLs and programming languages, with a syntax similar to C/C++, Protocol Buffers and others.

Logo


The original Apache Avro logo was from the defunct British aircraft manufacturer Avro (originally A.V. Roe and Company).[25]

The Apache Avro logo was updated to an original design in late 2023.[26]


References

  1. "Apache Avro: a New Format for Data Interchange". blog.cloudera.com. Retrieved March 10, 2019.
  2. "Apache Avro Releases". avro.apache.org. Retrieved September 23, 2023.
  3. Kleppmann, Martin (2017). Designing Data-Intensive Applications (First ed.). O'Reilly. p. 122.
  4. "3 Reasons Why In-Hadoop Analytics are a Big Deal - Dataconomy". dataconomy.com. April 21, 2016.
  5. "Apache Avro Specification: Object Container Files". avro.apache.org. Retrieved September 8, 2024.
  6. "Apache Avro Specification: Encodings". avro.apache.org. Retrieved September 8, 2024.
  7. "Apache Avro Getting Started (Python)". avro.apache.org. Archived from the original on June 5, 2016. Retrieved March 11, 2019.
  8. "Apache Avro Getting Started (Python)". avro.apache.org. Archived from the original on June 5, 2016. Retrieved March 11, 2019.
  9. "Apache Avro Specification: Data Serialization". avro.apache.org. Retrieved September 8, 2024.
  10. phunt. "GitHub - phunt/avro-rpc-quickstart: Apache Avro RPC Quick Start. Avro is a subproject of Apache Hadoop". GitHub. Retrieved April 13, 2016.
  11. "Supported Languages - Apache Avro - Apache Software Foundation". Retrieved April 21, 2016.
  12. "Avro: 1.5.1 - ASF JIRA". Retrieved April 13, 2016.
  13. "[AVRO-533] .NET implementation of Avro - ASF JIRA". Retrieved April 13, 2016.
  14. "Supported Languages". Retrieved April 13, 2016.
  15. "AvroEx". hexdocs.pm. Retrieved October 18, 2017.
  16. "Avrora — avrora v0.21.1". hexdocs.pm. Retrieved June 11, 2021.
  17. "avro package - github.com/hamba/avro - Go Packages". pkg.go.dev. Retrieved July 4, 2023.
  18. goavro. LinkedIn. June 30, 2023. Retrieved July 4, 2023.
  19. "Native Haskell implementation of Avro". Thomas M. DuBuisson, Galois, Inc. Retrieved August 8, 2016.
  20. "Pure JavaScript implementation of the Avro specification". GitHub. Retrieved May 4, 2020.
  21. "Getting Started (Python)". Apache Avro. Retrieved July 4, 2023.
  22. Avro, Apache. avro: Avro is a serialization and RPC framework. Retrieved July 4, 2023.
  23. "Apache Avro client library implementation in Rust". Retrieved December 17, 2018.
  24. "Apache Avro 1.8.2 IDL". Archived from the original on September 20, 2010. Retrieved March 11, 2019.
  25. "The Avro Logo". avroheritagemuseum.co.uk. Retrieved December 31, 2018.
  26. "[AVRO-3908] Update project logo everywhere - ASF JIRA". apache.org. Retrieved February 6, 2024.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Apache_Avro&oldid=1277533386"