| Apache Pinot | |
|---|---|
| Original authors | |
| Developer | Apache Pinot |
| Stable release | 1.2.0 / 21 August 2024 |
| Written in | Java |
| Operating system | Cross-platform |
| Type | |
| License | Apache License 2.0 |
| Website | pinot |
| Repository | Pinot repository |
Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency.[1][2][3][4][5] It is suited to contexts where fast analytics, such as aggregations, are needed on immutable data, possibly with real-time data ingestion.[6][7][8] The name Pinot comes from the Pinot grape vines, which are pressed into a liquid used to produce a variety of wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of file formats or streaming data sources.[9]
Pinot was first created at LinkedIn after the engineering staff determined that no off-the-shelf solution met the social networking site's requirements, such as predictable low latency, data freshness within seconds, fault tolerance, and scalability.[9][10] Pinot is used in production by technology companies such as Uber,[11] Microsoft,[8] and Factual.
Pinot was started as an internal project at LinkedIn in 2013 to power a variety of user-facing and business-facing products. The first LinkedIn analytics product to use Pinot was a redesign of the feature that lets members see, in real time, who has viewed their profile. The project was open-sourced under the Apache 2.0 license in June 2015 and was donated to the Apache Software Foundation by LinkedIn in June 2019.[9][8]

Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache ZooKeeper for coordination and for storing cluster metadata and the overall cluster state and health. All Pinot servers and brokers are managed by Helix, a generic cluster management framework for managing partitions and replicas in a distributed system.
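The partition-and-replica bookkeeping that Helix handles can be illustrated with a minimal Python sketch. It is not Helix's API: it assumes a simple round-robin placement of each segment's replicas onto servers (Helix calls a desired assignment of this kind an "ideal state") and then checks which segments no longer have a full set of replicas on live servers. The function names, segment names, and server names are hypothetical.

```python
def build_ideal_state(segments, servers, replication):
    """Hypothetical round-robin placement: segment -> list of replica servers."""
    if replication > len(servers):
        raise ValueError("replication cannot exceed the number of servers")
    ideal_state = {}
    for i, segment in enumerate(segments):
        # Start each segment at a different server so replicas spread evenly,
        # and take `replication` consecutive (distinct) servers from there.
        ideal_state[segment] = [
            servers[(i + r) % len(servers)] for r in range(replication)
        ]
    return ideal_state


def under_replicated(ideal_state, live_servers, replication):
    """Segments with fewer than `replication` replicas on live servers."""
    return sorted(
        segment
        for segment, hosts in ideal_state.items()
        if sum(host in live_servers for host in hosts) < replication
    )


if __name__ == "__main__":
    state = build_ideal_state(["seg_0", "seg_1", "seg_2"], ["s1", "s2", "s3"], 2)
    print(state)
    # Simulate server s3 going offline.
    print(under_replicated(state, {"s1", "s2"}, 2))
```

In a real cluster, the live-server set would come from the state Helix tracks in ZooKeeper rather than from a hard-coded argument.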
Queries are received by brokers, which check each request against the segment-to-server routing table and scatter it across the real-time and offline servers.
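The broker's routing and scatter step can be sketched in-process in Python. This is a hypothetical illustration, not Pinot's broker code: the routing table, segment names, and server names are invented; each "server" is a local function computing a partial `COUNT(*)` grouped by country; and the final merge (gather) of partial results is included as an assumption so the example returns a complete answer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical segment-to-server routing table, split by server type.
ROUTING_TABLE = {
    "offline": {"server_a": ["seg_2024_01", "seg_2024_02"]},
    "realtime": {"server_b": ["seg_consuming_0"]},
}

# Hypothetical segment contents: the "country" column of each segment.
SEGMENT_DATA = {
    "seg_2024_01": ["US", "DE"],
    "seg_2024_02": ["US"],
    "seg_consuming_0": ["IN", "US"],
}


def query_server(server, segments):
    """Simulate one server computing COUNT(*) GROUP BY country on its segments.

    `server` is unused here; a real broker would send a network request to it.
    """
    partial = Counter()
    for segment in segments:
        partial.update(SEGMENT_DATA[segment])
    return partial


def broker_count_by_country(routing_table):
    # Routing: collect every (server, segments) pair, offline and real-time.
    requests = [
        (server, segments)
        for servers in routing_table.values()
        for server, segments in servers.items()
    ]
    # Scatter: query the servers concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda req: query_server(*req), requests))
    # Gather: merge the partial aggregates into one result.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    print(broker_count_by_country(ROUTING_TABLE))
```

Counts can be merged by simple addition because `COUNT` is decomposable; aggregations such as an exact median would need servers to return more than a single partial value.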
Pinot shares features with comparable OLAP data stores such as Apache Druid.[12][13] Like Druid, Pinot is a column-oriented database that uses compression schemes such as run-length and fixed-bit-length encoding. Pinot supports pluggable indexing technologies: sorted, bitmap, inverted, star-tree, and range indexes. These indexing options are what primarily differentiate Pinot from other OLAP data stores.
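Several of the encoding and indexing ideas named above can be illustrated in a few lines of Python. This is a minimal sketch of the general techniques on an in-memory list, not Pinot's Java implementation or on-disk format. It assumes that dictionary encoding produces the integer ids that fixed-bit packing stores (a common arrangement, not stated above), uses Python integers as stand-in bitmaps, and does not cover star-tree or range indexes. All function names are hypothetical.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer id (dictionary encoding)."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]


def fixed_bit_pack(ids):
    """Pack ids into one integer using the fewest bits that fit the largest id."""
    bits = max(1, max(ids, default=0).bit_length())
    packed = 0
    for position, value in enumerate(ids):
        packed |= value << (position * bits)
    return bits, packed


def fixed_bit_get(packed, bits, position):
    """Read the id stored at a given position of a fixed-bit packed integer."""
    return (packed >> (position * bits)) & ((1 << bits) - 1)


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def build_bitmap_index(values):
    """Inverted index: value -> bitmap of doc ids (bit i set if doc i matches)."""
    index = {}
    for doc_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << doc_id)
    return index


def docs_in(bitmap):
    """List the doc ids whose bits are set in a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


if __name__ == "__main__":
    column = ["CA", "NY", "CA", "TX", "NY"]
    dictionary, ids = dictionary_encode(column)   # ids == [0, 1, 0, 2, 1]
    bits, packed = fixed_bit_pack(ids)            # 2 bits per value
    assert fixed_bit_get(packed, bits, 3) == ids[3]
    print(run_length_encode(sorted(column)))      # [('CA', 2), ('NY', 2), ('TX', 1)]
    index = build_bitmap_index(column)
    # WHERE state = 'CA' OR state = 'TX' becomes a bitwise OR of two bitmaps.
    print(docs_in(index["CA"] | index["TX"]))     # [0, 2, 3]
```

Run-length encoding pays off most on sorted columns, where equal values are contiguous; on the unsorted column above it would barely compress at all.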
Pinot supports near-real-time ingestion from streams such as Kafka and AWS Kinesis, and batch ingestion from sources such as Hadoop, S3, Azure, and GCS. Like most other OLAP data stores and data warehousing solutions, Pinot supports a SQL-like query language for selection, aggregation, filtering, group-by, order-by, and distinct queries.
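As an illustration of the query language, the following minimal Python sketch builds an HTTP request that submits a SQL query to a broker. It assumes a broker listening on `localhost:8099` (Pinot's default broker port) and the `/query/sql` endpoint described in Pinot's documentation; both should be checked against the deployment and version in use. The `profileViews` table and its columns are hypothetical. Only the standard library is used, and the request is built but not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request

# Assumption: a broker on localhost at the default broker port (8099),
# queried through its /query/sql endpoint; verify both for your deployment.
BROKER_URL = "http://localhost:8099/query/sql"

# Hypothetical table and column names, for illustration only.
SQL = (
    "SELECT country, COUNT(*) FROM profileViews "
    "WHERE viewDate >= 20240101 "
    "GROUP BY country ORDER BY COUNT(*) DESC LIMIT 10"
)


def build_query_request(sql, broker_url=BROKER_URL):
    """Build (but do not send) a POST request carrying the query as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_query_request(SQL)
    print(request.get_method(), request.full_url)
    # Sending the request needs a running broker:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

The example query combines filtering, aggregation, group-by, and order-by in a single statement.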