| Apache Impala | |
|---|---|
| Developer | Apache Software Foundation |
| Initial release | April 28, 2013 |
| Stable release | 4.5.0 / March 4, 2025 |
| Written in | C++, Java |
| Operating system | Cross-platform |
| Type | Relational Hadoop analytics |
| License | Apache License 2.0 |
| Website | impala |
| Repository | Impala Repository |
Apache Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.[1] Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.[2]
The project was announced in October 2012 with a public beta test distribution[3][4] and became generally available in May 2013.[5]
Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is integrated with Hadoop to use the same file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other Hadoop software.
Impala is marketed to analysts and data scientists for performing analytics on data stored in Hadoop via SQL or business intelligence tools. As a result, large-scale data processing (via MapReduce) and interactive queries can be done on the same system using the same data and metadata, removing the need to migrate data sets into specialized systems or proprietary formats simply to perform analysis.
Features include:
In early 2013, a column-oriented file format called Parquet was announced for architectures including Impala.[6] In December 2013, Amazon Web Services announced support for Impala.[7] In early 2014, MapR added support for Impala.[8] In 2015, another format called Kudu was announced, which Cloudera proposed to donate to the Apache Software Foundation along with Impala.[9] Impala graduated to an Apache Top-Level Project (TLP) on 28 November 2017.[10]