| Company type | Product ofBroadcom |
|---|---|
| Industry | Big data technologies |
| Headquarters | Palo Alto,California |
| Products | Database management system software |
| Greenplum Database | |
|---|---|
| Developer | Broadcom |
| Stable release | 7.0.0 / September 28, 2023; 2 years ago (2023-09-28) |
| Operating system | Linux |
| Type | Database management system |
| Website | greenplum |
Greenplum is abig data technology based onMPP architecture and thePostgres open source database technology. The technology was created by a company of the same name headquartered inSan Mateo,California around 2005. Greenplum was acquired byEMC Corporation in July 2010.[1]
Starting in 2012, itsdatabase management system software became known as thePivotal Greenplum Database sold throughPivotal Software. Pivotal open sourced the core engine and continued its development by the Greenplum Database open source community and Pivotal.
Starting in 2020 Pivotal was acquired byVMware[2] and VMware continued to sponsor the Greenplum Database open source community as well as commercialize the technology under the brand nameVMware Tanzu Greenplum. In November 2023, VMware was acquired by Broadcom.[3]
In May 2024, Tanzu by Broadcom made the decision to close source the Greenplum Database project. All future releases of Greenplum Database will be closed source and released as part of the VMware Tanzu Data Suite.
Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 nearLos Angeles)[4] and Didera inFairfax, Virginia.[5]
Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. A total ofUS$20 million in funding was announced at the merger.[6] Greenplum, based inSan Mateo, California, released itsdatabase management system software based onPostgreSQL in April 2005 calling it Bizgres.[7] Rounds ofventure capital of aboutUS$15 million each were invested in March 2006 and February 2007.[8]
In July 2006 a partnership withSun Microsystems was announced.[9] Sun, which had also acquiredMySQL AB, participated in a round ofUS$27 million investment in January 2009, led byMeritech Capital Partners.[8] The Bizgres project included a few other members, and was supported through about 2008, when the product was just called "Greenplum" as well.[10][11] TheSun Fire X4500 was a reference architecture and used by the majority of customers until a transition was made toLinux around that time. Greenplum was acquired byEMC Corporation in July 2010, becoming the foundation of EMC'sbig data software division.[1] Although EMC did not disclose the value, it was estimated atUS$300 million.[12][13] Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers invertical markets includingeBay.[14] It became part ofPivotal Software in 2012.[15]
A variant usingApache Hadoop to store data in the Hadoopfile system called Hawq was announced in 2013.[16][17] In 2015 the GreenplumDB and Hawqopen source software projects were announced.[18]
Pivotal's Greenplum database product usesmassively parallel processing (MPP) techniques. Eachcomputer cluster consists of a master node, standby master node, and segment nodes.[19] All of the data resides on the segment nodes and the catalog information is stored in the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances and are assigned a content identifier. For each table the data is divided among the segment nodes based on the distribution column keys specified by the user in thedata definition language. For each segment content identifier there is both a primary segment and mirror segment which are not running on the same physical host. When a query enters the master node, it is parsed, planned and dispatched to all of the segments to execute the query plan and either return the requested data or insert the result of the query into a database table. TheStructured Query Language, versionSQL:2003, is used to present queries to the system. Transaction semantics comply with constraints known asACID.[20]
Competitors include other MPP database management systems provided by major vendors such asTeradata,Amazon Redshift,Microsoft Azure, AlibabaAnalyticDB and, in the past, IBMNetezza.[19][21] Additional competition comes from other smaller competitors,column-oriented databases such as HPVertica,Exasol anddata warehousing vendors with non MPP architecture, such asOracle Exadata,IBM Db2 andSAP HANA.
In September 2023, Greenplum Database Version 7 was released.[22] Version 7 is based on PostgreSQL version 12.12.
In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features massive gains in[23]OLTP performance. Greenplum 6 was reviewed in the media by several sources and mentioned for its Postgres open source alignment[24] and for its OLTP performance[25]
In September 2017, Greenplum Database Version 5 was released. Version 5 includes the first iteration of the Greenplum project strategy of merging PostgreSQL later versions back into Greenplum and is based on PostgreSQL version 8.3 up from the previous version 8.2.[26] Version 5 also introducing the General Availability of the GPORCA Optimizer[27] for cost based optimization of SQL designed for big data.