Conventions and terminology
For clarity, it is important to establish the meaning behind certain words, as the same wording might convey different meanings to different readers depending on their familiarity with SQL versus Elasticsearch.
This documentation, while trying to be complete, does assume the reader has a basic understanding of Elasticsearch and/or SQL. If that is not the case, continue reading the documentation; however, take notes and pursue the topics that are unclear, either through the main Elasticsearch documentation or through the plethora of SQL material available in the open (there are simply too many excellent resources to enumerate here).
As a general rule, Elasticsearch SQL, as the name indicates, provides a SQL interface to Elasticsearch. As such, it follows SQL terminology and conventions first, whenever possible. However, the backing engine itself is Elasticsearch, for which Elasticsearch SQL was purposely created; this is why features or concepts that are not available, or cannot be mapped correctly, in SQL appear in Elasticsearch SQL. Last but not least, Elasticsearch SQL tries to obey the principle of least surprise, though as with all things in the world, everything is relative.
While SQL and Elasticsearch have different terms for the way the data is organized (and different semantics), essentially their purpose is the same.
So let’s start from the bottom; these roughly are:
| SQL | Elasticsearch | Description |
|---|---|---|
| column | field | In both cases, at the lowest level, data is stored in named entries, of a variety of data types, containing one value. SQL calls such an entry a column while Elasticsearch calls it a field. Notice that in Elasticsearch a field can contain multiple values of the same type (essentially a list) while in SQL, a column can contain exactly one value of said type. Elasticsearch SQL will do its best to preserve the SQL semantics and, depending on the query, reject those that return fields with more than one value. |
| row | document | Columns and fields do not exist by themselves; they are part of a row or a document. The two have slightly different semantics: a row tends to be strict (and have more enforcements) while a document tends to be a bit more flexible or loose (while still having a structure). |
| table | index | The target against which queries, whether in SQL or Elasticsearch, get executed. |
| schema | implicit | In an RDBMS, a schema is mainly a namespace of tables and is typically used as a security boundary. Elasticsearch does not provide an equivalent concept for it. However, when security is enabled, Elasticsearch automatically applies the security enforcement so that a role sees only the data it is allowed to (in SQL jargon, its schema). |
| catalog or database | cluster instance | In SQL, catalog and database are used interchangeably and represent a set of schemas, that is, a number of tables. In Elasticsearch the set of indices available are grouped in a cluster. The semantics also differ a bit; a database is essentially yet another namespace (which can have some implications on the way data is stored) while an Elasticsearch cluster is a runtime instance, or rather a set of at least one Elasticsearch instance (typically running distributed). In practice this means that while in SQL one can potentially have multiple catalogs inside an instance, in Elasticsearch one is restricted to only one. |
| cluster | cluster (federated) | Traditionally in SQL, cluster refers to a single RDBMS instance which contains a number of catalogs or databases (see above). The same word can be reused inside Elasticsearch as well; however, its semantics need to be clarified a bit. While an RDBMS tends to have only one running instance, on a single machine (not distributed), Elasticsearch goes the opposite way and, by default, is distributed and multi-instance. Furthermore, an Elasticsearch cluster can be connected to other clusters in a federated fashion, thus cluster means either: a single cluster, that is, multiple Elasticsearch instances typically distributed across machines, running within the same namespace; or multiple clusters, each with its own namespace, connected to each other in a federated setup (see Cross-cluster search). |
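
To make the column versus field distinction from the table more concrete, here is a minimal Python sketch. It is an illustration only, not how Elasticsearch SQL is implemented: the names `document_to_row` and `MultiValueFieldError` are hypothetical, and the sample documents are made up. It models a document as a mapping whose fields may hold a list of values, and a row as a mapping whose columns hold exactly one value, rejecting any field that carries more than one.

```python
from typing import Any, Dict


class MultiValueFieldError(ValueError):
    """Hypothetical error: a field holds more than one value, so it cannot be a column."""


def document_to_row(document: Dict[str, Any]) -> Dict[str, Any]:
    """Map a document (field -> value or list of values) to a row
    (column -> exactly one value), rejecting multi-valued fields."""
    row: Dict[str, Any] = {}
    for field, value in document.items():
        if isinstance(value, list):
            if len(value) > 1:
                raise MultiValueFieldError(
                    f"field '{field}' has {len(value)} values"
                )
            # An empty list becomes None (SQL NULL); a one-element list is unwrapped.
            value = value[0] if value else None
        row[field] = value
    return row


# Made-up sample documents, for illustration only.
single_valued = {"name": "Leviathan Wakes", "page_count": 561}
multi_valued = {"name": "Dune", "tags": ["classic", "scifi"]}

print(document_to_row(single_valued))
try:
    document_to_row(multi_valued)
except MultiValueFieldError as err:
    print("rejected:", err)
```

The strictness here is deliberately simplistic; as the table notes, whether Elasticsearch SQL actually rejects a multi-valued field depends on the query.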
As one can see, while the mapping between the concepts is not exactly one to one and the semantics are somewhat different, there is more in common than there are differences. In fact, thanks to SQL's declarative nature, many concepts can move across Elasticsearch transparently, and the terminology of the two is likely to be used interchangeably throughout the rest of the material.