Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual.Learn more
Amazon S3 Select supports only theSELECT SQL command. The following ANSIstandard clauses are supported forSELECT:
SELECT list
FROM clause
WHERE clause
LIMIT clause
Amazon S3 Select queries currently do not support subqueries or joins.
TheSELECT list names the columns, functions, and expressions thatyou want the query to return. The list represents the output of the query.
SELECT *SELECTprojection1 AScolumn_alias_1,projection2 AScolumn_alias_2The first form ofSELECT with the* (asterisk) returns every rowthat passed theWHERE clause, as-is. The second form ofSELECT creates a row with user-defined output scalar expressions andprojection1 for eachcolumn.projection2
Amazon S3 Select supports the following forms of theFROM clause:
FROMtable_nameFROMtable_name aliasFROMtable_name ASaliasIn each form of theFROM clause,table_name is theS3Object that's being queried. Users coming from traditionalrelational databases can think of this as a database schema that contains multipleviews over a table.
Following standard SQL, theFROM clause creates rows that arefiltered in theWHERE clause and projected in theSELECTlist.
For JSON objects that are stored in Amazon S3 Select, you can also use the followingforms of theFROM clause:
FROM S3Object[*].pathFROM S3Object[*].path aliasFROM S3Object[*].path ASaliasUsing this form of theFROM clause, you can select from arrays or objectswithin a JSON object. You can specifypath by using one of the followingforms:
By name (in an object):. orname['name']
By index (in an array):[index]
By wildcard character (in an object):.*
By wildcard character (in an array):[*]
This form of theFROM clause works only with JSONobjects.
Wildcard characters always emit at least one record. If no record matches, then Amazon S3Select emits the valueMISSING. During output serialization(after the query finishes running), Amazon S3 Select replacesMISSING values with empty records.
Aggregate functions (AVG,COUNT,MAX,MIN, andSUM) skipMISSING values.
If you don't provide an alias when using a wildcard character, you can refer to the rowby using the last element in the path. For example, you could select allprices from a list of books by using the querySELECT price FROMS3Object[*].books[*].price. If the path ends in a wildcardcharacter instead of a name, then you can use the value_1 torefer to the row. For example, instead ofSELECT price FROMS3Object[*].books[*].price, you could use the querySELECT_1.price FROM S3Object[*].books[*].
Amazon S3 Select always treats a JSON document as an array of root-level values. Thus, even ifthe JSON object that you are querying has only one root element, theFROM clause must begin withS3Object[*].However, for compatibility reasons, Amazon S3 Select allows you to omit thewildcard character if you don't include a path. Thus, the complete clauseFROM S3Object is equivalent toFROM S3Object[*] asS3Object. If you include a path, you must also use the wildcardcharacter. So,FROM S3Object andFROMS3Object[*].are both valid clauses, butpathFROMS3Object.is not.path
Examples:
Example #1
This example shows results when using the following dataset and query:
{ "Rules": [{"id": "1"},{"expr": "y > x"},{"id": "2", "expr": "z = DEBUG"} ]}{ "created": "June 27", "modified": "July 6" }SELECT id FROM S3Object[*].Rules[*].id{"id":"1"}{}{"id":"2"}{}Amazon S3 Select produces each result for the following reasons:
{"id":"id-1"} –S3Object[0].Rules[0].id produced amatch.
{} –S3Object[0].Rules[1].id did not match a record, soAmazon S3 Select emittedMISSING, which was then changed to an emptyrecord during output serialization and returned.
{"id":"id-2"} –S3Object[0].Rules[2].id produced amatch.
{} –S3Object[1] did not match onRules, soAmazon S3 Select emittedMISSING, which was then changed to an emptyrecord during output serialization and returned.
If you don't want Amazon S3 Select to return empty records when it doesn't find amatch, you can test for the valueMISSING. The following query returnsthe same results as the previous query, but with the empty values omitted:
SELECT id FROM S3Object[*].Rules[*].id WHERE id IS NOT MISSING{"id":"1"}{"id":"2"}Example #2
This example shows results when using the following dataset and queries:
{ "created": "936864000", "dir_name": "important_docs", "files": [{ "name": "." },{ "name": ".." },{ "name": ".aws" },{ "name": "downloads" } ], "owner": "Amazon S3" }{ "created": "936864000", "dir_name": "other_docs", "files": [{ "name": "." },{ "name": ".." },{ "name": "my stuff" },{ "name": "backup" } ], "owner": "User" }SELECT d.dir_name, d.files FROM S3Object[*] d{"dir_name":"important_docs","files":[{"name":"."},{"name":".."},{"name":".aws"},{"name":"downloads"}]}{"dir_name":"other_docs","files":[{"name":"."},{"name":".."},{"name":"my stuff"},{"name":"backup"}]}SELECT _1.dir_name, _1.owner FROM S3Object[*]{"dir_name":"important_docs","owner":"Amazon S3"}{"dir_name":"other_docs","owner":"User"}TheWHERE clause follows this syntax:
WHEREconditionTheWHERE clause filters rows based on the. A condition is anexpression that has a Boolean result. Only rows for which the condition evaluates toconditionTRUE are returned in the result.
TheLIMIT clause follows this syntax:
LIMITnumberTheLIMIT clause limits the number of records that you want the query toreturn based on.number
TheSELECT andWHERE clauses can refer to record data by usingone of the methods in the following sections, depending on whether the file that isbeing queried is in CSV or JSON format.
Column Numbers – You can refer to theNth column of a row with the column name_, whereN is the columnposition. The position count starts at 1. For example, the first columnis namedN_1 and the second column is named_2.
You can refer to a column as_ orN.For example,alias._N_2 andmyAlias._2 are both validways to refer to a column in theSELECT list andWHERE clause.
Column Headers – For objects in CSV format thathave a header row, the headers are available to theSELECTlist andWHERE clause. In particular, as in traditionalSQL, withinSELECT andWHERE clauseexpressions, you can refer to the columns byoralias.column_name.column_name
Document – You can access JSON document fieldsas.You can also access nested fields, for example,alias.name.alias.name1.name2.name3
List – You can access elements in a JSON list byusing zero-based indexes with the[] operator. For example,you can access the second element of a list as. You cancombine accessing list elements with fields, for example,alias[1].alias.name1.name2[1].name3
Examples: Consider this JSON objectas a sample dataset:
{"name": "Susan Smith","org": "engineering","projects": [{"project_name":"project1", "completed":false},{"project_name":"project2", "completed":true} ]}Example #1
The following query returns these results:
Select s.name from S3Object s{"name":"Susan Smith"}Example #2
The following query returns these results:
Select s.projects[0].project_name from S3Object s{"project_name":"project1"}With Amazon S3 Select, you can use double quotation marks to indicate that column headers (forCSV objects) and attributes (for JSON objects) are case sensitive. Without doublequotation marks, object headers and attributes are case insensitive. An error isthrown in cases of ambiguity.
The following examples are either 1) Amazon S3 objects in CSV format with the specified columnheaders, and withFileHeaderInfo set to"Use" for thequery request; or 2) Amazon S3 objects in JSON format with the specifiedattributes.
Example #1: The object being queried has the header or attributeNAME.
The following expression successfully returns values from the object. Because there areno quotation marks, the query is case insensitive.
SELECT s.name from S3Object sThe following expression results in a 400 errorMissingHeaderName. Becausethere are quotation marks, the query is case sensitive.
SELECT s."name" from S3Object sExample #2: The Amazon S3 object being queried has one header or attributewithNAME and another header or attribute withname.
The following expression results in a 400 errorAmbiguousFieldName. Becausethere are no quotation marks, the query is case insensitive, but there aretwo matches, so the error is thrown.
SELECT s.name from S3Object sThe following expression successfully returns values from the object. Because there arequotation marks, the query is case sensitive, so there is noambiguity.
SELECT s."NAME" from S3Object sAmazon S3 Select has a set of reserved keywords that are needed to run the SQL expressions usedto query object content. Reserved keywords include function names, data types,operators, and so on. In some cases, user-defined terms, such as the column headers(for CSV files) or attributes (for JSON objects), might clash with a reservedkeyword. When this happens, you must use double quotation marks to indicate that youare intentionally using a user-defined term that clashes with a reserved keyword.Otherwise a 400 parse error will result.
For the full list of reserved keywords, seeReserved keywords.
The following example is either 1) an Amazon S3 object in CSV format with the specified columnheaders, withFileHeaderInfo set to"Use" for the queryrequest, or 2) an Amazon S3 object in JSON format with the specified attributes.
Example: The object being queried has a header or attribute namedCAST, which is a reserved keyword.
The following expression successfully returns values from the object. Because quotationmarks are used in the query, S3 Select uses the user-defined header orattribute.
SELECT s."CAST" from S3Object sThe following expression results in a 400 parse error. Because no quotation marks areused in the query,CAST clashes with a reserved keyword.
SELECT s.CAST from S3Object sWithin theWHERE clause and theSELECT list, you canhave SQLscalar expressions, which areexpressions that return scalar values. They have the following form:
literal
An SQL literal.
column_reference
A reference to a column in the formorcolumn_name.alias.column_name
unary_opexpression
In this case,is an SQL unary operator.unary_op
expressionbinary_opexpression
In this case,is an SQL binary operator.binary_op
func_name
In this case, is thename of the scalar function to invoke.func_name
expression[ NOT ] BETWEENexpressionANDexpression
expressionLIKE[expressionESCAPE]expression