Table sampling
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.
Table sampling lets you query random subsets of data from largeBigQuery tables. Sampling returns a variety of records while avoidingthe costs associated with scanning and processing an entire table.
Using table sampling
To use table sampling in a query, include theTABLESAMPLEclause. For example, the following query selects approximately 10% of a table'sdata:
SELECT*FROMdataset.my_tableTABLESAMPLESYSTEM(10PERCENT)Unlike theLIMIT clause,TABLESAMPLE returns a random subset of data from atable. Also, BigQuery does not cache the results of queries thatinclude aTABLESAMPLE clause, so the query might return different results eachtime.
You can combine theTABLESAMPLE clause with other selection conditions. Thefollowing example samples about 50% of the table and then applies aWHEREclause:
SELECT*FROMdataset.my_tableTABLESAMPLESYSTEM(50PERCENT)WHEREcustomer_id=1The next example combines aTABLESAMPLE clause with aJOIN clause:
SELECT *FROM dataset.table1 T1 TABLESAMPLE SYSTEM (10 PERCENT)JOIN dataset.table2 T2 TABLESAMPLE SYSTEM (20 PERCENT) USING (customer_id)For smaller tables, if you join two samples and none of the sampled rows meetthe join condition, then you might receive an empty result.
You can specify the percentage as aquery parameter. The next example showshow to pass the percentage to a query by using the bq command-line tool:
bqquery--use_legacy_sql=false --parameter=percent:INT64:29 \'SELECT*FROM`dataset.my_table`TABLESAMPLESYSTEM(@percentPERCENT)`BigQuery tables are organized into data blocks. TheTABLESAMPLEclause works by randomly selecting a percentage of data blocks from the tableand reading all of the rows in the selected blocks. The sampling granularityis limited by the number of data blocks.
Typically, BigQuery splits tables or table partitions into blocksif they are larger than about 1 GB. Smaller tables might consist of a singledata block. In that case, theTABLESAMPLE clause reads the entire table. Ifthe sampling percentage is greater than zero and the table is not empty, thentable sampling always returns some results.
Blocks can be different sizes, so the exact fraction of rows that are sampledmight vary. If you want to sample individual rows, rather than data blocks, thenyou can use aWHERE rand() < K clause instead. However, this approach requiresBigQuery to scan the entire table. To save costs but stillbenefit from row-level sampling, you can combine both techniques.
The following example reads approximately 20% of the data blocks from storageand then randomly selects 10% of the rows in those blocks:
SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (20 PERCENT)WHERE rand() < 0.1External tables
You can use theTABLESAMPLE clause with external tables that store data in acollection of files. BigQuery samples a subset of the externalfiles that the table references. For some file formats, BigQuerycan split individual files into blocks for sampling. Some external data, such asdata in Google Sheets, consists of a single file that is sampled as one blockof data.
Sampling from the write-optimized storage
If you use table sampling withstreaming inserts, thenBigQuery samples data from the write-optimized storage. In some cases,all the data in the write-optimized storage is represented as a single block. When that happens,either all the data in the write-optimized storage appears in the results, or none of it does.
Partitioned and clustered tables
Partitioning and clustering produce blocks where all rows within a specificblock have either the same partitioning key or have clustering attributes withclose values. Therefore, sample sets from these tables tend to be more biasedthan sample sets from non-partitioned, non-clustered tables.
Limitations
- A sampled table can only appear once in a query statement. This restrictionincludes tables that are referenced inside view definitions.
- Sampling data from views is not supported.
- Sampling the results of subqueries or table-valued function calls is notsupported.
- Sampling from an array scan, such as the result of calling the
UNNESToperator, is not supported. - Sampling inside an
INsubquery is not supported. - Sampling from tables with row-level security applied is not supported.
Table sampling pricing
If you useon-demand billing, then youare charged for reading the data that is sampled. BigQuery doesnot cache the results of a query that includes aTABLESAMPLE clause, so eachexecution incurs the cost of reading the data from storage.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.