Best practices for spatial analysis

This document describes best practices for optimizing geospatial query performance in BigQuery. You can use these best practices to improve performance and reduce cost and latency.

Datasets can contain large collections of polygons, multipolygon shapes, and linestrings to represent complex features (for example, roads, land parcels, and flood zones). Each shape can contain thousands of points. In most spatial operations in BigQuery (for example, intersections and distance calculations), the underlying algorithm usually visits the majority of points in each shape to produce a result. For some operations, the algorithm visits all points. For complex shapes, visiting each point can increase the cost and duration of the spatial operations. You can use the strategies and methods presented in this guide to optimize these common spatial operations for improved performance and reduced cost.

This document assumes that your BigQuery geospatial tables are clustered on a geography column.

Simplify shapes

Best practice: Use simplify and snap-to-grid functions to store a simplified version of your original dataset as a materialized view.

Many complex shapes with large numbers of points can be simplified without much loss in precision. Use the BigQuery ST_SIMPLIFY and ST_SNAPTOGRID functions separately or together to reduce the number of points in complex shapes. Combine these functions with BigQuery materialized views to store a simplified version of your original dataset as a materialized view that's automatically kept up to date against the base table.

Simplifying shapes is most useful for improving the cost and performance of a dataset in the following use cases:

  • You need to maintain a high degree of similarity to the true shape.
  • You must perform high-precision, high-accuracy operations.
  • You want to speed up visualizations without visible loss in shape detail.

The following code sample shows how to use the ST_SIMPLIFY function on a base table that has a GEOGRAPHY column named geom. The code simplifies shapes and removes points without disturbing any edge of a shape by more than the given tolerance of 1.0 meter.

CREATE MATERIALIZED VIEW project.dataset.base_mv
CLUSTER BY geom
AS (
  SELECT
    * EXCEPT (geom),
    ST_SIMPLIFY(geom, 1.0) AS geom
  FROM base_table
)

The following code sample shows how to use the ST_SNAPTOGRID function to snap the points to a grid with a resolution of 0.00001 degrees:

CREATE MATERIALIZED VIEW project.dataset.base_mv
CLUSTER BY geom
AS (
  SELECT
    * EXCEPT (geom),
    ST_SNAPTOGRID(geom, -5) AS geom
  FROM base_table
)

The grid_size argument in this function serves as a power-of-ten exponent, so a value of -5 gives a grid spacing of 10^-5 = 0.00001 degrees. This resolution is equivalent to around 1 meter in the worst case, which occurs at the equator.
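As a quick check of that arithmetic, the following sketch computes the grid spacing for an exponent of -5 and measures the ground distance that one grid step of longitude spans at the equator (a little over 1 meter). The point coordinates are arbitrary illustration values and aren't part of the original samples.

```sql
SELECT
  -- grid_size is a power-of-ten exponent: 10^-5 = 0.00001 degrees.
  POW(10, -5) AS grid_spacing_degrees,
  -- Ground distance covered by one grid step of longitude at the equator.
  ST_DISTANCE(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0.00001, 0)) AS meters_at_equator,
  -- Snap an arbitrary point (assumed coordinates) to the 0.00001-degree grid.
  ST_ASTEXT(
    ST_SNAPTOGRID(ST_GEOGPOINT(-76.2858731, 36.8507694), -5)
  ) AS snapped_point;
```

Away from the equator, one grid step of longitude covers less ground, which is why the equator is the worst case.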

After you create these views, query the base_mv view using the same query semantics you would use to query the base table. You can use this technique to quickly identify a collection of shapes that need to be analyzed more deeply, and then you can perform a second deeper analysis on the base table. Test your queries to see which threshold values work best for your data.
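One way to picture the two-pass technique is the following self-contained sketch. It is an illustration under assumptions, not the documented method: the base_table and base_mv names are reproduced as inline stand-ins built from made-up circular shapes, the id key column is hypothetical, and padding the search radius by the simplification tolerance is a choice made for this sketch.

```sql
WITH base_table AS (
  -- Hypothetical stand-in for the base table: two circular shapes.
  SELECT 1 AS id, ST_BUFFER(ST_GEOGPOINT(-76.30, 36.85), 500) AS geom
  UNION ALL
  SELECT 2, ST_BUFFER(ST_GEOGPOINT(-76.10, 36.95), 500)
),
base_mv AS (
  -- Stand-in for the materialized view: same rows, simplified shapes.
  SELECT * EXCEPT (geom), ST_SIMPLIFY(geom, 1.0) AS geom
  FROM base_table
),
candidates AS (
  -- Pass 1: coarse filter on the simplified shapes. The 1000-meter search
  -- radius is padded by the 1-meter tolerance (assumed slack).
  SELECT id
  FROM base_mv
  WHERE ST_DWITHIN(geom, ST_GEOGPOINT(-76.29, 36.85), 1000 + 1.0)
)
-- Pass 2: exact check against the original shapes, for candidates only.
SELECT
  b.id,
  ST_DISTANCE(b.geom, ST_GEOGPOINT(-76.29, 36.85)) AS distance_meters
FROM base_table AS b
INNER JOIN candidates AS c USING (id)
WHERE ST_DWITHIN(b.geom, ST_GEOGPOINT(-76.29, 36.85), 1000);
```

With real data, the first pass would read from the materialized view and the second pass from the base table.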

Note: The ST_SIMPLIFY function preserves the topology of the input shape to avoid oversimplification.

For measurement use cases, determine the level of accuracy that your use case requires. When using the ST_SIMPLIFY function, set the tolerance_meters parameter (the threshold) to the required level of accuracy. For measuring distances at the scale of a city or larger, set a threshold of 10 meters. At smaller scales (for example, when measuring the distance between a building and the nearest body of water), consider using a smaller threshold of 1 meter or less. Using smaller threshold values results in removing fewer points from the given shape.
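To see how the threshold affects the number of points that remain, you can compare point counts at several tolerances. The following sketch assumes a made-up test shape: a 1-kilometer circle around an arbitrary point, generated with ST_BUFFER and a high segment count so that there are points to remove.

```sql
WITH shapes AS (
  -- Hypothetical test shape: a finely segmented circle with a 1 km radius.
  SELECT
    ST_BUFFER(
      ST_GEOGPOINT(-76.2859, 36.8508),
      1000,
      num_seg_quarter_circle => 64
    ) AS geom
)
SELECT
  ST_NUMPOINTS(geom) AS original_points,
  -- Smaller tolerance: fewer points removed, shape stays closer to the original.
  ST_NUMPOINTS(ST_SIMPLIFY(geom, 1.0)) AS points_at_1m,
  -- Larger tolerance: more points removed.
  ST_NUMPOINTS(ST_SIMPLIFY(geom, 10.0)) AS points_at_10m
FROM shapes;
```

Running the same comparison against a sample of your own shapes is one way to test which threshold works for your data.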

When serving map layers from a web service, you can precalculate materialized views for different zoom levels with the bigquery-geotools project, which is a driver for Geoserver that lets you serve spatial layers from BigQuery. This driver creates multiple materialized views with different ST_SIMPLIFY threshold parameters so that less detail is served at higher zoom levels.

Use points and rectangles

Best practice: Reduce the shape to a point or rectangle to represent its location.

You can improve query performance by reducing the shape to a single point or a rectangle. The methods in this section don't accurately represent the details and proportions of the shape, but rather optimize for representing the location of the shape.

You can use the geographic central point of a shape (its centroid) to represent the location of the whole shape. Use a rectangle containing the shape to create the shape's extent, which you can use to represent the shape's location and maintain information about its relative size.

Using points and rectangles is most useful for improving the cost and performance of a dataset when you need to measure the distance between two points, such as between two cities.

For example, consider loading a database of land parcels in the United States into a BigQuery table and then determining the nearest body of water. In this case, precomputing parcel centroids using the ST_CENTROID function in combination with the method described in the Simplify shapes section of this document can reduce the number of comparisons performed when using the ST_DISTANCE or ST_DWITHIN functions. When using the ST_CENTROID function, only the parcel centroid needs to be considered in the calculation. Precomputing the parcel centroids in this way can also reduce variability in performance, because different parcel shapes are likely to contain different numbers of points.
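The parcel example could look something like the following sketch. The parcels and water tables, their column names, and all coordinates are hypothetical inline data made up for illustration. In practice, the centroids would be precomputed once and stored rather than derived in the same query.

```sql
WITH parcels AS (
  -- Hypothetical parcel: a small rectangle.
  SELECT
    'parcel_a' AS parcel_id,
    ST_GEOGFROMTEXT(
      'POLYGON((-76.30 36.85, -76.29 36.85, -76.29 36.86, -76.30 36.86, -76.30 36.85))'
    ) AS geom
),
water AS (
  -- Hypothetical bodies of water, reduced to points for brevity.
  SELECT 'lake_1' AS water_id, ST_GEOGPOINT(-76.25, 36.90) AS geom
  UNION ALL
  SELECT 'lake_2', ST_GEOGPOINT(-76.31, 36.84)
),
parcel_centroids AS (
  -- Reduce each parcel to a single point.
  SELECT parcel_id, ST_CENTROID(geom) AS centroid
  FROM parcels
)
-- Each comparison now involves one point per parcel instead of every
-- point in the parcel boundary. The first row per parcel is the nearest.
SELECT
  p.parcel_id,
  w.water_id,
  ST_DISTANCE(p.centroid, w.geom) AS distance_meters
FROM parcel_centroids AS p
CROSS JOIN water AS w
ORDER BY p.parcel_id, distance_meters;
```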

A variant of this method is to use the ST_BOUNDINGBOX function instead of the ST_CENTROID function to compute a rectangular envelope around the input shape. While it's not quite as efficient as using a single point, it can reduce the occurrence of certain edge cases. This variant still offers good and consistent performance, since the output of the ST_BOUNDINGBOX function always contains only four points that need to be considered. The bounding box result will be of the type STRUCT, which means you'll need to calculate the distances manually or use the vector index method described later in this document.
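Because the bounding box is a STRUCT of coordinates rather than a GEOGRAPHY value, any distance calculation has to start from its xmin, ymin, xmax, and ymax fields. The following sketch shows one possible way to do that. The parcel shape and target point are made-up values, and measuring from the center of the box is a simplification chosen for this example, not a recommended method.

```sql
WITH parcels AS (
  -- Hypothetical parcel: a small rectangle.
  SELECT
    'parcel_a' AS parcel_id,
    ST_GEOGFROMTEXT(
      'POLYGON((-76.30 36.85, -76.29 36.85, -76.29 36.86, -76.30 36.86, -76.30 36.85))'
    ) AS geom
),
boxes AS (
  SELECT parcel_id, ST_BOUNDINGBOX(geom) AS bbox
  FROM parcels
)
SELECT
  parcel_id,
  bbox.xmin,
  bbox.ymin,
  bbox.xmax,
  bbox.ymax,
  -- Corner-to-corner distance as a rough indicator of relative size.
  ST_DISTANCE(
    ST_GEOGPOINT(bbox.xmin, bbox.ymin),
    ST_GEOGPOINT(bbox.xmax, bbox.ymax)
  ) AS diagonal_meters,
  -- Distance from the center of the box to an assumed target point.
  ST_DISTANCE(
    ST_GEOGPOINT((bbox.xmin + bbox.xmax) / 2, (bbox.ymin + bbox.ymax) / 2),
    ST_GEOGPOINT(-76.25, 36.90)
  ) AS center_to_target_meters
FROM boxes;
```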

Use hulls

Best practice: Use a hull to optimize for representing the location of a shape.

If you imagine shrink-wrapping a shape and computing the boundary of the shrink wrap, that boundary is called the hull. In a convex hull, all the angles of the resulting shape are convex. Like a shape's extent, a convex hull retains some information about the underlying shape's relative size and proportions. However, using a hull comes at the cost of needing to store and consider more points in subsequent analyses.

You can use the ST_CONVEXHULL function to optimize for representing the location of the shape. Using this function improves accuracy, but this comes at the cost of decreased performance. The ST_CONVEXHULL function is similar to the ST_EXTENT function, except the output shape contains more points and varies in the number of points based on the complexity of the input shape. While the performance benefit is likely negligible for small datasets of non-complex shapes, for very large datasets with large and complex shapes, the ST_CONVEXHULL function offers a good balance between cost, performance, and accuracy.
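To get a feel for what the hull keeps and what it discards, you can compare a shape with its convex hull. The following sketch uses a made-up L-shaped polygon. The hull fills in the notch, so it covers more area than the original shape.

```sql
WITH shapes AS (
  -- Hypothetical concave (L-shaped) polygon.
  SELECT
    ST_GEOGFROMTEXT(
      'POLYGON((-76.30 36.85, -76.28 36.85, -76.28 36.86, -76.29 36.86, -76.29 36.87, -76.30 36.87, -76.30 36.85))'
    ) AS geom
)
SELECT
  ST_NUMPOINTS(geom) AS original_points,
  ST_NUMPOINTS(ST_CONVEXHULL(geom)) AS hull_points,
  ST_AREA(geom) AS original_area_m2,
  -- The hull covers the concave notch, so its area is larger.
  ST_AREA(ST_CONVEXHULL(geom)) AS hull_area_m2
FROM shapes;
```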

Use grid systems

Best practice: Use geospatial grid systems to compare areas.

If your use cases involve aggregating data within localized areas and comparing statistical aggregations of those areas with each other, you can benefit from using a standardized grid system to compare different areas.

For example, a retailer might want to analyze demographic changes over time in areas where their stores are located or where they are contemplating building a new store. Or, an insurance company might want to improve their understanding of property risks by analyzing the prevailing natural hazard risks in a particular area.

Using standard grid systems such as S2 and H3 can speed up such statistical aggregations and spatial analyses. Using these grid systems can also simplify the development of analytics and improve development efficiency.

For example, comparisons using census tracts in the United States suffer from inconsistency in size, which means corrective factors need to be applied to perform like-for-like comparisons between census tracts. Additionally, census tracts and other administrative boundaries change over time and require effort to correct for these changes. Using grid systems for spatial analysis can address such challenges.
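As an illustration of grid-based aggregation, the following sketch assigns points to S2 cells with the S2_CELLIDFROMPOINT function and counts the points in each cell. The stores table and its coordinates are hypothetical inline data, and level 13 is an arbitrary choice for this example. Pick the level that matches the size of the areas you want to compare.

```sql
WITH stores AS (
  -- Hypothetical store locations.
  SELECT 'store_1' AS store_id, ST_GEOGPOINT(-76.2859, 36.8508) AS location
  UNION ALL
  SELECT 'store_2', ST_GEOGPOINT(-76.2861, 36.8510)
  UNION ALL
  SELECT 'store_3', ST_GEOGPOINT(-76.1000, 36.9500)
)
SELECT
  -- Points that fall in the same grid cell share a cell ID.
  S2_CELLIDFROMPOINT(location, level => 13) AS s2_cell_id,
  COUNT(*) AS store_count
FROM stores
GROUP BY s2_cell_id
ORDER BY store_count DESC;
```

Because the cell ID is an integer, you can join other datasets on it with an ordinary equality join instead of a spatial predicate.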

Use vector search and vector indexes

Best practice: Use vector search and vector indexes for nearest-neighbor geospatial queries.

Vector search capabilities were introduced in BigQuery to enable machine-learning use cases such as semantic search, similarity detection, and retrieval-augmented generation. The key to enabling these use cases is an indexing method called approximate nearest-neighbor search. You can use vector search to speed up and simplify nearest-neighbor geospatial queries by comparing vectors that represent points in space.

You can use vector search to search for features by radius. First, establish a radius for your search. You can discover the optimal radius in the result set of a nearest-neighbor search. After you establish the radius, use the ST_DWITHIN function to identify nearby features.
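The radius workflow can be sketched as follows. This example is illustrative only: the features table and its coordinates are made up, and a plain ST_DISTANCE ordering stands in for the nearest-neighbor search so that the query is self-contained. The radius is taken as the distance to the farthest of the nearest neighbors found.

```sql
WITH features AS (
  -- Hypothetical features, including the anchor.
  SELECT 'anchor' AS id, ST_GEOGPOINT(-76.300, 36.850) AS geom
  UNION ALL SELECT 'a', ST_GEOGPOINT(-76.301, 36.851)
  UNION ALL SELECT 'b', ST_GEOGPOINT(-76.305, 36.853)
  UNION ALL SELECT 'c', ST_GEOGPOINT(-76.320, 36.870)
  UNION ALL SELECT 'd', ST_GEOGPOINT(-76.400, 36.900)
),
anchor AS (
  SELECT geom FROM features WHERE id = 'anchor'
),
nearest AS (
  -- Stand-in for a nearest-neighbor result set: the two closest features.
  SELECT f.id, ST_DISTANCE(f.geom, a.geom) AS distance_meters
  FROM features AS f
  CROSS JOIN anchor AS a
  WHERE f.id != 'anchor'
  ORDER BY distance_meters
  LIMIT 2
),
radius AS (
  -- Use the farthest of the nearest neighbors as the search radius.
  SELECT MAX(distance_meters) AS radius_meters FROM nearest
)
-- Identify all features within the established radius.
SELECT f.id
FROM features AS f
CROSS JOIN anchor AS a
CROSS JOIN radius AS r
WHERE f.id != 'anchor'
  AND ST_DWITHIN(f.geom, a.geom, r.radius_meters);
```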

For example, consider finding the ten buildings nearest to a particular anchor building that you already have the location of. You can store the centroids of each building as a vector in a new table, index the table, and search using vector search.

For this example, you can also use Overture Maps data in BigQuery to create a separate table of building shapes corresponding to an area of interest and a vector called geom_vector. The area of interest in this example is the city of Norfolk, VA, United States, represented by FIPS code 51710, as shown in the following code sample:

CREATE TABLE vector_search.norfolk_buildings AS (
  SELECT
    *,
    [
      ST_X(ST_CENTROID(building.geometry)),
      ST_Y(ST_CENTROID(building.geometry))
    ] AS geom_vector
  FROM `bigquery-public-data.overture_maps.building` AS building
  INNER JOIN `bigquery-public-data.geo_us_boundaries.counties` AS county
    ON ST_INTERSECTS(county.county_geom, building.geometry)
  WHERE county.county_fips_code = '51710'
)

The following code sample shows how to create a vector index on the table:

CREATE VECTOR INDEX building_vector_index
ON vector_search.norfolk_buildings(geom_vector)
OPTIONS (index_type = 'IVF')

This query identifies the 10 buildings nearest to a particular anchor building:

SELECT base.*
FROM VECTOR_SEARCH(
  TABLE vector_search.norfolk_buildings,
  'geom_vector',
  (
    SELECT geom_vector
    FROM vector_search.norfolk_buildings
    WHERE id = '56873794-9873-4fe1-871a-5987bb3a0efb'
  ),
  top_k => 10,
  distance_type => 'EUCLIDEAN',
  options => '{"fraction_lists_to_search":0.1}'
)

Note: You might need to adjust some parameters, such as top_k and fraction_lists_to_search, to work with your particular data.

In the Query results pane, click the Visualization tab. The map shows a cluster of building shapes nearest to the anchor building:

Figure: Geospatial data visualized in BigQuery.

When you run this query in the Google Cloud console, click Job Information and verify that Vector Index Usage Mode is set to FULLY_USED. This indicates that the query is leveraging the building_vector_index vector index, which you created earlier.

Caution: Because Euclidean distance is used in this vector search, you might get different results than if the ST_DISTANCE function were used directly, especially if you are comparing over long distances where the curvature of the earth begins to have a larger effect.
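The following sketch illustrates the kind of difference the caution describes. It compares Euclidean distance on longitude and latitude values (the same representation as geom_vector) with ST_DISTANCE for two made-up point pairs. Both pairs are 0.01 degrees apart in coordinate space, but their geodesic distances differ, because one degree of longitude covers less ground away from the equator.

```sql
WITH pairs AS (
  -- Hypothetical point pairs, each separated by 0.01 degrees.
  SELECT
    'east-west' AS direction,
    ST_GEOGPOINT(-76.30, 36.85) AS a,
    ST_GEOGPOINT(-76.29, 36.85) AS b
  UNION ALL
  SELECT
    'north-south',
    ST_GEOGPOINT(-76.30, 36.85),
    ST_GEOGPOINT(-76.30, 36.86)
)
SELECT
  direction,
  -- Euclidean distance on raw coordinates, in degrees.
  SQRT(POW(ST_X(a) - ST_X(b), 2) + POW(ST_Y(a) - ST_Y(b), 2)) AS euclidean_degrees,
  -- Geodesic distance on the earth's surface, in meters.
  ST_DISTANCE(a, b) AS geodesic_meters
FROM pairs;
```

If exact ordering matters, one option is to use vector search to retrieve a generous set of candidates and then re-rank them with ST_DISTANCE.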

Divide large shapes

Best practice: Divide large shapes with the ST_SUBDIVIDE function.

Use the ST_SUBDIVIDE function to break large shapes or long line strings into smaller shapes.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.