feat(ibis): support listing tables from multiple datasets or projects for BigQuery#1391
base: main
Conversation
coderabbitai (bot) commented Dec 17, 2025 (edited)
Walkthrough

This PR splits BigQuery connection info into dataset/project variants, propagates the discriminated union across DTOs and routers, adds multi-project/multi-dataset metadata APIs and v3 metadata endpoints, and adapts connectors, utils, and tests to handle BigQuery-specific filtering, limits, and region/permission validation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Router as v3 Router
    participant Factory as MetadataFactory
    participant Metadata as BigQueryMetadata
    participant Util as Timeout Executor
    participant GCP as BigQuery API
    Client->>Router: POST /{ds}/metadata/tables (MetadataDTO + filterInfo, limit)
    Router->>Router: build connection_info from headers
    Router->>Factory: MetadataFactory.get_metadata(connection_info)
    Factory->>Metadata: instantiate BigQueryMetadata(connection_info)
    Router->>Router: compute filter_info via get_filter_info(dto, data_source)
    Router->>Util: execute_get_table_list_with_timeout(metadata, filter_info, limit)
    Util->>Util: detect BigQueryMetadata type
    Util->>Metadata: call get_table_list(filter_info, limit)
    Metadata->>GCP: issue per-project/dataset metadata queries (dynamic SQL)
    GCP-->>Metadata: return table/column metadata
    Metadata->>Metadata: transform rows -> Table/Column models
    Metadata-->>Util: return list[Table]
    Util-->>Router: return results (with timeout handling)
    Router-->>Client: respond 200 list[Table]
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ibis-server/app/routers/v2/connector.py (1)
321-330: Missing guard for `BigQueryProjectConnectionInfo`. The `get_db_version` endpoint accepts `V2MetadataDTO` but lacks the guard that `get_table_list` and `get_constraints` have. If `BigQueryProjectConnectionInfo` is not supported by v2, this endpoint should also reject it for consistency.

```diff
 async def get_db_version(
     data_source: DataSource,
     dto: V2MetadataDTO,
     headers: Annotated[Headers, Depends(get_wren_headers)] = None,
 ) -> str:
     connection_info = data_source.get_connection_info(
         dto.connection_info, dict(headers)
     )
+    if isinstance(connection_info, BigQueryProjectConnectionInfo):
+        raise WrenError(
+            ErrorCode.INVALID_CONNECTION_INFO,
+            "BigQuery project-level connection info is only supported by v3 API for metadata version retrieval.",
+        )
     metadata = MetadataFactory.get_metadata(data_source, connection_info)
     return await execute_get_version_with_timeout(metadata)
```
🧹 Nitpick comments (4)
ibis-server/app/model/metadata/dto.py (1)
33-36: Consider adding validation for empty or malformed filter info. The function directly passes `info` to `BigQueryFilterInfo(**info)` without validation. If `info` is `None` or missing required fields, this will raise a Pydantic validation error, which may be acceptable, but consider whether you want to handle edge cases more gracefully.
ibis-server/tests/routers/v3/connector/bigquery/test_query.py (1)
2-4: Minor import cleanup. The `split` function from `re` is used only at lines 757 and 777. Consider using `re.split` directly inline instead of importing just `split`.
ibis-server/app/model/__init__.py (1)
113-117: Consider using the abstract base class pattern. The base class method raises an exception instead of being abstract. While functional, using `ABC` with `@abstractmethod` would provide clearer intent and IDE support.
ibis-server/app/model/metadata/bigquery.py (1)
259-270: Potential performance concern with public datasets. Listing all datasets from `bigquery-public-data` via `list_datasets` could be slow, as noted in the PR objectives. Consider adding a warning log or documentation about this behavior.
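The filter-info nitpick earlier in this list (`BigQueryFilterInfo(**info)` with no guard) could be handled defensively along these lines. This is a hedged sketch: the model names mirror the PR's DTOs but are re-declared here as simplified stand-ins, not the project's actual code.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ProjectDatasets(BaseModel):
    # Field aliases assumed from the PR's filterInfo JSON (projectId/datasetIds).
    project_id: str = Field(alias="projectId")
    dataset_ids: Optional[list[str]] = Field(alias="datasetIds", default=None)


class BigQueryFilterInfo(BaseModel):
    projects: list[ProjectDatasets]


def parse_filter_info(info: Optional[dict]) -> Optional[BigQueryFilterInfo]:
    """Validate a raw filterInfo payload, tolerating a missing filter and
    converting Pydantic's deep validation error into a concise message."""
    if not info:
        return None  # no filter supplied -> fall back to unfiltered listing
    try:
        return BigQueryFilterInfo(**info)
    except ValidationError as e:
        raise ValueError(
            f"malformed filterInfo: {len(e.errors())} invalid field(s)"
        ) from e
```

With this shape, `parse_filter_info(None)` returns `None` instead of raising, while a payload missing `projectId` surfaces a single readable `ValueError` rather than a bare Pydantic stack trace.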
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- ibis-server/app/model/__init__.py (4 hunks)
- ibis-server/app/model/connector.py (1 hunks)
- ibis-server/app/model/data_source.py (3 hunks)
- ibis-server/app/model/metadata/bigquery.py (2 hunks)
- ibis-server/app/model/metadata/dto.py (2 hunks)
- ibis-server/app/model/metadata/metadata.py (1 hunks)
- ibis-server/app/routers/v2/connector.py (6 hunks)
- ibis-server/app/routers/v3/connector.py (3 hunks)
- ibis-server/app/util.py (2 hunks)
- ibis-server/tests/routers/v2/connector/test_bigquery.py (1 hunks)
- ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1 hunks)
- ibis-server/tests/routers/v3/connector/bigquery/test_query.py (3 hunks)
- ibis-server/tools/query_local_run.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-02-06T07:58:58.830Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1055
File: ibis-server/app/model/connector.py:152-157
Timestamp: 2025-02-06T07:58:58.830Z
Learning: In BigQueryConnector, credentials with additional scopes (drive and cloud-platform) should only be created in the retry path when handling empty results with special types, not during initialization, to maintain lazy initialization.
Applied to files:
ibis-server/app/model/connector.py
📚 Learning: 2025-02-04T12:08:44.141Z
Learnt from: goldmedal
Repo: Canner/wren-engine PR: 1053
File: ibis-server/app/model/__init__.py:187-196
Timestamp: 2025-02-04T12:08:44.141Z
Learning: In GcsFileConnectionInfo, both secret-based (key_id/secret_key) and credentials-based (service account) authentication methods are required fields.
Applied to files:
ibis-server/app/model/__init__.py
🧬 Code graph analysis (9)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1)
ibis-server/tests/conftest.py (1)
`client` (18-23)
ibis-server/app/util.py (3)
- ibis-server/app/model/metadata/bigquery.py (3): `BigQueryMetadata` (46-522), `get_table_list` (51-187), `get_schema_list` (199-271)
- ibis-server/app/model/metadata/dto.py (1): `FilterInfo` (14-15)
- ibis-server/app/model/metadata/metadata.py (3): `get_table_list` (10-13), `Metadata` (6-24), `get_schema_list` (21-24)
ibis-server/app/model/data_source.py (1)
ibis-server/app/model/__init__.py (2)
`BigQueryDatasetConnectionInfo` (120-131), `BigQueryProjectConnectionInfo` (134-156)
ibis-server/app/routers/v3/connector.py (4)
ibis-server/app/model/metadata/dto.py (3)
`Catalog` (114-116), `MetadataDTO` (18-21), `Table` (106-111)
ibis-server/app/model/metadata/factory.py (2)
`MetadataFactory` (43-62), `get_metadata` (45-62)
ibis-server/app/util.py (2)
`execute_get_schema_list_with_timeout` (377-398), `execute_get_table_list_with_timeout` (353-374)
ibis-server/app/model/metadata/metadata.py (2)
`get_table_list` (10-13), `get_schema_list` (21-24)
ibis-server/app/model/connector.py (2)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
`connection_info` (25-30)
ibis-server/app/model/__init__.py (3)
`get_billing_project_id` (113-117), `get_billing_project_id` (127-128), `get_billing_project_id` (144-145)
ibis-server/tools/query_local_run.py (2)
ibis-server/app/model/__init__.py (1)
`BigQueryDatasetConnectionInfo` (120-131)
ibis-server/app/model/connector.py (1)
`BigQueryConnector` (506-549)
ibis-server/app/model/__init__.py (1)
ibis-server/app/model/error.py (2)
`ErrorCode` (19-38), `WrenError` (65-113)
ibis-server/app/model/metadata/dto.py (1)
ibis-server/app/model/data_source.py (1)
`DataSource` (63-224)
ibis-server/app/model/metadata/metadata.py (2)
ibis-server/app/model/error.py (2)
`ErrorCode` (19-38), `WrenError` (65-113)
ibis-server/app/model/metadata/dto.py (4)
`Catalog` (114-116), `Constraint` (125-131), `FilterInfo` (14-15), `Table` (106-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: ci
- GitHub Check: ci
🔇 Additional comments (30)
ibis-server/tests/routers/v2/connector/test_bigquery.py (1)
395-411: LGTM! Well-structured test for the v2 project connection guard. The test correctly validates that the v2 API rejects BigQuery project-level connection info and returns the expected error message. The connection info structure properly matches `BigQueryProjectConnectionInfo` with all required fields.
ibis-server/app/model/connector.py (1)
524-530: LGTM! Project-scoped BigQuery client initialization. The explicit project parameter from `connection_info.get_billing_project_id()` and the default `QueryJobConfig` with timeout are correctly configured. This aligns with the new BigQuery connection info variants that distinguish dataset vs. project contexts.
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
33-40: LGTM! Project-level connection fixture properly configured. The new fixture provides project-level BigQuery connection info with appropriate defaults and environment variable sourcing, complementing the existing dataset-level fixture.
ibis-server/tools/query_local_run.py (2)
18-19: Updated imports for BigQuery dataset connection handling. The imports correctly reflect the new BigQuery connection type system with `BigQueryDatasetConnectionInfo` and `BigQueryConnector`.
94-95: LGTM! BigQuery connection creation updated to use the new types. The code correctly uses `BigQueryDatasetConnectionInfo.model_validate_json` and `BigQueryConnector` instead of the previous generic types, aligning with the new dataset/project connection distinction.
ibis-server/app/model/data_source.py (3)
19-20: Updated imports for BigQuery connection variants. The imports correctly bring in the new `BigQueryDatasetConnectionInfo` and `BigQueryProjectConnectionInfo` types that enable dataset- vs. project-level connection handling.
156-158: LGTM! Conditional validation for BigQuery connection types. The logic correctly distinguishes between project-level and dataset-level connections based on the `bigquery_type` field, routing to the appropriate connection info class.
318-318: Function signature updated for dataset-specific connections. The `get_bigquery_connection` signature correctly specifies `BigQueryDatasetConnectionInfo`, as this function is specifically for dataset-level Ibis backend connections (project-level connections are handled separately via `BigQueryConnector`).
ibis-server/app/routers/v3/connector.py (3)
35-36: New imports for metadata operations. The imports correctly bring in the necessary DTOs (`Catalog`, `MetadataDTO`, `Table`), helper functions (`get_filter_info`, `MetadataFactory`), and timeout executors for the new metadata endpoints.
Also applies to: 45-46
556-579: LGTM! Well-structured table metadata endpoint. The endpoint follows the established pattern:
- Proper tracing with span
- Builds connection info with headers
- Uses MetadataFactory for metadata retrieval
- Parses filter info from DTO
- Executes with timeout control
The implementation is clean and consistent with other v3 endpoints.
582-605: LGTM! Schema metadata endpoint mirrors the table endpoint pattern. The schema list endpoint follows the same clean pattern as the table endpoint, ensuring consistency in the v3 metadata API design.
ibis-server/app/util.py (3)
45-46: New imports for BigQuery-aware metadata handling. The imports bring in `BigQueryMetadata` and `FilterInfo` to enable conditional execution paths for BigQuery-specific metadata operations.
353-374: LGTM! BigQuery-aware table list execution with backward compatibility. The function correctly:
- Accepts optional filter_info and limit parameters
- Conditionally invokes BigQuery-specific get_table_list when metadata is BigQueryMetadata
- Falls back to generic path for other metadata types
- Maintains backward compatibility for non-BigQuery data sources
377-398: LGTM! Schema list execution mirrors the table list pattern. The new function follows the same clean pattern as `execute_get_table_list_with_timeout`, ensuring consistency in BigQuery-aware metadata retrieval.
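The timeout-executor pattern these comments describe can be sketched roughly as below. This is a minimal, hypothetical stand-in: the timeout constant, the duck-typed dispatch (instead of the PR's `isinstance(metadata, BigQueryMetadata)` check), and `FakeMetadata` are all assumptions, not the project's code.

```python
import asyncio
from typing import Any, Optional

METADATA_TIMEOUT_SECONDS = 30  # assumed default, not the project's actual value


async def execute_with_timeout(metadata: Any,
                               filter_info: Optional[dict] = None,
                               limit: Optional[int] = None) -> list:
    """Run metadata.get_table_list under a timeout, forwarding the
    BigQuery-only arguments only when a filter or limit was supplied."""
    if filter_info is not None or limit is not None:
        coro = metadata.get_table_list(filter_info=filter_info, limit=limit)
    else:
        coro = metadata.get_table_list()
    return await asyncio.wait_for(coro, timeout=METADATA_TIMEOUT_SECONDS)


class FakeMetadata:
    """Stand-in metadata source used only to exercise the wrapper."""

    async def get_table_list(self, filter_info=None, limit=None):
        tables = ["a.b.t1", "a.b.t2", "a.c.t3"]
        return tables[:limit] if limit else tables
```

For example, `asyncio.run(execute_with_timeout(FakeMetadata(), limit=2))` returns the first two tables, while a slow backend would raise `asyncio.TimeoutError` after 30 seconds.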
ibis-server/app/model/metadata/metadata.py (4)
2-3: Updated imports for error handling and new DTOs. The imports correctly bring in `ErrorCode` and `WrenError` for proper error raising, and the new DTOs (`Catalog`, `FilterInfo`) for the expanded metadata interface.
6-6: Class changed from ABC to concrete with not-implemented methods. The removal of `ABC` inheritance is appropriate since all methods now have concrete implementations that raise `WrenError(ErrorCode.NOT_IMPLEMENTED)`. This provides better error messages than abstract method errors.
10-13: New `get_table_list` method with filter and limit support. The method signature correctly accepts optional `filter_info` and `limit` parameters, enabling BigQuery's multi-project/dataset filtering while maintaining a clean base interface for subclasses to override.
21-24: New `get_schema_list` method completes the metadata interface. The schema list method follows the same pattern as `get_table_list`, providing consistent filtering capabilities across metadata operations.
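The concrete-base-class approach this review describes can be sketched as follows, with `WrenError` and `ErrorCode` re-declared as simplified stand-ins for the project's own error classes:

```python
from enum import Enum


class ErrorCode(Enum):
    NOT_IMPLEMENTED = "NOT_IMPLEMENTED"


class WrenError(Exception):
    def __init__(self, code: "ErrorCode", message: str):
        super().__init__(f"{code.value}: {message}")
        self.code = code


class Metadata:
    """Concrete base class: each method raises a descriptive WrenError rather
    than being @abstractmethod, so an unsupported data source yields a clear
    error code at call time instead of a TypeError at instantiation time."""

    def __init__(self, connection_info):
        self.connection_info = connection_info

    def get_table_list(self, filter_info=None, limit=None):
        raise WrenError(ErrorCode.NOT_IMPLEMENTED,
                        "get_table_list is not implemented for this data source")

    def get_schema_list(self, filter_info=None, limit=None):
        raise WrenError(ErrorCode.NOT_IMPLEMENTED,
                        "get_schema_list is not implemented for this data source")


class DummyMetadata(Metadata):
    """A subclass only needs to override what it actually supports."""

    def get_table_list(self, filter_info=None, limit=None):
        return []
```

The trade-off against `ABC`/`@abstractmethod` (as the nitpick suggests) is that forgetting an override surfaces lazily at call time, but with a domain-specific error code the caller can act on.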
ibis-server/app/routers/v2/connector.py (1)
20-34: LGTM! The imports are well-organized and properly structured for the new BigQuery connection type handling and error management.
ibis-server/app/model/metadata/dto.py (3)
10-11: LGTM! Clean DTO definition for v2 metadata endpoints that correctly uses the field alias.
24-30: LGTM! The `BigQueryFilterInfo` and `ProjectDatasets` models are well-structured for the multi-project/dataset filtering use case.
114-116: LGTM! Simple and clean model for representing catalog/schema metadata.
ibis-server/tests/routers/v3/connector/bigquery/test_query.py (4)
168-186: LGTM! Good addition to verify multi-dataset connection info works with the existing manifest.
189-254: LGTM! Excellent test for cross-project dataset queries, including a reference to a public dataset. The test validates both the response structure and data integrity.
694-735: LGTM! Good coverage for schema listing, with assertions for expected schemas and handling of public datasets.
862-877: Duplicate test case. This block (lines 862-877) is identical to lines 845-860, testing the same "empty projectId" scenario twice. Remove the duplicate.

```diff
     assert response.status_code == 422
     assert "project id should not be empty" in response.text

-    response = await client.post(
-        url=f"{base_url}/metadata/tables",
-        json={
-            "connectionInfo": project_connection_info,
-            "filterInfo": {
-                "projects": [
-                    {
-                        "projectId": "",
-                    }
-                ],
-            },
-        },
-    )
-
-    assert response.status_code == 422
-    assert "project id should not be empty" in response.text
     response = await client.post(
```

Likely an incorrect or invalid review comment.
ibis-server/app/model/__init__.py (2)
120-131: LGTM! Well-structured dataset connection info with proper field definitions and hash implementation.
159-162: LGTM! Good use of Pydantic's discriminated union pattern for type-safe connection info handling.
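The discriminated-union pattern noted above can be sketched as follows — simplified stand-in models keyed on the `bigquery_type` field from the PR's JSON, not the project's actual class definitions:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class BigQueryDatasetConnectionInfo(BaseModel):
    bigquery_type: Literal["dataset"] = "dataset"
    project_id: str
    dataset_id: str
    credentials: str


class BigQueryProjectConnectionInfo(BaseModel):
    bigquery_type: Literal["project"] = "project"
    billing_project_id: str
    region: str
    credentials: str


# Pydantic picks the right class by inspecting the discriminator field,
# so validation errors mention only the selected variant's fields.
BigQueryConnectionInfo = Annotated[
    Union[BigQueryDatasetConnectionInfo, BigQueryProjectConnectionInfo],
    Field(discriminator="bigquery_type"),
]

adapter = TypeAdapter(BigQueryConnectionInfo)
info = adapter.validate_python({
    "bigquery_type": "project",
    "billing_project_id": "wrenai",
    "region": "asia-east1",
    "credentials": "...",
})
```

Here `info` is a `BigQueryProjectConnectionInfo` instance; a payload with `"bigquery_type": "dataset"` would instead produce the dataset variant, without any manual `if`/`else` dispatch.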
ibis-server/app/model/metadata/bigquery.py (2)
189-197: LGTM! Good validation to ensure the dataset region matches the connection region for cross-project queries.
468-493: LGTM! Comprehensive validation for project and dataset IDs, with clear error messages.
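Validation like the comment above describes might look roughly like this. The regexes follow GCP's published naming rules (project IDs: 6-30 lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen; dataset IDs: word characters) and are an assumption about, not a copy of, the PR's checks:

```python
import re

# GCP project-ID rule: 6-30 chars, lowercase letters/digits/hyphens,
# must start with a letter and must not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
# BigQuery dataset-ID rule: letters, digits, underscores, up to 1024 chars.
_DATASET_ID_RE = re.compile(r"^\w{1,1024}$")


def validate_project_id(project_id: str) -> None:
    if not project_id:
        raise ValueError("project id should not be empty")
    if not _PROJECT_ID_RE.match(project_id):
        raise ValueError(f"invalid project id: {project_id!r}")


def validate_dataset_id(dataset_id: str) -> None:
    if not dataset_id:
        raise ValueError("dataset id should not be empty")
    if not _DATASET_ID_RE.match(dataset_id):
        raise ValueError(f"invalid dataset id: {dataset_id!r}")
```

The empty-string branch mirrors the "project id should not be empty" message asserted in the PR's 422 tests.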
goldmedal commented Dec 17, 2025

BigQuery has been tested locally:
Metadata v3 API
Listing Tables

POST /v3/connector/bigquery/metadata/tables

```json
{
  "connectionInfo": {
    "bigquery_type": "project",
    "billing_project_id": "wrenai",
    "region": "asia-east1",
    "credentials": "..."
  },
  "filterInfo": {
    "projects": [
      {
        "projectId": "wrenai",
        "datasetIds": ["tpch_tiny", "tpch_sf1"]
      },
      {
        "projectId": "bigquery-public-data",
        "datasetIds": ["austin_311"]
      }
    ]
  }
}
```

Response:

```json
[
  {
    "name": "bigquery-public-data.austin_311.311_service_requests",
    "columns": [
      {
        "name": "city",
        "type": "STRING",
        "notNull": false,
        "description": null,
        "properties": { "column_order": 12 },
        "nestedColumns": null
      },
      ...
    ],
    "description": null,
    "properties": {},
    "primaryKey": ""
  },
  {
    "name": "wrenai.tpch_tiny_us.orders",
    "columns": [ ... ],
    "description": null,
    "properties": {
      "schema": "tpch_tiny_us",
      "catalog": "wrenai",
      "table": "orders",
      "path": null
    },
    "primaryKey": ""
  }
]
```

Listing Schemas
POST /v3/connector/bigquery/metadata/schemas

```json
{
  "connectionInfo": {
    "bigquery_type": "project",
    "billing_project_id": "wrenai",
    "region": "asia-east1",
    "credentials": "..."
  },
  "filterInfo": {
    "projects": [
      { "projectId": "wrenai" },
      { "projectId": "bigquery-public-data" }
    ]
  }
}
```

Response:

```json
[
  {
    "name": "wrenai",
    "schemas": ["rudderstack_setup_test", "alex_tide_issue"]
  },
  {
    "name": "bigquery-public-data",
    "schemas": ["america_health_rankings", "austin_311", "austin_bikeshare", "austin_crime", ...]
  }
]
```

BigQuery Filter Info for listing
```json
{
  "projects": [
    { "projectId": "wrenai", "datasetIds": ["tpch_tiny"] },
    { "projectId": "bigquery-public-data", "datasetIds": ["..."] }
  ]
}
```

It can include multiple projects. Each object has `projectId` and `datasetIds` fields. `projectId` is required for every API; `datasetIds` is required only for listing tables.

New Connection Info for BigQuery
Introduce `BigQueryProjectConnectionInfo` for querying multiple datasets:

```json
{
  "bigquery_type": "project",
  "billing_project_id": "wrenai",
  "region": "asia-east1",
  "credentials": "..."
}
```

For BigQuery, the most important setting is the region: BigQuery only allows joining tables located in the same region, even if they belong to different projects (and the service account must have the corresponding permissions).
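The `credentials` value in the JSON above is the Base64-encoded content of a service-account `credentials.json`. Producing and reading it can be sketched as follows (a minimal round-trip helper, not part of the PR):

```python
import base64
import json


def encode_credentials(service_account_info: dict) -> str:
    """Base64-encode a service-account JSON payload for the credentials field."""
    raw = json.dumps(service_account_info).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_credentials(encoded: str) -> dict:
    """Reverse of encode_credentials: decode and parse back into a dict."""
    return json.loads(base64.b64decode(encoded))
```

From a shell, the equivalent of `encode_credentials` is simply `base64` applied to the key file's contents.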
`BigQueryProjectConnectionInfo` fields:
- `billing_project_id`: the billing project ID of your BigQuery connection
- `region`: the region of your BigQuery connection
- `credentials`: the Base64-encoded content of `credentials.json`

For listing public datasets
We use the BigQuery Python SDK to list the datasets of the `bigquery-public-data` project. However, the SDK's listing API can't apply a region filter, and its results don't include region info, so we can't verify the region of public datasets when listing them.

Summary by CodeRabbit
New Features
Bug Fixes
Tests