Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat(knowledge): introduce the generic SQL knowledge#1389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
goldmedal wants to merge23 commits intoCanner:main
base:main
Choose a base branch
Loading
fromgoldmedal:feat/generic-knowledge

Conversation

@goldmedal
Copy link
Contributor

@goldmedalgoldmedal commentedDec 16, 2025
edited by coderabbitaibot
Loading

  • Define the generic knowledge of Text to SQL
  • Define the semi-structured type operation (follow Snowflake style)

Summary by CodeRabbit

  • New Features

    • Enhanced JSON handling (path extraction, array/object casts, rich type checks) and expanded BigQuery dialect support.
    • Dynamic, filesystem-backed knowledge loading with dialect-aware instruction selection and a new SQL correction rule.
  • Documentation

    • Comprehensive SQL generation guidelines and correction procedures.
    • New guides: arrays/unnesting, calculated fields, date/time functions, semi-structured and structured types.
  • Refactor

    • Internal module reorganization for cleaner function grouping.
  • Tests

    • Added BigQuery JSON function validation tests.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actionsgithub-actionsbot added core bigquery ibis rustPull requests that update Rust code pythonPull requests that update Python code labelsDec 16, 2025
@coderabbitai
Copy link
Contributor

coderabbitaibot commentedDec 16, 2025
edited
Loading

Warning

Rate limit exceeded

@goldmedal has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait9 minutes and 19 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the@coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see ourFAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between06d3768 and8e07b2a.

📒 Files selected for processing (2)
  • ibis-server/resources/knowledge/sql_correction_rule.txt (1 hunks)
  • ibis-server/resources/knowledge/text_to_sql_rule.txt (1 hunks)

Walkthrough

Refactored Ibis Server to load SQL rules and instructions from filesystem resources, added SQL correction rule and dialect-specific instruction files, expanded Wren Core with a JSON scalar UDF module and BigQuery rewrites, reorganized function utils imports, and added tests for BigQuery JSON transformations.

Changes

Cohort / File(s)Change Summary
Ibis Server Knowledge API
ibis-server/app/mdl/knowledge.py
AddedKNOWLEDGE_RESOURCE_PATH; changedget_text_to_sql_rule() andget_sql_instructions() to load fromresources/knowledge; addedget_sql_correction_rule(); removed hard-coded instruction blocks.
Ibis Server Knowledge Resources
ibis-server/resources/knowledge/text_to_sql_rule.txt,ibis-server/resources/knowledge/sql_correction_rule.txt
Addedtext_to_sql_rule.txt (SQL generation constraints) andsql_correction_rule.txt (SQL planning & correction guidance).
Ibis Server Instruction Files
ibis-server/resources/knowledge/instructions/*.txt,.../dialects/bigquery.txt
Added instruction files:array_usage.txt,calculated_field.txt,date_and_time_functionality.txt,semi_structured_type.txt,structured_type.txt, anddialects/bigquery.txt.
Ibis Server Router
ibis-server/app/routers/v3/connector.py
get_sql_knowledge() now includessql_correction_rule from the knowledge manager in its response payload.
Wren Core: JSON scalar module
wren-core/core/src/mdl/function/scalar/json.rs,wren-core/core/src/mdl/function/scalar/mod.rs
Newjson scalar module with many JSON UDFs (get_path,as_*,is_*); integrated viapub(crate) mod json; andjson_functions() returning the UDF vector.
Wren Core: BigQuery dialect & rewrites
wren-core/core/src/mdl/dialect/inner_dialect.rs,wren-core/core/src/mdl/function/dialect/bigquery/mod.rs,.../bigquery/{aggregate,scalar,window}.rs
Added BigQuery rewrites forget_pathJSON_EXTRACT,as_arrayJSON_EXTRACT_ARRAY (with validation/error path), and mappingas_varchar/as_integer/as_double/as_boolean to lax cast helpers; updated imports to new utils location.
Wren Core: Module reorg
wren-core/core/src/mdl/function/mod.rs,wren-core/core/src/mdl/function/dialect/mod.rs
Movedutils module scope: removedmod utils; underfunction/dialect and addedmod utils; underfunction; adjusted imports accordingly.
Wren Core: Tests & minor refactor
wren-core/core/src/mdl/mod.rs,wren-core/core/src/mdl/lineage.rs
Added async testtest_bigquery_json for JSON transformations; minor ownership change inlineage.rs (removed.cloned() and adjusted argument passing).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to: JSON UDF return types and build_document metadata inwren-core/.../scalar/json.rs; BigQueryas_array validation and error messages ininner_dialect.rs; filesystem path resolution and dialect-file selection inibis-server/app/mdl/knowledge.py; import path updates after movingutils module.

Possibly related PRs

Suggested labels

documentation

Suggested reviewers

  • douenergy
  • onlyjackfrost

Poem

🐰 I hopped through files both small and grand,

From hard-coded lore to resources on hand.
JSON whispers now sing in BigQuery's light,
Knowledge files scattered, tidy and bright.
— A rabbit's nibble on code, cheerful and light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check nameStatusExplanationResolution
Docstring Coverage⚠️ WarningDocstring coverage is 41.67% which is insufficient. The required threshold is 80.00%.You can run@coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check nameStatusExplanation
Description Check✅ PassedCheck skipped - CodeRabbit’s high-level summary is enabled.
Title check✅ PassedThe title accurately describes the main change: introducing a comprehensive generic SQL knowledge system including knowledge rules, instructions, dialect-specific guidelines, and JSON scalar UDFs for text-to-SQL generation.

Thanks for usingCodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment@coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (4)
ibis-server/resources/knowledge/dialects/bigquery.txt (1)

5-7:Inconsistent INTERVAL part naming.

Line 5 uses'7' days (plural) while lines 6-7 use'1' month and'2' year (singular). Standard SQL INTERVAL syntax typically uses singular forms:INTERVAL '7' day,INTERVAL '1' month, etc. Consider standardizing to singular forms throughout for consistency with SQL standards.

-    - Example: `timestamp_with_time_zone_column + INTERVAL '7' days` is valid.+    - Example: `timestamp_with_time_zone_column + INTERVAL '7' day` is valid.
ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1)

1-1:Minor typo: extra space in header.

There's a double space between "for" and "DATE" in the header.

-### Instructions for  DATE AND TIME FUNCTIONALITY ###+### Instructions for DATE AND TIME FUNCTIONALITY ###
ibis-server/app/mdl/knowledge.py (1)

16-26:Consider sorting files for deterministic output.

instructions_path.iterdir() does not guarantee a consistent ordering across different filesystems or Python versions. If the order of instructions matters for downstream consumers, consider sorting the files.

-        files = [f for f in instructions_path.iterdir() if f.is_file()]+        files = sorted([f for f in instructions_path.iterdir() if f.is_file()])
wren-core/core/src/mdl/function/scalar/json.rs (1)

250-268:Consider Time64 for microsecond precision.

as_time usesTime32(Millisecond) which limits precision to milliseconds. If the source data contains microsecond precision (common in JSON timestamps), this could silently truncate data. Consider usingTime64(Microsecond) for consistency withas_timestamp which usesMicrosecond precision.

 make_udf_function!(     ByPassScalarUDF::new(         "as_time",-        ReturnType::Specific(DataType::Time32(-            datafusion::arrow::datatypes::TimeUnit::Millisecond+        ReturnType::Specific(DataType::Time64(+            datafusion::arrow::datatypes::TimeUnit::Microsecond         )),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between5949b46 and1924edb.

📒 Files selected for processing (18)
  • ibis-server/app/mdl/knowledge.py (1 hunks)
  • ibis-server/resources/knowledge/dialects/bigquery.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/array_usage.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/calculated_field.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/semi_structured_type.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/structured_type.txt (1 hunks)
  • ibis-server/resources/knowledge/text_to_sql_rules.txt (1 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (1 hunks)
  • wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (1 hunks)
  • wren-core/core/src/mdl/function/dialect/bigquery/mod.rs (2 hunks)
  • wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs (1 hunks)
  • wren-core/core/src/mdl/function/dialect/bigquery/window.rs (1 hunks)
  • wren-core/core/src/mdl/function/dialect/mod.rs (0 hunks)
  • wren-core/core/src/mdl/function/mod.rs (1 hunks)
  • wren-core/core/src/mdl/function/scalar/json.rs (1 hunks)
  • wren-core/core/src/mdl/function/scalar/mod.rs (2 hunks)
  • wren-core/core/src/mdl/mod.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • wren-core/core/src/mdl/function/dialect/mod.rs
🧰 Additional context used
🧬 Code graph analysis (6)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
wren-core/core/src/mdl/dialect/utils.rs (2)
  • scalar_function_to_sql_internal (51-76)
  • args (30-48)
wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (2)
wren-core/core/src/mdl/function/utils.rs (1)
  • build_document (3-5)
wren-core/core/src/mdl/function/remote_function.rs (1)
  • build_document (228-246)
wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs (2)
wren-core/core/src/mdl/function/utils.rs (1)
  • build_document (3-5)
wren-core/core/src/mdl/function/remote_function.rs (1)
  • build_document (228-246)
wren-core/core/src/mdl/function/dialect/bigquery/window.rs (1)
wren-core/core/src/mdl/function/utils.rs (1)
  • build_document (3-5)
ibis-server/app/mdl/knowledge.py (1)
ibis-server/app/model/data_source.py (1)
  • DataSource (62-221)
wren-core/core/src/mdl/function/scalar/json.rs (2)
wren-core/core/src/mdl/function/utils.rs (1)
  • build_document (3-5)
wren-core/core/src/logical_plan/optimize/simplify_timestamp.rs (1)
  • is_timestamp (160-162)
🪛 LanguageTool
ibis-server/resources/knowledge/text_to_sql_rules.txt

[grammar] ~2-~2: Ensure spelling is correct
Context: ... CONTEXT ### Now, you aren't querying a tranditional database, but rather a semantic layer, ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~4-~4: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... with some differences and limitations. Wren SQL doesn't equal to your database SQL....

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~5-~5: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...SQL doesn't equal to your database SQL. Wren Engine translates your Wren SQL to your...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~6-~6: Use a hyphen to join words.
Context: ...se SQL internally. Avoid to use database specific SQL syntax and functions in you...

(QB_NEW_EN_HYPHEN)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...h database is connected to the backend. You can find it in thedataSource field i...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~15-~15: Ensure spelling is correct
Context: ...REFER USING CTEs over subqueries. - The Identifer case sensitivity is case-sensitivity. Y...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~40-~40: Ensure spelling is correct
Context: ...status IS NOT NULL`. - If the column is indcated as the primary key or indexed column, p...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

ibis-server/resources/knowledge/instructions/calculated_field.txt

[grammar] ~41-~41: Ensure spelling is correct
Context: ...rs?" For the first query: First try to intepret the user query, the user wants to know ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~42-~42: Ensure spelling is correct
Context: ...0 reviews. Then, according to the above intepretation about the given schema, the term 'Ratin...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

ibis-server/resources/knowledge/dialects/bigquery.txt

[style] ~4-~4: To form a complete sentence, be sure to include a subject.
Context: ...database. -TIMESTAMP WITH TIME ZONE can't be+/- with INTERVAL with non-zero...

(MISSING_IT_THERE)

ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt

[style] ~13-~13: To form a complete sentence, be sure to include a subject.
Context: ...en timezone is important. -TIMESTAMP can't be directly compared with `TIMESTAMP...

(MISSING_IT_THERE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: ci
  • GitHub Check: test
  • GitHub Check: ci
  • GitHub Check: cargo check
🔇 Additional comments (15)
wren-core/core/src/mdl/function/mod.rs (1)

7-7:LGTM!

Centralizing theutils module at the function level is a sensible refactor that allows sharingbuild_document across both dialect and scalar submodules.

wren-core/core/src/mdl/function/dialect/bigquery/aggregate.rs (1)

9-15:LGTM!

Import paths correctly updated to reference the centralizedbuild_document fromcrate::mdl::function::utils. The reorganization is clean and consistent with the module restructuring inmod.rs.

wren-core/core/src/mdl/function/dialect/bigquery/window.rs (1)

5-8:LGTM! Import path consolidation.

The import path adjustment forbuild_document is a clean refactoring with no functional changes to the window function definitions.

ibis-server/resources/knowledge/instructions/structured_type.txt (1)

1-5:LGTM! Clear STRUCT type documentation.

The syntax and examples for STRUCTURED data types are accurate and well-documented.

wren-core/core/src/mdl/function/dialect/bigquery/scalar.rs (1)

11-14:LGTM! Import path consolidation.

The import reorganization forbuild_document maintains existing functionality while improving code organization.

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

225-227:LGTM! Straightforward JSON_EXTRACT mapping.

Theget_path toJSON_EXTRACT rewrite is clean and correct.


250-253:LGTM! Type casting helper mappings.

The mappings tolax_* variants for type casting are appropriate for BigQuery's JSON handling.


228-249:This is a deliberate optimization, not an arbitrary constraint.

The strict requirement forAS_ARRAY to wrapGET_PATH is not a limitation but an intentional design that enables the optimizer to fuseAS_ARRAY(GET_PATH(col, path)) into BigQuery's nativeJSON_EXTRACT_ARRAY(col, path). The single test case at line 3881 ofwren-core/core/src/mdl/mod.rs confirms this is the intended and standard usage pattern. The function definition documentsas_array as converting "a json, variant or object to array," with the example showing exactly this GET_PATH pattern. This targeted transformation is appropriate for the implementation's purpose.

Likely an incorrect or invalid review comment.

wren-core/core/src/mdl/function/dialect/bigquery/mod.rs (1)

66-66:LGTM! JSON UDF integration.

The addition of JSON scalar UDFs to the BigQuery function set is well-structured and extends the dialect's capabilities appropriately.

Also applies to: 297-302

wren-core/core/src/mdl/function/scalar/mod.rs (2)

14-14:LGTM! JSON module addition.

The new JSON module provides structured organization for JSON-related scalar UDFs.


195-229:LGTM! JSON functions collection with future use noted.

Thejson_functions() provides a comprehensive set of JSON UDFs. The#[allow(dead_code)] annotation indicates this function isn't currently called but is likely reserved for future dialect integrations or bulk registration scenarios. Currently, BigQuery dialect imports specific JSON functions directly from thejson:: module.

wren-core/core/src/mdl/mod.rs (1)

3855-3887:LGTM! Well-structured test coverage for BigQuery JSON transformations.

The test correctly validates thatGET_PATH is transformed toJSON_EXTRACT andAS_ARRAY(GET_PATH(...)) is transformed toJSON_EXTRACT_ARRAY for BigQuery dialect. The test follows the established patterns in this file.

ibis-server/app/mdl/knowledge.py (1)

5-5:Relative path may be fragile depending on working directory.

TheKNOWLEDGE_RESOURCE_PATH uses a relative path, which assumes the code is always executed from theibis-server directory. If the application is run from a different working directory, the path resolution will fail.

Consider usingPath(__file__).parent.parent / "resources" / "knowledge" to make the path relative to the module's location instead of the current working directory.

Additionally,get_text_to_sql_rule() will raise aFileNotFoundError if the file doesn't exist, but there's no explicit error handling. Verify this is the intended behavior.

-KNOWLEDGE_RESOURCE_PATH = "resources/knowledge"+KNOWLEDGE_RESOURCE_PATH = Path(__file__).parent.parent / "resources" / "knowledge"

Also applies to: 13-14

wren-core/core/src/mdl/function/scalar/json.rs (2)

1-32:Well-structured JSON UDF module following Snowflake conventions.

The module header clearly documents the design choice to follow Snowflake's naming and behavior. Theget_path function is correctly defined with two string arguments and returnsUtf8. Documentation with examples is helpful.


34-50:Comprehensive type checking and casting UDFs.

Theas_* andis_* function pairs provide a complete set of operations for semi-structured data handling. The consistent use ofSignature::coercible withlogical_string() input andImmutable volatility is appropriate for these pure transformation functions.

Also applies to: 296-532

goldmedaland others added6 commitsDecember 16, 2025 10:50
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ype.txtCo-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ctionality.txtCo-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1)

5-9:Clarify quoting convention for example parts across functions.

Line 7 (EXTRACT example) shows unquoted parts (year, quarter, month, etc.) while lines 5 and 9 (DATE_TRUNC and DATE_DIFF) show quoted parts. While this reflects actual SQL syntax (EXTRACT keywords don't take quotes), it may confuse users. Consider adding a brief note clarifying which functions require quotes around the part parameter.

ibis-server/resources/knowledge/text_to_sql_rules.txt (1)

10-10:Reduce repetitive sentence structure.

Multiple successive sentences begin with "You," creating awkward rhythm and affecting readability.

Consider restructuring to improve flow:

-You should follow the rules below when generating SQL queries for Wren engine.-You should check which database is connected to the backend. You can find it in the `dataSource` field in the context. And check the corresponding instructions for the specific database if needed.+When generating SQL queries for Wren engine, follow the rules below. Check which database is connected to the backend (available in the `dataSource` field in the context) and review the corresponding instructions for that database if needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between1924edb and0bb6a7a.

📒 Files selected for processing (7)
  • ibis-server/resources/knowledge/instructions/array_usage.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/calculated_field.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1 hunks)
  • ibis-server/resources/knowledge/instructions/semi_structured_type.txt (1 hunks)
  • ibis-server/resources/knowledge/text_to_sql_rules.txt (1 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (1 hunks)
  • wren-core/core/src/mdl/lineage.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/resources/knowledge/instructions/calculated_field.txt
  • ibis-server/resources/knowledge/instructions/array_usage.txt
🧰 Additional context used
🧬 Code graph analysis (1)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)
wren-core/core/src/mdl/dialect/utils.rs (2)
  • scalar_function_to_sql_internal (51-76)
  • args (30-48)
🪛 LanguageTool
ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt

[style] ~13-~13: To form a complete sentence, be sure to include a subject.
Context: ...en timezone is important. -TIMESTAMP can't be directly compared with `TIMESTAMP...

(MISSING_IT_THERE)

ibis-server/resources/knowledge/text_to_sql_rules.txt

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...SQL doesn't equal to your database SQL. Wren Engine translates your Wren SQL to your...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~8-~8: Use a hyphen to join words.
Context: ...se SQL internally. Avoid to use database specific SQL syntax and functions in you...

(QB_NEW_EN_HYPHEN)


[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...h database is connected to the backend. You can find it in thedataSource field i...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: ci
  • GitHub Check: test
  • GitHub Check: cargo check
  • GitHub Check: ci
🔇 Additional comments (6)
ibis-server/resources/knowledge/instructions/semi_structured_type.txt (1)

1-40:Well-structured knowledge document with past issues resolved.

The file provides comprehensive guidance for SEMI-STRUCTURED data types and correctly addresses the two issues flagged in previous reviews:

  • Grammar has been corrected to "SEMI-STRUCTURED data types don't have" (line 6)
  • Table name example correctly uses "users" throughout (line 40)

The instructions are clear, well-organized, and align with the Snowflake-style JSON handling introduced across the codebase. The coverage of GET_PATH, type casting functions (AS_VARCHAR, AS_INTEGER, etc.), json_type metadata rules, and UNNEST examples provides practical guidance for SQL generation on semi-structured data.

ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1)

1-20:Well-structured knowledge file for date/time operations.

The file provides clear, practical guidance on date/time functionality with good coverage of truncation, extraction, date arithmetic, timezone handling, and epoch time conversions. Structure and examples are easy to follow.

wren-core/core/src/mdl/dialect/inner_dialect.rs (3)

225-227:LGTM: Clean mapping to BigQuery JSON_EXTRACT.

Theget_path toJSON_EXTRACT rewrite is straightforward and aligns with BigQuery's JSON extraction capabilities.


248-259:Type conversion mappings to BigQuery LAX functions are correct.

The mappings use BigQuery's LAX_STRING, LAX_INT64, LAX_FLOAT64, and LAX_BOOL functions, which handle conversions between mismatched JSON data types. The implementation correctly mapsas_varcharlax_string,as_integerlax_int64,as_doublelax_float64, andas_booleanlax_bool.


228-247:Add a code comment explaining the AS_ARRAY composition pattern.

The implementation intentionally requiresAS_ARRAY to wrap aGET_PATH function call, which it then unwraps to invokeJSON_EXTRACT_ARRAY with the inner arguments. This composition pattern is confirmed by the test case (AS_ARRAY(GET_PATH(json_col, '$.field')) transpiles toJSON_EXTRACT_ARRAY) and documented in the function signature example. Consider adding a code comment explaining why this composition is required in Wren's MDL—for instance, whether it clarifies intent, maintains consistency with the MDL type system, or reflects limitations of the transpiler.

wren-core/core/src/mdl/lineage.rs (1)

167-167:LGTM: Correct reference passing through iterator.

The code correctly passesident (a reference from.iter() onBTreeSet<Column>) toqualify_name_from_column_name, which expects&Column. This is type-safe and avoids unnecessary allocations. Rust's compile-time checks ensure correctness.

However, this change appears unrelated to the PR objectives (introducing generic SQL knowledge and semi-structured type operations). If this is incidental cleanup, consider moving it to a separate refactoring PR.

Copy link
Contributor

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
ibis-server/resources/knowledge/text_to_sql_rules.txt (3)

4-4:Fix grammatical error in statement (previously flagged).

Line 4 still contains the same grammatical error from the previous review: "No use the reference named" is incorrect phrasing.

Apply this diff:

-You should only use the names of models, views and columns defined in the MDL to generate SQL queries. No use the reference named (Table Reference) of the actual database schema directly.+You should only use the names of models, views and columns defined in the MDL to generate SQL queries. Do not use the reference names (Table Reference) of the actual database schema directly.

7-8:Fix phrasing and hyphenation issues (previously flagged).

These lines remain unfixed from the previous review. Line 7 has repetitive sentence structure, and Line 8 lacks proper hyphenation for the compound adjective.

Apply this diff:

-Wren SQL doesn't equal to your database SQL. Wren Engine translates your Wren SQL to your database SQL internally.-Avoid to use database specific SQL syntax and functions in your Wren SQL.+Wren SQL differs from your database SQL. Wren Engine translates your Wren SQL to your database SQL internally.+Avoid using database-specific SQL syntax and functions in your Wren SQL.

17-17:Clarify redundant statement about identifier case sensitivity (previously flagged).

Line 17 remains unfixed from the previous review. The phrase "The Identifier case sensitivity is case-sensitivity" is tautological and unclear.

Apply this diff:

-- The Identifier case sensitivity is case-sensitivity. You need to quote the identifier if it contains unicode characters, special characters (except _), or starts with a digit.+- Identifiers are case-sensitive. You must quote identifiers that contain unicode characters, special characters (except _), or start with a digit.
🧹 Nitpick comments (1)
ibis-server/resources/knowledge/text_to_sql_rules.txt (1)

10-10:Fix repetitive sentence structure.

Three successive sentences begin with "You," which weakens the clarity of the guidance.

Apply this diff:

-You should check which database is connected to the backend. You can find it in the `dataSource` field in the context. And check the corresponding instructions for the specific database if needed.+You should check which database is connected to the backend. The `dataSource` field in the context shows this information. Check the corresponding instructions for the specific database if needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between0bb6a7a and548d4eb.

📒 Files selected for processing (1)
  • ibis-server/resources/knowledge/text_to_sql_rules.txt (1 hunks)
🧰 Additional context used
🪛 LanguageTool
ibis-server/resources/knowledge/text_to_sql_rules.txt

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...SQL doesn't equal to your database SQL. Wren Engine translates your Wren SQL to your...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~8-~8: Use a hyphen to join words.
Context: ...se SQL internally. Avoid to use database specific SQL syntax and functions in you...

(QB_NEW_EN_HYPHEN)


[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...h database is connected to the backend. You can find it in thedataSource field i...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: ci
  • GitHub Check: ci
  • GitHub Check: cargo check
  • GitHub Check: test

goldmedaland others added4 commitsDecember 16, 2025 15:39
…ctionality.txtCo-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ibis-server/app/routers/v3/connector.py (1)

542-550:Thesql_correction_rule method lacks explicit error handling for missing files.

Theget_sql_correction_rule() method in the Knowledge class callsPath.read_text() without error handling. While resource files are packaged in the repository and guaranteed to exist, explicit error handling would improve robustness. Additionally, the endpoint currently relies on the global exception handler (which returns a generic 500 error) rather than providing specific error handling at the method level. Consider adding try-except blocks to catchFileNotFoundError and return a more informative error response, or document why file existence is guaranteed.

🧹 Nitpick comments (2)
ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1)

5-5:Normalize quoting of example parts across functions.

The example parts in DATE_TRUNC (line 5) and DATE_DIFF (line 9) are quoted, but EXTRACT (line 7) uses unquoted parts. For consistency in the knowledge base, either quote all parts uniformly or leave all unquoted.

Apply this diff to quote EXTRACT parts:

- Example parts: year, quarter, month, week, day, hour, minute, second+ Example parts: 'year', 'quarter', 'month', 'week', 'day', 'hour', 'minute', 'second'

Also applies to: 7-7, 9-9

ibis-server/app/mdl/knowledge.py (1)

13-13:Consider using Path.joinpath for cleaner path construction.

While f-strings work, usingPath.joinpath() or the/ operator is more idiomatic and portable.

Example refactor:

# Instead of:rules_path=Path(f"{KNOWLEDGE_RESOURCE_PATH}/text_to_sql_rule.txt")# Consider:rules_path=Path(KNOWLEDGE_RESOURCE_PATH)/"text_to_sql_rule.txt"

Also applies to: 21-22

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between548d4eb and176281e.

📒 Files selected for processing (5)
  • ibis-server/app/mdl/knowledge.py (1 hunks)
  • ibis-server/app/routers/v3/connector.py (1 hunks)
  • ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt (1 hunks)
  • ibis-server/resources/knowledge/sql_correction_rule.txt (1 hunks)
  • ibis-server/resources/knowledge/text_to_sql_rule.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ibis-server/app/routers/v3/connector.py (1)
ibis-server/app/mdl/knowledge.py (1)
  • get_sql_correction_rule (28-30)
ibis-server/app/mdl/knowledge.py (1)
ibis-server/app/model/data_source.py (1)
  • DataSource (62-221)
🪛 LanguageTool
ibis-server/resources/knowledge/sql_correction_rule.txt

[style] ~43-~43: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...an internal bug of Wren engine. You may need to report the issue to the Wren support te...

(REP_NEED_TO_VB)


[style] ~44-~44: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... error is from the execution stage, You need to check the Dialect SQL generated by Wren...

(REP_NEED_TO_VB)


[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... those unsupported features. If a not-found table or columns error occu...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...stage, check the location of the error. If it occur the source subquery (typically...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ated issues or runtime constraints. You need to analyze the execution context and data ...

(REP_NEED_TO_VB)


[style] ~48-~48: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...alues, or data type mismatches. You may need to check the sample data in the target dat...

(REP_NEED_TO_VB)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ze of the data being processed. You may need to optimize the Wren SQL to reduce its com...

(REP_NEED_TO_VB)


[style] ~51-~51: Consider using a different verb for a more formal wording.
Context: ...dentified error, modify the Wren SQL to correct the issues. This may involve changing t...

(FIX_RESOLVE)


[grammar] ~52-~52: Ensure spelling is correct
Context: ...arget database. 2. You can only modifiy the Wren SQL directly. You are not allo...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~61-~61: Ensure spelling is correct
Context: ... to check their database performance or simpilify their question.

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

ibis-server/resources/knowledge/text_to_sql_rule.txt

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ren SQL differs from your database SQL. Wren Engine translates your Wren SQL to your...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...h database is connected to the backend. You can find it in thedataSource field i...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

ibis-server/resources/knowledge/instructions/date_and_time_functionality.txt

[style] ~13-~13: To form a complete sentence, be sure to include a subject.
Context: ...en timezone is important. -TIMESTAMP can't be directly compared with `TIMESTAMP...

(MISSING_IT_THERE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: cargo check
  • GitHub Check: ci
  • GitHub Check: ci
  • GitHub Check: test
🔇 Additional comments (1)
ibis-server/resources/knowledge/text_to_sql_rule.txt (1)

1-75:Well-structured SQL generation guidance.

The documentation provides comprehensive rules for generating Wren SQL, covering context, rules, filtering, data types, aggregation, and subquery patterns.

Copy link
Contributor

@coderabbitaicoderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (4)
ibis-server/resources/knowledge/sql_correction_rule.txt (2)

52-52:Fix spelling error (previously flagged).

"modifiy" should be "modify".

Apply this diff:

-        2. You can only modify the Wren SQL directly. You are not allowed to modify the Planned SQL or Dialect SQL generated by Wren engine. So, you need to think about how to change the Wren SQL to achieve the desired Planned SQL or Dialect SQL.+        2. You can only modify the Wren SQL directly. You are not allowed to modify the Planned SQL or Dialect SQL generated by Wren engine. So, you need to think about how to change the Wren SQL to achieve the desired Planned SQL or Dialect SQL.

61-61:Fix spelling error (previously flagged).

"simpilify" should be "simplify".

Apply this diff:

-        3. If no alternative and a timeout or resource limit error occurs, report the issue to the user as an EXTERNAL ERROR, not a BUG of Wren engine. You may suggest the user to check their database performance or simplify their question.+        3. If no alternative and a timeout or resource limit error occurs, report the issue to the user as an EXTERNAL ERROR, not a BUG of Wren engine. You may suggest the user to check their database performance or simplify their question.
ibis-server/app/mdl/knowledge.py (2)

37-43:Add error handling for file read operations.

Similar toget_text_to_sql_rule, line 43 needs error handling for potential read failures beyond file existence.

Apply this diff:

 def get_sql_correction_rule(self) -> str:     rules_path = Path(f"{KNOWLEDGE_RESOURCE_PATH}/sql_correction_rule.txt")     if not rules_path.exists():         raise WrenError(             ErrorCode.GENERIC_INTERNAL_ERROR, "SQL correction rule not found."         )-    return rules_path.read_text()+    try:+        return rules_path.read_text()+    except (PermissionError, OSError) as e:+        raise WrenError(+            ErrorCode.GENERIC_INTERNAL_ERROR,+            f"Failed to read SQL correction rule: {str(e)}"+        )

21-35:Handle individual file read errors in dictionary comprehension.

The dictionary comprehension on line 35 callsfile.read_text() for each file without error handling. If any single file is unreadable (due to permissions, encoding issues, etc.), the entire method fails. As noted in past reviews, this can crash the knowledge system unnecessarily.

Apply this diff to gracefully skip unreadable files:

-        return {file.stem: file.read_text() for file in files}+        result = {}+        for file in files:+            try:+                result[file.stem] = file.read_text()+            except (PermissionError, OSError, UnicodeDecodeError) as e:+                # Log warning and skip unreadable files+                # Consider adding logging here+                continue+        return result
🧹 Nitpick comments (1)
ibis-server/resources/knowledge/sql_correction_rule.txt (1)

38-62:Consider varying sentence structure for improved readability.

The static analysis tool suggests reducing repetitive phrasing (e.g., "You need to", "You may need to") and avoiding consecutive sentences starting with "If". While the content is clear, varying the sentence structure could enhance readability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between176281e and06d3768.

📒 Files selected for processing (2)
  • ibis-server/app/mdl/knowledge.py (1 hunks)
  • ibis-server/resources/knowledge/sql_correction_rule.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/app/mdl/knowledge.py (2)
ibis-server/app/model/data_source.py (1)
  • DataSource (62-221)
ibis-server/app/model/error.py (2)
  • ErrorCode (19-38)
  • WrenError (65-113)
🪛 LanguageTool
ibis-server/resources/knowledge/sql_correction_rule.txt

[style] ~43-~43: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...an internal bug of Wren engine. You may need to report the issue to the Wren support te...

(REP_NEED_TO_VB)


[style] ~44-~44: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... error is from the execution stage, You need to check the Dialect SQL generated by Wren...

(REP_NEED_TO_VB)


[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... those unsupported features. If a not-found table or columns error occu...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...stage, check the location of the error. If it occur the source subquery (typically...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~47-~47: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ated issues or runtime constraints. You need to analyze the execution context and data ...

(REP_NEED_TO_VB)


[style] ~48-~48: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...alues, or data type mismatches. You may need to check the sample data in the target dat...

(REP_NEED_TO_VB)


[style] ~49-~49: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ze of the data being processed. You may need to optimize the Wren SQL to reduce its com...

(REP_NEED_TO_VB)


[style] ~51-~51: Consider using a different verb for a more formal wording.
Context: ...dentified error, modify the Wren SQL to correct the issues. This may involve changing t...

(FIX_RESOLVE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: ci
  • GitHub Check: test
  • GitHub Check: cargo check
  • GitHub Check: ci

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
goldmedaland others added5 commitsDecember 16, 2025 16:40
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@goldmedal
Copy link
ContributorAuthor

CI failures are flaky. Let's ignore them

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@coderabbitaicoderabbitai[bot]coderabbitai[bot] left review comments

@douenergydouenergyAwaiting requested review from douenergy

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

bigquerycoreibispythonPull requests that update Python coderustPull requests that update Rust code

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

@goldmedal

[8]ページ先頭

©2009-2025 Movatter.jp