- Notifications
You must be signed in to change notification settings - Fork126
Description
As we all know, Databricks has aVARIANT data type that is more performant than standard JSON strings for nested data. However, I don't understand how I am supposed to insert into such a column using databricks-sql-connector==4.0.5 though.
If you want to write to aPostgreSQL JSON column in psycopg2, you use json dump. This trick did not seem to work here. I also tried custom escaped JSON and simply dict, but also no.
importdatabricks.sqlimportjson# Replace with your Databricks SQL warehouse detailsserver_hostname="<SERVER_HOSTNAME>"http_path="<HTTP_PATH>"access_token="<ACCESS_TOKEN>"# Example data to insertcontent= {"age":29,"city":"New York"}data= [ (1,"Alice",json.dumps(content)),2,"Bob",None),# (2, "Bob", data) # Raises error (3,"Charlie","""{"age": 29, "city": "New York"}"""),]# Connect to Databricks SQLwithdatabricks.sql.connect(server_hostname=server_hostname,http_path=http_path,access_token=access_token)asconnection:withconnection.cursor()ascursor:# Insert data into Delta tablecursor.execute("CREATE OR REPLACE TABLE test_catalog.ad_hoc.variant_test (id INT, name STRING, content VARIANT) USING DELTA")cursor.executemany("INSERT INTO prod_catalog.ad_hoc.variant_test (id, name, content) VALUES (?, ?, ?)",data )
If I query the resulting table,
SELECT*, content:age, parse_json("{\"age\": 29, \"city\": \"New York\"}")FROMad_hoc.variant_testORDER BY id;
1 and 3 succeeds, but it seems to end up as a "flat"STRING (schema_of_variant(content)) instead ofOBJECT<age: BIGINT, city: STRING>.

Attempt 2 (insert a dict) raises the error
databricks.sql.exc.ServerOperationError: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "content" due to data type mismatch: cannot cast "MAP<VOID, VOID>" to "VARIANT". SQLSTATE: 42K09; line 1 pos 0
Could we add documentation for how this is supposed to work? I guess MAP, STRUCT and VARIANT all map to Python type dict in some sense. It is only that VARIANT don't have a predefined schema like MAP or STRUCT, and unlike MAP can be nested?