- Notifications
You must be signed in to change notification settings - Fork138
feat: STRUCT and ARRAY support#318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
jimfulton commentedAug 31, 2021
FTR, WRT superset, once I finally got it working :), it behaves the same with and without these changes. BTW, we have logic that tries to unpack sub-structs, I think so that there would eventually be scalars for superset to work with. If you have an array of structs, we still create columns for the fields of the struct in the array. This causes superset to error, because it has no way to get at structs in an array. We should probably not unpack structs in arrays. |
Otherwise, the BQ doesn't handle arrays of structs.
snippet-botbot commentedSep 1, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Here is the summary of changes. You are about to add 10 region tags.
This comment is generated bysnippet-bot.
|
I want it narrow to avoid horizonal scrolling
jimfulton commentedSep 2, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Some notes for reviewers:
|
tswast left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I haven't quite digested everything in this PR yet, but I figured I'd share the feedback I have so far.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Tim Swast <swast@google.com>
tswast left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I like it! Just a few things I think we should clarify before merging.
Uh oh!
There was an error while loading.Please reload this page.
sqlalchemy_bigquery/_struct.py Outdated
| globaltype_compiler | ||
| try: | ||
| process=type_compiler.process | ||
| exceptAttributeError: | ||
| type_compiler=base.dialect.type_compiler(base.dialect()) | ||
| process=type_compiler.process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Could we put this in a_get_type_compiler /_get_process function? I don't see anywhere else we initializetype_compiler, but I'd be more comfortable having this logic closer to the# We have to delay getting the type compiler, because of circular imports. :( comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I refactored so this is combined and isolated in one place using a new, better named_get_subtype_col_spec function.
sqlalchemy_bigquery/_struct.py Outdated
| type_compiler=base.dialect.type_compiler(base.dialect()) | ||
| process=type_compiler.process | ||
| fields=", ".join(f"{name}{process(type_)}"forname,type_inself.__fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I assumeprocess is able to handle nested arrays/structs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
yes
Uh oh!
There was an error while loading.Please reload this page.
sqlalchemy_bigquery/_struct.py Outdated
| f"STRUCT fields can only be accessed with strings field names," | ||
| f" not{name}." | ||
| ) | ||
| subtype=self.expr.type._STRUCT__byname.get(name.lower()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Where does_STRUCT__byname come from? I'm assuming somewhere from SQLAlchemy, but I'm not getting any results when searching forbyname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Oh, I think I figured it out:https://docs.python.org/3/tutorial/classes.html#private-variables
Any identifier of the form
__spam(at least two leading underscores, at most one trailing underscore) is textually replaced with_classname__spam, where classname is the current class name with leading underscore(s) stripped.
Can we comment about this? I assume we have to do it because we knowself.expr.type is aSTRUCT, but it's notself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I refactored this to make name mangling more explicit and consistent, so I don't think comments are needed anymore. See if you agree. :)
I mainly use "private" variables, which aren't :), to avoid namespace conflicts when subclassing across responsibility boundaries. Arguably, explicit naming is better.
sqlalchemy_bigquery/_struct.py Outdated
| returnoperator,index,subtype | ||
| def__getattr__(self,name): | ||
| ifname.lower()inself.expr.type._STRUCT__byname: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm a bit confused whyself.__byname doesn't work in this case.
Edit: I see now that it's part of theComparator class. Still probably worth a similar comment to the one I recommend in_setup_getitem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
See my response on name mangling
Uh oh!
There was an error while loading.Please reload this page.
sqlalchemy_bigquery/_types.py Outdated
| forfinfield.fields | ||
| ] | ||
| results+=_get_transitive_schema_fields(sub_fields,cur_fields) | ||
| cur_fields.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Since we pop these off, does that mean we don't get the top-level struct field, just the leaf fields? I suspect this might hide some ARRAY columns if a parent node has modeREPEATED, but is not included.
Edit: I see the top field is added to results on line 83. Might be worth a comment as to why we pop here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I got rid of cur_fields. It wasn't needed (anymore).
sqlalchemy_bigquery/_types.py Outdated
| forfieldinfields: | ||
| results+= [field] | ||
| iffield.field_type=="RECORD": | ||
| cur_fields.append(field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't quite understand whatcur_fields is doing. Is there a better name we can pick for this? Maybe it's referring toancestors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Haha, it's not doing anything.
This is based on
python-bigquery-sqlalchemy/pybigquery/sqlalchemy_bigquery.py
Lines 503 to 523 indb64424
| def_get_columns_helper(self,columns,cur_columns): | |
| """ | |
| Recurse into record type and return all the nested field names. | |
| As contributed by @sumedhsakdeo on issue #17 | |
| """ | |
| results= [] | |
| forcolincolumns: | |
| results+= [ | |
| SchemaField( | |
| name=".".join(col.nameforcolincur_columns+ [col]), | |
| field_type=col.field_type, | |
| mode=col.mode, | |
| description=col.description, | |
| fields=col.fields, | |
| ) | |
| ] | |
| ifcol.field_type=="RECORD": | |
| cur_columns.append(col) | |
| results+=self._get_columns_helper(col.fields,cur_columns) | |
| cur_columns.pop() | |
| returnresults |
I've refactored it quite a bit and failed to notice that this wasn't needed any more. Fixed.
tests/unit/test__struct.py Outdated
| (_col().NAME,"`t`.`person`.NAME"), | ||
| (_col().children,"`t`.`person`.children"), | ||
| ( | ||
| _col().children[0].label("anon_1"),# SQLAlchemy doesn't add the label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Should we file an issue for this to investigate later? If so, let's addTODO and link to the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Done
…Alchemy 1.3 and 1/4
Also, check for both RECORD and STRUCT fild types, in case the APIever starts returning STRUCT.
…accessing array items
Co-authored-by: Tim Swast <swast@google.com>
| global_get_subtype_col_spec | ||
| type_compiler=base.dialect.type_compiler(base.dialect()) | ||
| _get_subtype_col_spec=type_compiler.process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Fancy! I didn't realize a function could replace itself. I like it.
🤖 I have created a release \*beep\* \*boop\*---## [1.2.0](https://www.github.com/googleapis/python-bigquery-sqlalchemy/compare/v1.1.0...v1.2.0) (2021-09-09)### Features* STRUCT and ARRAY support ([#318](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/318)) ([6624b10](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/6624b10ded73bbca6f40af73aaeaceb95c381b63))### Bug Fixes* the unnest function lost needed type information ([#298](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/298)) ([1233182](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/123318269876e7f76c7f0f2daa5f5b365026cd3f))---This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
| defvisit_getitem_binary(self,binary,operator_,**kw): | ||
| left=self.process(binary.left,**kw) | ||
| right=self.process(binary.right,**kw) | ||
| returnf"{left}[OFFSET({right})]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
SAFE_OFFSET support would be nice too
Uh oh!
There was an error while loading.Please reload this page.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes#293
Fixes#314
Fixed#233
Fixes#37
🦕