Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: STRUCT and ARRAY support#318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
jimfulton merged 37 commits intogoogleapis:mainfromjimfulton:struct
Sep 9, 2021
Merged

Conversation

@jimfulton
Copy link
Contributor

@jimfultonjimfulton commentedAug 30, 2021
edited
Loading

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as abug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes#293
Fixes#314
Fixed#233
Fixes#37
🦕

@product-auto-labelproduct-auto-labelbot added the api: bigqueryIssues related to the googleapis/python-bigquery-sqlalchemy API. labelAug 30, 2021
@google-clagoogle-clabot added the cla: yesThis human has signed the Contributor License Agreement. labelAug 30, 2021
@jimfulton
Copy link
ContributorAuthor

FTR, WRT superset, once I finally got it working :), it behaves the same with and without these changes.

BTW, we have logic that tries to unpack sub-structs, I think so that there would eventually be scalars for superset to work with.

If you have an array of structs, we still create columns for the fields of the struct in the array. This causes superset to error, because it has no way to get at structs in an array. We should probably not unpack structs in arrays.

@snippet-bot
Copy link

snippet-botbot commentedSep 1, 2021
edited
Loading

Here is the summary of changes.

You are about to add 10 region tags.

This comment is generated bysnippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, addsnippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@jimfultonjimfulton marked this pull request as ready for reviewSeptember 2, 2021 13:30
@jimfultonjimfulton requested review froma team ascode ownersSeptember 2, 2021 13:30
@jimfulton
Copy link
ContributorAuthor

jimfulton commentedSep 2, 2021
edited
Loading

Some notes for reviewers:

  • Heart of change is_struct.py, which isn't large, but also isn't obvious. :( I cribbed from the built-in JSON and ARRAY types. When reviewing, it's probably helpful to look at those. The "Comparator" framework is confusing, in large part because the name doesn't make sense.

    Having said that, the core logic is in_setop_getitem, which is a hook used by the base class ofSTRUCTs comparator.

    The__getattr__ method just delegates to (the inherited)__getitem__.

  • This PR also has 2 other small changes:

    • Machinery for mapping BQ types to SQLAlchemy types has been factored into a separate_types module, both to avoid clutteringbase more and to partially avoid circular imports. (There's still a circular import issue that isn't fixable without a bigger refactoring that I deemed unwarranted.)
    • Implementation of ARRAY indexing, which wasn't implemented. A number of my tests (seetest__struct.py in both unit and system tests) used an example that has nested structs and arrays.

@tswasttswast self-requested a reviewSeptember 7, 2021 20:03
Copy link
Collaborator

@tswasttswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I haven't quite digested everything in this PR yet, but I figured I'd share the feedback I have so far.

Copy link
Collaborator

@tswasttswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I like it! Just a few things I think we should clarify before merging.

Comment on lines 73 to 79
globaltype_compiler

try:
process=type_compiler.process
exceptAttributeError:
type_compiler=base.dialect.type_compiler(base.dialect())
process=type_compiler.process
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Could we put this in a_get_type_compiler /_get_process function? I don't see anywhere else we initializetype_compiler, but I'd be more comfortable having this logic closer to the# We have to delay getting the type compiler, because of circular imports. :( comment.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I refactored so this is combined and isolated in one place using a new, better named_get_subtype_col_spec function.

type_compiler=base.dialect.type_compiler(base.dialect())
process=type_compiler.process

fields=", ".join(f"{name}{process(type_)}"forname,type_inself.__fields)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I assumeprocess is able to handle nested arrays/structs?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

yes

f"STRUCT fields can only be accessed with strings field names,"
f" not{name}."
)
subtype=self.expr.type._STRUCT__byname.get(name.lower())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Where does_STRUCT__byname come from? I'm assuming somewhere from SQLAlchemy, but I'm not getting any results when searching forbyname.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh, I think I figured it out:https://docs.python.org/3/tutorial/classes.html#private-variables

Any identifier of the form__spam (at least two leading underscores, at most one trailing underscore) is textually replaced with_classname__spam, where classname is the current class name with leading underscore(s) stripped.

Can we comment about this? I assume we have to do it because we knowself.expr.type is aSTRUCT, but it's notself.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I refactored this to make name mangling more explicit and consistent, so I don't think comments are needed anymore. See if you agree. :)

I mainly use "private" variables, which aren't :), to avoid namespace conflicts when subclassing across responsibility boundaries. Arguably, explicit naming is better.

returnoperator,index,subtype

def__getattr__(self,name):
ifname.lower()inself.expr.type._STRUCT__byname:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm a bit confused whyself.__byname doesn't work in this case.

Edit: I see now that it's part of theComparator class. Still probably worth a similar comment to the one I recommend in_setup_getitem

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See my response on name mangling

forfinfield.fields
]
results+=_get_transitive_schema_fields(sub_fields,cur_fields)
cur_fields.pop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Since we pop these off, does that mean we don't get the top-level struct field, just the leaf fields? I suspect this might hide some ARRAY columns if a parent node has modeREPEATED, but is not included.

Edit: I see the top field is added to results on line 83. Might be worth a comment as to why we pop here.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I got rid of cur_fields. It wasn't needed (anymore).

forfieldinfields:
results+= [field]
iffield.field_type=="RECORD":
cur_fields.append(field)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't quite understand whatcur_fields is doing. Is there a better name we can pick for this? Maybe it's referring toancestors?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Haha, it's not doing anything.

This is based on

def_get_columns_helper(self,columns,cur_columns):
"""
Recurse into record type and return all the nested field names.
As contributed by @sumedhsakdeo on issue #17
"""
results= []
forcolincolumns:
results+= [
SchemaField(
name=".".join(col.nameforcolincur_columns+ [col]),
field_type=col.field_type,
mode=col.mode,
description=col.description,
fields=col.fields,
)
]
ifcol.field_type=="RECORD":
cur_columns.append(col)
results+=self._get_columns_helper(col.fields,cur_columns)
cur_columns.pop()
returnresults
, which I inherited.

I've refactored it quite a bit and failed to notice that this wasn't needed any more. Fixed.

(_col().NAME,"`t`.`person`.NAME"),
(_col().children,"`t`.`person`.children"),
(
_col().children[0].label("anon_1"),# SQLAlchemy doesn't add the label
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should we file an issue for this to investigate later? If so, let's addTODO and link to the issue.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done

global_get_subtype_col_spec

type_compiler=base.dialect.type_compiler(base.dialect())
_get_subtype_col_spec=type_compiler.process
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Fancy! I didn't realize a function could replace itself. I like it.

@jimfultonjimfulton merged commit6624b10 intogoogleapis:mainSep 9, 2021
@jimfultonjimfulton deleted the struct branchSeptember 9, 2021 15:50
gcf-merge-on-greenbot pushed a commit that referenced this pull requestSep 9, 2021
🤖 I have created a release \*beep\* \*boop\*---## [1.2.0](https://www.github.com/googleapis/python-bigquery-sqlalchemy/compare/v1.1.0...v1.2.0) (2021-09-09)### Features* STRUCT and ARRAY support ([#318](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/318)) ([6624b10](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/6624b10ded73bbca6f40af73aaeaceb95c381b63))### Bug Fixes* the unnest function lost needed type information ([#298](https://www.github.com/googleapis/python-bigquery-sqlalchemy/issues/298)) ([1233182](https://www.github.com/googleapis/python-bigquery-sqlalchemy/commit/123318269876e7f76c7f0f2daa5f5b365026cd3f))---This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
defvisit_getitem_binary(self,binary,operator_,**kw):
left=self.process(binary.left,**kw)
right=self.process(binary.right,**kw)
returnf"{left}[OFFSET({right})]"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

SAFE_OFFSET support would be nice too

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@tswasttswasttswast approved these changes

@tmatsuotmatsuoAwaiting requested review from tmatsuo

+1 more reviewer

@AnastasiaTamazlykarAnastasiaTamazlykarAnastasiaTamazlykar left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-sqlalchemy API.cla: yesThis human has signed the Contributor License Agreement.

Projects

None yet

Milestone

No milestone

3 participants

@jimfulton@tswast@AnastasiaTamazlykar

[8]ページ先頭

©2009-2025 Movatter.jp