- Notifications
You must be signed in to change notification settings - Fork2.7k
Addstruct_values function to return field values of STRUCT#19968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
struct_values function to return field values of STRUCT as Unna…struct_values function to return field values of STRUCTThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks! It looks great already, I don't see any glaring mistakes, just had a few comments
| } else { | ||
| result.SetVectorType(VectorType::FLAT_VECTOR); | ||
| // Make result validity to mirror input's nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We can use AllValid on the input to determine if the validity is entirely clean, in which case we can avoid this step of setting the result validity (and we also don't need the EnsureWritable() call in that case)
| // Ensure input is a STRUCT, set return type to an unnamed STRUCT with same child types | ||
| static unique_ptr<FunctionData> StructValuesBind(ClientContext &context, ScalarFunction &bound_function, | ||
| vector<unique_ptr<Expression>> &arguments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is there a test with a prepared parameter? If there isn't, can we add one? I suspect there's logic missing here to handle that (checking for UNKNOWN type in the input - see other bind methods)
| ScalarFunction StructValuesFun::GetFunction() { | ||
| ScalarFunction func({LogicalType::ANY}, LogicalTypeId::STRUCT, StructValuesFunction, StructValuesBind); | ||
| func.SetNullHandling(FunctionNullHandling::SPECIAL_HANDLING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can you make the case for using SPECIAL_HANDLING?
Perhaps have a look at the comments on this enum, because I think this resembles the "default" handling
| # Constant vector as input (same result for every row) | ||
| statement ok | ||
| CREATE TABLE t_struct_values_constant(x INTEGER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not really following the creation of this table? The best way to test constant vectors is to not even use a table.
Easiest isselect <constant expression> from range(3)
| statement ok | ||
| INSERT INTO t_struct_values_flat VALUES | ||
| (ROW(1, 'x')::STRUCT(a INT, b VARCHAR), 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You can remove the cast, should be added by the insert already, makes the test a bit more readable :)
I'm even pretty sure the use of ROW can be removed as well, just(1, 'x') should be enough to create the ROW value and let it cast to the STRUCT type of the column
| static void StructValuesFunction(DataChunk &args, ExpressionState &state, Vector &result) { | ||
| D_ASSERT(args.ColumnCount() == 1); | ||
| auto &input = args.data[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Since we accept unnamed structs, we could have an early out for that case, by detecting theStructType::IsUnnamed(input.GetType()) and just directly referencing the input
| input.ToUnifiedFormat(count, input_data); | ||
| auto &validity = FlatVector::Validity(result); | ||
| validity.EnsureWritable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think we don't need this,SetInvalid will ensure this already
In the scenario where the input rows turn out to all be valid, if we use this,AllValid() will not be true for the result, because it checks whether EnsureWritable was ever called (explicitly or through methods likeSetInvalid)
So we would be missing out on fast paths in downstream function calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Happy to elaborate if this is too vague of an explanation btw 👍
maiadegraaf left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks, looking good already! Just a few comments from my side.
| if (arguments[0]->return_type.id() != LogicalTypeId::STRUCT) { | ||
| throw InvalidInputException("struct_values() expects a STRUCT argument"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Out of curriosity what happens if you don't check this? I think it should be picked up by the binder.
Eg,struct_contains doesn't explicitly check for structs but still throws an error if a non-struct is passed as a parameter.
Dselect struct_contains(1);Binder Error:No function matches the given nameand argument types'struct_contains(INTEGER_LITERAL)'. You might need to add explicit type casts. Candidate functions: struct_contains(STRUCT, ANY)->BOOLEANLINE1:select struct_contains(1); ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Could we add some more tests?
- struct with a null value (
{a: NULL, b: 2}) - struct with a nested struct
- nested
struct_valuecalls
This PR adds a new scalar function
struct_valuesthat complements the existingstruct_keysfunction. Whilestruct_keysreturns the names of a struct's fields,struct_valuesreturns the corresponding values as an unnamed struct.Example