more extractor #2274
base:master
This is the bare minimum implementation of this idea.
adding benchmark to extractor
extractor PR with clangcl tweaks
the-moisrex commented Oct 10, 2024

    :100644 100644 33973a51 00000000 M include/simdjson/generic/ondemand/object-inl.h

    diff --git a/include/simdjson/generic/ondemand/object-inl.h b/include/simdjson/generic/ondemand/object-inl.h
    index 33973a51..69863b9a 100644
    --- a/include/simdjson/generic/ondemand/object-inl.h
    +++ b/include/simdjson/generic/ondemand/object-inl.h
    @@ -35,7 +35,7 @@ simdjson_inline error_code object::extract(Funcs&&... endpoints) {
     #else
     template <endpoint ...Funcs>
     simdjson_inline error_code object::extract(Funcs&&... endpoints) noexcept((nothrow_endpoint<Funcs> && ...)) {
    -  return iter.on_field_raw([&](auto field_key, error_code& error) noexcept((nothrow_endpoint<Funcs> && ...)) {
    +  return iter.on_field_raw([&](auto field_key, error_code& error) __attribute__((always_inline)) {
         std::ignore = ((field_key.unsafe_is_equal(endpoints.key()) ? (error = endpoints(value(iter.child()))) == SUCCESS : true) && ...);
         if (error) {
           return true;
    @@ -66,7 +66,7 @@ public:
         return m_key;
       }
     
    -  [[nodiscard]] constexpr error_code operator()(simdjson_result<value> val) noexcept(
    +  [[nodiscard]] simdjson_inline constexpr error_code operator()(simdjson_result<value> val) noexcept(
           std::is_nothrow_assignable_v<T, simdjson_result<value>>) {
         return val.get<T>(*pointer);
       }
    @@ -95,7 +95,7 @@ public:
         return m_key;
       }
     
    -  [[nodiscard]] constexpr error_code operator()(simdjson_result<value> val) noexcept(
    +  [[nodiscard]] simdjson_inline constexpr error_code operator()(simdjson_result<value> val) noexcept(
           std::is_nothrow_invocable_v<Func, simdjson_result<value>>) {
         if constexpr (std::is_invocable_r_v<error_code, Func, simdjson_result<value>>) {
           return func(val);
    @@ -133,12 +133,12 @@ public:
       constexpr sub &operator=(sub &&) = default;
       constexpr ~sub() = default;
     
    -  [[nodiscard]] constexpr error_code operator()(simdjson_result<value> val) noexcept((nothrow_endpoint<Tos> && ...)) {
    +  [[nodiscard]] simdjson_inline constexpr error_code operator()(simdjson_result<value> val) noexcept((nothrow_endpoint<Tos> && ...)) {
         object obj;
         if (auto const err = val.get_object().get(obj); err) {
           return err;
         }
    -    return std::apply([&obj]<typename... T>(T &&...app_tos) {
    +    return std::apply([&obj]<typename... T>(T &&...app_tos) __attribute__((always_inline)) {
           return obj.extract(std::forward<T>(app_tos)...);
         }, tos);
       }

GCC before change:

Clang before change:

GCC, after change:

Clang after patch:

Seems like the inlining does help, especially in Clang.

Also

    // this:
    auto& t = result.emplace_back();
    // instead of:
    results.push_back(t);

is more apples to apples, but it seems like the compilers are already smart enough to do the right thing anyway.

Also, the cost of constructing
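For readers who want to see the two appending styles from the comment above side by side, here is a minimal sketch. The `Car` struct and both helper functions are hypothetical stand-ins, since the benchmark code itself is not shown in this thread; the only library fact relied on is that `std::vector::emplace_back` returns a reference to the new element since C++17.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type; the struct used by the actual benchmark is not shown here.
struct Car {
  std::string make;
  std::string model;
};

// Style 1: fill a local object, then copy it into the vector.
inline void append_with_push_back(std::vector<Car> &results) {
  Car t;
  t.make = "Toyota";
  t.model = "Camry";
  results.push_back(t); // copies t into the vector
}

// Style 2: construct the element in place and fill it through the returned reference.
inline void append_with_emplace_back(std::vector<Car> &results) {
  auto &t = results.emplace_back();
  t.make = "Toyota";
  t.model = "Camry";
}

// Both styles leave the vector with the same contents.
inline void usage_example() {
  std::vector<Car> a, b;
  append_with_push_back(a);
  append_with_emplace_back(b);
  assert(a.size() == 1 && b.size() == 1);
  assert(a[0].make == b[0].make && a[0].model == b[0].model);
}
```

Whether the second style is measurably faster depends on the compiler and the element type; the comment above reports that the compilers tested already did the right thing.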
the-moisrex commented Oct 10, 2024
@lemire Things we could try:
But these (except the last one if possible) may not bring much to the table.
lemire commented Oct 11, 2024
@the-moisrex Thanks for the analysis.
lemire commented Sep 17, 2025 • edited
What you can do easily with C++20 is this:

    object.extract<"myfirstkey", "mysecondkey", "mythirdkey">(lambda1, lambda2, lambda3);

In this case, the keys become compile-time constants. This could be beneficial.
the-moisrex commented Sep 18, 2025 • edited
I don't wanna propose a compile time query language because that's a lot of work, and adds a learning curve for the users, but it feels like we're trying to do that here. The keys becoming a compile time constant may not be as beneficial as we think they'd be; we'd have zero pre-processing for them in the case of The problem with We need to separate the keys and where we want them to be put. And also we need to make sure we're not storing temporary structures so we wouldn't be making the compiler's job harder.
I still don't like the syntax for What I'd really like to be able to do is this, but I'm not sure if we can do it with reflection or not; even if we could, the compiler support is not there.

    object.extract(car.make); // it would know to check for "make" auto-magically
    // heck, if we could do that, we could do this:
    object >> car.make;

Some brain-storming:

    object.extract("one", "two", "key")(one, two, value);
    object["one", "two", "three"](one, two, three);
    auto [one, two, three] = object["one", "two", "three"];
    auto res = object["one", "two", "three"];
    res.get(one);
    res.get(two);
    res.get(three);
    // I suspect these will have performance penalties:
    object.extract("one", "two", "key") >> one >> two >> value;
    object >> "one" >> "two" >> "key" >> one >> two >> value;
    // this is cool too, but the keys are not gonna be available at compile time
    // unless the references are too, which they will not.
    object.extract("one", "two", "key", one, two, value);

But to be honest, none of them are as powerful as the And we have a bunch of JSON query languages like
lemire commented Sep 18, 2025
@the-moisrex Reflection will be around soon. The prototypes we have right now were implemented relatively quickly. Right now, with simdjson 4.0, you can just automatically deserialize a You could even have it generate a structure on demand:

    auto z = object.extract<"key1", int, "key2", std::string>()

(or some nicer variant) This would return a structure with two attributes.
Your implementation is nice, but it has a significant cost. I believe that it is algorithmic. It is too expensive to check the keys in loops. So having compile-time strings might allow more clever implementations which might be needed to get your approach to be highly efficient.
the-moisrex commented Sep 18, 2025
With utilities like

    struct Car {
      std::string make;
      std::string model;
    } car;
    object.extract_into<"make", "model">(car);

Or we could do it like this:

    structure_tie(car) = object["make", "model"];
    tie(car.make, car.model) = object["make", "model"];
    tie(car.make, car.model) = object.extract<"make", "model">(); // or with structured_tie

Of course, structure_tie has limitations.

I'm sure I can even pull this off, but the syntax is even worse (imagine, this is possible):

    object.extract<"make", &Car::make, "model", &Car::model>(car);

If we wanted to extract sub-objects, maybe we could do these, but I'm not 100% sure yet:

    struct Car {
      std::string make;
      struct Driver {
        std::string name;
      } driver;
    } car;

    // with a query language for example:
    tie(car.make, car.driver.name) = object["make", "driver.name"];
    // without a query language, maybe we could pull this off, but it would be ugly:
    tie(car.make, car.driver.name) = object["make", "driver"][self, "name"];
    // Maybe this?
    tie(car.make, car.driver.name) = object["make", sub("driver", "name")];
    tie(car.make, car.driver.name) = object.extract<"make", sub("driver", "name")>();

I'm not sure, but maybe we could even pull off a sub-object of an array like these:

    // get the first driver:
    tie(car.make, car.driver.name) = object.extract<"make", sub("drivers", 0, "name")>();
    tie(car.make, car.driver.name) = object["make", sub("drivers", 0, "name")];
    tie(car.make, car.driver.name) = object["make", "drivers"][selfobj, 0][selfobj, "name"];

But error handling in these is gonna be something to figure out.

Or maybe we could remove the references from what we have in this PR, and do it like this:

    tie(car.make, car.driver.name, error) = obj.extract(
      to{"make"},
      to{"drivers", sub{
        to{0, sub{"name"}},
      }}
    );
    // or this:
    tie(car.make, car.driver.name, error) = obj.extract("make", sub{"drivers", sub{0, sub{"name"}}});
    // maybe:
    tie(car.make, car.driver.name, error) = obj.extract("make", sub("drivers", 0, "name"));
    // if we could do that, we definitely can do it at compile time:
    tie(car.make, car.driver.name, error) = obj.extract<"make", sub("drivers", 0, "name")>();

Do you think we could make a utility to run a lambda in the tie? Because it this PR's implementation

    auto set_it = [&](string& name) { car.driver.name = name; };
    tie(car.make, invoke_on_equal(set_it), error) = obj.extract<"make", sub("drivers", 0, "name")>();

Oh wait, if we can pull that one off, we can do this as well for sub-objects:

    auto set_it = [&](auto& drivers) { car.driver.name = drivers[0].extract<"name">(); };
    tie(car.make, invoke_on_equal(set_it), error) = obj.extract<"make", "drivers">();
    // and we can create a utility for that as well which we could:
    tie(car.make, sub_to<0, "name">(car.driver.name), error) = obj.extract<"make", "drivers">();

For error handling with

    if (tie_up(car.make, sub_to<0, "name">(car.driver.name)) = obj.extract<"make", "drivers">()) {
      // failure?
    }

@lemire, what do you think? Which syntax would be the best?
lemire commented Sep 18, 2025
Don't forget that you can run clang with static reflection. Please see
I don't think we want to bundle boost in simdjson.
I think that what can provide high performance and efficiency should be the driving force. This being said... Something like this...

    object.extract_into<"make", "model">(car);

is very nice and it should be easy work in simdjson 4.0 with reflection. It is better than having to annotate the
the-moisrex commented Sep 18, 2025
We don't have to, I don't think it would be that hard to implement. It's a 2 liner for C++26, but a bit more for older versions.
This is a small variation on PR #2247.
I am observing a very significant performance regression compared to manually provided code. This is with GCC 12 on a recent x64 processor, but I see the same effect with LLVM and ARM.