[HttpFoundation] Add `StreamedJsonResponse` for efficient JSON streaming #47709
alexander-schranz commented Sep 27, 2022
The error in the tests of
ro0NL commented Sep 29, 2022
would it be reasonable to consider a "compute json inline" approach, rather than end-users taking care of unique identifiers? `$lazyJson = ['key' => fn() => yield from $heavy];`
stof commented Sep 29, 2022
@ro0NL this would force to re-implement the whole json encoding in userland
ro0NL commented Sep 29, 2022
we could array walk the structure first, thus keeping the unique placeholders an implementation detail.
stof commented Sep 29, 2022
@ro0NL if you do that, you are not streaming json anymore, defeating the whole purpose of this PR.
ro0NL commented Sep 29, 2022 (edited)
the idea is to split the generators from the structure, preserving remaining logic. But this is an extra step yes, thus less ideal perhaps.
alexander-schranz commented Sep 29, 2022
@ro0NL interesting input. As I think the structure array is mostly small it could be possible. But we would need to have a look at what difference this would be in the performance. I hacked something together using
stof commented Sep 29, 2022
@alexander-schranz be careful when implementing this.
alexander-schranz commented Sep 29, 2022
@stof great hint think
stof commented Sep 29, 2022
now that we have first class callables, I would say yes. You can convert any callable to a closure using this feature.
alexander-schranz commented Sep 29, 2022
Okay I don't need to check for

    return new StreamedJsonResponse(
        [
            '_embedded' => [
                'articles' => $this->findArticles('Article'), // returns a \Generator which will generate a list of data
            ],
        ],
    );

The diff between old and new implementation is not big it just takes about
I also update the example repository using the new class under
OskarStark commented Oct 25, 2022
I propose to add the content from the README of your prototype application to the PR header 👍🏻
alexander-schranz commented Nov 3, 2022
@OskarStark added. Think PR is blocked until 6.3 branch is created?
OskarStark commented Nov 3, 2022
thanks
Yes
dunglas commented Nov 25, 2022
alexander-schranz commented Nov 25, 2022
@dunglas that sounds very interesting. I think I would stay with the implementation as it is for now: it gives a very low-resource solution without the http foundation package needing additional requirements on any kind of serializer and so on. A serializer/normalizer can still be used inside the Generator already, and with the current implementation of this class that is also very low on resource usage, as it doesn't try to serialize all objects at once, just one after the other, and so doesn't need to keep more than one object in memory as long as the ORM loading allows that.
chalasr commented Dec 29, 2022
Shall we move forward on this one?
alexander-schranz commented Dec 29, 2022
@chalasr rebased. Not sure what is open or required to get this merged :)
chalasr commented Dec 29, 2022
Let's iterate, thanks @alexander-schranz!
alexander-schranz commented Dec 29, 2022
🎉 Thank you all for the great feedback and ideas. Think we got a great solution out of it, with a better DX than I could think of when I created the pull request. @chalasr that sounds great :)
…medJsonResponse (alexander-schranz)
This PR was merged into the 6.3 branch.
Discussion
----------
[HttpFoundation] Fix problem with empty generator in StreamedJsonResponse
| Q | A
| ------------- | ---
| Branch? | 6.3 (Feature `StreamedJsonResponse`: #47709)
| Bug fix? | yes
| New feature? | no
| Deprecations? | no
| Tickets | Fix - was reported to me on Slack by `@norkunas`
| License | MIT
| Doc PR | symfony/symfony-docs#...
Currently when the Generator is empty the return is invalid JSON which should not happen. So adding a testcase and a fix to the problem with the empty generator.
Commits
-------
39bb6b6 Fix problem with empty generator in StreamedJsonResponse
…onse` (alexander-schranz)
This PR was squashed before being merged into the 6.3 branch.
Discussion
----------
[HttpFoundation] Add documentation for `StreamedJsonResponse`
Docs for: symfony/symfony#47709
# TODO
- [x] Example of Flush Handling
Commits
-------
8a285e3 [HttpFoundation] Add documentation for `StreamedJsonResponse`
…medJsonResponse (Jeroeny)
This PR was merged into the 6.4 branch.
Discussion
----------
[HttpFoundation] Support root-level Generator in StreamedJsonResponse
| Q | A
| ------------- | ---
| Branch? | 6.4
| Bug fix? | no
| New feature? | yes
| Deprecations? | no
| License | MIT
Currently the `StreamedJsonResponse` only supports streaming nested Generators within an array data structure.
However if a response is a list of items (for example database entities) on the root level, this isn't usable.
I think both usecases can be supported with the change in this PR.
The root level generator doesn't account for additional nested generators yet. I could add that by doing `is_array($item)` and the call the recursive placeholder logic.
Link to first PR that introduced StreamedJsonResponse: #47709
~~Also something I noticed is I only got intermediate output, when adding a `flush()` call after each item has been echo'd (with a `sleep(1)` after each item to see it output the parts individually).~~ Edit: I see the class' PhpDoc describes this and it's probably expected to be done in userland implementations.
Commits
-------
05e582f support root-level Generator in StreamedJsonResponse
When big data is streamed via a JSON API, it can sometimes be difficult to keep the resource usage low. For this I experimented with a different way of streaming data for JSON responses. It uses a combination of a
`structured array` and `generators`, which gave a much better result. More can be read about it here: https://github.com/alexander-schranz/efficient-json-streaming-with-symfony-doctrine.
I thought it could be a great addition to Symfony itself, to make this kind of response easier and so that APIs can be made more performant.
Usage
First Version (replaced)
Updated Version (thx to @ro0NL for the idea):
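The code listings for this section are not present on this page. As a rough plain-PHP sketch of the idea from the discussion (generators placed directly in the structure, with the unique placeholders kept as an implementation detail), something like the following could work. `streamJsonStructure()` and `replaceGenerators()` are hypothetical names, this is not the code that was merged into Symfony, and it assumes every generator should be emitted as a JSON list.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns a copy of $data in which every \Generator is
 * replaced by a unique placeholder string; the generators are collected in
 * $generators, keyed by placeholder, in traversal order.
 */
function replaceGenerators(array $data, array &$generators): array
{
    foreach ($data as $key => $value) {
        if ($value instanceof \Generator) {
            $placeholder = '__generator_' . count($generators) . '_' . bin2hex(random_bytes(8)) . '__';
            $generators[$placeholder] = $value;
            $data[$key] = $placeholder;
        } elseif (is_array($value)) {
            $data[$key] = replaceGenerators($value, $generators);
        }
    }

    return $data;
}

/**
 * Hypothetical helper: yields the JSON document for $structure chunk by
 * chunk, encoding the items of each nested generator one at a time.
 *
 * @return \Generator<int, string>
 */
function streamJsonStructure(array $structure): \Generator
{
    $generators = [];
    $json = json_encode(replaceGenerators($structure, $generators), JSON_THROW_ON_ERROR);

    foreach ($generators as $placeholder => $generator) {
        // The placeholder was encoded as a JSON string, so match it with its quotes.
        [$before, $json] = explode('"' . $placeholder . '"', $json, 2);

        yield $before . '[';
        $first = true;
        foreach ($generator as $item) {
            // Comma before every item except the first, so an empty generator gives "[]".
            yield ($first ? '' : ',') . json_encode($item, JSON_THROW_ON_ERROR);
            $first = false;
        }
        yield ']';
    }

    yield $json; // whatever follows the last placeholder
}
```

Only the small outer structure and one item at a time are encoded, so peak memory should stay roughly independent of the number of items, as long as the generator itself loads its data lazily.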
As proposed by @OskarStark, the full content of the blog post "Efficient JSON Streaming with Symfony and Doctrine":
Efficient JSON Streaming with Symfony and Doctrine
After reading a tweet about how we provide only a few items (max. 100) over our
JSON APIs but provide 4k images for our websites, I thought about why this is
the case.
To see the main difference, we first need to know how images are streamed.
Webservers today mostly use the sendfile feature, which is very
efficient as it can stream a file chunk by chunk and doesn't need to load
the whole data.
So I asked myself how we can achieve the same mechanism for our
JSON APIs, with a little experiment.
As an example we will have a look at a basic entity which has the
following fields defined:
The response of our API should look like the following:

    {
        "_embedded": {
            "articles": [
                {
                    "id": 1,
                    "title": "Article 1",
                    "description": "Description 1\nMore description text ..."
                },
                ...
            ]
        }
    }

Normally to provide this API we would do something like this:
In most cases we will add some pagination to the endpoint so our responses are not too big.
Making the API more efficient
But there is also a way to stream this response efficiently.
First of all we need to adjust how we load the articles. This can be done by replacing
`getResult` with the more efficient `toIterable`:
Still, the whole JSON needs to be in memory before it can be sent. So we also need to refactor
how we create our response. We will replace our `JsonResponse` with the `StreamedResponse` object.
But the `json` format is not the best format for streaming, so we need to add some hacks
to make it streamable.
First we will define the basic structure of our JSON this way:
Instead of the `$articles` we are using a placeholder, which we use to split the string into
a `$before` and an `$after` variable:
Now we first send the `$before`:
Then we stream the articles one by one. Here we need to keep the comma in mind, which
we need to add after every article but not the last one:
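The steps above (placeholder, `$before`, comma handling, `$after`) could be sketched in plain PHP roughly as follows. The function name `articleChunks()` and the marker `__REPLACES_ARTICLES__` are assumptions made for illustration. The comma is written before every article except the first, which gives the same output as "after every article but not the last" without needing to know which item is the last one.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: yields the response body chunk by chunk.
 *
 * @param iterable<array<string, mixed>> $articles e.g. the result of toIterable()
 * @return \Generator<int, string>
 */
function articleChunks(iterable $articles): \Generator
{
    // Assumed marker; it must never occur in the real data.
    $placeholder = '__REPLACES_ARTICLES__';

    // Encode only the small outer structure, with the placeholder as the single list element.
    $structure = json_encode(
        ['_embedded' => ['articles' => [$placeholder]]],
        JSON_THROW_ON_ERROR
    );

    // Split around the quoted placeholder, keeping the list brackets.
    [$before, $after] = explode('"' . $placeholder . '"', $structure, 2);

    yield $before; // {"_embedded":{"articles":[

    $first = true;
    foreach ($articles as $article) {
        yield ($first ? '' : ',') . json_encode($article, JSON_THROW_ON_ERROR);
        $first = false;
    }

    yield $after; // ]}}
}
```

Each yielded chunk can be echoed as soon as it is produced; with no articles at all the chunks still join to valid JSON with an empty list.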
Also we will add an additional `flush` after every 500 elements:
After that we will also send the `$after` part:
The result
So at the end the whole action looks like the following:
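The original listing of the action is not present on this page. As a minimal sketch of just the flushing part, assuming the body is available as an iterable of string chunks, the periodic `flush` could look like this. `emitWithPeriodicFlush()` is a hypothetical name, and the flush callable is injected so the behaviour can be checked without a web server.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: echoes every chunk and calls $flush after each
 * $flushEvery chunks, plus once at the end for whatever is still buffered.
 *
 * @param iterable<string> $chunks
 * @return int number of times $flush was called
 */
function emitWithPeriodicFlush(iterable $chunks, callable $flush, int $flushEvery = 500): int
{
    $count = 0;
    $flushes = 0;

    foreach ($chunks as $chunk) {
        echo $chunk;

        if (++$count % $flushEvery === 0) {
            $flush();
            ++$flushes;
        }
    }

    $flush(); // final flush for the remaining output

    return $flushes + 1;
}
```

In a Symfony controller, a closure wrapping such a call (with PHP's built-in `flush` as the callable) could be handed to `StreamedResponse`. Whether the chunks actually reach the client early also depends on output buffering and the web server configuration.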
The metrics for 100000 Articles (nginx + php-fpm 7.4 - Macbook Pro 2013):
This way we not only reduced the memory usage on our server,
we also made the response faster. The memory usage was
measured here with `memory_get_usage` and `memory_get_peak_usage`.
The "Time to first Byte" is the value reported by the browser, and the response times
were measured over curl.
Updated 2022-10-02 - (symfony serve + php-fpm 8.1 - Macbook Pro 2021)
While there is not much difference in time for a single response,
the real gain is the lower memory usage, which kicks in when
you have a lot of simultaneous requests. On my machine that is >150 simultaneous
requests, which is a high value; on a normal server it will be a lot lower.
While 150 simultaneous requests crash the old implementation,
the new implementation still works with 220 simultaneous requests, which
means about 46% more requests are possible.
Reading Data in JavaScript
As we stream the data, we should also make our JavaScript on the other
end work the same way, so the data needs to be read in a streamed way.
Here I'm just following the example from the Fetch API, "Processing a text file line by line".
So if we look at our `script.js`, we split the object
line by line and append it to our table. This method is definitely not the
way JSON should be read and parsed. It is only shown as an example of
how the response could be read from a stream.
Conclusion
The implementation looks a little hacky. For maintainability it could
be moved into its own Factory which creates this kind of response.
Example:
The JavaScript part is definitely not ready for production,
and if used you should probably create your own content type, e.g.
`application/json+stream`, so that you parse the JSON this way
only when you know it is really in this line by line format.
There may be better libraries like `JSONStream` to read the data,
but at the current state I did not test them out. Let me know
if somebody has experience with that and has solutions for it.
At least what I think everybody should do when providing lists
is to use `toIterable` when possible when loading
data via Doctrine, and to select specific fields instead
of using the `ORM`, to avoid the hydration of objects.
Let me know what you think about this experiment and how you currently are
providing your JSON data.
The whole experiment can be checked out and tested yourself via this repository.
Attend the discussion about this on Twitter.
Update 2022-09-27
Added a `StreamedJsonResponse` class and
trying to contribute this implementation to the Symfony core.
#47709
Update 2022-10-02
Updated some statistics with a new machine and Apache Benchmark tests for concurrent requests.