Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork9.7k
[Cache] serialize objects using native arrays when possible#27543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
palex-fpt commentedJun 8, 2018
https://gist.github.com/palex-fpt/82fc3deed09c2de2a9023080a9613ce3 |
palex-fpt commentedJun 8, 2018
There is native support to var_export serialization: Classes implemented this method should be able to restore from var_export-ed form. |
nicolas-grekas commentedJun 8, 2018
I think I can make the marshaller handle these. Let me give it a try. |
nicolas-grekas commentedJun 8, 2018
Here we are, the generated PHP files now use |
| }catch (\Exception$e) { | ||
| } | ||
| if (null !==$e) { | ||
| }catch (\Throwable$e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
this change should actually be done in the 4.0 branch (lots of places were cleaned already, but it looks like a bunch of them were missed)
| //Store arrays serialized ifthey contain any objects or references | ||
| if ($unserialized !==$value || (false !==strpos($serialized,';R:')&&preg_match('/;R:[1-9]/',$serialized))) { | ||
| //Keep value serialized ifit contains any"Serializable"objects or any internal references | ||
| if (0 ===strpos($serialized,'C:')||preg_match('/;[CRr]:[1-9]/',$serialized)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
do we need to check the first 2 chars, or would$serialized[0] be enough ? the second char would always be a colon AFAICT (all PHP types are represented using a single char in the serialization format AFAIK)
| * file that was distributed with this source code. | ||
| */ | ||
| namespaceSymfony\Component\Cache\Traits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
having a class in theTraits namespace looks confusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I get that, yet that's internal and I don't think this deserves a dedicated namespace (nor being at the root), so here it is :)
| $class =\get_class($value); | ||
| $data =array(self::COOKIE =>$class); | ||
| if (self::$sleep[$class] ??self::$sleep[$class] =\method_exists($class,'__sleep')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
is the key access in the static property faster than\method_exists($class, '__sleep') with its special opcode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
confirmed: isset is 3x faster than method_exists here
nicolas-grekas commentedJun 8, 2018
Now green, comments addressed@stof, thanks. |
palex-fpt commentedJun 8, 2018
Is any chance to have extension point to switch between php serialization and igbinary? |
006af24 to4953591Comparenicolas-grekas commentedJun 8, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I managed to remove the unmarshaller part of this PR, and leverage Combined with#27549, this results in a significantly faster value loader that benefits from OPcache shared memory as much as possible.
What would be the benefit? |
467f0ae toeea30e0Comparepalex-fpt commentedJun 9, 2018
I wrote bench for measure object load times:https://gist.github.com/palex-fpt/121a24e53ded4a0f6bb72307fc823a50 |
palex-fpt commentedJun 9, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
TBH I would like to have PhpFilesTrait looks like: This way we would move all responsibility to prepare 'the best' serialization form to serializer. Leaving PhpFilesTrait with file handling and using opcache. |
nicolas-grekas commentedJun 9, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I was surprised by the speed of That leaves us with the marshaller as the best solution, great :) |
palex-fpt commentedJun 9, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Oops. My mistake. I updated gist. Igbinary looks like the winner. And it uses three times lesser space. |
palex-fpt commentedJun 9, 2018
I tested with range of options. When data use a lot of object instances - it looses to igbinary. When data is composition of scalars - PhpMarshaller is winner. It'll be good have option to choose marshaller based on used data. |
nicolas-grekas commentedJun 9, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
actually, it doesn't to me, because this is missing a very important property of the native php format: it leverages php shared memory. This means the php format uses zero extra memory past the first request. This also means zero memory transfer to access the data: you just manipulate pointers to shared memory under the hood. There is a pathological test case that illustrates this: error_reporting(-1);require'vendor/autoload.php';useSymfony\Component\Cache\Adapterasp;$cache =newp\PhpFilesAdapter();//$cache = new p\ApcuAdapter();$mem =memory_get_usage();$start =microtime(true);$i =10000;$values =array();while (--$i) {$values[] =$cache->get('foo',function ($item) {returnstr_repeat('-',10000); });}echo1000*(microtime(true) -$start),"ms\n";echomemory_get_peak_usage() -$mem,"\n"; Takes 23ms + 123MB with Apcu (~mimics serialize/igbinary) |
nicolas-grekas commentedJun 13, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
PR now tested and green. Ready. @palex-fpt now that this is faster than igbinary there is no reason anymore to allow any sort of extensibility here. On the contrary, I prefer this to be closed and remain internal details. |
palex-fpt commentedJun 14, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Do you have benchmarks? tree depth 3, values: 50, iterations: 1000
With my typical scenario (one access to key-value pair per request) it does not looks good. https://gist.github.com/palex-fpt/913fbe1b1def170c2785d3e26ce4f77e |
palex-fpt commentedJun 14, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
it looks like first call to 'get' method is performed against empty opcache. 'set' method should populate opcache. |
nicolas-grekas commentedJun 14, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
To me, this looks really good: store |
7845a55 tod45b23eComparenicolas-grekas commentedJun 14, 2018
done |
palex-fpt commentedJun 15, 2018
Opcache is fastest available cache with node locality. We reset opcache on node on 'switch to new build' deployment. It happens at least once per day. Any frequently accessed data that has change rate lower than deployment rate is perfect candidate to be stored in opcache. Limit opcache to append only is forcing to use slower caches or to do unnecessary redeployment on data change. Ex. does use-case that allows to change title of site directory (which happens once per year or never) should forbid use opcache for site directory attributes? Caching retrieved data adds some burden to backend services. Service that enumerates over large portion of opcached objects can go oom. But it can be handled with $cache->reset() on each iteration. I'm ok with current PR. It is big step in performance from opcaching serialized strings. Further improvements can be done in separate PRs. |
8c5cb74 to45b6e35Comparenicolas-grekas commentedJun 15, 2018
Great, PR is ready then :) |
d7b1a72 tof67f900Comparefabpot commentedJun 18, 2018
Thank you@nicolas-grekas. |
…sible (nicolas-grekas)This PR was merged into the 4.2-dev branch.Discussion----------[Cache] serialize objects using native arrays when possible| Q | A| ------------- | ---| Branch? | master| Bug fix? | no| New feature? | no| BC breaks? | no| Deprecations? | no| Tests pass? | yes| Fixed tickets | -| License | MIT| Doc PR | -This PR allows leveraging OPCache shared memory when storing objects in `Php*` pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming);Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing `Serializable`. The best serialization format is selected automatically so this is completely seamless.ping@palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in#27484. I'd be thrilled to get some benchmarks on your scenarios.Commits-------866420e [Cache] serialize objects using native arrays when possible
palex-fpt commentedJun 19, 2018
As it does not support objects with __set_sate and cannot be overridden to made that support I can not benchmark it against my configuration. I hope it can be done later. |
Uh oh!
There was an error while loading.Please reload this page.
This PR allows leveraging OPCache shared memory when storing objects in
Php*pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming);Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)
php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing
Serializable. The best serialization format is selected automatically so this is completely seamless.ping@palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in#27484. I'd be thrilled to get some benchmarks on your scenarios.