Issue 4868

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Faster utf-8 decoding
Type: performance
Stage: patch review
Components: Interpreter Core
Versions: Python 3.1
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, ezio.melotti, kevinwatters, lemburg, loewis, pitrou
Priority: normal
Keywords: patch

Created on 2009-01-07 15:25 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name          Uploaded                              Description
utf8decode3.patch  pitrou, 2009-01-08 03:03
utf8decode4.patch  amaury.forgeotdarc, 2009-01-08 13:11
decode5.patch      pitrou, 2009-01-08 19:20
decode6.patch      pitrou, 2009-01-08 20:37
Messages (14)
msg79338 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-07 15:24
Here is a patch to speedup utf8 decoding. On a 64-bit build, the maximum speedup is around 30%, and on a 32-bit build around 15%. (*)

The patch may look disturbingly trivial, and I haven't studied the assembler output, but I think it is explained by the fact that having a separate loop counter breaks the register dependencies (when the 's' pointer was incremented, other operations had to wait for the incrementation to be committed).

[side note: utf8 encoding is still much faster than decoding, but it may be because it allocates a smaller object, regardless of the iteration count]

The same principle can probably be applied to the other decoding functions in unicodeobject.c, but first I wanted to know whether the principle is ok to apply. Marc-André, what is your take?

(*) the benchmark I used is:

./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'abcde'*1000" "c(s)"

More complex input also gets a speedup, albeit a smaller one (~10%):

./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'\xc3\xa9\xe7\xb4\xa2'*1000" "c(s)"
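For readers who want to repeat the measurement from a script rather than the command line, the following is a minimal sketch using the standard timeit module on the same two inputs. The helper name time_decode and the repeat/number values are assumptions chosen for illustration; they are not part of the original benchmark.

```python
import codecs
import timeit

# Same inputs as the two command lines above.
ascii_input = b"abcde" * 1000
mixed_input = b"\xc3\xa9\xe7\xb4\xa2" * 1000


def time_decode(data, number=1000, repeat=5):
    """Return the best per-call time, in seconds, of codecs.utf_8_decode(data).

    Hypothetical helper: takes the minimum over several runs, as the
    timeit command line does, to reduce noise from other processes.
    """
    timer = timeit.Timer(lambda: codecs.utf_8_decode(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for label, data in (("ascii", ascii_input), ("mixed", mixed_input)):
        print(f"{label}: {time_decode(data) * 1e6:.2f} usec per call")
```

Absolute numbers depend on the interpreter build and machine, so only the ratio between a patched and an unpatched build is meaningful.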
msg79353 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-01-07 17:45
Can you please upload it to Rietveld?

I'm skeptical about changes that merely rely on the compiler's register allocator to do a better job. This kind of change tends to pessimize the code for other compilers, and also may pessimize it for future versions of the same compiler.
msg79356 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-07 18:05
As I said I don't think it's due to register allocation, but simply avoiding register write-to-read dependencies by using separate variables for the loop count and the pointer. I'm gonna try under Windows (in a virtual machine, but it shouldn't make much difference since the workload is CPU-bound).

I've opened a Rietveld issue here: http://codereview.appspot.com/11681
msg79358 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-07 18:30
Ha, the patch makes things slower on MSVC. The patch can probably be rejected, then.

(and interestingly, MSVC produces 40% faster code than gcc on my mini-bench, despite the virtual machine overhead)
msg79360 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2009-01-07 18:35
On 2009-01-07 16:25, Antoine Pitrou wrote:
> New submission from Antoine Pitrou <pitrou@free.fr>:
>
> Here is a patch to speedup utf8 decoding. On a 64-bit build, the maximum
> speedup is around 30%, and on a 32-bit build around 15%. (*)
>
> The patch may look disturbingly trivial, and I haven't studied the
> assembler output, but I think it is explained by the fact that having a
> separate loop counter breaks the register dependencies (when the 's'
> pointer was incremented, other operations had to wait for the
> incrementation to be committed).
>
> [side note: utf8 encoding is still much faster than decoding, but it may
> be because it allocates a smaller object, regardless of the iteration count]
>
> The same principle can probably be applied to the other decoding
> functions in unicodeobject.c, but first I wanted to know whether the
> principle is ok to apply. Marc-André, what is your take?

I'm +1 on anything that makes codecs faster :-)

However, the patch should be checked with some other compilers as well, e.g. using MS VC++.

> (*) the benchmark I used is:
>
> ./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'abcde'*1000" "c(s)"
>
> More complex input also gets a speedup, albeit a smaller one (~10%):
>
> ./python -m timeit -s "import codecs;c=codecs.utf_8_decode;s=b'\xc3\xa9\xe7\xb4\xa2'*1000" "c(s)"
msg79397 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-08 03:03
Reopening and attaching a more ambitious patch, based on the optimization of runs of ASCII characters. This time the speedup is much more impressive, up to 75% faster on pure ASCII input -- actually faster than latin1.

The worst case (tight interleaving of ASCII and non-ASCII chars) shows an 8% slowdown.

(performance measured with gcc and MSVC)
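The idea of treating runs of ASCII characters specially can be illustrated outside C. The sketch below is not the code from the patch: it is a Python rendering, under the assumption of an 8-byte machine word, of testing several bytes at once with a bit mask and falling back to a byte-by-byte loop at the end of the run. The names ASCII_MASK_8 and ascii_run_length are hypothetical.

```python
# High bit of each of 8 consecutive bytes. A byte is ASCII exactly when
# its high bit is clear, so a zero result means all 8 bytes are ASCII.
ASCII_MASK_8 = 0x8080808080808080


def ascii_run_length(data: bytes, start: int = 0) -> int:
    """Return the length of the run of ASCII bytes in data beginning at start."""
    pos = start
    end = len(data)

    # Fast path: examine 8 bytes per iteration with a single mask test.
    while pos + 8 <= end:
        word = int.from_bytes(data[pos:pos + 8], "little")
        if word & ASCII_MASK_8:
            break  # at least one non-ASCII byte somewhere in this word
        pos += 8

    # Slow path: finish the run one byte at a time.
    while pos < end and data[pos] < 0x80:
        pos += 1

    return pos - start
```

On pure ASCII input almost all the work happens in the fast loop, which is consistent with the best case reported above. With tightly interleaved ASCII and non-ASCII bytes the mask test fails almost immediately, so it is pure overhead, which would fit the small slowdown in the worst case.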
msg79409 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2009-01-08 13:11
Very nice! It seems that you can get slightly faster by not copying the initial char first: 's' is often already aligned at the beginning of the string, but not after the first copy... Attached patch (utf8decode4.patch) changes this and may enter the fast loop on the first character.

Does this idea apply to the encode function as well?
msg79416 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-08 15:22
> Attached patch
> (utf8decode4.patch) changes this and may enter the fast loop on the
> first character.

Thanks!

> Does this idea apply to the encode function as well?

Probably, although with less efficiency (a long can hold 1, 2 or 4 unicode characters depending on the build).

The unrolling part also applies to simple codecs such as latin1. Unrolling PyUnicode_DecodeLatin1 a bit (4 copies per iteration) makes it twice as fast on non-tiny strings. I'll experiment with utf16.
msg79430 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-08 19:20
Attached patch adds acceleration for latin1 and utf16 decoding as well.

All three codecs (utf8, utf16, latin1) are now in the same ballpark performance-wise on favorable input: on my machine, they are able to decode at almost 1GB/s.

(unpatched, it is between 150 and 500MB/s, depending on the codec)
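A throughput comparison of this kind can be approximated with a short script. This is a sketch only: the helper name throughput_mb_per_s, the payload size and the repeat counts are assumptions, and current CPython releases use a different internal string representation than the 2009 builds discussed here, so the figures it prints are not comparable to the ones quoted above.

```python
import timeit


def throughput_mb_per_s(payload: bytes, encoding: str, number: int = 200) -> float:
    """Return decoding throughput in MB/s (10**6 bytes per second of input).

    Hypothetical helper: times payload.decode(encoding) and keeps the
    fastest of three runs.
    """
    seconds = min(
        timeit.repeat(lambda: payload.decode(encoding), number=number, repeat=3)
    )
    return len(payload) * number / seconds / 1e6


if __name__ == "__main__":
    # "Favorable input" is taken here to mean pure ASCII text.
    text = "abcde" * 20000
    for encoding in ("utf-8", "utf-16", "latin-1"):
        payload = text.encode(encoding)
        rate = throughput_mb_per_s(payload, encoding)
        print(f"{encoding}: {rate:.0f} MB/s")
```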
msg79431 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-08 19:27
(PS: performance measured on UCS-2 and UCS-4 builds with gcc, and under Windows with MSVC)
msg79432 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2009-01-08 19:37
Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Attached patch adds acceleration for latin1 and utf16 decoding as well.
>
> All three codecs (utf8, utf16, latin1) are now in the same ballpark
> performance-wise on favorable input: on my machine, they are able to
> decode at almost 1GB/s.
>
> (unpatched, it is between 150 and 500MB/s. depending on the codec)
>
> Added file: http://bugs.python.org/file12655/decode5.patch

A few style comments:

 * please use indented #pre-processor directives whenever possible, e.g.
   #if
   # define
   #else
   # define
   #endif

 * the conditions should only accept SIZE_OF_LONG == 4 and 8 and
   fail with an #error for any other value

 * you should use unsigned longs instead of signed ones

 * please use spaces around arithmetic operators, e.g. not a+b, but
   a + b

 * when calling functions with lots of parameters, put each parameter on
   a new line (e.g. for unicode_decode_call_errorhandler())

Please also add a comment somewhere to the bit masks explaining what they do and how they are used. Verbose comments are always good to have when doing optimizations such as these. Have a look at the dictionary implementation for what I mean by this.

Thanks,
--
Marc-Andre Lemburg
eGenix.com
________________________________________________________________________
::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
           http://www.egenix.com/company/contact/
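Two of the review points (accept only word sizes 4 and 8, and document what the bit masks do) can be illustrated with a small sketch. The real patch is C and selects its mask with preprocessor conditionals; the Python below only mirrors the logic, and the names ascii_mask_for and is_ascii_word are hypothetical.

```python
def ascii_mask_for(sizeof_long: int) -> int:
    """Return a mask with the high bit set in every byte of a machine word.

    ANDing a word with this mask yields zero exactly when every byte in
    the word is below 0x80, i.e. when all of its bytes are ASCII.
    Only 4-byte and 8-byte words are accepted; any other size is an
    error, mirroring the requested #error for unexpected SIZE_OF_LONG.
    """
    if sizeof_long == 8:
        return 0x8080808080808080
    if sizeof_long == 4:
        return 0x80808080
    raise ValueError(f"unsupported word size: {sizeof_long}")


def is_ascii_word(chunk: bytes) -> bool:
    """Check a 4-byte or 8-byte chunk for non-ASCII bytes in one mask test."""
    mask = ascii_mask_for(len(chunk))
    # int.from_bytes always yields a non-negative value, which plays the
    # role of the unsigned long requested in the review.
    return (int.from_bytes(chunk, "little") & mask) == 0
```

For example, is_ascii_word(b"abcd") is true, while is_ascii_word(b"ab\xc3\xa9") is false because the two bytes of the encoded "é" both have their high bit set.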
msg79434 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-08 20:37
Marc-Andre, this patch should address your comments.
msg79447 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2009-01-08 22:06
Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Marc-Andre, this patch should address your comments.
>
> Added file: http://bugs.python.org/file12656/decode6.patch

Thanks. Much better!

BTW: I'd also change the variable name "word" to something different, e.g. bitmap or just data. It looks too much like a reserved word (which it isn't) ;-)
msg79549 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-10 15:46
I committed the patch with the last suggested change (word -> data) in py3k (r68483). I don't intend to backport it to trunk, but I suppose it wouldn't be too much work to do.
History
Date                 User                Action  Args
2022-04-11 14:56:43  admin               set     github: 49118
2010-04-04 03:26:17  ezio.melotti        set     nosy: +ezio.melotti
2009-01-10 15:46:28  pitrou              set     status: open -> closed
                                                 resolution: fixed
                                                 messages: +msg79549
2009-01-08 22:06:30  lemburg             set     messages: +msg79447
2009-01-08 20:38:00  pitrou              set     files: +decode6.patch
                                                 messages: +msg79434
2009-01-08 19:37:37  lemburg             set     messages: +msg79432
2009-01-08 19:27:38  pitrou              set     messages: +msg79431
2009-01-08 19:20:19  pitrou              set     files: +decode5.patch
                                                 messages: +msg79430
2009-01-08 17:06:39  kevinwatters        set     nosy: +kevinwatters
2009-01-08 15:22:31  pitrou              set     messages: +msg79416
2009-01-08 13:11:21  amaury.forgeotdarc  set     files: +utf8decode4.patch
                                                 nosy: +amaury.forgeotdarc
                                                 messages: +msg79409
2009-01-08 03:03:52  pitrou              set     files: -utf8decode.patch
2009-01-08 03:03:42  pitrou              set     status: closed -> open
                                                 resolution: rejected -> (no value)
                                                 messages: +msg79397
                                                 files: +utf8decode3.patch
2009-01-07 18:35:13  lemburg             set     messages: +msg79360
2009-01-07 18:30:50  pitrou              set     status: open -> closed
                                                 resolution: rejected
                                                 messages: +msg79358
2009-01-07 18:05:18  pitrou              set     messages: +msg79356
2009-01-07 17:45:57  loewis              set     nosy: +loewis
                                                 messages: +msg79353
2009-01-07 15:25:03  pitrou              create