
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2012-07-07 13:38 bymandel, last changed2022-04-11 14:57 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| less_isinstance.patch | mandel,2012-07-07 13:38 | Patch that reduces the number of isinstance calls performed. | review | |
| ntpath_cleanup.diff | serhiy.storchaka,2014-07-22 10:09 | review | ||
| Repositories containing patches | |||
|---|---|---|---|
| http://bitbucket.org/mandel/ntpath-performance | |||
| Messages (10) | |||
|---|---|---|---|
| msg164842 -(view) | Author: Manuel de la Pena (mandel) | Date: 2012-07-07 13:38 | |
The problem is simple, the code that allows to use binary strings and unicode is making more calls that needed to isinstance(path, bytes) since the result of the code is not shared. For example, the following calls are present in the module:def _get_empty(path): if isinstance(path, bytes): return b'' else: return ''def _get_sep(path): if isinstance(path, bytes): return b'\\' else: return '\\'def _get_altsep(path): if isinstance(path, bytes): return b'/' else: return '/'def _get_bothseps(path): if isinstance(path, bytes): return b'\\/' else: return '\\/'def _get_dot(path): if isinstance(path, bytes): return b'.' else: return '.'...And then something similar to the following is found in the code:def normpath(path): """Normalize path, eliminating double slashes, etc.""" sep = _get_sep(path) dotdot = _get_dot(path) * 2 special_prefixes = _get_special(path) if path.startswith(special_prefixes): # in the case of paths with these prefixes: # \\.\ -> device names # \\?\ -> literal paths # do not do any normalization, but return the path unchanged return path path = path.replace(_get_altsep(path), sep) prefix, path = splitdrive(path)As you can see the isinstance call is performed more than needed which certainly affects the performance of the path operations. The attached patch removes the number of calls to isinstance(obj, bytes) and also ensures that the function that returns the correct literal is as fast as possible by using a dict. | |||
| msg165409 -(view) | Author: Ezio Melotti (ezio.melotti)*![]() | Date: 2012-07-13 19:25 | |
Have you tried doing some benchmarks before and after the patch?If this patch is applied I think it would be good to change posixpath too.Also make sure that the changes you made are covered by the tests. | |||
| msg166267 -(view) | Author: Manuel de la Pena (mandel) | Date: 2012-07-24 09:29 | |
Tests indeed cover the changes made. I don't know about a decent way of doing benchmarks for the changes. Any recommendation?> If this patch is applied I think it would be good to change posixpath too.I agree and I'd love to do it but in a diff bug to make things self-contained, what do you think? | |||
| msg166390 -(view) | Author: Ezio Melotti (ezio.melotti)*![]() | Date: 2012-07-25 12:33 | |
> I don't know about a decent way of doing benchmarks for the changes.> Any recommendation?You could make a script that uses the timeit module.>> If this patch is applied I think it would be good to change>> posixpath too.> I agree and I'd love to do it but in a diff bug to make things> self-contained, what do you think?Having a single patch that fixes both is OK. | |||
| msg220033 -(view) | Author: Mark Lawrence (BreamoreBoy)* | Date: 2014-06-08 13:14 | |
@Manuel do you intend picking this up? | |||
| msg223656 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-07-22 10:09 | |
Here is alternative patch. I believe it makes a code simpler.Microbenchmarks:$ ./python -m timeit -n 100000 -s "from ntpath import splitdrive" "splitdrive('c:foo')"Before: 100000 loops, best of 3: 20 usec per loopAfter: 100000 loops, best of 3: 11.5 usec per loop$ ./python -m timeit -n 100000 -s "from ntpath import splitext" "splitext('python.exe')"Before: 100000 loops, best of 3: 23.6 usec per loopAfter: 100000 loops, best of 3: 18 usec per loop$ ./python -m timeit -s "from ntpath import join" "join('foo', 'bar')"Before: 10000 loops, best of 3: 50.9 usec per loopAfter: 10000 loops, best of 3: 32.3 usec per loop$ ./python -m timeit -s "from ntpath import normpath" "normpath('/foo/bar/baz')"Before: 10000 loops, best of 3: 67.5 usec per loopAfter: 10000 loops, best of 3: 40.3 usec per loop$ ./python -m timeit -s "from ntpath import relpath" "relpath('foo', 'bar')"Before: 1000 loops, best of 3: 695 usec per loopAfter: 1000 loops, best of 3: 456 usec per loop | |||
| msg223657 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-07-22 10:12 | |
I like ntpath_cleanup.diff, I don't think that it makes the code worse.FYI os.fsencode() accepts str too, you can simplify: if isinstance(path, bytes):- userhome = userhome.encode(sys.getfilesystemencoding())+ userhome = os.fsencode(userhome)to+ userhome = os.fsencode(userhome) | |||
| msg223672 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-07-22 16:36 | |
No, if *path* is not bytes, *userhome* shouldn't be converted to bytes. | |||
| msg223698 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-07-22 21:15 | |
Oh you're right sorry. | |||
| msg223752 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2014-07-23 17:43 | |
New changesetb22aaa59d24f by Serhiy Storchaka in branch 'default':Issue#15275: Clean up and speed up the ntpath module.http://hg.python.org/cpython/rev/b22aaa59d24f | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:32 | admin | set | github: 59480 |
| 2014-07-23 17:46:54 | serhiy.storchaka | set | status: open -> closed assignee:serhiy.storchaka resolution: fixed stage: patch review -> resolved |
| 2014-07-23 17:43:24 | python-dev | set | nosy: +python-dev messages: +msg223752 |
| 2014-07-22 21:15:58 | vstinner | set | messages: +msg223698 |
| 2014-07-22 16:36:49 | serhiy.storchaka | set | messages: +msg223672 |
| 2014-07-22 10:12:08 | vstinner | set | nosy: +vstinner messages: +msg223657 |
| 2014-07-22 10:09:20 | serhiy.storchaka | set | files: +ntpath_cleanup.diff versions: + Python 3.5, - Python 3.4 nosy: +serhiy.storchaka messages: +msg223656 |
| 2014-06-08 13:14:03 | BreamoreBoy | set | nosy: +BreamoreBoy messages: +msg220033 |
| 2012-07-25 12:33:59 | ezio.melotti | set | messages: +msg166390 |
| 2012-07-24 09:29:31 | mandel | set | messages: +msg166267 |
| 2012-07-13 23:40:11 | terry.reedy | set | nosy: +terry.reedy |
| 2012-07-13 19:25:43 | ezio.melotti | set | versions: + Python 3.4, - Python 3.3 nosy: +ezio.melotti messages: +msg165409 type: performance stage: patch review |
| 2012-07-08 09:59:58 | mandel | set | nosy: +brian.curtin |
| 2012-07-08 02:54:52 | jcea | set | nosy: +jcea |
| 2012-07-07 13:41:22 | mandel | set | files: -f5c57ba1124b.diff |
| 2012-07-07 13:40:14 | mandel | set | files: +f5c57ba1124b.diff |
| 2012-07-07 13:38:18 | mandel | create | |