[Python-Dev] requirements for moving __import__ over to importlib?
Brett Cannon brett at python.org
Fri Feb 10 21:38:02 CET 2012
On Fri, Feb 10, 2012 at 15:07, PJ Eby <pje at telecommunity.com> wrote:

> On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon <brett at python.org> wrote:
>
>> On Thu, Feb 9, 2012 at 17:00, PJ Eby <pje at telecommunity.com> wrote:
>>
>>> I did some crude timeit tests on frozenset(listdir()) and trapping
>>> failed stat calls. It looks like, for a Windows directory the size of the
>>> 2.7 stdlib, you need about four *failed* import attempts to overcome the
>>> initial caching cost, or about 8 successful bytecode imports. (For Linux,
>>> you might need to double these numbers; my tests showed a different ratio
>>> there, perhaps due to the Linux stdlib I tested having nearly twice as many
>>> directory entries as the directory I tested on Windows!)
>>>
>>> However, the numbers are much better for application directories than
>>> for the stdlib, since they are located earlier on sys.path. Every
>>> successful stdlib import in an application is equal to one failed import
>>> attempt for every preceding directory on sys.path, so as long as the
>>> average directory on sys.path isn't vastly larger than the stdlib, and the
>>> average application imports at least four modules from the stdlib (on
>>> Windows, or 8 on Linux), there would be a net performance gain for the
>>> application as a whole. (That is, there'd be an improved per-sys.path
>>> entry import time for stdlib modules, even if not for any application
>>> modules.)
>>
>> Does this comment take into account the number of modules required to
>> load the interpreter to begin with? That's already like 48 modules loaded
>> by Python 3.2 as it is.
>
> I didn't count those, no. So, if they're loaded from disk *after*
> importlib is initialized, then they should pay off the cost of caching even
> fairly large directories that appear earlier on sys.path than the stdlib.
> We still need to know about NFS and other ratios, though...
> I still worry that people with more extreme directory sizes or
> slow-access situations will run into even worse trouble than they have now.

It's possible. No way to make it work for everyone. This is why I didn't
worry about some crazy perf optimization.

>> First is that if this were used on Windows or OS X (i.e. the OSs we
>> support that typically have case-insensitive filesystems), then this
>> approach would be a massive gain as we already call os.listdir() when
>> PYTHONCASEOK isn't defined to check case-sensitivity; take your 5 stat
>> calls and add in 5 listdir() calls and that's what you get on Windows and
>> OS X right now. Linux doesn't have this check so you would still be
>> potentially paying a penalty there.
>
> Wow. That means it'd always be a win for pre-stdlib sys.path entries,
> because any successful stdlib import equals a failed pre-stdlib lookup.
> (Of course, that's just saving some of the overhead that's been *added* by
> importlib, not a new gain, but still...)

How so? import.c does a listdir() as well (this is not special to
importlib).

>> Second is variance in filesystems. Are we guaranteed that the stat of a
>> directory is updated before a file change is made?
>
> Not quite sure what you mean here. The directory stat is used to ensure
> that new files haven't been added, old ones removed, or existing ones
> renamed. Changes to the files themselves shouldn't factor in, should they?

Changes in any fashion to the directory. Do filesystems atomically update
the mtime of a directory when they commit a change? Otherwise we have a
potential race condition.

>> Else there is a small race condition there which would suck. We also have
>> the issue of granularity; Antoine has already had to add the source file
>> size to .pyc files in Python 3.3 to combat crappy mtime granularity when
>> generating bytecode.
>> If we get file mod -> import -> file mod -> import,
>> are we guaranteed that the second import will know there was a modification
>> if the first three steps occur fast enough to fit within the granularity of
>> an mtime value?
>
> Again, I'm not sure how this relates. Automatic code reloaders monitor
> individual files that have been previously imported, so the directory
> timestamps aren't relevant.

Don't care about automatic reloaders. I'm just asking about the case where
the mtime granularity is coarse enough to allow for a directory change, an
import to execute, and then another directory change to occur all within a
single mtime increment. That would leave the set cache out of date.

> Of course, I could be confused here. Are you saying that if somebody
> makes a new .py file and saves it, that it'll be possible to import it
> before it's finished being written? If so, that could happen already, and
> again caching the directory doesn't make any difference.
>
> Alternately, you could have a situation where the file is deleted after we
> load the listdir(), but in that case the open will fail and we can fall
> back... heck, we can even force resetting the cache in that event.
>
>> I was going to say something about __pycache__, but it actually doesn't
>> affect this. Since you would have to stat the directory anyway, you might
>> as well just stat the directory for the file you want, to keep it simple.
>> Only if you consider __pycache__ to be immutable except for what the
>> interpreter puts in that directory during execution could you optimize that
>> step (in which case you can stat the directory once and never care again as
>> the set would be just updated by import whenever a new .pyc file was
>> written).
>>
>> Having said all of this, implementing this idea would be trivial using
>> importlib if you don't try to optimize the __pycache__ case. It's just a
>> question of whether people are comfortable with the semantic change to
>> import.
>> This could also be made into something that was in importlib for
>> people to use when desired if we are too worried about semantic changes.
>
> Yep. I was actually thinking this could be backported to 2.x, even
> without importlib, as a module to be imported in sitecustomize or via a
> .pth file. All it needs is a path hook, after all, and a subclass of the
> pkgutil importer to test it. And if we can get some people with huge NFS
> libraries and/or zillions of .egg directories on sys.path to test it, we
> could find out whether it's a win, lose, or draw for those scenarios.

You can do that if you want; obviously I don't want to bother, since it
won't make it into Python 2.7.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120210/2b660269/attachment.html>
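PJ Eby's "crude timeit tests on frozenset(listdir()) and trapping failed stat calls" can be approximated with a short script like the one below. It is a sketch, not his original benchmark: the helper `compare_costs` and the probe filename are made up for illustration, and the ratio it reports is counted in failed `os.stat()` calls, not failed imports. A single failed import probes several candidate files per directory (Brett's "5 stat calls"), so the break-even point measured in imports arrives sooner than the raw ratio suggests.

```python
import os
import sysconfig
import timeit


def compare_costs(directory, number=1000):
    """Return (seconds per listing-cache build, seconds per failed stat).

    Hypothetical helper: a rough re-creation of the kind of measurement
    described in the thread, not the original test.
    """
    # Assumed not to exist in `directory`; any absent name works.
    probe = os.path.join(directory, "no_such_module_xyz.py")

    def build_cache():
        # One-time cost of the caching approach: read every entry into a set.
        return frozenset(os.listdir(directory))

    def failed_stat():
        # Per-candidate cost without a cache: stat a file that isn't there.
        try:
            os.stat(probe)
        except FileNotFoundError:
            pass

    cache_cost = timeit.timeit(build_cache, number=number) / number
    stat_cost = timeit.timeit(failed_stat, number=number) / number
    return cache_cost, stat_cost


if __name__ == "__main__":
    # The stdlib directory, mirroring the directory size used in the thread.
    stdlib_dir = sysconfig.get_path("stdlib")
    cache_cost, stat_cost = compare_costs(stdlib_dir, number=200)
    print(f"{len(os.listdir(stdlib_dir))} entries in {stdlib_dir}")
    print(f"frozenset(os.listdir()): {cache_cost * 1e6:.1f} us")
    print(f"failed os.stat():        {stat_cost * 1e6:.1f} us")
    if stat_cost > 0:
        print(f"break-even after ~{cache_cost / stat_cost:.0f} failed stat calls")
```

Absolute numbers depend heavily on the OS, the filesystem, and whether directory metadata is already in the OS cache, so the NFS case that PJ Eby flags as unknown would need to be measured separately.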
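The mtime-validated listing cache being debated can be sketched as a small class. `DirectoryCache`, `contains()` and `read()` are hypothetical names, and this is not how import.c or importlib implement lookups. The sketch records the directory's `st_mtime_ns` before listing it, uses membership in the cached set as both the existence check and the case check, and falls back to a miss (resetting itself) if a file disappears between the listing and the open, as PJ Eby suggests. It does not solve Brett's granularity problem; it only makes the stale window explicit.

```python
import os
import tempfile


class DirectoryCache:
    """Cached listing of one directory, revalidated by the directory's mtime.

    Hypothetical sketch of the idea under discussion. Known gap (Brett's
    granularity question): if the directory changes, a lookup runs, and the
    directory changes again within one mtime tick, the mtime looks unchanged
    and the cached set stays stale until invalidate() is called.
    """

    def __init__(self, path):
        self.path = path
        self._mtime = None  # st_mtime_ns seen when the set was built
        self._entries = frozenset()

    def invalidate(self):
        """Force the next lookup to re-read the directory."""
        self._mtime = None

    def _refresh(self):
        try:
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                # mtime is read *before* listdir(), so a change that lands
                # during listdir() shows up as a newer mtime next time.
                self._entries = frozenset(os.listdir(self.path))
                self._mtime = mtime
        except OSError:
            # Directory missing or unreadable: nothing can be found here.
            self._mtime, self._entries = None, frozenset()

    def contains(self, filename):
        """One stat of the directory instead of one stat per candidate file.

        Set membership is case-exact, so on a case-insensitive filesystem
        this doubles as the listdir()-based case check.
        """
        self._refresh()
        return filename in self._entries

    def read(self, filename):
        """Return the file's bytes, or None on a miss."""
        if not self.contains(filename):
            return None
        try:
            with open(os.path.join(self.path, filename), "rb") as f:
                return f.read()
        except FileNotFoundError:
            # Deleted after the listing was taken: reset and report a miss.
            self.invalidate()
            return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = DirectoryCache(d)
        print(cache.contains("spam.py"))  # False: directory is empty
        with open(os.path.join(d, "spam.py"), "wb") as f:
            f.write(b"x = 1\n")
        cache.invalidate()  # don't depend on mtime granularity in a demo
        print(cache.contains("spam.py"))  # True
        print(cache.contains("Spam.py"))  # False: lookup is case-exact
```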
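PJ Eby points out that trying the idea "needs a path hook, after all". His plan targets 2.x and a pkgutil importer subclass; the sketch here instead assumes Python 3 and its importlib machinery: a hook registered on `sys.path_hooks` that returns a path entry finder with a `find_spec()` method. `ListdirCachingFinder`, `install()` and `uninstall()` are hypothetical names. The finder only handles plain top-level `.py` modules (no packages, bytecode-only or extension modules), and the hook claims exactly one directory so every other `sys.path` entry keeps its normal finder. `importlib.invalidate_caches()` is the standard way to ask cached finders to drop state, which covers the coarse-mtime case by hand.

```python
import importlib
import importlib.machinery
import importlib.util
import os
import sys
import tempfile


class ListdirCachingFinder:
    """Hypothetical path entry finder that caches os.listdir() per directory."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()

    def invalidate_caches(self):
        # importlib.invalidate_caches() calls this on cached path entry finders.
        self._mtime = None

    def find_spec(self, fullname, target=None):
        try:
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                self._entries = frozenset(os.listdir(self.path))
                self._mtime = mtime
        except OSError:
            return None
        filename = fullname.rpartition(".")[2] + ".py"
        if filename not in self._entries:
            return None  # a miss costs no per-file stat() call
        location = os.path.join(self.path, filename)
        loader = importlib.machinery.SourceFileLoader(fullname, location)
        return importlib.util.spec_from_file_location(fullname, location, loader=loader)


def install(directory):
    """Route imports from exactly `directory` through ListdirCachingFinder."""
    def hook(path):
        if path != directory:
            # Raising ImportError lets the next hook in sys.path_hooks try.
            raise ImportError("path not handled by this hook")
        return ListdirCachingFinder(path)

    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.pop(directory, None)  # drop any finder cached earlier
    return hook


def uninstall(directory, hook):
    sys.path_hooks.remove(hook)
    sys.path_importer_cache.pop(directory, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "cached_demo.py"), "w") as f:
            f.write("answer = 42\n")
        hook = install(d)
        sys.path.insert(0, d)
        try:
            mod = importlib.import_module("cached_demo")
            print(mod.answer)  # 42
            print(type(sys.path_importer_cache[d]).__name__)  # ListdirCachingFinder
        finally:
            sys.path.remove(d)
            sys.modules.pop("cached_demo", None)
            uninstall(d, hook)
```

Restricting the hook to one directory keeps the experiment opt-in, in the spirit of Brett's suggestion that this live in importlib "for people to use when desired" rather than change import semantics for everyone.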
More information about the Python-Dev mailing list