Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.4k
gh-107369: optimize textwrap.indent()#107374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good! Usingstr.split
for the predicate instead ofline.strip
might change something for input that is notstr
, but I think this is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
lstrip
is faster for non-indented lines.
I wonder whether the following variants can be faster for some input and for how wide category of input.
defpredicate(line):returnlineand (notline[0].isspace()orline.lstrip())
or
predicate=re.compile(r'\S').search
methane commentedJul 28, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Since we use In case of unicodeobject.c, rstrip is bit faster. But it may be because most lines are indented already. So I chose str.lstrip here, as Serhiy suggested. |
Now that you mention it, I can see that using We want to test whether the line has any non-space character. Algorithmically, |
Uh oh!
There was an error while loading.Please reload this page.
Because
lstrip() is slow when every line has long indent. But With4c6a46a andhttps://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb
If I add
|
To maximize performance, we can stop using lambda by...:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank you for your research Inada-san. Which to use here, lstrip or isspace, I leave up to you. It does not really matter in most cases.
picnixz commentedJul 29, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
For very long texts, I think changing prefixed_lines= []forlineintext.splitlines(True):ifnotline.isspace():prefixed_lines.append(prefix)prefixed_lines.append(line) into the following may improve the overall performances prefixed_lines= []append_line=prefixed_lines.appendforlineintext.splitlines(True):ifnotline.isspace():append_line(prefix)append_line(line) EDIT: After a more careful benchmarking, this does not seem to bring more improvements. However, not using a lambda function seems to be better. |
Uh oh!
There was an error while loading.Please reload this page.
indent()-ing
Object/unicodeobject.c
(15332 lines) about 25% faster.