Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-107369: optimize textwrap.indent()#107374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
methane merged 8 commits intopython:mainfrommethane:opt-textwrap-indent
Jul 29, 2023

Conversation

methane
Copy link
Member

@methanemethane commentedJul 28, 2023
edited by bedevere-bot
Loading

indent()-ingObject/unicodeobject.c (15332 lines) about 25% faster.

gh640 reacted with thumbs up emoji
@methanemethane added performancePerformance or resource usage stdlibPython modules in the Lib dir labelsJul 28, 2023
Copy link
Contributor

@eendebakpteendebakpt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good! Usingstr.split for the predicate instead ofline.strip might change something for input that is notstr, but I think this is ok.

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

lstrip is faster for non-indented lines.

I wonder whether the following variants can be faster for some input and for how wide category of input.

defpredicate(line):returnlineand (notline[0].isspace()orline.lstrip())

or

predicate=re.compile(r'\S').search

@methane
Copy link
MemberAuthor

methane commentedJul 28, 2023
edited
Loading

  • _has_nonspace = re.compile(r'\S').search in global andpredicate = _has_nonspace -- 3.5ms
  • str.rstrip = 1.95ms
  • str.lstrip = 2.03ms
  • lambda x: not x.isspace() = 2.07ms

Since we usesplitlines(keepends=True), we can use justnot x.isspace(). (no empty line is guaranteed."".splitlines(keepends=True) == [] and"foo\n".splitlines(True) == ['foo\n']).
But it is a bit tricky and has relatively high cognitive load.

In case of unicodeobject.c, rstrip is bit faster. But it may be because most lines are indented already.

So I chose str.lstrip here, as Serhiy suggested.

@serhiy-storchaka
Copy link
Member

Now that you mention it, I can see that usingisspace() is the most obvious way to do this. Why I did not see it earlier?

We want to test whether the line has any non-space character.bool(line.strip()) is actually a tricky way -- we strips the line from spaces and if the rest is not empty string, then the original line has non-space characters too.not line.isspace() is a straightforward way -- it asks the opposite question (is the line only contains space characters?) and negates the result.

Algorithmically,isspace() looks more preferable, because it does not create a string. But on practice it may not matter in common cases. Did you compare variants with different inputs? For exampleMisc/NEWS.d/3.8.0a1.rst may show a very different result.

@methane
Copy link
MemberAuthor

Now that you mention it, I can see that usingisspace() is the most obvious way to do this. Why I did not see it earlier?

Because"".isspace() is False. We need to guarantee that "" is not used here.
x and not x.isspace() would be bit obvious, but little slower.

Algorithmically,isspace() looks more preferable, because it does not create a string. But on practice it may not matter in common cases. Did you compare variants with different inputs? For exampleMisc/NEWS.d/3.8.0a1.rst may show a very different result.

lstrip() is slow when every line has long indent. ButMisc/NEWS.d/3.8.0a1.rst has almost no indents.

With4c6a46a andhttps://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb

> ./python.exe bench_indent.py Misc/NEWS.d/3.8.0a1.rstfilename='Misc/NEWS.d/3.8.0a1.rst' 8978 lines.                   lstrip: 0.736msec          not x.isspace(): 0.877msec    x and not x.isspace(): 0.929msec> ./python.exe bench_indent.py Objects/unicodeobject.cfilename='Objects/unicodeobject.c' 15332 lines.                   lstrip: 1.812msec          not x.isspace(): 1.877msec    x and not x.isspace(): 1.970msec

If I addtext = textwrap.indent(text, " "*32) before bench:

> ./python.exe bench_indent.py Objects/unicodeobject.cfilename='Objects/unicodeobject.c' 15332 lines.                   lstrip: 2.259msec          not x.isspace(): 2.356msec    x and not x.isspace(): 2.437msec

@methane
Copy link
MemberAuthor

To maximize performance, we can stop using lambda by...:

    if predicate is None:        for line in text.splitlines(True):            if not line.isspace():                prefixed_lines.append(prefix)            prefixed_lines.append(line)    else:        for line in text.splitlines(True):            if predicate(line):                prefixed_lines.append(prefix)            prefixed_lines.append(line)
filename='Objects/unicodeobject.c' 15332 lines.                     None: 1.604msec                   lstrip: 1.826msec          not x.isspace(): 1.883msec

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for your research Inada-san. Which to use here, lstrip or isspace, I leave up to you. It does not really matter in most cases.

@methanemethaneenabled auto-merge (squash)July 29, 2023 06:03
@methanemethane merged commit37551c9 intopython:mainJul 29, 2023
@methanemethane deleted the opt-textwrap-indent branchJuly 29, 2023 06:37
@picnixz
Copy link
Member

picnixz commentedJul 29, 2023
edited
Loading

For very long texts, I think changing

prefixed_lines= []forlineintext.splitlines(True):ifnotline.isspace():prefixed_lines.append(prefix)prefixed_lines.append(line)

into the following may improve the overall performances

prefixed_lines= []append_line=prefixed_lines.appendforlineintext.splitlines(True):ifnotline.isspace():append_line(prefix)append_line(line)

EDIT: After a more careful benchmarking, this does not seem to bring more improvements. However, not using a lambda function seems to be better.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@eendebakpteendebakpteendebakpt approved these changes

@serhiy-storchakaserhiy-storchakaserhiy-storchaka approved these changes

Assignees
No one assigned
Labels
performancePerformance or resource usagestdlibPython modules in the Lib dir
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@methane@serhiy-storchaka@picnixz@eendebakpt@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp