Commit 75a6fad

gh-91524: Speed up the regular expression substitution (#91525)

Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.

Closes #91524

Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>

1 parent 176b6c5 commit 75a6fad
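
For readers unfamiliar with the term, a "replacement string containing group references" is one that uses `\1`, `\g<1>` or `\g<name>`. The sketch below uses only the public `re` API; the pattern and sample strings are made up for illustration and do not come from the commit.

```python
import re

# Hypothetical pattern, chosen only to demonstrate group references.
date = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

# Numbered group references: the case this commit speeds up.
numbered = date.sub(r'\3/\2/\1', 'released 2022-10-23')

# Named group references take the same code path.
named = date.sub(r'\g<day>/\g<month>/\g<year>', 'released 2022-10-23')

# subn() also reports how many substitutions were made.
result, count = date.subn(r'\g<1>', '2022-10-23 and 2021-01-02')

# A purely literal replacement contains no group references.
plain = date.sub('DATE', 'released 2022-10-23')
```

The first three calls use group references; the last one is a literal replacement, which the commit message does not claim to speed up.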

9 files changed: +358 -91 lines changed


‎Doc/whatsnew/3.12.rst

Lines changed: 5 additions & 0 deletions
@@ -205,6 +205,11 @@ Optimizations
   process, which improves performance by 1-5%.
   (Contributed by Kevin Modzelewski in :gh:`90536`.)
 
+* Speed up the regular expression substitution (functions :func:`re.sub` and
+  :func:`re.subn` and corresponding :class:`re.Pattern` methods) for
+  replacement strings containing group references by 2--3 times.
+  (Contributed by Serhiy Storchaka in :gh:`91524`.)
+
 
 CPython bytecode changes
 ========================
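
One rough way to check the "2--3 times" figure on your own interpreter is to time a substitution that uses group references and run the same script under two Python versions. This is only a sketch: the pattern, input text and repetition count are assumptions, and the absolute numbers depend on the machine.

```python
import re
import timeit

# Assumed sample workload; not taken from the commit or its benchmarks.
pattern = re.compile(r'(\w+)=(\w+)')
text = 'key=value ' * 100

def swap_pairs():
    # Replacement string with two group references.
    return pattern.sub(r'\2=\1', text)

runs = 200
seconds = timeit.timeit(swap_pairs, number=runs)
print(f'{seconds / runs * 1e6:.1f} us per call')
```

Comparing the printed figure between an interpreter built before this commit and one built after it gives a local estimate of the speedup.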

‎Lib/re/__init__.py

Lines changed: 4 additions & 18 deletions
@@ -124,6 +124,7 @@
 import enum
 from . import _compiler, _parser
 import functools
+import _sre
 
 
 # public symbols
@@ -230,7 +231,7 @@ def purge():
     "Clear the regular expression caches"
     _cache.clear()
     _cache2.clear()
-    _compile_repl.cache_clear()
+    _compile_template.cache_clear()
 
 def template(pattern, flags=0):
     "Compile a template pattern, returning a Pattern object, deprecated"
@@ -328,24 +329,9 @@ def _compile(pattern, flags):
     return p
 
 @functools.lru_cache(_MAXCACHE)
-def _compile_repl(repl, pattern):
+def _compile_template(pattern, repl):
     # internal: compile replacement pattern
-    return _parser.parse_template(repl, pattern)
-
-def _expand(pattern, match, template):
-    # internal: Match.expand implementation hook
-    template = _parser.parse_template(template, pattern)
-    return _parser.expand_template(template, match)
-
-def _subx(pattern, template):
-    # internal: Pattern.sub/subn implementation helper
-    template = _compile_repl(template, pattern)
-    if not template[0] and len(template[1]) == 1:
-        # literal replacement
-        return template[1][0]
-    def filter(match, template=template):
-        return _parser.expand_template(template, match)
-    return filter
+    return _sre.template(pattern, _parser.parse_template(repl, pattern))
 
 # register myself for pickling
 
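
The removed `_subx()` helper turned a replacement string with group references into a Python-level `filter(match)` callable. The same effect can be reproduced with the public API alone, which may help explain where the old overhead came from: a Python function call per match. The pattern, text and the name `expand_match` below are hypothetical and used only for illustration.

```python
import re

# Hypothetical pattern and input, for illustration only.
pattern = re.compile(r'(\w+)@(\w+)')
text = 'alice@example bob@test'

# Replacement given as a template string with group references.
via_template = pattern.sub(r'\2:\1', text)

# Roughly what the removed helper arranged: a callable that is invoked
# once per match and expands the template in Python code.
def expand_match(match):
    return match.expand(r'\2:\1')

via_callable = pattern.sub(expand_match, text)
```

Both calls produce the same string. After this commit, the template form is handed to `_sre.template()` rather than being wrapped in a Python callable, as the hunk above shows.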

‎Lib/re/_constants.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 
 # update when constants are added or removed
 
-MAGIC = 20220615
+MAGIC = 20221023
 
 from _sre import MAXREPEAT, MAXGROUPS

‎Lib/re/_parser.py

Lines changed: 16 additions & 29 deletions
@@ -984,24 +984,28 @@ def parse(str, flags=0, state=None):
 
     return p
 
-def parse_template(source, state):
+def parse_template(source, pattern):
     # parse 're' replacement string into list of literals and
     # group references
     s = Tokenizer(source)
     sget = s.get
-    groups = []
-    literals = []
+    result = []
     literal = []
     lappend = literal.append
+    def addliteral():
+        if s.istext:
+            result.append(''.join(literal))
+        else:
+            # The tokenizer implicitly decodes bytes objects as latin-1, we must
+            # therefore re-encode the final representation.
+            result.append(''.join(literal).encode('latin-1'))
+        del literal[:]
     def addgroup(index, pos):
-        if index > state.groups:
+        if index > pattern.groups:
             raise s.error("invalid group reference %d" % index, pos)
-        if literal:
-            literals.append(''.join(literal))
-            del literal[:]
-        groups.append((len(literals), index))
-        literals.append(None)
-    groupindex = state.groupindex
+        addliteral()
+        result.append(index)
+    groupindex = pattern.groupindex
     while True:
         this = sget()
         if this is None:
@@ -1063,22 +1067,5 @@ def addgroup(index, pos):
                 lappend(this)
         else:
             lappend(this)
-    if literal:
-        literals.append(''.join(literal))
-    if not isinstance(source, str):
-        # The tokenizer implicitly decodes bytes objects as latin-1, we must
-        # therefore re-encode the final representation.
-        literals = [None if s is None else s.encode('latin-1') for s in literals]
-    return groups, literals
-
-def expand_template(template, match):
-    g = match.group
-    empty = match.string[:0]
-    groups, literals = template
-    literals = literals[:]
-    try:
-        for index, group in groups:
-            literals[index] = g(group) or empty
-    except IndexError:
-        raise error("invalid group reference %d" % index) from None
-    return empty.join(literals)
+    addliteral()
+    return result
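
The hunks above change the parsed template from a `(groups, literals)` pair into one flat list that alternates literal, group index, literal, and so on. The sketch below models both layouts in plain Python. The two template values are hand-built for the replacement string `r'\2-\1'` by following the parser code shown above; they are not obtained by calling the private parser. `expand_old` mirrors the removed `expand_template()`, while `expand_new` is only an illustrative stand-in for the expansion that now happens in C.

```python
import re

def expand_old(template, match):
    # Mirrors the removed expand_template(): fill the None slots, then join.
    groups, literals = template
    empty = match.string[:0]
    literals = literals[:]
    for slot, group in groups:
        literals[slot] = match.group(group) or empty
    return empty.join(literals)

def expand_new(template, match):
    # Illustrative walk over the flat list: even positions hold literals,
    # odd positions hold group numbers. Not the actual C implementation.
    empty = match.string[:0]
    parts = []
    for position, item in enumerate(template):
        if position % 2 == 0:
            parts.append(item)
        else:
            parts.append(match.group(item) or empty)
    return empty.join(parts)

# Hand-built templates for r'\2-\1' (assumed from reading the diff).
old_template = ([(0, 2), (2, 1)], [None, '-', None])
new_template = ['', 2, '-', 1, '']

m = re.match(r'(\w+) (\w+)', 'hello world')
old_result = expand_old(old_template, m)
new_result = expand_new(new_template, m)
```

Because `addliteral()` runs before every group index and once more at the end, the flat list always starts and ends with a literal, even an empty one. That is why the even/odd walk needs no type checks.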
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Speed up the regular expression substitution (functions :func:`re.sub` and
+:func:`re.subn` and corresponding :class:`re.Pattern` methods) for
+replacement strings containing group references by 2--3 times.

‎Modules/_sre/clinic/sre.c.h

Lines changed: 40 additions & 1 deletion
Some generated files are not rendered by default.
