Issue 846388

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Check for signals during regular expression matches
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: facundobatista
Nosy List: effbot, facundobatista, gvanrossum, joshhoyt, loewis, schmir
Priority: normal
Keywords: patch

Created on 2003-11-21 06:29 by joshhoyt, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name           Uploaded                          Description
_sre.c.patch        joshhoyt, 2003-11-21 06:29        Allow signal handlers to run during _sre searches
patch               schmir, 2007-11-02 21:35
patch.speedup       schmir, 2007-11-02 22:04
sre_exception.diff  facundobatista, 2008-01-04 18:06
Messages (12)
msg44910 - Author: Josh Hoyt (joshhoyt) Date: 2003-11-21 06:29
This patch adds a call to PyErr_CheckSignals to SRE_MATCH so that signal handlers can be invoked during long regular expression matches. It also adds a new error return value indicating that an exception occurred in a signal handler during the match, allowing exceptions in the signal handler to propagate up to the main loop.

Rationale:

Regular expressions can run away inside of the C code. There is no way for Python code to stop the C code from running away, so we attempted to use setrlimit to interrupt the process when the CPU usage exceeded a limit. When the signal was received, the signal function was triggered. The sre code does not allow the main loop to run, so the triggered handlers were not called until the regular expression finished, if ever. This behaviour makes the interruption by the signal useless for the purposes of constraining the running time of regular expression matches.

I am unsure whether PyErr_CheckSignals is lightweight enough to be called inside of the for loop in SRE_MATCH, so I took the conservative approach and only checked on recursion or match invocation. I believe that the performance hit from this check would not be prohibitive inside of the loop, since PyErr_CheckSignals does very little work unless there is a signal to handle.
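The deferral described in this message can be reproduced from Python. The following is a minimal sketch, assuming a Unix platform and code running in the main thread. The names `Timeout` and `run_with_time_limit` are hypothetical, and a wall-clock `ITIMER_REAL` alarm stands in for the `setrlimit` CPU limit mentioned above. The handler raises an exception, which can only surface once the interpreter, or C code that calls PyErr_CheckSignals, gets a chance to run the handler.

```python
import signal


class Timeout(Exception):
    """Raised by the signal handler when the time limit expires."""


def _raise_timeout(signum, frame):
    # Runs only when the interpreter (or C code that checks for
    # signals) gets around to invoking pending Python handlers.
    raise Timeout("time limit exceeded")


def run_with_time_limit(func, seconds):
    """Call func(), raising Timeout if it runs longer than `seconds`.

    Hypothetical helper: Unix only, main thread only.
    """
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)   # restore the old handler
```

A pure-Python busy loop passed to `run_with_time_limit` is interrupted promptly because handlers run between bytecodes. A long-running C function is interrupted only if it checks for signals itself, which is the gap this patch addresses for `_sre`.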
msg44911 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2003-11-22 15:14
Can you give an example of an SRE match that is so slow that the user may press Ctrl-C?
msg44912 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-11-13 23:28
Fredrik, what do you think about this patch?
msg57056 - Author: Ralf Schmitt (schmir) Date: 2007-11-02 16:51
Here is an example (from http://swtch.com/~rsc/regexp/regexp1.html):

    python -c 'import re; num=25; r=re.compile("a?"*num+"a"*num); r.match("a"*num)'

At work I have seen a real-world case of a regular expression which ran for minutes, rendering the application unresponsive. We would have been glad to be able to interrupt the regular expression engine and get a traceback.
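To see how quickly this pattern degrades without committing to a multi-second run, the size can be swept over small values. This is a rough sketch. The helper names `pathological` and `time_match` are made up for illustration, and the timings depend entirely on the machine and the Python version.

```python
import re
import time


def pathological(num):
    """Return (pattern, text) for the "a?"*num + "a"*num example."""
    return "a?" * num + "a" * num, "a" * num


def time_match(num):
    """Return (match, seconds) for one match attempt at size `num`."""
    pattern, text = pathological(num)
    compiled = re.compile(pattern)
    start = time.perf_counter()
    match = compiled.match(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes only: with a backtracking engine the running time
    # grows roughly exponentially with num.
    for num in range(10, 19, 2):
        match, seconds = time_match(num)
        print(num, match is not None, "%.4f s" % seconds)
```

The match always succeeds, because every `a?` can match the empty string. The cost comes from the engine trying the other alternatives first.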
msg57067 - Author: Ralf Schmitt (schmir) Date: 2007-11-02 21:35
I'm attaching a working patch against 2.5.1 and a short test program.

    #! /usr/bin/env python

    import signal
    import re
    import time

    def main():
        num=28 # need more than 60s on a 2.4Ghz core 2
        r=re.compile("a?"*num+"a"*num)
        signal.signal(signal.SIGALRM, signal.default_int_handler)
        signal.alarm(1)
        stime = time.time()
        try:
            r.match("a"*num)
        except KeyboardInterrupt:
            assert time.time()-stime<3
        else:
            raise RuntimeError("no keyboard interrupt")

    if __name__=='__main__':
        main()
msg57068 - Author: Ralf Schmitt (schmir) Date: 2007-11-02 22:04
Hm. Just noticed that calling PyErr_CheckSignals slows down regular expression matching considerably (50%).

I'm adding another patch, which only checks every 4096th iteration for signals.
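The throttling idea can be illustrated outside of C. The sketch below is a Python model, not the patch itself. It assumes a counter named `sigcount` that is incremented once per iteration and tested against a bit mask, and it counts how often the expensive check would run. `count_signal_checks` and `CHECK_MASK` are hypothetical names.

```python
CHECK_MASK = 0xFFF  # 4096 - 1: the low 12 bits are zero once every 4096 counts


def count_signal_checks(iterations, mask=CHECK_MASK):
    """Count how many times the throttled check fires in `iterations` steps."""
    checks = 0
    sigcount = 0
    for _ in range(iterations):
        sigcount += 1
        if (sigcount & mask) == 0:
            # In the C code this is where PyErr_CheckSignals() would be called.
            checks += 1
    return checks
```

With the default mask, 8192 iterations trigger two checks. The per-iteration cost is then an increment and a bitwise test rather than a function call, and a wider mask makes the checks rarer still.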
msg59246 - Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-01-04 18:06
Couldn't apply the patch cleanly, as it appears to be a diff in another format.

Anyway, I applied it by hand, and now I attach the correct svn diff.

The test cases run ok with this change, and the problem is solved.

Regarding the delay introduced, I tested it with:

  $ ./python timeit.py -s "import re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')"

Trunk:
  100000 loops, best of 3: 5.4 usec per loop
  100000 loops, best of 3: 5.32 usec per loop
  100000 loops, best of 3: 5.41 usec per loop

Patch applied:
  100000 loops, best of 3: 7.28 usec per loop
  100000 loops, best of 3: 6.79 usec per loop
  100000 loops, best of 3: 7.00 usec per loop

I don't like that. Anyway, I do NOT trust the system where I'm taking these timings, so you may get different results.

Suggestions?
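The same measurement can be scripted through the `timeit` module instead of the command line, which makes it easier to repeat on several machines. This is a sketch under the assumption that the stock `re` module of the interpreter being tested is the one imported. `best_usec_per_loop` is a hypothetical helper name.

```python
import timeit


def best_usec_per_loop(number=100000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    setup = "import re; r = re.compile('a?a?a?a?a?aaaaa')"
    stmt = "r.match('aaaaa')"
    # timeit.repeat returns one total time (in seconds) per run.
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    print("best of 3: %.2f usec per loop" % best_usec_per_loop())
```

Taking the minimum of several runs mirrors the "best of 3" figure that the command-line tool prints.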
msg59247 - Author: Ralf Schmitt (schmir) Date: 2008-01-04 18:48
./python Lib/timeit.py -n 1000000 -s "import re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')" gives me for

Trunk:
1000000 loops, best of 3: 3.02 usec per loop
1000000 loops, best of 3: 2.99 usec per loop
1000000 loops, best of 3: 3.01 usec per loop

Patched:
1000000 loops, best of 3: 3.04 usec per loop
1000000 loops, best of 3: 3.04 usec per loop
1000000 loops, best of 3: 3.14 usec per loop

which would be ok, I guess. (This is on a 64bit debian testing with gcc 4.2.3).

Can you test with the following:

    if ((0 == (sigcount & 0xffffffff)) && PyErr_CheckSignals())

(i.e. the code will (nearly) not even call PyErr_CheckSignals).

I guess this is some C compiler optimization issue (seems like mine does a better job at optimizing :)
msg59267 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-04 23:11
Mind if I assign this to Facundo? Facundo, if you wish to pass this on,just unassign it.
msg59600 - Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-01-09 14:22
Retried it on a platform where I trust timing, and it proved ok.

So, problem solved, no performance impact, all tests pass ok. Committed in r59862.

Thank you all!
msg61925 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-31 20:07
I think this is worth backporting to 2.5.2. This and r60054 are the *only* differences between _sre.c in 2.5.2 and 2.6.
msg62062 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-02-05 04:13
Backported to 2.5.2 as r60576. (The other deltas are not backported.)
History
Date                 User              Action  Args
2022-04-11 14:56:01  admin             set     github: 39573
2008-02-05 04:13:26  gvanrossum        set     messages: +msg62062
2008-01-31 20:07:17  gvanrossum        set     messages: +msg61925
2008-01-10 01:55:51  christian.heimes  link    issue1448325 superseder
2008-01-09 14:22:46  facundobatista    set     status: open -> closed
                                               resolution: fixed
                                               messages: +msg59600
2008-01-04 23:11:28  gvanrossum        set     assignee: effbot -> facundobatista
                                               messages: +msg59267
                                               nosy: +gvanrossum
2008-01-04 18:48:22  schmir            set     messages: +msg59247
2008-01-04 18:06:30  facundobatista    set     files: +sre_exception.diff
                                               nosy: +facundobatista
                                               type: enhancement ->
                                               messages: +msg59246
                                               versions: + Python 2.4, - Python 2.6, Python 2.5
2008-01-04 15:49:29  christian.heimes  set     type: enhancement
                                               versions: + Python 2.6, Python 2.5, - Python 2.4
2007-11-02 22:04:29  schmir            set     files: +patch.speedup
                                               messages: +msg57068
2007-11-02 21:35:40  schmir            set     files: +patch
                                               messages: +msg57067
2007-11-02 16:51:27  schmir            set     nosy: +schmir
                                               messages: +msg57056
2003-11-21 06:29:21  joshhoyt          create
