
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2003-11-21 06:29 by joshhoyt, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| _sre.c.patch | joshhoyt, 2003-11-21 06:29 | Allow signal handlers to run during _sre searches |
| patch | schmir, 2007-11-02 21:35 | |
| patch.speedup | schmir, 2007-11-02 22:04 | |
| sre_exception.diff | facundobatista, 2008-01-04 18:06 | |
| Messages (12) | |||
|---|---|---|---|
| msg44910 | Author: Josh Hoyt (joshhoyt) | Date: 2003-11-21 06:29 | |

This patch adds a call to PyErr_CheckSignals to SRE_MATCH so that signal handlers can be invoked during long regular expression matches. It also adds a new error return value indicating that an exception occurred in a signal handler during the match, allowing exceptions in the signal handler to propagate up to the main loop.

Rationale:

Regular expressions can run away inside of the C code. There is no way for Python code to stop the C code from running away, so we attempted to use setrlimit to interrupt the process when the CPU usage exceeded a limit. When the signal was received, the signal function was triggered. The sre code does not allow the main loop to run, so the triggered handlers were not called until the regular expression finished, if ever. This behaviour makes the interruption by the signal useless for the purposes of constraining the running time of regular expression matches.

I am unsure whether PyErr_CheckSignals is lightweight enough to be called inside of the for loop in SRE_MATCH, so I took the conservative approach and only checked on recursion or match invocation. I believe that the performance hit from this check would not be prohibitive inside of the loop, since PyErr_CheckSignals does very little work unless there is a signal to handle.
| msg44911 | Author: Martin v. Löwis (loewis) | Date: 2003-11-22 15:14 | |

Logged In: YES user_id=21627

Can you give an example of an SRE match that is so slow that the user may press Ctrl-C?
| msg44912 | Author: Martin v. Löwis (loewis) | Date: 2006-11-13 23:28 | |

Logged In: YES user_id=21627

Fredrik, what do you think about this patch?
| msg57056 | Author: Ralf Schmitt (schmir) | Date: 2007-11-02 16:51 | |

Here is an example (from http://swtch.com/~rsc/regexp/regexp1.html):

    python -c 'import re; num=25; r=re.compile("a?"*num+"a"*num); r.match("a"*num)'

At work I have seen a real-world case of a regular expression which ran for minutes, rendering the application unresponsive. We would have been glad to be able to interrupt the regular expression engine and get a traceback.
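
To see how quickly this pattern blows up, the following sketch times the same expression for a few small values of `num`. The helper name `time_match` and the chosen range are illustrative assumptions, not part of the original report; absolute timings depend on the machine and the Python version.

```python
import re
import time


def time_match(num):
    """Return (match, seconds) for the pathological pattern of size num."""
    # Same construction as the one-liner in the report: num optional 'a's
    # followed by num mandatory 'a's, matched against exactly num 'a's.
    pattern = re.compile("a?" * num + "a" * num)
    start = time.perf_counter()
    match = pattern.match("a" * num)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Keep num small: in a backtracking engine each step up in num
    # tends to roughly double the running time.
    for num in range(10, 19, 2):
        match, seconds = time_match(num)
        print(f"num={num:2d} matched={match is not None} {seconds:.4f}s")
```

The match always succeeds; only the time needed to find it grows, which is why the process appears to hang rather than fail.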
| msg57067 | Author: Ralf Schmitt (schmir) | Date: 2007-11-02 21:35 | |

I'm attaching a working patch against 2.5.1 and a short test program.

    #! /usr/bin/env python

    import signal
    import re
    import time

    def main():
        num=28 # need more than 60s on a 2.4Ghz core 2
        r=re.compile("a?"*num+"a"*num)
        signal.signal(signal.SIGALRM, signal.default_int_handler)
        signal.alarm(1)
        stime = time.time()
        try:
            r.match("a"*num)
        except KeyboardInterrupt:
            assert time.time()-stime<3
        else:
            raise RuntimeError("no keyboard interrupt")

    if __name__=='__main__':
        main()
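
Once the matching loop checks for signals, the same mechanism can bound the running time of a single match. The sketch below is a Python 3 adaptation of the test program above, not code from the patch. The helper name `match_with_timeout` and the use of `signal.setitimer` are assumptions made for illustration, and the approach only works on Unix, in the main thread.

```python
import re
import signal


def match_with_timeout(pattern, text, seconds):
    """Run pattern.match(text), raising TimeoutError after `seconds`.

    Hypothetical helper: Unix only, main thread only, and it relies on
    the regex engine polling for signals while it runs.
    """
    def on_alarm(signum, frame):
        raise TimeoutError("regular expression match timed out")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return pattern.match(text)
    finally:
        # Cancel the timer and restore whatever handler was installed.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    num = 28
    slow = re.compile("a?" * num + "a" * num)
    try:
        match_with_timeout(slow, "a" * num, 0.5)
    except TimeoutError as exc:
        print("interrupted:", exc)
```

Raising from the handler is what turns the signal into an exception at the call site; without a signal check inside the engine, the handler would only run after the match had finished.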
| msg57068 | Author: Ralf Schmitt (schmir) | Date: 2007-11-02 22:04 | |

hm. just noticed that calling PyErr_CheckSignals slows down regular expression matching considerably (50%).

I'm adding another patch, which only checks every 4096th iteration for signals.
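
The idea behind the second patch can be illustrated outside of C: keep a counter in the hot loop and only call the expensive check when the counter's low bits are all zero. This is a minimal Python sketch of that technique, not the patch itself; `run_loop`, `poll`, and the `0xFFF` mask (every 4096th iteration) are names and values chosen for illustration.

```python
def run_loop(iterations, poll, mask=0xFFF):
    """Run a loop, calling poll() only when (counter & mask) == 0.

    poll stands in for an expensive check such as PyErr_CheckSignals;
    a true return value means "stop now".
    Returns the number of iterations actually completed.
    """
    counter = 0
    for done in range(iterations):
        counter += 1
        # Cheap test first; poll() runs once every mask + 1 iterations.
        if (counter & mask) == 0 and poll():
            return done
        # ... one unit of real work would go here ...
    return iterations


if __name__ == "__main__":
    calls = []
    completed = run_loop(10_000, lambda: calls.append(1) or False)
    print(completed, len(calls))
```

The trade-off is latency against overhead: a larger mask makes the check cheaper on average but delays the reaction to a pending signal by up to `mask + 1` iterations.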
| msg59246 | Author: Facundo Batista (facundobatista) | Date: 2008-01-04 18:06 | |

Couldn't apply the patch cleanly, as it appears to be a diff in another format. Anyway, I applied it by hand, and now I attach the correct svn diff.

The test cases run ok with this change, and the problem is solved.

Regarding the delay introduced, I tested it with:

    $ ./python timeit.py -s "import re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')"

Trunk:

    100000 loops, best of 3: 5.4 usec per loop
    100000 loops, best of 3: 5.32 usec per loop
    100000 loops, best of 3: 5.41 usec per loop

Patch applied:

    100000 loops, best of 3: 7.28 usec per loop
    100000 loops, best of 3: 6.79 usec per loop
    100000 loops, best of 3: 7.00 usec per loop

I don't like that. Anyway, I do NOT trust the system where I'm doing the timing, so you may get different results.

Suggestions?
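
The same measurement can be scripted with the `timeit` module instead of the command line, which makes it easier to repeat on trunk and on a patched build. This is a sketch under the assumption that a best-of-N figure in microseconds per loop is what is being compared; `best_usec_per_loop` is a hypothetical helper name.

```python
import timeit


def best_usec_per_loop(stmt, setup, number=100_000, repeat=3):
    """Return the best time per loop in microseconds, like 'best of 3'."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # Each total covers `number` executions; convert seconds to usec.
    return min(totals) / number * 1e6


if __name__ == "__main__":
    usec = best_usec_per_loop(
        "r.match('aaaaa')",
        "import re; r = re.compile('a?a?a?a?a?aaaaa')",
    )
    print(f"{usec:.2f} usec per loop")
```

Taking the minimum over several repeats reduces the influence of background load, which is the kind of noise that makes timings on a busy machine hard to trust.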
| msg59247 | Author: Ralf Schmitt (schmir) | Date: 2008-01-04 18:48 | |

    ./python Lib/timeit.py -n 1000000 -s "import re;r=re.compile('a?a?a?a?a?aaaaa')" "r.match('aaaaa')"

gives me for Trunk:

    1000000 loops, best of 3: 3.02 usec per loop
    1000000 loops, best of 3: 2.99 usec per loop
    1000000 loops, best of 3: 3.01 usec per loop

Patched:

    1000000 loops, best of 3: 3.04 usec per loop
    1000000 loops, best of 3: 3.04 usec per loop
    1000000 loops, best of 3: 3.14 usec per loop

which would be ok, I guess. (This is on a 64bit debian testing with gcc 4.2.3.)

Can you test with the following:

    if ((0 == (sigcount & 0xffffffff)) && PyErr_CheckSignals())

(i.e. the code will (nearly) not even call PyErr_CheckSignals).

I guess this is some C compiler optimization issue (seems like mine does a better job at optimizing :)
| msg59267 | Author: Guido van Rossum (gvanrossum) | Date: 2008-01-04 23:11 | |

Mind if I assign this to Facundo? Facundo, if you wish to pass this on, just unassign it.
| msg59600 | Author: Facundo Batista (facundobatista) | Date: 2008-01-09 14:22 | |

Retried it on a platform where I trust the timing, and it proved ok.

So, problem solved, no performance impact, all tests pass ok. Committed in r59862.

Thank you all!
| msg61925 | Author: Guido van Rossum (gvanrossum) | Date: 2008-01-31 20:07 | |

I think this is worth backporting to 2.5.2. This and r60054 are the *only* differences between _sre.c in 2.5.2 and 2.6.
| msg62062 | Author: Guido van Rossum (gvanrossum) | Date: 2008-02-05 04:13 | |

Backported to 2.5.2 as r60576. (The other deltas are not backported.)
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:01 | admin | set | github: 39573 |
| 2008-02-05 04:13:26 | gvanrossum | set | messages: +msg62062 |
| 2008-01-31 20:07:17 | gvanrossum | set | messages: +msg61925 |
| 2008-01-10 01:55:51 | christian.heimes | link | issue1448325 superseder |
| 2008-01-09 14:22:46 | facundobatista | set | status: open -> closed resolution: fixed messages: +msg59600 |
| 2008-01-04 23:11:28 | gvanrossum | set | assignee: effbot -> facundobatista messages: +msg59267 nosy: +gvanrossum |
| 2008-01-04 18:48:22 | schmir | set | messages: +msg59247 |
| 2008-01-04 18:06:30 | facundobatista | set | files: +sre_exception.diff nosy: +facundobatista type: enhancement -> messages: +msg59246 versions: + Python 2.4, - Python 2.6, Python 2.5 |
| 2008-01-04 15:49:29 | christian.heimes | set | type: enhancement versions: + Python 2.6, Python 2.5, - Python 2.4 |
| 2007-11-02 22:04:29 | schmir | set | files: +patch.speedup messages: +msg57068 |
| 2007-11-02 21:35:40 | schmir | set | files: +patch messages: +msg57067 |
| 2007-11-02 16:51:27 | schmir | set | nosy: +schmir messages: +msg57056 |
| 2003-11-21 06:29:21 | joshhoyt | create | |