Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork7k
Speed up ProhibitSurrogateCharactersValidator#9492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
For the record, here's the benchmark: importtimeitfromrest_framework.validatorsimportProhibitSurrogateCharactersValidatorvalidator=ProhibitSurrogateCharactersValidator()foriinrange(0,9):length=10**istring="a"*lengthtimer=timeit.Timer("validator(string)",globals=globals())loops,duration=timer.autorange()print(f"{length}\t{duration/loops}") |
Okay, so this is a good exercise case... It's a neat and properly defined little PR that'sclearly an improvement. Well... because having a steadfast "no" policy is really uncomplicated, and means we don't have continual PR pressure & creep, whichon balance just results in unnecessary busy-work and risk. I'd suggest we need to be more clear in our language in Suggested language might. be...
Yes, there's clearly areas where there's potential performance impacts. There's alsoplenty of areas where there's loosely defined behaviour. However being really clear that this is essentiallya finished project (excepting the points above) would probably be a net benefit. (The alternative is similar to above, but we allow ourselvessome leeway with a "We'll only consider pull requests that meet an unambiguously high bar of quality". 🤔) |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I've noticed that this validator is using a per-character loop. Replacing itwith a regex results in a pretty significant speedup. Here are results frommy benchmark: String length Old implementation New implementation time (sec) time (sec) 1 2.833e-07 1.765e-07 10 5.885e-07 2.030e-07 100 3.598e-06 4.144e-07 1000 3.329e-05 2.463e-06 10000 0.0003338 2.449e-05 100000 0.003338 0.0002284 1000000 0.03333 0.002278 10000000 0.3389 0.02377 100000000 3.250 0.2365For large strings, the speedups are more than an order of magnitude.
I have rebased this in case you change your mind. If not, just close the PR. FWIW, thecurrent wording in
Since this is neither a new feature nor a code formatting change, it seems like it should be in scope. Of course, it's your project, so I understand if you want to reject it anyway. |
Description
I've noticed that this validator is using a per-character loop. Replacing it with a regex results in a pretty significant speedup. Here are results from my benchmark:
For large strings, the speedups are more than an order of magnitude.