Added a rate limiter for load reduction on the website #52


Open
Bash- wants to merge 3 commits into c4software:master from Bash-:master

Conversation

@Bash-

We found that websites considered the scraper too resource-intensive, so I added this configurable rate limiter to make it possible to decrease the number of requests per time period.

@c4software
Owner

Hi,

It's a nice addition; however, ratelimit does not seem to be part of the standard library. Is it an additional library (https://pypi.org/project/ratelimit/)?

@c4software self-requested a review on December 13, 2018 at 15:58
@Bash-
Author

> Hi,
>
> It's a nice addition; however, ratelimit does not seem to be part of the standard library. Is it an additional library (https://pypi.org/project/ratelimit/)?

Thank you. Yes, that is indeed the library that is needed.

crawler_user_agent = 'Sitemap crawler'

number_calls = 1    # number of requests per call period
call_period = 15    # time in seconds per number of requests
Contributor

The default should be no rate limiting, right?

Author

No limit would be in line with how the original code works, yes. I could not find a default option in the ratelimit package; as a workaround, you could set the default to a very high number, though.

Contributor

One possibility would be having something like this in crawler.py:

if number_calls is None or call_period is None:
    rate_limit_decorator = lambda func: func
else:
    rate_limit_decorator = limits(calls=config.number_calls, period=config.call_period)

I haven't dealt with @sleep_and_retry, but I believe it should be possible to combine that into rate_limit_decorator. Of course, this strategy comes at the cost of increasing the complexity of the code.
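To make the shape of this concrete, here is a self-contained sketch of the conditional-decorator idea. To stay runnable without the ratelimit package, it uses a hand-rolled sliding-window limiter as a stand-in for limits + sleep_and_retry; the make_rate_limiter name is hypothetical, and the number_calls / call_period parameters follow the config names above.

```python
import time

def make_rate_limiter(number_calls=None, call_period=None):
    """Return a decorator allowing at most `number_calls` calls per
    `call_period` seconds, or a no-op decorator if either is None."""
    if number_calls is None or call_period is None:
        return lambda func: func  # no rate limiting: original behaviour

    call_times = []  # monotonic timestamps of calls in the current window

    def rate_limit_decorator(func):
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # discard timestamps that have left the window
            while call_times and now - call_times[0] >= call_period:
                call_times.pop(0)
            if len(call_times) >= number_calls:
                # wait until the oldest call in the window expires
                time.sleep(call_period - (now - call_times[0]))
            call_times.append(time.monotonic())
            return func(*args, **kwargs)
        return wrapper

    return rate_limit_decorator
```

With the ratelimit package installed, the else branch could presumably instead build the decorator as `lambda func: sleep_and_retry(limits(calls=number_calls, period=call_period)(func))`, which I believe is the combination discussed above.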

@Garrett-R
Contributor

Coincidentally, I'm adding a requirements.txt file in this PR. If that is deemed desirable and the PR is merged, you could add a line to that file:

ratelimit==2.2.1

Without this, I reckon this PR can't be merged; otherwise, it won't work for people who don't happen to have ratelimit pre-installed.

@Bash-
Author

@Garrett-R I think it still would be a nice addition
@c4software Shall we include the ratelimiter?

@c4software
Owner

Hi,

Rate limiting is a good addition, but I'm not a big fan of a third-party library. Not because it's a third-party library as such, but because I really like the idea of a « bare metal » tool.

What do you think, @Garrett-R @Bash-? Is having to rely on a third-party library not a problem for you?

@Garrett-R
Contributor

Yeah, it's an interesting point, and I wasn't sure what you'd think of introducing a requirements.txt file. There's definitely some benefit to having no dependencies: going from 0 to 1 dependencies is a big difference (while going from 1 to 2 is not). I think it's really up to you and your vision for this already successful project.

A couple of factors I'd weigh if I were you:

  • Watch out for "boiling the frog". Given the project currently has 0 dependencies, adding the first one might not seem worth it for this particular change, but if that decision is made multiple times, the changes in aggregate may well be worth adding dependencies for.
  • It becomes a bit harder to build the repo from source and to use / contribute to it (favors not adding dependencies).
  • How many more dependencies might we benefit from? If these are roughly the last two, then we have an idea of the total benefit of adding dependencies. On the other hand, if we think there could be more in the future, accepting dependencies now will make those contributions easier. I don't have a good sense here.
  • How hard is it to do without the dependencies? (It definitely seems doable to make both this change and my change without them, BTW.)

On the point about how tough the package is to use for folks, I actually think we should put this on PyPI (I made an issue here). In that case folks would just have to do:

pip install python-sitemap

and the extra dependencies will be handled automatically.
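For what it's worth, declaring the dependency in the package metadata is what lets pip do that automatically. A minimal sketch of the idea as a pyproject.toml fragment (all metadata here is hypothetical, not taken from the actual repository; the pinned version follows the requirements.txt line above):

```toml
# hypothetical pyproject.toml sketch
[project]
name = "python-sitemap"
version = "0.0.1"
dependencies = [
    "ratelimit==2.2.1",  # pulled in automatically by `pip install python-sitemap`
]
```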

I'd probably cast my vote for having dependencies, but I can definitely see benefits to both approaches, so I support whatever you think is best.

@Garrett-R
Contributor

You know, in light of the xz backdoor, I have a new appreciation for avoiding dependencies...

Maybe close this one?


