Description
I am deploying Crawlee in a Kubernetes pod. The pod is repeatedly OOMKilled because Crawlee keeps increasing the desired concurrency. I don't want to lower `max_concurrency`: I am crawling some domains that are very lightweight alongside others that aren't, and I'd like Crawlee to maximize throughput. I could also give the pod more RAM, but I think there is an underlying issue that would resurface later (or I would simply underuse my resources).
I am seeing this log, which makes me suspect that Crawlee doesn't actually know how much memory and CPU it is using:

```
current_concurrency = 21; desired_concurrency = 21; cpu = 0.0; mem = 0.0; event_loop = 0.148; client_info = 0.0
```
CPU is not a big problem, because the kernel throttles CPU for this pod, but memory is a hard limit and Kubernetes kills the pod when it is exceeded.
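One way to cross-check the `mem = 0.0` reading is to look at what the container's own cgroup reports, independently of Crawlee. This is a minimal sketch, assuming cgroup v2 mounted at `/sys/fs/cgroup` inside the container (under cgroup v1 the files are `memory.usage_in_bytes` and `memory.limit_in_bytes` instead). The helper names are hypothetical, and this is not how Crawlee itself measures memory.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_cgroup_value(raw: str) -> Optional[int]:
    """Parse a cgroup v2 memory file value; the literal 'max' means no limit."""
    value = raw.strip()
    if value == "max":
        return None
    return int(value)


def read_cgroup_memory(root: str = "/sys/fs/cgroup") -> Tuple[Optional[int], Optional[int]]:
    """Return (usage_bytes, limit_bytes) read from a cgroup v2 directory.

    Assumes the cgroup v2 unified hierarchy is mounted at `root`.
    A value is None if its file is missing or the limit is 'max'.
    """
    base = Path(root)
    usage: Optional[int] = None
    limit: Optional[int] = None
    current = base / "memory.current"  # current memory usage in bytes
    maximum = base / "memory.max"      # hard limit in bytes, or 'max'
    if current.is_file():
        usage = parse_cgroup_value(current.read_text())
    if maximum.is_file():
        limit = parse_cgroup_value(maximum.read_text())
    return usage, limit


if __name__ == "__main__":
    usage, limit = read_cgroup_memory()
    if usage is not None and limit:
        print(f"cgroup memory: {usage / 2**20:.0f} MiB of {limit / 2**20:.0f} MiB ({usage / limit:.1%})")
    else:
        print(f"cgroup memory: usage={usage} limit={limit}")
```

If this shows usage climbing toward the limit while the crawler still logs `mem = 0.0`, that would support the suspicion that the memory measurement does not reflect the container.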
For more context, I am using the adaptive Playwright crawler with BeautifulSoup and headless Firefox, and my concurrency settings are:

```python
concurrency_settings = ConcurrencySettings(max_concurrency=45, desired_concurrency=10, min_concurrency=10)
```
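To illustrate why a memory reading stuck at 0.0 would let concurrency grow unchecked, here is a deliberately simplified autoscaler model. It is a hypothetical sketch, not Crawlee's `AutoscaledPool`: it treats each logged value as the fraction of recent samples in which that resource was overloaded (an assumption about what the log fields mean), scales up by one task when every value is below a single hypothetical threshold of 0.5, and scales down by one otherwise. The 400 MiB per concurrent task and the 70% overload line are made-up numbers; the concurrency bounds, the `event_loop = 0.148` value, and the 12 GiB limit come from this issue.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Concurrency:
    # Bounds taken from the ConcurrencySettings reported in this issue.
    min_concurrency: int = 10
    desired_concurrency: int = 10
    max_concurrency: int = 45


def step(c: Concurrency, cpu: float, mem: float, event_loop: float, threshold: float = 0.5) -> Concurrency:
    """One tick of a toy autoscaler (hypothetical, not Crawlee's algorithm)."""
    if max(cpu, mem, event_loop) < threshold:
        # Every resource looks idle enough: try one more concurrent task.
        desired = min(c.desired_concurrency + 1, c.max_concurrency)
    else:
        # Something is overloaded: back off by one task.
        desired = max(c.desired_concurrency - 1, c.min_concurrency)
    return replace(c, desired_concurrency=desired)


def simulate(ticks: int, mem_visible: bool, per_task_mib: int = 400, limit_mib: int = 12 * 1024) -> int:
    """Run the toy autoscaler; memory use grows linearly with concurrency."""
    c = Concurrency()
    for _ in range(ticks):
        used_mib = c.desired_concurrency * per_task_mib
        overloaded = used_mib > 0.7 * limit_mib
        # If memory usage is invisible to the autoscaler, it always reads 0.0.
        mem = (1.0 if overloaded else 0.0) if mem_visible else 0.0
        c = step(c, cpu=0.0, mem=mem, event_loop=0.148)
    return c.desired_concurrency


if __name__ == "__main__":
    print("memory invisible:", simulate(100, mem_visible=False))
    print("memory visible:  ", simulate(100, mem_visible=True))
```

With these made-up numbers the invisible-memory run climbs to the cap of 45, while the visible-memory run oscillates between 21 and 22. At 400 MiB per task, 45 tasks would need about 17.6 GiB, well above a 12Gi limit, which is the kind of overshoot that ends in an OOMKill.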
and I am giving the pod these resources:

```yaml
resources:
  limits:
    cpu: "12"
    memory: "12Gi"
  requests:
    cpu: "8"
    memory: "8Gi"
```
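Kubernetes memory quantities use binary suffixes (`Gi` = 2^30 bytes) as well as decimal ones (`G` = 10^9 bytes), so when comparing the pod limit with a value a process reports in bytes or MiB it helps to convert explicitly. A minimal sketch with a hypothetical helper name; it only handles a plain number with an optional suffix (no exponent notation and no milli `m` suffix).

```python
import re
from decimal import Decimal

# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes used by Kubernetes quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "": 1,
}

_QUANTITY = re.compile(r"([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def memory_quantity_to_bytes(quantity: str) -> int:
    """Convert a simple Kubernetes memory quantity such as '12Gi' to bytes."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # Decimal avoids float rounding for fractional values like '1.5Gi'.
    return int(Decimal(number) * _SUFFIXES[suffix or ""])


if __name__ == "__main__":
    limit = memory_quantity_to_bytes("12Gi")
    print(limit, "bytes =", limit // 2**20, "MiB")
```

Under this sketch, the `12Gi` limit comes out to 12,884,901,888 bytes (12,288 MiB) and the `8Gi` request to 8,589,934,592 bytes.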