Help with 503 upstream timeouts on Kubernetes + FastAPI (HPA not scaling) #14417
Example Code

Description

Hey all! My setup:
Running with:
The application does not handle bursts of requests well, yet it never reaches enough resource usage to trigger HPA scaling, which leads to the 503s.

- What is the recommended approach for handling sync endpoints that may block workers but don't consume enough CPU/memory to trigger the HPA?
- Are there any recommended Gunicorn/Uvicorn worker configurations? I understand from the documentation that when working with pods, the application should run with the fewest workers and be scaled up or down based on workload.
- Are there known best practices for HPA when applications are I/O bound but not CPU bound (so autoscaling doesn't trigger)?

Any help or tips are more than welcome. Thanks!

Operating System
Windows

Operating System Details
No response

FastAPI Version
0.79.0

Pydantic Version
1.10.24

Python Version
3.11.12

Additional Context
No response
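For reference, this is the deployment pattern from the FastAPI container docs that my worker question is about (the module path `app.main:app` is a placeholder for my actual app):

```shell
# One Uvicorn process per pod; let Kubernetes replicate pods
# instead of multiplying Gunicorn workers inside one container.
uvicorn app.main:app --host 0.0.0.0 --port 8000

# If Gunicorn is used anyway, keep the worker count low and rely
# on the HPA for scale-out:
gunicorn app.main:app -k uvicorn.workers.UvicornWorker --workers 1 --bind 0.0.0.0:8000
```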
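On the I/O-bound HPA question, here is a sketch of what I've been considering: an `autoscaling/v2` HPA keyed to a per-pod request-rate metric instead of CPU. This assumes a metrics pipeline (e.g. Prometheus Adapter) exposing a per-pod series; the metric name `http_requests_per_second`, the Deployment name, and the target value are all hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app            # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "50"     # scale out above ~50 req/s per pod
```

Is scaling on request rate (or another saturation signal) the recommended route here, or is there a simpler option I'm missing?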
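To make the failure mode concrete, here is a minimal stdlib-only sketch of why blocking sync endpoints cap throughput without raising CPU usage. FastAPI runs plain `def` endpoints on a shared threadpool (around 40 threads by default via AnyIO); the pool size and latency below are made-up numbers for illustration, not my real configuration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4        # stand-in for the worker threadpool
CALL_LATENCY = 0.1   # stand-in for a slow blocking upstream call

def blocking_endpoint(_):
    # Blocks a thread while waiting, but burns almost no CPU,
    # so a CPU-based HPA never sees any pressure.
    time.sleep(CALL_LATENCY)
    return "ok"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(blocking_endpoint, range(12)))
elapsed = time.perf_counter() - start

# 12 requests on 4 threads run in 3 "waves" of ~0.1 s each, so
# throughput is capped at pool_size / latency requests per second;
# anything queued beyond that can exceed the ingress timeout (503).
print(f"{len(results)} requests in {elapsed:.2f}s")
```

Under load, requests beyond the pool's capacity just queue, which matches the 503 upstream timeouts I'm seeing.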