Help with 503 upstream timeouts on Kubernetes + FastAPI (HPA not scaling)#14417

Unanswered
gcarrascoro asked this question in Questions

First Check

  • I added a very descriptive title here.
  • I used the GitHub search to find a similar question and didn't find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google "How to X in FastAPI" and didn't find any information.
  • I already read and followed all the tutorial in the docs and didn't find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

.

Description

Hey all!
I’m running a FastAPI application on Kubernetes and encountering 503 upstream timeout errors under load. I suspect my issue is related to a mismatch between sync endpoints, worker capacity, and autoscaling not triggering (HPA).

My setup:

  • FastAPI app
  • Endpoints are currently sync (def)

Running with:

  • autoscaling/v2
  • Ingress: projectcontour.io/v1 (Contour Ingress)

The application does not handle bursts of requests well, yet resource usage never rises high enough to trigger HPA scaling, which leads to the 503s.

What’s the recommended approach for handling sync endpoints that may block workers but don’t consume enough CPU/memory to trigger the HPA? Are there recommended Gunicorn/Uvicorn worker configurations? I understand from the documentation that when running in pods, the application should use the fewest workers possible and be scaled up or down based on workload.

Are there known best practices for HPA when applications are I/O bound but not CPU bound (so autoscaling doesn’t trigger)?
Moreover, what's the best approach to mark an overloaded pod so that no more requests are sent to it, with excess requests failing fast with a 503?

Any help or tips are more than welcome. Thanks!

Operating System

Windows

Operating System Details

No response

FastAPI Version

0.79.0

Pydantic Version

1.10.24

Python Version

3.11.12

Additional Context

No response


Replies: 0 comments

Labels
question (Question or problem)

1 participant
@gcarrascoro
