# Scaling Self Hosted
Logfire is designed to be horizontally scalable and can handle a lot of traffic. Depending on your usage patterns, however, you may need to scale certain pods in order to maintain performance.

Please use the architecture diagram as a reference.
## PostgreSQL Configuration
PostgreSQL is not managed by the Logfire Helm chart. It is assumed that an existing cluster is available in your environment. If not, a good option for deploying PostgreSQL in Kubernetes is CloudNativePG.
Note: no telemetry data is stored in PostgreSQL. We use PostgreSQL to manage organisations, projects, dashboards, etc., and to track and compact files within object storage.
A recommended starting size is 4 vCPUs and 16 GB of RAM.
Here are some parameters you can use to start tuning:
| Parameter | Value |
|---|---|
| autovacuum_analyze_scale_factor | 0.05 |
| autovacuum_analyze_threshold | 50 |
| autovacuum_max_workers | 6 |
| autovacuum_naptime | 30 |
| autovacuum_vacuum_cost_delay | 1 |
| autovacuum_vacuum_cost_limit | 2000 |
| autovacuum_vacuum_scale_factor | 0.1 |
| autovacuum_vacuum_threshold | 50 |
| idle_in_transaction_session_timeout | 60000 |
| log_autovacuum_min_duration | 600000 |
| log_lock_waits | on |
| log_min_duration_statement | 1000 |
| maintenance_work_mem | 4000000 |
| max_connections | 2048 |
| max_wal_size | 16000000 |
| max_slot_wal_keep_size | 8000000 |
| random_page_cost | 1.1 |
| work_mem | 128000 |
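If you deploy PostgreSQL with CloudNativePG, these parameters can be set directly on the `Cluster` resource. The sketch below is illustrative only: the cluster name, instance count, storage class, and storage size are placeholders, and the parameter values simply mirror the table above.

```yaml
# Illustrative CloudNativePG Cluster manifest; names and storage values are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: logfire-postgres        # placeholder name
spec:
  instances: 2                  # placeholder; size for your availability needs
  resources:                    # matches the recommended 4 vCPU / 16 GB starting size
    requests:
      cpu: "4"
      memory: "16Gi"
    limits:
      cpu: "4"
      memory: "16Gi"
  storage:
    storageClass: my-storage-class   # placeholder
    size: 256Gi                      # placeholder
  postgresql:
    parameters:                 # values copied from the tuning table above
      autovacuum_analyze_scale_factor: "0.05"
      autovacuum_analyze_threshold: "50"
      autovacuum_max_workers: "6"
      autovacuum_naptime: "30"
      autovacuum_vacuum_cost_delay: "1"
      autovacuum_vacuum_cost_limit: "2000"
      autovacuum_vacuum_scale_factor: "0.1"
      autovacuum_vacuum_threshold: "50"
      idle_in_transaction_session_timeout: "60000"
      log_autovacuum_min_duration: "600000"
      log_lock_waits: "on"
      log_min_duration_statement: "1000"
      maintenance_work_mem: "4000000"
      max_connections: "2048"
      max_wal_size: "16000000"
      max_slot_wal_keep_size: "8000000"
      random_page_cost: "1.1"
      work_mem: "128000"
```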
## Scaling Configuration
Each service can have standard Kubernetes replicas, resource limits, and autoscaling configured:
```yaml
<service_name>:
  # -- Resource limits and allocations
  resources:
    cpu: "1"
    memory: "1Gi"
  # -- Autoscaler settings
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    hpa:
      enabled: true
      memAverage: 65
      cpuAverage: 20
  # -- Pod Disruption Budget
  pdb:
    maxUnavailable: 1
    minAvailable: 1
```
## Recommended Starting Values
By default, the Helm chart includes only a single replica for each pod and no configured resource limits. When bringing self-hosted Logfire to production, you will need to adjust the scaling of each service, which depends on the usage patterns of your instance.

For example, if there is a lot of querying, or a high number of dashboards, you may need to scale up the query API and cache. Conversely, if you are write-heavy but don't query as much, you may need to scale up ingest. You can use CPU and memory usage to gauge how busy each part of Logfire is.

If the system is not performing well and there are no obvious CPU or memory spikes, have a look at accessing the meta project in the troubleshooting section to understand what is going on internally.
Here are some recommended values to get you started:
```yaml
logfire-backend:
  resources:
    cpu: "600m"
    memory: "1Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 4
    hpa:
      enabled: true
      memAverage: 65
      cpuAverage: 40
  pdb:
    minAvailable: 1

logfire-ff-query-api:
  resources:
    cpu: "500m"
    memory: "2Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 8
    hpa:
      enabled: true
      memAverage: 70
      cpuAverage: 60
  pdb:
    minAvailable: 1

logfire-ff-cache-byte:
  resources:
    cpu: "2"
    memory: "8Gi"
  autoscaling:
    minReplicas: 1
    maxReplicas: 2
    hpa:
      enabled: true
      memAverage: 65
      cpuAverage: 20
  scratchVolume:
    storageClassName: my-storage-class
    storage: 256Gi
  pdb:
    minAvailable: 1

logfire-ff-cache-ipc:
  resources:
    cpu: "2"
    memory: "8Gi"
  autoscaling:
    minReplicas: 1
    maxReplicas: 3
    hpa:
      enabled: true
      memAverage: 65
      cpuAverage: 40
  pdb:
    minAvailable: 1

logfire-ff-ingest:
  volumeClaimTemplates:
    storageClassName: my-storage-class
    storage: "16Gi"
  resources:
    cpu: "2"
    memory: "4Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 24
    hpa:
      enabled: true
      memAverage: 40
      cpuAverage: 60
  pdb:
    minAvailable: 1

logfire-ff-ingest-processor:
  resources:
    cpu: "2"
    memory: "4Gi"
  autoscaling:
    minReplicas: 2
    maxReplicas: 24
    hpa:
      enabled: true
      memAverage: 40
      cpuAverage: 60
  pdb:
    minAvailable: 1

logfire-ff-compaction-worker:
  resources:
    cpu: "2"
    memory: "8Gi"
  autoscaling:
    minReplicas: 1
    maxReplicas: 5
    hpa:
      enabled: true
      memAverage: 50
      cpuAverage: 80
  pdb:
    minAvailable: 1

logfire-ff-maintenance-worker:
  resources:
    cpu: "2"
    memory: "8Gi"
  autoscaling:
    minReplicas: 1
    maxReplicas: 4
    hpa:
      enabled: true
      memAverage: 50
      cpuAverage: 50
  pdb:
    minAvailable: 1
```