This happens when the jobs-emailer pod in the jobs-emailer namespace of the tools/toolsbeta k8s cluster is down or can't be reached.
This usually comes in the form of an alert in alertmanager.
The alert tells you which project (tools, toolsbeta, ...) it is failing for.
The most likely first step is to ssh to one of the k8s-control servers of tools/toolsbeta, depending on the project the alert is from (e.g. toolsbeta-test-k8s-control-4.toolsbeta.eqiad1.wikimedia.cloud). From there you can:
dcaro@tools-bastion-13:~$ kubectl-sudo get pods -n jobs-emailer
NAME                            READY   STATUS    RESTARTS   AGE
jobs-emailer-5946fb7cd5-6nhrm   1/1     Running   0          43m
Check the logs:
kubectl logs -n jobs-emailer deploy/jobs-emailer
Restart the deployment:
kubectl rollout restart -n jobs-emailer deployment/jobs-emailer
You can also try curling the pods directly for the statistics. The prometheus configuration gives you the cert, key and url:
root@tools-prometheus-7:~# grep 'job_name.*jobs-emailer' -A 40 /srv/prometheus/tools/prometheus.yml
- job_name: jobs-emailer
  scheme: https
  tls_config:
    insecure_skip_verify: true
    cert_file: "/etc/ssl/localcerts/toolforge-k8s-prometheus.crt"
    key_file: "/etc/ssl/private/toolforge-k8s-prometheus.key"
  kubernetes_sd_configs:
  - api_server: https://k8s.tools.eqiad1.wikimedia.cloud:6443
    role: pod
    tls_config:
      insecure_skip_verify: true
      cert_file: "/etc/ssl/localcerts/toolforge-k8s-prometheus.crt"
      key_file: "/etc/ssl/private/toolforge-k8s-prometheus.key"
    namespaces:
      names:
      - jobs-emailer
...
Then you can curl the pods directly by name, for example:
root@tools-prometheus-6:~# curl \
    --insecure \
    --cert /etc/ssl/localcerts/toolforge-k8s-prometheus.crt \
    --key /etc/ssl/private/toolforge-k8s-prometheus.key \
    'https://k8s.tools.eqiad1.wikimedia.cloud:6443/api/v1/namespaces/jobs-emailer/pods/jobs-emailer-5946fb7cd5-6nhrm/proxy/metrics'
...
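Since the pod name changes on every restart, the curl call above can be wrapped in a small helper. This is only a sketch: the function names `metrics_url` and `fetch_metrics` are made up for illustration, and the default API server, cert and key paths are simply copied from the prometheus configuration shown above, so adjust them for toolsbeta.

```shell
#!/usr/bin/env bash
# Sketch only: the defaults below are copied from the prometheus.yml excerpt;
# override them through the environment for other projects (e.g. toolsbeta).
API_SERVER="${API_SERVER:-https://k8s.tools.eqiad1.wikimedia.cloud:6443}"
CERT="${CERT:-/etc/ssl/localcerts/toolforge-k8s-prometheus.crt}"
KEY="${KEY:-/etc/ssl/private/toolforge-k8s-prometheus.key}"

# Hypothetical helper: print the API-server proxy URL for the /metrics
# endpoint of pod $2 in namespace $1.
metrics_url() {
    local namespace="$1" pod="$2"
    printf '%s/api/v1/namespaces/%s/pods/%s/proxy/metrics\n' \
        "$API_SERVER" "$namespace" "$pod"
}

# Hypothetical helper: fetch the metrics of pod $1 in the jobs-emailer
# namespace, failing loudly on HTTP errors.
fetch_metrics() {
    local pod="$1"
    curl --silent --show-error --fail --insecure \
        --cert "$CERT" --key "$KEY" \
        "$(metrics_url jobs-emailer "$pod")"
}
```

After sourcing the file on the prometheus host, `fetch_metrics jobs-emailer-5946fb7cd5-6nhrm` would run the same request as the manual curl, with the pod name taken from the pod listing.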
Add new issues here when you encounter them!
If jobs-emailer seems up, you can check whether the certificates that prometheus uses to connect to k8s have expired (there should have been another alert for that, though); see Portal:Toolforge/Admin/Runbooks/PrometheusK8sCertExpirySoon.
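One way to check the expiry by hand is with `openssl x509 -checkend`, which exits non-zero if the certificate expires within the given number of seconds. The sketch below assumes openssl is installed on the prometheus host; the function name `cert_status` and the 7-day default window are illustrative choices, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ok", "expiring", "expired" or "unreadable"
# for the PEM certificate at $1. $2 is the warning window in seconds
# (assumed default: 7 days).
cert_status() {
    local cert="$1" window="${2:-604800}"
    if [ ! -r "$cert" ]; then
        echo "unreadable"
        return 2
    fi
    if ! openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
        # Already past notAfter (or not a parseable certificate).
        echo "expired"
        return 1
    fi
    if ! openssl x509 -in "$cert" -noout -checkend "$window" >/dev/null 2>&1; then
        echo "expiring"
        return 1
    fi
    echo "ok"
}
```

For example, `cert_status /etc/ssl/localcerts/toolforge-k8s-prometheus.crt` checks the cert from the prometheus configuration, and `openssl x509 -in <cert> -noout -enddate` prints the exact expiry date.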
Add any incident tasks here!