Nowsath for AWS Community Builders
Troubleshooting: EKS + Helm + Prometheus + Grafana

In this post, I've compiled the issues I encountered while configuring Prometheus and Grafana with Helm on an existing EKS Fargate cluster, along with their solutions.

Error 1: Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable.

panic: did not find aws instance ID in node providerID string

```
$ k logs ebs-csi-controller-7f5c959c75-j92jf -n kube-system -c ebs-plugin
I1228 04:31:45.536047       1 driver.go:78] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.25.0"
I1228 04:31:45.536144       1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1228 04:31:58.152468       1 metadata.go:88] "ec2 metadata is not available"
I1228 04:31:58.152491       1 metadata.go:96] "retrieving instance data from kubernetes api"
I1228 04:31:58.153081       1 metadata.go:101] "kubernetes api is available"
E1228 04:31:58.175387       1 controller.go:86] "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." err="did not find aws instance ID in node providerID string"
panic: did not find aws instance ID in node providerID string
```
```
$ kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-ebs-csi-driver,app.kubernetes.io/instance=aws-ebs-csi-driver"
NAME                                  READY   STATUS             RESTARTS       AGE
ebs-csi-controller-7f5c959c75-j92jf   0/6     CrashLoopBackOff   36 (9s ago)    10m
ebs-csi-controller-7f5c959c75-xpv9x   0/6     CrashLoopBackOff   36 (23s ago)   10m
ebs-csi-node-969qs                    3/3     Running            0              10m
```

Solution:
If you don't specify the region of your cluster when installing aws-ebs-csi-driver, the ebs-csi-controller pods will crash, as the default region will be set to 'us-east-1'.

```bash
helm upgrade --install aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa \
  aws-ebs-csi-driver/aws-ebs-csi-driver
```

The root cause is visible in the ebs-plugin container logs above: on Fargate there is no EC2 instance metadata service, so the container reports "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." and then panics while trying to parse the region out of the node's providerID.
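To see what the driver is actually parsing, you can print each node's providerID (a quick check with plain kubectl, nothing specific to this setup is assumed). On Fargate nodes the string contains no EC2 instance ID of the form 'i-...', which is exactly what the panic message complains about:

```bash
# Print each node's providerID; the EBS CSI controller falls back to
# parsing the region and instance ID out of this string when the EC2
# metadata service is unreachable.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```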


Error 2: Values don't meet the specifications of the schema(s) in the following chart(s)

```
Error: values don't meet the specifications of the schema(s) in the following chart(s):
prometheus:
- server.remoteRead: Invalid type. Expected: array, given: object
alertmanager:
- extraEnv: Invalid type. Expected: array, given: object
```

Solution:
These errors are the result of a version mismatch between the Prometheus chart and the values file passed to the Helm installation. If you are using a customized prometheus_values.yml file, pin the chart to the exact version the file was written for. If you are not using a customized file, use the latest chart version with its default values.

```bash
helm upgrade -i prometheus prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
  --version 15
```

I have pinned the Prometheus chart to version 15 here.
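If you are unsure which chart version your values file was written for, you can list the published versions and pull the matching defaults to diff against. This is a generic Helm workflow, assuming the prometheus-community repo is already added; the output file name is just an example:

```bash
# List published versions of the chart, newest first.
helm repo update
helm search repo prometheus-community/prometheus --versions | head

# Dump the default values for the pinned version, then diff them
# against your customized prometheus_values.yml.
helm show values prometheus-community/prometheus --version 15 > default-values-15.yml
diff default-values-15.yml prometheus_values.yml
```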


Error 3: 0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..

```
$ k get events -n prometheus
LAST SEEN   TYPE      REASON               OBJECT                                           MESSAGE
2m13s       Warning   FailedScheduling     pod/prometheus-alertmanager-c7644896-td8xv       0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..
47m         Normal    SuccessfulCreate     replicaset/prometheus-alertmanager-c7644896      Created pod: prometheus-alertmanager-c7644896-td8xv
2m30s       Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-alertmanager    storageclass.storage.k8s.io "prometheus" not found
```

The prometheus-alertmanager and prometheus-server pods will remain stuck in Pending status.

```
$ k get po -n prometheus
NAME                                             READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-c7644896-q2nzm           0/2     Pending   0          74s
prometheus-kube-state-metrics-8476bdcc64-f984p   1/1     Running   0          75s
prometheus-node-exporter-r82k7                   1/1     Running   0          74s
prometheus-pushgateway-665779d98f-zh2pf          1/1     Running   0          75s
prometheus-server-6fd8bc8576-csqt8               0/2     Pending   0          75s
```

Solution:
This is due to the missing 'prometheus' storage class, as the event log clearly shows. So go ahead and create the storage class as shown below.

```bash
EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels['topology\.kubernetes\.io\/zone']}")
```
echo"kind: StorageClassapiVersion: storage.k8s.io/v1metadata:  name: prometheus  namespace: prometheusprovisioner: ebs.csi.aws.comparameters:  type: gp2reclaimPolicy: RetainallowedTopologies:- matchLabelExpressions:  - key: topology.ebs.csi.aws.com/zone    values:    -$EBS_AZ" | kubectl apply-f -
Enter fullscreen modeExit fullscreen mode
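Once the storage class is applied, the claims should bind and the pending pods should get scheduled. A quick way to verify (plain kubectl, using only the names already shown above):

```bash
# The storage class should exist and use the EBS CSI provisioner.
kubectl get storageclass prometheus

# The PVCs should move from Pending to Bound once volumes are provisioned,
# after which prometheus-alertmanager and prometheus-server can schedule.
kubectl get pvc -n prometheus
kubectl get po -n prometheus
```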

Error 4: Failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...

```
$ kubectl get events --sort-by=.metadata.creationTimestamp -n prometheus
LAST SEEN   TYPE      REASON               OBJECT                                           MESSAGE
30s         Normal    Provisioning         persistentvolumeclaim/prometheus-alertmanager    External provisioner is provisioning volume for claim "prometheus/prometheus-alertmanager"
30s         Normal    Provisioning         persistentvolumeclaim/prometheus-server          External provisioner is provisioning volume for claim "prometheus/prometheus-server"
5s          Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-alertmanager    failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-b7373f3b-3da9-47ac-8bfb-ad396816ce88": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
17s         Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-server          failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
```

Solution:
This issue arises from insufficient permissions assigned to the service account in the cluster, preventing it from provisioning the required persistent volumes.

You need to set the service account details (with the required IAM policies attached to its role) while installing aws-ebs-csi-driver with Helm, as shown here.

```bash
helm upgrade --install aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa \
  aws-ebs-csi-driver/aws-ebs-csi-driver
```
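To confirm the controller is actually picking up credentials, check that the service account carries the IRSA role annotation. This is a generic IRSA sanity check; the role is the one created in Error 5 below:

```bash
# An IRSA-enabled service account carries an eks.amazonaws.com/role-arn
# annotation; without it, the driver falls back to the node credential
# chain, which produces the "NoCredentialProviders" error above.
kubectl describe sa ebs-csi-controller-sa -n kube-system
```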

Error 5: The service account is absent in the EKS cluster, yet it is visible through the 'eksctl get iamserviceaccount' command.

Solution:
Check whether you added the '--role-only' option while creating the service account using eksctl.
If yes, delete the service account and recreate it without the '--role-only' option as shown below.

```bash
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster api-dev \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts \
  --approve
```

Here, 'api-dev' is the cluster name. Replace it with your cluster name before running the command.
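After recreating the account, you can confirm it exists on both the AWS side and inside the cluster (using the same 'api-dev' cluster name as above):

```bash
# The IAM service account should be listed by eksctl...
eksctl get iamserviceaccount --cluster api-dev --namespace kube-system

# ...and, because --role-only was omitted, a real Kubernetes
# ServiceAccount object should now exist in the cluster too.
kubectl get sa ebs-csi-controller-sa -n kube-system
```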


Thank you for taking the time to read 👏😊! I will continue to update this post as I encounter new issues. Feel free to mention any unlisted issues in the comment section. 🤝❤️

Check my post on setting up Prometheus and Grafana with existing EKS Fargate cluster - Monitoring
