Nowsath for AWS Community Builders
Troubleshooting: EKS + Helm + Prometheus + Grafana

In this post, I've compiled the issues I encountered while configuring Prometheus and Grafana with Helm on an existing EKS Fargate cluster, along with their solutions.

Error 1: Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable.

panic: did not find aws instance ID in node providerID string

```
$ k logs ebs-csi-controller-7f5c959c75-j92jf -n kube-system -c ebs-plugin
I1228 04:31:45.536047       1 driver.go:78] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.25.0"
I1228 04:31:45.536144       1 metadata.go:85] "retrieving instance data from ec2 metadata"
I1228 04:31:58.152468       1 metadata.go:88] "ec2 metadata is not available"
I1228 04:31:58.152491       1 metadata.go:96] "retrieving instance data from kubernetes api"
I1228 04:31:58.153081       1 metadata.go:101] "kubernetes api is available"
E1228 04:31:58.175387       1 controller.go:86] "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." err="did not find aws instance ID in node providerID string"
panic: did not find aws instance ID in node providerID string
```
```
$ kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-ebs-csi-driver,app.kubernetes.io/instance=aws-ebs-csi-driver"
NAME                                  READY   STATUS             RESTARTS       AGE
ebs-csi-controller-7f5c959c75-j92jf   0/6     CrashLoopBackOff   36 (9s ago)    10m
ebs-csi-controller-7f5c959c75-xpv9x   0/6     CrashLoopBackOff   36 (23s ago)   10m
ebs-csi-node-969qs                    3/3     Running            0              10m
```

Solution:
If you don't specify the region of your cluster when installing aws-ebs-csi-driver, the ebs-csi-controller pods will crash, as the default region will be set to 'us-east-1'.

```bash
helm upgrade --install aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa \
  aws-ebs-csi-driver/aws-ebs-csi-driver
```

The root cause is visible in the ebs-plugin container logs above: on Fargate there is no EC2 instance metadata service, so the container reports "Could not determine region from any metadata service. The region can be manually supplied via the AWS_REGION environment variable." and then panics while trying to parse the region out of the node's providerID.
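To see what the driver is actually parsing, you can print each node's providerID (a quick check with plain kubectl, nothing specific to this setup is assumed). On Fargate nodes the string contains no EC2 instance ID of the form 'i-...', which is exactly what the panic message complains about:

```bash
# Print each node's providerID; the EBS CSI controller falls back to
# parsing the region and instance ID out of this string when the EC2
# metadata service is unreachable.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```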


Error 2: Values don't meet the specifications of the schema(s) in the following chart(s)

```
Error: values don't meet the specifications of the schema(s) in the following chart(s):
prometheus:
- server.remoteRead: Invalid type. Expected: array, given: object
alertmanager:
- extraEnv: Invalid type. Expected: array, given: object
```

Solution:
These errors are the result of a version mismatch between the Prometheus chart and the values file passed to the Helm installation. If you are using a customized prometheus_values.yml file, pin the chart to the exact version the file was written for. If you are not using a customized file, use the latest chart version with its default values.

```bash
helm upgrade -i prometheus prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2" \
  --version 15
```

I have pinned the Prometheus chart to version 15 here.
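If you are unsure which chart version your values file was written for, you can list the published versions and pull the matching defaults to diff against. This is a generic Helm workflow, assuming the prometheus-community repo is already added; the output file name is just an example:

```bash
# List published versions of the chart, newest first.
helm repo update
helm search repo prometheus-community/prometheus --versions | head

# Dump the default values for the pinned version, then diff them
# against your customized prometheus_values.yml.
helm show values prometheus-community/prometheus --version 15 > default-values-15.yml
diff default-values-15.yml prometheus_values.yml
```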


Error 3: 0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..

```
$ k get events -n prometheus
LAST SEEN   TYPE      REASON               OBJECT                                           MESSAGE
2m13s       Warning   FailedScheduling     pod/prometheus-alertmanager-c7644896-td8xv       0/17 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/17 nodes are available: 17 Preemption is not helpful for scheduling..
47m         Normal    SuccessfulCreate     replicaset/prometheus-alertmanager-c7644896      Created pod: prometheus-alertmanager-c7644896-td8xv
2m30s       Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-alertmanager    storageclass.storage.k8s.io "prometheus" not found
```

The prometheus-alertmanager and prometheus-server pods will remain stuck in Pending status.

```
$ k get po -n prometheus
NAME                                             READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-c7644896-q2nzm           0/2     Pending   0          74s
prometheus-kube-state-metrics-8476bdcc64-f984p   1/1     Running   0          75s
prometheus-node-exporter-r82k7                   1/1     Running   0          74s
prometheus-pushgateway-665779d98f-zh2pf          1/1     Running   0          75s
prometheus-server-6fd8bc8576-csqt8               0/2     Pending   0          75s
```

Solution:
This is due to the missing 'prometheus' storage class, as the event log clearly shows. So go ahead and create the storage class as shown below.

```bash
EBS_AZ=$(kubectl get nodes \
  -o=jsonpath="{.items[0].metadata.labels['topology\.kubernetes\.io\/zone']}")
```
echo"kind: StorageClassapiVersion: storage.k8s.io/v1metadata:  name: prometheus  namespace: prometheusprovisioner: ebs.csi.aws.comparameters:  type: gp2reclaimPolicy: RetainallowedTopologies:- matchLabelExpressions:  - key: topology.ebs.csi.aws.com/zone    values:    -$EBS_AZ" | kubectl apply-f -
Enter fullscreen modeExit fullscreen mode
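Once the storage class is applied, the claims should bind and the pending pods should get scheduled. A quick way to verify (plain kubectl, using only the names already shown above):

```bash
# The storage class should exist and use the EBS CSI provisioner.
kubectl get storageclass prometheus

# The PVCs should move from Pending to Bound once volumes are provisioned,
# after which prometheus-alertmanager and prometheus-server can schedule.
kubectl get pvc -n prometheus
kubectl get po -n prometheus
```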

Error 4: Failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...

```
$ kubectl get events --sort-by=.metadata.creationTimestamp -n prometheus
LAST SEEN   TYPE      REASON               OBJECT                                           MESSAGE
30s         Normal    Provisioning         persistentvolumeclaim/prometheus-alertmanager    External provisioner is provisioning volume for claim "prometheus/prometheus-alertmanager"
30s         Normal    Provisioning         persistentvolumeclaim/prometheus-server          External provisioner is provisioning volume for claim "prometheus/prometheus-server"
5s          Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-alertmanager    failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-b7373f3b-3da9-47ac-8bfb-ad396816ce88": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
17s         Warning   ProvisioningFailed   persistentvolumeclaim/prometheus-server          failed to provision volume with StorageClass "prometheus": rpc error: code = Internal desc = Could not create volume "pvc-48b7c3d8-d46a-47be-90e7-3d59eb3f5844": could not create volume in EC2: NoCredentialProviders: no valid providers in chain...
```

Solution:
This issue arises from insufficient permissions assigned to the service account in the cluster, preventing it from provisioning the required persistent volumes.

You need to set the service account details (with the required IAM policies attached to its role) while installing aws-ebs-csi-driver with Helm, as shown here.

```bash
helm upgrade --install aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.region=eu-north-1 \
  --set controller.serviceAccount.create=false \
  --set controller.serviceAccount.name=ebs-csi-controller-sa \
  aws-ebs-csi-driver/aws-ebs-csi-driver
```
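To confirm the controller is actually picking up credentials, check that the service account carries the IRSA role annotation. This is a generic IRSA sanity check; the role is the one created in Error 5 below:

```bash
# An IRSA-enabled service account carries an eks.amazonaws.com/role-arn
# annotation; without it, the driver falls back to the node credential
# chain, which produces the "NoCredentialProviders" error above.
kubectl describe sa ebs-csi-controller-sa -n kube-system
```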

Error 5: The service account is absent in the EKS cluster, yet it is visible through the 'eksctl get iamserviceaccount' command.

Solution:
Check whether you added the '--role-only' option while creating the service account using eksctl.
If yes, delete the service account and recreate it without the '--role-only' option as shown below.

```bash
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster api-dev \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --override-existing-serviceaccounts \
  --approve
```

Here, 'api-dev' is the cluster name. Replace it with your cluster name before running the command.
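After recreating the account, you can confirm it exists on both the AWS side and inside the cluster (using the same 'api-dev' cluster name as above):

```bash
# The IAM service account should be listed by eksctl...
eksctl get iamserviceaccount --cluster api-dev --namespace kube-system

# ...and, because --role-only was omitted, a real Kubernetes
# ServiceAccount object should now exist in the cluster too.
kubectl get sa ebs-csi-controller-sa -n kube-system
```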


Thank you for taking the time to read 👏😊! I will continue to update this post as I encounter new issues. Feel free to mention any unlisted issues in the comment section. 🤝❤️

Check my post on setting up Prometheus and Grafana with existing EKS Fargate cluster - Monitoring
