ReplicaSet
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
How a ReplicaSet works
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
A ReplicaSet is linked to its Pods via the Pods' `metadata.ownerReferences` field, which specifies what resource the current object is owned by. All Pods acquired by a ReplicaSet have their owning ReplicaSet's identifying information within their ownerReferences field. It's through this link that the ReplicaSet knows of the state of the Pods it is maintaining and plans accordingly.
A ReplicaSet identifies new Pods to acquire by using its selector. If there is a Pod that has no OwnerReference or the OwnerReference is not a Controller and it matches a ReplicaSet's selector, it will be immediately acquired by said ReplicaSet.
When to use a ReplicaSet
A ReplicaSet ensures that a specified number of pod replicas are running at any given time. However, a Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features. Therefore, we recommend using Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don't require updates at all.
This actually means that you may never need to manipulate ReplicaSet objects: use a Deployment instead, and define your application in the spec section.
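If it helps to visualize, a minimal Deployment sketch that wraps the same Pod template as the frontend ReplicaSet example below might look like this (it reuses that example's name, labels, and image):

```yaml
# sketch: a Deployment carrying the frontend Pod template from the example below
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
```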
Example
```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # modify replicas according to your case
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
```

Saving this manifest into `frontend.yaml` and submitting it to a Kubernetes cluster will create the defined ReplicaSet and the Pods that it manages.
```shell
kubectl apply -f https://kubernetes.io/examples/controllers/frontend.yaml
```

You can then get the current ReplicaSets deployed:
```shell
kubectl get rs
```

And see the frontend one you created:
```
NAME       DESIRED   CURRENT   READY   AGE
frontend   3         3         3       6s
```

You can also check on the state of the ReplicaSet:
```shell
kubectl describe rs/frontend
```

And you will see output similar to:
```
Name:         frontend
Namespace:    default
Selector:     tier=frontend
Labels:       app=guestbook
              tier=frontend
Annotations:  <none>
Replicas:     3 current / 3 desired
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  tier=frontend
  Containers:
   php-redis:
    Image:        us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-gbgfx
  Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-rwz57
  Normal  SuccessfulCreate  13s   replicaset-controller  Created pod: frontend-wkl7w
```

And lastly you can check for the Pods brought up:
```shell
kubectl get pods
```

You should see Pod information similar to:
```
NAME             READY   STATUS    RESTARTS   AGE
frontend-gbgfx   1/1     Running   0          10m
frontend-rwz57   1/1     Running   0          10m
frontend-wkl7w   1/1     Running   0          10m
```

You can also verify that the owner reference of these pods is set to the frontend ReplicaSet. To do this, get the yaml of one of the Pods running:
```shell
kubectl get pods frontend-gbgfx -o yaml
```

The output will look similar to this, with the frontend ReplicaSet's info set in the metadata's ownerReferences field:
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-02-28T22:30:44Z"
  generateName: frontend-
  labels:
    tier: frontend
  name: frontend-gbgfx
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: frontend
    uid: e129deca-f864-481b-bb16-b27abfd92292
...
```

Non-Template Pod acquisitions
While you can create bare Pods with no problems, it is strongly recommended to make sure that the bare Pods do not have labels which match the selector of one of your ReplicaSets. The reason for this is because a ReplicaSet is not limited to owning Pods specified by its template; it can acquire other Pods in the manner specified in the previous sections.
Take the previous frontend ReplicaSet example, and the Pods specified in the following manifest:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    tier: frontend
spec:
  containers:
  - name: hello1
    image: gcr.io/google-samples/hello-app:2.0
---
apiVersion: v1
kind: Pod
metadata:
  name: pod2
  labels:
    tier: frontend
spec:
  containers:
  - name: hello2
    image: gcr.io/google-samples/hello-app:1.0
```

As those Pods do not have a Controller (or any object) as their owner reference and match the selector of the frontend ReplicaSet, they will immediately be acquired by it.
Suppose you create the Pods after the frontend ReplicaSet has been deployed and has set up its initial Pod replicas to fulfill its replica count requirement:
```shell
kubectl apply -f https://kubernetes.io/examples/pods/pod-rs.yaml
```

The new Pods will be acquired by the ReplicaSet, and then immediately terminated as the ReplicaSet would be over its desired count.
Fetching the Pods:
```shell
kubectl get pods
```

The output shows that the new Pods are either already terminated, or in the process of being terminated:
```
NAME             READY   STATUS        RESTARTS   AGE
frontend-b2zdv   1/1     Running       0          10m
frontend-vcmts   1/1     Running       0          10m
frontend-wtsmm   1/1     Running       0          10m
pod1             0/1     Terminating   0          1s
pod2             0/1     Terminating   0          1s
```

If you create the Pods first:
```shell
kubectl apply -f https://kubernetes.io/examples/pods/pod-rs.yaml
```

And then create the ReplicaSet:
```shell
kubectl apply -f https://kubernetes.io/examples/controllers/frontend.yaml
```

You will see that the ReplicaSet has acquired the Pods and has only created new ones according to its spec, until the number of its new Pods and the original Pods matches its desired count. Fetching the Pods:
```shell
kubectl get pods
```

will reveal in its output:
```
NAME             READY   STATUS    RESTARTS   AGE
frontend-hmmj2   1/1     Running   0          9s
pod1             1/1     Running   0          36s
pod2             1/1     Running   0          36s
```

In this manner, a ReplicaSet can own a non-homogeneous set of Pods.
Writing a ReplicaSet manifest
As with all other Kubernetes API objects, a ReplicaSet needs the `apiVersion`, `kind`, and `metadata` fields. For ReplicaSets, the `kind` is always a ReplicaSet.
When the control plane creates new Pods for a ReplicaSet, the `.metadata.name` of the ReplicaSet is part of the basis for naming those Pods. The name of a ReplicaSet must be a valid DNS subdomain value, but this can produce unexpected results for the Pod hostnames. For best compatibility, the name should follow the more restrictive rules for a DNS label.
A ReplicaSet also needs a `.spec` section.
Pod Template
The `.spec.template` is a pod template which is also required to have labels in place. In our `frontend.yaml` example we had one label: `tier: frontend`. Be careful not to overlap with the selectors of other controllers, lest they try to adopt this Pod.
For the template's restart policy field, `.spec.template.spec.restartPolicy`, the only allowed value is `Always`, which is the default.
Pod Selector
The `.spec.selector` field is a label selector. As discussed earlier, these are the labels used to identify potential Pods to acquire. In our `frontend.yaml` example, the selector was:
```yaml
matchLabels:
  tier: frontend
```

In the ReplicaSet, `.spec.template.metadata.labels` must match `spec.selector`, or it will be rejected by the API.
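Because ReplicaSets, unlike ReplicationControllers, also support set-based selector requirements, the same selection could be expressed with `matchExpressions`. This is just an illustrative sketch of an equivalent frontend selector:

```yaml
# illustrative set-based equivalent of the matchLabels selector above
selector:
  matchExpressions:
  - key: tier
    operator: In
    values:
    - frontend
```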
Note:
For 2 ReplicaSets specifying the same `.spec.selector` but different `.spec.template.metadata.labels` and `.spec.template.spec` fields, each ReplicaSet ignores the Pods created by the other ReplicaSet.

Replicas
You can specify how many Pods should run concurrently by setting `.spec.replicas`. The ReplicaSet will create/delete its Pods to match this number.
If you do not specify `.spec.replicas`, then it defaults to 1.
Working with ReplicaSets
Deleting a ReplicaSet and its Pods
To delete a ReplicaSet and all of its Pods, use `kubectl delete`. The Garbage collector automatically deletes all of the dependent Pods by default.
When using the REST API or the `client-go` library, you must set `propagationPolicy` to `Background` or `Foreground` in the `-d` option. For example:
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
  -H "Content-Type: application/json"
```

Deleting just a ReplicaSet
You can delete a ReplicaSet without affecting any of its Pods using `kubectl delete` with the `--cascade=orphan` option. When using the REST API or the `client-go` library, you must set `propagationPolicy` to `Orphan`. For example:
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
  -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
  -H "Content-Type: application/json"
```

Once the original is deleted, you can create a new ReplicaSet to replace it. As long as the old and new `.spec.selector` are the same, then the new one will adopt the old Pods. However, it will not make any effort to make existing Pods match a new, different pod template. To update Pods to a new spec in a controlled way, use a Deployment, as ReplicaSets do not support a rolling update directly.
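With kubectl, the equivalent orphaning delete for the frontend example is:

```shell
# leaves the frontend Pods running; deletes only the ReplicaSet
kubectl delete rs frontend --cascade=orphan
```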
Terminating Pods
FEATURE STATE: Kubernetes v1.35 [beta] (enabled by default)

You can enable this feature by setting the `DeploymentReplicaSetTerminatingReplicas` feature gate on the API server and on the kube-controller-manager.
Pods that become terminating due to deletion or scale down may take a long time to terminate, and may consume additional resources during that period. As a result, the total number of all pods can temporarily exceed `.spec.replicas`. Terminating pods can be tracked using the `.status.terminatingReplicas` field of the ReplicaSet.
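Assuming the feature gate is enabled, one way to inspect this count for the frontend example is to read the status field directly:

```shell
# prints the number of terminating Pods tracked by the ReplicaSet
kubectl get rs frontend -o jsonpath='{.status.terminatingReplicas}'
```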
Isolating Pods from a ReplicaSet
You can remove Pods from a ReplicaSet by changing their labels. This technique may be used to remove Pods from service for debugging, data recovery, etc. Pods that are removed in this way will be replaced automatically (assuming that the number of replicas is not also changed).
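For example, overwriting the tier label on one of the frontend Pods takes it out of the ReplicaSet's selector; the replacement value `debug` here is arbitrary:

```shell
# the Pod keeps running, but the ReplicaSet no longer counts it
# and creates a substitute Pod to restore the replica count
kubectl label pod frontend-gbgfx tier=debug --overwrite
```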
Scaling a ReplicaSet
A ReplicaSet can be easily scaled up or down by simply updating the `.spec.replicas` field. The ReplicaSet controller ensures that a desired number of Pods with a matching label selector are available and operational.
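For example, to scale the frontend ReplicaSet imperatively instead of editing the manifest:

```shell
kubectl scale rs frontend --replicas=5
```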
When scaling down, the ReplicaSet controller chooses which pods to delete by sorting the available pods to prioritize scaling down pods based on the following general algorithm:
- Pending (and unschedulable) pods are scaled down first
- If the `controller.kubernetes.io/pod-deletion-cost` annotation is set, then the pod with the lower value will come first.
- Pods on nodes with more replicas come before pods on nodes with fewer replicas.
- If the pods' creation times differ, the pod that was created more recently comes before the older pod (the creation times are bucketed on an integer log scale).
If all of the above match, then selection is random.
Pod deletion cost
FEATURE STATE: Kubernetes v1.22 [beta]

Using the `controller.kubernetes.io/pod-deletion-cost` annotation, users can set a preference regarding which pods to remove first when downscaling a ReplicaSet.
The annotation should be set on the pod; the range is [-2147483648, 2147483647]. It represents the cost of deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are preferred to be deleted before pods with higher deletion cost.
The implicit value for this annotation for pods that don't set it is 0; negative values are permitted. Invalid values will be rejected by the API server.
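In a Pod manifest, the annotation might look like this (the value -100 is illustrative; note that annotation values are strings and must be quoted):

```yaml
metadata:
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "-100"
```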
This feature is beta and enabled by default. You can disable it using the feature gate `PodDeletionCost` in both kube-apiserver and kube-controller-manager.
Note:
- This is honored on a best-effort basis, so it does not offer any guarantees on pod deletion order.
- Users should avoid updating the annotation frequently, such as updating it based on a metric value, because doing so will generate a significant number of pod updates on the apiserver.
Example Use Case
The different pods of an application could have different utilization levels. On scale down, the application may prefer to remove the pods with lower utilization. To avoid frequently updating the pods, the application should update `controller.kubernetes.io/pod-deletion-cost` once before issuing a scale down (setting the annotation to a value proportional to pod utilization level). This works if the application itself controls the downscaling; for example, the driver pod of a Spark deployment.
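As a sketch, a controller acting on the application's behalf could mark a lightly utilized Pod as cheaper to delete just before the scale down; the pod name and value here are illustrative:

```shell
# lower cost = preferred for deletion on the next scale down
kubectl annotate pod frontend-gbgfx controller.kubernetes.io/pod-deletion-cost=-100 --overwrite
```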
ReplicaSet as a Horizontal Pod Autoscaler Target
A ReplicaSet can also be a target for Horizontal Pod Autoscalers (HPA). That is, a ReplicaSet can be auto-scaled by an HPA. Here is an example HPA targeting the ReplicaSet we created in the previous example.
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: ReplicaSet
    name: frontend
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```

Saving this manifest into `hpa-rs.yaml` and submitting it to a Kubernetes cluster should create the defined HPA that autoscales the target ReplicaSet depending on the CPU usage of the replicated Pods.
```shell
kubectl apply -f https://k8s.io/examples/controllers/hpa-rs.yaml
```

Alternatively, you can use the `kubectl autoscale` command to accomplish the same (and it's easier!):
```shell
kubectl autoscale rs frontend --max=10 --min=3 --cpu-percent=50
```

Alternatives to ReplicaSet
Deployment (recommended)
Deployment is an object which can own ReplicaSets and update them and their Pods via declarative, server-side rolling updates. While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod creation, deletion and updates. When you use Deployments you don't have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets. As such, it is recommended to use Deployments when you want ReplicaSets.
Bare Pods
Unlike the case where a user directly created Pods, a ReplicaSet replaces Pods that are deleted or terminated for any reason, such as in the case of node failure or disruptive node maintenance, such as a kernel upgrade. For this reason, we recommend that you use a ReplicaSet even if your application requires only a single Pod. Think of it similarly to a process supervisor, only it supervises multiple Pods across multiple nodes instead of individual processes on a single node. A ReplicaSet delegates local container restarts to some agent on the node, such as the kubelet.
Job
Use a Job instead of a ReplicaSet for Pods that are expected to terminate on their own (that is, batch jobs).
DaemonSet
Use a DaemonSet instead of a ReplicaSet for Pods that provide a machine-level function, such as machine monitoring or machine logging. These Pods have a lifetime that is tied to a machine lifetime: the Pod needs to be running on the machine before other Pods start, and is safe to terminate when the machine is otherwise ready to be rebooted or shut down.
ReplicationController
ReplicaSets are the successors to ReplicationControllers. The two serve the same purpose, and behave similarly, except that a ReplicationController does not support set-based selector requirements as described in the labels user guide. As such, ReplicaSets are preferred over ReplicationControllers.
What's next
- Learn about Pods.
- Learn about Deployments.
- Run a Stateless Application Using a Deployment, which relies on ReplicaSets to work.
- ReplicaSet is a top-level resource in the Kubernetes REST API. Read the ReplicaSet object definition to understand the API for replica sets.
- Read about PodDisruptionBudget and how you can use it to manage application availability during disruptions.