Customizing node system configuration
This document shows you how to customize your Google Kubernetes Engine (GKE) node configuration by using a configuration file called a node system configuration.
A node system configuration is a configuration file that provides a way to adjust a limited set of system settings. In your node pool, you can use a node system configuration to specify custom settings for the kubelet Kubernetes node agent and for sysctl low-level Linux kernel configurations.
Caution: Changing kubelet or sysctl settings can lead to unintended behavior that might negatively affect the health of your workloads and nodes. Extensively test any kubelet or sysctl changes before you deploy them on production workloads.

This document details the available configurations for a node system configuration and how to apply them to your GKE Standard node pools. Note that because GKE Autopilot clusters have a more managed node environment, their direct node system configuration options are limited compared to GKE Standard node pools.
Why use node system configurations
Node system configurations offer the following benefits:
- Performance tuning: optimize network stack performance, memory management, CPU scheduling, or I/O behavior for demanding applications like AI training or serving, databases, high-traffic web servers, or latency-sensitive services.
- Security hardening: apply specific kernel-level security settings orrestrict certain system behaviors to reduce the attack surface.
- Resource management: fine-tune how kubelet manages PIDs, disk space, image garbage collection, or CPU and memory resources.
- Workload compatibility: help ensure that the node environment meets specific prerequisites for specialized software or older applications that require particular kernel settings.
Other options for customizing node configurations
You can also customize your node configuration by using other methods:
- Runtime configuration file: to customize a containerd container runtime on your GKE nodes, you can use a different file called a runtime configuration file. For more information, see Customize containerd configuration in GKE nodes.
- ComputeClass: you can specify node attributes in your GKE ComputeClass specification. You can use ComputeClasses, in both GKE Autopilot mode and Standard mode, in GKE version 1.32.1-gke.1729000 and later. For more information, see Customize the node system configuration.
- DaemonSets: you can also use DaemonSets to customize nodes. For more information, see Automatically bootstrapping GKE nodes with DaemonSets.
Node system configurations are not supported in Windows Server nodes.
Before you begin
Before you begin, make sure to do the following:
- Install command-line tools:
- If you use the gcloud CLI examples in this document, ensure that you install and configure the Google Cloud CLI.
- If you use the Terraform examples, ensure that you install and configure Terraform.
- Grant permissions: you need appropriate IAM permissions to create and update GKE clusters and node pools, such as container.clusterAdmin or a different role with equivalent permissions.
- Plan for potential workload disruption: custom node configurations are applied at the node pool level. Changes typically trigger a rolling update of the nodes in the pool, which involves re-creating the nodes. Plan for potential workload disruption and use Pod Disruption Budgets (PDBs) where appropriate.
- Back up and test all changes: always test configuration changes in a staging or development environment before you apply them to production. Incorrect settings can lead to node instability or workload failures.
- Review GKE default settings: GKE node images come with optimized default configurations. Only customize parameters if you have a specific need and understand the impact of your changes.
Use a node system configuration in GKE Standard mode
When you use a node system configuration, you use a YAML file that contains the configuration parameters for the kubelet and the Linux kernel. Although node system configurations are also available in GKE Autopilot mode, the steps in this document show you how to create and use a configuration file for GKE Standard mode.
To use a node system configuration in GKE Standard mode, do the following:
- Create a configuration file. This file contains your kubelet and sysctl configurations.
- Add the configuration when you create a cluster, or when you create or update a node pool.
Create a configuration file
Write your node system configuration in YAML. The following example adds configurations for the kubelet and sysctl options:

```yaml
kubeletConfig:
  cpuManagerPolicy: static
  allowedUnsafeSysctls:
  - 'kernel.shm*'
  - 'kernel.msg*'
  - 'kernel.sem'
  - 'fs.mqueue*'
  - 'net.*'
linuxConfig:
  sysctl:
    net.core.somaxconn: '2048'
    net.ipv4.tcp_rmem: '4096 87380 6291456'
```

In this example, the following applies:
- The cpuManagerPolicy: static field configures the kubelet to use the static CPU management policy.
- The net.core.somaxconn: '2048' field limits the socket listen() backlog to 2,048 pending connections.
- The net.ipv4.tcp_rmem: '4096 87380 6291456' field sets the minimum, default, and maximum value of the TCP socket receive buffer to 4,096 bytes, 87,380 bytes, and 6,291,456 bytes, respectively.
If you want to add configurations only for the kubelet or sysctl, include only that section in your node system configuration. For example, to add a kubelet configuration, create the following file:

```yaml
kubeletConfig:
  cpuManagerPolicy: static
```

For a complete list of the fields that you can add to your node system configuration, see the Kubelet configuration options and Sysctl configuration options sections.
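Similarly, to add only sysctl configurations, include only the linuxConfig section. For example, the following file sets a single network parameter (the value shown is illustrative, not a recommendation):

```yaml
linuxConfig:
  sysctl:
    net.core.netdev_max_backlog: '10000'
```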
Add the configuration to a Standard node pool
After you create the node system configuration, add the --system-config-from-file flag by using the Google Cloud CLI. You can add this flag when you create a cluster, or when you create or update a node pool. You can't add a node system configuration by using the Google Cloud console.
Create a cluster with the node system configuration
You can add a node system configuration during cluster creation by using the gcloud CLI or Terraform. The following instructions apply the node system configuration to the default node pool:
gcloud CLI
```
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```

Replace the following:

- CLUSTER_NAME: the name for your cluster.
- LOCATION: the compute zone or region of the cluster.
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations.
After you apply a node system configuration, the default node pool of the cluster uses the settings that you defined.
Terraform
To create a regional cluster with a customized node system configuration by using Terraform, refer to the following example:

```hcl
resource "google_container_cluster" "default" {
  name               = "gke-standard-regional-cluster"
  location           = "us-central1"
  initial_node_count = 1

  node_config {
    # Kubelet configuration
    kubelet_config {
      cpu_manager_policy = "static"
    }

    linux_node_config {
      # Sysctl configuration
      sysctls = {
        "net.core.netdev_max_backlog" = "10000"
      }

      # Linux cgroup mode configuration
      cgroup_mode = "CGROUP_MODE_V2"

      # Linux huge page configuration
      hugepages_config {
        hugepage_size_2m = "1024"
      }
    }
  }
}
```

For more information about using Terraform, see Terraform support for GKE.
Create a new node pool with the node system configuration
You can add a node system configuration when you use the gcloud CLI or Terraform to create a new node pool.
The following instructions apply the node system configuration to a new node pool:
gcloud CLI
```
gcloud container node-pools create POOL_NAME \
    --cluster CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```

Replace the following:

- POOL_NAME: the name for your node pool.
- CLUSTER_NAME: the name of the cluster that you want to add a node pool to.
- LOCATION: the compute zone or region of the cluster.
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations.
Terraform
To create a node pool with a customized node system configuration by using Terraform, refer to the following example:

```hcl
resource "google_container_node_pool" "default" {
  name    = "gke-standard-regional-node-pool"
  cluster = google_container_cluster.default.name

  node_config {
    # Kubelet configuration
    kubelet_config {
      cpu_manager_policy = "static"
    }

    linux_node_config {
      # Sysctl configuration
      sysctls = {
        "net.core.netdev_max_backlog" = "10000"
      }

      # Linux cgroup mode configuration
      cgroup_mode = "CGROUP_MODE_V2"

      # Linux huge page configuration
      hugepages_config {
        hugepage_size_2m = "1024"
      }
    }
  }
}
```

For more information about using Terraform, see Terraform support for GKE.
Update the node system configuration of an existing node pool
You can update the node system configuration of an existing node pool by running the following command:

```
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --system-config-from-file=SYSTEM_CONFIG_PATH
```

Replace the following:

- POOL_NAME: the name of the node pool that you want to update.
- CLUSTER_NAME: the name of the cluster that you want to update.
- LOCATION: the compute zone or region of the cluster.
- SYSTEM_CONFIG_PATH: the path to the file that contains your kubelet and sysctl configurations.
This change requires re-creating the nodes, which can cause disruption to your running workloads. For more information about this specific change, find the corresponding row in the manual changes that re-create the nodes using a node upgrade strategy without respecting maintenance policies table.
For more information about node updates, see Planning for node update disruptions.
Caution: GKE immediately begins re-creating the nodes for this change by using the node upgrade strategy, regardless of active maintenance policies. GKE depends on resource availability for the change. Disabling node auto-upgrades doesn't prevent this change.

Edit a node system configuration
To edit a node system configuration, you can create a new node pool with the configuration that you want, or update the node system configuration of an existing node pool.
Edit by creating a node pool
To edit a node system configuration by creating a node pool, do the following:
- Create a configuration file with the configuration that you want.
- Add the configuration to a new node pool.
- Migrate your workloads to the new node pool.
- Delete the old node pool.
Edit by updating an existing node pool
To edit the node system configuration of an existing node pool, follow the instructions in the Update node pool tab for adding the configuration to a node pool. When you update a node system configuration and the new configuration overrides the node pool's existing system configuration, the nodes must be re-created. If you omit any parameters during an update, the parameters are set to their respective defaults.
If you want to reset the node system configuration back to the defaults, update your configuration file with empty values for the kubelet and sysctl fields, for example:

```yaml
kubeletConfig: {}
linuxConfig:
  sysctl: {}
```

Delete a node system configuration
To remove a node system configuration, do the following steps:
- Create a node pool.
- Migrate your workloads to the new node pool.
- Delete the node pool that has the old node system configuration.
Configuration options for the kubelet
The tables in this section describe the kubelet options that you can modify.
CPU management
The following table describes the CPU management options for the kubelet.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuCFSQuota | Must be true or false. | true | This setting enforces the Pod's CPU limit. Setting this value to false means that the CPU limits for Pods are ignored. Ignoring CPU limits might be beneficial in certain scenarios where Pods are sensitive to CPU limits. The risk of disabling cpuCFSQuota is that a rogue Pod can consume more CPU resources than intended. |
| cpuCFSQuotaPeriod | Must be a duration of time. | "100ms" | This setting sets the CPU CFS quota period value, cpu.cfs_period_us, which specifies how often a cgroup's access to CPU resources should be reallocated. This option lets you tune the CPU throttling behavior. |
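As an illustrative sketch, a node system configuration that tunes CPU throttling behavior with these fields might look like the following (the period value is an example, not a recommendation):

```yaml
kubeletConfig:
  cpuCFSQuota: true
  cpuCFSQuotaPeriod: '200ms'
```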
Memory management and eviction
The following table describes the modifiable options for memory management and eviction. This section also contains a separate table that describes the modifiable options for the evictionSoft flag.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| evictionSoft | Map of signal names. For value restrictions, see the following table. | none | This setting maps signal names to a quantity or percentage that defines soft eviction thresholds. A soft eviction threshold must have a grace period. The kubelet does not evict Pods until the grace period is exceeded. |
| evictionSoftGracePeriod | Map of signal names. For each signal name, the value must be a positive duration less than 5m. Valid time units are ns, us (or µs), ms, s, or m. | none | This setting maps signal names to durations that define grace periods for soft eviction thresholds. Each soft eviction threshold must have a corresponding grace period. |
| evictionMinimumReclaim | Map of signal names. For each signal name, the value must be a positive percentage less than 10%. | none | This setting maps signal names to percentages that define the minimum amount of a given resource that the kubelet reclaims when it performs a Pod eviction. |
| evictionMaxPodGracePeriodSeconds | Value must be an integer between 0 and 300. | 0 | This setting defines, in seconds, the maximum grace period for Pod termination during eviction. |
The following table shows the modifiable options for the evictionSoft flag. The same options also apply to the evictionSoftGracePeriod and evictionMinimumReclaim flags with different restrictions.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| memoryAvailable | Value must be a quantity greater than 100Mi and less than 50% of the node's memory. | none | This setting represents the amount of memory available before soft eviction. Defines the amount of the memory.available signal in the kubelet. |
| nodefsAvailable | Value must be between 10% and 50%. | none | This setting represents the nodefs available before soft eviction. Defines the amount of the nodefs.available signal in the kubelet. |
| nodefsInodesFree | Value must be between 5% and 50%. | none | This setting represents the nodefs inodes that are free before soft eviction. Defines the amount of the nodefs.inodesFree signal in the kubelet. |
| imagefsAvailable | Value must be between 15% and 50%. | none | This setting represents the imagefs available before soft eviction. Defines the amount of the imagefs.available signal in the kubelet. |
| imagefsInodesFree | Value must be between 5% and 50%. | none | This setting represents the imagefs inodes that are free before soft eviction. Defines the amount of the imagefs.inodesFree signal in the kubelet. |
| pidAvailable | Value must be between 10% and 50%. | none | This setting represents the PIDs available before soft eviction. Defines the amount of the pid.available signal in the kubelet. |
| singleProcessOOMKill | Value must be true or false. | true for cgroupv1 nodes, false for cgroupv2 nodes. | This setting sets whether the processes in the container are OOM killed individually or as a group. Available on GKE versions 1.32.4-gke.1132000, 1.33.0-gke.1748000 or later. |
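As a sketch, the following configuration combines these fields to define a soft memory eviction threshold with its required grace period. The values are illustrative and assume the field names shown in the preceding tables:

```yaml
kubeletConfig:
  evictionSoft:
    memoryAvailable: '500Mi'
  evictionSoftGracePeriod:
    memoryAvailable: '60s'
  evictionMaxPodGracePeriodSeconds: 120
```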
PID management
The following table describes the modifiable options for PID management.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| podPidsLimit | Value must be between 1024 and 4194304. | none | This setting sets the maximum number of process IDs (PIDs) that each Pod can use. |
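For example, the following sketch caps each Pod at 4,096 process IDs (an illustrative value within the allowed range):

```yaml
kubeletConfig:
  podPidsLimit: 4096
```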
Logging
The following table describes the modifiable options for logging.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| containerLogMaxSize | Value must be a positive number with a unit suffix, between 10Mi and 500Mi, inclusive. | 10Mi | This setting controls the containerLogMaxSize setting of the container log rotation policy, which lets you configure the maximum size for each log file. The default value is 10Mi. Valid units are Ki, Mi, and Gi. |
| containerLogMaxFiles | Value must be an integer between 2 and 10, inclusive. | 5 | This setting controls the containerLogMaxFiles setting of the container log file rotation policy, which lets you configure the maximum number of files allowed for each container. The default value is 5. The total log size (container_log_max_size * container_log_max_files) per container can't exceed 1% of the total storage of the node. |
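For example, the following sketch raises the per-file log size while keeping the default file count (illustrative values; confirm that the resulting total log size stays within 1% of the node's storage):

```yaml
kubeletConfig:
  containerLogMaxSize: '50Mi'
  containerLogMaxFiles: 5
```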
Image garbage collection
The following table describes the modifiable options for image garbage collection.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| imageGCHighThresholdPercent | Value must be an integer between 10 and 85, inclusive, and higher than imageGCLowThresholdPercent. | 85 | This setting defines the percent of disk usage above which image garbage collection is run. It represents the highest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageGCLowThresholdPercent | Value must be an integer between 10 and 85, inclusive, and lower than imageGCHighThresholdPercent. | 80 | This setting defines the percent of disk usage before which image garbage collection is never run. It represents the lowest disk usage to garbage collect to. The percentage is calculated by dividing this field's value by 100. |
| imageMinimumGcAge | Value must be a duration of time not greater than 2m. Valid time units are ns, us (or µs), ms, s, m, or h. | 2m | This setting defines the minimum age for an unused image before it is garbage-collected. |
| imageMaximumGcAge | Value must be a duration of time. | 0s | This setting defines the maximum age an image can be unused before it is garbage-collected. This field's default value is 0s, which disables age-based garbage collection. Available on GKE versions 1.30.7-gke.1076000, 1.31.3-gke.1023000 or later. |
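As a sketch, the following configuration triggers image garbage collection earlier than the defaults. The values are illustrative, and the field casing follows the preceding table:

```yaml
kubeletConfig:
  imageGCLowThresholdPercent: 60
  imageGCHighThresholdPercent: 75
  imageMinimumGcAge: '1m'
```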
Image pulling
The following table describes the modifiable options for image pulling.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| maxParallelImagePulls | Value must be an integer between 2 and 5, inclusive. | 2 or 3, based on the disk type. | This setting defines the maximum number of image pulls in parallel. The default value is decided by the boot disk type. |
Security and unsafe operations
The following table describes the modifiable options for configuring security and handling unsafe operations.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| allowedUnsafeSysctls | List of sysctl names or sysctl groups. The allowed groups are kernel.shm*, kernel.msg*, kernel.sem, fs.mqueue.*, and net.*. | none | This setting defines a comma-separated allowlist of unsafe sysctl names or sysctl groups that can be set on the Pods. |
| insecureKubeletReadonlyPortEnabled | Value must be a boolean value, either true or false. | true | Setting this value to false disables the insecure kubelet read-only port 10255 on every new node pool in your cluster. If you configure this setting in this file, you can't use a GKE API client to change the setting at the cluster level. |
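For example, the following sketch allows Pods to set networking sysctls while disabling the insecure read-only port (illustrative policy choices, not recommendations):

```yaml
kubeletConfig:
  allowedUnsafeSysctls:
  - 'net.*'
  insecureKubeletReadonlyPortEnabled: false
```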
Resource Managers
Kubernetes offers a suite of Resource Managers. You can configure these Resource Managers to coordinate and optimize the alignment of node resources for Pods that are configured with specific requirements for CPUs, devices, and memory (hugepages) resources.
The following table describes the modifiable options for Resource Managers.
| kubelet config settings | Restrictions | Default setting | Description |
|---|---|---|---|
| cpuManagerPolicy | Value must be none or static. | none | This setting controls the kubelet CPU Manager policy. The default value is none, which is the default CPU affinity scheme, providing no affinity beyond what the OS scheduler does automatically. Setting this value to static allows Pods that are both in the Guaranteed QoS class and have integer CPU requests to be assigned exclusive CPUs. |
| memoryManager.policy | Value must be None or Static. | None | This setting controls the kubelet Memory Manager policy. If you set this value to Static, the kubelet can provide guaranteed memory and hugepage allocation for Pods in the Guaranteed QoS class. This setting is supported for clusters with the control plane running GKE version 1.32.3-gke.1785000 or later. |
| topologyManager | Value must be one of the supported settings for each of the respective fields. | policy: none, scope: container | These settings control the kubelet Topology Manager policy and scope. You can set the policy and scope settings independently of each other. For more information about these settings, see Topology manager scopes and policies. |
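As a sketch, a configuration that enables all three Resource Managers for NUMA-aware alignment might look like the following. The nesting assumes the field names in the preceding table, and the policy choices are illustrative:

```yaml
kubeletConfig:
  cpuManagerPolicy: static
  memoryManager:
    policy: Static
  topologyManager:
    policy: single-numa-node
    scope: pod
```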
Sysctl configuration options
To tune the performance of your system, you can modify Linux kernel parameters. The tables in this section describe the various kernel parameters that you can configure.
Filesystem parameters (fs.*)
The following table describes the modifiable parameters for the Linux filesystem. These settings control the behavior of the Linux filesystem, such as file handle limits and event monitoring.
| Sysctl parameter | Restrictions | Description |
|---|---|---|
fs.aio-max-nr | Must be between [65536, 4194304]. | This setting defines the maximum system-wide number of asynchronous I/O requests. |
fs.file-max | Must be between [104857, 67108864]. | This setting defines the maximum number of file-handles that the Linux kernel can allocate. |
fs.inotify.max_user_instances | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify instances that a user can create. |
fs.inotify.max_user_watches | Must be between [8192, 1048576]. | This setting defines the maximum number of inotify watches that a user can create. |
fs.nr_open | Must be between [1048576, 2147483584]. | This setting defines the maximum number of file descriptors that can be opened by a process. |
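For example, the following sketch raises file-handle and inotify limits for watch-heavy workloads (illustrative values within the allowed ranges):

```yaml
linuxConfig:
  sysctl:
    fs.file-max: '1048576'
    fs.inotify.max_user_instances: '16384'
    fs.inotify.max_user_watches: '524288'
```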
Kernel parameters (kernel.*)
The following table describes the modifiable parameters for the Linux kernel. These settings configure core kernel functionalities, including shared memory allocation.
| Sysctl parameter | Restrictions | Description |
|---|---|---|
| kernel.shmmni | Must be between [4096, 32768]. | This setting defines the system-wide maximum number of shared memory segments. If this value isn't set, it defaults to 4096. |
| kernel.shmmax | Must be between [0, 18446744073692774399]. | This setting defines the maximum size, in bytes, of a single shared memory segment allowed by the kernel. This value is ignored if it is greater than the actual amount of RAM, which means that all available RAM can be shared. |
| kernel.shmall | Must be between [0, 18446744073692774399]. | This setting defines the total amount of shared memory pages that can be used on the system at one time. A page is 4,096 bytes on the AMD64 and Intel 64 architecture. |
| kernel.perf_event_paranoid | Must be between [-1, 3]. | This setting controls use of the performance events system by unprivileged users without CAP_PERFMON. The default value is 2 in the kernel. |
| kernel.sched_rt_runtime_us | Must be between [-1, 1000000]. | This setting defines a global limit on how much time real-time scheduling can use. |
| kernel.softlockup_panic | Optional (boolean). | This setting controls whether the kernel panics when a soft lockup is detected. |
| kernel.yama.ptrace_scope | Must be between [0, 3]. | This setting defines the scope of and restrictions on the use of the ptrace() system call. |
| kernel.kptr_restrict | Must be between [0, 2]. | This setting indicates whether restrictions are placed on exposing kernel addresses through /proc and other interfaces. |
| kernel.dmesg_restrict | Optional (boolean). | This setting indicates whether unprivileged users are prevented from using dmesg(8) to view messages from the kernel's log buffer. |
| kernel.sysrq | Must be between [0, 511]. | This setting controls the functions allowed to be invoked through the SysRq key. A value of 0 disables SysRq completely, 1 enables all functions, and values greater than 1 act as a bitmask of allowed functions. |
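For example, the following sketch adjusts shared memory and performance-event settings (illustrative values within the allowed ranges):

```yaml
linuxConfig:
  sysctl:
    kernel.shmmni: '8192'
    kernel.perf_event_paranoid: '2'
```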
Network parameters (net.*)
The following table describes the modifiable parameters for networking. These settings tune the performance and behavior of the networking stack, from socket buffers to connection tracking.
| Sysctl parameter | Restrictions | Description |
|---|---|---|
| net.core.busy_poll | Any positive integer, less than 2147483647. | This setting defines the low latency busy poll timeout for poll and select. It represents the approximate time, in µs, to busy loop waiting for events. |
| net.core.busy_read | Any positive integer, less than 2147483647. | This setting defines the low latency busy poll timeout for socket reads. It represents the approximate time, in µs, to busy loop waiting for packets on the device queue. |
| net.core.netdev_max_backlog | Any positive integer, less than 2147483647. | This setting defines the maximum number of packets queued on the INPUT side when the interface receives packets faster than the kernel can process them. |
| net.core.rmem_default | Any positive integer, less than 2147483647. | This setting defines the default receive socket buffer size, in bytes. |
| net.core.rmem_max | Any positive integer, less than 2147483647. | This setting defines the maximum receive socket buffer size, in bytes. |
| net.core.wmem_default | Any positive integer, less than 2147483647. | This setting defines the default setting, in bytes, of the socket send buffer. |
| net.core.wmem_max | Any positive integer, less than 2147483647. | This setting defines the maximum send socket buffer size, in bytes. |
| net.core.optmem_max | Any positive integer, less than 2147483647. | This setting defines the maximum ancillary buffer size allowed per socket. |
| net.core.somaxconn | Must be between [128, 2147483647]. | This setting defines the limit of the socket listen() backlog, which is known in userspace as SOMAXCONN. This setting defaults to 128. |
| net.ipv4.tcp_rmem | {min, default, max} (each > 0, memory in bytes). | This setting defines the minimum, default, and maximum size, in bytes, of the receive buffer used by TCP sockets. The minimum is guaranteed to each TCP socket, even under moderate memory pressure; its default value is 1 page. |
| net.ipv4.tcp_wmem | {min, default, max} (each > 0, memory in bytes). | This setting defines the minimum, default, and maximum size, in bytes, of the send buffer used by TCP sockets. The minimum is guaranteed to each TCP socket, even under moderate memory pressure; its default value is 1 page. |
| net.ipv4.tcp_tw_reuse | Must be between {0, 1}. | This setting defines whether to allow the reuse of sockets in the TIME_WAIT state for new connections when it is safe from a protocol viewpoint. The default value is 0. |
| net.ipv4.tcp_max_orphans | Must be between [16384, 262144]. | This setting defines the maximal number of TCP sockets that aren't attached to any user file handle. |
| net.ipv4.tcp_max_tw_buckets | Must be between [4096, 2147483647]. | This setting defines the maximal number of timewait sockets held by the system simultaneously. If this number is exceeded, the time-wait socket is immediately destroyed and a warning is printed. |
| net.ipv4.tcp_syn_retries | Must be between [1, 127]. | This setting defines the number of times initial SYNs for an active TCP connection attempt are retransmitted. |
| net.ipv4.tcp_ecn | Must be between [0, 2]. | This setting controls the use of Explicit Congestion Notification (ECN) by TCP. ECN is used only when both ends of the TCP connection indicate support for it. |
| net.ipv4.tcp_mtu_probing | Must be between [0, 2]. | This setting controls TCP Packetization-Layer Path MTU Discovery. A value of 0 disables probing, 1 enables probing only after an ICMP black hole is detected, and 2 always enables probing. |
| net.ipv4.tcp_congestion_control | Must be one of the supported values. | This setting isn't supported when GKE Dataplane V2 is enabled on the cluster. The supported values depend on the node image type. |
| net.ipv6.conf.all.disable_ipv6 | Boolean. | Changing this value is the same as changing the conf/default/disable_ipv6 setting and also all per-interface disable_ipv6 settings to the same value. |
| net.ipv6.conf.default.disable_ipv6 | Boolean. | This setting disables the operation of IPv6. |
| net.netfilter.nf_conntrack_acct | Must be between {0, 1}. | This setting enables connection tracking flow accounting. The default value is 0, which means that the setting is disabled. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_max | Must be between [65536, 4194304]. | This setting defines the size of the connection tracking table. If the maximum value is reached, new connections fail. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_buckets | Must be between [65536, 524288]. | This setting defines the size of the hash table. The recommended setting is the value of nf_conntrack_max divided by 4. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_close_wait | Must be between [60, 3600]. | This setting defines the period, in seconds, for which TCP connections can remain in the CLOSE_WAIT state. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_established | Must be between [600, 86400]. | This setting defines the duration, in seconds, of dead connections before they are deleted automatically from the connection tracking table. Available on GKE versions 1.32.0-gke.1448000 or later. |
| net.netfilter.nf_conntrack_tcp_timeout_time_wait | Must be between [1, 600]. | This setting defines the period, in seconds, for which TCP connections can remain in the TIME_WAIT state. Available on GKE versions 1.32.0-gke.1448000 or later. |
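As a sketch, the following configuration enlarges socket buffers and sizes the conntrack hash table at one quarter of the connection tracking table size, following the recommendation in the preceding table. The values are illustrative, and the net.netfilter.* parameters require GKE 1.32.0-gke.1448000 or later:

```yaml
linuxConfig:
  sysctl:
    net.core.somaxconn: '4096'
    net.core.rmem_max: '16777216'
    net.netfilter.nf_conntrack_max: '1048576'
    net.netfilter.nf_conntrack_buckets: '262144'
```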
Virtual Memory parameters (vm.*)
The following table describes the modifiable parameters for the Virtual Memory subsystem. These settings manage the Virtual Memory subsystem, which controls how the kernel handles memory, swapping, and disk caching.
| Sysctl parameter | Restrictions | Description |
|---|---|---|
vm.max_map_count | Must be between [65536, 2147483647]. | This file defines the maximum number of memory map areas that a process can have. |
vm.dirty_background_ratio | Must be between [1, 100]. | This setting defines the percentage of system memory that can be filled with dirty pages before background kernel flusher threads begin writeback. The value must be less than the value of thevm.dirty_ratio field. |
vm.dirty_background_bytes | Must be between [0, 68719476736]. | This setting defines the amount of dirty memory at which the background kernel flusher threads start writeback. Be aware that |
vm.dirty_expire_centisecs | Must be between [0, 6000]. | This setting defines the maximum age, in hundredths of a second, that dirty data can remain in memory before kernel flusher threads write it to disk. |
vm.dirty_ratio | Must be between [1, 100]. | This setting defines the percentage of system memory that can be filled with dirty pages before processes that perform writes are forced to block and write out dirty data synchronously. |
vm.dirty_bytes | Must be between [0, 68719476736]. | This setting defines the amount of dirty memory at which a process that generates disk writes starts writeback itself. The minimum value allowed for Be aware that |
vm.dirty_writeback_centisecs | Must be between [0, 1000]. | This setting defines the interval, in hundredths of a second, at which kernel flusher threads wake up to write old dirty data to disk. |
vm.overcommit_memory | Must be between {0, 1, 2}. | This setting determines the kernel's strategy for handling memory overcommitment. The values are as follows:
|
vm.overcommit_ratio | Must be between [0, 100]. | This setting defines the percentage of physical RAM allowed for overcommit when the value of thevm.overcommit_memory field is set to2. |
vm.vfs_cache_pressure | Must be between [0, 100]. | This setting adjusts the kernel's preference for reclaiming memory used for dentry (directory) and inode caches. |
vm.swappiness | Must be between [0, 200]. | This setting controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. The default value is60. |
vm.watermark_scale_factor | Must be between [10, 3000]. | This setting controls the aggressiveness of kswapd. It defines the memory left before kswapd wakes and the memory to free before it sleeps. The default value is10. |
vm.min_free_kbytes | Must be between [67584, 1048576]. | This setting defines the minimum number of kibibytes that the kernel keeps free as a reserve; if free memory drops below this watermark, the kernel reclaims memory aggressively and might invoke the OOM killer. The default value is 67584. |
For more information about the supported values for each sysctl flag, see the --system-config-from-file gcloud CLI documentation.
Different Linux namespaces might have unique values for a given sysctl flag, while others might be global for the entire node. Updating sysctl options by using a node system configuration helps ensure that the sysctl is applied globally on the node and in each namespace, so that each Pod has identical sysctl values in each Linux namespace.
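For example, you could tune writeback and swap behavior with a node system configuration file like the following sketch. The `linuxConfig.sysctl` structure follows the schema used elsewhere in this document; the specific values are illustrative, not recommendations:

```yaml
# Illustrative sketch: tune writeback and swap behavior for an I/O-heavy workload.
linuxConfig:
  sysctl:
    vm.dirty_background_ratio: '5'   # start background writeback earlier
    vm.dirty_ratio: '40'             # block writers later
    vm.swappiness: '10'              # prefer reclaiming page cache over swapping
```

Test any such values on a non-production node pool first, because writeback tuning interacts with workload I/O patterns.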
Linux cgroup mode configuration options
Caution: GKE deprecates cgroupv1 with minor version 1.31, and removes support for cgroupv1 with version 1.35. For more information about how GKE is transitioning to cgroupv2 and how you can prepare your clusters for this transition, see Migrate nodes to Linux cgroupv2.

The container runtime and kubelet use Linux kernel cgroups for resource management, such as limiting how much CPU or memory each container in a Pod can access. There are two versions of the cgroup subsystem in the kernel: cgroupv1 and cgroupv2. Kubernetes support for cgroupv2 was introduced as alpha in Kubernetes version 1.18, beta in 1.22, and GA in 1.25. For more information, see the Kubernetes cgroups v2 documentation.
Node system configuration lets you customize the cgroup configuration of your node pools. You can use cgroupv1 or cgroupv2. GKE uses cgroupv2 for new Standard node pools that run version 1.26 and later, and cgroupv1 for node pools that run versions earlier than 1.26. For node pools that were created with node auto-provisioning, the cgroup configuration depends on the initial cluster version, not the node pool version. cgroupv1 is not supported on Arm machines.
You can use node system configuration to change the setting for a node pool to use cgroupv1 or cgroupv2 explicitly. Upgrading an existing node pool that uses cgroupv1 to version 1.26 doesn't change the setting to cgroupv2. Existing node pools that run a version earlier than 1.26—and that don't include a customized cgroup configuration—will continue to use cgroupv1. To change the setting, you must explicitly specify cgroupv2 for the existing node pool.
For example, to configure your node pool to use cgroupv2, use a node system configuration file such as the following:
```
linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'
```

The supported cgroupMode options are the following:

- CGROUP_MODE_V1: use cgroupv1 on the node pool.
- CGROUP_MODE_V2: use cgroupv2 on the node pool.
- CGROUP_MODE_UNSPECIFIED: use the default GKE cgroup configuration.
To usecgroupv2, the following requirements and limitations apply:
- For a node pool that runs a version earlier than 1.26, you must use gcloud CLI version 408.0.0 or later. Alternatively, use gcloud beta with version 395.0.0 or later.
- Your cluster and node pools must run GKE version 1.24.2-gke.300 or later.
- You must use either the Container-Optimized OS with containerd or Ubuntu with containerd node image.
- If any of your workloads depend on reading the cgroup filesystem (/sys/fs/cgroup/...), ensure that they are compatible with the cgroupv2 API.
- If you use any monitoring or third-party tools, ensure that they are compatible with cgroupv2.
- If you use Java workloads (JDK), we recommend that you use versions that fully support cgroupv2, including JDK 8u372, JDK 11.0.16 or later, or JDK 15 or later.
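As a sketch of how such a configuration is applied, assuming a node pool named `my-pool` in a cluster named `my-cluster` (both hypothetical names), you might write the file and pass it with the `--system-config-from-file` flag. The gcloud command is shown as a comment because it requires an existing cluster:

```shell
# Write a node system configuration that opts the node pool into cgroupv2.
cat > system-config.yaml <<'EOF'
linuxConfig:
  cgroupMode: 'CGROUP_MODE_V2'
EOF

# Apply it when updating an existing node pool (hypothetical names shown):
#   gcloud container node-pools update my-pool \
#       --cluster=my-cluster \
#       --system-config-from-file=system-config.yaml
echo "wrote system-config.yaml"
```

Remember that GKE re-creates the nodes in the pool to apply the new configuration.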
Verify cgroup configuration
When you add a node system configuration, GKE must re-create the nodes to implement the changes. After you add the configuration to a node pool and the nodes are re-created, you can verify the new configuration.

You can verify the cgroup configuration for nodes in a node pool by using the gcloud CLI or the kubectl command-line tool:
gcloud CLI
Check the cgroup configuration for a node pool:
```
gcloud container node-pools describe POOL_NAME \
    --format='value(Config.effectiveCgroupMode)'
```

Replace POOL_NAME with the name of your node pool.
The potential output is one of the following:
- EFFECTIVE_CGROUP_MODE_V1: the nodes use cgroupv1.
- EFFECTIVE_CGROUP_MODE_V2: the nodes use cgroupv2.
The output shows only the new cgroup configuration after the nodes in the node pool are re-created. The output is empty for Windows Server node pools, which don't support cgroups.
kubectl
To use kubectl to verify the cgroup configuration for nodes in this node pool, select a node and connect to it by using the following instructions:
- Create an interactive shell with any node in the node pool. In the command, replace mynode with the name of any node in the node pool.
- Identify the cgroup version on Linux nodes.
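One common way to identify the cgroup version from inside a node's shell (this uses the standard Linux `stat` tool, not a GKE-specific command) is to check the filesystem type mounted at `/sys/fs/cgroup`:

```shell
# cgroup2fs indicates cgroupv2; tmpfs indicates cgroupv1.
stat -fc %T /sys/fs/cgroup/
```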
Linux hugepages configuration options
You can use a node system configuration file to pre-allocate hugepages. Kubernetes supports pre-allocated hugepages as a resource type, similar to CPU or memory.
To use hugepages, the following limitations and requirements apply:
- To ensure that the node is not fully occupied by hugepages, the overall sizeof the allocated hugepages can't exceed either the following:
- On machines with less than 30 GB of memory: 60% of the total memory. For example, on an e2-standard-2 machine with 8 GB of memory, you can't allocate more than 4.8 GB for hugepages.
- On machines with more than 30 GB of memory: 80% of the total memory. For example, on a c4a-standard-8 machine with 32 GB of memory, hugepages can't exceed 25.6 GB.
- 1 GB hugepages are only available on A3, C2D, C3, C3D, C4, C4A, C4D,CT5E, CT5LP, CT6E, H3, M2, M3, M4, or Z3 machine types.
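The size limits above reduce to a short calculation. The following sketch (a hypothetical helper, not part of any GKE API) computes the hugepage allocation cap for a given machine size:

```python
def max_hugepage_gb(total_memory_gb: float) -> float:
    """Return the hugepage allocation cap in GB, per the GKE limits:
    60% of total memory for machines under 30 GB, 80% otherwise."""
    fraction = 0.6 if total_memory_gb < 30 else 0.8
    return total_memory_gb * fraction

# e2-standard-2 with 8 GB of memory: cap is 4.8 GB
print(max_hugepage_gb(8))
# c4a-standard-8 with 32 GB of memory: cap is 25.6 GB
print(max_hugepage_gb(32))
```

These results match the worked examples in the list above; a request that exceeds the cap for the node's machine type is rejected.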
The following table describes the modifiable settings for Linux hugepages.
| Config parameter | Restrictions | Default value | Description |
|---|---|---|---|
hugepage_size2m | Integer count. Subject to the previously described memory allocation limits. | 0 | This setting pre-allocates a specific number of 2 MB hugepages. |
hugepage_size1g | Integer count. Subject to both the previously described memory limits and the machine type restrictions. | 0 | This setting pre-allocates a specific number of 1 GB hugepages. |
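For example, a node system configuration that pre-allocates hugepages might look like the following sketch. The `hugepageConfig` field name under `linuxConfig` is an assumption based on the published node system configuration schema; verify it against the current `--system-config-from-file` reference before use:

```yaml
linuxConfig:
  hugepageConfig:
    hugepage_size2m: 1024   # pre-allocate 1024 x 2 MB pages (2 GiB total)
    hugepage_size1g: 2      # pre-allocate 2 x 1 GB pages (supported machine types only)
```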
Transparent hugepages (THP)
You can use a node system configuration file to enable the Linux kernel'sTransparent HugePage Support.With THP, the kernel automatically assigns hugepages to processes withoutmanual pre-allocation.
The following table describes the modifiable parameters for THP.
| Config parameter | Supported values | Default value | Description |
|---|---|---|---|
transparentHugepageEnabled | Values corresponding to the kernel THP modes: always, madvise, and never. | UNSPECIFIED | This setting controls whether THP is enabled for anonymous memory. |
transparentHugepageDefrag | Values corresponding to the kernel THP defrag modes: always, defer, defer+madvise, madvise, and never. | UNSPECIFIED | This setting defines the defragmentation configuration for THP. |
THP is available on GKE version 1.33.2-gke.4655000 or later. It is also enabled by default on new TPU node pools on GKE version 1.33.2-gke.4655000 or later. THP isn't enabled when you upgrade existing node pools to a supported version.
What's next
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.