server Block in Agent Configuration

Placement	`server`

This page provides reference information for configuring Nomad agent server mode in the server block of a Nomad agent configuration. Server mode lets the agent participate in scheduling decisions, register with service discovery, and handle join failures. Configure bootstrapping, authoritative region, redundancy zone, data directory, Nomad cluster behavior, client heartbeat period, schedulers, garbage collection, Raft and Raft's BoltDB store, OIDC for workload identity, and leader plan rejection, as well as job priority, job source content size, and tracked job versions.

    server {
      enabled          = true
      bootstrap_expect = 3

      server_join {
        retry_join     = ["1.1.1.1", "2.2.2.2"]
        retry_max      = 3
        retry_interval = "15s"
      }
    }

`server` Parameters

authoritative_region (string: "") - Specifies the authoritative region, which provides a single source of truth for global configurations such as ACL policies and global ACL tokens in multi-region, federated deployments. Non-authoritative regions replicate from the authoritative region to act as a mirror. By default, the local region is assumed to be authoritative. Setting authoritative_region assumes that ACLs have been bootstrapped in the authoritative region. Refer to Configure for multiple regions in the ACLs tutorial.
bootstrap_expect (int: required) - Specifies the number of server nodes to wait for before bootstrapping. It is most common to use the odd-numbered integers 3 or 5 for this value, depending on the cluster size. A value of 1 does not provide any fault tolerance and is not recommended for production use cases.
data_dir (string: "") - Specifies the directory to use for server-specific data, including the replicated log. When this parameter is empty, Nomad will generate the path using the top-level data_dir suffixed with server, like "/opt/nomad/server". The top-level data_dir must be set, even when setting this value. This must be an absolute path. Nomad will create the directory on the host, if it does not exist when the agent process starts.
enabled (bool: false) - Specifies if this agent should run in server mode. All other server options depend on this value being set.
enabled_schedulers (array<string>: []) - Specifies which sub-schedulers this server handles. Use this to restrict the evaluations that worker threads dequeue for processing. Nomad treats the empty default value as ["service", "batch", "system", "sysbatch"].
enable_event_broker (bool: true) - Specifies if this server will generate events for its event stream.
encrypt (string: "") - Specifies the secret key to use for encryption of Nomad server's gossip network traffic. This key must be 32 bytes that are RFC4648 "URL and filename safe" base64-encoded. You can generate an appropriately-formatted key with the nomad operator gossip keyring generate command. The provided key is automatically persisted to the data directory and loaded automatically whenever the agent is restarted. This means that to encrypt Nomad server's gossip protocol, this option only needs to be provided once on each agent's initial startup sequence. If it is provided after Nomad has been initialized with an encryption key, then the provided key is ignored and a warning will be displayed. Refer to the encryption documentation for more details on this option and its impact on the cluster.
event_buffer_size (int: 100) - Specifies the number of events generated by the server to be held in memory. Increasing this value enables new subscribers to have a larger look back window when initially subscribing. Decreasing will lower the amount of memory used for the event buffer.
node_gc_threshold (string: "24h") - Specifies how long a node must be in a terminal state before it is garbage collected and purged from the system. This is specified using a label suffix like "30s" or "1h".
job_gc_interval (string: "5m") - Specifies the interval between the job garbage collections. Only jobs that have been terminal for at least job_gc_threshold will be collected. Lowering the interval will perform more frequent but smaller collections. Raising the interval will perform collections less frequently but collect more jobs at a time. Reducing this interval is useful if there is a large throughput of tasks, leading to a large set of dead jobs. This is specified using a label suffix like "30s" or "3m".
job_gc_threshold (string: "4h") - Specifies the minimum time a job must be in the terminal state before it is eligible for garbage collection. This is specified using a label suffix like "30s" or "1h".
eval_gc_threshold (string: "1h") - Specifies the minimum time an evaluation must be in the terminal state before it is eligible for garbage collection. This is specified using a label suffix like "30s" or "1h". Note that batch job evaluations are controlled via batch_eval_gc_threshold. Nomad garbage collects allocations with their evaluations, so this field also controls server garbage collection of allocations. Evaluations with non-terminal allocations cannot be garbage collected.
batch_eval_gc_threshold (string: "24h") - Specifies the minimum time an evaluation stemming from a batch job must be in the terminal state before it is eligible for garbage collection. This is specified using a label suffix like "30s" or "1h". Note that the threshold is a necessary but insufficient condition for collection, and the most recent evaluation won't be garbage collected even if it breaches the threshold. Allocations are garbage collected with their evaluations, so this field also controls server garbage collection of allocations. Evaluations with non-terminal allocations cannot be garbage collected.
deployment_gc_threshold (string: "1h") - Specifies the minimum time a deployment must be in the terminal state before it is eligible for garbage collection. This is specified using a label suffix like "30s" or "1h".
client_introduction (ClientIntroduction) - Configuration for how the Nomad server handles client introduction requests.
csi_volume_claim_gc_interval (string: "5m") - Specifies the interval between CSI volume claim garbage collections.
csi_volume_claim_gc_threshold (string: "1h") - Specifies the minimum age of a CSI volume before it is eligible to have its claims garbage collected. This is specified using a label suffix like "30s" or "1h".
csi_plugin_gc_threshold (string: "1h") - Specifies the minimum age of a CSI plugin before it is eligible for garbage collection if not in use. This is specified using a label suffix like "30s" or "1h".
acl_token_gc_threshold (string: "1h") - Specifies the minimum age of an expired ACL token before it is eligible for garbage collection. This is specified using a label suffix like "30s" or "1h".
default_scheduler_config (scheduler_configuration: nil) - Specifies the initial default scheduler config when bootstrapping a cluster. The parameter is ignored once the cluster is bootstrapped or the value is updated through the API endpoint. Refer to the example section for more details.
heartbeat_grace (string: "10s") - Specifies the additional time given beyond the heartbeat TTL of Clients to account for network and processing delays and clock skew. This is specified using a label suffix like "30s" or "1h". Refer to the Client Heartbeats section for details.
min_heartbeat_ttl (string: "10s") - Specifies the minimum time between Client heartbeats. This is used as a floor to prevent excessive updates. This is specified using a label suffix like "30s" or "1h". Refer to the Client Heartbeats section for details.
failover_heartbeat_ttl (string: "5m") - The time by which all Clients must heartbeat after a Server leader election. This is specified using a label suffix like "30s" or "1h". Refer to the Client Heartbeats section for details.
max_heartbeats_per_second (float: 50.0) - Specifies the maximum target rate of heartbeats being processed per second. This allows the TTL to be increased to meet the target rate. Refer to the Client Heartbeats section for details.
non_voting_server (bool: false) - (Enterprise-only) Specifies whether this server will act as a non-voting member of the cluster to help provide read scalability.
num_schedulers (int: [num-cores]) - Specifies the number of parallel scheduler threads to run. This can be as many as one per core, or 0 to disallow this server from making any scheduling decisions. This defaults to the number of CPU cores.
license_path (string: "") - Specifies the path to load a Nomad Enterprise license from. This must be an absolute path (ex. /etc/nomad.d/license.hclic). The license can also be set by setting NOMAD_LICENSE_PATH or by setting NOMAD_LICENSE as the entire license value. license_path has the highest precedence, followed by NOMAD_LICENSE and then NOMAD_LICENSE_PATH.
plan_rejection_tracker (PlanRejectionTracker) - Configuration for the plan rejection tracker that the Nomad leader uses to track the history of plan rejections.
raft_boltdb - This is a nested object that allows configuring options for Raft's BoltDB based log store.
- no_freelist_sync - Setting this to true will disable syncing the BoltDB freelist to disk within the raft.db file. Not syncing the freelist to disk will reduce disk IO required for write operations at the expense of longer server startup times.
raft_protocol (int: 3) - Specifies the Raft protocol version to use when communicating with other Nomad servers. This affects available Autopilot features and is typically not required as the agent internally knows the latest version, but may be useful in some upgrade scenarios. Must be 3 in Nomad v1.4 or later.
raft_multiplier (int: 1) - An integer multiplier used by Nomad servers to scale key Raft timing parameters. Omitting this value or setting it to 0 uses the default timing described in the following paragraph. Lower values tighten timing and increase sensitivity, while higher values relax timing and reduce sensitivity. Tuning this affects the time it takes Nomad to detect leader failures and to perform leader elections, at the expense of requiring more network and CPU resources for better performance. The maximum allowed value is 10.
By default, Nomad will use the highest-performance timing, currently equivalent to setting this to a value of 1. Increasing the timings makes leader election less likely during periods of networking issues or resource starvation. Since leader elections pause Nomad's normal work, it may be beneficial for slow or unreliable networks to wait longer before electing a new leader. The trade-off when raising this value is that during network partitions or other events (server crash) where a leader is lost, Nomad will not elect a new leader for a longer period of time than the default. The nomad.nomad.leader.barrier and nomad.raft.leader.lastContact metrics are good indicators of how often leader elections occur and of Raft latency.
raft_snapshot_threshold (int: "8192") - Specifies the minimum number of Raft logs to be written to disk before the node is allowed to take a snapshot. This reduces the frequency and impact of creating snapshots. During node startup, Raft restores the latest snapshot and then applies the individual logs to catch the node up to the last known state. This can be tuned during operation by a hot configuration reload.
raft_snapshot_interval (string: "120s") - Specifies the minimum time between checks if Raft should perform a snapshot. The Raft library randomly staggers between this value and twice this value to avoid the entire cluster performing a snapshot at once. Nodes are eligible to snapshot once they have exceeded the raft_snapshot_threshold. This value can be tuned during operation by a hot configuration reload.
raft_trailing_logs (int: "10240") - Specifies how many logs are retained after a snapshot. These logs are used so that Raft can quickly replay logs on a follower instead of being forced to send an entire snapshot. This value can be tuned during operation by a hot configuration reload.
redundancy_zone (string: "") - (Enterprise-only) Specifies the redundancy zone that this server will be a part of for Autopilot management. For more information, refer to the Autopilot Guide.
rejoin_after_leave (bool: false) - Specifies if Nomad will ignore a previous leave and attempt to rejoin the cluster when starting. By default, Nomad treats leave as a permanent intent and does not attempt to join the cluster again when starting. This flag allows the previous state to be used to rejoin the cluster.
root_key_gc_interval (string: "10m") - Specifies the interval between encryption key metadata garbage collections.
root_key_gc_threshold (string: "1h") - Specifies the minimum time after the root_key_rotation_threshold has passed that an encryption key must exist before it can be eligible for garbage collection.
root_key_rotation_threshold (string: "720h") - Specifies the lifetime of an active encryption key before it is automatically rotated on the next garbage collection interval. Nomad will prepublish the replacement key at half the root_key_rotation_threshold time so external consumers of Workload Identity have time to obtain the new public key from the JWKS URL before it is used.
server_join (server_join: nil) - Specifies how the Nomad server will connect to other Nomad servers. The retry_join fields may directly specify the server address or use go-discover syntax for auto-discovery. Refer to the server_join documentation for more detail.
start_timeout (string: "30s") - A timeout applied to the server setup and startup processes. These processes (keyring decryption) are expected to complete before the server is considered healthy, and if the timeout is reached before they are completed, the server will exit. Without this, the server can hang indefinitely waiting for these.
upgrade_version (string: "") - A custom version of the format X.Y.Z to use in place of the Nomad version when custom upgrades are enabled in Autopilot. For more information, refer to the Autopilot Guide.
search (search: nil) - Specifies configuration parameters for the Nomad search API.
job_max_priority (int: 100) - Specifies the maximum priority that can be assigned to a job. A valid value must be between 100 and 32766.
job_default_priority (int: 50) - Specifies the default priority assigned to a job. A valid value must be between 50 and job_max_priority.
job_max_count (int: 50000) - Specifies the maximum number of allocations for a job, as represented by the sum of its task group count fields. Jobs of type system ignore this value. The child jobs of dispatched batch jobs or periodic jobs are counted separately from their parent job. This value must be non-negative. If set to 0, no limit is enforced. This value is enforced at the time the job is submitted or scaled, and updating the value will not impact existing jobs.
job_max_source_size (string: "1M") - Specifies the size limit of the associated job source content when registering a job. Note this is not a limit on the actual size of a job. If the limit is exceeded, the original source is simply discarded and no error is returned from the job API.
job_tracked_versions (int: 6) - Specifies the number of historic job versions that are kept.
oidc_issuer (string: "") - Specifies the Issuer URL for Workload Identity JWTs. For example, "https://nomad.example.com". If set, the /.well-known/openid-configuration HTTP endpoint is enabled for third parties to discover Nomad's OIDC configuration. Once set, oidc_issuer cannot be changed without invalidating Workload Identities that have the old issuer claim. For this reason it is suggested to set oidc_issuer to a proxy in front of Nomad's HTTP API to ensure a stable DNS name can be used instead of a potentially ephemeral Nomad server IP.
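
As an illustration of how several of the parameters above combine, the following sketch tunes garbage collection for a cluster with a high rate of short-lived batch jobs and relaxes Raft timing for a slower network. The specific values are assumptions chosen for illustration, not recommendations. Only the parameter names, formats, and defaults come from the reference above.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # Collect dead jobs more often, in smaller batches (default is "5m").
  job_gc_interval = "1m"

  # Illustrative value: make terminal jobs eligible sooner than the "4h" default.
  job_gc_threshold = "1h"

  # Batch job evaluations have their own threshold (default is "24h").
  batch_eval_gc_threshold = "6h"

  # Illustrative value: relax Raft timing for a slow or unreliable network.
  # Higher values delay leader failure detection; the maximum allowed value is 10.
  raft_multiplier = 5
}
```

Shorter garbage collection settings reduce the amount of terminal state the servers retain, while a higher raft_multiplier trades slower leader failover for fewer spurious elections.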

Deprecated Parameters

retry_join (array<string>: []) - Specifies a list of server addresses to retry joining if the first attempt fails. This is similar to start_join, but only invokes if the initial join attempt fails. The list of addresses will be tried in the order specified, until one succeeds. After one succeeds, no further addresses will be contacted. This is useful for cases where the address is known to become available eventually. Use retry_join with an array as a replacement for start_join; do not use both options. Refer to the server_join section for more information on the format of the string. This field is deprecated in favor of the server_join block.
retry_interval (string: "30s") - Specifies the time to wait between retry join attempts. This field is deprecated in favor of the server_join block.
retry_max (int: 0) - Specifies the maximum number of join attempts to be made before exiting with a return code of 1. By default, this is set to 0 which is interpreted as infinite retries. This field is deprecated in favor of the server_join block.
start_join (array<string>: []) - Specifies a list of server addresses to join on startup. If Nomad is unable to join with any of the specified addresses, agent startup will fail. Refer to the server address format section for more information on the format of the string. This field is deprecated in favor of the server_join block.
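
Because these parameters are deprecated in favor of the server_join block, a configuration that sets them directly inside server can be rewritten as sketched below. The addresses are placeholders and the retry values are illustrative.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # Deprecated form, shown commented out for comparison:
  # retry_join     = ["1.1.1.1", "2.2.2.2"]
  # retry_max      = 3
  # retry_interval = "15s"

  # Preferred form: the same settings nested in the server_join block.
  server_join {
    retry_join     = ["1.1.1.1", "2.2.2.2"]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```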

`client_introduction` Parameters

The client_introduction block controls how the Nomad server should generate introduction tokens for new clients and what enforcement to apply when new clients attempt to register with the server.

enforcement (string: "warn") - Specifies how the server handles new client registration requests. The valid options are none, warn, and strict.
- none: The server does not enforce any client registration behaviour.
- warn: The server logs a warning message and emits a telemetry metric when a client attempts to register without an introduction token. However, the server does allow the registration to proceed.
- strict: The server rejects any client registration attempts that do not include a valid introduction token. The server also logs a warning message and emits a telemetry metric.
default_identity_ttl (string: "5m") - The default TTL assigned to generated client introduction tokens when the caller does not specify a TTL. Specify this value with a label suffix like "30s" or "1h". Must be less than max_identity_ttl.
max_identity_ttl (string: "30m") - The maximum TTL that can be requested by a caller when generating client introduction tokens. Specify this value with a label suffix like "30s" or "1h". Must be greater than default_identity_ttl.

Refer to the Client introduction section of the Monitor Nomad guide for details on the client introduction metric that is emitted.
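
A minimal sketch of a client_introduction block that moves from the default warn behavior to strict enforcement. The TTL values shown are the documented defaults, written out to make the ordering constraint between them visible.

```hcl
server {
  enabled = true

  client_introduction {
    # Reject registrations that do not include a valid introduction token.
    enforcement = "strict"

    # Applied when the caller does not request a TTL.
    # Must be less than max_identity_ttl.
    default_identity_ttl = "5m"

    # Upper bound on the TTL a caller may request.
    max_identity_ttl = "30m"
  }
}
```

Running with warn first and watching the emitted metric is one way to confirm that all clients present tokens before switching to strict.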

`plan_rejection_tracker` Parameters

The leader plan rejection tracker can be adjusted to prevent evaluations from getting stuck due to always being scheduled to a client that may have an unexpected issue. Refer to Monitoring Nomad for more details.

enabled (bool: false) - Specifies if plan rejections should be tracked.
node_threshold (int: 100) - The number of plan rejections for a node within the node_window to trigger a client to be set as ineligible.
node_window (string: "5m") - The time window for when plan rejections for a node should be considered.

If you observe too many false positives (clients being marked as ineligible even though they do not present any problem), you may want to increase node_threshold.

If instead you notice jobs not being scheduled due to plan rejections for the same node_id, and the client is not being set as ineligible, you can try increasing the node_window so that more historical rejections are taken into account.
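
A minimal sketch of enabling the tracker. The threshold and window shown are the documented defaults, written out explicitly so they are easy to adjust following the guidance above.

```hcl
server {
  enabled = true

  plan_rejection_tracker {
    # Tracking is disabled by default.
    enabled = true

    # Raise this if healthy clients are being marked ineligible.
    node_threshold = 100

    # Raise this so that more historical rejections are counted.
    node_window = "5m"
  }
}
```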

`server` Examples

Common Setup

This example shows a common Nomad agent server configuration block. The two IP addresses could also be DNS names, and should point to the other Nomad servers in the cluster.

    server {
      enabled          = true
      bootstrap_expect = 3

      server_join {
        retry_join     = ["1.1.1.1", "2.2.2.2"]
        retry_max      = 3
        retry_interval = "15s"
      }
    }

Configuring Data Directory

This example shows configuring a custom data directory for the server data.

    server {
      data_dir = "/opt/nomad/server"
    }

Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a more detailed explanation, refer to the automatic Nomad bootstrapping documentation.

Restricting Schedulers

This example shows restricting the schedulers that are enabled as well as the maximum number of cores to utilize when participating in scheduling decisions:

    server {
      enabled            = true
      enabled_schedulers = ["batch", "service"]
      num_schedulers     = 7
    }

Bootstrapping with a Custom Scheduler Config

While bootstrapping a cluster, you can use the default_scheduler_config block to prime the cluster with a SchedulerConfig. The scheduler configuration determines which scheduling algorithm is configured (spread scheduling or binpacking) and which job types are eligible for preemption.

Warning: Once the cluster is bootstrapped, you must configure this using the update scheduler configuration API. This option is only consulted during bootstrap.

The structure matches the Update Scheduler Config API endpoint, which you should consult for canonical documentation. However, the attribute names must be adapted to HCL syntax by using snake case representations rather than camel case.

This example shows configuring spread scheduling and enabling preemption for all job-type schedulers.

    server {
      default_scheduler_config {
        scheduler_algorithm             = "spread"
        memory_oversubscription_enabled = true
        reject_job_registration         = false
        pause_eval_broker               = false

        preemption_config {
          batch_scheduler_enabled    = true
          system_scheduler_enabled   = true
          service_scheduler_enabled  = true
          sysbatch_scheduler_enabled = true
        }
      }
    }

Client Heartbeats

This is an advanced topic. It is most beneficial to clusters over 1,000 nodes or with unreliable networks or nodes (e.g., some edge deployments).

Nomad Clients periodically heartbeat to Nomad Servers to confirm they are operating as expected. Nomad Clients which do not heartbeat in the specified amount of time are considered down and their allocations are marked as lost or disconnected (if disconnect.lost_after is set) and replaced.

The various heartbeat-related parameters allow you to tune the following tradeoffs:

The longer the heartbeat period, the longer Nomad takes to replace a down Client's workload.
The shorter the heartbeat period, the more likely transient network issues, leader elections, and other temporary issues could cause a perfectly functional Client and its workloads to be marked as down and the work replaced.

While Nomad Clients can connect to any Server, all heartbeats are forwarded to the leader for processing. Since this heartbeat processing consumes resources, Nomad adjusts the rate at which Clients heartbeat based on cluster size. The goal is to try to keep the resource cost of processing heartbeats constant regardless of cluster size.

The base formula for determining how often a Client must heartbeat is:

<number of Clients> / <max_heartbeats_per_second>

Other factors modify this base TTL:

A random factor up to 2x is added to the base TTL to prevent the thundering herd problem where a large number of clients attempt to heartbeat at exactly the same time.
min_heartbeat_ttl is used as the lower bound to prevent small clusters from sending excessive heartbeats.
heartbeat_grace is the amount of extra time the leader will wait for a heartbeat beyond the base heartbeat.
After a leader election all Clients are given up to failover_heartbeat_ttl to successfully heartbeat. This gives Clients time to discover a functioning Server in case they were directly connected to a leader that crashed.

For example, given the default values for heartbeat parameters, different sized clusters will use the following TTLs for the heartbeats. Note that the Server TTL simply adds the heartbeat_grace parameter to the TTL Clients are given.

Clients	Client TTL	Server TTL	Safe after elections
10	10s - 20s	20s - 30s	yes
100	10s - 20s	20s - 30s	yes
1000	20s - 40s	30s - 50s	yes
5000	100s - 200s	110s - 210s	yes
10000	200s - 400s	210s - 410s	NO
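
The rows of the table can be reproduced by hand. The sketch below assumes the order of operations implied by the table: the base TTL is floored at min_heartbeat_ttl before the random factor is applied, and heartbeat_grace is then added to obtain the Server TTL. The symbols N (number of Clients), T_base, T_client, and T_server are introduced here for illustration only.

```latex
\begin{align*}
T_{\text{base}} &= \max\!\left(\frac{N}{\texttt{max\_heartbeats\_per\_second}},\ \texttt{min\_heartbeat\_ttl}\right) \\
T_{\text{client}} &\in \left[\,T_{\text{base}},\ 2\,T_{\text{base}}\,\right] \\
T_{\text{server}} &= T_{\text{client}} + \texttt{heartbeat\_grace} \\[6pt]
N = 100:\quad & T_{\text{base}} = \max(100/50,\ 10) = 10\,\text{s},\quad
  T_{\text{client}} \in [10, 20]\,\text{s},\quad T_{\text{server}} \in [20, 30]\,\text{s} \\
N = 1000:\quad & T_{\text{base}} = \max(1000/50,\ 10) = 20\,\text{s},\quad
  T_{\text{client}} \in [20, 40]\,\text{s},\quad T_{\text{server}} \in [30, 50]\,\text{s} \\
N = 10000:\quad & T_{\text{base}} = \max(10000/50,\ 10) = 200\,\text{s},\quad
  T_{\text{client}} \in [200, 400]\,\text{s},\quad T_{\text{server}} \in [210, 410]\,\text{s}
\end{align*}
```

In the last row the maximum Client TTL of 400s exceeds the default failover_heartbeat_ttl of 5m (300s), which is why that row is marked as not safe after elections.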

Regardless of size, all clients will have a Server TTL of failover_heartbeat_ttl after a leader election. It should always be larger than the maximum Client TTL for your cluster size in order to prevent marking live Clients as down.

For clusters over 5000 Clients you should increase failover_heartbeat_ttl using the following formula:

    (2 * (<number of Clients> / <max_heartbeats_per_second>)) + (10 * <min_heartbeat_ttl>)

    # For example with 6000 Clients:
    (2 * (6000 / 50)) + (10 * 10) = 340s (5m40s)

This ensures Clients have some additional time to failover even if they were told to heartbeat after the maximum interval.

The actual value used should take into consideration how much tolerance your system has for a delay in noticing crashed Clients. For example, a failover_heartbeat_ttl of 30 minutes may give even the slowest clients in the largest clusters ample time to heartbeat after an election. However, if the election was due to a datacenter-wide failure affecting Clients, it will be 30 minutes before Nomad recognizes that they are down and replaces their work.
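
Applying the formula to the 6000-Client example gives the following sketch of a server block. The other heartbeat parameters are left at their documented defaults and written out only to make the arithmetic visible. The cluster size of 6000 is the assumption carried over from the example above.

```hcl
server {
  enabled = true

  # Documented defaults, repeated here for reference.
  max_heartbeats_per_second = 50.0
  min_heartbeat_ttl         = "10s"
  heartbeat_grace           = "10s"

  # (2 * (6000 / 50)) + (10 * 10) = 340s
  failover_heartbeat_ttl = "5m40s"
}
```

If the cluster is expected to grow, recompute the value for the projected number of Clients instead of the current one.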
