RFC 9199 | Considerations for Large Authoritative DNS Server Operators | March 2022
Moura, et al. | Informational
Recent research work has explored the deployment characteristics and configuration of the Domain Name System (DNS). This document summarizes the conclusions from these research efforts and offers specific, tangible considerations or advice to authoritative DNS server operators. Authoritative server operators may wish to follow these considerations to improve their DNS services.¶
It is possible that the results presented in this document could be applicable in a wider context than just the DNS protocol, as some of the results may generically apply to any stateless/short-duration anycasted service.¶
This document is not an IETF consensus document: it is published for informational purposes.¶
This document is not an Internet Standards Track specification; it is published for informational purposes.¶
This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9199.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
This document summarizes recent research that explored the deployed DNS configurations and offers derived, specific, tangible advice to DNS authoritative server operators (referred to as "DNS operators" hereafter). The considerations (C1-C6) presented in this document are backed by peer-reviewed research, which used wide-scale Internet measurements to draw their conclusions. This document summarizes the research results and describes the resulting key engineering options. In each section, readers are pointed to the pertinent publications where additional details are presented.¶
These considerations are designed for operators of "large" authoritative DNS servers, which, in this context, are servers with a significant global user population, like top-level domain (TLD) operators, run by either a single operator or multiple operators. Typically, these networks are deployed on wide anycast networks [RFC1546] [AnyBest]. These considerations may not be appropriate for smaller domains, such as those used by an organization with users in one unicast network or in a single city or region, where operational goals such as uniform, global low latency are less required.¶
It is possible that the results presented in this document could be applicable in a wider context than just the DNS protocol, as some of the results may generically apply to any stateless/short-duration anycasted service. Because the conclusions of the reviewed studies don't measure smaller networks, the wording in this document concentrates solely on discussing large-scale DNS authoritative services.¶
This document is not an IETF consensus document: it is published for informational purposes.¶
The DNS has two main types of DNS servers: authoritative servers and recursive resolvers, shown by a representational deployment model in Figure 1. An authoritative server (shown as AT1-AT4 in Figure 1) knows the content of a DNS zone and is responsible for answering queries about that zone. It runs using local (possibly automatically updated) copies of the zone and does not need to query other servers [RFC2181] in order to answer requests. A recursive resolver (Re1-Re3) is a server that iteratively queries authoritative and other servers to answer queries received from client requests [RFC1034]. A client typically employs a software library called a "stub resolver" ("stub" in Figure 1) to issue its query to the upstream recursive resolvers [RFC1034].¶
   +-----+   +-----+     +-----+      +-----+
   | AT1 |   | AT2 |     | AT3 |      | AT4 |
   +-----+   +-----+     +-----+      +-----+
      ^         ^           ^            ^
      |         |           |            |
      |      +-----+        |            |
      +------| Re1 |--------+            |
      |      +-----+                     |
      |         ^                        |
      |         |                        |
      |      +----+      +----+          |
      +------|Re2 |      |Re3 |----------+
             +----+      +----+
               ^            ^
               |            |
               |  +------+  |
               +--| stub |--+
                  +------+

   Figure 1: Relationship between Recursive Resolvers (Re) and
             Authoritative Name Servers (ATn)
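To make the interaction in Figure 1 concrete, the following sketch issues the same NS query twice: once through an upstream recursive resolver, as a stub resolver would, and once directly to an authoritative server. It assumes the dnspython library and a placeholder authoritative address (192.0.2.53); neither the tool nor the address comes from this document, so treat this purely as an illustration of the two roles.¶

   # Sketch: stub-style vs. direct authoritative queries. (dnspython
   # and the placeholder address below are assumptions, not part of
   # this document.)
   import dns.message
   import dns.query
   import dns.resolver

   # Stub behavior: hand the query to an upstream recursive resolver,
   # which iterates through the DNS hierarchy on our behalf.
   answer = dns.resolver.resolve("example.com", "NS")
   for rr in answer:
       print("via recursive resolver:", rr.target)

   # Direct query to one authoritative server (placeholder address;
   # substitute a real authoritative server for the zone under test).
   query = dns.message.make_query("example.com", "NS")
   response = dns.query.udp(query, "192.0.2.53", timeout=5)
   print("authoritative answer:", response.answer)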
DNS queries issued by a client contribute to a user's perceived latency and affect the user experience [Singla2014], depending on how long it takes for responses to be returned. The DNS system has been subject to repeated Denial-of-Service (DoS) attacks (for example, in November 2015 [Moura16b]) in order to specifically degrade the user experience.¶
To reduce latency and improve resiliency against DoS attacks, the DNS uses several types of service replication. Replication at the authoritative server level can be achieved with the following:¶
1. the deployment of multiple servers for the same zone [RFC1035]¶
2. the use of IP anycast [RFC1546] [RFC4786] [AnyBest], which allows the same IP address block to be announced from multiple locations (each of which is referred to as an "anycast instance")¶
3. the use of IP load balancers to support multiple servers within a single (possibly anycasted) site¶
In the next sections, we cover the specific considerations (C1-C6) for conclusions drawn within academic papers about large authoritative DNS server operators. These considerations are conclusions reached from academic work that authoritative server operators may wish to consider in order to improve their DNS service. Each consideration offers different improvements that may impact service latency, routing, anycast deployment, and defensive strategies, for example.¶
Authoritative DNS server operators announce their service using NS records [RFC1034]. Different authoritative servers for a given zone should return the same content; typically, they stay synchronized using DNS zone transfers (authoritative transfer (AXFR) [RFC5936] and incremental zone transfer (IXFR) [RFC1995]), coordinating the zone data they all return to their clients.¶
As discussed above, the DNS heavily relies upon replication to support high reliability, ensure capacity, and reduce latency [Moura16b]. The DNS has two complementary mechanisms for service replication: name server replication (multiple NS records) and anycast (multiple physical locations). Name server replication is strongly recommended for all zones (multiple NS records), and IP anycast is used by many larger zones, such as the DNS root [AnyFRoot], most top-level domains [Moura16b], and many large commercial enterprises, governments, and other organizations.¶
Most DNS operators strive to reduce service latency for users, which is greatly affected by both of these replication techniques. However, because operators only have control over their authoritative servers and not over the clients' recursive resolvers, it is difficult to ensure that recursives will be served by the closest authoritative server. Server selection is ultimately up to the recursive resolver's software implementation, and different vendors and even different releases employ different criteria to choose the authoritative servers with which to communicate.¶
Understanding how recursive resolvers choose authoritative servers is a key step in improving the effectiveness of authoritative server deployments. To measure and evaluate server deployments, [Mueller17b] describes the deployment of seven unicast authoritative name servers placed in different global locations, which were then queried from more than 9,000 Réseaux IP Européens (RIPE) Atlas vantage points and their respective recursive resolvers.¶
It was found in [Mueller17b] that recursive resolvers in the wild query all available authoritative servers, regardless of the observed latency. But the distribution of queries tends to be skewed towards authoritatives with lower latency: the lower the latency between a recursive resolver and an authoritative server, the more often the recursive will send queries to that server. These results were obtained by aggregating results from all of the vantage points, and they were not specific to any vendor or version.¶
The authors believe this behavior is a consequence of combining the two main criteria employed by resolvers when selecting authoritative servers: resolvers regularly check all listed authoritative servers in an NS set to determine which is closest (the least latent), and when that server isn't available, they select one of the alternatives.¶
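A toy model helps visualize the resulting query distribution: a resolver that usually prefers its lowest-latency server but still probes every server from time to time ends up skewing, rather than concentrating, its queries. The simulation below sketches this; the probe rate and RTT values are invented for illustration and are not parameters measured in [Mueller17b].¶

   # Toy model: latency-skewed authoritative server selection.
   # (The probe rate and RTT values are illustrative assumptions.)
   import random
   from collections import Counter

   servers = {"ns1": 10, "ns2": 40, "ns3": 120}  # hypothetical RTTs (ms)
   PROBE_RATE = 0.15  # fraction of queries that re-check all servers

   counts = Counter()
   for _ in range(10_000):
       if random.random() < PROBE_RATE:
           # Periodically probe any server to refresh its RTT estimate.
           choice = random.choice(list(servers))
       else:
           # Otherwise prefer the currently fastest server.
           choice = min(servers, key=servers.get)
       counts[choice] += 1

   # Every server receives queries, but the share skews toward the
   # lowest-latency one, matching the behavior described above.
   for name, n in counts.most_common():
       print(f"{name}: {n / 100:.1f}% of queries")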
For an authoritative DNS operator, this result means that the latency of all authoritative servers (NS records) matters, so they all must be similarly capable -- all available authoritatives will be queried by most recursive resolvers. Unicasted services, unfortunately, cannot deliver good latency worldwide (a unicast authoritative server in Europe will always have high latency to resolvers in California and Australia, for example, given its geographical distance).¶
[Mueller17b] recommends that DNS operators deploy equally strong IP anycast instances for every authoritative server (i.e., for each NS record). Each large authoritative DNS server provider should phase out its usage of unicast and deploy a number of well-engineered anycast instances with good peering strategies so they can provide good latency to their global clients.¶
As a case study, the ".nl" TLD zone was originally served on seven authoritative servers with a mixed unicast/anycast setup. In early 2018, .nl moved to a setup with four anycast authoritative servers.¶
The contribution of [Mueller17b] to DNS service engineering shows that because unicast cannot deliver good latency worldwide, anycast needs to be used to provide a low-latency service worldwide.¶
When selecting an anycast DNS provider or setting up an anycast service, choosing the best number of anycast instances [RFC4786] [RFC7094] to deploy is a challenging problem. Selecting the right quantity and set of global locations that should send BGP announcements is tricky. Intuitively, one could naively think that more instances are better and that simply "more" will always lead to shorter response times.¶
This is not necessarily true, however. In fact, proper route engineering can matter more than the total number of locations, as found in [Schmidt17a]. To study the relationship between the number of anycast instances and the associated service performance, the authors measured the round-trip time (RTT) latency of four DNS root servers. The root DNS servers are implemented by 12 separate organizations serving the DNS root zone at 13 different IPv4/IPv6 address pairs.¶
The results documented in [Schmidt17a] measured the performance of the {c,f,k,l}.root-servers.net (referred to as "C", "F", "K", and "L" hereafter) servers from more than 7,900 RIPE Atlas probes. RIPE Atlas is an Internet measurement platform with more than 12,000 global vantage points called "Atlas probes", and it is used regularly by both researchers and operators [RipeAtlas15a] [RipeAtlas19a].¶
In [Schmidt17a], the authors found that the C server, a smaller anycast deployment consisting of only 8 instances, provided very similar overall performance in comparison to the much larger deployments of K and L, with 33 and 144 instances, respectively. The median RTTs for the C, K, and L root servers were all between 30 and 32 ms.¶
Because RIPE Atlas is known to have better coverage in Europe than other regions, the authors specifically analyzed the results per region and per country (Figure 5 in [Schmidt17a]) and show that the known Atlas bias toward Europe does not change the conclusion that properly selected anycast locations are more important to latency than the number of sites.¶
The important conclusion from [Schmidt17a] is that when engineering anycast services for performance, factors other than just the number of instances (such as local routing connectivity) must be considered. Specifically, optimizing routing policies is more important than simply adding new instances. The authors showed that 12 instances can provide reasonable latency, assuming they are globally distributed and have good local interconnectivity. However, additional instances can still be useful for other reasons, such as when handling DoS attacks [Moura16b].¶
An anycast DNS service may be deployed from anywhere from several locations to hundreds of locations (for example, l.root-servers.net has over 150 anycast instances at the time this was written). Anycast leverages Internet routing to distribute incoming queries to a service's nearest distributed anycast locations, as measured by the number of routing hops. However, queries are usually not evenly distributed across all anycast locations, as found in the case of L-Root when analyzed using Hedgehog [IcannHedgehog].¶
Adding locations to or removing locations from a deployed anycast network changes the load distribution across all of its locations. When a new location is announced by BGP, locations may receive more or less traffic than they were engineered for, leading to suboptimal service performance or even stressing some locations while leaving others underutilized. Operators constantly face this scenario when expanding an anycast service. Operators cannot easily directly estimate future query distributions based on proposed anycast network engineering decisions.¶
To address this need and estimate the query loads of an anycast service undergoing changes (in particular, expansion), [Vries17b] describes the development of a new technique enabling operators to carry out active measurements using an open-source tool called Verfploeter (available at [VerfSrc]). The results allow the creation of detailed anycast maps and catchment estimates. By running Verfploeter combined with a published IPv4 "hit list", operators can precisely calculate which remote prefixes will be matched to each anycast instance in a network. At the time of this writing, Verfploeter still does not support IPv6, as the IPv4 hit lists used are generated via frequent large-scale ICMP echo scans, which is not possible using IPv6.¶
As a proof of concept, [Vries17b] documents how Verfploeter was used to predict both the catchment and query load distribution for a new anycast instance deployed for b.root-servers.net. Using two anycast test instances in Miami (MIA) and Los Angeles (LAX), an ICMP echo query was sent from an IP anycast address to each IPv4 /24 network routing block on the Internet.¶
The ICMP echo responses were recorded at both sites, analyzed, and overlaid onto a graphical world map, resulting in an Internet-scale catchment map. To calculate the expected load once the production network was enabled, the quantity of traffic received by b.root-servers.net's single site at LAX was recorded based on a single day's traffic (2017-04-12, "day in the life" (DITL) datasets [Ditl17]). In [Vries17b], it was predicted that 81.6% of the traffic load would remain at the LAX site. This Verfploeter estimate turned out to be very accurate; the actual measured traffic volume when production service at MIA was enabled was 81.4%.¶
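The prediction itself is simple arithmetic once the two inputs exist: a catchment map (which site answered the probe for each /24) and historical query counts per /24 (e.g., from a DITL capture). The sketch below shows that combination; the prefixes, counts, and data structures are invented placeholders, not Verfploeter's actual formats.¶

   # Sketch: predict per-site load from a Verfploeter-style catchment
   # map plus per-prefix query volumes (all values are illustrative).
   from collections import defaultdict

   # Which anycast site answered the probe sent to each /24.
   catchment = {
       "192.0.2.0/24": "LAX",
       "198.51.100.0/24": "LAX",
       "203.0.113.0/24": "MIA",
   }

   # Queries per /24 observed in a historical capture (one DITL day).
   queries = {
       "192.0.2.0/24": 9_000,
       "198.51.100.0/24": 4_000,
       "203.0.113.0/24": 3_000,
   }

   load = defaultdict(int)
   for prefix, site in catchment.items():
       load[site] += queries.get(prefix, 0)

   total = sum(load.values())
   for site, n in sorted(load.items()):
       print(f"{site}: {100 * n / total:.1f}% of predicted load")
   # With these placeholder numbers, LAX keeps ~81% of the load,
   # similar in shape to the b.root-servers.net prediction above.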
Verfploeter can also be used to estimate traffic shifts based on other BGP route engineering techniques (for example, Autonomous System (AS) path prepending or BGP community use) in advance of operational deployment. This was studied in [Vries17b] using prepending with 1-3 hops at each instance, and the results were compared against real operational changes to validate the accuracy of the techniques.¶
An important operational takeaway [Vries17b] provides is how DNS operators can make informed engineering choices when changing DNS anycast network deployments by using Verfploeter in advance. Operators can identify suboptimal routing situations ahead of time with significantly better coverage than that offered by other active measurement platforms, such as RIPE Atlas. To date, Verfploeter has been deployed on an operational testbed (anycast testbed) [AnyTest], at a large unnamed operator, and is run daily at b.root-servers.net [Vries17b].¶
Operators should use active measurement techniques like Verfploeter in advance of potential anycast network changes to accurately measure the benefits and potential issues ahead of time.¶
DDoS attacks are becoming bigger, cheaper, and more frequent [Moura16b]. The most powerful recorded DDoS attack against DNS servers to date reached 1.2 Tbps by using Internet of Things (IoT) devices [Perlroth16]. How should a DNS operator engineer its anycast authoritative DNS server to react to such a DDoS attack? [Moura16b] investigates this question using empirical observations grounded with theoretical option evaluations.¶
An authoritative DNS server deployed using anycast will have many server instances distributed over many networks. Ultimately, the relationship between the DNS provider's network and a client's ISP will determine which anycast instance will answer queries for a given client, given that the BGP protocol maps clients to specific anycast instances using routing information. As a consequence, when an anycast authoritative server is under attack, the load that each anycast instance receives is likely to be unevenly distributed (a function of the source of the attacks); thus, some instances may be more overloaded than others, which is what was observed when analyzing the root DNS events of November 2015 [Moura16b]. Given the fact that different instances may have different capacities (bandwidth, CPU, etc.), making a decision about how to react to stress becomes even more difficult.¶
In practice, when an anycast instance is overloaded with incoming traffic, operators have two options:¶
1. Withdraw its routes, prepend its AS path for some of its neighbors, perform other traffic-shifting measures, or ask upstream providers to apply filtering. These techniques shift both legitimate and attack traffic to other anycast instances (hopefully with greater capacity) or block the traffic entirely.¶
2. Become a degraded absorber by continuing to operate, knowing that some incoming legitimate and attack queries will be dropped due to overload, while protecting the remaining anycast instances by keeping the attack traffic at the absorbing site.¶
[Moura16b] describes seeing both of these behaviors deployed in practice when studying instance reachability and RTTs in the DNS root events. When withdrawal strategies were deployed, the stress of increased query loads was displaced from one instance to multiple other sites. In other observed events, one site was left to absorb the brunt of an attack, leaving the other sites relatively less affected.¶
Operators should consider having both an anycast site withdraw strategy and an absorption strategy ready to be used before a network overload occurs. Operators should be able to deploy one or both of these strategies rapidly. Ideally, these should be encoded into operating playbooks with defined site measurement guidelines for which strategy to employ based on measured data from past events.¶
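One hypothetical encoding of such a playbook rule compares each site's measured load against its engineered capacity and against the spare capacity available elsewhere. The sketch below illustrates the decision only; the sites, numbers, and thresholds are invented, and a real playbook would be driven by an operator's own measurements and routing tooling.¶

   # Sketch: a hypothetical playbook rule choosing between the
   # "withdraw" and "absorb" strategies for an overloaded anycast site.
   SITES = {
       # site: (measured queries/sec, engineered capacity queries/sec)
       "LAX": (180_000, 200_000),
       "MIA": (950_000, 300_000),
       "AMS": (120_000, 400_000),
   }

   def recommend(load: int, capacity: int, spare_elsewhere: int) -> str:
       """Pick a strategy for one site (illustrative logic only)."""
       if load <= capacity:
           return "no action"
       excess = load - capacity
       if excess <= spare_elsewhere:
           # Other sites can soak up the displaced traffic.
           return "withdraw routes (shift traffic to other sites)"
       # Shifting would overload the remaining sites too; keep the
       # attack pinned here and degrade locally instead.
       return "absorb (degrade locally, protect other sites)"

   for site, (load, cap) in SITES.items():
       spare = sum(max(c - l, 0)
                   for s, (l, c) in SITES.items() if s != site)
       print(site, "->", recommend(load, cap, spare))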
[Moura16b] speculates that careful, explicit, and automated management policies may provide stronger defenses to overload events. DNS operators should be ready to employ both common filtering approaches and other routing load-balancing techniques (such as withdrawing routes, prepending Autonomous Systems (ASes), adding communities, or isolating instances), where the best choice depends on the specifics of the attack.¶
Note that this consideration refers to the operation of just one anycast service point, i.e., just one anycasted IP address block covering one NS record. However, DNS zones with multiple authoritative anycast servers may also expect loads to shift from one anycasted server to another, as resolvers switch from one authoritative service point to another when attempting to resolve a name [Mueller17b].¶
Caching is the cornerstone of good DNS performance and reliability. A 50 ms response to a new DNS query may be considered fast, but a response of less than 1 ms to a cached entry is far faster. In [Moura18b], it was shown that caching also protects users from short outages and even significant DDoS attacks.¶
Time-to-live (TTL) values [RFC1034] [RFC1035] for DNS records directly control cache durations and affect latency, resilience, and the role of DNS in Content Delivery Network (CDN) server selection. Some early work modeled caches as a function of their TTLs [Jung03a], and recent work has examined cache interactions with DNS [Moura18b], but until [Moura19b], no research had provided considerations about the benefits of various TTL value choices. To study this, Moura et al. [Moura19b] carried out a measurement study investigating TTL choices and their impact on user experiences in the wild. They performed this study independent of specific resolvers (and their caching architectures), vendors, or setups.¶
First, they identified several reasons why operators and zone owners may want to choose longer or shorter TTLs:¶
* Longer caching results in faster responses, since cached answers avoid a round trip to the authoritative servers.¶
* Longer caching reduces traffic to authoritative servers and thereby lowers cost where DNS service is metered.¶
* Longer caching improves robustness to DDoS attacks on DNS infrastructure, since records already in caches remain usable while an attack is underway.¶
* Shorter caching supports rapid operational changes, since updated records propagate to resolvers more quickly.¶
* Shorter caching can help with DNS-based responses to DDoS attacks that rely on quickly redirecting traffic.¶
* Shorter caching supports DNS-based load balancing and CDN server selection, which depend on frequently refreshed answers.¶
Given these considerations, the proper choice for a TTL depends in part on multiple external factors -- no single recommendation is appropriate for all scenarios. Organizations must weigh these trade-offs and find a good balance for their situation. Still, some guidelines can be reached when choosing TTLs:¶
* For general zone owners, longer TTLs (an hour or more) are recommended, as they improve latency, reduce load, and increase resilience.¶
* For registry operators, longer TTLs on delegation NS records (and associated glue) likewise benefit all of the zones they delegate.¶
* Shorter TTLs (e.g., 5 minutes) are appropriate where operators depend on rapid changes, such as DNS-based load balancing or CDN server selection, accepting the cost of more queries and lower resilience.¶
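One way to act on such guidelines is to audit a zone for TTLs that fall below a chosen floor. The sketch below does this with the dnspython library; the library choice, the zone content, and the one-hour threshold are all assumptions for illustration, not requirements from this document.¶

   # Sketch: flag RRsets whose TTLs fall below a chosen guideline.
   # (dnspython, the zone content, and MIN_TTL are assumptions.)
   import dns.rdatatype
   import dns.zone

   ZONE_TEXT = (
       "$ORIGIN example.com.\n"
       "@ 3600 IN SOA ns1 hostmaster 1 7200 3600 1209600 3600\n"
       "@ 7200 IN NS ns1\n"
       "ns1 300 IN A 192.0.2.1\n"
       "www 300 IN A 192.0.2.2\n"
   )

   MIN_TTL = 3600  # hypothetical guideline: at least one hour

   zone = dns.zone.from_text(ZONE_TEXT, origin="example.com.",
                             relativize=False)
   for name, rdataset in zone.iterate_rdatasets():
       if rdataset.ttl < MIN_TTL:
           rtype = dns.rdatatype.to_text(rdataset.rdtype)
           print(f"{name} {rtype}: TTL {rdataset.ttl}s < {MIN_TTL}s")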
Multiple record types exist or are related between the parent of a zone and the child. At a minimum, NS records are supposed to be identical in the parent (but often are not), as are corresponding IP addresses in "glue" A/AAAA records that must exist for in-bailiwick authoritative servers. Additionally, if DNSSEC [RFC4033] [RFC4034] [RFC4035] [RFC4509] is deployed for a zone, the parent's DS record must cryptographically refer to a child's DNSKEY record.¶
Because some information exists in both the parent and a child, it is possible for the TTL values to differ between the parent's copy and the child's. [Moura19b] examines resolver behaviors when these values differed in the wild, as they frequently do -- often, parent zones have de facto TTL values that a child has no control over. For example, NS records for TLDs in the root zone are all set to 2 days (48 hours), but some TLDs have lower values within their published records (the TTL for .cl's NS records from their authoritative servers is 1 hour). [Moura19b] also examines the differences in the TTLs between the NS records and the corresponding A/AAAA records for the addresses of a name server. RIPE Atlas nodes are used to determine what resolvers in the wild do with different information and whether the parent's TTL is used for cache lifetimes ("parent-centric") or the child's ("child-centric").¶
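This parent/child difference can be observed directly. The following sketch, which assumes the dnspython library and placeholder server addresses, asks a parent-zone server and a child-zone server for the same NS RRset: the parent, not being authoritative for the child, returns the delegation in the AUTHORITY section with its own TTL, while the child answers authoritatively in the ANSWER section with the TTL it publishes.¶

   # Sketch: compare the parent's and child's TTLs for a zone's NS
   # RRset. (dnspython and the addresses below are assumptions.)
   import dns.message
   import dns.query

   def ns_ttls(zone: str, parent_server: str, child_server: str):
       query = dns.message.make_query(zone, "NS")

       # The parent is not authoritative for the child zone, so it
       # replies with a referral: the delegation NS RRset sits in the
       # AUTHORITY section, carrying the parent's TTL (e.g., 172800s
       # for delegations in the root zone).
       parent = dns.query.udp(query, parent_server, timeout=5)
       parent_ttl = parent.authority[0].ttl if parent.authority else None

       # The child's own server answers authoritatively: the NS RRset
       # is in the ANSWER section with the TTL the child publishes.
       child = dns.query.udp(query, child_server, timeout=5)
       child_ttl = child.answer[0].ttl if child.answer else None
       return parent_ttl, child_ttl

   # Placeholder addresses: substitute a real parent-zone server and a
   # real child authoritative server for the zone under study.
   print(ns_ttls("example.", "192.0.2.10", "192.0.2.20"))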
[Moura19b] found that roughly 90% of resolvers follow the child's view of the TTL, while 10% appear parent-centric. Additionally, it found that resolvers behave differently for cache lifetimes for in-bailiwick vs. out-of-bailiwick NS/A/AAAA TTL combinations. Specifically, when NS TTLs are shorter than the corresponding address records, most resolvers will requery for A/AAAA records for the in-bailiwick name servers and switch to new address records, even if the cache indicates the original A/AAAA records could be kept longer. On the other hand, the inverse is true for out-of-bailiwick name servers: if the NS record expires first, resolvers will honor the original cache time of the name server's address.¶
The important conclusion from this study is that operators cannot depend on their published TTL values alone -- the parent's values are also used for timing cache entries in the wild. Operators that are planning on infrastructure changes should assume that an older infrastructure must be left on and operational for at least the maximum of both the parent's and child's TTLs. For example, even though .cl publishes a 1-hour TTL on its own NS records, parent-centric resolvers may keep using old records for up to the root zone's 48-hour delegation TTL, so decommissioned servers should remain answering for at least that long.¶
This document discusses applying measured research results to operational deployments. Most of the considerations affect operational practice, though a few do have security-related impacts.¶
Specifically, C4 discusses a couple of strategies to employ when a service is under stress from DDoS attacks and offers operators additional guidance when handling excess traffic.¶
Similarly, C5 identifies the trade-offs with respect to the operational and security benefits of using longer TTL values.¶
This document does not add any new, practical privacy issues, aside from possible benefits in deploying longer TTLs as suggested in C5. Longer TTLs may help preserve a user's privacy by reducing the number of requests that get transmitted in both the client-to-resolver and resolver-to-authoritative cases.¶
This document has no IANA actions.¶
We would like to thank the reviewers of this document, who offered valuable suggestions as well as comments at the IETF DNSOP session (IETF 104): Duane Wessels, Joe Abley, Toema Gavrichenkov, John Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink, Klaus Darilion, and Samir Jafferali.¶
Additionally, we would like to thank those acknowledged in the papers this document summarizes for helping produce the results: RIPE NCC and DNS OARC for their tools and datasets used in this research, as well as the funding agencies sponsoring the individual research.¶
This document is a summary of the main considerations of six research papers written by the authors and the following people, who contributed substantially to the content and should be considered coauthors; this document would not have been possible without their hard work:¶
* Ricardo de O. Schmidt¶
* Wouter B. de Vries¶
* Moritz Müller¶
* Lan Wei¶
* Cristian Hesselman¶