Stream:
Internet Architecture Board (IAB)
RFC:
9318
Category:
Informational
Published:
October 2022
ISSN:
2070-1721
Authors:
W. Hardaker
O. Shapira

RFC 9318

IAB Workshop Report: Measuring Network Quality for End-Users

Abstract

The Measuring Network Quality for End-Users workshop was held virtually by the Internet Architecture Board (IAB) on September 14-16, 2021. This report summarizes the workshop, the topics discussed, and some preliminary conclusions drawn at the end of the workshop.

Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and do not necessarily reflect IAB views and positions.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9318.

Copyright Notice

Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.


1. Introduction

The Internet Architecture Board (IAB) holds occasional workshops designed to consider long-term issues and strategies for the Internet, and to suggest future directions for the Internet architecture. This long-term planning function of the IAB is complementary to the ongoing engineering efforts performed by working groups of the Internet Engineering Task Force (IETF).

The Measuring Network Quality for End-Users workshop [WORKSHOP] was held virtually by the Internet Architecture Board (IAB) on September 14-16, 2021. This report summarizes the workshop, the topics discussed, and some preliminary conclusions drawn at the end of the workshop.

1.1. Problem Space

The Internet in 2021 is quite different from what it was 10 years ago. Today, it is a crucial part of everyone's daily life. People use the Internet for their social life, for their daily jobs, for routine shopping, and for keeping up with major events. An increasing number of people can access a gigabit connection, which would be hard to imagine a decade ago. Additionally, thanks to improvements in security, people trust the Internet for financial and banking transactions, purchasing goods, and everyday bill payments.

At the same time, some aspects of the end-user experience have not improved as much. Many users have typical connection latencies that remain at decade-old levels. Despite significant reliability improvements in data center environments, end users also still often see interruptions in service. Despite algorithmic advances in the field of control theory, one still finds that the queuing delays in the last-mile equipment exceed the accumulated transit delays. Transport improvements, such as QUIC, Multipath TCP, and TCP Fast Open, are still not fully supported in some networks. Likewise, various advances in the security and privacy of user data are not widely supported, such as encrypted DNS to the local resolver.

A major factor behind this lack of progress is the popular perception that throughput is the sole measure of the quality of Internet connectivity. To look beyond such a narrow focus, the Measuring Network Quality for End-Users workshop aimed to discuss various topics:

  • What is user latency under typical working conditions?
  • How reliable is connectivity across longer time periods?
  • Do networks allow the use of a broad range of protocols?
  • What services can be run by network clients?
  • What kind of IPv4, NAT, or IPv6 connectivity is offered, and are there firewalls?
  • What security mechanisms are available for local services, such as DNS?
  • To what degree are the privacy, confidentiality, integrity, and authenticity of user communications guarded?

Improving these aspects of network quality will likely depend on measuring and exposing metrics in a meaningful way to all involved parties, including to end users. Such measurement and exposure of the right metrics will allow service providers and network operators to concentrate focus on their users' experience and will simultaneously empower users to choose the Internet Service Providers (ISPs) that can deliver the best experience based on their needs. In particular, the workshop considered the following questions:

  • What are the fundamental properties of a network that contribute to a good user experience?
  • What metrics quantify these properties, and how can we collect such metrics in a practical way?
  • What are the best practices for interpreting those metrics and incorporating them in a decision-making process?
  • What are the best ways to communicate these properties to service providers and network operators?
  • How can these metrics be displayed to users in a meaningful way?

2. Workshop Agenda

The Measuring Network Quality for End-Users workshop was divided into four main topic areas: introduction and problem space overviews, metrics considerations, cross-layer considerations, and synthesis; see further discussion in Sections 4 and 5.

3. Position Papers

The following position papers were received for consideration by the workshop attendees. The workshop's web page [WORKSHOP] contains archives of the papers, presentations, and recorded videos.

4. Workshop Topics and Discussion

The agenda for the three-day workshop was broken into four separate sections that each played a role in framing the discussions. The workshop started with a series of introduction and problem space presentations (Section 4.1), followed by metrics considerations (Section 4.2), cross-layer considerations (Section 4.3), and a synthesis discussion (Section 4.4). After the four subsections concluded, a follow-on discussion was held to draw conclusions that could be agreed upon by workshop participants (Section 5).

4.1. Introduction and Overviews

The workshop started with a broad focus on the state of user Quality of Service (QoS) and Quality of Experience (QoE) on the Internet today. The goal of the introductory talks was to set the stage for the workshop by describing both the problem space and the current solutions in place and their limitations.

The introduction presentations provided views of existing QoS and QoE measurements and their effectiveness. Also discussed was the interaction between multiple users within the network, as well as the interaction between multiple layers of the OSI stack. Vint Cerf provided a keynote describing the history and importance of the topic.

4.1.1. Key Points from the Keynote by Vint Cerf

We may be operating in a networking space with dramatically different parameters compared to 30 years ago. This differentiation justifies reconsidering not only the importance of one metric over the other but also reconsidering the entire metaphor.

It is time for the experts to look at not only adjusting TCP but also exploring other protocols, as has been done lately with QUIC. It's important that we feel free to consider alternatives to TCP. TCP is not a teddy bear, and one should not be afraid to replace it with a transport layer with better properties that benefit its users.

A suggestion: we should consider exercises to identify desirable properties. As we are looking at the parametric spaces, one can identify "desirable properties", as opposed to "fundamental properties", for example, a low-latency property. An example coming from the Advanced Research Projects Agency (ARPA): you want to know where the missile is now, not where it was. Understanding drives particular parameter creation and selection in the design space.

When parameter values are changed to extremes, such as for connectivity, alternative designs will emerge. One case study of note is the interplanetary protocol, where "ping" is no longer indicative of anything useful. While we look at responsiveness, we should not ignore connectivity.

Unfortunately, maintaining backward compatibility is painful. The work on designing IPv6 so as to transition from IPv4 could have been done better if backward compatibility had been considered. It is too late for IPv6, but it is not too late to consider this issue for potential future problems.

IPv6 is still not implemented fully everywhere. It's been a long road to deployment since starting work in 1996, and we are still not there. In 1996, the thinking was that it was quite easy to implement IPv6, but that failed to hold true. In 1996, the dot-com boom began, where a lot of money was spent quickly, and the moment was not caught in time while the market expanded exponentially. This should serve as a cautionary tale.

One last point: consider performance across multiple hops in the Internet. We've not seen many end-to-end metrics, as successfully developing end-to-end measurements across different network and business boundaries is quite hard to achieve. A good question to ask when developing new protocols is "will the new protocol work across multiple network hops?"

Multi-hop networks are being gradually replaced by humongous, flat networks with sufficient connectivity between operators so that systems become 1 hop, or 2 hops at most, away from each other (e.g., Google, Facebook, and Amazon). The fundamental architecture of the Internet is changing.

4.1.2. Introductory Talks

The Internet is a shared network built on IP protocols using packet switching to interconnect multiple autonomous networks. The Internet's departure from circuit-switching technologies allowed it to scale beyond any other known network design. On the other hand, the lack of in-network regulation made it difficult to ensure the best experience for every user.

As Internet use cases continue to expand, it becomes increasingly difficult to predict which network characteristics correlate with better user experiences. Different application classes, e.g., video streaming and teleconferencing, can affect user experience in ways that are complex and difficult to measure. Internet utilization shifts rapidly during the course of each day, week, and year, which further complicates identifying key metrics capable of predicting a good user experience.

QoS initiatives attempted to overcome these difficulties by strictly prioritizing different types of traffic. However, QoS metrics do not always correlate with user experience. The utility of the QoS metric is further limited by the difficulties in building solutions with the desired QoS characteristics.

QoE initiatives attempted to integrate the psychological aspects of how quality is perceived and create statistical models designed to optimize the user experience. Despite the high modeling effort, the QoE approach proved beneficial only in certain application classes. Unfortunately, generalizing the models proved to be difficult, and the question of how different applications affect each other when sharing the same network remains an open problem.

The industry's focus on giving the end user more throughput/bandwidth led to remarkable advances. In many places around the world, a home user enjoys gigabit speeds to their ISP. This is so remarkable that it would have been brushed off as science fiction a decade ago. However, the focus on increased capacity came at the expense of neglecting another important core metric: latency. As a result, end users whose experience is negatively affected by high latency were advised to upgrade their equipment to get more throughput instead. [MacMillian2021] showed that sometimes such an upgrade can lead to latency improvements, for the economic reason that "value-priced" data plans tend to be oversold.

As the industry continued to give end users more throughput while mostly neglecting latency concerns, application designs started to employ various techniques that hide latency and short service disruptions. For example, a web browser's perceived performance is closely tied to the content in the browser's local cache. While such techniques can clearly improve the user experience when using stale data is possible, this development further decouples user experience from core metrics.

In the most recent 10 years, efforts by Dave Taht and the bufferbloat society have led to significant progress in updating queuing algorithms to reduce latencies under load compared to simpler FIFO queues. Unfortunately, the home router industry has yet to implement these algorithms, mostly due to marketing and cost concerns. Most home router manufacturers depend on System on a Chip (SoC) acceleration to create products with a desired throughput. SoC manufacturers opt for simpler algorithms and aggressive aggregation, reasoning that a higher-throughput chip will have guaranteed demand. Because consumers are offered choices primarily among different high-throughput devices, the perception that higher throughput leads to a higher QoS continues to strengthen.

The home router is not the only place that can benefit from clearer indications of acceptable performance for users. Since users perceive the Internet via the lens of applications, it is important that we call upon application vendors to adopt solutions that stress lower latencies. Unfortunately, while bandwidth is straightforward to measure, responsiveness is trickier. Many applications have found a set of metrics that are helpful in their realm but do not generalize well and cannot become universally applicable. Furthermore, due to the highly competitive application space, vendors may have economic reasons to avoid sharing their most useful metrics.

4.1.3. Introductory Talks - Key Points

  1. Measuring bandwidth is necessary but is not alone sufficient.
  2. In many cases, Internet users don't need more bandwidth but rather need "better bandwidth", i.e., they need other connectivity improvements.
  3. Users perceive the quality of their Internet connection based on the applications they use, which are affected by a combination of factors. There's little value in exposing a typical user to the entire spectrum of possible reasons for the poor performance perceived in their application-centric view.
  4. Many factors affecting user experience are outside the users' sphere of control. It's unclear whether exposing users to these other factors will help them understand the state of their network performance. In general, users prefer simple, categorical choices (e.g., "good", "better", and "best" options).
  5. The Internet content market is highly competitive, and many applications develop their own "secret sauce".

4.2. Metrics Considerations

In the second agenda section, the workshop continued its discussion about metrics that can be used instead of or in addition to available bandwidth. Several workshop attendees presented deep-dive studies on measurement methodology.

4.2.1. Common Performance Metrics

Losing Internet access entirely is, of course, the worst user experience. Unfortunately, unless rebooting the home router restores connectivity, there is little a user can do other than contacting their service provider. Nevertheless, there is value in the systematic collection of availability metrics on the client side; these can help the user's ISP localize and resolve issues faster while enabling users to better choose between ISPs. One can measure availability directly by simply attempting connections from the client side to distant locations of interest. For example, Ookla's [Speedtest] uses a large number of Android devices to measure network and cellular availability around the globe. Ookla collects hundreds of millions of data points per day and uses these for accurate availability reporting. An alternative approach is to derive availability from the failure rates of other tests. For example, [FCC_MBA] and [FCC_MBA_methodology] use thousands of off-the-shelf routers, with measurement software developed by [SamKnows]. These routers perform an array of network tests and report availability based on whether test connections were successful or not.

Measuring available capacity can be helpful to end users, but it is even more valuable for service providers and application developers. High-definition video streaming requires significantly more capacity than any other type of traffic. At the time of the workshop, video traffic constituted 90% of overall Internet traffic and contributed to 95% of the revenues from monetization (via subscriptions, fees, or ads). As a result, video streaming services, such as Netflix, need to continuously cope with rapid changes in available capacity. The ability to measure available capacity in real time allows the different adaptive bitrate (ABR) compression algorithms to ensure the best possible user experience. Measuring aggregated capacity demand allows ISPs to be ready for traffic spikes. For example, during the end-of-year holiday season, the global demand for capacity has been shown to be 5-7 times higher than during other seasons. For end users, knowledge of their capacity needs can help them select the best data plan given their intended usage. In many cases, however, end users have more than enough capacity, and adding more bandwidth will not improve their experience -- after a point, it is no longer the limiting factor in user experience. Finally, the ability to differentiate between the "throughput" and the "goodput" can be helpful in identifying when the network is saturated.
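To make the throughput/goodput distinction concrete, here is a minimal Python sketch; the byte counts are invented for illustration and are not workshop data:

   # Throughput counts every byte sent on the wire; goodput counts only
   # the unique application-level bytes (excluding headers and
   # retransmissions).
   def goodput_ratio(app_bytes: int, wire_bytes: int) -> float:
       """Fraction of transmitted bytes that carried useful application data."""
       return app_bytes / wire_bytes

   # Example: 100 MB of video delivered while 112 MB crossed the wire.
   # A ratio that falls as load rises is one sign the path is saturated.
   print(f"goodput/throughput = {goodput_ratio(100_000_000, 112_000_000):.2f}")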

In measuring network quality, latency is defined as the time it takes a packet to traverse a network path from one end to the other. At the time of this report, users in many places worldwide can enjoy Internet access that has adequately high capacity and availability for their current needs. For these users, latency improvements, rather than bandwidth improvements, can lead to the most significant improvements in QoE. The established latency metric is the round-trip time (RTT), commonly measured in milliseconds. However, users often find RTT values unintuitive since, unlike other performance metrics, high RTT values indicate poor latency and users typically understand higher scores to be better. To address this, [Paasch2021] and [Mathis2021] present an inverse metric, called "Round-trips Per Minute" (RPM).
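Since RPM is simply the round-trip time inverted and scaled to a minute, the conversion is one line of arithmetic. A small illustrative sketch in Python (the sample RTT values are ours, chosen only for the example):

   def rpm(rtt_ms: float) -> float:
       """Convert a round-trip time in milliseconds to Round-trips Per Minute."""
       return 60_000.0 / rtt_ms

   # Higher is better: a 50 ms working RTT is 1200 RPM, while a badly
   # bufferbloated 500 ms RTT is only 120 RPM.
   print(rpm(50.0), rpm(500.0))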

There is an important distinction between "idle latency" and "latency under working conditions". The former is measured when the network is underused and reflects a best-case scenario. The latter is measured when the network is under a typical workload. Until recently, typical tools reported a network's idle latency, which can be misleading. For example, data presented at the workshop shows that idle latencies can be up to 25 times lower than the latency under typical working loads. Because of this, it is essential to make a clear distinction between the two when presenting latency to end users.

Data shows that rapid changes in capacity affect latency. [Foulkes2021] attempts to quantify how often a rapid change in capacity can cause network connectivity to become "unstable" (i.e., having high latency with very little throughput). Such changes in capacity can be caused by infrastructure failures but are much more often caused by in-network phenomena, like changing traffic engineering policies or rapid changes in cross-traffic.

Data presented at the workshop shows that 36% of measured lines have capacity metrics that vary by more than 10% throughout the day and across multiple days. These differences are caused by many variables, including local connectivity methods (Wi-Fi vs. Ethernet), competing LAN traffic, device load/configuration, time of day, and local loop/backhaul capacity. This variability makes it difficult to measure capacity using only an end-user device or other end-network measurement. A network router seeing aggregated traffic from multiple devices provides a better vantage point for capacity measurements. Such a test can account for the totality of local traffic and perform an independent capacity test. However, various factors might still limit the accuracy of such a test, and accurate capacity measurement requires multiple samples.

As users perceive the Internet through the lens of applications, it may be difficult to correlate changes in capacity and latency with the quality of the end-user experience. For example, web browsers rely on cached page versions to shorten page load times and mitigate connectivity losses. In addition, social networking applications often rely on prefetching their "feed" items. These techniques make the core in-network metrics less indicative of the users' experience and necessitate collecting data from the end-user applications themselves.

It is helpful to distinguish between applications that operate on a "fixed latency budget" and those that have more tolerance to latency variance. Cloud gaming serves as an example application that requires a "fixed latency budget", as a sudden latency spike can decide the "win/lose" ratio for a player. Companies that compete in the lucrative cloud gaming market make significant infrastructure investments, such as building entire data centers closer to their users. These data centers illustrate the economic calculation that fewer latency spikes outweigh the associated deployment costs. On the other hand, applications that are more tolerant to latency spikes can continue to operate reasonably well through short spikes. Yet, even those applications can benefit from consistently low latency, depending on usage shifts. For example, Video-on-Demand (VOD) apps can work reasonably well when the video is consumed linearly, but once the user tries to "switch a channel" or to "skip ahead", the user experience suffers unless the latency is sufficiently low.

Finally, as applications continue to evolve, in-application metrics are gaining in importance. For example, VOD applications can assess the QoE by application-specific metrics, such as whether the video player is able to use the highest possible resolution, identifying when the video is smooth or freezing, or other similar metrics. Application developers can then effectively use these metrics to prioritize future work. All popular video platforms (YouTube, Instagram, Netflix, and others) have developed frameworks to collect and analyze VOD metrics at scale. One example is the Scuba framework used by Meta [Scuba].

Unfortunately, in-application metrics can be challenging to use for comparative research purposes. First, different applications often use different metrics to measure the same phenomena. For example, application A may measure the smoothness of video via "mean time to rebuffer", while application B may rely on the "probability of rebuffering per second" for the same purpose. A different challenge with in-application metrics is that VOD is a significant source of revenue for companies, such as YouTube, Facebook, and Netflix, placing a proprietary incentive against exchanging the in-application data. A final concern centers on the privacy issues resulting from in-application metrics that accurately describe the activities and preferences of an individual end user.

4.2.2. Availability Metrics

Availability is simply defined as whether or not a packet can be sent and then received by its intended recipient. Availability is naively thought to be the simplest to measure, but it is more complex when considering that continual, instantaneous measurements would be needed to detect the smallest of outages. Also difficult is determining the root cause of a failure: was the user's line down, was something in the middle of the network at fault, or was it the service with which the user was attempting to communicate?

4.2.3. Capacity Metrics

If the network capacity does not meet user demands, the network quality will be impacted. Once the capacity meets the demands, increasing capacity won't lead to further quality improvements.

The actual network connection capacity is determined by the equipment and the lines along the network path, and it varies throughout the day and across multiple days. Studies involving DSL lines in North America indicate that over 30% of the DSL lines have capacity metrics that vary by more than 10% throughout the day and across multiple days.

Some factors that affect the actual capacity are:

  1. Presence of competing traffic, either in the LAN or in the WAN environment. In the LAN setting, the competing traffic reflects the multiple devices that share the Internet connection. In the WAN setting, the competing traffic often originates from unrelated network flows that happen to share the same network path.
  2. Capabilities of the equipment along the path of the network connection, including the data transfer rate and the amount of memory used for buffering.
  3. Active traffic management measures, such as traffic shapers and policers that are often used by the network providers.

There are other factors that can negatively affect the actual line capacities.

The users' traffic demands follow the usage patterns and preferences of the particular users. For example, large data transfers can use any available capacity, while media streaming applications require only a limited capacity to function correctly. Videoconferencing applications typically need less capacity than high-definition video streaming.

4.2.4. Latency Metrics

End-to-end latency is the time that a particular packet takes to traverse the network path from the user to their destination and back. The end-to-end latency comprises several components:

  1. The propagation delay, which reflects the path distance and the individual link technologies (e.g., fiber vs. satellite). The propagation delay doesn't depend on the utilization of the network, to the extent that the network path remains constant.
  2. The buffering delay, which reflects the time that segments spend in the memory of the network equipment that connects the individual network links, as well as in the memory of the transmitting endpoint. The buffering delay depends on the network utilization, as well as on the algorithms that govern the queued segments.
  3. The transport protocol delays, which reflect the time spent in retransmission and reassembly, as well as the time spent when the transport is "head-of-line blocked".
  4. The application delay, which reflects inefficiencies in the application layer; some of the workshop submissions explicitly called out this component. (A simple decomposition of these components appears after this list.)
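Treating these components as additive gives a convenient, if simplified, mental model (the symbols below are ours, introduced only for illustration):

   T_{\mathrm{e2e}} = T_{\mathrm{prop}} + T_{\mathrm{buffer}} + T_{\mathrm{transport}} + T_{\mathrm{app}}

Only the first term is fixed by geography and link technology; the remaining terms vary with load, which is why the idle and working measurements discussed next can differ so sharply.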

Typically, end-to-end latency is measured when the network is idle. Results of such measurements mostly reflect the propagation delay but not other kinds of delay. This report uses the term "idle latency" to refer to results achieved under idle network conditions.

Alternatively, if the latency is measured when the network is under its typical working conditions, the results reflect multiple types of delays. This report uses the term "working latency" to refer to such results. Other sources use the term "latency under load" (LUL) as a synonym.

Data presented at the workshop reveals a substantial difference between the idle latency and the working latency. Depending on the traffic direction and the technology type, the working latency is between 6 and 25 times higher than the idle latency:

Table 1

Direction    Technology   Working        Idle           Working - Idle    Working /
             Type         Latency (ms)   Latency (ms)   Difference (ms)   Idle Ratio
Downstream   FTTH         148            10             138               15
Downstream   Cable        103            13             90                8
Downstream   DSL          194            10             184               19
Upstream     FTTH         207            12             195               17
Upstream     Cable        176            27             149               6
Upstream     DSL          686            27             659               25

While historically the tooling available for measuring latency focused on measuring the idle latency, there is a trend in the industry to start measuring the working latency as well, e.g., Apple's [NetworkQuality] tool.

4.2.5. Measurement Case Studies

Workshop participants proposed several concrete methodologies for measuring network quality for end users.

[Paasch2021] introduced a methodology for measuring working latency from the end-user vantage point. The suggested method incrementally adds network flows between the user device and a server endpoint until a bottleneck capacity is reached. From these measurements, a round-trip latency is measured and reported to the end user. The authors chose to report results with the RPM metric. The methodology has been implemented in Apple's macOS Monterey.
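The following Python sketch imitates that flow-ramping idea under stated assumptions: it is not Apple's implementation nor the [Paasch2021] specification, the test URL is a placeholder the reader must supply, and a fixed ramp of eight flows stands in for the draft's plateau-detection logic:

   import threading, time
   import urllib.request

   TEST_URL = "https://example.com/large-download"  # placeholder endpoint

   def saturate(url, stop):
       """Keep one flow busy downloading until told to stop."""
       while not stop.is_set():
           try:
               with urllib.request.urlopen(url) as r:
                   while not stop.is_set() and r.read(64 * 1024):
                       pass
           except OSError:
               time.sleep(0.1)

   def rtt_under_load(url):
       """Approximate working RTT: time a tiny request while flows load the link."""
       start = time.monotonic()
       urllib.request.urlopen(url).read(1)
       return time.monotonic() - start

   stop = threading.Event()
   for n in range(8):             # incrementally add parallel flows
       threading.Thread(target=saturate, args=(TEST_URL, stop), daemon=True).start()
       time.sleep(2)              # let throughput stabilize (simplified)

   rtt = rtt_under_load(TEST_URL)
   stop.set()
   print(f"working RTT ~{rtt * 1000:.0f} ms -> {60 / rtt:.0f} RPM")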

[Mathis2021] applied the RPM metric to the results of more than 4 billion download tests that M-Lab performed from 2010 to 2021. During this time frame, the M-Lab measurement platform underwent several upgrades that allowed the research team to compare the effect of different TCP congestion control algorithms (CCAs) on the measured end-to-end latency. The study showed that the use of the CUBIC CCA leads to increased working latency, which is attributed to its use of larger queues.

[Schlinker2019] presented a large-scale study that aimed to establish a correlation between goodput and QoE on a large social network. The authors performed the measurements at multiple data centers from which video segments of set sizes were streamed to a large number of end users. The authors used the goodput and throughput metrics to determine whether particular paths were congested.

[Reed2021] presented the analysis of working latency measurements collected as part of the Measuring Broadband America (MBA) program by the Federal Communications Commission (FCC). The FCC does not include working latency in its yearly report but does offer it in the raw data files. The authors used a subset of the raw data to identify important differences in the working latencies across different ISPs.

[MacMillian2021] presented an analysis of working latency across multiple service tiers. They found that, unsurprisingly, "premium" tier users experienced lower working latency than "value" tier users. The data demonstrated that working latency varies significantly within each tier; one possible explanation is the difference in equipment deployed in the homes.

These studies have stressed the importance of measuring working latency. At the time of this report, many home router manufacturers rely on hardware-accelerated routing that uses FIFO queues. Measuring working latency on these devices, and making consumers aware of the effect of choosing one manufacturer vs. another, can help improve the home router situation. The ideal test would be able to identify the working latency and pinpoint the source of the delay (home router, ISP, server side, or some network node in between).

Another source of high working latency comes from network routers exposed to cross-traffic. As [Schlinker2019] indicated, these can become saturated during the peak hours of the day. Systematic testing of the working latency in routers under load can help improve both our understanding of latency and the impact of deployed infrastructure.

4.2.6. Metrics Key Points

The metrics for network quality can be roughly grouped into the following:

  1. Availability metrics, which indicate whether the user can access the network at all.
  2. Capacity metrics, which indicate whether the actual line capacity is sufficient to meet the user's demands.
  3. Latency metrics, which indicate if the user gets the data in a timely fashion.
  4. Higher-order metrics, which include both the network metrics, such as inter-packet arrival time, and the application metrics, such as the mean time between rebuffering for video streaming.

The availability metrics can be seen as a derivative of either the capacity (zero capacity leading to zero availability) or the latency (infinite latency leading to zero availability).

Key points from the presentations and discussions included the following:

  1. Availability and capacity are "hygienic factors" -- unless an application is capable of using extra capacity, end users will see little benefit from using over-provisioned lines.
  2. Working latency has a stronger correlation with the user experience than latency under an idle network load. Working latency can exceed the idle latency by an order of magnitude.
  3. The RPM metric is a stable metric, with higher values being better, that may be more effective when communicating latency to end users.
  4. The relationship between throughput and goodput can be effective in finding the saturation points, both in client-side [Paasch2021] and server-side [Schlinker2019] settings.
  5. Working latency depends on the choice of algorithms for endpoint congestion control and router queuing.

Finally, it was commonly agreed that the best metrics are those that are actionable.

4.3. Cross-Layer Considerations

In the cross-layer segment of the workshop, participants presented material on, and discussed, how to accurately measure exactly where problems occur. Discussion centered especially on the differences between physically wired and wireless connections and the difficulties of accurately determining problem spots when multiple different types of network segments are responsible for the quality. As an example, [Kerpez2021] showed that the limited bandwidth of 2.4 GHz Wi-Fi is the most frequent bottleneck. In comparison, the wider bandwidth of 5 GHz Wi-Fi was the bottleneck in only 20% of observations.

The participants agreed that no single component of a network connection has all the data required to measure the effects of the network performance on the quality of the end-user experience.

  • Applications that are running on the end-user devices have the best insight into their respective performance but have limited visibility into the behavior of the network itself and are unable to act based on their limited perspective.
  • ISPs have good insight into QoS considerations but are not able to infer the effect of the QoS metrics on the quality of end-user experiences.
  • Content providers have good insight into the aggregated behavior of the end users but lack insight into which aspects of network performance are leading indicators of user behavior.

The workshop identified the need for a standard and extensible way to exchange network performance characteristics. Such an exchange standard should address (at least) the following:

  • A scalable way to capture the performance of multiple (potentially thousands of) endpoints.
  • The data exchange format should prevent data manipulation so that the different participants won't be able to game the mechanisms.
  • Preservation of end-user privacy. In particular, federated learning approaches should be preferred so that no centralized entity has access to the whole picture.
  • A transparent model for giving the different actors on a network connection an incentive to share the performance data they collect.
  • An accompanying set of tools to analyze the data.

4.3.1. Separation of Concerns

Commonly, there's a tight coupling between collecting performance metrics, interpreting those metrics, and acting upon the interpretation. Unfortunately, such a model is not the best for successfully exchanging cross-layer data, as:

  • actors that are able to collect particular performance metrics (e.g., the TCP RTT) do not necessarily have the context necessary for a meaningful interpretation,
  • the actors that have the context and the computational/storage capacity to interpret metrics do not necessarily have the ability to control the behavior of the network/application, and
  • the actors that can control the behavior of networks and/or applications typically do not have access to complete measurement data.

The participants agreed that it is important to separate the above three aspects, so that:

  • the different actors that have the data, but not the ability to interpret and/or act upon it, should publish their measured data and
  • the actors that have the expertise in interpreting and synthesizing performance data should publish the results of their interpretations.

4.3.2. Security and Privacy Considerations

Preserving the privacy of Internet end users is a difficult requirement to meet when addressing this problem space. There is an intrinsic trade-off between collecting more data about user activities and infringing on their privacy while doing so. Participants agreed that observability across multiple layers is necessary for an accurate measurement of the network quality, but doing so in a way that minimizes privacy leakage is an open question.

4.3.3. Metric Measurement Considerations

  • The following TCP protocol metrics have been found to be effective and are available for passive measurement:

    • TCP connection latency measured using selective acknowledgment (SACK) or acknowledgment (ACK) timing, as well as the timing between TCP retransmission events, are good proxies for end-to-end RTT measurements.
    • On the Linux platform, the tcp_info structure is the de facto standard for an application to inspect the performance of kernel-space networking (a minimal sketch of reading it appears after this list). However, there is no equivalent de facto standard for user-space networking.
  • The QUIC and MASQUE protocols make passive performance measurements more challenging.

    • An approach that uses federated measurement/hierarchical aggregation may be more valuable for these protocols.
    • The QLOG format seems to be the most mature candidate for such an exchange.
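As one hedged illustration of the tcp_info path (Linux-only; the 104-byte read, the little-endian unpacking, and the position of tcpi_rtt follow one common kernel layout and may differ across kernel versions):

   import socket
   import struct

   def smoothed_rtt_ms(sock: socket.socket) -> float:
       """Return the kernel's smoothed RTT estimate for a connected TCP socket."""
       info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
       # 8 one-byte fields precede the u32 fields; tcpi_rtt (microseconds)
       # is the 16th u32 in common layouts (an assumption, not portable).
       u32s = struct.unpack_from("<24I", info, 8)
       return u32s[15] / 1000.0

   s = socket.create_connection(("example.com", 80))
   s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
   s.recv(1024)  # force at least one round trip so the estimate is populated
   print(f"kernel smoothed RTT: {smoothed_rtt_ms(s):.1f} ms")
   s.close()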

4.3.4. Towards Improving Future Cross-Layer Observability

The ownership of the Internet is spread across multiple administrative domains, making measurement of end-to-end performance data difficult. Furthermore, the immense scale of the Internet makes aggregation and analysis of this data difficult. [Marx2021] presented a simple logging format that could potentially be used to collect and aggregate data from different layers.

Another aspect of cross-layer collaboration hampering measurement is that the majority of current algorithms do not explicitly provide performance data that can be used in cross-layer analysis. The IETF community could be more diligent in identifying each protocol's key performance indicators and exposing them as part of the protocol specification.

Despite all these challenges, it should still be possible to perform limited-scope studies in order to have a better understanding of how user quality is affected by the interaction of the different components that constitute the Internet. Furthermore, recent development of federated learning algorithms suggests that it might be possible to perform cross-layer performance measurements while preserving user privacy.

4.3.5. Efficient Collaboration between Hardware and Transport Protocols

With the advent of Low Latency, Low Loss, and Scalable throughput (L4S) congestion notification and control, there is an even higher need for the transport protocols and the underlying hardware to work in unison.

At the time of the workshop, the typical home router uses a single FIFO queue that is large enough to allow amortizing the lower-layer header overhead across multiple transport PDUs. These designs worked well with the CUBIC congestion control algorithm, yet the newer generation of algorithms can operate on much smaller queues. To fully support latencies less than 1 ms, the home router needs to work efficiently on sequential transmissions of just a few segments vs. being optimized for large packet bursts.
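A rough, back-of-the-envelope sketch of why queue size matters (the queue size and link rate below are example values, not measurements from the workshop):

   def max_queue_delay_ms(queue_bytes: int, link_mbps: float) -> float:
       """Worst-case sojourn time of a packet entering a full FIFO queue."""
       return queue_bytes * 8 / (link_mbps * 1000.0)

   # A 256 KiB FIFO draining at 100 Mbps can add ~21 ms of queuing delay,
   # an order of magnitude above a sub-1 ms latency target.
   print(f"{max_queue_delay_ms(256 * 1024, 100.0):.1f} ms")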

Another design trait common in home routers is the use of packet aggregation to further amortize the overhead added by the lower-layer headers. Specifically, multiple IP datagrams are combined into a single, large transfer frame. However, this aggregation can add up to 10 ms to the packet sojourn delay.

Following the famous "you can't improve what you don't measure" adage, it is important to expose these aggregation delays in a way that would allow identifying the source of the bottlenecks and making hardware more suitable for the next generation of transport protocols.

4.3.6. Cross-Layer Key Points

  • Significant differences exist in the characteristics of metrics to be measured and the required optimizations needed in wireless vs. wired networks.
  • Identification of an issue's root cause is hampered by the challenges in measuring multi-segment network paths.
  • No single component of a network connection has all the data required to measure the effects of the complete network performance on the quality of the end-user experience.
  • Actionable results require both proper collection and interpretation.
  • Coordination among network providers is important to successfully improve the measurement of end-user experiences.
  • Simultaneously providing accurate measurements while preserving end-user privacy is challenging.
  • Passive measurements from protocol implementations may provide beneficial data.

4.4. Synthesis

Finally, in the synthesis section of the workshop, the presentations and discussions concentrated on the next steps likely needed to make forward progress. Of particular concern is how to bring forward measurements that can make sense to end users trying to select between various networking subscription options.

4.4.1. Measurement and Metrics Considerations

One important consideration is how decisions can be made and what actions can be taken based on collected metrics. Measurements must be integrated with applications in order to get true application views of congestion, as measurements over different infrastructure or via other applications may return incorrect results. Congestion itself can be a temporary problem, and mitigation strategies may need to be different depending on whether it is expected to be a short-term or long-term phenomenon. A significant challenge exists in measuring short-term problems, driving the need for continuous measurements to ensure critical moments and long-term trends are captured. For short-term problems, workshop participants debated whether an issue that goes away is indeed a problem or is a sign that a network is properly adapting and self-recovering.

Careful consideration must be taken when constructing metrics in order to understand the results. Measurements can also be affected by individual packet characteristics -- differently sized packets typically have a linear relationship with their delay. With this in mind, measurements can be divided into a delay based on geographical distances, a packet-size serialization delay, and a variable (noise) delay. Each of these three sub-component delays can be different and individually measured across each segment in a multi-hop path. Variable delay can also be significantly impacted by external factors, such as bufferbloat, routing changes, network load sharing, and other local or remote changes in performance. Network measurements, especially load-specific tests, must also be run long enough to ensure that any problems associated with buffering, queuing, etc. are captured. Measurement technologies should also distinguish between upstream and downstream measurements, as well as measure the difference between end-to-end paths and sub-path measurements.
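A hedged numeric sketch of that three-way split (all inputs are invented examples; a real measurement would estimate each term per network segment):

   def one_way_delay_ms(distance_km: float, link_mbps: float,
                        packet_bytes: int, noise_ms: float = 0.0) -> float:
       """Sum a geographic, a serialization, and a variable delay component."""
       propagation_ms = distance_km / 200.0          # ~200 km per ms in fiber
       serialization_ms = packet_bytes * 8 / (link_mbps * 1000.0)
       return propagation_ms + serialization_ms + noise_ms

   # The linear packet-size term: a 1500-byte packet takes 10x longer to
   # serialize at 100 Mbps than a 150-byte packet (0.12 ms vs. 0.012 ms).
   print(one_way_delay_ms(1000, 100.0, 1500), one_way_delay_ms(1000, 100.0, 150))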

4.4.2. End-User Metrics Presentation

Determining end-user needs requires informative measurements and metrics. How do we provide the users with the service they need or want? Is it possible for users to even voice their desires effectively? End-user surveys typically elicit only high-level, simplistic answers, such as "reliability", "capacity", and "service bundling". Technical requirements that operators can consume, like "low latency" and "congestion avoidance", are not terms known to and used by end users.

Example metrics useful to end users might include the number of users supported by a service and the number of applications or streams that a network can support. Example solutions to combat networking issues include incentive-based traffic management strategies (e.g., an application requesting lower latency may also mean accepting lower bandwidth). User-perceived latency must be considered, not just network latency: the user experiences application-to-server latency, while network-to-network measurements may only be studying the lowest-level latency. Thus, picking the right protocol to use in a measurement is critical in order to match user experience (for example, users do not transmit data over ICMP, even though it is a common measurement tool).

In-application measurements should consider how to measure different types of applications, such as video streaming, file sharing, multi-user gaming, and real-time voice communications. It may be that asking users what trade-offs they are willing to accept would be a helpful approach: would they rather have a network with low latency or a network with higher bandwidth? Gamers may make different decisions than home office users or content producers, for example.

Furthermore, how can users make these trade-offs in a fair manner that does not impact other users? There is a tension between solutions in this space vs. the cost associated with solving these problems, as well as which customers are willing to front these improvement costs.

Challenges in providing higher-priority traffic to users center on the willingness of networks to listen to client requests for better treatment, even though commercial interests may not flow to them without a cost incentive. Shared media in general are subject to oversubscription, such that the number of users a network can support is either accurate only on an underutilized network or may assume an average bandwidth or other usage metric that fails to be accurate during utilization spikes. Individual metrics are also affected by in-home devices, from cheap routers to microwaves, and by (multi-)user behaviors during tests. Thus, a single metric alone or a single reading without context may not be useful in assisting a user or operator to determine where the problem source actually is.

User comprehension of a network remains a challenging problem. Multiple workshop participants argued for a single number (potentially calculated with a weighted aggregation formula) or a small number of measurements per expected usage (e.g., a "gaming" score vs. a "content producer" score). Many agreed that some users may instead prefer to consume simplified or color-coded ratings (e.g., good/better/best, red/yellow/green, or bronze/gold/platinum).

4.4.3. Synthesis Key Points

  • Some proposed metrics:

    • Round-trips Per Minute (RPM)
    • users per network
    • latency
    • 99th percentile latency and bandwidth
  • Median and mean measurements are distractions from the real problems.
  • Shared network usage greatly affects quality.
  • Long measurements are needed to capture all facets of potential network bottlenecks.
  • Better-funded research in all these areas is needed for progress.
  • End users will best understand a simplified score or ranking system.

5. Conclusions

During the final hour of the three-day workshop, statements that the group deemed to be summary statements were gathered. Later, any statements that were in contention were discarded (these are listed further below for completeness). For this document, the authors took the original list and divided it into rough categories, applied some suggested edits discussed on the mailing list, and further edited for clarity and to provide context.

5.1. General Statements

  1. Bandwidth is necessary but not alone sufficient.
  2. In many cases, Internet users don't need more bandwidth but rather need "better bandwidth", i.e., they need other improvements to their connectivity.
  3. We need both active and passive measurements -- passive measurements can provide historical debugging.
  4. We need passive measurements to be continuous, archivable, and queryable, including reliability/connectivity measurements.
  5. A really meaningful metric for users is whether their application will work properly or fail because of a lack of a network with sufficient characteristics.
  6. A useful metric for goodness must actually incentivize goodness -- good metrics should be actionable to help drive industries towards improvement.
  7. A lower-latency Internet, however achieved, would benefit all end users.

5.2. Specific Statements about Detailed Protocols/Techniques

  1. Round-trips Per Minute (RPM) is a useful, consumable metric.
  2. We need a usable tool that fills the current gap between network reachability, latency, and speed tests.
  3. End users that want to be involved in QoS decisions should be able to voice their needs and desires.
  4. Applications are needed that can perform and report good quality measurements in order to identify insufficient points in network access.
  5. Research done by regulators indicates that users/consumers prefer a simple metric per application, which frequently resolves to whether the application will work properly or not.
  6. New measurements and QoS or QoE techniques should not rely or depend solely on reading TCP headers.
  7. It is clear from developers of interactive applications and from network operators that lower latency is a strong factor in user QoE. However, metrics are lacking to support this statement directly.

5.3. Problem Statements and Concerns

  1. Latency means and medians are distractions from better measurements.
  2. It is frustrating to only measure network services without simultaneously improving those services.
  3. Stakeholder incentives aren't aligned for easy wins in this space. Incentives are needed to motivate improvements in public network access. Measurements may be one step towards driving competitive market incentives.
  4. For future-proof networking, it is important to measure the ecological impact of material and energy usage.
  5. We do not have incontrovertible evidence that any one metric (e.g., latency or speed) is more important than others to persuade device vendors to concentrate on any one optimization.

5.4. No-Consensus-Reached Statements

Additional statements were discussed and recorded that did not have consensus of the group at the time, but they are listed here for completeness:

  1. We do not have incontrovertible evidence that bufferbloat is a prevalent problem.
  2. The measurement needs to support reporting localization in order to find problems. Specifically:

    • Detecting a problem is not sufficient if you can't find the location.
    • Need more than just English -- different localization concerns.
  3. Stakeholder incentives aren't aligned for easy wins in this space.

6. Follow-On Work

There was discussion during the workshop about where future work should be performed. The group agreed that some work could be done more immediately within existing IETF working groups (e.g., IPPM, DetNet, and RAW), while other longer-term research may be needed in IRTF groups.

7. IANA Considerations

This document has no IANA actions.

8. Security Considerations

A few security-relevant topics were discussed at the workshop, most notably the trade-off between measurement observability and end-user privacy summarized in Section 4.3.2.

9. Informative References

[Aldabbagh2021]
Aldabbagh, A., "Regulatory perspective on measuring network quality for end-users", <https://www.iab.org/wp-content/IAB-uploads/2021/09/2021-09-07-Aldabbagh-Ofcom-presentationt-to-IAB-1v00-1.pdf>.
[Arkko2021]
Arkko, J. and M. Kühlewind, "Observability is needed to improve network quality", <https://www.iab.org/wp-content/IAB-uploads/2021/09/iab-position-paper-observability.pdf>.
[Balasubramanian2021]
Balasubramanian, P., "Transport Layer Statistics for Network Quality", <https://www.iab.org/wp-content/IAB-uploads/2021/09/transportstatsquality.pdf>.
[Briscoe2021]
Briscoe, B., White, G., Goel, V., and K. De Schepper, "A Single Common Metric to Characterize Varying Packet Delay", <https://www.iab.org/wp-content/IAB-uploads/2021/09/single-delay-metric-1.pdf>.
[Casas2021]
Casas, P., "10 Years of Internet-QoE Measurements Video, Cloud, Conferencing, Web and Apps. What do we need from the Network Side?", <https://www.iab.org/wp-content/IAB-uploads/2021/09/net_quality_internet_qoe_CASAS.pdf>.
[Cheshire2021]
Cheshire, S., "The Internet is a Shared Network", <https://www.iab.org/wp-content/IAB-uploads/2021/09/draft-cheshire-internet-is-shared-00b.pdf>.
[Davies2021]
Davies, N. and P. Thompson, "Measuring Network Impact on Application Outcomes Using Quality Attenuation", <https://www.iab.org/wp-content/IAB-uploads/2021/09/PNSol-et-al-Submission-to-Measuring-Network-Quality-for-End-Users-1.pdf>.
[DeSchepper2021]
De Schepper, K., Tilmans, O., and G. Dion, "Challenges and opportunities of hardware support for Low Queuing Latency without Packet Loss", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Nokia-IAB-Measuring-Network-Quality-Low-Latency-measurement-workshop-20210802.pdf>.
[Dion2021]
Dion, G., De Schepper, K., and O. Tilmans, "Focusing on latency, not throughput, to provide a better internet experience and network quality", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Nokia-IAB-Measuring-Network-Quality-Improving-and-focusing-on-latency-.pdf>.
[Fabini2021]
Fabini, J., "Network Quality from an End User Perspective", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Fabini-IAB-NetworkQuality.txt>.
[FCC_MBA]
FCC, "Measuring Broadband America", <https://www.fcc.gov/general/measuring-broadband-america>.
[FCC_MBA_methodology]
FCC, "Measuring Broadband America - Open Methodology", <https://www.fcc.gov/general/measuring-broadband-america-open-methodology>.
[Foulkes2021]
Foulkes, J., "Metrics helpful in assessing Internet Quality", <https://www.iab.org/wp-content/IAB-uploads/2021/09/IAB_Metrics_helpful_in_assessing_Internet_Quality.pdf>.
[Ghai2021]
Ghai, R., "Using TCP Connect Latency for measuring CX and Network Optimization", <https://www.iab.org/wp-content/IAB-uploads/2021/09/xfinity-wifi-ietf-iab-v2-1.pdf>.
[Iyengar2021]
Iyengar, J., "The Internet Exists In Its Use", <https://www.iab.org/wp-content/IAB-uploads/2021/09/The-Internet-Exists-In-Its-Use.pdf>.
[Kerpez2021]
Shafiei, J., Kerpez, K., Cioffi, J., Chow, P., and D. Bousaber, "Wi-Fi and Broadband Data", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Wi-Fi-Report-ASSIA.pdf>.
[Kilkki2021]
Kilkki, K. and B. Finley, "In Search of Lost QoS", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Kilkki-In-Search-of-Lost-QoS.pdf>.
[Laki2021]
Nadas, S., Varga, B., Contreras, L. M., and S. Laki, "Incentive-Based Traffic Management and QoS Measurements", <https://www.iab.org/wp-content/IAB-uploads/2021/11/CamRdy-IAB_user_meas_WS_Nadas_et_al_IncentiveBasedTMwQoS.pdf>.
[Liubogoshchev2021]
Liubogoshchev, M., "Cross-layer Cooperation for Better Network Service", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Cross-layer-Cooperation-for-Better-Network-Service-2.pdf>.
[MacMillian2021]
MacMillian, K. and N. Feamster, "Beyond Speed Test: Measuring Latency Under Load Across Different Speed Tiers", <https://www.iab.org/wp-content/IAB-uploads/2021/09/2021_nqw_lul.pdf>.
[Marx2021]
Marx, R. and J. Herbots, "Merge Those Metrics: Towards Holistic (Protocol) Logging", <https://www.iab.org/wp-content/IAB-uploads/2021/09/MergeThoseMetrics_Marx_Jul2021.pdf>.
[Mathis2021]
Mathis, M., "Preliminary Longitudinal Study of Internet Responsiveness", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Preliminary-Longitudinal-Study-of-Internet-Responsiveness-1.pdf>.
[McIntyre2021]
Paasch, C., McIntyre, K., Shapira, O., Meyer, R., and S. Cheshire, "An end-user approach to an Internet Score", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Internet-Score-2.pdf>.
[Michel2021]
Michel, F. and O. Bonaventure, "Packet delivery time as a tie-breaker for assessing Wi-Fi access points", <https://www.iab.org/wp-content/IAB-uploads/2021/09/camera_ready_Packet_delivery_time_as_a_tie_breaker_for_assessing_Wi_Fi_access_points.pdf>.
[Mirsky2021]
Mirsky, G., Min, X., Mishra, G., and L. Han, "The error performance metric in a packet-switched network", <https://www.iab.org/wp-content/IAB-uploads/2021/09/IAB-worshop-Error-performance-measurement-in-packet-switched-networks.pdf>.
[Morton2021]
Morton, A. C., "Dream-Pipe or Pipe-Dream: What Do Users Want (and how can we assure it)?", Work in Progress, Internet-Draft, draft-morton-ippm-pipe-dream-01, <https://datatracker.ietf.org/doc/html/draft-morton-ippm-pipe-dream-01>.
[NetworkQuality]
Apple, "Network Quality", <https://support.apple.com/en-gb/HT212313>.
[Paasch2021]
Paasch, C., Meyer, R., Cheshire, S., and O. Shapira, "Responsiveness under Working Conditions", Work in Progress, Internet-Draft, draft-cpaasch-ippm-responsiveness-01, <https://datatracker.ietf.org/doc/html/draft-cpaasch-ippm-responsiveness-01>.
[Pardue2021]
Pardue, L. and S. Tellakula, "Lower-layer performance is not indicative of upper-layer success", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Lower-layer-performance-is-not-indicative-of-upper-layer-success-20210906-00-1.pdf>.
[Reed2021]
Reed, D. P. and L. Perigo, "Measuring ISP Performance in Broadband America: A Study of Latency Under Load", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Camera_Ready_-Measuring-ISP-Performance-in-Broadband-America.pdf>.
[SamKnows]
"SamKnows", <https://www.samknows.com/>.
[Schlinker2019]
Schlinker, B., Cunha, I., Chiu, Y., Sundaresan, S., and E. Katz-Basset, "Internet Performance from Facebook's Edge", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Internet-Performance-from-Facebooks-Edge.pdf>.
[Scuba]
Abraham, L., et al., "Scuba: Diving into Data at Facebook", <https://research.facebook.com/publications/scuba-diving-into-data-at-facebook/>.
[Sengupta2021]
Sengupta, S., Kim, H., and J. Rexford, "Fine-Grained RTT Monitoring Inside the Network", <https://www.iab.org/wp-content/IAB-uploads/2021/09/Camera_Ready__Fine-Grained_RTT_Monitoring_Inside_the_Network.pdf>.
[Sivaraman2021]
Sivaraman, V., Madanapalli, S., and H. Kumar, "Measuring Network Experience Meaningfully, Accurately, and Scalably", <https://www.iab.org/wp-content/IAB-uploads/2021/09/CanopusPositionPaperCameraReady.pdf>.
[Speedtest]
Ookla, "Speedtest", <https://www.speedtest.net>.
[Stein2021]
Stein, Y., "The Futility of QoS", <https://www.iab.org/wp-content/IAB-uploads/2021/09/QoS-futility.pdf>.
[Welzl2021]
Welzl, M., "A Case for Long-Term Statistics", <https://www.iab.org/wp-content/IAB-uploads/2021/09/iab-longtermstats_cameraready.docx-1.pdf>.
[WORKSHOP]
IAB, "IAB Workshop: Measuring Network Quality for End-Users, 2021", <https://www.iab.org/activities/workshops/network-quality>.
[Zhang2021]
Zhang, M., Goel, V., and L. Xu, "User-Perceived Latency to Measure CCAs", <https://www.iab.org/wp-content/IAB-uploads/2021/09/User_Perceived_Latency-1.pdf>.

Appendix A. Program Committee

The program committee consisted of:

Appendix B. Workshop Chairs

The workshop chairs consisted of:

Appendix C. Workshop Participants

The following is a list of participants who attended the workshop over a remote connection:

IAB Members at the Time of Approval

Internet Architecture Board members at the time this document was approved for publication were:

Acknowledgments

The authors would like to thank the workshop participants, the members of the IAB, and the program committee for creating and participating in many interesting discussions.

Contributors

Thank you to the people that contributed edits to this document:

Authors' Addresses

Wes Hardaker
Email: ietf@hardakers.net

Omer Shapira
Email: omer_shapira@apple.com
