W3C

Web Characterization:

From working group to activity

W3C Note Mar 19 1999

This version:
http://www.w3.org/TR/1999/NOTE-WCA-19990319
Latest version:
http://www.w3.org/TR/NOTE-WCA
Editors:
Jim Pitkow <pitkow@parc.xerox.com>, Xerox PARC
Johan Hjelm <hjelm@w3.org>, W3C/Ericsson
Henrik Frystyk Nielsen <frystyk@w3.org>, W3C

Copyright © 1998 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.

Status of this document

This document is a W3C Note reporting on the results of the HTTP-NG Web Characterization Group and the structure of the Web Characterization Activity. The work, which was part of the W3C HTTP-NG Activity, phase I, is now continued in the Web Characterization Activity.

Review comments on this document should be sent to <www-wca@w3.org>, which is the archived email list for the Web Characterization Activity. Information on how to subscribe to public W3C email lists can be found at the subscription request page.

This document is a NOTE made available by the W3C for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by this NOTE.

Table of Contents

Abstract
1. The HTTP-NG Web Characterization Group
1.1 Mission statement
1.2 Participants
1.3 Deliverables and Accomplishments
2. The Web Characterization Activity
2.1 The structure of the Activity
3. Example characterizations
3.1 The HTTP-NG testbed
4. WCG papers
5. Summary

Abstract

This document describes the experiences and results that came out of the Web Characterization Group as part of the W3C HTTP-NG Activity, and how that work is now continued in the Web Characterization Activity.

The HTTP-NG Working Group created a series of scenarios for the HTTP-NG protocol design group, which were implemented in the scope of the HTTP-NG testbed, and used to optimize its design.

The WCA started in November 1998, and will bring that work model to a wider audience.

1. Introduction

Web Characterization is concerned with looking at the overall patterns of Web structure and usage by measuring such aspects as server access patterns, the kind of data being accessed, bytes transferred, popularity of resources, etc. By better understanding the dynamics of the Web and how it grows, we believe that W3C and the Web Community in general will be better suited to evolve the Web and to ensure its long term interoperability and robustness.

The purpose of the Activity is to define and implement a scalable mechanism for gathering data, boiling it down, and presenting it in efficient ways to content providers, service providers, user groups, researchers, technology designers, and other groups.

The information used to characterize the Web is strictly concerned with general patterns of Web usage and does not focus on specific users or Web sites. The scope of this Activity is to characterize the Web as a distributed system, not on an individual basis.

1.1 Mission Statement

The HTTP-NG Web Characterization Group was chartered in August 1997 as a part of the HTTP-NG Activity. Its intent was to create a stable and comprehensive platform of knowledge and analysis of the Web, to enable the protocol designers to create a relevant and well-instructed solution. Previously, analysis of user behavior on the Web has often been based on spurious data, gathered in an ad-hoc manner. The HTTP-NG Web Characterization Group was an attempt at rectifying this.

It was set up to fulfill four primary goals:

  1. To respond to the questions raised by the HTTP-NG Protocol Design Group regarding current usage of the World Wide Web.
  2. To design and develop representative scenarios for use in the HTTP-NG testbed.
  3. To make recommendations to the Protocol Design Group on issues concerning Web usage and characterization methods.
  4. To devise a system and a methodology to make characterization of the Web easier and more reliable in the future.

1.2 Participants

The group consisted of members from Boston University's Ocean group, Harvard College's Vino group, INRIA, Microsoft, Netscape, Virginia Tech's Network Resource Group, and Xerox PARC's Webology group. Jim Pitkow, Xerox PARC, chaired the group.

1.3 Deliverables and Accomplishments

The HTTP-NG WCG has leveraged and helped focus existing research programs, which the group considers one of its major accomplishments.

During its charter, the group has responded to the questions of the HTTP-NG Protocol Design Group. This has been influential in the design of the HTTP-NG protocol. It has also created the HTTP-NG testbed, which operates by using SURGE (Scalable URL Reference Generator) from the Boston University Ocean group. Scenario parameters derived from observed statistical regularities in the distribution of file sizes, reading times, and other metrics were used to simulate client traffic in the testbed. SURGE used some aspects of Web traffic which were not taken into account by then-current traffic generators.

Status       | Date accomplished | Deliverable
Done         | Oct. 2-3, 1997    | First face-to-face meeting
Done         | Nov. 1, 1997      | Identification of classification parameters for Web categorization
Done         | Dec. 8, 1997      | Plan for response to HTTP-NG Protocol Design Group questions
Done         | Dec. 31, 1997     | Initial response to HTTP-NG PDG questions
Done         | Feb. 7, 1998      | Final response to HTTP-NG PDG questions
Done         | March-April 1998  | Trace analysis for scenario building, refined testbed software
Done         | April 24, 1998    | Extended scenarios, refined testbed software
Moved to WCA |                   | Definition of new log file format
Moved to WCA |                   | Recommendations for automatic re-sampling
Done         | June 24, 1998     | Project evaluation

The group has completed all the original requirements, with the exception of the redesign of the Common Log File Format and the recommendations for automatic re-sampling of the Web, which have been moved to the Web Characterization Activity.

2. The Web Characterization Activity

The W3C Web Characterization Activity was started in November 1998 with a workshop that gathered some 50 people interested in the subject. Subsequently, a working group and an interest group have been started.

The purpose of the Activity is to define and implement a scalable mechanism for gathering data, boiling it down, and presenting it in efficient ways to content providers, service providers, user groups, researchers, technology designers, and other groups.

The information used to characterize the Web is strictly concerned with general patterns of Web usage and does not focus on specific users or Web sites. The scope of this Activity is to characterize the Web as a distributed system, not on an individual basis.

The Web Characterization Group in the HTTP-NG Activity was a first phase in this project. It was completed in August 1998, and phase 2 has begun. The focus of phase 2 is to extend the Web Characterization work and to create an active knowledge base containing up-to-date information about the Web by broadening the scope of Web characterization and by providing information and test scenarios for the W3C Membership and the Web community in general about the Web and its use, both now and in the near future.

An important result of WCG is the identification of the three key groups in the characterization work and how they interact:

[Figure: WCA org chart]

Bulk Data Providers

The Bulk Data Providers are typically server maintainers and ISPs providing server and proxy logs, but can also be backbone providers gathering information directly from the Net or users running instrumented Web clients etc. Because of privacy concerns and because of the sheer size of log files, it is often preferred to have data providers running a set of characterization tools locally so that only the boiled down data sets and profiles are released.
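
As an illustration of the local reduction step described above, the following is a minimal sketch and not a tool produced by the WCG. It assumes the logs are in Common Log Format; the function name `reduce_log`, the choice of aggregate fields, and the sample lines are all hypothetical. Only counts leave the function: client addresses and full URLs are discarded.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$'
)


def reduce_log(lines):
    """Boil raw log lines down to aggregate counts with no per-client fields."""
    summary = {
        "requests": 0,
        "bytes": 0,
        "status": Counter(),
        "methods": Counter(),
        "extensions": Counter(),
        "skipped": 0,
    }
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            summary["skipped"] += 1  # malformed line, counted but not parsed
            continue
        _host, _date, method, path, status, size = match.groups()
        summary["requests"] += 1
        summary["bytes"] += 0 if size == "-" else int(size)
        summary["status"][status] += 1
        summary["methods"][method] += 1
        # Keep only the file extension, not the URL itself.
        name = path.split("?", 1)[0].rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else "(none)"
        summary["extensions"][ext] += 1
    return summary


# Hypothetical sample lines (documentation-range addresses).
SAMPLE = [
    '192.0.2.1 - - [10/Oct/1998:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/1998:13:55:40 -0700] "GET /logo.gif HTTP/1.0" 200 4512',
    '192.0.2.1 - - [10/Oct/1998:13:56:02 -0700] "GET /missing HTTP/1.0" 404 -',
]

if __name__ == "__main__":
    print(reduce_log(SAMPLE))
```

A data provider could run such a reduction on site and release only the resulting summary, which is far smaller than the log and carries no per-user records.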

The W3C Characterization Working Group

The WCG develops and maintains a set of characterization tools used by the data providers and defines the mechanism for exchanging boiled down data sets and profiles with the data providers in order to maintain confidentiality and trust. The collected data sets are used to develop characterization models and to provide characterization data to the third group, the reduced data consumers.

Reduced Data Consumers

The reduced data consumers use the profiles and data sets provided by the WCG and provide feedback and new questions to be asked. Primary data consumers are expected to be content providers, service providers, user groups, researchers and technology designers.

2.1 The structure of the Activity

The format for this Activity is to let the interaction between the reduced data consumers and bulk data providers take place through an Interest Group, with a new Web Characterization Working Group (WCG) functioning as the mediator, provider of analysis tools and disseminator of characterization information.

Web Characterization Interest Group

The role of the Interest Group is to be a discussion forum for bulk data providers and reduced data consumers, and to provide requests and feedback to the Working Group. It is expected that the tools and dissemination mechanism produced by the Working Group will benefit from a feedback mechanism with its immediate users, as well as their continuous review. All work will be discussed on the Web Characterization Activity Forum.

Participation in the Interest Group is open to everybody.

Web Characterization Workshop

The Activity was kicked off by the Web Characterization Workshop, November 5, 1998 in Boston, MA, with the intent of bringing together both W3C Members and Web characterization experts. As a result of the Workshop, the Interest Group was formed, and several organizations that wanted to participate in the Working Group were identified.

Web Characterization Working Group

The WCG is intended to work using a request/response based model similar to the one used in the HTTP-NG Activity. Requests will be formally issued by the Interest Group and by W3C Activities, and the WCG will respond with realistic time lines for when and how results can be made available.

The WCG will start its work by formally soliciting requests for characterization data needed by other W3C Working Groups and Activities. The solicitation process is intended to occur at six-month intervals, enough time for the Working Group to understand and respond to the requests of the other W3C Groups. Requests from the Interest Group will be dealt with on a case by case basis. All work will be discussed on the Web Characterization Activity Forum.

The working group has the following participants:

Name                     | Affiliation             | Function in the WCA
Marc Abrams              | Virginia Tech           |
Martin F. Arlitt         | HP Labs                 |
Paul Barford             | Boston University       |
Pei Cao                  | University of Wisconsin |
Anja Feldmann            | AT&T Research Labs      |
Edward A. Fox            | Virginia Tech           |
Johan Hjelm              | Ericsson/W3C            | Interest Group Chair
Balachander Krishnamurthy | AT&T Research Labs     |
Jim Gettys               | W3C/Compaq              |
Joe Meadows              | Boeing                  |
Henrik Frystyk Nielsen   | W3C                     | W3C Staff Contact
Ed O'Neill               | OCLC                    |
Jim Pitkow               | Xerox PARC              | Working Group Chair

Further information about the work in progress can be found at the Web Characterization Activity Home Page.

3. Example Characterizations

The following are examples of some of the findings of the HTTP-NG WCG and other researchers in the field of Web Characterization. This is by no means meant to be either a complete listing of the findings of the HTTP-NG WCG or a representative sample of research in the field. Rather, it contains results that the group found provocative and representative of the types of questions the HTTP-NG WCG found to be of interest.

[Figure: Links vs. Servers. Source: Alexa Internet and WCG Analysis of AOL Data, December 1997]

[Figure: Number of Web Servers. Source: W3C, Mark Gray, Netcraft Server Survey]

3.1 The HTTP-NG testbed

The HTTP-NG testbed was designed for the specific purpose of making reliable and convincing claims that the performance of HTTP-NG would be comparable to prior HTTP implementations. It was designed in close cooperation with the HTTP-NG Protocol Design Group.

An analysis of the current practice in load generation tools left the HTTP-NG WCG concerned with the representativeness of the traffic being generated.

Essentially, three types of traffic generation models exist: stress testing, trace replay, and statistically derived models. Many current traffic generators follow the first model, by varying the number of requests per second that are issued to the server. While this approach does test the capacity of the server as measured by the number of HTTP operations per second, it does not produce traffic patterns that have actually been observed.

The second model for traffic generation utilizes packet traces collected from various servers and protocol analyzers. If this method had been used in the testbed, the group would have had to acquire traces from representative servers. Apart from determining what is representative, this also presents the problems of deciding which servers to include and of obtaining permission to use their log file information. Each Web site would also have needed to be recreated, because factors such as the file system configuration affect performance.
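
To make the trace replay model concrete, here is a minimal sketch of its core step: turning recorded timestamps into delays relative to the first request, so that a replay tool can reissue the requests with the observed spacing. The trace entries, the function name `replay_schedule`, and the `speedup` parameter are hypothetical; the timestamp layout is assumed to follow Common Log Format.

```python
from datetime import datetime

# Hypothetical trace: (timestamp, requested path) pairs in recorded order.
TRACE = [
    ("10/Oct/1998:13:55:36 -0700", "/index.html"),
    ("10/Oct/1998:13:55:37 -0700", "/logo.gif"),
    ("10/Oct/1998:13:55:52 -0700", "/papers.html"),
]


def replay_schedule(trace, speedup=1.0):
    """Return (delay_seconds, path) pairs relative to the first trace entry.

    A speedup above 1.0 compresses the recorded gaps between requests.
    """
    fmt = "%d/%b/%Y:%H:%M:%S %z"
    stamps = [datetime.strptime(ts, fmt) for ts, _ in trace]
    start = stamps[0]
    return [
        ((stamp - start).total_seconds() / speedup, path)
        for stamp, (_, path) in zip(stamps, trace)
    ]


if __name__ == "__main__":
    for delay, path in replay_schedule(TRACE):
        print(f"{delay:6.1f}s  GET {path}")
```

The sketch covers scheduling only. It does not address the harder problems named above, namely choosing representative servers, obtaining permission, and recreating each site.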

Consequently, the group chose to statistically model HTTP traffic. The users were segmented into three strata: corporate users, ISP users, and educational users. To create models for the behavior of each stratum, the group obtained full log files from America Online (major ISP), AltaVista (search engine/mixed user group), and Boston University (educational users). From Microsoft (corporate usage), a distribution of usage was obtained. All data sets except the AltaVista data were used to generate scenarios for the testbed. The log file analysis tools used were based on the prior work of the group members, and the personal connections of the group members were instrumental in obtaining these data sets.

The HTTP-NG testbed is designed as the diagram below shows:

[Figure: HTTP-NG testbed]

The HTTP-NG testbed was thus able to take both network characteristics and user behavior into account, inserting a simulated network between the robot simulating the client and the server. The statistical traffic generator takes a set of parameters to create a mock server with the associated file system, and a set of simulated clients that make statistically based requests for files.
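
The idea of a parameter-driven generator can be sketched as follows. This is neither SURGE nor the testbed code. The distributions used here (lognormal file sizes, Zipf-like popularity, Pareto think times) and every parameter value are assumptions chosen for illustration, and all function names are hypothetical.

```python
import random


def build_mock_files(rng, count, mu=9.0, sigma=1.5):
    """Mock file system: path -> size in bytes (lognormal sizes, assumed)."""
    return {
        f"/file{i}.html": max(1, int(rng.lognormvariate(mu, sigma)))
        for i in range(count)
    }


def client_requests(rng, files, n_requests, zipf_s=1.0, think_alpha=1.5):
    """Yield (time, path, size) tuples for one simulated client."""
    paths = list(files)
    # Zipf-like popularity: the file at rank r gets weight 1 / r**s.
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, len(paths) + 1)]
    clock = 0.0
    for _ in range(n_requests):
        path = rng.choices(paths, weights=weights, k=1)[0]
        yield clock, path, files[path]
        # Heavy-tailed think time before the next request (Pareto, >= 1 s).
        clock += rng.paretovariate(think_alpha)


def generate(seed=1, n_files=50, n_clients=3, n_requests=5):
    """Build the mock file set and a time-ordered request stream."""
    rng = random.Random(seed)  # seeded so a scenario can be repeated
    files = build_mock_files(rng, n_files)
    events = []
    for client in range(n_clients):
        for when, path, size in client_requests(rng, files, n_requests):
            events.append((when, client, path, size))
    events.sort()
    return files, events


if __name__ == "__main__":
    _files, events = generate()
    for when, client, path, size in events:
        print(f"t={when:7.2f}s client={client} GET {path} ({size} bytes)")
```

Seeding the generator makes a scenario repeatable, so two protocol implementations can be compared against the same request stream.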

The model characterizes sites as containing Web pages with embedded media and Web pages without embedded media. Using a model that characterizes pages, rather than just objects, makes alteration in the composition of sites easier. This facilitates determining the effect of new technologies, like Cascading Style Sheets (CSS).
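
A page-level model of this kind might look like the sketch below. The class and function names, the byte counts, and the what-if in which some images are replaced by a single style sheet are all hypothetical; they only illustrate why modeling pages rather than bare objects makes such changes easy to express.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbeddedObject:
    kind: str  # e.g. "image" or "stylesheet"
    size: int  # bytes


@dataclass
class Page:
    html_size: int
    embedded: List[EmbeddedObject] = field(default_factory=list)

    def requests(self) -> int:
        """One request for the HTML plus one per embedded object."""
        return 1 + len(self.embedded)

    def total_bytes(self) -> int:
        return self.html_size + sum(obj.size for obj in self.embedded)


def site_totals(pages):
    """Return (requests, bytes) needed to fetch every page once."""
    return (
        sum(page.requests() for page in pages),
        sum(page.total_bytes() for page in pages),
    )


def replace_images_with_stylesheet(page, images_removed, stylesheet_size):
    """Hypothetical what-if: drop some images and add one style sheet."""
    kept, removed = [], 0
    for obj in page.embedded:
        if obj.kind == "image" and removed < images_removed:
            removed += 1
        else:
            kept.append(obj)
    if removed:
        kept.append(EmbeddedObject("stylesheet", stylesheet_size))
    return Page(page.html_size, kept)


# Hypothetical two-page site: one page with embedded media, one without.
SITE = [
    Page(4000, [EmbeddedObject("image", 1200),
                EmbeddedObject("image", 800),
                EmbeddedObject("image", 3000)]),
    Page(2500),
]

if __name__ == "__main__":
    print("before:", site_totals(SITE))
    changed = [replace_images_with_stylesheet(p, 2, 600) for p in SITE]
    print("after: ", site_totals(changed))
```

Because the change is applied per page, the page without embedded media is left untouched, and the effect on request counts and bytes can be read directly from the site totals.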

4. WCG Papers

Throughout the year of the WCG's existence, various group members have contributed papers, articles, and presentations to the group and the Web characterization community. Given the limited focus of the HTTP-NG project effort, it is not surprising that these items are focused on characterizations and representative testbed designs.

Author(s) | Papers, Articles, Notes | Date Published
Jim Pitkow | W3C Note: HTTP-NG WCG Status Report | July 1998
Jim Pitkow | Summary of WWW Characterizations (Paper at WWW7) | April 1998
Huberman, Pirolli, Pitkow and Lukose | Strong Regularities in World Wide Web Surfing (PDF format) | April, 1998
Barford and Crovella | Generating Representative Web Workloads for Network and Server Performance Evaluation (Postscript format) | November, 1997
Manley, Courage and Seltzer | A Self-Scaling and Self-Configuring Benchmark for Web Servers | November, 1997
Manley and Seltzer | Web Fact and Fantasy | October, 1997
Abdulla, Fox and Abrams | Shared User Behavior on the World Wide Web | October, 1997

5. Summary

The group has achieved its objectives, creating feedback for the HTTP-NG Protocol Design Group by answering the questions this group had about the Web, and by creating the HTTP-NG testbed, which enabled the creation of an optimized and efficient design of the next generation of the Hypertext Transfer Protocol. The Web characterization work is now being continued in the Web Characterization Activity.


Jim Pitkow, Xerox PARC; Johan Hjelm, Ericsson/W3C; Henrik Frystyk Nielsen, W3C
@(#) $Id: NOTE-HTTP-NG-WCG-19990104.html,v 1.11 1999/01/04 23:06:42 frystyk Exp $