• 7-11 April 1997
    • Santa Clara, CA
    • Frank Fujimoto

    Summary

    HTTP/1.1 is still a hot topic. Cascading Style Sheets are here (with MS “Internet Exploder” 3.0 and Netscape “Confusicator” 4.0) and can be a big win. The XML spec was released at about the same time as the conference, and looks destined to be a big thing. Web Objects got a lot of attention, but have a long way to go before people agree on an approach. And Bob Metcalfe literally ate his words.

    Things I talk about below are what I was able to see, except papers, where I’ve listed others that look like they could be interesting. I tried to concentrate on going to panels, which proved to be an effective way to get information.

    Tutorials

    Doing Things with the Web: Where Applications Execute

    Not much new in this tutorial. The server portion was biased toward Apache, but the presenter (Doug) got a couple of the details wrong: for example, the access pass is intended for host-based access control, but he implied that it was another part of authentication.

    The above URL has links to the two Client sections, but not to the server section (and the URL given for the server talk is currently an empty file).

    Enriching Document Structure: HTML, CSS, and XML

    • Tim Bray, Textuality
    • Lauren Wood, SoftQuad

    This tutorial could be divided into two broad areas – XML and everything else. Much of the ‘everything else’ section covered how existing methods (extending HTML, both through the W3C and by brute force like Netscape and Microsoft do) either don’t really help to give structure to documents or can’t be widely deployed. Structure in this context means making the documents not only human readable but computer-scannable for keywords, framework, etc.

    Cascading Style Sheets (CSS) were covered here, but they seem to only help for broad style (what H1 should look like, defining several classes of TABLEs, etc.). No real macros are available, but it does help for sets of documents which span several files. Since the style sheets cascade, there could be, for example, a general “UW Home Page” style; if I wanted to add some classes for my web developer documentation, I could use that style as a starting point and add my own things.

    The last part of the talk covered XML.  What’s going on with XML and SGML in general is at:

    http://www.w3.org/pub/WWW/MarkUp/SGML/Activity

    XML makes HTML more strict and slightly changes some syntax to make it much easier to parse. It also adds a concept of multiple links. One example that was outlined: a hyperlink on a word could pop up a menu to let the user choose between a graphic of the word or a glossary entry. Another example is links which are meant to be followed automatically – an application could be nested files, *but* the browser isn’t required to read those contents because of the difficulty it would create in rendering. I haven’t fully read through the linking spec for XML yet.

    Papers

    Supporting Highly Manageable Web Services

    An extremely distributed method of providing services via a distributed object-oriented mechanism.

    MOWS: Distributed Web and Cache Server in Java

    MOWS modules can be loaded in a distributed manner, even from remote systems, so MOWS can be a cluster of distributed web servers, in addition to proxies. Performance is about on par with Jigsaw, the W3C’s Java (interpreted) server, but quite a bit slower than Apache.  It (and Jigsaw) could get a boost from compiled Java, however.

    Proxy Caching That Estimates Page Load Delays

    Applying new algorithms for what cache entries to keep. Factors include how long it takes to connect to a server, how slow the link to the server is, how often documents are referenced, and how large documents are.

    The first experiment was just basing cache replacement on how long it took to retrieve a document, and performance was worse than Harvest.

    However, basing replacement on all the load delay factors was almost always superior to an LFU or LRU algorithm. Of course, since there are many factors, the weights can be played with quite a bit, thus possibly dramatically changing the algorithm’s behavior.
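
    As a rough illustration of how the weights can change which document gets evicted, here is a minimal sketch. The scoring formula, the weight names, and the sample numbers are all my own assumptions, not the paper’s actual algorithm; it only combines the four factors listed above (connect time, link speed, reference count, size).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    url: str
    connect_time: float  # seconds to open a connection to the origin server
    bandwidth: float     # bytes/second observed on the link to that server
    ref_count: int       # how many times the document has been requested
    size: int            # document size in bytes


def retention_value(entry, w_connect=1.0, w_transfer=1.0, w_refs=1.0, w_size=1.0):
    """Higher value = more worth keeping. Hypothetical formula, not the paper's."""
    # Estimated delay to refetch: connection setup plus transfer time.
    refetch_delay = (w_connect * entry.connect_time
                     + w_transfer * entry.size / entry.bandwidth)
    # Often-referenced, slow-to-refetch documents are valuable; large
    # documents cost more cache space, so the value is divided by size.
    return (entry.ref_count ** w_refs) * refetch_delay / (entry.size ** w_size)


def choose_victim(entries, **weights):
    """Return the entry with the lowest retention value (the one to evict)."""
    return min(entries, key=lambda e: retention_value(e, **weights))


# Made-up sample entries for illustration only.
entries = [
    CacheEntry("fast", connect_time=0.05, bandwidth=1_000_000, ref_count=2, size=10_000),
    CacheEntry("slow", connect_time=2.0, bandwidth=5_000, ref_count=2, size=10_000),
    CacheEntry("big", connect_time=0.5, bandwidth=100_000, ref_count=3, size=1_000_000),
]

if __name__ == "__main__":
    print(choose_victim(entries).url)              # default weights
    print(choose_victim(entries, w_size=2.0).url)  # penalize size more heavily
```

    With the default weights the cheap-to-refetch document is evicted first; raising the size weight makes the large document the victim instead, which is the kind of behavior change the weights allow.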

    In Search of Reliable Usage Data on the WWW

    With the advent of proxy caches, getting reliable usage data is nearly impossible. What many sites are doing is using “cache-busting” mechanisms to disable caches (forcing the proxies to never cache the results, always using POST, which proxies don’t cache, etc.). There’s a proposal for a mechanism to have caches report hits to a server (I mention more about this below).

    Instead of hit metering, this paper suggests that doing samples (i.e. don’t let things be cached for a whole day and extrapolate from there) can help, since it gives temporal data which hit metering does not. One advantage such a scheme has is it’s much easier to implement, but it’s arguably no easier to trust the data over hit metering.

    Hits and Miss-es: A Year Watching the Web

    pointcast.net is responsible for almost 18% of the total bytes on the web.  #2 is home.netscape.com at about 13%, and www.yahoo.com is #3 at 2.5%. espnet.sportzone.com is #5 at 1.13%.  altavista.digital.com only gets 0.31%, #29, but it’s above www.playboy.com.

    Over 70% of the users on the web hit home.netscape.com. In 2nd place is www.yahoo.com at about 19%.

    For use by category (by bytes), computer companies command 33% of the web, and search is at 8%. However, pointcast is its own category, and has 36%.  I haven’t yet read closely enough to figure out why this number is 36% and the other is 18%.

    The paper notes that Netscape probably gets lots of hits because the browser automatically goes there.

    30% Accessible – A Survey of The UK Wide Web

    Lots of tables showing what HTTP headers are most often seen (Content-Type is in 99.92% of them; Content-Transfer-Encoding in 0.53%). DTDs aren’t all that common in documents (not too surprising, but XML will start requiring them), and the paper’s conclusion is that many authors don’t use tools that enforce validation. I didn’t see any mention of offline tools such as weblint.

    The documents surveyed average about 9 <A> anchors per document, and 5 <FONT> definitions (a somewhat scary thought). Seems about 97% or so of the documents have a <HEAD> section – higher than I would have thought.

    40% of all the <IMG> statements had ALT text tags. That seems quite low.  98.38% of the documents had no Java applets, and the most in one page was 7.

    The mWeb Presentation Framework

    Proposes “real-time distribution of HTML pages and synchronization of WWW browsers” using reliable multicast.

    WebCanal: a Multicast Web Application

    Another MBone Web application, this time using LRMP (lightweight reliable multicast transport protocol).

    A Case for Delay-Conscious Caching of Web Documents

    Another proposal for changing cache replacement algorithms to include how long it takes to get a page. This one has fewer factors, however.  Since it’s simpler, it seems more likely to be implemented than the one in paper #250.

    ONE-IP: Techniques for Hosting a Service on a Cluster of Machines

    Clustering at one IP address with IP packet dispatching. For failover, a watchdog detects when the router goes down and another router takes over.

    Panels

    Technical Challenges in Securing the Web

    • Peter Neumann (moderator)
    • Li Gong, SunSoft
    • Jim Roskind, Netscape
    • Amir Herzberg, IBM

    The latest direction is finer-grained security. The Java Dev Kit 1.1 will eventually include this new scheme which lets you define security levels on a file-by-file level, etc.

    Both SunSoft and Netscape will be coming out with CA servers.

    Managing a Large Web Site

    • Paul Jones, Pratik Patel (moderators)
    • Robert Andrews, Netscape
    • David Owczarek, NetDaemon Associates
    • Wesley Kronick, MSNBC
    • Jason Priebe, WRAL TV, Raleigh
    • Kirstine Loosley, Connectric

    NetDaemon runs www.monster.com, a job posting and hiring board. Connectric is an ISP.

    Raw numbers

    Netscape

    131 million hits/day.

    5.5 million on their home page.

    Week of (at least that’s what my notes say, rather than day of) 3/20: 780 million hits, 1.4Tb via HTTP, 2.9Tb via ftp.

    www.monster.com

    10x increase in last year.

    1 million hits/day.

    100,000 searches/day.

    MSNBC

    Peaks are about 20x of normal loads.

    MS data center (where MSNBC is located) outputs 1 Tb/day.

    22,000 concurrent users (didn’t explain, but my guess is how many people navigate among the pages at a time, not how many simultaneous hits they get), plan to scale to 100,000.

    4,500 simultaneous RealAudio sessions, plan to scale to 10,000.

    WRAL

    1.5 million hits/week.

    During Hurricane Bertha, got 3 million hits/day.

    Connectric

    200,000 customers.

    3 million hits/day.

    After the HigherSource people shed their containers, that site got 9 million hits in 12 hours.

    Infrastructure

    Netscape

    4 DS3’s using Cisco 7505’s (one of them is named BillGate).

    Robert Andrews had a serial link to behind the Netscape firewall, so he showed us live graphs.

    home.netscape.com core is a big system from every platform that their server runs on.

    Many satellite systems to do things like CGI, FTP.

    Source from a common host and rdist out, with the ability to push things out by hand on a file-by-file or dir-by-dir basis (from what info he gave, exactly like info.cac).

    Content developers use FrontPage :)

    Recommends buying systems that minimize the admin FTE required.

    Where it makes sense to scale big instead of wide, then buy big (DB, etc.), but where it makes sense to spread wide for bandwidth and horsepower (CGI processing), then buy smaller.

    www.monster.com


    Resumes, etc. are on Oracle DB.

    Use Open Market server and FastCGI.

    Recommends taking things like DB licensing into account when buying big/few or small/many systems – Oracle costs them $35k/cpu.

    MSNBC

    Lots of little systems with tasks as isolated as possible.

    Update content serially, so the last host gets content about an hour after the first.

    They also use Cisco 7505’s.

    Recommends splitting services out.

    WRAL

    Presentation mostly concentrated on content.

    Connectric

    Their sysadmin recommended buying the biggest box you can get (!)

    Important to have content providers have control over what goes out when (content developers now control the documents on the staging system).

    Issues

    • Parallel document propagation is a big win – the MSNBC guy really wants it.
    • If you’re using an ISP, make sure they understand what it means to host a big site. The monster.com guy had to make many trips to the ISP.
    • Those sites connected with broadcasting have a hard time convincing those folks that 24×7 100% reliability is not easily attainable in the web world.

    Good Web Design: An Essential Ingredient!

    • Nahum Gershon (moderator)
    • Nick Ragouzis, Enosis
    • Jakob Nielsen, SUN
    • Bill Spurlock, HTML Writers Guild
    • David Siegel, Studio Verso
    • ???

    (There isn’t very good information in the WWW6 program or online documents to describe who eventually ended up on a panel, or where they’re from.)

    This was a very lively panel, with panelists arguing with audience members and among themselves.

    Much discussion on whether there should be standards of design, such as “Put your search button here”, etc. It was observed that such standards cannot be applied among different organizations, let alone how difficult it is to do that among different groups in the same organization.  Also, for many graphic designers it’s knowing when to break the rules that counts.

    Animated GIFs were also discussed, and for the graphic designers on the panel they can be a useful tool.  Someone commented that Netscape was reported to be working on animated backgrounds, but the latest version apparently doesn’t do it. Nahum guessed that Netscape was trying to give Microsoft a bum steer.

    It seemed agreed that having a cohesive, consistent site that conveys a mood relevant to the content is a good thing. Some examples were a site for a flipbook artist which had small animated GIFs of a book with quickly-turning pages, and an overall light feel. Another was a site for a photographer that looked like a negative (but of course was really a positive). The designer of the last example also noted that small touches like absolute position (“Go back to frame 3 – This is frame 4 – Go forward to frame 5” rather than “Prev – Next”) help prevent a user from feeling lost.

    What is the Object Model of the Web, and What Should It Be

    • Andrew Watson (moderator)
    • Peter Kessler, SUN
    • Jens Christensen, Visigenic
    • Scott Emigh, Antares Alliance Group
    • Bill Janssen, PARC

    The panel ended up dividing between two models – Java as objects and IIOP (using CORBA as a base).

    Java is here today, and the extensions for objects seem pretty easy to implement.  However, HTTP connections are expensive to build and break down, so over lots of transactions it becomes a big factor.

    IIOP can be up to 200x as efficient as HTTP, but it requires that the initial connection be made, which can be expensive for relatively few transactions.

    This ended up being more a debate over protocol (HTTP vs. IIOP) rather than different object technologies.

    Developer’s Day

    Transparent Content Negotiation

    We have Content Negotiation with HTTP 1.1, but it’s not very elegant. Since the browser doesn’t know what kind of content the server is going to send, it needs to tell the server all of its preferences (for images, GIF has a weight of 1.0, JPEG 0.9, PNG 0.8; for text, English 1.0, French 0.9, German 0.8, etc.). TCN allows the browser to send a few general preferences and if the server doesn’t find a match, it will return a list of what’s available, and the browser (or user) can select which is best. This can also apply to things like a displayable PostScript file vs. one that’s tuned for printing.
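
    To make the weighting concrete, here is a minimal sketch of a server picking a variant from weighted preferences, and falling back to returning the list of what’s available when nothing matches. The function names are my own, it ignores wildcards such as */*, and it does not reproduce the actual TCN headers or status codes; it only illustrates the idea described above.

```python
def parse_accept(header):
    """Parse an Accept-style header into {value: weight}; weight defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def negotiate(accept_header, available):
    """Return (chosen_variant, None) on a match, else (None, list_of_variants)."""
    prefs = parse_accept(accept_header)
    acceptable = [v for v in available if prefs.get(v, 0.0) > 0.0]
    if acceptable:
        # Server-side choice: the variant the client weighted highest.
        return max(acceptable, key=lambda v: prefs[v]), None
    # No match: hand back what exists so the browser (or user) can choose.
    return None, list(available)


if __name__ == "__main__":
    header = "image/gif;q=1.0, image/jpeg;q=0.9, image/png;q=0.8"
    print(negotiate(header, ["image/png", "image/jpeg"]))
    print(negotiate("image/gif", ["image/png"]))
```

    The second call shows the fallback case: the client listed only GIF, the server has only PNG, so the list of variants comes back instead of a choice.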

    Protocol Extension Protocol

    The current ways to extend HTTP are tenuous at best, since you don’t know who supports what, and if you blindly add HTTP headers they may conflict with other extensions. PEP defines a way to not only say “Use this extension” but “here’s where this extension is defined”. The definition is a URL which points to either a machine-readable definition (Java code, DTD, etc.) or something only meant to be viewed by a human. I’m pretty dubious about this extension, as are other HTTP 1.1 Working Group members, because of how nebulous this feature is. Also, it only really works if the client wishes to see if a server knows about a feature. There isn’t a good way for the server to query the client if it supports something.

    Feature Tags

    A mechanism to let an HTTP server know the features that a browser supports, or even user preferences.

    Hit Metering

    This is a very promising extension to HTTP so servers can know how many hits a cache is getting for a page. The proxy server is supposed to keep track of how many hits it has gotten for a page in its cache. When it needs to check with the server (or its parent) whether the page has changed, or when it needs to flush the hit data from its cache, it notifies the server (or parent) of how many hits it has had on that URL. Since all the hit reporting is done either piggyback to an existing request or out of band from normal requests, it has minimal impact.

    Downsides: this is a “best try” protocol – the proxy is supposed to try to notify the server, but if the notification fails, the proxy is encouraged but not required to retry.

    Since all the proxy tells the server is how many hits it had saved up, the server doesn’t know when those hits occurred other than between the last time the page was served to that proxy and the current time.  Also, no data about where the hit came from can be discerned.
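
    A minimal sketch of the bookkeeping, assuming a toy in-memory proxy and origin server (the class and method names are hypothetical, and the real proposal’s headers are not reproduced): the proxy counts hits it serves from cache and piggybacks the count on its next revalidation.

```python
class Origin:
    """Toy origin server that records direct requests and reported cache hits."""

    def __init__(self, pages):
        self.pages = pages
        self.direct_hits = {}    # url -> requests that reached the server
        self.reported_hits = {}  # url -> cache hits reported by proxies

    def fetch(self, url, reported_hits=0):
        self.direct_hits[url] = self.direct_hits.get(url, 0) + 1
        self.reported_hits[url] = self.reported_hits.get(url, 0) + reported_hits
        return self.pages[url]


class MeteringProxy:
    """Toy caching proxy that saves up hit counts between revalidations."""

    def __init__(self):
        self.cache = {}         # url -> cached body
        self.pending_hits = {}  # url -> hits served from cache, not yet reported

    def get(self, url, origin):
        if url in self.cache:
            self.pending_hits[url] = self.pending_hits.get(url, 0) + 1
            return self.cache[url]
        self.cache[url] = origin.fetch(url)
        return self.cache[url]

    def revalidate(self, url, origin):
        # Piggyback the saved-up count on the request we had to make anyway.
        hits = self.pending_hits.pop(url, 0)
        self.cache[url] = origin.fetch(url, reported_hits=hits)


if __name__ == "__main__":
    origin = Origin({"/": "home page"})
    proxy = MeteringProxy()
    for _ in range(5):
        proxy.get("/", origin)
    proxy.revalidate("/", origin)
    print(origin.direct_hits, origin.reported_hits)
```

    Note that the origin only ever sees a total per report, which is exactly the limitation described above: no timestamps and no information about who made each request.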

    Improving Web Performance with HTTP 1.1

    (At the end of that URL is a link to what used to be the paper on which this presentation was based, but the page is no longer there. Hopefully they’ll update NL-PerfNote.html eventually).

    Persistent Connections (Keepalives) and Pipelining are Good Things. When benchmarking an HTTP 1.1 client and server using one connection with Pipelining against an HTTP 1.0 client and server using 6 parallel connections, HTTP 1.1 wins in all categories – # packets (many fewer), average packet size (much larger), wall clock time (faster), # bytes (fewer), # TCP packets which are just for handshaking (far fewer).

    Smaller wins can be had with other things, like removing the tags that FrontPage and Netscape Gold put into a page which say that those editors touched the page (which has no effect on rendering).

    By using gzip compression (HTTP 1.1 lets you specify an alternate encoding) modest improvements can be had over Pipelining. Jim even said that if you use lowercase HTML tags, they’re more likely to appear in the text and improve compression even more. In response to the smirks, he said essentially “hey, add up all these little improvements and you have something significant”. My guess is you’d need *lots* of them to make an impact :)
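
    The lowercase-tags claim is easy to try for yourself. Here is a minimal sketch that compresses the same page with uppercase and lowercase tag names and prints both sizes; the sample page is made up, and the result will depend heavily on the actual content, so I’m not asserting which way it comes out.

```python
import gzip
import re


def lowercase_tags(html):
    """Lowercase element names in start and end tags; text is left untouched."""
    return re.sub(r"</?[A-Za-z][A-Za-z0-9]*", lambda m: m.group(0).lower(), html)


def gzipped_size(text):
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("ascii")))


# Made-up sample page, repeated to give the compressor something to work on.
sample = (
    "<HTML><HEAD><TITLE>Trip report</TITLE></HEAD><BODY>"
    "<H1>Trip report</H1>"
    "<P>Persistent connections and pipelining are good things.</P>"
    "</BODY></HTML>\n"
) * 20

if __name__ == "__main__":
    print("original bytes:       ", len(sample))
    print("gzip, uppercase tags: ", gzipped_size(sample))
    print("gzip, lowercase tags: ", gzipped_size(lowercase_tags(sample)))
```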

    The first packet you send out is the most important – if you have something useful that the browser can render, the better off you are because of TCP slow start.
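
    To see why the first packets matter, here is a minimal sketch of an idealized slow start: the sender starts with a small window and doubles it every round trip. The segment size of 1460 bytes, the initial window of one segment, and the absence of loss or delayed ACKs are all simplifying assumptions of mine, not figures from the talk.

```python
def round_trips_to_send(total_bytes, mss=1460, initial_window=1):
    """Round trips needed under idealized slow start (window doubles, no loss)."""
    segments_left = -(-total_bytes // mss)  # ceiling division
    window = initial_window
    rtts = 0
    while segments_left > 0:
        segments_left -= window  # send one window's worth this round trip
        window *= 2              # window doubles for the next round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    for size in (1_000, 5_000, 20_000, 100_000):
        print(size, "bytes:", round_trips_to_send(size), "round trips")
```

    Under these assumptions only one segment arrives in the first round trip, so whatever the browser can start rendering from that segment is what the user sees first.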

    Disable TCP buffering if you can, since your server most likely already does that (Apache indeed does), and it knows better when buffers should be flushed.

    HTTP 1.x Developer’s Panel

    • Larry Masinter, SUN (moderator)
    • Henrik Frystyk Nielsen, W3C
    • Koen Holtman, CERN
    • Jeffrey Mogul, DEC WRL
    • Andy Mutz, HP
    • Jim Gettys, DEC, W3C
    • Roy Fielding, UCI

    Roy Fielding is also an Apache developer.

    Much of what people wanted for HTTP 1.1 was thrown out because the WG wanted to get a draft out in a reasonable amount of time. They essentially punted on security except for Digest Authentication, which they acknowledge is not really great, but it’s better than Basic Authentication (passwords don’t go in the clear with digest).
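
    To show the difference, here is a minimal sketch of what each scheme puts on the wire. Basic sends the password merely base64-encoded; Digest sends an MD5 hash computed over the credentials and a server-supplied nonce. This follows the original (RFC 2069 style) digest calculation without the later qop extensions, and the user name, realm, and nonce values are made up.

```python
import base64
import hashlib


def basic_credentials(user, password):
    """Authorization value for Basic: the password is only base64-encoded."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def md5_hex(text):
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(user, realm, password, method, uri, nonce):
    """Digest response hash (RFC 2069 style, no qop); the password is never sent."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


if __name__ == "__main__":
    # Made-up credentials and nonce for illustration.
    print(basic_credentials("alice", "secret"))
    print(digest_response("alice", "example", "secret", "GET", "/index.html", "abc123"))
```

    Anyone who sees the Basic header can decode the password directly. With Digest they see only a hash that changes with the nonce, which is better, though as the panel acknowledged it still falls well short of real security.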

    They felt they wasted a lot of time until they agreed to nail down terminology – person A would be talking about frizzbats to person B, not knowing the other person assumed a frizzbat was a completely different thing. Once they all agreed on what a frizzbat was, things went much more smoothly.

    Internationalization was a big topic this year, and Transparent Content Negotiation is seen as helping with this. However, Larry kept pushing back that HTTP 1.1 as it currently stands already handles internationalization. Sure, it may be ugly, but it does work. His point was that “We need TCN because we can’t have internationalization without it” is a false argument to him, but “We need TCN because otherwise browsers and servers can’t deal with the 10K of headers needed to describe preferences” has merit. He got a few other WG members arguing with him on TCN along the way.

    Apache BOF

    1.2b8 was released during WWW6.

    They’re thinking of a protocol abstraction layer, which would make SSL easier to implement.

    However, they have to be very careful to not specifically do things so that SSL can be integrated, because of export restrictions.

    A c2.net representative (the person who does most of the coding) was there and he said RSA is clever enough to not be fooled by “We’ll give you xx% of all our revenues if you license to us” - they require a minimum $$ amount.

    The CGI.pm developer was there, and he’s done a few workarounds to help fix Apache from going into an infinite loop under certain conditions. Apache itself should have that bug fixed by now according to the developers.

    No interest in IIOP as a native protocol.

    Not much interest in a GUI interface for configurations, especially one that supports all of Apache’s features.

    The feature that will differentiate 2.0 from previous versions is threads.

    They really don’t want to do a 1.3, and will not do a 1.2.1 unless there’s a really strong reason to do so.

    Some companies are willing to let employees work on Apache because they provide web site hosting based on Apache and custom value-added modules. Since they need Apache to do the value-added stuff for their customers, they’re willing to donate time to the project.

    Someone suggested making a DTD for the Apache config files so that an SGML-compliant editor could be used to edit them. Roy Fielding asserts that writing a web server is easier than writing an SGML editor.

    A NASA site has 8 CPUs running Apache with a max of 200-300 clients, since it gets busy when the shuttle launches. However, Apache has never run on the shuttle (they use x286 technology), not even on a laptop.

    The core Apache developers rarely meet – most everything is done online.

    Speakers

    Gerhard Casper, President, Stanford

    Welcomed people to visit Stanford at www.stanford.edu, or US101, or I280.  Considering how things go, he said US101 may be faster.

    Mae Jemison, the Jemison Group

    Was first female African-American Astronaut on the shuttle.

    Has a vision that science – including social and political science – can go a long way to helping solve problems in the world.

    Thomas Kalil, National Economic Council

    Outlined backing for a web accessibility infrastructure, including people with disabilities and people who would normally not have access to the web.

    Also outlined K12 efforts.  The term “K to Gray” also came up.

    Howard Rheingold, Electric Minds

    Takes a while to get past his clothes – he wore a bright red jacket over a blue shirt, and bright purple pants.

    The internet is a powerful extension of community. Electric Minds is like a big set of newsgroups, with a lot of effort put into making it easy to interact with other people.

    Ted Nelson, Xanadu

    People who run proxy servers are “proxologists” :)

    Hypertext is still his big interest (vs. the web).

    Bob Metcalfe, InfoWorld

    Thinks the killer app on the web will be telepresence.

    Provided a huge cake for people in the audience, which was decorated like one of his InfoWorld articles.

    At the Boston conference he promised to eat his article saying 1996 would be the year the web collapsed if it didn’t come true. It didn’t, so he did.

    Shigeki Goto, Waseda University

    Talked about how internationalization has immensely helped in getting people to use Usenet and the web.

    Did an excellent reply to a question from the audience of “Why don’t you just use the roman alphabet?” Essentially he said that yes, it may be easier overall, but the truth is the vast majority of things that are done electronically are in native scripts, and in order to get practical usage penetration, using native scripts is a necessity.

    Other speakers were John Gage from SUN, Tim Berners-Lee from W3C, Gregg Vanderheiden from the Trace Center, and Doug Engelbart.

    Vendors and Posters

    Not many free handouts :) HP demonstrated an E-size color inkjet printer which was extremely impressive. Using the special paper, you had to look pretty closely to see that it wasn’t a dye-sub printer. They also had a single-sheet version. The trick is to use both CMY and RGB cartridges.

    Microsoft had a big presence, and Netscape wasn’t in the vendor exhibition.

    This time they mixed the poster presentations in with the vendors, and I think it worked pretty well. Of course, the posters were mostly either things that are done on the client side or things which solved a very specific problem.

    During the presentation of the Yuri Rubinsky Memorial WWW Award, received by Gregg Vanderheiden, a video was shown demonstrating the Trace Center’s kiosk which can be used by the blind, vision-impaired, or people who can’t read. There was a sample of one of these kiosks in the Vendor area.

    Misc. Thoughts

    For me this was much more useful than WWW4 in Boston, but I think part of it was I concentrated more on going to panels instead of papers.

    Unfortunately it was hard to find good information about several things, like where lunch was (and even that there was a buffet lunch at all), what time things ended, what terminal rooms were working, etc.

    The online Interactive Conference Environment was a pretty good first pass at having all relevant information on the web, but unfortunately is pretty cumbersome to use.

    Some people got stuck on one of the rides at Great America. Unfortunately for them the ride was an elevator drop-type ride, so they were at the top of this big tower – one guy for about 10 minutes.

    Having a conference at the Santa Clara Convention Center is pretty tough for people who don’t know the area – there aren’t a lot of places to eat within walking distance, nor are there many things to do (except Great America, which was closed during the week except for the night WWW6 was there).

    I talked briefly with Chris Quinn, one of the co-chairs (I know her because she started working for the group at Stanford I worked in not long after I left for HP), and she looked quite harried, but was much more relaxed by Friday.

    Matt Freedman and Pete Libbey also went to JavaOne, and said there was a whole slew of NCs there for people to use, and a few PCs. It was no trouble to find a free NC – apparently they were excruciatingly slow.