Jeff Huang's Paper Archive


Graphstract: Minimal Graphical Help for Computers

Abstract

We explore the use of abstracted screenshots as part of a new help interface. Graphstract, an implementation of a graphical help system, extends the ideas of textually oriented Minimal Manuals to the use of screenshots, allowing multiple small graphical elements to be shown in a limited space. This allows a user to get an overview of a complex sequential task as a whole. The ideas have been developed by three iterations of prototyping and evaluation. A user study shows that Graphstract helps users perform tasks faster on some but not all tasks. Due to their graphical nature, it is possible to construct Graphstracts automatically from pre-recorded interactions. A second study shows that automated capture and replay is a low-cost method for authoring Graphstracts, and the resultant help is as understandable as manually constructed help.

Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs

Abstract

Users frequently modify a previous search query in hope of retrieving better results. These modifications are called query reformulations or query refinements. Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. In this paper, we aim to better understand how web searchers refine queries and form a theoretical foundation for query reformulation. We study users’ reformulation strategies in the context of the AOL query logs. We create a taxonomy of query refinement strategies and build a high precision rule-based classifier to detect each type of reformulation. Effectiveness of reformulations is measured using user click behavior. Most reformulation strategies result in some benefit to the user. Certain strategies like add/remove words, word substitution, acronym expansion, and spelling correction are more likely to cause clicks, especially on higher ranked results. In contrast, users often click the same result as their previous query or select no results when forming acronyms and reordering words. Perhaps the most surprising finding is that some reformulations are better suited to helping users when the current results are already fruitful, while other reformulations are more effective when the results are lacking. Our findings inform the design of applications that can assist searchers; examples are described in this paper.
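
To make the idea of a rule-based reformulation classifier concrete, here is a minimal Python sketch. It is not the classifier from the paper: the function name, the label names, the order of the rules, and the 0.8 similarity threshold used to separate a spelling correction from a word substitution are all assumptions chosen for illustration.

```python
import difflib


def classify_reformulation(prev: str, curr: str) -> str:
    """Label how `curr` modifies `prev` using simple, ordered rules.

    Hypothetical illustration only; labels and thresholds are assumptions.
    """
    p, c = prev.lower().split(), curr.lower().split()
    if p == c:
        return "same"
    # Same words in a different order.
    if sorted(p) == sorted(c):
        return "reorder_words"
    # Several words collapsed into their initials, or the reverse.
    if len(p) > 1 and len(c) == 1 and c[0] == "".join(w[0] for w in p):
        return "form_acronym"
    if len(c) > 1 and len(p) == 1 and p[0] == "".join(w[0] for w in c):
        return "expand_acronym"
    # Strict subset relations between the two word sets.
    ps, cs = set(p), set(c)
    if ps < cs:
        return "add_words"
    if cs < ps:
        return "remove_words"
    # Exactly one word changed in place: near-identical spelling counts as
    # a correction, anything else as a substitution (0.8 is an assumption).
    if len(p) == len(c):
        diffs = [(a, b) for a, b in zip(p, c) if a != b]
        if len(diffs) == 1:
            a, b = diffs[0]
            if difflib.SequenceMatcher(None, a, b).ratio() >= 0.8:
                return "spelling_correction"
            return "word_substitution"
    return "new_query"


print(classify_reformulation("new york hotels", "hotels new york"))
print(classify_reformulation("national basketball association", "nba"))
```

Checking the most specific rules first keeps each label precise at the expense of recall, which is the trade-off a high precision classifier accepts; pairs that match no rule fall through to "new_query".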

Conversational Tagging in Twitter

Abstract

Users on Twitter, a microblogging service, started the phenomenon of adding tags to their messages sometime around February 2008. These tags are distinct from those in other Web 2.0 systems because users are less likely to index messages for later retrieval. We compare tagging patterns in Twitter with those in Delicious to show that tagging behavior in Twitter is different because of its conversational, rather than organizational nature. We use a mixed method of statistical analysis and an interpretive approach to study the phenomenon. We find that tagging in Twitter is more about filtering and directing content so that it appears in certain streams. The most illustrative example of how tagging in Twitter differs is the phenomenon of the Twitter micro-meme: emergent topics for which a tag is created, used widely for a few days, then disappears. We describe the micro-meme phenomenon and discuss the importance of this new tagging practice for the larger real-time search context.

Parallel Browsing Behavior on the Web

Abstract

Parallel browsing describes a behavior where users visit Web pages in multiple concurrent threads. Web browsers explicitly support this by providing tabs. Although parallel browsing is more prevalent than linear browsing online, little is known about how users perform this activity. We study the use of parallel browsing through a log-based study of millions of Web users and present findings on their behavior. We identify a power law distribution in browser metrics comprising “outclicks” and tab switches, which signify the degree of parallel browsing. We find that users switch tabs at least 57.4% of the time, but user activity, measured in pageviews, is split among tabs rather than increasing overall activity. Finally, analysis of a subset of the logs focused on Web search shows that while the majority of users do not branch from search engine result pages, the degree of branching is higher for non-navigational queries. Our findings have design implications for Web sites and browsers, search interfaces, and log analysis.

Studying Trailfinding Algorithms for Enhanced Web Search

Abstract

Search engines return ranked lists of Web pages in response to queries. These pages are starting points for post-query navigation, but may be insufficient for search tasks involving multiple steps. Search trails mined from toolbar logs start with a query and contain pages visited by one user during post-query navigation. Implicit endorsements from many trails can enhance result ranking. Rather than using trails solely to improve ranking, it may also be worth providing trail information directly to users. In this paper, we quantify the benefit that users currently obtain from trail-following and compare different methods for finding the best trail for a given query and each top-ranked result. We compare the relevance, topic coverage, topic diversity, and utility of trails selected using different methods, and break out findings by factors such as query type and origin relevance. Our findings demonstrate value in trails, highlight interesting differences in the performance of trailfinding algorithms, and show we can find best-trails for a query that outperform the trails most users follow. Findings have implications for enhancing Web information seeking using trails.

Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs

Abstract

Search trails mined from browser or toolbar logs comprise queries and the post-query pages that users visit. Implicit endorsements from many trails can be useful for search result ranking, where the presence of a page on a trail increases its query relevance. Following a search trail requires user effort, yet little is known about the benefit that users obtain from this activity versus, say, sticking with the clicked search result or jumping directly to the destination page at the end of the trail. In this paper, we present a log-based study estimating the user value of trail following. We compare the relevance, topic coverage, topic diversity, novelty, and utility of full trails over that provided by sub-trails, trail origins (landing pages), and trail destinations (pages where trails end). Our findings demonstrate significant value to users in following trails, especially for certain query types. The findings have implications for the design of search systems, including trail recommendation systems that display trails on search result pages.
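
As a rough illustration of comparing a full trail against its origin and destination, the Python sketch below computes one possible topic coverage score. The paper's actual metric definitions are not given in this abstract, so everything here is an assumption: each page is represented by a hypothetical set of topic labels, and coverage is taken to be the fraction of the query's topics that appear on at least one page.

```python
def topic_coverage(pages, query_topics):
    """Fraction of the query's topics that appear on at least one page.

    `pages` is a list of sets of topic labels (a hypothetical representation).
    """
    if not query_topics:
        return 0.0
    seen = set().union(*pages)
    return len(seen & query_topics) / len(query_topics)


def compare_trail(trail, query_topics):
    """Score the trail origin, the trail destination, and the full trail."""
    origin, destination = trail[:1], trail[-1:]
    return {
        "origin": topic_coverage(origin, query_topics),
        "destination": topic_coverage(destination, query_topics),
        "full_trail": topic_coverage(trail, query_topics),
    }


# Made-up example data: three pages visited after one query.
trail = [{"flights"}, {"hotels", "flights"}, {"car rental"}]
query_topics = {"flights", "hotels", "car rental", "visas"}
print(compare_trail(trail, query_topics))
```

In this made-up example the origin and destination each cover one of the four query topics, while the full trail covers three, which is the kind of difference a comparison between trails and single pages is meant to surface.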

Optimal Strategies for Reviewing Search Results

Abstract

Web search engines respond to a query by returning more results than can be reasonably reviewed. These results typically include the title, link, and snippet of content from the target link. Each result has the potential to be useful or useless and thus reviewing it has a cost and potential benefit. This paper studies the behavior of a rational agent in this setting, whose objective is to maximize the probability of finding a satisfying result while minimizing cost. We propose two similar agents with different capabilities: one that only compares result snippets relatively and one that predicts from the result snippet whether the result will be satisfying. We prove that the optimal strategy for both agents is a stopping rule: the agent reviews a fixed number of results until the marginal cost is greater than the marginal expected benefit, maximizing the overall expected utility. Finally, we discuss the relationship between rational agents and search users and how our findings help us understand reviewing behaviors.
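
The stopping rule can be sketched in a few lines of Python. This is a simplified model and not the one analyzed in the paper: it assumes the agent commits to reviewing the first n results, that result i is satisfying independently with a known probability probs[i], that these probabilities do not increase with rank, and that expected utility is benefit * P(at least one of the first n results is satisfying) - cost * n. The function and parameter names are hypothetical.

```python
def results_to_review(probs, benefit, cost):
    """Return how many results to review under the assumed utility model.

    Reviewing stops at the first result whose marginal expected benefit
    falls below the per-result review cost.
    """
    p_unsatisfied = 1.0  # probability that no earlier result was satisfying
    reviewed = 0
    for p in probs:
        # Gain from adding this result: it only helps if nothing earlier did.
        marginal_benefit = p_unsatisfied * p * benefit
        if marginal_benefit < cost:
            break
        reviewed += 1
        p_unsatisfied *= 1.0 - p
    return reviewed


print(results_to_review([0.5, 0.4, 0.3, 0.2, 0.1], benefit=1.0, cost=0.1))
```

Under these assumptions the marginal expected benefit never increases with rank, so stopping at the first result where it drops below the cost yields the n that maximizes expected utility. With probabilities that can increase further down the list, this greedy check would no longer be guaranteed to find the best n.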