TL;DR: This work demonstrates two scenarios: a knowledge-driven, highly interactive dialog that seamlessly combines reactive search with proactive suggestion, and proactive heterogeneous entity recommendation.
Abstract: In this paper we describe a new release of a Web-scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort that broadens the scope of the namesake vertical search engine, which has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph comprising six types of entities that model scholarly activities: field of study, author, institution, paper, venue, and event. In addition to obtaining these entities from publisher feeds as in the previous effort, this version includes data mining results from the Web index and an in-house knowledge base from Bing, a major commercial search engine. As a result of the Bing integration, the new MAS graph has grown significantly in size, with fresh information streaming in automatically as the search engine discovers it. In addition, the rich entity relations in the knowledge base provide additional signals to disambiguate and enrich the entities within and beyond the academic domain. The number of papers indexed by MAS, for instance, has grown from the low tens of millions to 83 million while maintaining accuracy above 95% on test data sets derived from academic activities at Microsoft Research. Based on this data set, we demonstrate two scenarios in this work: a knowledge-driven, highly interactive dialog that seamlessly combines reactive search with proactive suggestion, and proactive heterogeneous entity recommendation.
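As a rough illustration of what a heterogeneous entity graph with these six entity types might look like in code, the following minimal Python sketch stores typed entities and undirected links between them. The class names (`EntityType`, `Entity`, `EntityGraph`), the method names, and the sample records are assumptions made for illustration only; the abstract does not describe the actual MAS schema or storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # The six entity types named in the abstract.
    FIELD_OF_STUDY = "field of study"
    AUTHOR = "author"
    INSTITUTION = "institution"
    PAPER = "paper"
    VENUE = "venue"
    EVENT = "event"


@dataclass(frozen=True)
class Entity:
    id: str            # hypothetical identifier scheme
    type: EntityType
    name: str


class EntityGraph:
    """Toy in-memory graph; an assumption, not the MAS implementation."""

    def __init__(self):
        self.entities = {}             # id -> Entity
        self.edges = defaultdict(set)  # id -> set of neighbor ids

    def add(self, entity):
        self.entities[entity.id] = entity

    def link(self, a_id, b_id):
        # Links are stored as undirected in this sketch.
        self.edges[a_id].add(b_id)
        self.edges[b_id].add(a_id)

    def neighbors(self, entity_id, of_type=None):
        """Return neighbors of an entity, optionally filtered by type."""
        found = [self.entities[n] for n in self.edges.get(entity_id, ())]
        if of_type is not None:
            found = [e for e in found if e.type is of_type]
        return sorted(found, key=lambda e: e.id)


# Made-up sample data.
graph = EntityGraph()
graph.add(Entity("p1", EntityType.PAPER, "An Example Paper"))
graph.add(Entity("a1", EntityType.AUTHOR, "A. Author"))
graph.add(Entity("v1", EntityType.VENUE, "Example Venue"))
graph.link("p1", "a1")
graph.link("p1", "v1")
authors_of_p1 = graph.neighbors("p1", of_type=EntityType.AUTHOR)
```

Filtering neighbors by type is one simple way to surface entities of several kinds around a query entity, which is the flavor of lookup a heterogeneous recommendation would build on.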
TL;DR: Based on analysis of screen sequences, there was little evidence that search became more directed as screen sequence increased, and navigation among portlets, when at least two columns exist, was biased towards horizontal search (across columns) as opposed to vertical search (within column).
Abstract: An eye tracking study was conducted to evaluate specific design features for a prototype web portal application. This software serves independent web content through separate, rectangular, user-modifiable portlets on a web page. Each of seven participants navigated across multiple web pages while conducting six specific tasks, such as removing a link from a portlet. Specific experimental questions included (1) whether eye tracking-derived parameters were related to page sequence or user actions preceding page visits, (2) whether users were biased toward traveling vertically or horizontally while viewing a web page, and (3) whether specific sub-features of portlets were visited in any particular order. Participants required 2 to 15 screens and 7 to 360+ seconds to complete each task. Based on analysis of screen sequences, there was little evidence that search became more directed as screen sequence increased. Navigation among portlets, when at least two columns existed, was biased towards horizontal search (across columns) as opposed to vertical search (within column). Within a portlet, the header bar was not reliably visited prior to the portlet's body, evidence that header bars are not reliably used for navigation cues. Initial design recommendations emphasized the need to place critical portlets on the left and top of the web portal area, and noted that related portlets do not need to appear in the same column. Further experimental replications are recommended to generalize these results to other applications.
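The horizontal versus vertical distinction above can be made concrete with a small sketch. The Python below counts gaze transitions between portlets, labeling a move across columns as horizontal and a move within a column as vertical, following the definitions in the abstract. The function name, the `layout` mapping, and the sample visit sequence are hypothetical; the study's actual analysis procedure is not described in the abstract.

```python
from collections import Counter


def count_transitions(visits, layout):
    """Count horizontal and vertical transitions between portlets.

    visits: ordered list of portlet names that received gaze.
    layout: dict mapping portlet name -> (column, row).
    A move to a different column counts as horizontal; a move
    within the same column counts as vertical.
    """
    counts = Counter()
    for prev, curr in zip(visits, visits[1:]):
        if prev == curr:
            continue  # repeated visits to one portlet are not transitions
        prev_col, _ = layout[prev]
        curr_col, _ = layout[curr]
        kind = "horizontal" if prev_col != curr_col else "vertical"
        counts[kind] += 1
    return counts


# Made-up two-column layout and visit sequence.
layout = {"news": (0, 0), "links": (0, 1), "mail": (1, 0)}
visits = ["news", "mail", "mail", "links", "news"]
counts = count_transitions(visits, layout)
```

Comparing the two counts for each page gives a simple measure of directional bias, although a real analysis would also need to account for how many horizontal and vertical moves the layout makes possible.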
TL;DR: In this article, a user interface is described in which information relating to a respective one of the search results is displayed on a map upon selection of at least one component of that search result.
Abstract: A user interface is described in which information relating to a respective one of the search results is displayed on a map upon selection of at least one component of that search result.
TL;DR: In this article, the authors survey the web sites of the academic libraries of the Association of Research Libraries (USA) regarding the adoption of Web 2.0 technologies and find that libraries were using these tools for sharing news, marketing their services, providing information literacy instruction, and soliciting feedback from users.
Abstract: Purpose – This paper aims to survey the web sites of the academic libraries of the Association of Research Libraries (USA) regarding the adoption of Web 2.0 technologies. Design/methodology/approach – The websites of 100 member academic libraries of the Association of Research Libraries (USA) were surveyed. Findings – All libraries were found to be using various Web 2.0 tools. Blogs, microblogs, RSS, instant messaging, social networking sites, mashups, podcasts, and vodcasts were widely adopted, while wikis, photo sharing, presentation sharing, virtual worlds, customized webpages, and vertical search engines were used less. Libraries were using these tools for sharing news, marketing their services, providing information literacy instruction, providing information about print and digital resources, and soliciting feedback from users. Originality/value – The paper is useful for future planning of Web 2.0 use in academic libraries.
TL;DR: This work argues that it is risky to determine whether pages share a common template solely based on URLs and proposes a new approach that utilizes similarity between pages to detect templates, which is feasible and effective for improving extraction accuracy.
Abstract: Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar pages by a common script or template. In recent years, some value-added services, such as comparison shopping and vertical search in a specific domain, have motivated research on high-accuracy extraction technologies. Almost all previous works assume that the input pages of a wrapper induction system conform to a common template and that they can be easily identified by a common URL schema. However, we observed that it is hard to distinguish different templates using today's dynamic URLs. Moreover, since extraction accuracy depends heavily on how consistent the input pages are, we argue that it is risky to determine whether pages share a common template solely based on URLs. Instead, we propose a new approach that utilizes similarity between pages to detect templates. Our approach separates pages with notable inner differences and then generates a wrapper for each resulting group. Experimental results show that our proposed approach is feasible and effective for improving extraction accuracy.
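To make the idea of similarity-based template detection more tangible, here is a minimal Python sketch that groups pages by structural similarity instead of by URL. The abstract does not specify the similarity measure or grouping procedure, so everything here is an assumption: pages are represented by their sets of root-to-node tag paths, similarity is the Jaccard index of those sets, grouping is a simple greedy pass, and the 0.7 threshold is arbitrary.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths seen in an HTML page."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no close

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_template(pages, threshold=0.7):
    """Greedy grouping: a page joins the first group whose
    representative page is at least `threshold` similar to it."""
    groups = []  # list of (representative_paths, [page indices])
    for i, html in enumerate(pages):
        paths = tag_paths(html)
        for rep, members in groups:
            if jaccard(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((paths, [i]))
    return [members for _, members in groups]


# Made-up pages: two share a layout, one uses a table layout.
pages = [
    "<html><body><div><h1>A</h1><p>x</p></div></body></html>",
    "<html><body><div><h1>B</h1><p>y</p></div></body></html>",
    "<html><body><table><tr><td>1</td></tr></table></body></html>",
]
groups = group_by_template(pages)
```

Each resulting group would then be handed to a wrapper induction step separately, so that a wrapper is learned only from pages that are structurally consistent with one another.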