TL;DR: A new hypertext resource discovery system called a Focused Crawler is presented; it is robust against large perturbations in the starting set of URLs and can discover valuable resources that are dozens of links away from the start set, while carefully pruning the millions of pages that may lie within this same radius.
TL;DR: It is argued that a keyword-based “find similar” search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery, and instead the tendency of pages to cite pages on related topics is exploited.
Abstract: We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyperlink structure in powerful ways, such as “find the number of links from an environmental protection page to a page about oil and natural gas over the last year.” A key problem in populating the database in such a system is to discover web resources related to the topics involved in such queries. We argue that a keyword-based “find similar” search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery. Instead we exploit the properties that pages tend to cite pages with related topics, and given that a page u cites a page about a desired topic, it is very likely that u cites additional desirable pages. We exploit these properties by using a crawler controlled by two hypertext mining programs: (1) a classifier that evaluates the relevance of a region of the web to the user’s interest, and (2) a distiller that evaluates a page as an access point for a large neighborhood of relevant pages. Our implementation uses IBM’s Universal Database, not only for robust data storage, but also for integrating the computations of the classifier and distiller into the database. This results in a significant increase in I/O efficiency: a factor of ten for the classifier and a factor of three for the distiller. In addition, ad-hoc SQL queries can be used to monitor the crawler and dynamically change crawling strategies. We report on experiments to establish that our system is efficient, effective, and robust.
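The crawl strategy described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `links` graph stands in for page fetching, the `relevance` table stands in for the classifier, the `threshold` and `budget` parameters are hypothetical, and the distiller is not modeled at all. The sketch only shows the idea that links are followed in order of the relevance of the citing page, and that irrelevant pages are not expanded.

```python
import heapq


def focused_crawl(seeds, links, relevance, budget, threshold=0.5):
    """Best-first crawl over an in-memory link graph.

    links:     dict mapping URL -> list of outgoing URLs (stand-in for fetching).
    relevance: dict mapping URL -> score in [0, 1] (stand-in for the classifier).
    budget:    maximum number of pages to visit.
    threshold: hypothetical cutoff below which a page's links are not followed.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        score = relevance.get(url, 0.0)
        if score < threshold:
            continue  # prune: do not expand links out of an irrelevant page
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # A link is prioritized by the relevance of the page citing it.
                heapq.heappush(frontier, (-score, target))
    return visited
```

With a graph in which an irrelevant page is the only route to some target, that target is never reached, which is the pruning behavior the abstract refers to.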
TL;DR: A number resource management system for automatically managing number resources implements a Web-based client-server application for accessing and updating information pertaining to a number resource in a stored repository of number resources.
Abstract: A number resource management system for automatically managing number resources implements a Web-based client-server application for accessing and updating information pertaining to a number resource in a stored repository of number resources. Each number resource has an associated unique customer identifier and an associated status attribute. A Web/browser-based interface device (140) enables communication between a business client (175) and an accessing server (120), the interface device directing the server to retrieve and update information relating to a number resource according to a unique customer identifier and/or a status attribute input.
TL;DR: The problems of ambiguity, context sensitivity, synonymy and polysemy that are inherent in natural languages, together with the abundance of web pages related to prominent topics, have exacerbated the difficulty of fulfilling the user’s information need.
Abstract: Classical information retrieval (IR) is concerned with indexing a collection of documents and answering queries by returning a ranked list of relevant documents [14, 21, 24]. With the growth of the web, the problems of ambiguity, context sensitivity, synonymy (two terms with the same meaning) and polysemy (one term with different meanings) that are inherent in natural languages, together with the abundance of web pages related to prominent topics, have exacerbated the difficulty of fulfilling the user’s information need. Most search sites have added directory-based topic browsing. The web is organized as a tree of topics, similar to the Dewey decimal system, the Library of Congress catalog, or the US Patent and Trademarks Office subject codes. Tree nodes are maintained by paid ontologists and/or specialist volunteers, such as at Yahoo!, The Mining Co., WWW Virtual Library, and Open Directory Project. This strategy may be biased because of sparsity of experts; at any rate it is biased away from the most accomplished and busiest people.
TL;DR: In this article, the authors present an apparatus and method for accessing web resources on a server (31) with a client browser (100), in which the client browser generates a token that is provided to a security server (140) to provide third-party validation of a client request for service.
Abstract: An apparatus and method provide flexible and heightened security for accessing web resources with a client browser (100), where the web resources are on a server (31). In particular, the apparatus and method are accomplished by having the client browser (100) generate a token that is provided to a security server (140) to provide third-party validation of a client request for service. The client browser (100) makes a call for service, and includes the token as an argument of the call. A CGI-BIN program (160) that receives the call for service also receives the service identifier and arguments, among which is the client browser (100) generated token. The CGI-BIN application program (160) establishes a connection to the security server (140), and then sends the token received as an argument to the security server (140) for third-party verification. If the token is verified by the security server (140), then the CGI-BIN application program (160) executes the requested service program.
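The request flow in this abstract (browser generates a token, the service program forwards it to a security server, and the service runs only if verification succeeds) can be sketched as follows. The abstract does not say how the token is constructed or checked, so the HMAC scheme, the shared key, and every name here (`SecurityServer`, `make_token`, `handle_request`) are assumptions made purely for illustration.

```python
import hashlib
import hmac


def make_token(key: bytes, client_id: str, service: str) -> str:
    """Client side: derive a token for one service call (assumed HMAC scheme)."""
    message = f"{client_id}:{service}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class SecurityServer:
    """Hypothetical third party that knows each registered client's key."""

    def __init__(self):
        self._keys = {}

    def register(self, client_id: str, key: bytes) -> None:
        self._keys[client_id] = key

    def verify(self, client_id: str, service: str, token: str) -> bool:
        key = self._keys.get(client_id)
        if key is None:
            return False
        expected = make_token(key, client_id, service)
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), token.encode())


def handle_request(security_server, client_id, service, token, services):
    """Service-program side: run the service only if the token verifies."""
    if service not in services:
        return "404 Not Found"
    if not security_server.verify(client_id, service, token):
        return "403 Forbidden"
    return services[service]()
```

The point of the design is that the service program never decides on its own whether a token is valid; it delegates that decision to the separate verifier.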
TL;DR: In this paper, a method and system for using alternative resource identifiers in place of conventional resource identifiers is proposed, in which software transforms the alternative resource identifier into a conventional resource identifier on the fly.
Abstract: A method and system for using alternative resource identifiers in place of conventional resource identifiers. The invention uses software to transform the alternative resource identifier into a conventional resource identifier on the fly. The resources on the Internet are then accessed using the conventional resource identifiers in the conventional manner and are displayed to the user.
TL;DR: Uniform Resource Characteristics (URCs) are discussed in this document but only as descriptions of resources rather than identifiers.
Abstract: Retrieving the resource identified by a Uniform Resource Identifier (URI) [1] is only one of the operations that can be performed on a URI. One might also ask for and get a list of other identifiers that are aliases for the original URI or a bibliographic description of the resource the URI denotes, for example. This applies to both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). Uniform Resource Characteristics (URCs) are discussed in this document but only as descriptions of resources rather than identifiers.
TL;DR: A number of mechanisms have considerably improved START's ability to analyze real-world sentences and answer queries through expansion of its lexicon and integration of Web resources.
Abstract: The START system responds to natural language queries with answers in text, pictures, and other media. START's sentence-level natural language parsing relies on a number of mechanisms to help it process the huge, diverse resources available on the World Wide Web. Blitz, a hybrid heuristic- and corpus-based natural language preprocessor, enables START to integrate a large and ever-changing lexicon of proper names by using heuristic rules and precompiled tables of symbols to preprocess various highly regular and fixed expressions into lexical tokens. LaMeTH, a content-based system for extracting information from HTML documents, assists START by providing a uniform method of accessing information on the Web in real time. These mechanisms have considerably improved START's ability to analyze real-world sentences and answer queries through expansion of its lexicon and integration of Web resources.
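The preprocessing idea attributed to Blitz above (heuristic rules plus precompiled tables that turn fixed expressions into single lexical tokens) can be illustrated with a small sketch. It is not Blitz itself: the table entries, the date rule, and the token spellings are all hypothetical, chosen only to show a table lookup and a rule working together before parsing.

```python
import re

# Hypothetical precompiled table of multiword proper names.
NAME_TABLE = {
    "new york": "NEW_YORK",
    "world wide web": "WORLD_WIDE_WEB",
}

# Hypothetical heuristic rule for one highly regular expression type: dates.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_RULE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def _date_token(match):
    # Collapse spaces and commas so the whole date becomes one token.
    return "DATE[" + re.sub(r"[ ,]+", "_", match.group(0)) + "]"


def preprocess(sentence):
    """Replace dates and known names with single tokens, then split on spaces."""
    text = DATE_RULE.sub(_date_token, sentence)
    for phrase, token in NAME_TABLE.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, token, text, flags=re.IGNORECASE)
    return text.split()
```

A downstream parser then sees one token per name or date instead of several words, which is the effect the abstract describes.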
TL;DR: Copernic 2000 is a desktop application that can make the task of managing information from the Web more efficient and effective; this article highlights some of its most useful features.
Abstract: Information professionals must deal with an ever-growing number of information sources. In the midst of this richness, information available freely over the World Wide Web is an extremely important resource. Identifying relevant Web resources and managing them over time -whether for our own use or use by our clients- is a time-consuming and often inefficient process. Copernic 2000 is a desktop application that can make the task of managing information from the Web more efficient and effective. This article highlights some of the most useful features of Copernic 2000.
TL;DR: This paper focuses on Web resources for basic cancer genetics research as opposed to patient care, clinical trial information, genetic testing, gene therapy, and general cancer health issues.
Abstract: The purpose of this paper is to summarize information resources and databases available on the World Wide Web that are pertinent to cancer genetics research. We focus primarily on Web resources for basic research as opposed to patient care, clinical trial information, genetic testing, gene therapy, and general cancer health issues. Included in our survey are Web sites that are primarily descriptive, those that are searchable by key words, and those that are linking pages to other cancer research sites. A summary table for the cancer research Web sites described in this paper is provided (Table 1). A summary list of general cancer research Web sites is also provided in Appendix I. To focus the scope of this survey further, we concentrated on five key areas for basic cancer genetics research: (1) animal models; (2) cancer genetics and genomics; (3) pathology; (4) reagents, services and laboratory protocols; and (5) cancer biology. In Appendix I, we provide a list of some cancer research-related sites that did not readily match any of the five focus areas but that will be of general interest to the community. As we searched for relevant Web sites, we tried to use them to answer the following kinds of research questions:
TL;DR: The approach to such difficulties as the representation, display and manipulation of symbolic expressions, numerical data and graphical visualizations is discussed, and a prototype Web site is described that has been constructed to test, evaluate and advance the NIST Digital Library of Mathematical Functions project.
Abstract: The concept of a digital library is of proven worth because of its ability to provide dramatic capabilities that are impossible with traditional print media. We are interested in providing such capabilities for scientific, technical and educational users of mathematical reference data. Our attention is focused on the highly specialized field of mathematics that is concerned with the properties, application and computation of the elementary and higher mathematical functions. Calling upon domain experts worldwide for assistance, the National Institute of Standards and Technology is conducting an ambitious project to construct, ab initio, a comprehensive and authoritative Web resource on this subject. The need to make effective use of the latest developments in digital library research is a major focus, as is the development of content. We discuss our approach to such difficulties as the representation, display and manipulation of symbolic expressions, numerical data and graphical visualizations, and we describe a prototype Web site that has been constructed to test, evaluate and advance the NIST Digital Library of Mathematical Functions project.
TL;DR: In this article, the authors discuss the need for teaching the evaluation of Web resources and suggest criteria for appraising Web pages, pose a possible format for citing web pages, and briefly discuss how to assess students' knowledge of the topic.
Abstract: Teaching Sociology, Vol. 27, 1999 (January: 31-37). With the explosive and exponential growth of Web sites, the Internet has become an important (if often flawed) information resource. As more scholarly journals, reports, statistics, and polling data become available online, the academic value of this resource grows. Many textbooks and publishers now have supplements and guides to the Internet for sociologists and their students (Ferrante and Vaughn 1997; Rivard 1997). However, access to this glut of information is not always useful if one does not know how to select and evaluate the best and most reliable sites. Recognizing when to turn to the Web for information and when to rely on more traditional resources such as library catalogs, periodical and citation indexes, and subject encyclopedias is equally important. (See, for example, Abowitz's [1994] Appendix.) Despite the media hype, the Web is not a comprehensive information resource. This article illustrates a series of teaching tools useful for addressing these issues. After first examining the need for teaching the evaluation of Web resources, I suggest criteria for appraising Web pages, pose a possible format for citing Web pages, and briefly discuss how to assess students' mastery of the topic.
TL;DR: The methods outlined include computer-based analysis tools, computer-facilitated focus groups, and focused individual interviews for determining disability accessibility barriers and potential solutions for those barriers found in four World Wide Web-based learning environments.
Abstract: This paper presents the methods and results of a year-long evaluation study, conducted for the purpose of determining disability accessibility barriers and potential solutions for those barriers found in four World Wide Web-based learning environments. The primary questions used to frame the evaluation study were: (1) Are there any features of the specific Web-based courseware package (learning environment) that are difficult to access by persons with disabilities? (2) What are the ways in which accessibility might be improved for the Web-based courseware? (3) Are there any standard HTML features of many Web pages in general that are difficult to access by persons with disabilities? and (4) What tools are available for checking accessibility for future revisions of the Web-based courseware? Subjects were 11 university students with disabilities. The methods outlined include computer-based analysis tools, computer-facilitated focus groups, and focused individual interviews. Common limitations and suggested solutions are discussed in the following areas: lack of alternative text for images, imagemap hotspots, and applets; forms usage; frames usage; graphical icons; tables usage; and browser-specific code. Additionally, the paper includes URLs (Uniform Resource Locators) for several Web resources on accessibility, including a site created by the author based on the results of the evaluation study.
TL;DR: Insiders from twelve digital libraries, such as Agriculture Network Information Center, Infomine, Internet Public Library, and Social Science Information Gateway, reveal their selection criteria, evaluation process, funding sources and project budgets, and software and hardware tools.
Abstract: From the Publisher:
The Amazing Internet Challenge can help you organize web resources for your users. Insiders from twelve digital libraries, such as Agriculture Network Information Center (AgNIC), Infomine, Internet Public Library, and Social Science Information Gateway (SOSIG), reveal their selection criteria, evaluation process, funding sources and project budgets, and software and hardware tools.
TL;DR: This paper attempts to characterize the conceptual 'playing field' of the current transformations taking place, and in so doing proposes a structural model of the relationship that libraries should develop to Internet-based resources.
Abstract: As a powerful and radically new information medium, the World Wide Web has been embraced by libraries, as information centers par excellence, for its potential in effectively addressing patron needs. Because of the Web's rapid growth, librarians and other information professionals are developing a variety of solutions to bring the explosion of Web resources under control. While paradigmatic transformations like that taking place in the information industry today have become a tangible reality, information professionals are recognizing that only through the strategic redefining of the essential functions of libraries-selection, acquisition, organization, and access-will the transformative power of such change be harnessed most effectively. This paper attempts to characterize the conceptual 'playing field' of the current transformations taking place, and in so doing proposes a structural model of the relationship that libraries should develop to Internet-based resources. The tandem concepts of digi...
TL;DR: Clinicians should include Web sources when they provide information to their patients and a list of web sites that can be given to patients is provided.
TL;DR: The experiences of nursing and library services faculty in the development and implementation of a Web-delivered module for the evaluation of healthcare Web resources revealed its usefulness to the students and the potential of the collaborative development model for other content areas.
Abstract: Information literacy skills, which include the ability to evaluate electronic healthcare sites, are critical to the decision-making responsibilities of students and professionals. The authors describe the experiences of nursing and library services faculty in the development and implementation of a Web-delivered module for the evaluation of healthcare Web resources. A range of electronic tools was used for both the collaborative creation of the module as well as the instructional delivery of the content. Evaluation of the module revealed its usefulness to the students and the potential of the collaborative development model for other content areas.
TL;DR: In this paper, the authors present a guide for librarians who are planning, setting up and/or maintaining a virtual library, which can help even the smallest institution localize Web resources without costly equipment or telephone lines.
Abstract: Virtual libraries offer a single interface from which users can find out about (and even tour) the library, examine its catalogue, go to online databases, enter an interactive children's room, find out about special collections or community services and explore the rest of the Web. This guide is for librarians who are planning, setting up and/or maintaining a virtual library. The manual should help even the smallest institution localize Web resources without costly equipment or telephone lines. The first section considers the Web as a new medium for providing library services and offers advice about rethinking Web design principles in a library context. The second section covers all stages of putting a virtual library online: needs analysis, planning, maintenance, and eventual enhancement and change. The final section lays out the principles for electronic collection development and shows how to build special collections ranging from children's sites to multimedia collections. Librarians tackling difficult technical problems should also find strategies for ensuring that the special qualities of their physical library are translated into cyberspace.
TL;DR: In this article, Small and Arnone provide ideas, lesson plans, and examples for offering in-service workshops to practitioners, as well as lesson plans and related materials for student instruction.
Abstract: The motivational assessment tools may be used in several ways:
* As a teaching tool to help your students learn valuable information literacy skills
* As a decision support tool for deciding which web sites are appropriate for teaching your objectives
* As a research tool for conducting practical research comparing the motivational effectiveness of various sites
* As a design tool for creating a web site that will attract visitors and motivate them to remain in the web site
Small and Arnone provide ideas, lesson plans, and examples for offering in-service workshops to practitioners, as well as lesson plans and related materials for student instruction. Overhead transparency masters and handouts are included. What web resources will you use in your instruction? Be sure to examine their motivational potential first!
TL;DR: This unique guide helps librarians teach their staff to be Internet trainers, and includes complete scripted workshops tailored for teaching Internet skills to library patrons.
Abstract: From the Publisher:
Training staff and patrons to use the Internet is one of today's most pressing needs for libraries of every type. This unique guide helps librarians teach their staff to be Internet trainers, and includes complete scripted workshops tailored for teaching Internet skills to library patrons. Part 1 focuses on training library staff with three one-hour workshops teaching staff members the basic principles of learning; how to apply those principles to the design of solid Internet training sessions; and how to conduct effective Internet training sessions with confidence. Areas covered include training objectives, adult learning principles, choosing and preparing the training site, interactive presentation techniques, and assisting resistant or intimidated trainees. Part 2 features seven ready-to-go workshops for users on major Internet topics, including the World Wide Web, search engines and finding information, books and literature information on the Web, college information, and Web publishing and HTML. Each one-hour module includes an introduction, an objective, a timed lesson plan, suggested methods for demonstrating the skills covered, tips, a sample script, a reproducible handout, lists of related Websites, and recommended books and magazines. These modules are designed to be as universal as possible in terms of content, setting and software, and are relevant to every type and size of library. Evaluation/assessment instruments and a listing of Web resources for trainers round out this time-saving resource.
TL;DR: A workshop on "Teaching Ethics and Computing" was held in August of 1998, sponsored by an NSF UFE grant, and participants developed model teaching/learning activities, most of which have been classroom-tested during the Fall semester of 1998.
Abstract: A workshop on "Teaching Ethics and Computing" was held in August of 1998, sponsored by an NSF UFE grant. Participants developed model teaching/learning activities, most of which have been classroom-tested during the Fall semester of 1998, and revisions made where appropriate. The activity write-ups are part of a web resource at http://marathoncseeusfedu//spl sim/kwb/nsf-ufe/. The web pages also include reviews of relevant videos. The resources described on the web pages should be of value to all faculty whose teaching includes an "ethics and computing" component.
TL;DR: The CyberQuest is an activity developed to help focus secondary English education students on productive processes for assessing Web-based educational materials and incorporating them into the classroom, rather than "getting caught in the Web."
Abstract: A truism about teaching suggests that we tend to teach like we were taught. Even the most enthusiastically radical education students often bear out this truism when faced with actual teaching situations or decisions about what and how to teach. Introducing the complexities of the Internet into preservice teachers' arsenal of resources complicates the paradox between tradition and innovation even more. The potentially overwhelming nature of the Internet may cause some future -- and veteran -- teachers to avoid the medium, sticking with teaching materials and strategies they know rather than tackling the challenges posed by the Internet for the enhancement of learning. Although assuming such an "ostrich" attitude toward this dynamic technology may be tempting, the Internet will almost certainly continue to expand its impact on education and society as a whole, at least within the foreseeable future.[1] Because the Internet offers so many instructional possibilities, along with potential problems, finding strategies to utilize Web-based instruction meaningfully will be critical for all teachers in the next millennium. This article describes the use of the CyberQuest, an activity I have developed to help focus secondary English education students on productive processes for assessing Web-based educational materials and incorporating them into the classroom.
Getting Caught in the Web
Most Internet users have experienced the phenomenon of "getting caught in the Web" -- that is, spending hours exploring all sorts of interesting sites but never addressing the reason they went online in the first place. Despite the many excellent educational resources available through the Internet, education students still in the process of developing their abilities to make sound curricular decisions are faced with a daunting task when examining Web resources.
As one of my students described the Internet, the medium can be simply "a huge collection of stuff" unless we have some guidance for finding, then evaluating, useful links relevant to our disciplines and teaching objectives. Because "it has been common ... for groups of teachers to work to define a curriculum around a set of core books to be read in a particular course ... or at a particular grade level" (Allington 1995), allowing English education majors to explore text selections within a group of peers from a perspective of creating core reading experiences for students seems a vital exercise to prepare them for their future roles as curriculum innovators and implementers. Teacher education courses can help create collaborative learning situations that encourage students to ask "questions about [teachers'] roles and the value of the content we offer" (Kaiser 1995). At the same time, free-wheeling discussions or unstructured technology explorations may suggest to novices that "such [curricular] choices may be based on individual preferences, commonsense views of what is meaningful and fun, and stereotyped notions of what particular students need or can learn," without emphasizing the importance of the process of curriculum planning (Young 1991). The problem of text selection alone is challenging. But what happens when we add the expectation that novice teachers infuse technology into their teaching? Students are generally familiar with, if not wholly conscious of, values that frame text selection and literature study, applying a New Critical, reader response or other approach they have been exposed to in the classroom. Helping future English teachers clarify the basis behind their text selections is obviously critical. Technology infusion into the curriculum, however, may need new guidelines. The CyberQuest was developed to provide a structure for the exploration of English and language arts material on the Web to enhance the teaching of literature. 
Based on a more generalized Web-based activity for evaluating Internet sites for educators,[2] the CyberQuest utilizes "Cyberguides" developed by the SCORE project (Schools of California Online Resources for Educators). …
TL;DR: An overview of state-of-the-art developments and emerging infrastructures to discover, identify and access information published on the web is provided.
Abstract: This article provides an overview of state-of-the-art developments and emerging infrastructures to discover, identify and access information published on the web. It is based on an earlier discussion paper (Werf‐Davelaar, 1999) of work carried out in the context of the following projects: DONOR, which aims to establish an enabling infrastructure for improved information management and retrieval on SURFnet. SURFnet is the academic network in the Netherlands that provides Internet access to over 250 research and higher education organisations. DONOR is an initiative of the Koninklijke Bibliotheek, the National Library of the Netherlands. The project ran from 1998‐1999, with funding from the Steering Committee for Innovation in Scientific Information Provision in the Netherlands (IWI). The project’s home page is at www.konbib.nl/donor; DESIRE, which aims to enhance existing European information networks for research users across Europe through research and development in three main areas of activity: caching, resource discovery and directory services. DESIRE runs from 1996‐2000, with funding from the European Commission’s Telematics Application Programme. The project’s home page is at www.desire.org; NEDLIB, which aims to construct the basic infrastructure upon which a networked European deposit library can be built. The objectives of NEDLIB concur with the mission of national deposit libraries to ensure that present electronic publications, including web resources, can be used now and in the future. NEDLIB runs from 1998‐2000, with funding from the European Commission’s Telematics Application Programme. The project’s home page is at www.konbib.nl/nedlib.
TL;DR: This paper reports the results of a study conducted by the Scottish Business Information Service (SCOTBIS), part of the National Library of Scotland, to evaluate the basic UK national company directories available via the Web.
Abstract: Reports results of a study, conducted by the Scottish Business Information Service (SCOTBIS), part of the National Library of Scotland, to evaluate the basic UK national company directories available via the Web. It aimed to evaluate and identify key Web sites that could be recommended, with some confidence, to SCOTBIS users; and to compare the retrieval rates for company information from such sites against some of SCOTBIS’ existing printed resources. A given list of UK companies, derived from current printed directories, was compared with similar Web directories to compare success rates in locating companies and to provide some basic evaluation of the content and nature of the selected Web resources. Key indicators were coverage and level of data provided and it was seen that the majority of sites cover approximately between 1.5 and two million companies. Each site also provides an address as well as a telephone number (Dun & Bradstreet being the sole exception to the latter). In terms of product informa...
TL;DR: Distance education students in a B. Ed (In-service) Education and Training of Adults strand, traditionally serviced by standard distance education techniques, including print media and telephone conference, were offered the opportunity to participate in a new learning experience using the Internet.
Abstract: Distance education students in a B. Ed (In-service) Education and Training of Adults strand, traditionally serviced by standard distance education techniques, including print media and telephone conference, were offered the opportunity to participate in a new learning experience using the Internet. A team of academics and IT staff modified a set of management subjects to run on a dedicated web page. The text-based materials were still distributed to students in advance, and an initial telephone conference was used to describe to students how the Internet delivery would be used, particularly the threaded web chat page. Written information was provided to assist students to access the Internet and the dedicated web page. Initially, students were reluctant to comment on the chat page and then, very slowly and prompted by examples of hyper-linked web resources, questions and ice-breakers from the academic staff, responses started to appear. An initial web conference was scheduled for local and international students and students finally began to discuss questions among themselves and offer help to one another. This was what the team had been waiting for, but it had been a long time coming!
TL;DR: This handbook is designed to provide the means for staff developers to teach their staff to be Internet trainers, and includes complete scripted workshops tailored for teaching Internet skills to users.
Abstract: This handbook is designed to provide the means for staff developers to teach their staff to be Internet trainers. It includes complete scripted workshops tailored for teaching Internet skills to users. Part One focuses on training library staff with three one-hour workshops, teaching staff the basic principles of learning; how to apply those principles to the design of solid Internet training sessions; and how to conduct effective Internet training sessions with confidence. Areas covered include training objectives; adult learning principles; choosing and preparing the training site; interactive presentation techniques; and assisting resistant or intimidated trainees. The second part features seven ready-to-deliver workshops for users on major Internet topics, including: the World Wide Web; search engines and finding information; finding high-quality medical data; finding high-quality business information; and Web publishing and HTML. Each one-hour module includes an introduction, an objective, a timed lesson plan, suggested methods for demonstrating the skills covered, tips, a session script, a reproducible handout, lists of related Web sites, and recommended books and magazines. These modules are designed to be as universal as possible in terms of content, setting and software. Also included is a selection of evaluation/assessment instruments and a listing of Web resources for trainers.