TL;DR: This paper reports the development of an instrument that captures key characteristics of web site quality from the user's perspective; the instrument provides an aggregate measure of web quality and would be useful to organizations, web designers, and researchers in related web research.
TL;DR: This work shows that the Web self-organizes and its link structure allows efficient identification of communities and is significant because no central authority or process governs the formation and structure of hyperlinks.
Abstract: The vast improvement in information access is not the only advantage resulting from the increasing percentage of hyperlinked human knowledge available on the Web. Additionally, much potential exists for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis. Millions of individuals operating independently and having a variety of backgrounds, knowledge, goals and cultures author the information on the Web. Despite the Web's decentralized, unorganized, and heterogeneous nature, our work shows that the Web self-organizes and its link structure allows efficient identification of communities. This self-organization is significant because no central authority or process governs the formation and structure of hyperlinks.
TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.
Abstract: In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities, which makes a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.
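TL;DR: This work covers the infrastructure of the Web (crawling and Web search), learning techniques (similarity and clustering, supervised and semi-supervised learning for text), and applications (social network analysis, resource discovery, and the future of Web mining).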
Abstract: Preface. Introduction. I Infrastructure: Crawling the Web. Web search. II Learning: Similarity and clustering. Supervised learning for text. Semi-supervised learning. III Applications: Social network analysis. Resource discovery. The future of Web mining.
TL;DR: The Semantic Web as discussed by the authors is a new type of hierarchy and standardization that will replace the current Web of links with a web of meaning using a flexible set of languages and tools.
Abstract: From the Publisher:
As the World Wide Web continues to expand, it becomes increasingly difficult for users to obtain information efficiently. Because most search engines read format languages such as HTML or SGML, search results reflect formatting tags more than actual page content, which is expressed in natural language. Spinning the Semantic Web describes an exciting new type of hierarchy and standardization that will replace the current "web of links" with a "web of meaning." Using a flexible set of languages and tools, the Semantic Web will make all available information--display elements, metadata, services, images, and especially content--accessible. The result will be an immense repository of information accessible for a wide range of new applications.
This first handbook for the Semantic Web covers, among other topics, software agents that can negotiate and collect information, markup languages that can tag many more types of information in a document, and knowledge systems that enable machines to read Web pages and determine their reliability. The truly interdisciplinary Semantic Web combines aspects of artificial intelligence, markup languages, natural language processing, information retrieval, knowledge representation, intelligent agents, and databases.
TL;DR: By explicitly representing the role of semantics in different components of the information retrieval process (people, interfaces, search systems, and information resources), the Semantic Geospatial Web will enable users to retrieve more precisely the data they need, based on the semantics associated with these data.
Abstract: With the growth of the World Wide Web has come the insight that currently available methods for finding and using information on the web are often insufficient. In order to move the Web from a data repository to an information resource, a totally new way of organizing information is needed. The advent of the Semantic Web promises better retrieval methods by incorporating the data's semantics and exploiting the semantics during the search process. Such a development needs special attention from the geospatial perspective so that the particularities of geospatial meaning are captured appropriately. The creation of the Semantic Geospatial Web needs the development of multiple spatial and terminological ontologies, each with a formal semantics; the representation of those semantics such that they are available both to machines for processing and to people for understanding; and the processing of geospatial queries against these ontologies and the evaluation of the retrieval results based on the match between the semantics of the expressed information need and the available semantics of the information resources and search systems. This will lead to a new framework for geospatial information retrieval based on the semantics of spatial and terminological ontologies. By explicitly representing the role of semantics in different components of the information retrieval process (people, interfaces, search systems, and information resources), the Semantic Geospatial Web will enable users to retrieve more precisely the data they need, based on the semantics associated with these data.
TL;DR: The use of web mining techniques is suggested to build such an agent that could recommend on-line learning activities or shortcuts in a course web site based on learners' access history to improve course material navigation as well as assist the online learning process.
Abstract: A recommender system in an e-learning context is a software agent that tries to "intelligently" recommend actions to a learner based on the actions of previous learners. This recommendation could be an on-line activity such as doing an exercise, reading posted messages on a conferencing system, or running an on-line simulation, or could be simply a web resource. These recommendation systems have been tried in e-commerce to entice purchasing of goods, but haven't been tried in e-learning. This paper suggests the use of web mining techniques to build such an agent that could recommend on-line learning activities or shortcuts in a course web site based on learners' access history to improve course material navigation as well as assist the online learning process. These techniques are considered integrated web mining as opposed to off-line web mining used by expert users to discover on-line access patterns.
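The idea of recommending activities from earlier learners' access history can be illustrated with a minimal co-occurrence sketch. This is not the method of the paper, which the abstract does not detail; the function names `build_cooccurrence` and `recommend`, the activity names, and the sample histories are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(histories):
    """Count how often two activities appear in the same learner history."""
    counts = {}
    for history in histories:
        for first, second in permutations(set(history), 2):
            counts.setdefault(first, Counter())[second] += 1
    return counts

def recommend(counts, visited, top_n=2):
    """Suggest unvisited activities that co-occur most with visited ones."""
    scores = Counter()
    for item in visited:
        for other, n in counts.get(item, {}).items():
            if other not in visited:
                scores[other] += n
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical access histories of previous learners (illustration only).
histories = [
    ["lesson1", "quiz1", "forum"],
    ["lesson1", "quiz1", "simulation"],
    ["lesson1", "quiz1"],
    ["lesson2", "forum"],
]
counts = build_cooccurrence(histories)
top_for_lesson1 = recommend(counts, {"lesson1"}, top_n=1)
```

Here a learner who has only opened `lesson1` would be pointed to `quiz1`, because that pair co-occurs most often in the sample histories. A real system would also have to weigh ordering, recency, and course structure, which this sketch ignores.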
TL;DR: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art.
Abstract: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reason for considering Web mining a separate field from data mining is explained. The limitations of some of the existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.
TL;DR: This article presents the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information theoretic properties, and discusses how these metrics can be applied for improving Web information access and use.
Abstract: The unabated growth and increasing significance of the World Wide Web has resulted in a flurry of research activity to improve its capacity for serving information more effectively. But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" of Web resources and services. This observation points towards measurements and models that quantify various attributes of web sites. The science of measuring all aspects of information, especially its storage and retrieval, or informetrics, has interested information scientists for decades before the existence of the Web. Is Web informetrics any different, or is it just an application of classical informetrics to a new medium? In this article, we examine this issue by classifying and discussing a wide ranging set of Web metrics. We present the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information theoretic properties. We also discuss how these metrics can be applied for improving Web information access and use.
TL;DR: This paper identifies the key quality factors in web site design and use; from the factors identified, a conceptual model is developed to assess how a web site can deliver what its users expect.
Abstract: The focus of the research described in this paper was to identify the key quality factors in Web site design and use. From the factors identified, a conceptual model has been developed to assess how a Web site can deliver what its users expect. The model is based on: ease of use, customer confidence, on‐line resources, and relationship services. These facts have been validated in an assessment of a range of Web sites. The model comprises a useful measurement tool which designers can use to suffice the quality of their Web sites.
TL;DR: Estimated-regression planners are well suited to the web-services domain; applying them requires extending classical notations (e.g., PDDL) in various ways. A preliminary implementation has been constructed, and further tests are underway.
Abstract: "Web services" are agents on the web that provide services to other agents. Interacting with a web service is essentially a planning problem, provided the service exposes an interface containing action definitions, which in fact is an elegant representation of how web services actually behave. The question is what sort of planner is best suited for solving the resulting problems, given that dealing with web services involves gathering information and then acting on it. Estimated-regression planners use a backward analysis of the difficulty of a goal to guide a forward search through situation space. They are well suited to the web-services domain because it is easy to relax the assumption of complete knowledge, and to formalize what it is they don't know and could find out by sending the relevant messages. Applying them to this domain requires extending classical notations (e.g., PDDL) in various ways. A preliminary implementation of these ideas has been constructed, and further tests are underway.
TL;DR: This paper presents a design methodology for web services and business processes, discusses how business processes should be described so that services can be properly identified, and provides strategies and principles regarding functional and non-functional aspects of web service design.
Abstract: E-business is shifting attention from component based to web service based applications. Most enterprises spend most of their time assembling applications by consuming web services rather than worrying about the design principles underlying them, their granularity or the development of components that implement them. In this paper we present a design methodology for web services and business processes. We discuss how business processes should be described so that services can be properly identified and provide strategies and principles regarding functional and non-functional aspects of web service design.
TL;DR: The purpose of this paper is to present several strategies and techniques that have proven successful for community building in Web-based learning environments.
Abstract: The Web, like no other technology before it, has received widespread acceptance and use across disciplines within higher education. Despite its broad application, satisfaction with the Web for purposes of instruction is not as prevalent. Many obstacles have presented themselves, creating barriers to satisfaction with the environment. Creating a community of learners is one strategy that has been recommended for increasing satisfaction. The purpose of this paper is to present several strategies and techniques that have proven successful for community building in Web-based learning environments.
TL;DR: This paper presents an approach to recover the architecture of dynamic web applications, in order to make maintenance more manageable, and is flexible and retargetable to the various technologies that are used in developing web applications.
Abstract: Web applications are the legacy software of the future. Developed under tight schedules, with high employee turnover, and in a rapidly evolving environment, these systems are often poorly structured and poorly documented. Maintaining such systems is problematic. This paper presents an approach to recover the architecture of such systems, in order to make maintenance more manageable. Our lightweight approach is flexible and retargetable to the various technologies that are used in developing web applications. The approach extracts the structure of dynamic web applications and shows the interaction between their various components such as databases, distributed objects, and web pages. The recovery process uses a set of specialized extractors to analyze the source code and binaries of web applications. The extracted data is manipulated to reduce the complexity of the architectural diagrams. Developers can use the extracted architecture to gain a better understanding of web applications and to assist in their maintenance.
TL;DR: This paper examines the synergy between Web service technology and simulation, and work on seamlessly using simulation as a part of Web service composition and process design, as well as on using Web services to re-build the JSIM Web-based simulation environment is highlighted.
Abstract: The World Wide Web has had a huge influence on the computing field in general as well as simulation in particular (e.g., Web-Based Simulation). A new wave of development based upon XML has started. Two of the most interesting aspects of this development are the Semantic Web and Web Services. This paper examines the synergy between Web service technology and simulation. In one direction, Web service processes can be simulated for the purpose of correcting/improving the design. In the other direction, simulation models/components can be built out of Web services. Work on seamlessly using simulation as a part of Web service composition and process design, as well as on using Web services to re-build the JSIM Web-based simulation environment is highlighted.
TL;DR: The Grid is an emerging platform to support on-demand "virtual organisations" for coordinated resource sharing and problem solving on a global scale and to realise its potential it also stands to benefit from Semantic Web technologies.
Abstract: The Grid is an emerging platform to support on-demand "virtual organisations" for coordinated resource sharing and problem solving on a global scale. The application thrust is large-scale scientific endeavour, and the scale and complexity of scientific data presents challenges for databases. The Grid is beginning to exploit technologies developed for Web Services and to realise its potential it also stands to benefit from Semantic Web technologies; conversely, the Grid and its scientific users provide application pull which will benefit the Semantic Web.
TL;DR: In this paper, a hybrid-order treelike Markov model is proposed to predict Web access precisely while providing high coverage and scalability, which is crucial in the rapidly growing World Wide Web.
Abstract: Accurately predicting Web user access behavior can minimize user-perceived latency, which is crucial in the rapidly growing World Wide Web. Although traditional Markov models have helped predict user access behavior, they have serious limitations. Hybrid-order treelike Markov models predict Web access precisely while providing high coverage and scalability.
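For orientation, the traditional baseline that the abstract contrasts against can be sketched as a plain first-order Markov predictor. This is explicitly not the hybrid-order treelike model, whose details the abstract does not give; the function names and the sample sessions below are hypothetical.

```python
from collections import Counter, defaultdict

def train_first_order(sessions):
    """Count page-to-page transitions observed in the click sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions

def predict_next(transitions, current_page):
    """Return the most frequently observed successor, or None if unseen."""
    followers = transitions.get(current_page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical sessions, invented for illustration only.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/products", "/cart"],
    ["/home", "/about"],
]
model = train_first_order(sessions)
prediction = predict_next(model, "/products")
unseen = predict_next(model, "/checkout")
```

The sketch also shows one limitation of such simple models: a page that never appeared as a predecessor in training (`/checkout` here) yields no prediction at all, which is a coverage gap.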
TL;DR: Developing Web Information Systems brings together traditional system development methods that have been taught for many years on information systems and computer science courses with web/e-commerce development with coverage of data management and e-business strategy.
Abstract: Developing Web Information Systems brings together traditional system development methods that have been taught for many years on information systems and computer science courses with web/e-commerce development. It is the first book to bring together IS development and the web applications in a thorough and systematic way. There is a running case study that illustrates web IS development from start to finish. The case is easy to understand (a theatre) and results in a working web application. Most, if not all, analysis and design texts fall short of making that step into software. The book draws heavily on practical experiences of web-based IS development resulting from commercial system development, so as well as appealing to students and academics, it will also interest practitioners. The coverage of data management and e-business strategy gives the book the broader scope essential for understanding IS development properly in an Internet context.
TL;DR: Sterne et al. as mentioned in this paper presented Web Metrics, a set of tools and techniques that can be used to determine if and how a Web site is adding value to a company.
Abstract: From the Publisher:
Learn how to determine whether a Web site is offering a competitive advantage
Despite the fact that numerous online ventures have recently fallen by the wayside, companies still realize that the Web plays an integral role in conducting business. They recognize the importance of measuring and analyzing the information gathered from their sites so they can find new ways to balance online and offline efforts. In this innovative book, leading Internet marketing expert Jim Sterne uncovers the latest tools and techniques that will help you determine if and how your Web site is adding value to your company. He clearly shows you how to use the range of available metrics to improve your Web marketing strategies. Incorporating his vast experience with clients such as Eastman Kodak, Ericsson, Sears Roebuck, and IBM, Sterne exposes the key issues facing corporate sites today. He then explains the role of Web metrics, detailing the criteria to follow in order to build a successful site and gain a competitive advantage in the marketplace.
Web Metrics provides you with everything you'll need to know to measure your online business strategy, including:
Types of Web metrics tools, services, techniques, and standards for Web measurement
Ways to fully integrate Web metrics with the customer experience
Details on how to use metrics to meet specific business goals
The companion Web site includes links to online tools, resources, and white papers.
Author Biography: Jim Sterne is a leading expert on Internet marketing, specializing in creating strategies for business. As an author, a consultant to Fortune 500 companies and Internet entrepreneurs, and a public speaker, he focuses on the changing landscape of the World Wide Web as a medium for creating and strengthening customer relationships. With a special emphasis on Web metrics, his company, Target Marketing, is dedicated to helping companies understand the possibilities and manage the realities of conducting business online.
TL;DR: A Web tool called MySpiders is presented, which implements an evolutionary algorithm managing a population of adaptive crawlers who browse the Web autonomously, and discusses the development and deployment of such a system.
Abstract: The dynamic nature of the World Wide Web makes it a challenge to find information that is both relevant and recent. Intelligent agents can complement the power of search engines to meet this challenge. We present a Web tool called MySpiders, which implements an evolutionary algorithm managing a population of adaptive crawlers who browse the Web autonomously. Each agent acts as an intelligent client on behalf of the user, driven by a user query and by textual and linkage clues in the crawled pages. Agents autonomously decide which links to follow, which clues to internalize, when to spawn offspring to focus the search near a relevant source, and when to starve. The tool is available to the public as a threaded Java applet. We discuss the development and deployment of such a system.
TL;DR: Detailed accounts of how students use the Web as a science resource are provided to illuminate how the different levels of domain knowledge, search expertise, and situational interest impact students' ability to find useful and relevant information on the Web.
Abstract: Students are increasingly using the World Wide Web (Web) as a science resource, especially to gather information on a variety of topics. The abundance of information on the Web makes it an especially tantalizing source of information, but not one without considerable risks due to its size and the inability of most Web search engines to organize and prioritize their search results. The purpose of this study was to examine searching patterns of students using the Web as a science information resource. We present cases of both successful and unsuccessful student experiences. Previous research demonstrates that domain knowledge and search expertise are particularly important in terms of students finding information on the Web. In light of these findings, we attempted to (a) provide detailed accounts of how students use the Web as a science resource, (b) illuminate how the different levels of domain knowledge, search expertise, and situational interest impact students' ability to find useful and relevant information on the Web, and (c) draw inferences about the types of tools and scaffolding needed by students when using the Web as a science resource. Detailed case descriptions of students' experiences facilitate discussion of how educators may integrate this popular information source more efficiently and effectively in their classrooms.
TL;DR: A web-based training framework comprising a set of topics that revolve around the use of feature structures as the core data structure in linguistic theory, its formal foundations, and its use in syntactic processing is proposed.
Abstract: We propose the creation of a web-based training framework comprising a set of topics that revolve around the use of feature structures as the core data structure in linguistic theory, its formal foundations, and its use in syntactic processing.
TL;DR: A set of criteria for evaluating and selecting Web resources as external data sources of a data warehouse and how to screen Web data sources using multi-criteria decision making (MCDM) methods are developed and discussed.
Abstract: A company's local data is often insufficient for analyzing market trends and making reasonable business plans. Decision making must also be based on information from suppliers, partners and competitors. Systematically integrating suitable external data from the Web into a data warehouse is a meaningful solution and will benefit the enterprise. However, the autonomy and dynamics of the Web make the task of selecting relevant and qualified external data from the Web challenging. We develop a set of criteria for evaluating and selecting Web resources as external data sources of a data warehouse and discuss how to screen Web data sources using multi-criteria decision making (MCDM) methods. The final decision with respect to selecting Web sources is sensitive to critical factors, i.e., the criterion weight and performance score of alternatives in terms of each criterion. We analyzed the sensitivity of the final rank of alternatives in terms of critical factors in order to gain an insight into the stability of our final decision. The comparison of several MCDM approaches for Web source screening is also presented.
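The dependence of the final ranking on criterion weights and performance scores can be illustrated with a weighted-sum model, one of the simplest MCDM methods. The abstract does not say which MCDM methods the paper compares, so this is only an assumed example; the criteria names, weights, scores, and the function `weighted_sum_rank` are hypothetical.

```python
def weighted_sum_rank(scores, weights):
    """Rank alternatives by the weighted sum of their criterion scores."""
    totals = {
        name: sum(weights[criterion] * value
                  for criterion, value in per_criterion.items())
        for name, per_criterion in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical performance scores in [0, 1], invented for illustration.
scores = {
    "source_a": {"relevance": 0.9, "reliability": 0.6, "freshness": 0.4},
    "source_b": {"relevance": 0.6, "reliability": 0.9, "freshness": 0.8},
    "source_c": {"relevance": 0.4, "reliability": 0.5, "freshness": 0.9},
}

# Baseline weights favour relevance.
weights = {"relevance": 0.6, "reliability": 0.2, "freshness": 0.2}
baseline = weighted_sum_rank(scores, weights)

# Crude sensitivity check: move weight from relevance to reliability.
shifted = {"relevance": 0.4, "reliability": 0.4, "freshness": 0.2}
perturbed = weighted_sum_rank(scores, shifted)
```

With these made-up numbers, `source_a` ranks first under the baseline weights, while the shifted weights put `source_b` first. A top choice that flips under a modest weight change is the kind of instability a sensitivity analysis is meant to expose.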
TL;DR: This paper presents the mapping of RDF into CG and its interest in the context of a Corporate Semantic Web.
Abstract: With the aim of building a Corporate Semantic Web, the content of the documents must be explicitly represented through metadata in order to enable contents-guided search. The Corese engine is dedicated to the querying of corporate semantic webs whose documents are described into RDF annotations. Corese interprets these RDF metadata in the Conceptual Graphs (CG) model in order to exploit the inference capabilities of this formalism. This paper presents our mapping of RDF into CG and its interest in the context of a Corporate Semantic Web.
TL;DR: An analysis of questionnaires in which over 300 users were asked about incidents in which they found various kinds of information quality problems while performing tasks using the World Wide Web leads to the development of a theoretical model of factors affecting user detection of information quality problems on the World Wide Web.
Abstract: Although it is generally believed that information quality problems are not uncommon on the World Wide Web, little is known about the conditions in which users find these problems and the strategies they employ for dealing with them. Furthermore, very little theory is available to guide research on user detection of information quality problems on the Internet. This study involves an analysis of questionnaires in which over 300 users were asked about incidents in which they found various kinds of information quality problems while performing tasks using the World Wide Web. The objective of the research is the development of a theoretical model of factors affecting user detection of information quality problems on the World Wide Web. Preliminary results based on 132 questionnaires are discussed in this paper.
TL;DR: This paper addresses security concerns in Web services and the role of technology trust, noting that issues relating to security, transactions, and scalability remain to be resolved.
Abstract: The Internet is changing the way businesses operate today. Firms are using the Web for procurement, to find trading partners, and to link existing applications to other applications. Web services are rapidly becoming the enabling technology of today’s e‐business and e‐commerce systems. They are having a massive impact on the way businesses think about designing, developing, and deploying Web‐based applications. Web services may be an evolutionary step in designing distributed applications; however, they are not without problems. There are issues relating to security, transactions and scalability that need to be addressed. This paper addresses security concerns in Web services and the role of technology trust.
TL;DR: This paper describes some areas of application for the Semantic Web, which will provide intelligent access to heterogeneous, distributed information by enabling software products (agents) to mediate between user needs and the information sources available.
Abstract: Currently, computers are changing from single, isolated devices into entry points to a worldwide network of information exchange and business transactions called the World Wide Web (WWW). However, the success of the WWW has made it increasingly difficult to find, access, present and maintain the information required by a wide variety of users. In response to this problem, many new research initiatives and commercial enterprises have been set up to enrich the available information with machine-processable semantics. This Semantic Web will provide intelligent access to heterogeneous, distributed information, enabling software products (agents) to mediate between user needs and the information sources available. In this paper we describe some areas for application of this new technology. We focus on ongoing work in the fields of knowledge management and electronic commerce. We also take a perspective on the semantic web-enabled web services which will help to bring the semantic web to its full potential.