TL;DR: The current policy debate surrounding third-party web tracking is surveyed and the FourthParty web measurement platform is presented, to inform researchers with essential background and tools for contributing to public understanding and policy debates about web tracking.
Abstract: In the early days of the web, content was designed and hosted by a single person, group, or organization. No longer. Webpages are increasingly composed of content from myriad unrelated "third-party" websites in the business of advertising, analytics, social networking, and more. Third-party services have tremendous value: they support free content and facilitate web innovation. But third-party services come at a privacy cost: researchers, civil society organizations, and policymakers have increasingly called attention to how third parties can track a user's browsing activities across websites. This paper surveys the current policy debate surrounding third-party web tracking and explains the relevant technology. It also presents the FourthParty web measurement platform and studies we have conducted with it. Our aim is to inform researchers with essential background and tools for contributing to public understanding and policy debates about web tracking.
TL;DR: The World Wide Web as the largest information construct has had much progress since its advent and this paper provides a background of the evolution of the web from web 1.0 to web 4.0.
Abstract: The World Wide Web as the largest information construct has had much progress since its advent. This paper provides a background of the evolution of the web from web 1.0 to web 4.0. Web 1.0 as a web of information connections, Web 2.0 as a web of people connections, Web 3.0 as a web of knowledge connections and web 4.0 as a web of intelligence connections are described as four generations of the web in the paper.
TL;DR: It is demonstrated that different components of intelligence have their analogs in distinct brain networks, and it is proposed that intelligence is an emergent property of anatomically distinct cognitive systems, each of which has its own capacity.
TL;DR: This paper implemented a content-based RS that leverages the data available within Linked Open Data datasets (in particular DBpedia, Freebase and LinkedMDB) in order to recommend movies to the end users.
Abstract: The World Wide Web is moving from a Web of hyper-linked Documents to a Web of linked Data. Thanks to the Semantic Web spread and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data have been published in freely accessible datasets. These datasets are connected with each other to form the so-called Linked Open Data cloud. As of today, there are tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. In this paper we show how these data can successfully be used to develop a recommender system (RS) that relies exclusively on the information encoded in the Web of Data. We implemented a content-based RS that leverages the data available within Linked Open Data datasets (in particular DBpedia, Freebase and LinkedMDB) in order to recommend movies to the end users. We extensively evaluated the approach and validated the effectiveness of the algorithms by experimentally measuring their accuracy with precision and recall metrics.
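The abstract above reports accuracy in terms of precision and recall. As a rough illustration of how those two metrics are typically computed for a ranked recommendation list, here is a minimal sketch; the function name `precision_recall_at_k`, the cut-off `k`, and the movie identifiers are assumptions made for the example and do not come from the paper.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommended items against a relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: a ranked list of recommended movies and the movies the user actually liked.
recommended = ["m1", "m2", "m3", "m4"]
relevant = {"m2", "m4", "m9"}
p, r = precision_recall_at_k(recommended, relevant, k=4)
```

Here two of the four recommended items are relevant, and two of the three relevant items were recommended, which is what the two returned values express.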
TL;DR: It is shown that the division between Web 1.0, Web 2.0 and Web 3.0 is often deconstructed by activists’ media practices, and the importance of developing an approach that draws attention to the interplay between Web platforms rather than their transition is highlighted.
Abstract: Current internet research has been influenced by application developers and computer engineers who see the development of the Web as being divided into three different stages: Web 1.0, Web 2.0 and Web 3.0. This article will argue that this understanding – although important when analysing the political economy of the Web – can have serious limitations when applied to everyday contexts and the lived experience of technologies. Drawing from the context of the Italian student movement, we show that the division between Web 1.0, Web 2.0 and Web 3.0 is often deconstructed by activists’ media practices. Therefore, we highlight the importance of developing an approach that – by focusing on practice – draws attention to the interplay between Web platforms rather than their transition. This approach, we believe, is essential to the understanding of the complex relationship between Web developments, human negotiations and everyday social contexts.
TL;DR: It is suggested that the Internet/Web changes the dynamic relationship between what Cattell and Horn have identified as the two general factors of human intelligence: crystallized intelligence and fluid intelligence.
TL;DR: This demo presents the prototype of a scalable architecture for a large scale social Web of Things for smart objects and services, named Paraimpu, a Web-based platform which allows users to add, use, share and inter-connect real HTTP-enabled smart objects and "virtual" things like services on the Web and social networks.
Abstract: The Web of Things is a scenario where potentially billions of connected smart objects communicate using the Web protocols, HTTP in primis. The envisioning and design of a Web of Things have raised several research issues, from protocol adoption and communication models to architectural styles and social aspects. In this demo we present the prototype of a scalable architecture for a large scale social Web of Things for smart objects and services, named Paraimpu. It is a Web-based platform which allows users to add, use, share and inter-connect real HTTP-enabled smart objects and "virtual" things like services on the Web and social networks. Paraimpu defines and uses a few strong abstractions, in order to allow mash-ups of heterogeneous things, introducing powerful rules for data adaptation. Adding and inter-connecting objects is supported through user-friendly models and features.
TL;DR: It is argued that machine learning research has to offer a wide variety of methods applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web.
Abstract: In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logic methods have not yet proven to be very effective for several reasons: First, there still is the unsolved problem of scalability of reasoning to Web scale. Second, logical reasoning has problems with uncertain information, which is abundant on Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. If moving towards the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since inductive methods can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), like in ontology learning, given the problems mentioned above, we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference on their standard representations. 
We argue that machine learning research has to offer a wide variety of methods applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic Web representations. Finally we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exciting new research and application area for data mining.
TL;DR: A comprehensive overview about the state-of-the-art architecture and technologies, and the most recent developments in the Geoprocessing Web is provided.
TL;DR: The lessons from this retrospective examination of the evolution of the Web are outlined, the main outcomes of Web Science activities are presented and directions along which future developments could be anticipated are discussed.
TL;DR: A framework for choreographing semantically enhanced Web Services encoded in an extended lightweight coordinative language which is derived from process calculus and is dedicated to running in modern Web browsers is proposed.
Abstract: Several solutions to describing service choreography have emerged, mainly focused on encoding capabilities of services especially for those deployed on the Web. These solutions are either derived from traditional Web service standards such as WSDL or inspired by the theory of process calculus. Little attention has however been paid to finding a lightweight solution which can enable peers to obtain, publish and share service choreography in an open environment or peer-to-peer network. This paper proposes a framework for choreographing semantically enhanced Web Services encoded in an extended lightweight coordinative language which is derived from process calculus and is dedicated to running in modern Web browsers. A proof-of-concept prototype has been implemented and demoed as a decentralised service choreography-management platform based on this framework. There is no need for users to install any third-party application, and service choreography execution is achieved via client-side Web browsers. Also, the preliminary experiments indicate the efficiency and scalability of our proof-of-concept implementation of this framework.
TL;DR: A hybrid method for personalized recommendation of news on the Web is presented, which provides Web users with an autonomous tool that is able to minimize repetitive and tedious Web surfing.
Abstract: A hybrid method for personalized recommendation of news on the Web is presented, which provides Web users with an autonomous tool that is able to minimize repetitive and tedious Web surfing. The proposed approach classifies Web pages by calculating the respective weights of terms. A user's interest and preference models are generated by analyzing the user's navigational history. Based on the content of the Web pages and on a user's interest and preference models, the recommender system suggests news Web pages to the user who is likely interested in the related topics. Moreover, the technique of collaborative filtering, which aims to choose the trusted users, is employed to improve the performance of the recommender system. Experiments are carried out in order to demonstrate the effectiveness of the proposed method. In the experiments, Web news items are classified and recommended to Web users by matching the users' interests with the contents of the news.
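The abstract above describes classifying pages by term weights and matching page content against a user's interest model. The paper's actual weighting scheme is not given here, so the following is only a sketch that swaps in standard TF-IDF weights and cosine similarity as stand-ins; the function names, the toy pages, and the choice to use one previously read page as the user profile are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (TF-IDF, assumed scheme)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical news pages, already tokenized.
pages = [
    "election vote parliament".split(),
    "football match goal".split(),
    "vote election result".split(),
]
vectors = tf_idf_vectors(pages)
profile = vectors[0]  # assumed user profile: the weights of a page the user has read
scores = [cosine(profile, v) for v in vectors[1:]]
```

In this toy setup the sports page shares no terms with the profile and scores zero, while the second politics page scores higher and would be the one suggested. The collaborative filtering step mentioned in the abstract is not covered by this sketch.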
TL;DR: In this article, the authors consider the potential of the World Wide Web (web) as a medium for communicating social and environmental issues in the Australian minerals industry and find that managers are willing to utilise the organisational and mass communication capabilities of the web more than its timeliness and presentation features.
TL;DR: The presence of Web 2.0 applications was found to have a correlation with the overall web site quality, and in particular, service quality, as well as the perceived quality of government web sites.
Abstract: Purpose – The purpose of this paper is to investigate the extent to which Web 2.0 applications are prevalent in government web sites, the ways in which Web 2.0 applications have been used in government web sites, as well as whether the presence of Web 2.0 applications correlates with the perceived quality of government web sites. Design/methodology/approach – Divided equally between developing and advanced economies, a total of 200 government web sites were analysed using content analysis and multiple regression analysis. Findings – The prevalence of seven Web 2.0 applications in descending order was: RSS, multimedia sharing services, blogs, forums, social tagging services, social networking services and wikis. More web sites in advanced countries include Web 2.0 applications than those in developing countries. The presence of Web 2.0 applications was found to have a correlation with the overall web site quality, and in particular, service quality. Research limitations/implications – This paper only covers g...
TL;DR: Novel ideas are presented for solving the automated web service composition problem through a dynamic planning approach based on a novel AI planner designed for working in highly dynamic environments under time constraints, namely Simplanner.
Abstract: In this paper, novel ideas are presented for solving the automated web service composition problem. Some of the possible real world problems such as partial observability of the environment, nondeterministic effects of web services and service execution failures are solved through a dynamic planning approach. The proposed approach is based on a novel AI planner that is designed for working in highly dynamic environments under time constraints, namely Simplanner. World altering service calls are done according to the WS-Coordination and WS-Business Activity web service transaction specifications in order to physically recover from failure situations and prevent the undesired side effects of the aborted web service composition efforts.
TL;DR: This work demonstrates how the semantic web technology provides efficient solutions for the management of complex and distributed data in heterogeneous systems, and it can be used in the medical information systems as well.
Abstract: With the increased development of cloud computing, access control policies have become an important issue in the security field of cloud computing. The Semantic Web is the extension of the current Web which aims at automation, integration and reuse of data among different web applications such as cloud computing. However, Semantic Web applications pose some new requirements for security mechanisms, especially in the access control models. In this paper, we analyse existing access control methods and present a semantic-based access control model which considers semantic relations among different entities in the cloud computing environment. We have enriched the research for semantic web technology with role-based access control that can be applied in the field of medical information systems or e-Healthcare systems. This work demonstrates how the semantic web technology provides efficient solutions for the management of complex and distributed data in heterogeneous systems, and it can be used in the medical information systems as well.
TL;DR: This paper explores the use of Web 2.0 technologies for collaborative learning in a higher education context and concludes there is a clear need to develop, support and encourage strong interaction both between teachers and students, and amongst the students themselves.
Abstract: This paper explores the use of Web 2.0 technologies for collaborative learning in a higher education context. A review of the literature exploring the strengths and weaknesses of Web 2.0 technology is presented, and a conceptual model of a Web 2.0 community of inquiry is introduced. Two Australian case studies are described, with an ex-post evaluation of the use of Web 2.0 tools. Conclusions are drawn as to the potential for the use of Web 2.0 tools for collaborative e-learning in higher education. In particular, design and integration of Web 2.0 tools should be closely related to curriculum intent and pedagogical requirements, care must be taken to provide clear guidance on both expected student activity and learning expectations, and there is a clear need to develop, support and encourage strong interaction both between teachers and students, and amongst the students themselves.
TL;DR: A semantic web usage mining approach for discovering periodic web access patterns from annotated web usage logs which incorporates information on consumer emotions and behaviors through self-reporting and behavioral tracking is proposed.
Abstract: The relationships between consumer emotions and their buying behaviors have been well documented. Technology-savvy consumers often use the web to find information on products and services before they commit to buying. We propose a semantic web usage mining approach for discovering periodic web access patterns from annotated web usage logs which incorporates information on consumer emotions and behaviors through self-reporting and behavioral tracking. We use fuzzy logic to represent real-life temporal concepts (e.g., morning) and requested resource attributes (ontological domain concepts for the requested URLs) of periodic pattern-based web access activities. These fuzzy temporal and resource representations, which contain both behavioral and emotional cues, are incorporated into a Personal Web Usage Lattice that models the user's web access activities. From this, we generate a Personal Web Usage Ontology written in OWL, which enables semantic web applications such as personalized web resources recommendation. Finally, we demonstrate the effectiveness of our approach by presenting experimental results in the context of personalized web resources recommendation with varying degrees of emotional influence. Emotional influence has been found to contribute positively to adaptation in personalized recommendation.
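The abstract above uses fuzzy logic to represent real-life temporal concepts such as "morning". One common way to model such a concept is a trapezoidal membership function over the hour of day. The sketch below is an illustration under that assumption; the breakpoints (5, 7, 10, 12) and the function names are hypothetical and are not taken from the paper.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def morning(hour):
    """Degree to which an hour of day counts as 'morning' (breakpoints are assumed)."""
    return trapezoid(hour, 5.0, 7.0, 10.0, 12.0)


# A web access at 08:00 is fully 'morning', one at 11:00 only partially, one at 14:00 not at all.
degrees = [morning(h) for h in (8.0, 11.0, 14.0)]
```

A graded membership like this lets an access at 11:00 contribute partially to a "morning" pattern instead of being cut off by a hard boundary.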
TL;DR: This is the first study of its kind to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites and tail sites for various interesting domains.
Abstract: In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a study to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites and tail sites for various interesting domains. We believe this is the first study of its kind, and gives us new insights for information extraction over the Web.
TL;DR: The World Wide Web has become a primary meeting place for information and recreation, for communication and commerce, for a quarter of the world's population, and a source of machine-readable texts for corpus linguists and researchers in complementary fields like natural language processing, information retrieval, and text mining.
Abstract: The World Wide Web has become a primary meeting place for information and recreation, for communication and commerce, for a quarter of the world's population. Millions of Web authors have created billions of Web pages, unknowingly providing texts to be mined for their linguistic and cultural content. The Web has evolved into the resource of first resort for lexicographers and linguists, for translators, teachers, and other language professionals. As a source of machine-readable texts for corpus linguists and researchers in complementary fields like natural language processing (NLP), information retrieval, and text mining, the Web offers extraordinary accessibility, quantity, variety, and cost-effectiveness. Investigators in these disciplines have developed scores of tools and products from Web content for both researchers and end users, and authored hundreds of scholarly papers on their projects.
Keywords:
educational linguistics;
natural language processing;
second language acquisition;
corpus;
language for specific purposes;
language learning technology
TL;DR: This work considers how to aggregate data from many Web sources to create topical portals and how to provide search over the collection of data tables on the Web.
Abstract: The World Wide Web offers a vast array of data in many forms. The majority of this data is structured for presentation to humans, in the form of HTML tables, lists, and forms-based search interfaces. Building systems that offer data integration services on this vast collection of data raises several unique challenges. We begin by describing different approaches for accessing and querying data that is on the deep Web, referring to data that is stored in databases and available only by querying HTML forms. We then consider how to aggregate data from many Web sources to create topical portals and how to provide search over the collection of data tables on the Web. Finally, we discuss recent work that allows users with varying technical skills to perform lightweight tasks with data they find on the Web.
TL;DR: An overview of different approaches for web service discovery described in literature is given and a survey of how these approaches differ from each other is presented.
Abstract: Web services are playing an important role in e-business and e-commerce applications. As web service applications are interoperable and can work on any platform, large scale distributed systems can be developed easily using web services. Finding the most suitable web service from a vast collection of web services is crucial for the successful execution of applications. The traditional web service discovery approach is a keyword-based search using UDDI. Various other approaches for discovering web services are also available. Some of the discovery approaches are syntax based while others are semantic based. Having a system for service discovery that can work automatically is also a concern of service discovery approaches. As these approaches are different, one solution may be better than another depending on requirements. Selecting a specific service discovery system is a hard task. In this paper, we give an overview of different approaches for web service discovery described in the literature. We present a survey of how these approaches differ from each other.
TL;DR: Computational Analysis of Terrorist Groups: Lashkar-e-Taiba provides an in-depth look at Web intelligence, and how advanced mathematics and modern computing technology can influence the insights the authors have on terrorist groups.
Abstract: Computational Analysis of Terrorist Groups: Lashkar-e-Taiba provides an in-depth look at Web intelligence, and how advanced mathematics and modern computing technology can influence the insights we have on terrorist groups. This book primarily focuses on one famous terrorist group known as Lashkar-e-Taiba (or LeT), and how it operates. After 10 years of counter Al Qaeda operations, LeT is considered by many in the counter-terrorism community to be an even greater threat to the US and world peace than Al Qaeda. Computational Analysis of Terrorist Groups: Lashkar-e-Taiba is the first book that demonstrates how to use modern computational analysis techniques including methods for big data analysis. This book presents how to quantify both the environment in which LeT operates, and the actions it took over a 20-year period, and represent it as a relational database table. This table is then mined using sophisticated data mining algorithms in order to gain detailed, mathematical, computational and statistical insights into LeT and its operations. This book also provides a detailed history of Lashkar-e-Taiba based on extensive analysis conducted by using open source information and public statements. Each chapter includes a case study, as well as a slide describing the key results which are available on the authors' web sites. Computational Analysis of Terrorist Groups: Lashkar-e-Taiba is designed for a professional market composed of government or military workers, researchers and computer scientists working in the web intelligence field. Advanced-level students in computer science will also find this valuable as a reference book.
TL;DR: This paper introduces a declarative language designed to specify fragments of the Web of Data and actions to be performed based on these data, and implements it in a centralized fashion, and shows its power and performance.
Abstract: The massive semantic data sources linked in the Web of Data give new meaning to old features like navigation; introduce new challenges like semantic specification of Web fragments; and make it possible to specify actions relying on semantic data. In this paper we introduce a declarative language to face these challenges. Based on navigational features, it is designed to specify fragments of the Web of Data and actions to be performed based on these data. We implement it in a centralized fashion, and show its power and performance. Finally, we explore the same ideas in a distributed setting, showing their feasibility, potentialities and challenges.
TL;DR: This paper reviews the process of discovering useful patterns from the web server log file of an academic institute, in which log files are first preprocessed so that web usage mining techniques can be applied to them.
Abstract: Web server log repositories are a great source of knowledge, keeping a record of the web usage patterns of different web users. Web usage pattern analysis is the process of identifying browsing patterns by analyzing the user's navigational behavior. The web server log files, which store information about the visitors of web sites, are used as input for the web usage pattern analysis process. First these log files are preprocessed and converted into the required formats so that web usage mining techniques can be applied to these web logs. This paper reviews the process of discovering useful patterns from the web server log file of an academic institute. The obtained results can be used in different applications like web traffic analysis, efficient website administration, site modifications, system improvement and personalization, and business intelligence.
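The abstract above mentions preprocessing log files before mining them. The log format used in the paper is not stated, so the sketch below assumes the widely used Common Log Format and shows one plausible cleaning step: parse each line, drop malformed entries, failed requests and static assets, and count page views. The regular expression, the asset suffix list, and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed input: Common Log Format lines (host ident user [time] "request" status size).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the named fields of one log line, or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None


def page_counts(lines):
    """Count successful GET requests per path, skipping static assets."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["method"] != "GET" or not rec["status"].startswith("2"):
            continue
        if rec["path"].endswith((".css", ".js", ".png", ".gif", ".jpg")):
            continue
        counts[rec["path"]] += 1
    return counts


# Hypothetical sample log.
sample = [
    '10.0.0.1 - - [10/Oct/2013:13:55:36 +0000] "GET /courses.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2013:13:55:37 +0000] "GET /style.css HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2013:13:56:01 +0000] "GET /courses.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2013:13:56:09 +0000] "GET /missing.html HTTP/1.1" 404 209',
    'garbage line',
]
counts = page_counts(sample)
```

Only the two successful requests for `/courses.html` survive the cleaning step; the stylesheet, the 404 response and the malformed line are discarded. Later stages such as user and session identification are outside this sketch.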
TL;DR: This paper provides a conceptual analysis of the second and third version of the Web model resulting in the convergence and integration of key features from the current and next generation Web.
Abstract: The social trend is progressively becoming the key feature of current Web understanding (Web 2.0). This trend appears irrepressible as millions of users, directly or indirectly connected through social networks, are able to share and exchange any kind of content, information, feeling or experience. Social interactions radically changed the user approach. Furthermore, the socialization of content around social objects provides new unexplored commercial marketplaces and business opportunities. On the other hand, the progressive evolution of the web towards the Semantic Web (or Web 3.0) provides a formal representation of knowledge based on the meaning of data. When the social meets semantics, the social intelligence can be formed in the context of a semantic environment in which user and community profiles as well as any kind of interaction is semantically represented (Semantic Social Web). This paper first provides a conceptual analysis of the second and third version of the Web model. That discussion is aimed at the definition of a middle concept (Web 2.5) resulting in the convergence and integration of key features from the current and next generation Web. The Semantic Social Web (Web 2.5) has a clear theoretical meaning, understood as the bridge between the overused Web 2.0 and the not yet mature Semantic Web (Web 3.0).
TL;DR: This paper studies customer behavior using Web mining techniques and their application in e-commerce, clustering customer segments with the K-Means algorithm on input data taken from the web logs of various e-commerce websites.
Abstract: With the explosive growth of information sources available on the WWW, it has become an important tool for users in order to find, extract, filter and evaluate the desired information and resources. The main purpose of this paper is to study the customer's behavior using Web mining techniques and their application in e-commerce to mine customer behavior. The concept of Web mining is introduced and the process of Web data mining is described in detail: source data collection, data pre-processing, pattern discovery, pattern analysis and cluster analysis. With advanced information technologies, servers are now able to collect and store mountains of data, describing their numerous contributions and different customer profiles, from which they seek to derive information about their customers' requirements. Conventional methods are no longer appropriate for finding customer behavior in these business situations. The principle of data mining here is to cluster customer segments by using the K-Means algorithm, in which input data comes from the web logs of various e-commerce websites. Hence, we determine the relationship between Web data mining and e-commerce and apply Web mining technology in e-commerce.
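The abstract above clusters customer segments with K-Means on data derived from web logs. As a rough sketch of that step, the following implements plain Lloyd's iteration in pure Python. The two features per customer (visits and purchases), the sample values, and the deterministic seeding from the first k points are assumptions chosen to keep the example small and reproducible; the paper's own features and initialization are not described here.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (assumed choice)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-customer features extracted from web logs: (visits, purchases).
customers = [(2, 0), (30, 5), (3, 1), (28, 6), (1, 0), (32, 4)]
labels, centroids = kmeans(customers, k=2)
```

On this toy data the occasional visitors and the frequent buyers end up in separate clusters. In practice the features would usually be scaled first, and a library implementation with better seeding would likely be preferred.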
TL;DR: A unified framework is described where some of these techniques are integrated in order to build efficient vector web mapping clients and servers and some principles for future standards to support the development of vector web mapping are given.
Abstract: Improving the use of vector data in web mapping is often presented as an important challenge. Such a shift from raster to vector web maps would open web mapping and GIS to new innovations and new practices. The main obstacle is a performance issue: vector web maps in today's web mapping environments are usually too slow to be usable. Existing techniques for vector web mapping cannot solve the performance issue alone. This article describes a unified framework where some of these techniques are integrated in order to build efficient vector web mapping clients and servers. This framework is composed of the following elements: specific formats for vector data and symbology, vector tiling, spatial index services, and generalization for multi-scale data. A prototype based on this framework has been implemented and has shown satisfying results. Some principles for future standards to support the development of vector web mapping are given.
TL;DR: This chapter describes existing algorithms for extraction and processing of target and scene information, multi-sensor cross camera analysis, inferencing of simple, complex and abnormal video events, data mining, image search and retrieval, intuitive UIs for efficient customer experience, and text summarization of visual data.
Abstract: This chapter focuses on various algorithms and techniques in video analytics that can be applied to the business intelligence domain. The goal is to provide the reader with an overview of the state of the art approaches in the field of video analytics, and also describe the various applications where these technologies can be applied. We describe existing algorithms for extraction and processing of target and scene information, multi-sensor cross camera analysis, inferencing of simple, complex and abnormal video events, data mining, image search and retrieval, intuitive UIs for efficient customer experience, and text summarization of visual data. We have also presented the evaluation results of each of these technology components using in-house and other publicly available datasets.
TL;DR: This book takes readers through all aspects of Web 2.0, from the development of technologies to current services, and goes beyond this to explore such topics as the Semantic Web, cloud computing and Web Science.
Abstract: Web 2.0 and Beyond: Principles and Technologies draws on the author's iceberg model of Web 2.0, which places the social Web at the tip of the iceberg underpinned by a framework of technologies and ideas. The author incorporates research from a range of areas, including business, economics, information science, law, media studies, psychology, social informatics and sociology. This multidisciplinary perspective illustrates not only the wide implications of computing but also how other areas interpret what computer science is doing. After an introductory chapter, the book is divided into three sections. The first one discusses the underlying ideas and principles, including user-generated content, the architecture of participation, data on an epic scale, harnessing the power of the crowd, openness and the network effect and Web topology. The second section chronologically covers the main types of Web 2.0 services: blogs, wikis, social networks, media sharing sites, social bookmarking and microblogging. Each chapter in this section looks at how the service is used, how it was developed and the technology involved, important research themes and findings from the literature. The final section presents the technologies and standards that underpin the operation of Web 2.0 and goes beyond this to explore such topics as the Semantic Web, cloud computing and Web Science. Suitable for nonexperts, students and computer scientists, this book provides an accessible and engaging explanation of Web 2.0 and its wider context yet is still grounded in the rigour of computer science. It takes readers through all aspects of Web 2.0, from the development of technologies to current services.