TL;DR: The life cycle of Web services composition is overviewed, and the main standards, research prototypes, and platforms are surveyed using a set of assessment criteria identified in the article.
TL;DR: A structured and comprehensive overview of the literature on Web Data Extraction is provided, with applications grouped into two classes: the Enterprise level and the Social Web level, where extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users.
Abstract: Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.
TL;DR: This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities and describing the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as means to cope with gene set enrichment analysis.
Abstract: Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover for all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as means to cope with gene set enrichment analysis.
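As a rough illustration of how little code a basic scraping step can need, the sketch below pulls the rows of an HTML table using only Python's standard library. The `TableScraper` class, the sample page and the gene names are hypothetical and are not taken from WhichGenes, PathJam or any framework reviewed in the article.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table content of `html` as a list of rows of cell strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content; a real pipeline would fetch it over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Gene</th><th>Organism</th></tr>
  <tr><td>geneA</td><td>E. coli</td></tr>
  <tr><td>geneB</td><td>S. aureus</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
```

In practice the fragile part is usually the page structure rather than the parsing code, so a real robot would also need to cope with layout changes on the source site.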
TL;DR: A new approach for generating customers’ interaction via company’s contact web forms using the textual analytics, processing of frequently asked questions and a rule-based system is described.
Abstract: Modern service enterprises are challenged by a strong competition and dynamically changing business environments. Consequently, the precision of the business requirements identification and rigorous planning regarding investments to information technologies play a key role in the implementation of new capabilities. As the business today is more and more powered by information that is unstructured, social and distributed via various channels, the multi-channel interaction is a reality. It means each customer generates more and more data. On the other hand, a customer is often flooded with a huge amount, mostly not relevant, advertising information. Companies are challenged to collect the data from the customer interaction, analyze it and prepare an intelligent recommendation for an agent or to feed relevant offers to the customer. Modern understanding of the unstructured information requires a fundamentally new approach using the technology to deliver insights, ideas, and an intuition into the rapidly growing and diverse data that customers deal with every day. A hot topic is to decrease the costs and increase the customer satisfaction. One of the sensitive areas is customers’ interaction via company’s contact web forms. This interaction consists mostly of questions or complaints. Accordingly, we describe in our paper a new approach for generating these contact forms using the textual analytics, processing of frequently asked questions and a rule-based system. We also present particular use-cases to illustrate how this approach works in practice.
TL;DR: A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification and a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
Abstract: This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
TL;DR: This paper analyses all the existing definitions of artificial intelligence and recommends that "Artificial Intelligence is the mechanical simulation system of collecting knowledge and information and processing intelligence of universe: (collating and interpreting) and disseminating it to the eligible in the form of actionable intelligence".
Abstract: The purpose of artificial intelligence is to acquire knowledge of a required subject. Knowledge acquisition involves knowledge of dark energy (74% of universe), knowledge of dark matter (22%) and knowledge of visible matter (4%). This knowledge can be gathered through biological sensors (5 senses) and non-biological sensors such as robots, TV, mobiles, cameras, microscopes, radar, computers etc. The present definitions of artificial intelligence cover only the computer portion of the visible world, i.e., an iota of 4%, and hence are not complete. A complete definition is required to cover the entire universe and all the means of acquisition of intelligence through artificial means. This paper analyses all the existing definitions of artificial intelligence and recommends that "Artificial Intelligence is the mechanical simulation system of collecting knowledge and information and processing intelligence of universe: (collating and interpreting) and disseminating it to the eligible in the form of actionable intelligence".
TL;DR: This paper investigates how big data analytics will affect the landscape of business intelligence, leading to big data intelligence, and delineates business opportunities and managerial challenges brought forward by the emergence of big data analytics.
Abstract: Big data analytics have been embraced as a disruptive technology that will reshape business intelligence, particularly marketing intelligence, which has traditionally relied on market surveys to understand consumer behavior and product design. In this paper, we investigate how big data analytics will affect the landscape of business intelligence, leading to big data intelligence. Rooted in the recent literature, we delineate business opportunities and managerial challenges brought forward by the emergence of big data analytics and outline a number of research directions in big data intelligence for business.
TL;DR: The results showed that current informal learning websites have moderately adopted the most heavily promoted features of Web 2.0, and a positive relationship between Web 2.0 features and informal learning website ratings was found.
Abstract: Learning is becoming increasingly self-directed and often occurs away from schools and other formal educational settings. The development of a myriad of new technologies for learning has enabled people to learn anywhere and anytime. Web 2.0 technology allows researchers to shed a new light on the importance and prevalence of informal learning. However, there are few empirical studies that support the claim that this technology facilitates informal learning. The present study investigates the relationship between Web 2.0 levels and the evaluation of informal learning websites. For this purpose, 287 informal learning websites were selected and their Web 2.0 levels were rated based upon eight criteria proposed in the Web 2.0 exploratory literature. In addition, previously examined informal learning evaluation results were employed. The results showed that current informal learning websites have moderately adopted the most heavily promoted features of Web 2.0. Correlation analyses showed a positive relationship between Web 2.0 features and informal learning website ratings. The implications for the relationship and internal correlations of variables were summarized and discussed.
TL;DR: The history and key elements of Web site design in an e-commerce context – primarily in the period 2002–2012 are outlined and design issues as they are relevant to diverse users including those in global markets are articulated.
Abstract: Both the use of Web sites and the empirical knowledge as to what constitutes effective Web site design have grown exponentially in recent years. The aim of the current article is to outline the history and key elements of Web site design in an e-commerce context – primarily in the period 2002–2012. It was in 2002 that a Special Issue of ISR was focused on ‘Measuring e-Commerce in Net-Enabled Organizations.’ Before this, work was conducted on Web site design, but much of it was anecdotal. Systematic, empirical research and modeling of Web site design in relation to dependent variables like trust, satisfaction, and loyalty had not until then received substantial focus – at least in the information systems domain. In addition to an overview of empirical findings, this article has a practical focus on what designers must know about Web site elements if they are to provide compelling user experiences, taking into account the site’s likely users. To this end, the article elaborates components of effective Web site design, user characteristics, and the online context that impact Web usage and acceptance, and design issues as they are relevant to diverse users including those in global markets. Web site elements that result in positive business impact are articulated. This retrospective on Web site design concludes with an overview of future research directions and current developments.
TL;DR: The International Journal of Information Systems aims to reflect the wide and interdisciplinary nature of the subject and articles that integrate technological disciplines with social, contextual and management issues, based on research using appropriate research methods.
Abstract: The International Journal of Information Systems (IJIS) is a peer-reviewed, open access journal that promotes the study of interest in information systems. It aims to reflect the wide and interdisciplinary nature of the subject and articles that integrate technological disciplines with social, contextual and management issues, based on research using appropriate research methods. Topics covered: Web databases. Web search and information extraction. Managing and storing XML data. Web data integration. Web media. Internet quality of service. Internet traffic engineering. Mobile computing for the Internet. Web-based education. Advanced Web applications. Communities on the Web.
TL;DR: A Context-aware Personal Information Retrieval (CPIR) algorithm, which considers both the participatory and implicit-topical properties of the context to improve the retrieval performance and demonstrates that CPIR can achieve significant improvements over several baselines.
Abstract: People use a variety of social networking services to collect and organize web information for future reuse. When such contents are actually needed as reference to reply to a post in an online conversation, however, the user may not be able to retrieve them with proper cues or may even forget their existence altogether. In this paper, we study this problem in the online conversation context and investigate how to automatically retrieve the most context-relevant previously-seen web information without user intervention. We propose a Context-aware Personal Information Retrieval (CPIR) algorithm, which considers both the participatory and implicit-topical properties of the context to improve the retrieval performance. Since both the context and the user's web information are usually short and ambiguous, the participatory context is utilized to formulate and expand the query. Moreover, the implicit-topical context is exploited to implicitly determine the importance of each piece of web information of the target user in the given context. The experimental results using a real-world dataset demonstrate that CPIR can achieve significant improvements over several baselines.
TL;DR: This article summarizes the current state of web archiving in relation to researchers and research needs and outlines the challenges that still face researchers who wish to engage seriously with web content as an object of research, and archivists who must strike a balance reflecting a range of user needs.
Abstract: The web encourages the constant creation and distribution of large amounts of information; it is also a valuable resource for understanding human behavior and communication. To take full advantage of the web as a research resource that extends beyond the consideration of snapshots of the present, however, it is necessary to begin to take web archiving much more seriously as an important element of any research program involving web resources. The ephemeral character of the web requires that researchers take proactive steps in the present to enable future analysis. Efforts to archive the web or portions thereof have been developed around the world, but these efforts have not yet provided reliable and scalable solutions. This article summarizes the current state of web archiving in relation to researchers and research needs. Interviews with researchers, archivists, and technologists identify the differences in purpose, scope, and scale of current web archiving practice, and the professional tensions that arise given these differences. Findings outline the challenges that still face researchers who wish to engage seriously with web content as an object of research, and archivists who must strike a balance reflecting a range of user needs.
TL;DR: While certain conceptual models and frameworks exist on how to implement Web 2.0 tools in organisations there is a lack of evidence to suggest that they have been empirically tested, the findings of the scoping literature review indicate.
Abstract: Purpose – The aim of this paper is to examine the subject area of implementing Web 2.0 tools in organisations to identify from the literature common issues that must be addressed to assist organisations in their approach towards introducing Web 2.0 tools in their workplace. Based on the findings of the literature a Web 2.0 tools implementation model is presented. Design/methodology/approach – A general scoping review of the literature will be conducted to identify potential issues that might impact on the implementation of Web 2.0 tools in organisations to provide an overview of examples of empirical evidence that exists in this subject area with a view to examining how to advance this particular field of research. Findings – The findings of the scoping literature review indicate that while certain conceptual models and frameworks exist on how to implement Web 2.0 tools in organisations there is a lack of evidence to suggest that they have been empirically tested. The paper also notes that though organisa...
TL;DR: The research aimed to document the current state of Web 2.0 practice and make suggestions for engaging in specific types of content creation strategies (such as plain language and transparent communication practices).
Abstract: Web 2.0 experts working in social marketing participated in qualitative in-depth interviews. The research aimed to document the current state of Web 2.0 practice. Perceived strengths (such as the viral nature of Web 2.0) and weaknesses (such as the time consuming effort it took to learn new Web 2.0 platforms) existed when using Web 2.0 platforms for campaigns. Lessons learned were identified—namely, suggestions for engaging in specific types of content creation strategies (such as plain language and transparent communication practices). Findings present originality and value to practitioners working in social marketing who want to effectively use Web 2.0.
TL;DR: This paper analyzes the role and impact of Web 3.0 in business and identifies nine potential business models, based in direct and undirected revenue sources, which have emerged with the appearance of semantic web technologies.
Abstract: Web 3.0 promises to have a significant effect on users and businesses. It will change how people work and play, how companies use information to market and sell their products, as well as operate their businesses. The basic shift occurring in Web 3.0 is from information-centric to knowledge-centric patterns of computing. Web 3.0 will enable people and machines to connect, evolve, share and use knowledge on an unprecedented scale and in new ways that make our experience of the Internet better. Additionally, semantic technologies have the potential to drive significant improvements in capabilities and life cycle economics through cost reductions, improved efficiencies, enhanced effectiveness, and new functionalities that were not possible or economically feasible before. In this paper we look to the semantic web and Web 3.0 technologies as enablers for the creation of value and appearance of new business models. For that, we analyze the role and impact of Web 3.0 in business and we identify nine potential business models, based in direct and undirected revenue sources, which have emerged with the appearance of semantic web technologies.
TL;DR: The Web Observatory project is a global effort to create a global distributed infrastructure that will foster communities exchanging and using each other's web-related datasets as well as sharing analytic applications for research and business web applications.
Abstract: The Web Observatory project is a global effort that is being led by the Web Science Trust, its network of WSTnet laboratories, and the wider Web Science community. The goal of this project is to create a global distributed infrastructure that will foster communities exchanging and using each other's web-related datasets as well as sharing analytic applications for research and business web applications. It will provide the means to observe the digital planet, explore its processes, and understand their impact on different sectors of human activity.
TL;DR: This paper presents and evaluates a collection of emerging techniques developed to determine the degree of similarity between text expressions and implements a variety of paradigms including the study of co-occurrence, text snippet comparison, frequent pattern finding, or search log analysis.
Abstract: Computing the semantic similarity between terms (or short text expressions) that have the same meaning but which are not lexicographically similar is a key challenge in many computer related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations, for example, many of them often fail to deal with terms not covered by synonym dictionaries or are not able to cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use some kinds of web intelligence to determine the degree of similarity between text expressions. These techniques implement a variety of paradigms including the study of co-occurrence, text snippet comparison, frequent pattern finding, or search log analysis. The goal is to substitute the traditional techniques where necessary.
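To make the co-occurrence paradigm concrete, here is a minimal sketch that scores two terms by how often they appear in the same documents. It is one common co-occurrence measure (a Jaccard coefficient over hit counts), not necessarily one of the techniques evaluated in the paper, and the small in-memory `CORPUS` is a hypothetical stand-in for search-engine hit counts.

```python
def hit_count(corpus, *terms):
    """Number of documents containing all the given terms (case-insensitive)."""
    terms = [t.lower() for t in terms]
    return sum(all(t in doc.lower() for t in terms) for doc in corpus)


def cooccurrence_similarity(corpus, a, b):
    """Jaccard coefficient over document hits: |A and B| / |A or B|."""
    both = hit_count(corpus, a, b)
    either = hit_count(corpus, a) + hit_count(corpus, b) - both
    return both / either if either else 0.0


# Hypothetical documents standing in for Web search results.
CORPUS = [
    "The WWW, or World Wide Web, was invented at CERN.",
    "World Wide Web (WWW) standards are published by the W3C.",
    "The WWW changed publishing.",
    "Bananas are rich in potassium.",
]

related = cooccurrence_similarity(CORPUS, "WWW", "World Wide Web")
unrelated = cooccurrence_similarity(CORPUS, "WWW", "potassium")
```

The acronym and its expansion share no characters in sequence, yet they score well above the unrelated pair because they tend to occur in the same documents, which is the gap dictionary-based measures leave open.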
TL;DR: A study on web usage mining, its methods and applications, which focuses on the techniques that could predict user's behavior while the user interacts with web.
Abstract: Web mining is the application of data mining techniques to extract knowledge from Web. Web mining has been explored to a vast degree and different techniques have been proposed for a variety of applications that includes Web Search, Classification and Personalization etc. Web Usage Mining is that area of Web Mining which deals with the extraction of interesting knowledge from logging information produced by web servers. Web usage mining tries to discover the useful information from the secondary data derived from the interactions of the users while surfing on the web. It focuses on the techniques that could predict user's behavior while the user interacts with web. This paper discusses a study on web usage mining, its methods and applications.
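A minimal sketch of the idea, assuming server logs in Common Log Format: requests are grouped by client address, page-to-page transitions are counted, and the most frequent successor is used as the prediction. This first-order transition model is chosen here for illustration only; the log lines are hypothetical and the abstract does not say which prediction techniques the study covers.

```python
import re
from collections import Counter, defaultdict

# Matches the client address and requested path of a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST) (\S+)')


def page_sequences(log_lines):
    """Group requested paths by client address, in log order."""
    sessions = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:
            client, path = match.groups()
            sessions[client].append(path)
    return sessions


def transition_counts(sessions):
    """Count how often each page is directly followed by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions.values():
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Most frequent successor of `page`, or None if it was never followed."""
    if page not in counts:
        return None
    return counts[page].most_common(1)[0][0]


# Hypothetical log excerpt.
LOG = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2024:10:00:07 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:00:08 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /cart HTTP/1.1" 200 128',
    '10.0.0.2 - - [01/Jan/2024:10:00:12 +0000] "GET /contact HTTP/1.1" 200 256',
    '10.0.0.3 - - [01/Jan/2024:10:00:15 +0000] "GET /cart HTTP/1.1" 200 128',
]

counts = transition_counts(page_sequences(LOG))
```

Grouping by client address is a simplification: real usage mining normally adds session timeouts and handles proxies or shared addresses before counting transitions.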
TL;DR: A new publicly available provenance ontology for service discovery is created and a user study indicates the results extend beyond e-Science, and an integrated approach to web service description and discovery is developed.
Abstract: Web services have become common, if not essential, in the areas of business-to-business integration, distributed computing, and enterprise application integration. Yet the XML-based standards for web service descriptions encode only a syntactic representation of the service input and output. The actual meaning of these terms, their formal definitions, and their relationships to other concepts are not represented. This poses challenges for leveraging web services in the development of software capabilities. As the number of services grows and the specificity of users' needs increases, the ability to find an appropriate service for a specific application is strained. In order to overcome this challenge, semantic web services were proposed. For the discovery of web services, semantic web services use ontologies to find matches between user requirements and service capabilities. The computational reasoning afforded by ontologies enables users to find categorizations that weren't explicitly defined. However, there are a number of methodological variants on semantic web service discovery. Based on e-Science, an analog to e-Business, one methodology advocates deep and detailed semantic description of a web service's inputs and outputs. Yet, this methodology predates recent advances in semantic web and provenance research, and it is unclear the extent to which it applies outside of e-Science. We explore this question through a within-subjects experiment and we extend this methodology with current research in provenance, semantic web, and web service standards, developing and empirically evaluating an integrated approach to web service description and discovery. Implications for more advanced web service discovery algorithms and user interfaces are also presented. 
We address limitations in semantic web service discovery. Our approach is grounded in semantic web standards and W3C provenance ontology. Our user study indicates the results extend beyond e-Science. Our user study provides insights for web service discovery applications. We have created a new publicly available provenance ontology for service discovery.
TL;DR: This article reviews the evolution of the interface of client-server distributed systems, from Messaging and RPC systems that predate the Web, to RESTful Web APIs, and points out four directions in which Web APIs are moving, including the incorporation of hypermedia and semantics.
Abstract: Distributed information systems predominantly have client-server architectures, as does the Web itself. In this article, we review the evolution of the interface of client-server distributed systems, from Messaging and RPC systems that predate the Web, to RESTful Web APIs. We highlight the often overlooked importance of the client-server interface in Web applications, and we reference historic and current systems to discuss the roles of "Web Service" technologies and Service-Oriented Architectures. Considering the future, we point out four directions in which we can see Web APIs moving, including the incorporation of hypermedia and semantics.
TL;DR: A high-quality program that covers a wide spectrum of topics, including semantics-driven information retrieval, semantic agent, intelligent e-Technologies, linked data applications, web mining, knowledge representation formalisms, semantic search, social network analysis and ontology engineering.
Abstract: A total of 170 submissions were received, of which 44 full papers and 11 posters were accepted for presentation at the conference and publication in the electronic proceedings and ACM's Digital Library. To cope with timing constraints and allow for sufficient time for interactions and poster sessions during the conference, 44 papers were given a 30-minute presentation slot and 11 papers were given an opportunity for poster presentation. All accepted papers in each category were given the same maximum number of pages in the proceedings.
Each of the 170 submissions received exactly 2 reviews, a remarkable result, for which I am grateful to the members of the Program Committee. In a few cases, we conducted an on-line discussion of the reviews and author responses, before reaching a final decision on the selected papers. The result is, I believe, a high-quality program that covers a wide spectrum of topics, including semantics-driven information retrieval, semantic agent, intelligent e-Technologies, linked data applications, web mining, knowledge representation formalisms, semantic search, social network analysis and ontology engineering.
TL;DR: Predictions are made of how future changes in the Web will eventually bring about changes in e-Learning systems, potentially leading to virtual spaces of collaborative knowledge centered on active learning, student-centered applications, 3D visualization and intelligent agents based on semantic machines to permit students easy, intuitive access to information.
Abstract: It is widely accepted that the WWW has evolved consistently over the years. Early Web tools were simple, but as information technology and internet speeds evolved, new tools would emerge, creating an interactive, user-centered space where information is shared among all. The next generation of the Web, the Web 3.0, will aim primarily at organizing it through intelligent agents and semantic standards. At the same time, one of the earliest and most popular uses of the Web, e-Learning, is also changing. Thus, much as the Web changed from a “read-only” medium, to “read-write” and to “read-write-collaborate”, so have the concept and methods of e-Learning changed from a simple transposition of educational material to online support, to entirely new approaches to education, centered on student’s active participation, interaction and collaboration. Web 3.0 will further emphasize this revolutionary approach, potentially leading to virtual spaces of collaborative knowledge centered on active learning, student-centered applications, 3D visualization and intelligent agents based on semantic machines to permit students easy, intuitive access to information. By taking note of the parallels between the evolution of the Web and of e-Learning, we can make predictions of how future changes in the Web will eventually bring about changes in e-Learning systems.
TL;DR: This work proposes different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores and shows that these strategies achieve considerable performance improvement over general-purpose Web crawling strategies in the discovery of sentimental Web content.
Abstract: Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available in the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content is mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and fetching the sentimental content present in the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvement over general-purpose Web crawling strategies in discovery of sentimental Web content.
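The core of such a strategy, prioritizing the crawl frontier by predicted sentiment, can be sketched with a priority queue. Everything specific below is an assumption made for illustration: the scoring function (share of anchor-text words found in a tiny hypothetical lexicon), the `GRAPH` dictionary that stands in for fetching and parsing pages, and the URL names. The paper's own predictors are not described in the abstract.

```python
import heapq

# Hypothetical opinion-word lexicon used to guess how sentimental a link target is.
OPINION_WORDS = {"love", "hate", "great", "awful", "review", "worst", "best"}


def predicted_sentiment(anchor_text):
    """Fraction of anchor-text words that appear in the opinion lexicon."""
    words = anchor_text.lower().split()
    return sum(w in OPINION_WORDS for w in words) / len(words) if words else 0.0


def crawl(graph, seed, limit):
    """Visit up to `limit` pages, always expanding the highest-scored URL first.

    `graph` maps a URL to a list of (linked_url, anchor_text) pairs and stands
    in for downloading a page and extracting its links.
    """
    frontier = [(0.0, seed)]   # min-heap of (negated score, url)
    seen = {seed}
    order = []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for target, anchor in graph.get(url, []):
            if target not in seen:
                seen.add(target)
                # Negate the score so the most sentimental URL is popped first.
                heapq.heappush(frontier, (-predicted_sentiment(anchor), target))
    return order


# Hypothetical link graph.
GRAPH = {
    "seed": [
        ("a", "company history"),
        ("b", "worst phone review"),
        ("c", "great camera"),
    ],
    "b": [("d", "awful")],
}

visit_order = crawl(GRAPH, "seed", 5)
```

A general-purpose breadth-first crawler would visit `a`, `b`, `c` in link order; here the page reached through the most opinionated anchor text is fetched first, and its own outgoing links compete in the same queue.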
TL;DR: In this article, the authors discuss the use of IR technology for handling annotations in Semantic Web (SW) languages and discuss the knowledge representation languages used for retrieving information from documents.
Abstract: A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information in them is a very cumbersome task. There is a need to organize data in a formal manner so that users can easily access and use it. To retrieve information from documents, we have many Information Retrieval (IR) techniques. Current IR techniques are not advanced enough to exploit semantic knowledge within documents and give precise results. IR technology is a major factor in handling annotations in Semantic Web (SW) languages, and in the present paper the knowledge representation languages used for retrieving information are discussed.
TL;DR: This paper develops an efficient approach for automatic composition of Web services using the state-of-the-art Artificial Intelligence (AI) planners, where a Web service composition (WSC) problem is regarded as a WSC planning problem.
Abstract: Web services as independent software components are published by service providers over the Internet and invoked by service requesters for their desired functionalities. In many cases, however, there is no single service in a Web service repository satisfying a service request. So how to design an efficient method for composing a chain of connected services has become an important research issue. Recently, much research has been done into the search time reduction when finding a composite service. However, most methods take a long time for traversing all of the Web services in a service repository, thus it makes their response time significantly overrun a user's waiting patience. This paper develops an efficient approach for automatic composition of Web services using the state-of-the-art Artificial Intelligence (AI) planners, where a Web service composition (WSC) problem is regarded as a WSC planning problem. Unlike most traditional WSC methods that traverse a Web service repository many times, our approach converts a Web service repository into a planning domain in PDDL just once, which will only be regenerated when the Web service repository changes. This treatment substantially reduces the response time and improves the scalability of solving WSC problems. We have implemented a prototype system and conducted extensive experiments on large-scale Web service repositories. The experimental results demonstrate that our proposed approach outperforms the state-of-the-art.
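The one-time conversion of a repository into a planning domain can be sketched as follows. The encoding is an assumption made for illustration (each concept becomes a zero-argument predicate and each service one STRIPS action whose preconditions are its inputs and whose effects are its outputs); the abstract does not specify the paper's actual PDDL encoding, and the service and concept names are hypothetical.

```python
def to_pddl_domain(name, services):
    """Render a service repository as a propositional PDDL (STRIPS) domain.

    `services` maps a service name to an (inputs, outputs) pair of concept
    names. Names are assumed to already be valid PDDL identifiers.
    """
    concepts = sorted({c for ins, outs in services.values() for c in (*ins, *outs)})
    lines = [
        f"(define (domain {name})",
        "  (:requirements :strips)",
        "  (:predicates",
    ]
    lines += [f"    (have-{c})" for c in concepts]
    lines.append("  )")
    for service, (ins, outs) in services.items():
        pre = " ".join(f"(have-{c})" for c in ins)
        eff = " ".join(f"(have-{c})" for c in outs)
        lines += [
            f"  (:action {service}",
            "    :parameters ()",
            f"    :precondition (and {pre})",
            f"    :effect (and {eff}))",
        ]
    lines.append(")")
    return "\n".join(lines)


# Hypothetical repository: geocode feeds forecast.
SERVICES = {
    "geocode": (["address"], ["coordinates"]),
    "forecast": (["coordinates"], ["weather"]),
}

domain = to_pddl_domain("travel", SERVICES)
```

With the domain generated once, each service request only needs a small problem file stating the known inputs as the initial state and the requested outputs as the goal, which an off-the-shelf planner can then solve without re-traversing the repository.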
TL;DR: Social networks were the most widely adopted while social bookmarking and tagging were the least used applications, and Web 2.0 utilization in African academic libraries was still in early stages.
Abstract: Purpose – This study aims to explore the extent of Web 2.0 adoption by libraries of top universities in Africa. It focuses on identifying the extent of utilization, types of Web 2.0 technologies adopted and how these technologies are used. Design/methodology/approach – The content analysis method was used. Data was collected by analyzing library websites of 82 top universities in Sub-Saharan Africa. Also, a combination of literature review and document analysis was applied. Findings – About half of the libraries in the study adopted one or more Web 2.0 applications. Social networks were the most widely adopted while social bookmarking and tagging were the least used applications. Web 2.0 utilization in African academic libraries was still in early stages. Research limitations/implications – This study is mainly based on analysis of library websites. Web 2.0 platforms that were password protected and accessible through intranet were not studied. Therefore, studies that are based on feedback of librarians a...
TL;DR: This book is intended to be a textbook about the Semantic Web and related topics, and is based on successful courses taught by the authors, and describes not only the theoretical issues underlying the semantic web, but also practical matters (such as algorithms, optimisation ideas and implementation details).
Abstract: The Semantic Web is a new area of research and development in the field of computer science, which aims to make it easier for computers to process the huge amount of information on the Web, and indeed other large databases, by enabling computers not only to read, but also understand the information. This book is intended to be a textbook about the Semantic Web and related topics, and is based on successful courses taught by the authors. They describe not only the theoretical issues underlying the semantic web, but also practical matters (such as algorithms, optimisation ideas and implementation details), and this aspect will make the book valuable to practitioners as well. Supplementary materials available via the web include the source code of program examples, and the syntactic description of various languages.
TL;DR: The practical scan results show that the W3af security scanning service can not only detect the Clickjacking vulnerabilities brought by HTML5, but also provide efficient Web application security scanning and evaluation services for websites.
Abstract: Web applications have changed remarkably in the past few years, and many new technologies are reshaping them. As many vendors promote HTML5, more and more websites are gradually adopting it. The new technology provides users with a variety of Internet applications, but it also introduces new security problems. Currently, most Web application scanners cannot detect security problems in HTML5 features, which makes HTML5 security issues a blind spot in the vulnerability scanning process. The paper first surveys existing Web application scanners. We then selected W3af (Web Application Attack and Audit Framework) as the base platform and, by customizing scanning modules and scripts, designed a Web application security scanning service. The practical scan results show that it can not only detect the Clickjacking vulnerabilities brought by HTML5, but also provide efficient Web application security scanning and evaluation services for websites.
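The abstract above does not say which checks the customized W3af modules perform. As a rough illustration of one common Clickjacking heuristic, the sketch below inspects response headers for anti-framing protection (`X-Frame-Options` or a CSP `frame-ancestors` directive). The function name is hypothetical, this is not W3af's plugin API, and the CSP parsing is deliberately simplified.

```python
def clickjacking_protected(headers):
    """Return True if the response headers restrict framing of the page.

    `headers` is a plain dict of header name -> value. Header names are
    compared case-insensitively.
    """
    normalized = {k.lower(): v for k, v in headers.items()}

    # X-Frame-Options: DENY or SAMEORIGIN restricts framing.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return True

    # Content-Security-Policy: look for a frame-ancestors directive.
    # Simplification: any source list without a bare "*" counts as protected.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" not in parts[1:]

    return False
```

A scanner would flag a page as potentially vulnerable when this returns `False`; a real audit would also need to consider script-based frame-busting and per-page context.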
TL;DR: This paper focuses on classifying users on the basis of patterns discovered from web logs, using a three-step framework whose results website administrators can use for efficient administration and personalization of their websites.
Abstract: Web usage mining is a special area of web mining based on the discovery and analysis of web usage patterns from web logs, so as to serve the needs of users visiting websites effectively and efficiently. The main focus of this paper is on classifying users on the basis of patterns discovered from web logs. Our proposed framework is based on three steps. In the first step, preprocessing removes useless data from the web log file to reduce its size. In the second step, the cleaned log file is used for discovering usage patterns. Finally, the discovered patterns lead to the classification of users: by country; by whether they entered the site directly or were referred by another site; and by time of access, i.e., by season, month, or day. Website administrators can then use this information for efficient administration and personalization of their websites, so that the specific needs of specific communities of users can be met and profit can be increased.
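The three steps in the abstract above can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's framework: it assumes logs in the Apache combined format, treats static assets and non-2xx responses as the "useless data", and omits classification by country because that would require an external IP geolocation database. All function and constant names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)"'
)
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def preprocess(lines):
    """Step 1: drop malformed lines, static assets, and non-2xx responses."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        rec = match.groupdict()
        if rec["path"].lower().endswith(STATIC_SUFFIXES):
            continue
        if not rec["status"].startswith("2"):
            continue
        rec["when"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
        records.append(rec)
    return records


def classify(records):
    """Steps 2 and 3: count requests by entry type and by month of access."""
    entry = Counter(
        "direct" if r["referrer"] in ("", "-") else "referred" for r in records
    )
    month = Counter(r["when"].month for r in records)
    return {"entry": entry, "month": month}
```

Calling `classify(preprocess(open_log_lines))` on an iterable of log lines yields two counters that an administrator could inspect or feed into further analysis.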