About: Unstructured data is a research topic. Over the lifetime, 5112 publications have been published within this topic receiving 66632 citations. The topic is also known as: unstructured information.
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
TL;DR: The current status of AI applications in healthcare, in the three major areas of early detection and diagnosis, treatment, as well as outcome prediction and prognosis evaluation, are surveyed and its future is discussed.
Abstract: Artificial intelligence (AI) aims to mimic human cognitive functions. It is bringing a paradigm shift to healthcare, powered by increasing availability of healthcare data and rapid progress of analytics techniques. We survey the current status of AI applications in healthcare and discuss its future. AI can be applied to various types of healthcare data (structured and unstructured). Popular AI techniques include machine learning methods for structured data, such as the classical support vector machine and neural network, and the modern deep learning, as well as natural language processing for unstructured data. Major disease areas that use AI tools include cancer, neurology and cardiology. We then review in more details the AI applications in stroke, in the three major areas of early detection and diagnosis, treatment, as well as outcome prediction and prognosis evaluation. We conclude with discussion about pioneer AI systems, such as IBM Watson, and hurdles for real-life deployment of AI.
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract: 1. Introduction to text mining 2. Core text mining operations 3. Text mining preprocessing techniques 4. Categorization 5. Clustering 6. Information extraction 7. Probabilistic models for Information extraction 8. Preprocessing applications using probabilistic and hybrid approaches 9. Presentation-layer considerations for browsing and query refinement 10. Visualization approaches 11. Link analysis 12. Text mining applications Appendix Bibliography.
TL;DR: A heterogeneous network embedding method is adopted, termed as TransR, to extract items' structural representations by considering the heterogeneity of both nodes and relationships and a final integrated framework, which is termed as Collaborative Knowledge Base Embedding (CKE), to jointly learn the latent representations in collaborative filtering.
Abstract: Among different recommendation techniques, collaborative filtering usually suffer from limited performance due to the sparsity of user-item interactions. To address the issues, auxiliary information is usually used to boost the performance. Due to the rapid collection of information on the web, the knowledge base provides heterogeneous information including both structured and unstructured data with different semantics, which can be consumed by various applications. In this paper, we investigate how to leverage the heterogeneous information in a knowledge base to improve the quality of recommender systems. First, by exploiting the knowledge base, we design three components to extract items' semantic representations from structural content, textual content and visual content, respectively. To be specific, we adopt a heterogeneous network embedding method, termed as TransR, to extract items' structural representations by considering the heterogeneity of both nodes and relationships. We apply stacked denoising auto-encoders and stacked convolutional auto-encoders, which are two types of deep learning based embedding techniques, to extract items' textual representations and visual representations, respectively. Finally, we propose our final integrated framework, which is termed as Collaborative Knowledge Base Embedding (CKE), to jointly learn the latent representations in collaborative filtering as well as items' semantic representations from the knowledge base. To evaluate the performance of each embedding component as well as the whole system, we conduct extensive experiments with two real-world datasets from different scenarios. The results reveal that our approaches outperform several widely adopted state-of-the-art recommendation methods.
TL;DR: The application of big data to health care is discussed, using an economic framework to highlight the opportunities it will offer and the roadblocks to implementation, and suggests that leveraging the collection of patient and practitioner data could be an important way to improve quality and efficiency of health care delivery.
Abstract: THE AMOUNT OF DATA BEING DIGITALLY COLLECTED AND stored is vast and expanding rapidly. As a result, the science of data management and analysis is also advancing to enable organizations to convert this vast resource into information and knowledge that helps them achieve their objectives. Computer scientists have invented the term big data to describe this evolving technology. Big data has been successfully used in astronomy (eg, the Sloan Digital Sky Survey of telescopic information), retail sales (eg, Walmart’s expansive number of transactions), search engines (eg, Google’s customization of individual searches based on previous web data), and politics (eg, a campaign’s focus of political advertisements on people most likely to support their candidate based on web searches). In this Viewpoint, we discuss the application of big data to health care, using an economic framework to highlight the opportunities it will offer and the roadblocks to implementation. We suggest that leveraging the collection of patient and practitioner data could be an important way to improve quality and efficiency of health care delivery. Widespread uptake of electronic health records (EHRs) has generated massive data sets. A survey by the American Hospital Association showed that adoption of EHRs has doubled from 2009 to 2011, partly a result of funding provided by the Health Information Technology for Economic and Clinical Health Act of 2009. Most EHRs now contain quantitative data (eg, laboratory values), qualitative data (eg, text-based documents and demographics), and transactional data (eg, a record of medication delivery). However, much of this rich data set is currently perceived as a byproduct of health care delivery, rather than a central asset to improve its efficiency. The transition of data from refuse to riches has been key in the big data revolution of other industries. Advances in analytic techniques in the computer sciences, especially in machine learning, have been a major catalyst for dealing with these large information sets. These analytic techniques are in contrast to traditional statistical methods (derived from the social and physical sciences), which are largely not useful for analysis of unstructured data such as text-based documents that do not fit into relational tables. One estimate suggests that 80% of business-related data exist in an unstructured format. The same could probably be said for health care data, a large proportion of which is text-based. In contrast to most consumer service industries, medicine adopted a practice of generating evidence from experimental (randomized trials) and quasi-experimental studies to inform patients and clinicians. The evidence-based movement is founded on the belief that scientific inquiry is superior to expert opinion and testimonials. In this way, medicine was ahead of many other industries in terms of recognizing the value of data and information guiding rational decision making. However, health care has lagged in uptake of newer techniques to leverage the rich information contained in EHRs. There are 4 ways big data may advance the economic mission of health care delivery by improving quality and efficiency. First, big data may greatly expand the capacity to generate new knowledge. The cost of answering many clinical questions prospectively, and even retrospectively, by collecting structured data is prohibitive. Analyzing the unstructured data contained within EHRs using computational techniques (eg, natural language processing to extract medical concepts from free-text documents) permits finer data acquisition in an automated fashion. For instance, automated identification within EHRs using natural language processing was superior in detecting postoperative complications compared with patient safety indicators based on discharge coding. Big data offers the potential to create an observational evidence base for clinical questions that would otherwise not be possible and may be especially helpful with issues of generalizability. The latter issue limits the application of conclusions derived from randomized trials performed on a narrow spectrum of participants to patients who exhibit very different characteristics. Second, big data may help with knowledge dissemination. Most physicians struggle to stay current with the latest evidence guiding clinical practice. The digitization of medical literature has greatly improved access; however, the sheer