Unstructured data

Papers published on a yearly basis

Papers

Journal Article•10.1016/J.IJINFOMGT.2014.10.007•

Beyond the hype

Amir H. Gandomi¹, Murtaza Haider¹•Institutions (1)

Ryerson University¹

01 Apr 2015-International Journal of Information Management

TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.

3,982 citations

Journal Article•10.1136/SVN-2017-000101•

Artificial intelligence in healthcare: past, present and future

Fei Jiang¹, Yong Jiang², Hui Zhi³, Yi Dong⁴, Hao Li, Sufeng Ma, Yilong Wang, Qiang Dong⁴, Haipeng Shen¹, Yongjun Wang•Institutions (4)

University of Hong Kong¹, Capital Medical University², Li Ka Shing Faculty of Medicine, University of Hong Kong³, Fudan University⁴

1 Dec 2017

TL;DR: The current status of AI applications in healthcare is surveyed across the three major areas of early detection and diagnosis, treatment, and outcome prediction and prognosis evaluation, and the future of AI in healthcare is discussed.

Abstract: Artificial intelligence (AI) aims to mimic human cognitive functions. It is bringing a paradigm shift to healthcare, powered by increasing availability of healthcare data and rapid progress of analytics techniques. We survey the current status of AI applications in healthcare and discuss its future. AI can be applied to various types of healthcare data (structured and unstructured). Popular AI techniques include machine learning methods for structured data, such as the classical support vector machine and neural network, and modern deep learning, as well as natural language processing for unstructured data. Major disease areas that use AI tools include cancer, neurology and cardiology. We then review in more detail the AI applications in stroke, in the three major areas of early detection and diagnosis, treatment, as well as outcome prediction and prognosis evaluation. We conclude with a discussion of pioneering AI systems, such as IBM Watson, and hurdles for real-life deployment of AI.

3,329 citations

Book•

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

Ronen Feldman¹, James Sanger•Institutions (1)

Hebrew University of Jerusalem¹

1 Dec 2006

TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.

Abstract:
1. Introduction to text mining
2. Core text mining operations
3. Text mining preprocessing techniques
4. Categorization
5. Clustering
6. Information extraction
7. Probabilistic models for information extraction
8. Preprocessing applications using probabilistic and hybrid approaches
9. Presentation-layer considerations for browsing and query refinement
10. Visualization approaches
11. Link analysis
12. Text mining applications
Appendix
Bibliography

1,979 citations

Proceedings Article•10.1145/2939672.2939673•

Collaborative Knowledge Base Embedding for Recommender Systems

Fuzheng Zhang¹, Nicholas Jing Yuan¹, Defu Lian², Xing Xie¹, Wei-Ying Ma¹•Institutions (2)

Microsoft¹, University of Electronic Science and Technology of China²

13 Aug 2016

TL;DR: A heterogeneous network embedding method, TransR, is adopted to extract items' structural representations by considering the heterogeneity of both nodes and relationships, and an integrated framework, Collaborative Knowledge Base Embedding (CKE), is proposed to jointly learn the latent representations in collaborative filtering and items' semantic representations from the knowledge base.

Abstract: Among different recommendation techniques, collaborative filtering usually suffers from limited performance due to the sparsity of user-item interactions. To address this issue, auxiliary information is usually used to boost the performance. Due to the rapid collection of information on the web, the knowledge base provides heterogeneous information including both structured and unstructured data with different semantics, which can be consumed by various applications. In this paper, we investigate how to leverage the heterogeneous information in a knowledge base to improve the quality of recommender systems. First, by exploiting the knowledge base, we design three components to extract items' semantic representations from structural content, textual content and visual content, respectively. To be specific, we adopt a heterogeneous network embedding method, termed TransR, to extract items' structural representations by considering the heterogeneity of both nodes and relationships. We apply stacked denoising auto-encoders and stacked convolutional auto-encoders, which are two types of deep-learning-based embedding techniques, to extract items' textual representations and visual representations, respectively. Finally, we propose our final integrated framework, termed Collaborative Knowledge Base Embedding (CKE), to jointly learn the latent representations in collaborative filtering as well as items' semantic representations from the knowledge base. To evaluate the performance of each embedding component as well as the whole system, we conduct extensive experiments with two real-world datasets from different scenarios. The results reveal that our approaches outperform several widely adopted state-of-the-art recommendation methods.

1,907 citations

Journal Article•10.1001/JAMA.2013.393•

The inevitable application of big data to health care.

Travis B. Murdoch¹, Allan S. Detsky•Institutions (1)

University of Calgary¹

03 Apr 2013-JAMA

TL;DR: This Viewpoint discusses the application of big data to health care, using an economic framework to highlight the opportunities it will offer and the roadblocks to implementation, and suggests that leveraging the collection of patient and practitioner data could be an important way to improve quality and efficiency of health care delivery.

Abstract: The amount of data being digitally collected and stored is vast and expanding rapidly. As a result, the science of data management and analysis is also advancing to enable organizations to convert this vast resource into information and knowledge that helps them achieve their objectives. Computer scientists have invented the term big data to describe this evolving technology. Big data has been successfully used in astronomy (eg, the Sloan Digital Sky Survey of telescopic information), retail sales (eg, Walmart’s expansive number of transactions), search engines (eg, Google’s customization of individual searches based on previous web data), and politics (eg, a campaign’s focus of political advertisements on people most likely to support their candidate based on web searches). In this Viewpoint, we discuss the application of big data to health care, using an economic framework to highlight the opportunities it will offer and the roadblocks to implementation. We suggest that leveraging the collection of patient and practitioner data could be an important way to improve quality and efficiency of health care delivery.
Widespread uptake of electronic health records (EHRs) has generated massive data sets. A survey by the American Hospital Association showed that adoption of EHRs has doubled from 2009 to 2011, partly a result of funding provided by the Health Information Technology for Economic and Clinical Health Act of 2009. Most EHRs now contain quantitative data (eg, laboratory values), qualitative data (eg, text-based documents and demographics), and transactional data (eg, a record of medication delivery). However, much of this rich data set is currently perceived as a byproduct of health care delivery, rather than a central asset to improve its efficiency. The transition of data from refuse to riches has been key in the big data revolution of other industries.
Advances in analytic techniques in the computer sciences, especially in machine learning, have been a major catalyst for dealing with these large information sets. These analytic techniques are in contrast to traditional statistical methods (derived from the social and physical sciences), which are largely not useful for analysis of unstructured data such as text-based documents that do not fit into relational tables. One estimate suggests that 80% of business-related data exist in an unstructured format. The same could probably be said for health care data, a large proportion of which is text-based.
In contrast to most consumer service industries, medicine adopted a practice of generating evidence from experimental (randomized trials) and quasi-experimental studies to inform patients and clinicians. The evidence-based movement is founded on the belief that scientific inquiry is superior to expert opinion and testimonials. In this way, medicine was ahead of many other industries in terms of recognizing the value of data and information guiding rational decision making. However, health care has lagged in uptake of newer techniques to leverage the rich information contained in EHRs.
There are 4 ways big data may advance the economic mission of health care delivery by improving quality and efficiency. First, big data may greatly expand the capacity to generate new knowledge. The cost of answering many clinical questions prospectively, and even retrospectively, by collecting structured data is prohibitive. Analyzing the unstructured data contained within EHRs using computational techniques (eg, natural language processing to extract medical concepts from free-text documents) permits finer data acquisition in an automated fashion. For instance, automated identification within EHRs using natural language processing was superior in detecting postoperative complications compared with patient safety indicators based on discharge coding. Big data offers the potential to create an observational evidence base for clinical questions that would otherwise not be possible and may be especially helpful with issues of generalizability. The latter issue limits the application of conclusions derived from randomized trials performed on a narrow spectrum of participants to patients who exhibit very different characteristics.
Second, big data may help with knowledge dissemination. Most physicians struggle to stay current with the latest evidence guiding clinical practice. The digitization of medical literature has greatly improved access; however, the sheer

1,730 citations

Year	Papers
2026	3
2025	78
2024	112
2023	131
2022	219
2021	437
