Technology Intelligence Analysis Based on Document Embedding Techniques for Oil and Gas Domain

doi:10.4043/29707-MS

- 28 Oct 2019

3

TL;DR: The novelty of this proposed methodology is the possibility of exploring new insights when correlating different entities in a technology intelligence scenario for the Oil and Gas domain, using a simple yet efficient approach based on document embedding techniques.

Abstract: We propose a methodology based on document embedding techniques for applying Technology Intelligence Analysis in the Oil and Gas (O&G) domain. We build a specialized O&G corpus and train a Vector Space Model (VSM) that represents each document as a vector, such that the distance between two vectors captures their semantic similarity. We explore different analyses on this VSM to infer relations between documents and obtain new insights in a strategic context. The proposed methodology uses Natural Language Processing (NLP) techniques to obtain strategic insights in a technology intelligence analysis scenario. It consists of generating a VSM induced from a domain-specific Oil and Gas corpus composed of thousands of scientific articles collected from the Elsevier online database. We explore an approach that represents different entities, such as articles, authors and keywords, in the same vector space, making it possible to correlate them and infer similarity relations from their cosine distance. An evaluation metric is also provided to assist the training process and hyperparameter optimization. The highly technical Oil and Gas vocabulary is a challenge for NLP applications, since some terms may take on a completely different meaning from the one they have in the general-context domain. In this scenario, gathering an Oil and Gas corpus and training specialized vector space models for this domain increases the quality of Technology Intelligence Analysis. The most significant finding is that we were able to make explicit the semantic relationships between different entities of interest in the same VSM, and to link these relationships with additional metadata. An interesting application is to compare the publications of authors affiliated with two or more O&G companies at a given time. These non-trivial correlations are important for gaining strategic insights in a Technology Intelligence Analysis scenario. The novelty of the proposed methodology is the possibility of exploring new insights by correlating different entities in a technology intelligence scenario for the Oil and Gas domain, using a simple yet efficient approach based on document embedding techniques. The method applies advanced NLP techniques to process more than a hundred thousand documents in a few seconds without requiring complex hardware resources, which would be impractical using traditional techniques.
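The idea of placing articles and authors in one vector space and comparing them by cosine can be illustrated with a minimal Python sketch. The abstract does not say how entity vectors are obtained, so everything specific here is an assumption made for illustration: the three-dimensional toy vectors, the identifiers (`article_drilling_1`, `author_a`, and so on), and in particular the choice to build an author vector as the mean of that author's article vectors. Cosine distance, as used in the abstract, is one minus the cosine similarity computed below.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


# Hypothetical article embeddings (made-up values, not from the paper).
article_vectors = {
    "article_drilling_1": [0.9, 0.1, 0.0],
    "article_drilling_2": [0.8, 0.2, 0.1],
    "article_seismic_1": [0.1, 0.9, 0.2],
}

# Hypothetical authorship metadata.
author_articles = {
    "author_a": ["article_drilling_1", "article_drilling_2"],
    "author_b": ["article_seismic_1"],
}

# Assumption: an author is represented by the mean of their article vectors,
# which places authors in the same space as articles.
author_vectors = {
    author: mean_vector([article_vectors[a] for a in articles])
    for author, articles in author_articles.items()
}


def rank_by_similarity(query, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Which author is closest to a given article?
ranking = rank_by_similarity(article_vectors["article_drilling_1"], author_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same `rank_by_similarity` call works for any pair of entity types (article to keyword, author to author) once their vectors live in the same space, which is what makes cross-entity comparisons such as the company comparison mentioned in the abstract possible.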

Citations

Other•10.1002/9781119879893.refs

References

26 Aug 2022

Journal Article•10.1017/s0890060424000040

Finite-element analysis case retrieval based on an ontology semantic tree

Xuesong Xu, +4 more

- Ai Edam Artificial Intelligence for Engi...

TL;DR: A novel method for measuring semantic similarity between FEA cases based on an ontology semantic tree is proposed. The method utilizes named entity recognition technology and a multitree algorithm to retrieve relevant cases.

Journal Article•10.1016/J.COMPIND.2020.103347

Portuguese word embeddings for the oil and gas industry: Development and evaluation

Diogo da Silva Magalhães Gomes, +9 more

- 01 Jan 2021

- Computers in Industry

TL;DR: In this paper, a representative set of word embedding models for the specific domain of oil and gas in Portuguese is proposed, and the results suggest that their domain-specific models outperformed the general model on their ability to represent specialized terminology.

References

•Journal Article

Visualizing Data using t-SNE

Laurens van der Maaten, +1 more

- 01 Jan 2008

- Journal of Machine Learning Research

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

45.8K

•Proceedings Article

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

27.5K

Software Framework for Topic Modelling with Large Corpora

Radim Řehůřek, +1 more

- 22 May 2010

TL;DR: This work describes a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation in a way that makes them completely independent of the training corpus size.

4.7K

Journal Article•10.1109/MCI.2018.2840738

Recent Trends in Deep Learning Based Natural Language Processing [Review Article]

Tom Young, +3 more

- 20 Jul 2018

- IEEE Computational Intelligence Magazine

TL;DR: This paper reviews significant deep learning related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.

3.4K

•Journal Article•10.1613/JAIR.2934

From frequency to meaning: vector space models of semantics

Peter D. Turney, +1 more

- 01 Jan 2010

- Journal of Artificial Intelligence Resea...

TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.

3.3K