Proceedings Article · DOI: 10.1109/CYBER.2015.7288049
Managing data lakes in big data era: What's a data lake and why has it became popular in data management ecosystem
Huang Fang
- 08 Jun 2015
- pp 820-824
191
Abstract: The concept of a data lake is emerging as a popular way to organize and build the next generation of systems to master new big data challenges, but large enterprises face many concerns and questions when implementing data lakes. The paper discusses the concept of data lakes and shares the author's thoughts on, and practical experience with, data lakes.
Citations
Open Data Lake to Support Machine Learning on Arctic Big Data
Anifat M. Olawoyin, Carson K. Leung, Alfredo Cuzzocrea +2 more
TL;DR: This paper proposes a conceptual model for an open data lake that integrates diverse Arctic big data, uses a data-driven approach to manage structured, semi-structured, and unstructured data, and supports machine learning applications on these data.
AIRM: A New AI Recruiting Model for the Saudi Arabia Labor Market
Monirah Ali Aleisa, Natalia Beloff, Martin White +2 more
- 02 Sep 2021
TL;DR: This paper proposes a new data storage approach and a three-layer artificial intelligence architecture that extracts relevant information from the data of both recruiters and job seekers using machine learning: clustering algorithms to group data points, natural language processing to convert text into numerical representations, recurrent neural networks to produce matching keywords, and equations to generate a similarity score.
Towards a More Generic and Elastic Metadata Management Model in a Data Lake Environment
Safiatou Sore, Frédéric Ouédraogo, Moustapha Bikienga, Yaya Traoré +3 more
- 02 Feb 2024
TL;DR: A generic and scalable metadata management model for data lakes is presented to address the challenges associated with the "data swamp". The model promotes generality and scalability by dynamically provisioning and resizing resources based on demand.
Data lakes versus data warehouses: choosing the right approach for big data analytics
Saliha Mezzoudj, Meriem Khelifa, Yassmina Saadna +2 more
Abstract: In the era of big data, organizations face critical decisions when selecting between data lakes and data warehouses to meet their analytics requirements. This article presents a comprehensive comparative analysis of these two predominant data management architectures, emphasizing their structural differences, functional capabilities, and suitability for diverse analytics workloads. Data lakes offer scalable, cost-effective storage for raw, unstructured, and semi-structured data, supporting advanced analytics and machine learning applications. In contrast, data warehouses provide optimized, schema-on-write frameworks for fast querying and reliable reporting on structured data. Through a detailed examination of architectural designs, integration with big data tools including Hadoop, Spark, and Kafka, and evaluations based on performance, scalability, cost, and governance, this paper provides organizations with evidence-based guidance for aligning their data strategies with business objectives. Case studies from the healthcare and retail sectors illustrate the practical implications of each approach, while emerging trends such as lakehouse architectures, AI integration, blockchain security, edge computing, and quantum computing highlight future directions. The findings support a hybrid data management solution that leverages the strengths of both data lakes and warehouses to enable robust, scalable, and innovative big data analytics.
A Review on Data Lake
Shravan R Poojary, Pradeep Nayak, Shravitha, Shreya Rai, Shrujan Kumar, Chandana N M +5 more
TL;DR: Data lakes, one of the more contentious ideas to emerge in the big data era, have the potential to reshape the data environment and are a worthwhile subject of research.