Data lakes

Bibliographic Details
Other Authors: Laurent, Anne, 1976-, Laurent, Dominique, Madera, Cédrine
Format: E-book
Language: English
Published: London : Hoboken : ISTE, Ltd. ; Wiley 2020.
Series: Wiley ebooks.
Computer engineering series, databases and big data set ; 2.
Online Access: Connect to the electronic version
View at Universidad de Navarra: https://innopac.unav.es/record=b42152094*spi
Table of Contents:
  • Cover
  • Half-Title Page
  • Dedication
  • Title Page
  • Copyright Page
  • Contents
  • Preface
  • 1. Introduction to Data Lakes: Definitions and Discussions
  • 1.1. Introduction to data lakes
  • 1.2. Literature review and discussion
  • 1.3. The data lake challenges
  • 1.4. Data lakes versus decision-making systems
  • 1.5. Urbanization for data lakes
  • 1.6. Data lake functionalities
  • 1.7. Summary and concluding remarks
  • 2. Architecture of Data Lakes
  • 2.1. Introduction
  • 2.2. State of the art and practice
  • 2.2.1. Definition
  • 2.2.2. Architecture
  • 2.2.3. Metadata
  • 2.2.4. Data quality
  • 2.2.5. Schema-on-read
  • 2.3. System architecture
  • 2.3.1. Ingestion layer
  • 2.3.2. Storage layer
  • 2.3.3. Transformation layer
  • 2.3.4. Interaction layer
  • 2.4. Use case: the Constance system
  • 2.4.1. System overview
  • 2.4.2. Ingestion layer
  • 2.4.3. Maintenance layer
  • 2.4.4. Query layer
  • 2.4.5. Data quality control
  • 2.4.6. Extensibility and flexibility
  • 2.5. Concluding remarks
  • 3. Exploiting Software Product Lines and Formal Concept Analysis for the Design of Data Lake Architectures
  • 3.1. Our expectations
  • 3.2. Modeling data lake functionalities
  • 3.3. Building the knowledge base of industrial data lakes
  • 3.4. Our formalization approach
  • 3.5. Applying our approach
  • 3.6. Analysis of our first results
  • 3.7. Concluding remarks
  • 4. Metadata in Data Lake Ecosystems
  • 4.1. Definitions and concepts
  • 4.2. Classification of metadata by NISO
  • 4.2.1. Metadata schema
  • 4.2.2. Knowledge base and catalog
  • 4.3. Other categories of metadata
  • 4.3.1. Business metadata
  • 4.3.2. Navigational integration
  • 4.3.3. Operational metadata
  • 4.4. Sources of metadata
  • 4.5. Metadata classification
  • 4.6. Why metadata are needed
  • 4.6.1. Selection of information (re)sources
  • 4.6.2. Organization of information resources
  • 4.6.3. Interoperability and integration
  • 4.6.4. Unique digital identification
  • 4.6.5. Data archiving and preservation
  • 4.7. Business value of metadata
  • 4.8. Metadata architecture
  • 4.8.1. Architecture scenario 1: point-to-point metadata architecture
  • 4.8.2. Architecture scenario 2: hub and spoke metadata architecture
  • 4.8.3. Architecture scenario 3: tool of record metadata architecture
  • 4.8.4. Architecture scenario 4: hybrid metadata architecture
  • 4.8.5. Architecture scenario 5: federated metadata architecture
  • 4.9. Metadata management
  • 4.10. Metadata and data lakes
  • 4.10.1. Application and workload layer
  • 4.10.2. Data layer
  • 4.10.3. System layer
  • 4.10.4. Metadata types
  • 4.11. Metadata management in data lakes
  • 4.11.1. Metadata directory
  • 4.11.2. Metadata storage
  • 4.11.3. Metadata discovery
  • 4.11.4. Metadata lineage
  • 4.11.5. Metadata querying
  • 4.11.6. Data source selection
  • 4.12. Metadata and master data management
  • 4.13. Conclusion
  • 5. A Use Case of Data Lake Metadata Management
  • 5.1. Context
  • 5.1.1. Data lake definition