Engineering agile big-data systems

To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, th...

Bibliographic Details
Other Authors: Feeney, Kevin (editor)
Format: E-book
Language: English
Published: Gistrup, Denmark : River Publishers 2018.
Edition: 1st ed
Series: River Publishers series in software engineering.
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009703334806719
Table of Contents:
  • Front Cover
  • Half Title Page
  • RIVER PUBLISHERS SERIES IN SOFTWARE ENGINEERING
  • Title Page
  • Copyright Page
  • Contents
  • Preface
  • Acknowledgements
  • List of Contributors
  • List of Figures
  • List of Tables
  • List of Abbreviations
  • Chapter 1 - Introduction
  • 1.1 State of the Art in Engineering Data-Intensive Systems
  • 1.1.1 The Challenge
  • 1.2 State of the Art in Semantics-Driven Software Engineering
  • 1.2.1 The Challenge
  • 1.3 State of the Art in Data Quality Engineering
  • 1.3.1 The Challenge
  • 1.4 About ALIGNED
  • 1.5 ALIGNED Partners
  • 1.5.1 Trinity College Dublin
  • 1.5.2 Oxford University - Department of Computer Science
  • 1.5.3 Oxford University - School of Anthropology and Museum Ethnography
  • 1.5.4 University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)
  • 1.5.5 Semantic Web Company
  • 1.5.6 Wolters Kluwer Germany
  • 1.5.7 Adam Mickiewicz University in Poznań
  • 1.5.8 Wolters Kluwer Poland
  • 1.6 Structure
  • Chapter 2 - ALIGNED Use Cases - Data and Software Engineering Challenges
  • 2.1 Introduction
  • 2.2 The ALIGNED Use Cases
  • 2.2.1 Seshat: Global History Databank
  • 2.2.2 PoolParty Enterprise Application Demonstrator System
  • 2.2.3 DBpedia
  • 2.2.4 Jurion and Jurion IPG
  • 2.2.5 Health Data Management
  • 2.3 The ALIGNED Use Cases and Data Life Cycle. Major Challenges and Offered Solutions
  • 2.4 The ALIGNED Use Cases and Software Life Cycle. Major Challenges and Offered Solutions
  • 2.5 Conclusions
  • Chapter 3 - Methodology
  • 3.1 Introduction
  • 3.2 Software and Data Engineering Life Cycles
  • 3.2.1 Software Engineering Life Cycle
  • 3.2.2 Data Engineering Life Cycle
  • 3.3 Software Development Processes
  • 3.3.1 Model-Driven Approaches
  • 3.3.2 Formal Techniques
  • 3.3.3 Test-Driven Development
  • 3.4 Integration Points and Harmonisation
  • 3.4.1 Integration Points
  • 3.4.2 Barriers to Harmonisation
  • 3.4.3 Methodology Requirements
  • 3.5 An ALIGNED Methodology
  • 3.5.1 A General Framework for Process Management
  • 3.5.2 An Iterative Methodology and Illustration
  • 3.6 Recommendations
  • 3.6.1 Sample Methodology
  • 3.7 Sample Synchronisation Point Activities
  • 3.7.1 Model Catalogue: Analysis and Search/Browse/Explore
  • 3.7.2 Model Catalogue: Design and Classify/Enrich
  • 3.7.3 Semantic Booster: Implementation and Store/Query
  • 3.7.4 Semantic Booster: Maintenance and Search/Browse/Explore
  • 3.8 Summary
  • 3.8.1 Related Work
  • 3.9 Conclusions
  • Chapter 4 - ALIGNED MetaModel Overview
  • 4.1 Generic Metamodel
  • 4.1.1 Basic Approach
  • 4.1.2 Namespaces and URIs
  • 4.1.3 Expressivity of Vocabularies
  • 4.1.4 Reference Style for External Terms
  • 4.1.5 Links with W3C PROV
  • 4.2 ALIGNED Generic Metamodel
  • 4.2.1 Design Intent Ontology (DIO)
  • 4.3 Software Engineering
  • 4.3.1 Software Life Cycle Ontology
  • 4.3.2 Software Implementation Process Ontology (SIP)
  • 4.4 Data Engineering
  • 4.4.1 Data Life Cycle Ontology
  • 4.5 DBpedia DataID (DataID)
  • 4.6 Unified Quality Reports
  • 4.6.1 Reasoning Violation Ontology (RVO) Overview
  • 4.6.2 W3C SHACL Reporting Vocabulary
  • 4.6.3 Data Quality Vocabulary
  • 4.6.4 Test-Driven RDF Validation Ontology (RUT)
  • 4.6.5 Enterprise Software Development (DIOPP)
  • 4.6.6 Unified Governance Domain Ontologies
  • 4.6.7 Semantic Booster and Model Catalogue Domain Ontology
  • 4.6.7.1 Model catalogue
  • 4.6.7.2 Booster
  • 4.6.8 PROV
  • 4.6.9 SKOS
  • 4.6.10 OWL
  • 4.6.11 RDFS
  • 4.6.12 RDF
  • Chapter 5 - Tools
  • 5.1 Model Catalogue
  • 5.1.1 Introduction
  • 5.1.2 Model Catalogue
  • 5.1.2.1 Architecture
  • 5.1.2.2 Searching and browsing the catalogue
  • 5.1.2.3 Editing the catalogue contents
  • 5.1.2.4 Administration
  • 5.1.2.5 Eclipse integration and model-driven development
  • 5.1.2.6 Semantic reasoning
  • 5.1.2.7 Automation and search
  • 5.1.3 Semantic Booster
  • 5.1.3.1 Introduction
  • 5.1.3.2 Semantic Booster
  • 5.2 RDFUnit
  • 5.2.1 RDFUnit Integration
  • 5.2.1.1 JUnit XML report-based integration
  • 5.2.1.2 Custom Apache Maven-based integration
  • 5.2.1.3 The shapes constraint language (SHACL)
  • 5.2.1.4 Comparison of SHACL to schema definition using RDFUnit test patterns
  • 5.2.1.5 Comparison of SHACL to auto-generated RDFUnit tests from RDFS/OWL axioms
  • 5.2.1.6 Progress on the SHACL specification and standardisation process
  • 5.2.1.7 SHACL support in RDFUnit
  • 5.3 Expert Curation Tools and Workflows
  • 5.3.1 Requirements
  • 5.3.1.1 Graduated application of semantics
  • 5.3.1.2 Graph - object mapping
  • 5.3.1.3 Object/document level state management and versioning
  • 5.3.1.4 Object-based workflow interfaces
  • 5.3.1.5 Integrated, automated, constraint validation
  • 5.3.1.6 Result interpretation
  • 5.3.1.7 Deferred updates
  • 5.3.2 Workflow/Process Models
  • 5.3.2.1 Process model 1 - linked data object creation
  • 5.3.2.2 Process model 2 - linked data object updates
  • 5.3.2.3 Process model 3 - updates to deferred updates
  • 5.3.2.4 Process model 4 - schema updates
  • 5.3.2.5 Process model 5 - validating schema updates
  • 5.3.2.6 Process model 6 - named graph creation
  • 5.3.2.7 Process model 7 - instance data updates and named graphs
  • 5.4 Dacura Approval Queue Manager
  • 5.5 Dacura Linked Data Object Viewer
  • 5.5.1 CSP Design of Seshat Workflow Use Case
  • 5.5.2 Specification
  • 5.6 Dacura Quality Service
  • 5.6.1 Technical Overview of Dacura Quality Service
  • 5.6.2 Dacura Quality Service API
  • 5.6.2.1 Resource and interchange format
  • 5.6.2.2 URI
  • 5.6.2.3 Literals
  • 5.6.2.4 Literal types
  • 5.6.2.5 Quads
  • 5.6.2.6 POST variables
  • 5.6.2.7 Tests
  • 5.6.2.8 Required schema tests
  • 5.6.2.9 Schema tests
  • 5.6.2.10 Errors
  • 5.6.2.11 Endpoints
  • 5.7 Linked Data Model Mapping
  • 5.7.1 Interlink Validation Tool
  • 5.7.1.1 Interlink validation
  • 5.7.1.2 Technical overview
  • 5.7.1.3 Configuration via iv config.txt
  • 5.7.1.4 Configuration via external datasets.txt
  • 5.7.1.5 Execute the interlink validator tool
  • 5.7.2 Dacura Linked Model Mapper
  • 5.7.3 Model Mapper Service
  • 5.7.3.1 Modelling tool - creating mappings
  • 5.7.3.2 Importing semi-structured data with data harvesting tool
  • 5.8 Model-Driven Data Curation
  • 5.8.1 Dacura Quality Service Frame Generation
  • 5.8.2 Frames for User Interface Design
  • 5.8.3 Semi-Formal Frame Specification
  • 5.8.4 Frame API Endpoints
  • Chapter 6 - Use Cases
  • 6.1 Wolters Kluwer - Re-Engineering a Complex Relational Database Application
  • 6.1.1 Introduction
  • 6.1.2 Problem Statement
  • 6.1.3 Actors
  • 6.1.4 Implementation
  • 6.1.4.1 PoolParty notification extension
  • 6.1.4.2 rsine notification extension
  • 6.1.4.2.1 Results
  • 6.1.4.3 RDFUnit for data transformation
  • 6.1.4.4 PoolParty external link validity
  • 6.1.4.5 Statistical overview
  • 6.1.5 Evaluation
  • 6.1.5.1 Productivity
  • 6.1.5.2 Quality
  • 6.1.5.3 Agility
  • 6.1.5.4 Measuring overall value
  • 6.1.5.5 Data quality dimensions and thresholds
  • 6.1.5.6 Model agility
  • 6.1.5.7 Data agility
  • 6.1.6 JURION IPG
  • 6.1.6.1 Introduction
  • 6.1.6.2 Architecture
  • 6.1.6.3 Tools and features
  • 6.1.6.4 Implementation
  • 6.1.6.5 Evaluation
  • 6.1.6.6 Experimental evaluation
  • 6.2 Seshat - Collecting and Curating High-Value Datasets with the Dacura Platform
  • 6.2.1 Use Case
  • 6.2.1.1 Problem statement
  • 6.2.2 Architecture
  • 6.2.2.1 Tools and features
  • 6.2.3 Implementation
  • 6.2.3.1 Dacura data curation platform
  • 6.2.3.2 General description
  • 6.2.3.3 Detailed process
  • 6.2.4 Overview of the Model Catalogue
  • 6.2.4.1 Model catalogue in the demonstrator system
  • 6.2.5 Seshat Trial Platform Evaluation
  • 6.2.5.1 Measuring overall value
  • 6.2.5.2 Data quality dimensions and thresholds
  • 6.3 Managing Data for the NHS
  • 6.3.1 Introduction
  • 6.3.2 Use Case
  • 6.3.2.1 Quality
  • 6.3.2.2 Agility
  • 6.3.3 Architecture
  • 6.3.4 Implementation
  • 6.3.4.1 Model catalogue
  • 6.3.4.2 NIHR health informatics collaborative
  • 6.3.5 Evaluation
  • 6.3.5.1 Productivity
  • 6.3.5.2 Quality
  • 6.3.5.3 Agility
  • 6.4 Integrating Semantic Datasets into Enterprise Information Systems with PoolParty
  • 6.4.1 Introduction
  • 6.4.2 Problem Statement
  • 6.4.2.1 Actors
  • 6.4.3 Architecture
  • 6.4.4 Implementation
  • 6.4.4.1 Consistency violation detector
  • 6.4.4.2 RDFUnit test generator
  • 6.4.4.3 PoolParty integration
  • 6.4.4.4 Notification adaptations
  • 6.4.4.5 RDFUnit
  • 6.4.4.6 Validation on import
  • 6.4.5 Results
  • 6.4.5.1 RDF constraints check
  • 6.4.5.2 RDF validation
  • 6.4.5.3 Improved notifications
  • 6.4.5.4 Unified governance
  • 6.4.6 Evaluation
  • 6.4.6.1 Measuring overall value
  • 6.4.6.2 Data quality dimensions and thresholds
  • 6.4.6.3 Evaluation tasks
  • 6.5 Data Validation at DBpedia
  • 6.5.1 Introduction
  • 6.5.2 Problem Statement
  • 6.5.2.1 Actors
  • 6.5.3 Architecture
  • 6.5.4 Tools and Features
  • 6.5.5 Implementation
  • 6.5.6 Evaluation
  • 6.5.6.1 Productivity
  • 6.5.6.2 Quality
  • 6.5.6.3 Agility
  • Chapter 7 - Evaluation
  • 7.1 Key Metrics for Evaluation
  • 7.1.1 Productivity
  • 7.1.2 Quality
  • 7.1.3 Agility
  • 7.1.4 Usability
  • 7.2 ALIGNED Ethics Processes
  • 7.3 Common Evaluation Framework
  • 7.3.1 Productivity
  • 7.3.2 Quality
  • 7.3.3 Agility
  • 7.4 ALIGNED Evaluation Ontology
  • Appendix A - Requirements
  • Index
  • About the Editors
  • Back Cover