Engineering agile big-data systems
To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, th...
Format: Electronic book
Language: English
Published: Gistrup, Denmark : River Publishers, 2018
Edition: 1st ed.
Series: River Publishers series in software engineering
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009703334806719
Table of Contents:
- Front Cover
- Half Title Page
- RIVER PUBLISHERS SERIES IN SOFTWARE ENGINEERING
- Title Page
- Copyright Page
- Contents
- Preface
- Acknowledgements
- List of Contributors
- List of Figures
- List of Tables
- List of Abbreviations
- Chapter 1 - Introduction
- 1.1 State of the Art in Engineering Data-Intensive Systems
- 1.1.1 The Challenge
- 1.2 State of the Art in Semantics-Driven Software Engineering
- 1.2.1 The Challenge
- 1.3 State of the Art in Data Quality Engineering
- 1.3.1 The Challenge
- 1.4 About ALIGNED
- 1.5 ALIGNED Partners
- 1.5.1 Trinity College Dublin
- 1.5.2 Oxford University - Department of Computer Science
- 1.5.3 Oxford University - School of Anthropology and Museum Ethnography
- 1.5.4 University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)
- 1.5.5 Semantic Web Company
- 1.5.6 Wolters Kluwer Germany
- 1.5.7 Adam Mickiewicz University in Poznań
- 1.5.8 Wolters Kluwer Poland
- 1.6 Structure
- Chapter 2 - ALIGNED Use Cases - Data and Software Engineering Challenges
- 2.1 Introduction
- 2.2 The ALIGNED Use Cases
- 2.2.1 Seshat: Global History Databank
- 2.2.2 PoolParty Enterprise Application Demonstrator System
- 2.2.3 DBpedia
- 2.2.4 Jurion and Jurion IPG
- 2.2.5 Health Data Management
- 2.3 The ALIGNED Use Cases and Data Life Cycle. Major Challenges and Offered Solutions
- 2.4 The ALIGNED Use Cases and Software Life Cycle. Major Challenges and Offered Solutions
- 2.5 Conclusions
- Chapter 3 - Methodology
- 3.1 Introduction
- 3.2 Software and Data Engineering Life Cycles
- 3.2.1 Software Engineering Life Cycle
- 3.2.2 Data Engineering Life Cycle
- 3.3 Software Development Processes
- 3.3.1 Model-Driven Approaches
- 3.3.2 Formal Techniques
- 3.3.3 Test-Driven Development
- 3.4 Integration Points and Harmonisation
- 3.4.1 Integration Points
- 3.4.2 Barriers to Harmonisation
- 3.4.3 Methodology Requirements
- 3.5 An ALIGNED Methodology
- 3.5.1 A General Framework for Process Management
- 3.5.2 An Iterative Methodology and Illustration
- 3.6 Recommendations
- 3.6.1 Sample Methodology
- 3.7 Sample Synchronisation Point Activities
- 3.7.1 Model Catalogue: Analysis and Search/Browse/Explore
- 3.7.2 Model Catalogue: Design and Classify/Enrich
- 3.7.3 Semantic Booster: Implementation and Store/Query
- 3.7.4 Semantic Booster: Maintenance and Search/Browse/Explore
- 3.8 Summary
- 3.8.1 Related Work
- 3.9 Conclusions
- Chapter 4 - ALIGNED MetaModel Overview
- 4.1 Generic Metamodel
- 4.1.1 Basic Approach
- 4.1.2 Namespaces and URIs
- 4.1.3 Expressivity of Vocabularies
- 4.1.4 Reference Style for External Terms
- 4.1.5 Links with W3C PROV
- 4.2 ALIGNED Generic Metamodel
- 4.2.1 Design Intent Ontology (DIO)
- 4.3 Software Engineering
- 4.3.1 Software Life Cycle Ontology
- 4.3.2 Software Implementation Process Ontology (SIP)
- 4.4 Data Engineering
- 4.4.1 Data Life Cycle Ontology
- 4.5 DBpedia DataID (DataID)
- 4.6 Unified Quality Reports
- 4.6.1 Reasoning Violation Ontology (RVO) Overview
- 4.6.2 W3C SHACL Reporting Vocabulary
- 4.6.3 Data Quality Vocabulary
- 4.6.4 Test-Driven RDF Validation Ontology (RUT)
- 4.6.5 Enterprise Software Development (DIOPP)
- 4.6.6 Unified Governance Domain Ontologies
- 4.6.7 Semantic Booster and Model Catalogue Domain Ontology
- 4.6.7.1 Model catalogue
- 4.6.7.2 Booster
- 4.6.8 PROV
- 4.6.9 SKOS
- 4.6.10 OWL
- 4.6.11 RDFS
- 4.6.12 RDF
- Chapter 5 - Tools
- 5.1 Model Catalogue
- 5.1.1 Introduction
- 5.1.2 Model Catalogue
- 5.1.2.1 Architecture
- 5.1.2.2 Searching and browsing the catalogue
- 5.1.2.3 Editing the catalogue contents
- 5.1.2.4 Administration
- 5.1.2.5 Eclipse integration and model-driven development
- 5.1.2.6 Semantic reasoning
- 5.1.2.7 Automation and search
- 5.1.3 Semantic Booster
- 5.1.3.1 Introduction
- 5.1.3.2 Semantic Booster
- 5.2 RDFUnit
- 5.2.1 RDFUnit Integration
- 5.2.1.1 JUnit XML report-based integration
- 5.2.1.2 Custom apache maven-based integration
- 5.2.1.3 The shapes constraint language (SHACL)
- 5.2.1.4 Comparison of SHACL to schema definition using RDFUnit test patterns
- 5.2.1.5 Comparison of SHACL to auto-generated RDFUnit tests from RDFS/OWL axioms
- 5.2.1.6 Progress on the SHACL specification and standardisation process
- 5.2.1.7 SHACL support in RDFUnit
- 5.3 Expert Curation Tools and Workflows
- 5.3.1 Requirements
- 5.3.1.1 Graduated application of semantics
- 5.3.1.2 Graph–object mapping
- 5.3.1.3 Object/document level state management and versioning
- 5.3.1.4 Object-based workflow interfaces
- 5.3.1.5 Integrated, automated, constraint validation
- 5.3.1.6 Result interpretation
- 5.3.1.7 Deferred updates
- 5.3.2 Workflow/Process Models
- 5.3.2.1 Process model 1 - linked data object creation
- 5.3.2.2 Process model 2 - linked data object updates
- 5.3.2.3 Process model 3 - updates to deferred updates
- 5.3.2.4 Process model 4 - schema updates
- 5.3.2.5 Process model 5 - validating schema updates
- 5.3.2.6 Process model 6 - named graph creation
- 5.3.2.7 Process model 7 - instance data updates and named graphs
- 5.4 Dacura Approval Queue Manager
- 5.5 Dacura Linked Data Object Viewer
- 5.5.1 CSP Design of Seshat Workflow Use Case
- 5.5.2 Specification
- 5.6 Dacura Quality Service
- 5.6.1 Technical Overview of Dacura Quality Service
- 5.6.2 Dacura Quality Service API
- 5.6.2.1 Resource and interchange format
- 5.6.2.2 URI
- 5.6.2.3 Literals
- 5.6.2.4 Literal types
- 5.6.2.5 Quads
- 5.6.2.6 POST variables
- 5.6.2.7 Tests
- 5.6.2.8 Required schema tests
- 5.6.2.9 Schema tests
- 5.6.2.10 Errors
- 5.6.2.11 Endpoints
- 5.7 Linked Data Model Mapping
- 5.7.1 Interlink Validation Tool
- 5.7.1.1 Interlink validation
- 5.7.1.2 Technical overview
- 5.7.1.3 Configuration via iv_config.txt
- 5.7.1.4 Configuration via external_datasets.txt
- 5.7.1.5 Execute the interlink validator tool
- 5.7.2 Dacura Linked Model Mapper
- 5.7.3 Model Mapper Service
- 5.7.3.1 Modelling tool - creating mappings
- 5.7.3.2 Importing semi-structured data with data harvesting tool
- 5.8 Model-Driven Data Curation
- 5.8.1 Dacura Quality Service Frame Generation
- 5.8.2 Frames for User Interface Design
- 5.8.3 Semi-Formal Frame Specification
- 5.8.4 Frame API Endpoints
- Chapter 6 - Use Cases
- 6.1 Wolters Kluwer - Re-Engineering a Complex Relational Database Application
- 6.1.1 Introduction
- 6.1.2 Problem Statement
- 6.1.3 Actors
- 6.1.4 Implementation
- 6.1.4.1 PoolParty notification extension
- 6.1.4.2 rsine notification extension
- 6.1.4.2.1 Results
- 6.1.4.3 RDFUnit for data transformation
- 6.1.4.4 PoolParty external link validity
- 6.1.4.5 Statistical overview
- 6.1.5 Evaluation
- 6.1.5.1 Productivity
- 6.1.5.2 Quality
- 6.1.5.3 Agility
- 6.1.5.4 Measuring overall value
- 6.1.5.5 Data quality dimensions and thresholds
- 6.1.5.6 Model agility
- 6.1.5.7 Data agility
- 6.1.6 JURION IPG
- 6.1.6.1 Introduction
- 6.1.6.2 Architecture
- 6.1.6.3 Tools and features
- 6.1.6.4 Implementation
- 6.1.6.5 Evaluation
- 6.1.6.6 Experimental evaluation
- 6.2 Seshat - Collecting and Curating High-Value Datasets with the Dacura Platform
- 6.2.1 Use Case
- 6.2.1.1 Problem statement
- 6.2.2 Architecture
- 6.2.2.1 Tools and features
- 6.2.3 Implementation
- 6.2.3.1 Dacura data curation platform
- 6.2.3.2 General description
- 6.2.3.3 Detailed process
- 6.2.4 Overview of the Model Catalogue
- 6.2.4.1 Model catalogue in the demonstrator system
- 6.2.5 Seshat Trial Platform Evaluation
- 6.2.5.1 Measuring overall value
- 6.2.5.2 Data quality dimensions and thresholds
- 6.3 Managing Data for the NHS
- 6.3.1 Introduction
- 6.3.2 Use Case
- 6.3.2.1 Quality
- 6.3.2.2 Agility
- 6.3.3 Architecture
- 6.3.4 Implementation
- 6.3.4.1 Model catalogue
- 6.3.4.2 NIHR health informatics collaborative
- 6.3.5 Evaluation
- 6.3.5.1 Productivity
- 6.3.5.2 Quality
- 6.3.5.3 Agility
- 6.4 Integrating Semantic Datasets into Enterprise Information Systems with PoolParty
- 6.4.1 Introduction
- 6.4.2 Problem Statement
- 6.4.2.1 Actors
- 6.4.3 Architecture
- 6.4.4 Implementation
- 6.4.4.1 Consistency violation detector
- 6.4.4.2 RDFUnit test generator
- 6.4.4.3 PoolParty integration
- 6.4.4.4 Notification adaptations
- 6.4.4.5 RDFUnit
- 6.4.4.6 Validation on import
- 6.4.5 Results
- 6.4.5.1 RDF constraints check
- 6.4.5.2 RDF validation
- 6.4.5.3 Improved notifications
- 6.4.5.4 Unified governance
- 6.4.6 Evaluation
- 6.4.6.1 Measuring overall value
- 6.4.6.2 Data quality dimensions and thresholds
- 6.4.6.3 Evaluation tasks
- 6.5 Data Validation at DBpedia
- 6.5.1 Introduction
- 6.5.2 Problem Statement
- 6.5.2.1 Actors
- 6.5.3 Architecture
- 6.5.4 Tools and Features
- 6.5.5 Implementation
- 6.5.6 Evaluation
- 6.5.6.1 Productivity
- 6.5.6.2 Quality
- 6.5.6.3 Agility
- Chapter 7 - Evaluation
- 7.1 Key Metrics for Evaluation
- 7.1.1 Productivity
- 7.1.2 Quality
- 7.1.3 Agility
- 7.1.4 Usability
- 7.2 ALIGNED Ethics Processes
- 7.3 Common Evaluation Framework
- 7.3.1 Productivity
- 7.3.2 Quality
- 7.3.3 Agility
- 7.4 ALIGNED Evaluation Ontology
- Appendix A - Requirements
- Index
- About the Editors
- Back Cover