HDInsight essentials: learn how to build and deploy a modern big data architecture to empower your business

If you want to discover one of the latest tools designed to produce stunning Big Data insights, this book features everything you need to get to grips with your data. Whether you are a data architect, developer, or a business strategist, HDInsight adds value in everything from development, administr...


Bibliographic Details
Other Authors: Nadipalli, Rajesh, author
Format: E-book
Language: English
Published: Birmingham, England : Packt Publishing, 2015.
Edition: 2nd ed.
Series: Professional Expertise Distilled
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629244806719
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Hadoop and HDInsight in a Heartbeat; Data is everywhere; Business value of big data; Hadoop concepts; Brief history of Hadoop; Core components; Hadoop cluster layout; HDFS overview; Writing a file to HDFS; Reading a file from HDFS; HDFS basic commands; YARN overview; YARN application life cycle; YARN workloads; Hadoop distributions; HDInsight overview; HDInsight and Hadoop relationship; Hadoop on Windows deployment options; Microsoft Azure HDInsight Service; HDInsight Emulator; Hortonworks Data Platform (HDP) for Windows; Summary
  • Chapter 2: Enterprise Data Lake using HDInsight; Enterprise Data Warehouse architecture; Source systems; Data warehouse; Storage; Processing; User access; Provisioning and monitoring; Data governance and security; Pain points of EDW; The next generation Hadoop-based Enterprise data architecture; Source systems; Data Lake; Storage; Processing; User access; Provisioning and monitoring; Data governance, security, and metadata; Journey to your Data Lake dream; Ingestion and organization; Transformation (rules driven); Access, analyze, and report; Tools and technology for Hadoop ecosystem; Use case powered by Microsoft HDInsight; Problem statement; Solution; Source systems; Storage; Processing; User access; Benefits; Summary
  • Chapter 3: HDInsight Service on Azure; Registering for an Azure account; Azure storage; Provisioning an HDInsight cluster; Cluster topology; Provisioning using Azure PowerShell; Creating a storage container; Provisioning a new HDInsight cluster; HDInsight management dashboard; Dashboard; Monitor; Configuration; Exploring clusters using the remote desktop; Running a sample MapReduce; Deleting the cluster; HDInsight Emulator for the development; Installing HDInsight Emulator; Installation verification; Using HDInsight Emulator; Summary
  • Chapter 4: Administering Your HDInsight Cluster; Monitoring cluster health; Name Node status; The Name Node Overview page; Datanode Status; Utilities and logs; Hadoop Service Availability; YARN Application Status; Azure storage management; Configuring your storage account; Monitoring your storage account; Managing access keys; Deleting your storage account; Azure PowerShell; Access Azure Blob storage using Azure PowerShell; Summary
  • Chapter 5: Ingest and Organize Data Lake; End-to-end Data Lake solution; Ingesting to Data Lake using HDFS command; Connecting to a Hadoop client; Getting your files on the local storage; Transferring to HDFS; Loading data to Azure Blob storage using Azure PowerShell; Loading files to Data Lake using GUI tools; Storage access keys; Storage tools; CloudXplorer; Key benefits; Registering your storage account; Uploading files to your Blob storage; Using Sqoop to move data from RDBMS to Data Lake; Key benefits; Two modes of using Sqoop; Using Sqoop to import data (SQL to Hadoop); Organizing your Data Lake in HDFS