From flat files to deconstructed databases: the evolution and future of the big data ecosystem
Corporate Author: | |
---|---|
Other Authors: | |
Format: | Online video |
Language: | English |
Published: | [Place of publication not identified] : O'Reilly Media, 2019. |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009822786306719 |
Summary: | "Julien Le Dem (WeWork) discusses the key open source components of the big data ecosystem--including Apache Calcite, Parquet, Arrow, Avro, and Kafka as well as batch and streaming systems--and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. (Parquet is the columnar data layout to optimize data at rest for querying. Arrow is the in-memory representation for maximum throughput execution and overhead-free data exchange. Calcite is the optimizer to make the most of our infrastructure capabilities.) Julien also explores the emerging components that are still missing or haven't become standard yet to fully materialize the transformation to an extremely flexible database that lets you innovate with your data. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco."--Resource description page. |
Notes: | Title from resource description page (Safari, viewed January 31, 2020). |
Physical Description: | 1 online resource (1 streaming video file (43 min., 49 sec.)) : digital, sound, color |