Data Analysis with Python and PySpark

Bibliographic Details
Other Authors: Rioux, Jonathan (author)
Format: E-book
Language: English
Published: Shelter Island, New York : Manning Publications, 2022.
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009655506606719
Description
Summary: The Spark data processing engine is an amazing analytics factory: raw data comes in, insight comes out. PySpark wraps Spark's core engine with a Python-based API. It helps simplify Spark's steep learning curve and makes this powerful tool available to anyone working in the Python data ecosystem. "Data analysis with Python and PySpark" helps you solve the daily challenges of data science with PySpark. You'll learn how to scale your processing capabilities across multiple machines while ingesting data from any source -- whether that's Hadoop clusters, cloud data storage, or local data files. Once you've covered the fundamentals, you'll explore the full versatility of PySpark by building machine learning pipelines and blending Python, pandas, and PySpark code.
Physical Description: 1 online resource (xix, 434 pages) : illustrations
ISBN: 9781617297205