Tag

spark

Robust Machine Learning with ML Pipelines

From Data Analysis with Python and PySpark by Jonathan Rioux

This chapter covers using transformers and estimators to prepare data as ML features.

Big Data is Just a Lot of Small Data: using pandas UDF, part 2

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers

·   Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

·   Addressing the cold start of some UDFs using Iterator of Series UDFs.

Big Data is Just a Lot of Small Data: using pandas UDF

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers

·   Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

·   Addressing the cold start of some UDFs using Iterator of Series UDFs.

Your Data under a Different Lens: window functions

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers window functions and the kinds of data transformations they enable.

Processing Covid-19 Data with Apache Spark

In this video, Jean-Georges showcases how to use JHU data to predict new Covid-19 cases using Apache Spark.

Aggregating Your Data with Spark

From Spark in Action, Second Edition by Jean-Georges Perrin

Consuming records with Spark

From Spark in Action, Second Edition by Jean-Georges Perrin

This article explores consuming records in files with Spark.

The Inner Workings of Spark

From Spark in Action, Second Edition by Jean-Georges Perrin

Ingesting Data from Files with Spark, Part 3

By Jean-Georges Perrin

This is the third in a series of four articles on the topic of ingesting data from files with Spark. This section deals with ingesting an XML file.

Ingesting Data from Files with Spark, Part 2

By Jean-Georges Perrin

This is the second in a series of four articles on the topic of ingesting data from files with Spark. This section deals with ingesting a JSON file.

© 2023 Manning