From Data Analysis with Python and PySpark by Jonathan Rioux
This chapter covers using transformers and estimators to turn data into ML features.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers
· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.
· Addressing the cold start of some UDFs using Iterator of Series UDFs.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers window functions and the kind of data transformation they enable.
In case you missed it, here is Phil Wilkins’ live Twitch coding stream recap. For more, check out the book: Logging in Action. For more live coding streams, subscribe to Manning’s Twitch channel here: https://www.twitch.tv/manningpublications
From Graph-Powered Machine Learning by Alessandro Negro
This article discusses managing data in graph-powered machine learning projects.
From Graph-Powered Machine Learning by Alessandro Negro
This article discusses creating a bigraph for a user-item dataset.
In this video, Jean-Georges shows how to use JHU data to predict new COVID-19 cases with Apache Spark.
From Azure Storage, Streaming, and Batch Analytics by Richard Nuckolls
This article delves into Azure’s tools for data engineering and why you should consider using them.
From Mastering Large Datasets with Python by J.T. Wolohan
This article covers
· Using map to do complex data transformations
· Chaining together small functions into pipelines
· Applying these pipelines in parallel on large datasets