From Data Analysis with Python and PySpark by Jonathan Rioux
This chapter covers using transformers and estimators to turn data into ML features.
From Distributed Machine Learning Patterns by Yuan Tang
In this article, we introduce the collective communication pattern, a great alternative to parameter servers when the machine learning model we are building is not too large, because it removes the need to tune the ratio between the number of workers and parameter servers.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers
· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.
· Addressing the cold start of some UDFs using Iterator of Series UDFs.
From Distributed Machine Learning Patterns by Yuan Tang
In this article, we introduce the parameter server pattern, which comes in handy when the model is too large to fit on a single machine, such as one we would have to build for tagging entities in 8 million YouTube videos.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers window functions and the kinds of data transformations they enable.