Six Questions for Jesse C. Daniel, author of Data Science with Python and Dask
From Spark in Action, Second Edition by Jean-Georges Perrin
This article teaches you how to perform an aggregation using Apache Spark. It starts with the definition of an aggregation. If you already use aggregations in your work, treat that part as a refresher and skim through it: Apache Spark’s aggregations are standard. The second part of the article shows you how to translate a SQL aggregation statement into Spark.
From Mastering Large Datasets with Python by J.T. Wolohan
This article covers
· Using map to do complex data transformations
· Chaining together small functions into pipelines
· Applying these pipelines in parallel on large datasets
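The three points above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the book: the helper `compose`, the step functions, the sample `records`, and `run_parallel` are all hypothetical names chosen for this example. It uses a thread pool so that it runs unchanged on any platform; for CPU-bound work, a process pool (for example `concurrent.futures.ProcessPoolExecutor`) would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def compose(*funcs):
    """Chain single-argument functions, left to right, into one pipeline."""
    def pipeline(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return pipeline


# Small, single-purpose steps (hypothetical examples).
def strip_whitespace(text):
    return text.strip()


def lowercase(text):
    return text.lower()


def count_words(text):
    return len(text.split())


# One pipeline built from the small steps above.
clean_and_count = compose(strip_whitespace, lowercase, count_words)

# Sample input, made up for this sketch.
records = ["  Hello Spark  ", "Dask AND Python ", " map all the things "]

# Sequential: map applies the whole pipeline to each record.
sequential = list(map(clean_and_count, records))


def run_parallel(pipeline, items, workers=4):
    """Apply the pipeline to every item using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, like the built-in map.
        return list(pool.map(pipeline, items))


parallel = run_parallel(clean_and_count, records)
```

Because the pipeline is an ordinary function of one argument, moving from the built-in `map` to a pool's `map` changes only the call site, not the transformation steps themselves.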