From Spark in Action, Second Edition by Jean-Georges Perrin
This article teaches you how to perform an aggregation using Apache Spark. You'll first look at the definition of an aggregation. You may already know and use aggregations in your job, in which case this will serve as a reminder and you can safely skim through it: Apache Spark’s aggregations are standard. The second part shows you how to translate a SQL aggregation statement into Spark.
From Machine Learning for Business by Doug Hudgeon and Richard Nichol
In this article, you’ll see how SageMaker and the Random Cut Forest algorithm can be used to create a model that highlights the invoice lines that Brett should query with the law firm. The result is a repeatable process that Brett can apply to every invoice, one that keeps the lawyers working for his bank on their toes and saves the bank hundreds of thousands of dollars per year. Off we go!
By Jean-Georges Perrin
This is the second in a series of four articles on the topic of ingesting data from files with Spark. This section deals with ingesting a JSON file.