By Jean Georges Perrin
This is the second in a series of four articles on ingesting data from files with Spark. This section deals with ingesting a JSON file.
From Spark with Java by Jean Georges Perrin
You’ve probably seen a simple use case where Spark ingests data from a CSV file, performs a simple operation, and then stores the result in a database. In this article, you’re going to see what happens behind the scenes.
Privacy, Twitter, and Machine Learning
Andrew Trask, author of Grokking Deep Learning
By Frances Lefkowitz, Manning Development Editor
Andrew Trask is a researcher pursuing a doctorate at Oxford University, where he focuses on deep learning with an emphasis on human language. He is also a leader at OpenMined.org, an open-source community of researchers and developers working on creating free and accessible tools for secure AI. Previously, Andrew was analytics product manager at Digital Reasoning, where he trained the world’s largest artificial neural network (with over 160 billion parameters) and helped guide the analytics for the Synthesys cognitive computing platform, which tackles problems in government intelligence, finance, and healthcare. Grokking Deep Learning is his first book.
Find Andrew online at his blog (iamtrask.github.io) and @iamtrask on Twitter.
From Kafka Streams in Action by Bill Bejeck
This article discusses KSQL, a brand-new, open source streaming SQL engine, licensed under Apache 2.0, that enables stream processing with Kafka. It makes it easy to read, write, and process streaming data in real time, at scale, using SQL-like semantics.