From Data Analysis with Python and PySpark by Jonathan Rioux
This chapter covers using transformers and estimators to prepare data as ML features.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers
· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.
· Addressing the cold start of some UDFs using Iterator of Series UDFs.
From Data Analysis with Python and PySpark by Jonathan Rioux
This article covers window functions and the kinds of data transformations they enable.
From Designing Cloud Data Platforms by Danil Zburivsky and Lynda Partner
In this article, we’ll layer on some of the critical and more advanced functionality needed for most data platforms today. Without this added layer of sophistication, your data platform would work, but it wouldn’t scale easily, nor would it meet growing data velocity challenges. It would also be limited in the types of data consumers (the people and systems that consume data from the platform) it supports, and those consumers are growing in both number and variety.
From Making Sense of Edge Computing by Cody Bumgardner
Conceptually, edge computing is concerned with when it’s best to migrate computational functionality toward the source of data and when it’s best to move the data itself. This abstract concept of function versus data migration drives not only the fundamental motivations of edge computing but also the broader field of distributed systems. The act of distributing processes makes even the simplest tasks more complicated.