Tag

big data

Robust Machine Learning with ML Pipelines

From Data Analysis with Python and PySpark by Jonathan Rioux

This chapter covers using transformers and estimators to prepare data as ML features.

Big Data is Just a Lot of Small Data: using pandas UDF, part 2

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers

· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

· Addressing the cold start of some UDFs using Iterator of Series UDFs.

Big Data is Just a Lot of Small Data: using pandas UDF

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers

· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

· Addressing the cold start of some UDFs using Iterator of Series UDFs.

Your Data under a Different Lens: window functions

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers window functions and the kinds of data transformations they enable.

How Fluentd fits into the Modern Software Landscape

In case you missed it, here is Phil Wilkins’ live Twitch coding stream recap. For more, check out the book: Logging in Action. For more live coding streams, subscribe to Manning’s Twitch channel here: https://www.twitch.tv/manningpublications

Managing Data Sources in Machine Learning

From Graph-Powered Machine Learning by Alessandro Negro

This article discusses managing data in graph-powered machine learning projects.

Creating a Bipartite Graph for a User-Item Dataset

From Graph-Powered Machine Learning by Alessandro Negro

This article discusses creating a bipartite graph for a user-item dataset.

Processing Covid-19 Data with Apache Spark

In this video, Jean-Georges showcases how to use JHU data to predict new Covid-19 cases using Apache Spark.

Why Choose Azure for Data Engineering?

From Azure Storage, Streaming, and Batch Analytics by Richard Nuckolls

This article delves into Azure’s tools for data engineering and why you should consider using them.

Function Pipelines for Mapping Complex Transformations

From Mastering Large Datasets with Python by J.T. Wolohan

This article covers

· Using map to do complex data transformations

· Chaining together small functions into pipelines

· Applying these pipelines in parallel on large datasets

© 2023 Manning