A Deep Learning System from an Engineer’s Perspective

From Engineering Deep Learning Systems by Chi Wang and Donald Szeto

This article presents what prospective readers can expect to learn from this book and why it's worth learning.

Read it if you’re a software developer interested in transitioning your skills to the field of deep learning system design, or an engineering-minded data scientist who wants to build more effective delivery pipelines.

Robust Machine Learning with ML Pipelines

From Data Analysis with Python and PySpark by Jonathan Rioux

This chapter covers using transformers and estimators to prepare data into ML features.
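
If you haven't met the pyspark.ml API before, here is a minimal sketch of the idea: a transformer (VectorAssembler) and two estimators (StandardScaler, LogisticRegression) chained into a Pipeline. The toy DataFrame and column names are illustrative assumptions, not examples from the chapter.

```python
# Minimal pyspark.ml sketch: chain a transformer and two estimators into a
# Pipeline. The toy data and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

# Hypothetical training data: two numeric columns plus a binary label.
train = spark.createDataFrame(
    [(1.0, 20.0, 0.0), (2.0, 30.0, 1.0), (3.0, 40.0, 1.0)],
    ["x1", "x2", "label"],
)

assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="raw")  # transformer
scaler = StandardScaler(inputCol="raw", outputCol="features")         # estimator
lr = LogisticRegression(featuresCol="features", labelCol="label")     # estimator

# fit() trains each estimator in order and returns a PipelineModel made of
# transformers only, ready to apply to new data with transform().
model = Pipeline(stages=[assembler, scaler, lr]).fit(train)
model.transform(train).select("features", "prediction").show()
```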

Why Should You Program with Julia?

An excerpt from Julia as a Second Language by Erik Engheim

This article covers:

What types of problems Julia solves.
The limits of statically-typed languages.
Why the world needs a fast dynamically-typed language.
How Julia increases programmer productivity.

Read it if you’re interested in the Julia language and its strengths and weaknesses.

What Is a Data Mesh and What Is It Used for?

An excerpt from Data Mesh in Action by Jacek Majchrzak, Sven Balnojan, and Marian Siwiak

This excerpt covers:

Our definition of a Data Mesh
The key concepts of the Data Mesh paradigm
The advantages of the Data Mesh

Training and Deployment Pipeline, Part 2

From Deep Learning Patterns and Practices by Andrew Ferlitsch

This article covers:

Feeding models training data in a production environment.
Scheduling for continuous retraining.
Using version control and evaluating models before and after deployment (a brief sketch follows this list).
Deploying models for large scale on-demand and batch requests, in both monolithic and distributed deployments.
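
As a rough illustration of the versioning-and-evaluation idea above, here is a hypothetical sketch of one retraining cycle: fit a candidate model on fresh data, compare it with the currently deployed model on the same evaluation set, and promote it only if it scores at least as well. The file layout, metric, and scikit-learn model are assumptions made for the sketch, not the book's actual pipeline.

```python
# Hypothetical sketch of one continuous-retraining cycle: retrain on fresh
# data, evaluate the candidate against the deployed model, and promote it
# only if it does at least as well. Paths and metric are illustrative.
import time
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_DIR = Path("models")            # hypothetical on-disk model registry
CURRENT = MODEL_DIR / "current.joblib"

def retrain_and_maybe_deploy(X_train, y_train, X_eval, y_eval):
    candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    cand_score = accuracy_score(y_eval, candidate.predict(X_eval))

    prod_score = -1.0
    if CURRENT.exists():
        production = joblib.load(CURRENT)
        prod_score = accuracy_score(y_eval, production.predict(X_eval))

    if cand_score >= prod_score:
        # Keep every version on disk, then repoint the "current" model.
        MODEL_DIR.mkdir(exist_ok=True)
        version_path = MODEL_DIR / f"model-{int(time.time())}.joblib"
        joblib.dump(candidate, version_path)
        joblib.dump(candidate, CURRENT)
        return "deployed candidate", cand_score
    return "kept existing model", prod_score
```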

Training and Deployment Pipeline

From Deep Learning Patterns and Practices by Andrew Ferlitsch

This article covers:

Feeding models training data in a production environment.
Scheduling for continuous retraining.
Using version control and evaluating models before and after deployment.
Deploying models for large scale on-demand and batch requests, in both monolithic and distributed deployments.

Collective Communication Pattern: Improving Performance When Parameter Servers Become a Bottleneck

From Distributed Machine Learning Patterns by Yuan Tang

In this article, we introduce the collective communication pattern, a great alternative to parameter servers when the machine learning model we are building is not too large: it avoids having to tune the ratio between the number of workers and the number of parameter servers.
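
As a concrete (and hypothetical) illustration of the pattern, the sketch below uses torch.distributed to all-reduce a toy gradient across two local processes, so every worker ends up with the same averaged value without any parameter server in the middle. The two-process gloo setup, the addresses, and the fake gradient are assumptions for illustration only.

```python
# Minimal sketch of the collective communication (all-reduce) pattern with
# torch.distributed: workers average a tensor directly with each other.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Local two-process setup (an assumption for this sketch).
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Pretend this is the gradient computed on this worker's data shard.
    grad = torch.full((4,), float(rank + 1))

    # All-reduce sums the tensors across workers in place; dividing by the
    # world size turns the sum into the average every worker now shares.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    grad /= world_size
    print(f"rank {rank} averaged gradient: {grad.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```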

Clustering Data into Groups, Part 3

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library (a brief sketch follows this list)
Iterating over clusters using Pandas
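
For a taste of what the series works through, here is a minimal sketch of the two clustering styles it contrasts: K-means (centrality-based) and DBSCAN (density-based), run on a small synthetic dataset, with Pandas used to iterate over the resulting clusters. The synthetic blobs and parameter values are illustrative assumptions, not the book's dataset.

```python
# Minimal sketch: centrality-based (K-means) and density-based (DBSCAN)
# clustering on synthetic data, then iterating over clusters with Pandas.
import pandas as pd
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points in three blobs (an illustrative assumption).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
df = pd.DataFrame(X, columns=["x", "y"])

df["kmeans"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
df["dbscan"] = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)  # -1 marks noise

# Iterate over the K-means clusters with a Pandas groupby.
for cluster_id, members in df.groupby("kmeans"):
    print(f"cluster {cluster_id}: {len(members)} points, "
          f"centroid ({members.x.mean():.2f}, {members.y.mean():.2f})")
```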

Clustering Data into Groups, Part 2

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library
Iterating over clusters using Pandas

Clustering Data into Groups, Part 1

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library
Iterating over clusters using Pandas
