Tag: data science

Robust Machine Learning with ML Pipelines

From Data Analysis with Python and PySpark by Jonathan Rioux

This chapter covers using transformers and estimators to turn raw data into ML features.
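In PySpark's ML library, a transformer maps one DataFrame to another, while an estimator's `fit()` learns from data and returns a fitted model that is itself a transformer; a `Pipeline` chains such stages and fitting it yields a `PipelineModel`. The following is a minimal pure-Python sketch of that contract, not PySpark code: `MeanImputer` (loosely modelled on the idea behind `pyspark.ml.feature.Imputer`), `SquareColumn`, `Pipeline`, and `PipelineModel` here are hypothetical classes, and rows are plain dictionaries rather than Spark DataFrames.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


class MeanImputerModel:
    """Fitted transformer (hypothetical): fills missing values with learned means."""

    def __init__(self, means: Dict[str, float]):
        self.means = means

    def transform(self, rows: List[Row]) -> List[Row]:
        # Replace None in each fitted column with that column's mean; keep everything else.
        return [
            {k: (self.means[k] if v is None and k in self.means else v) for k, v in row.items()}
            for row in rows
        ]


class MeanImputer:
    """Estimator (hypothetical): fit() learns per-column means and returns a transformer."""

    def __init__(self, columns: List[str]):
        self.columns = columns

    def fit(self, rows: List[Row]) -> MeanImputerModel:
        means = {}
        for col in self.columns:
            present = [row[col] for row in rows if row.get(col) is not None]
            means[col] = sum(present) / len(present)  # assumes at least one non-missing value
        return MeanImputerModel(means)


class SquareColumn:
    """Plain transformer (hypothetical): adds '<column>_sq' without learning anything."""

    def __init__(self, column: str):
        self.column = column

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**row, f"{self.column}_sq": row[self.column] ** 2} for row in rows]


class PipelineModel:
    """Fitted pipeline: applies every stage's transform() in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Fits estimators in order; plain transformers pass through unchanged."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)  # later stages are fitted on the transformed data
            fitted.append(model)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
model = Pipeline([MeanImputer(["x"]), SquareColumn("x")]).fit(train)
result = model.transform([{"x": None}, {"x": 5.0}])
print(result)  # [{'x': 2.0, 'x_sq': 4.0}, {'x': 5.0, 'x_sq': 25.0}]
```

The key design point is that fitting never mutates the estimator: it produces a separate model object, so the same pipeline definition can be refitted on new data without side effects.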

Why Should You Program with Julia?

An excerpt from Julia as a Second Language by Erik Engheim

This article covers:

What types of problems Julia solves.
The limits of statically typed languages.
Why the world needs a fast dynamically typed language.
How Julia increases programmer productivity.

Read it if you’re interested in the Julia language and its strengths and weaknesses.

Clustering Data into Groups, Part 3

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library
Iterating over clusters using pandas
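Assuming "clustering by centrality" refers to K-means, where each cluster is summarized by a central point, the idea can be sketched from scratch with Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its members until nothing changes. The series itself executes clustering with scikit-learn, so this is not its code; `kmeans` is a hypothetical helper written in plain Python.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100, seed: int = 0):
    """Return (centroids, labels) from a basic Lloyd's-algorithm K-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centers: k input points chosen at random
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centers
        centroids = new_centroids
    return centroids, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(sorted(centroids), labels)
```

Because the result depends on the random starting centers, real implementations usually add smarter initialization and several restarts; this sketch keeps a single seeded start for readability.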

Clustering Data into Groups, Part 2

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library
Iterating over clusters using pandas
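Density-based clustering is commonly illustrated with DBSCAN: a point with at least `min_samples` neighbors within distance `eps` is a core point, clusters grow outward from core points, and points reachable from no core point are labeled noise. Those two parameter names mirror scikit-learn's `DBSCAN`, which the series uses, but the `dbscan` function below is a hypothetical from-scratch sketch, not the library's implementation.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_samples: int) -> List[Optional[int]]:
    """Label each point with a cluster id, or NOISE (-1) for low-density points."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Indices within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, nothing here requires choosing the number of clusters up front, and the isolated point at (50, 50) is reported as noise instead of being forced into a cluster. As in scikit-learn, `min_samples` counts the point itself.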

Clustering Data into Groups, Part 1

From Data Science Bookcamp by Leonard Apeltsin

This 3-part article series covers:

Clustering data by centrality
Clustering data by density
Trade-offs between clustering algorithms
Executing clustering using the scikit-learn library
Iterating over clusters using pandas

Big Data is Just a Lot of Small Data: using pandas UDF, part 2

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers:

· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

· Addressing the cold start of some UDFs using Iterator of Series UDFs.
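In PySpark, a Series UDF receives a whole `pandas.Series` batch per call, so any expensive setup inside it reruns for every batch; an Iterator of Series UDF (declared with `@pandas_udf` and `Iterator[pd.Series] -> Iterator[pd.Series]` type hints) receives an iterator of batches and can do that setup once before looping. The sketch below mimics only that difference in plain Python, with lists standing in for `pandas.Series`; it does not use PySpark, and `load_expensive_resource`, `series_style_udf`, and `iterator_style_udf` are hypothetical names.

```python
from typing import Iterator, List

Batch = List[float]  # stands in for one pandas.Series batch
setup_calls = 0  # how many times the expensive initialization has run


def load_expensive_resource() -> float:
    """Hypothetical slow setup step (e.g., loading a model); here it returns a factor."""
    global setup_calls
    setup_calls += 1
    return 2.0


def series_style_udf(batch: Batch) -> Batch:
    # Called once per batch, so the setup cost is paid again for every batch.
    factor = load_expensive_resource()
    return [x * factor for x in batch]


def iterator_style_udf(batches: Iterator[Batch]) -> Iterator[Batch]:
    # Called once with all batches: set up once, then stream results batch by batch.
    factor = load_expensive_resource()
    for batch in batches:
        yield [x * factor for x in batch]


batches = [[1.0, 2.0], [3.0], [4.0, 5.0]]

setup_calls = 0
per_batch = [series_style_udf(b) for b in batches]
per_batch_setups = setup_calls

setup_calls = 0
streamed = list(iterator_style_udf(iter(batches)))
iterator_setups = setup_calls

print(per_batch == streamed, per_batch_setups, iterator_setups)  # True 3 1
```

Both styles produce identical results; only the number of times the cold-start cost is paid differs, which is what makes the iterator form attractive when setup dominates the per-batch work.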

Big Data is Just a Lot of Small Data: using pandas UDF

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers:

· Using pandas Series UDFs to accelerate column transformations compared to Python UDFs.

· Addressing the cold start of some UDFs using Iterator of Series UDFs.

Cleaning Data

From Pandas Workout by Reuven Lerner

This article discusses cleaning data for use with pandas.

Your Data under a Different Lens: window functions

From Data Analysis with Python and PySpark by Jonathan Rioux

This article covers window functions and the kinds of data transformations they enable.
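A window function computes a value for each row from a group of related rows (its window) without collapsing those rows the way a group-by aggregation does. In PySpark this is written roughly as `w = Window.partitionBy("station").orderBy("day")` followed by `F.sum("temp").over(w)` and `F.lag("temp").over(w)`; once an ordering is given, the default frame runs from the start of the partition to the current row, which is what turns the sum into a running total. The sketch below reproduces that running-total-and-lag result in plain Python so it runs without Spark; `running_window` and the station readings are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def running_window(rows: List[Row], partition_by: str, order_by: str, value: str) -> List[Row]:
    """Add a running total and the previous value within each partition.

    Every input row is kept, unlike a group-by aggregation, which collapses rows.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        running = 0.0
        previous = None  # like lag(): no previous row at the start of a partition
        for row in partition:
            running += row[value]
            out.append({**row, "running_total": running, f"prev_{value}": previous})
            previous = row[value]
    return out


# Hypothetical sample data: daily readings for two stations, deliberately unsorted.
readings = [
    {"station": "B", "day": 2, "temp": 18.0},
    {"station": "A", "day": 1, "temp": 20.0},
    {"station": "A", "day": 3, "temp": 22.0},
    {"station": "B", "day": 1, "temp": 15.0},
    {"station": "A", "day": 2, "temp": 25.0},
]

result = running_window(readings, "station", "day", "temp")
for row in result:
    print(row)
```

Station A's running totals come out as 20.0, 45.0, and 67.0, and each partition restarts both the total and the lag, which is the behavior a partitioned window gives in Spark.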

Ask Dr. Chong: become a leader in data science part 1

In case you missed it, here is a recap of Jike Chong and Yue Cathy Chang’s live Twitch coding stream. For more, check out the book How to Lead in Data Science. For more live coding streams, subscribe to Manning’s Twitch channel…

© 2022 Manning