From Causal Machine Learning by Robert Ness

Enhance machine learning with causal reasoning to get more robust and explainable outcomes. Power causal inference with machine learning to create next-gen AI. There has never been a better time to get into building causal AI.

Read on for more.

Take 25% off Causal Machine Learning by entering fccness into the discount code box at checkout at

What is causal AI?

Causal reasoning is a crucial element to how humans understand, explain, and make decisions about the world.  Causal AI means both improving machine learning with causal reasoning, and automating causal reasoning with machine learning.  Today’s learning machines have superhuman prediction ability but aren’t particularly good at causal reasoning, even when we train them on obscenely large amounts of data.  In this book, you will learn how to write algorithms that capture causal reasoning in the context of machine learning and automated data science.

Though humans rely heavily on causal reasoning to navigate the world, our cognitive biases make our causal inferences highly error-prone.  We developed empiricism, the scientific method, and experimental statistics to address our tendencies to make errors in causal inference tasks such as finding and validating causal relations, distinguishing causality from mere correlation, and predicting the consequences of actions, decisions, and policies.  Yet even empiricism still requires humans to interpret and explain observational data (data we observe in passing).  The way we interpret causality from experiments is also error-prone.  Causal AI attempts to use statistics, probability, and computer science to help us surpass these errors in our reasoning.

The difficulty of answering causal questions has motivated the work of millennia of philosophers, centuries of scientists, and decades of statisticians. But now, a convergence of statistical and computational advances has shifted the focus from discourse to algorithms that we can train on data and deploy to software.  It is now a fascinating time to learn how to build causal AI.

Why should I or my team care about causal data science and AI?

I want to present some high-level reasons motivating the study of causal modeling.  These reasons apply to researchers, individual contributors, and managers working on data science, machine learning, and other domains of data-driven automated decision-making in general.

Better data science

Organizations in big tech and tech-powered retail have realized the importance of causal inference and are paying big salaries to people with a causal inference skill set.  The main reason is that the goal of data science – extracting actionable insights from data – is a causal task.  Causal modeling helps the data scientist achieve that goal in several ways.

Simulated experiments and causal effect inference

Causal effect inference – quantifying how much a cause (e.g. a promotion) affects an effect (e.g. sales) – is the most common goal of applied data science. The gold standard for causal effect inference is the randomized experiment, such as an A/B test. The concepts of causal inference explain why randomized experiments work so well: randomization eliminates non-causal sources of statistical correlation.
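
To make the last point concrete, here is a minimal simulation sketch. It is not taken from the book; the scenario (a holiday season that drives sales and also makes promotions more likely), the function names, and all numbers are assumptions chosen for illustration. By construction the promotion has no effect on sales, so any correlation between the two is non-causal.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(randomized, n=20000, seed=0):
    """Return corr(promo, sales) in a world where promo does NOT affect sales.

    Hypothetical setup: holidays raise sales. In the observational regime,
    holidays also make a promotion more likely (a common cause). In the
    randomized regime, a coin flip decides the promotion.
    """
    rng = random.Random(seed)
    promo, sales = [], []
    for _ in range(n):
        holiday = rng.random() < 0.5
        if randomized:
            p = rng.random() < 0.5  # assignment ignores holiday
        else:
            p = rng.random() < (0.8 if holiday else 0.2)  # confounded
        s = 100 + 50 * holiday + rng.gauss(0, 10)  # promo is absent by design
        promo.append(float(p))
        sales.append(s)
    return pearson(promo, sales)


observational_corr = simulate(randomized=False)
randomized_corr = simulate(randomized=True)
print(f"observational correlation: {observational_corr:.2f}")
print(f"randomized correlation:    {randomized_corr:.2f}")
```

In the observational regime the promotion and sales are clearly correlated even though the causal effect is zero, because both depend on the holiday variable. Under random assignment that dependence is cut, and the correlation falls to roughly zero, which matches the true effect.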

Figure 1. Causal data science is a valuable tool no matter how small or big your data or how easy it is to run experiments.

Counterfactual data science

Counterfactual questions have the form, “given what happened, what would have happened if things had been different?” Causal modeling provides a logical way to predict counterfactual outcomes.  Data science that can infer counterfactuals can answer critical business questions more directly.
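
As a rough illustration of what predicting a counterfactual outcome can look like, the sketch below uses the standard three-step recipe for structural causal models: infer the unobserved noise from what happened (abduction), change the cause (action), and recompute the outcome (prediction). The linear sales model, its parameters, and the function names are hypothetical and assumed known here; in practice they would have to be learned or justified.

```python
# Hypothetical structural model, assumed known for the sake of the example:
#   sales = BASE + LIFT * promo + noise
BASE, LIFT = 100.0, 20.0


def sales_mechanism(promo, noise):
    """Causal mechanism that generates sales from promo and unobserved noise."""
    return BASE + LIFT * promo + noise


def counterfactual_sales(observed_promo, observed_sales, alt_promo):
    """Given what happened, what would sales have been under alt_promo?"""
    # Abduction: recover the noise term consistent with the observed outcome.
    noise = observed_sales - sales_mechanism(observed_promo, 0.0)
    # Action and prediction: set promo to the alternative value, keep the
    # same noise, and rerun the mechanism.
    return sales_mechanism(alt_promo, noise)


# We ran the promotion (promo=1) and observed sales of 130.
# What would sales have been without it?
cf = counterfactual_sales(observed_promo=1, observed_sales=130.0, alt_promo=0)
print(cf)
```

The key design choice is that the noise term is held fixed between the factual and counterfactual worlds. That is what distinguishes "what would have happened to this unit" from an average prediction for units that received no promotion.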

Better attribution, credit assignment, and root cause analysis

The “attribution problem” in marketing is perhaps best articulated by a quote attributed to entrepreneur and advertising pioneer John Wanamaker:

Half the money I spend on advertising is wasted; the trouble is I don’t know which half.

In other words, it is difficult to know what advertisement, promotion, or other action caused a specific customer behavior, sales number, or another key business outcome. Even in online marketing, where the data has gotten much richer and more granular than in Wanamaker’s time, attribution remains a challenge. For example, a user may have clicked after seeing an ad, but was it that single ad view that led to the click, or was there a cumulative effect of all the nudges to click they received over multiple channels?  Causal modeling addresses the attribution problem by using formal causal logic to answer “why” questions, such as “why did this user click?”

Attribution goes by other names in other domains, such as “credit assignment” and “root cause analysis.”  The core meaning is the same: we want to understand why a particular outcome happened.

More explainable machine learning

The behavior of modern machine learning models can be hard to explain.  Explicability is particularly important in the context of business and engineering.  If your team deploys a predictive algorithm and it behaves in a way that hurts your business, you don’t want to be stuck spouting machine learning technobabble and handwaving when your boss asks you what happened.  You want a concise explanation that hopefully suggests ways to avoid the problem in the future.  As an engineer, you want that explanation distilled down to a concise bug report that shows in simple terms the nature of the error, what the correct output should have been, what inputs will reproduce the error, and where the code logic starts to go awry given those inputs.  Armed with that explanation of the issue, you can efficiently fix the problem.

Explicability also matters to third-party users of a machine learning-based service.  For example, suppose a product feature presents a user with a recommendation.  That user could need to know why the feature made them a particular recommendation.  An explanation is an essential element in providing recourse so the user can get better results in the future.  For example, video streaming services often explain recommended content with “Because you watched X,” where X is viewed content similar to the recommended content.  Instead, imagine richer content based on favored genres, actors, and themes.  Instead of promoting rabbit holes of similar content, such explanations might suggest how you might explore unfamiliar content that could expand your tastes and generate more valuable recommendations in the future.

There are multiple approaches to explanation, such as analyzing node activation in neural networks.  But causal models are eminently explainable because they directly encode easy-to-understand causal relationships in the modeling domain.  Indeed, causality is the core of explanation; to explain an event means to provide the cause of the event.  Causal models provide explanations in the language of the domain you are modeling (semantic explanations) rather than in terms of the model’s architecture (“nodes” and “activations” – syntactic explanations).

More valuable machine learning

After a machine learning engineer trains and validates a machine learning algorithm, she deploys it to a production environment, like any other piece of code.  Once she does, it becomes an artifact that has value to the organization.

All else being equal, a model artifact is more valuable if it has causal elements than if it does not.  Robustness and explicability contribute to value by reducing the cost of maintenance.  If it is robust, it breaks less often, and if it is explainable, you can figure out how to fix it when it does.

In addition, causal invariance allows the modeler to decompose some causal models into smaller composable modules.  These modules can be individually and independently tested and validated, aligning with software engineering best practices.  Computer operations on these artifacts can execute separately, enabling more efficient use of modern cloud computing infrastructure. For large machine learning model artifacts, if we get additional training data or discover an issue with the initial training data, we typically have to retrain the model from scratch, which is often expensive.  In contrast, with a causal model we would only need to retrain the modules that are relevant to the new data.  Finally, your team can reuse components from old problems in models attacking new problems if those problems overlap.

Figure 2. Causal models can be decomposed into components. This ability has benefits when contrasted with large machine learning artifacts.

Why I wrote this book

I wrote this book because I wanted a code-first approach to causal inference.  By that, I don’t mean simply importing a causal inference library and passing a dataset to some “do_causal_inference” method.  I wanted a code-first book that separated causal modeling from statistical estimation and inference.  Modeling is about turning your deep knowledge about a problem into code that can automate data-driven decision making.  In contrast, modern-day statistical and machine learning frameworks, once given a model, automate away statistical learning and inference.  So I thought there ought to be a book that focused on how to build causal models and left the statistical mechanics of causal inference given the model to these frameworks.  The book didn’t exist, so I wrote it.

How is this book different from other causal inference books?

Causal inference research relies mainly on three different skill sets: the ability to turn your domain knowledge into code, deep skills in probability theory, and deep skills in statistical theory, namely estimation theory and design of experiments.

In this book, I take the following approach:

  • For common problems of causal inference, we can work with software tools that handle the probability theory and statistical theory for us.  We can focus on learning to use those tools.
  • When we need to do more bespoke causal modeling, we can rely on generative machine learning tools.  Those tools handle the probability and statistical theory for us using black-box inference techniques, including cutting-edge deep learning-based methods like variational inference.

Causal inference as a field is daunting because it asks the practitioner to acquire many skills.  Different books handle that balance of skill acquisition differently.  This book focuses on:

  1. Showing you how to turn domain knowledge into code representing testable causal assumptions.
  2. Showing you at a high level how those causal assumptions guide algorithmic causal inference.
  3. Working with machine learning software libraries that implement those inference algorithms.
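
As a small sketch of what the first item in the list above can mean in practice, the example below writes a piece of hypothetical domain knowledge (promotions drive site visits, and visits drive sales) as a graph, and then checks one implication of that graph on data. It does not use the book's libraries or code; the graph, the simulated data, and the helper names are all assumptions made for illustration, and partial correlation is used as a simple stand-in for a conditional independence test in the linear Gaussian case.

```python
import random

# Hypothetical domain knowledge as a DAG: each variable maps to its direct causes.
# The chain promo -> visits -> sales implies that promo and sales are
# independent once we condition on visits. That implication is testable.
dag = {"promo": [], "visits": ["promo"], "sales": ["visits"]}


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def partial_corr(x, z, given):
    """Correlation of x and z after linearly adjusting for one variable."""
    r_xz = pearson(x, z)
    r_xg = pearson(x, given)
    r_gz = pearson(given, z)
    return (r_xz - r_xg * r_gz) / ((1 - r_xg**2) * (1 - r_gz**2)) ** 0.5


# Simulated stand-in for real data, generated to follow the DAG above.
rng = random.Random(1)
promo = [rng.gauss(0, 1) for _ in range(20000)]
visits = [2 * p + rng.gauss(0, 1) for p in promo]
sales = [3 * v + rng.gauss(0, 1) for v in visits]

marginal = pearson(promo, sales)
conditional = partial_corr(promo, sales, given=visits)
print(f"corr(promo, sales):          {marginal:.2f}")
print(f"corr(promo, sales | visits): {conditional:.2f}")
```

If real data showed a strong dependence between promo and sales even after conditioning on visits, that would be evidence against the assumed graph, for example a direct effect of promotions on sales that the graph leaves out.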

Who should read this book?

This book is for:

  • Data scientists, machine learning engineers, and code-savvy product managers looking to solve causal inference problems in industry with production-quality code.
  • Researchers who want to apply causal inference to their domain of expertise without having to get Ph.D.-level depth into statistical estimation theory and design of experiments.
  • Statisticians and economists who know a few causal inference methods and want a bird’s-eye view that ties it all together.
  • People who want to get in on the ground floor of causal AI.

What is the required mathematical and programming background?

This book assumes a level of familiarity with probability and statistics typical of a data scientist. Specifically, it assumes you have basic knowledge of:

  • Probability distributions.
  • Joint probability and conditional probability and how they relate (chain rule, Bayes’ rule).
  • What it means to draw samples from a distribution.
  • Expectation, independence, and conditional independence.
  • Statistical ideas such as random samples, independent and identically distributed data, and statistical bias.

I’ve included a set of primers on these topics at As we progress, I’ll point you to specific primers of relevance.

Rest assured, this book doesn’t require a deep background in probability and statistics theory. The relationship between causality and statistics is like the relationship between engineering and math. Engineering involves a lot of math, but you need only a bit to learn core engineering concepts. After learning those concepts and digging into an applied problem, you can focus on learning the extra math you need to go deep on that problem.

What programming tools do you use and what is the expected level of usage?

This book assumes you are familiar with data science scripting in Python. The three open source Python libraries we rely on in this book are DoWhy, pgmpy, and Pyro. DoWhy is a library for causal inference and other causal modeling tasks developed by Microsoft Research.  pgmpy is a probabilistic graphical modeling library built on SciPy and NetworkX.

Again, our code-first approach is different: rather than going deep into the statistical theory needed to do causal inference, we rely on these supporting libraries to do the statistics for us.  DoWhy tries to be as end-to-end as possible in terms of mapping domain knowledge inputs to causal inference outputs.  When we want to do more bespoke modeling, we’ll use pgmpy or Pyro.  These libraries provide probabilistic inference algorithms that take care of the estimation theory.  pgmpy has graph-based inference algorithms that are extremely reliable.  Pyro, as an extension of PyTorch, extends causal modeling to deep generative models on high-dimensional data and to variational inference, a cutting-edge deep learning-based inference technique.

If your background is in R or Julia, you should still find this book useful.  There are numerous packages in R and Julia that overlap in functionality with DoWhy.  Graphical modeling software in these languages, such as bnlearn, can substitute for pgmpy.  Similarly, the ideas we develop with Pyro would work with similar probabilistic programming languages, such as Stan, PyMC, and Turing.jl. I include tutorials in other languages and libraries in

If you want to learn more, check out the book here.