Tag: data science

  • How Google Used Causal ML to Optimize Gmail Search (Without A/B Testing)

    Machine learning models typically predict outcomes based on what they’ve seen — but what about what they haven’t?

    Google tackled this issue by integrating causal reasoning into its ML training, optimizing when to show Google Drive results in Gmail search.

    The result?

    A 9.15% increase in click-through rates without costly A/B tests.

    Let’s break it down.

    The Problem: Biased Observational Data

    Traditional ML models train on historical user behavior, assuming that past actions predict future outcomes.

    But this approach is inherently biased because it only accounts for what actually happened — not what could have happened under different conditions.

    Example: Gmail sometimes displays Google Drive results in search. If a user clicks, does that mean they needed the result? If they don’t click, would they have clicked if Drive results were presented differently?

    Standard ML models can’t answer these counterfactual questions.

    Google’s Approach: Causal ML in Action

    Instead of treating all users the same, Google’s model categorized them into four response types based on their likelihood to click:

    1. Compliers — Click only if Drive results are shown.
    2. Always-Takers — Click regardless of whether results are shown.
    3. Never-Takers — Never click Drive results.
    4. Defiers — Click only if Drive results are not shown (a rare edge case).
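    These four categories come from comparing a user's two potential outcomes: would they click if Drive results were shown, and would they click if they were not? A minimal, purely conceptual sketch (the function name and labels are illustrative, not Google's code — in practice only one of the two outcomes is ever observed per user):

```python
def response_type(click_if_shown: bool, click_if_not_shown: bool) -> str:
    """Classify a user by their two potential outcomes.

    Conceptual only: a real user experiences just one of these two
    conditions, so the second argument is always a counterfactual.
    """
    if click_if_shown and not click_if_not_shown:
        return "Complier"       # clicks only when results are shown
    if click_if_shown and click_if_not_shown:
        return "Always-Taker"   # clicks either way
    if not click_if_shown and not click_if_not_shown:
        return "Never-Taker"    # never clicks
    return "Defier"             # clicks only when results are NOT shown
```

    Because one of the two inputs is never observable, these labels cannot be assigned directly — which is exactly the challenge described next.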

    The challenge? You can’t directly observe these categories — a user only experiences one version of reality.

    Google solved this by estimating counterfactual probabilities, essentially asking: How likely is a user to click if the result were shown, given that it wasn’t?
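    One common way to operationalize this (a hedged sketch, not the paper's exact method): fit separate models for the click probability when results are shown and when they are not, then treat the uplift between them as the probability that a user is a Complier. This identity holds only under a monotonicity assumption (no Defiers):

```python
def complier_probability(p_click_if_shown: float,
                         p_click_if_not_shown: float) -> float:
    """Estimate P(user is a Complier) from two modeled click probabilities.

    Assumes monotonicity (no Defiers), under which
    P(Complier) = P(click | shown) - P(click | not shown).
    Both inputs would come from models fit on logged data; this
    function is an illustrative stand-in, not Google's implementation.
    """
    # Clamp at zero: a negative uplift would violate the no-Defiers
    # assumption, so we treat it as estimation noise.
    return max(0.0, p_click_if_shown - p_click_if_not_shown)
```

    For example, a user with a 60% modeled click rate when Drive results are shown and 20% when they are not would be a Complier with probability 0.4 under this assumption.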

    The Key Insight: Optimizing for the Right Users

    Instead of optimizing blindly for clicks, the model focused on:

    • Prioritizing Compliers (since they benefit the most from Drive results).
    • Accounting for Always-Takers (who don’t need Drive suggestions to click).
    • Avoiding Never-Takers (who won’t click regardless).

    This logic was embedded into the training objective function, ensuring that the model learned from causal relationships rather than just surface-level patterns.
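    One simple way such logic can enter an objective function (an illustrative sketch under my own weighting assumption, not a reproduction of the paper's loss): weight each training example by the estimated probability that the user is a Complier, so the model concentrates on users whose clicks actually depend on showing Drive results.

```python
import math

def weighted_log_loss(y_true, p_pred, complier_weight):
    """Toy causal-weighted log loss.

    Examples are weighted by the estimated Complier probability:
    Always-Takers and Never-Takers (weight near 0) contribute little,
    since their behavior is unaffected by the treatment.
    """
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p, w in zip(y_true, p_pred, complier_weight):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += w * -(y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / max(weight_sum, eps)
```

    Confident, correct predictions on high-weight (likely-Complier) examples drive the loss down; errors on zero-weight users cost nothing, which is the causal intuition in miniature.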

    The Results: Smarter Personalization Without Experiments

    By integrating causal logic into ML training, Google achieved:

    • +9.15% increase in click-through rate (CTR)
    • Only +1.4% increase in resource usage (not statistically significant)
    • No need for costly A/B testing

    This shows that causal modeling can reduce bias in implicit feedback, making machine learning models more adaptive, efficient, and user-friendly — all without disrupting the user experience.

    Why This Matters

    Most companies rely on A/B testing to optimize product features, but that approach can be expensive, and sometimes it is not feasible at all.

    Causal ML offers a way to refine decisions without running thousands of real-world experiments.

    Google’s work shows that the future of ML isn’t just about better predictions — it’s about understanding why users behave the way they do and making decisions accordingly.

    Source

    Training Machine Learning Models With Causal Logic