Finished reading: The Book of Why by Judea Pearl 📚

This was the topic of a book club at work but I’m really glad I read it. My scepticism going in was probably typical of someone not all that familiar with causal analysis, believing that we can just throw all the variables at a regression model and get an answer - anything uncorrelated will have a small coefficient and we can dispose of it. This book - while it takes a slightly arrogant/high-and-mighty approach to getting there - carefully explains that this approach works *only* if there are no causal dependencies between the variables themselves. I had vaguely assumed this was covered by the regression assumption of “*independent* and identically distributed” (i.i.d.), but strictly that describes the observations (or errors), not how the covariates relate to one another - and who checks assumptions anyway? The book goes into depth about the different ways that covariates can be connected (confounders, mediators and colliders); how to route around some of them; and how to figure out which ones to include.
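To make the “throw everything in” failure concrete, here is a minimal sketch of my own (not taken from the book) in Python with NumPy. It assumes a made-up data-generating process: `x` causes `y` with a true effect of 1.0, and a third variable `c` is caused by both of them (a collider). The variable names and the `ols` helper are hypothetical, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed toy causal structure: x -> y, and both x and y -> c (a collider).
x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # true causal effect of x on y is 1.0
c = x + y + rng.normal(size=n)     # c is a common effect of x and y


def ols(columns, target):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Regressing y on x alone recovers something close to the true effect.
coef_x_only = ols([x], y)[1]

# Adding the collider c as a "control" pushes the x coefficient towards zero,
# even though x really does cause y.
coef_x_with_c = ols([x, c], y)[1]

print(f"x coefficient, y ~ x:     {coef_x_only:.3f}")
print(f"x coefficient, y ~ x + c: {coef_x_with_c:.3f}")
```

In this particular setup the second regression makes a real effect all but disappear, so “include everything and drop the small coefficients” would discard exactly the variable that matters. Which variables are safe to include depends on the causal diagram, not on the size of the coefficients.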

Some of the examples felt a bit too much like strawmen for my liking, but I do think the general foundation is pretty solid. It’s a bit odd to have what should really be a textbook on causal analysis written as a prose-heavy combination of history and wordy examples, but then again I can’t say I’d have picked up the textbook and read it cover-to-cover like this.

Overall, I think this should be on any data scientist’s reading list at some point. I have a bunch of follow-on reading to get through now, but I’m much less likely to make the simple errors in my own statistical analyses (even if I do need to find an analyst who *can* work it out).