Finished reading: The Book of Why by Judea Pearl 📚
This was the topic of a book club at work, and I’m really glad I read it. My scepticism going in was probably typical of someone not all that familiar with causal analysis: I believed that we can just throw all the variables at a regression model and get an answer - anything uncorrelated will have a small coefficient and we can dispose of it. This book - while it takes a slightly arrogant/high-and-mighty approach to getting there - carefully explains that this approach works only if there is no dependency between the variables. This is, of course, loosely built into the regression model assumptions (strictly, “independent and identically distributed” (i.i.d.) describes the observations and errors, while the covariates are assumed not to be strongly collinear or correlated with the error term) but who checks assumptions? It goes into depth about the different ways that covariates can be connected (as confounders, mediators, or colliders); how to route around some of them; and how to figure out which ones to include.
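To make the "throw everything at a regression" problem concrete, here is a minimal simulation sketch. It is my own illustration, not an example from the book: the data are synthetic, the variable names and the `ols_coefs` helper are hypothetical, and in both scenarios `x` is constructed to have no causal effect on `y`. In the first, a confounder `z` drives both, so leaving it out invents an effect; in the second, `x` and `y` both feed a collider `c`, so putting it in invents one.

```python
import numpy as np


def ols_coefs(y, *covariates):
    """Least-squares slopes for y ~ intercept + covariates (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


rng = np.random.default_rng(0)
n = 50_000

# Scenario 1 (confounder): z causes both x and y; x does not cause y.
z = rng.normal(size=n)
x_conf = z + rng.normal(size=n)
y_conf = 2 * z + rng.normal(size=n)
conf_without_z = ols_coefs(y_conf, x_conf)[0]   # biased: z is omitted
conf_with_z = ols_coefs(y_conf, x_conf, z)[0]   # adjusted: close to zero

# Scenario 2 (collider): x and y are independent; both cause c.
x_col = rng.normal(size=n)
y_col = rng.normal(size=n)
c = x_col + y_col + rng.normal(size=n)
col_without_c = ols_coefs(y_col, x_col)[0]      # correct: close to zero
col_with_c = ols_coefs(y_col, x_col, c)[0]      # biased: c is included

print(f"confounder omitted:  {conf_without_z:+.2f}")
print(f"confounder included: {conf_with_z:+.2f}")
print(f"collider omitted:    {col_without_c:+.2f}")
print(f"collider included:   {col_with_c:+.2f}")
```

With these particular (assumed) coefficients, the two biased estimates should land near +1 and -0.5 while the other two sit near zero. The same "include it or not?" decision is right in one scenario and wrong in the other, and the data alone cannot tell you which scenario you are in; that takes the causal diagram.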
Some of the examples seemed a bit too strawman for my liking, but I do think the general foundation is pretty solid. It’s a bit odd to have what should really be a textbook in causal analysis as a prose-heavy combination of history and wordy examples, but then again I can’t say I’d have picked up the textbook and read it cover-to-cover like this.
Overall, I think this should be on any data scientist’s reading list at some point. I have a bunch of follow-on reading to get through now, but I’m much less likely to make the simple errors in my own statistical analyses (even if I do still need to find an analyst who can work through the causal structure for me).