The Theory That Would Not Die, by Sharon Bertsch McGrayne
What is known today as Bayes' theorem was invented, like most things, by the marquis de Laplace, but Bayes really did come up with the most important part of it. (By modern conventions we'd call it the Bayes-Price-Laplace rule, but it's too late.) The idea was known then as "inverse probability": probability had begun with problems where the possible outcomes are known (e.g. a game of chance) and asked how likely a given outcome is. The inverse is to begin with an observed outcome and try to calculate the probability of the process that produced it. This is one of the core problems of science - given 100 white swans and one black, how likely is it that 1% of swans are black, or that this one is a rare mutant, or that none are black and this observation a trick of the light? Laplace cheerfully proceeded to invent frequentism, the sworn enemy of Bayesian statistics, and followed Bayesian methods for the rest of his career (believing, incorrectly, that the two approaches always converge on the same solution). Looking for problems to put to the new statistical methodology, Laplace examined whether rates of male and female births are identical, and found that male births are consistently higher, but couldn't explain why. (And apparently this is still the case! We still don't know why! I've been misled by every schoolchild probability exercise that begins "assume male and female births equally likely…")
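To make the swan problem concrete, here is a minimal sketch of the inverse-probability calculation (my own toy, not from the book; the three candidate hypotheses and the uniform prior are assumptions chosen purely for illustration):

```python
# Inverse probability for the swan example: we observed 100 white swans
# and 1 black one, and want the posterior probability of each hypothesis
# about the true black-swan rate. Hypotheses and prior are illustrative.
hypotheses = {
    "1% of swans are black": 0.01,
    "0.1% (rare mutant)": 0.001,
    "none are black (trick of the light)": 1e-6,  # tiny, not exactly zero
}

n_white, n_black = 100, 1
prior = 1 / len(hypotheses)  # uniform prior over the three hypotheses

# Likelihood of the data under each hypothesis: p^black * (1 - p)^white
unnormalized = {
    name: prior * (p ** n_black) * ((1 - p) ** n_white)
    for name, p in hypotheses.items()
}

# Bayes' rule: posterior = prior * likelihood / evidence
evidence = sum(unnormalized.values())
for name, u in unnormalized.items():
    print(f"{name}: posterior = {u / evidence:.4f}")
```

Run it and the 1% hypothesis dominates with a posterior around 0.8 - a single black swan is weak but real evidence.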
This book, a pretty good history of the topic, then moves on to the development of modern statistical research methods, led by R.A. Fisher, Pearson père (Karl) and fils (Egon), and Jerzy Neyman, fanatical anti-Bayesians all. Thus began the modern frequentist-Bayesian war. Bayes' method is uncontroversial when the prior probabilities are known; the debate is over the legitimacy of making up a "subjective" prior based on one's intuition, or using a uniform prior, when one doesn't. The opposition of the leading statisticians to Bayesian methods won out until WWII, when code-breaking became a mathematician's game. (At least for early adopters, such as the Poles. "The British agency charged with cracking German military codes and ciphers clung to the tradition that decryption was a job for gentlemen with linguistic skills. Instead of hiring mathematicians, the Government Code and Cypher School employed art historians, scholars of ancient Greek and medieval German, crossword puzzlers, and chess players.") An actuarial mathematician, Marian Rejewski, used group theory to figure out how the wheels of the German Enigma were wired. Alan Turing used Bayesian methods at Bletchley Park to calculate the expected probabilities of various letters occurring, allowing the breaking of codes whose settings were switched far too often to be cracked by brute force.
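McGrayne doesn't go into the mechanics, but the flavor of Turing's approach is easy to sketch: score each observation by how much more likely it is under one hypothesis than another, and accumulate. The toy below is my own, not Banburismus itself, and the match probabilities are invented (loosely inspired by the fact that natural-language text repeats letters more often than random text):

```python
import math

# A much-simplified sketch of Turing-style sequential Bayesian scoring:
# accumulate the log of the likelihood ratio ("weight of evidence") for
# the hypothesis that two messages share a key, versus pure coincidence.
p_match_same_key = 0.07   # assumed chance aligned letters match if keys coincide
p_match_random = 1 / 26   # chance they match by pure coincidence

def update_log_odds(log_odds: float, letters_match: bool) -> float:
    """Add the log-likelihood-ratio of one observation to the running total."""
    if letters_match:
        return log_odds + math.log(p_match_same_key / p_match_random)
    return log_odds + math.log((1 - p_match_same_key) / (1 - p_match_random))

# Score a stream of observations; positive totals favor "same key".
log_odds = 0.0  # prior log-odds of 0, i.e. 50/50
for match in [True, False, True, True, False, True]:
    log_odds = update_log_odds(log_odds, match)
print(f"final log-odds: {log_odds:.2f}")
```

Turing measured these increments in "bans" (base-10 units of weight of evidence), and the point was speed: each observation just adds a number, and a running total crossing a threshold was enough to accept or reject a candidate alignment.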
After the war, the Bayesian cause was taken up by luminaries such as Jimmie Savage, Harold Jeffreys, and E.T. Jaynes. ("Jaynes, who said he always checked to see what Laplace had done before tackling an applied problem, turned off many colleagues with his Bayesian fervor.") Savage was a "full-blown subjectivist": science begins as subjective, varying opinions, he believed; and as more data comes in, views slowly converge on the truth. The popularity of quantum mechanics, which like stat mech uses frequentist statistics, worked against the neo-Bayesians. Erwin Schrödinger said, "The individual case is entirely devoid of interest." Robert Schlaifer was a historian who got a job teaching statistics at Harvard Business School and basically taught himself on the fly, inventing the field of Bayesian decision theory as he went. John Tukey used Bayesian methodology, but didn't call himself a Bayesian. Am I losing you? The back end of this book starts to sag, as it goes through a laundry list of statisticians and their allegiances to Bayesianism, as well as the emergence of numerical computational methods which made calculating posteriors much more feasible: MCMC, Gibbs sampling, the Metropolis-Hastings algorithm, and software like Stan or BUGS. The breadth of the coverage is overwhelming, and hampered by its shallowness - there are almost no mathematical details in the book, aside from the simplest arithmetic examples. And what's worse is how the end slowly descends into the breathless language of tech marketing: here is a list of unrelated problems, they are unsolvable by traditional methods, but guess which magical solution is being used to solve them?
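Since the book won't show any math, here is roughly what a Metropolis-Hastings sampler looks like - a bare-bones sketch of my own, with a toy target density standing in for a real posterior:

```python
import math
import random

# A bare-bones Metropolis-Hastings sampler. The target is a toy
# unnormalized posterior (a standard normal density); real applications
# plug in prior * likelihood instead.
def unnormalized_posterior(x: float) -> float:
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples: int, step: float = 1.0) -> list[float]:
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0, step)  # symmetric random-walk proposal
        # Accept with probability min(1, ratio of posterior densities)
        if random.random() < unnormalized_posterior(proposal) / unnormalized_posterior(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_hastings(10_000)
print(f"mean ≈ {sum(samples) / len(samples):.3f}")  # should be near 0
```

The punchline is in the acceptance ratio: the normalizing constant cancels, which is exactly what made posteriors computable for problems where the evidence integral is intractable.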
Ultimately, Bayesian thinking is a useful way to make decisions and a powerful statistical paradigm for solving problems with limited information. While it may now be the preferred approach in a lot of fields, it is not a panacea, nor are frequentist methods being put out to pasture. If the Bayesian wars have reached some kind of conclusion, it is that knowing when to reach for MLE or MAP, p-values or posteriors, or some combination of both (paging Empirical Bayesians!), is more important than blind devotion to a method.
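To make the MLE-versus-MAP distinction concrete, one last toy example (mine, with a Beta(2, 2) prior chosen purely for illustration):

```python
# MLE vs MAP on a toy coin-flip problem: 3 heads in 3 flips.
heads, flips = 3, 3

# MLE: the observed frequency -- happy to conclude the coin never
# lands tails after only three flips.
mle = heads / flips  # 1.0

# MAP with a Beta(2, 2) prior (an assumption for illustration): the
# prior's pseudo-counts temper the small sample.
alpha, beta = 2, 2
map_estimate = (heads + alpha - 1) / (flips + alpha + beta - 2)  # 0.8

print(f"MLE: {mle:.2f}, MAP: {map_estimate:.2f}")
```

Three flips shouldn't convince anyone the coin is two-headed; the prior is what says so.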