## Posts Tagged ‘**probability**’

## what is Bayesian inference?

Canto: So as a dumb non-scientific science aficionado, I’ve come across Bayesian inference and probability a few times before, and even might have come to an understanding of it before losing it again, but I’m wanting to get my head around it, especially in terms of consciousness and how we make sense of the external world via the complex interpreting and understanding systems in our heads. My vague sense of it is that it’s a kind of open-ended system of inferring what’s happening by continually updating the ‘understanding system’ with new data. Is that anything like it?

Jacinta: Okay, we’ve been reading Anil Seth’s *Being You*, subtitled ‘a new science of consciousness’, which argues for consciousness, or at least perception, as ‘controlled hallucination’. Bayesian reasoning is tightly described as ‘inference to the best explanation’, so yes, we take percepts that strike us as surprising or out of the ordinary, and do work on them through memory or the widening of perspective to make them fit with previous experience – the best explanation we can make of the meaning of that percept. I think by ‘controlled hallucination’, Seth is suggesting that the impressionistic blast of data that impinges on our senses at any moment gets its ‘control’, loses its hallucinatory impact, as a result of what we call experience, the connections between this blast and previous blasts.

Canto: So that due to familiarity we stop thinking of them as blasts, though they might’ve seemed that way as new-borns. And might seem again under the influence of drugs.

Jacinta: Yes, which can scramble the regular controls. But returning to Thomas Bayes and his reasoning, Seth describes it as *abductive*, as opposed to the deductive reasoning of classical logic, or the inductive reasoning derived from experience (extrapolation from an apparently unending series of observations, such as the regular waxing and waning of the moon). Here’s what Seth says about abduction:

Abductive reasoning – the sort formalised by Bayesian inference – is all about finding the best explanation for a set of observations, when these observations are incomplete, uncertain or otherwise ambiguous. Like inductive reasoning, abductive reasoning can also get things wrong. In seeking the ‘best explanation’, abductive reasoning can be thought of as reasoning backward, from observed effects to their most likely causes, rather than forward, from causes to their effects – as is the case for deduction and induction.

Anil Seth, *Being You*, p98

Canto: Ah right, so what we experience first are effects – stuff in our heads, and we have to make the best guess about their causes – stuff in the world. Or what we believe to be in the world. So, as new-borns we see – in our heads – the faces and bodies of these people making a fuss over us, though we apparently don’t even know what faces and bodies are, let alone parents. But over time and much repetition we come to see these faces and bodies aren’t there to harm us (if we’re lucky) and, with further information over vast swathes of time, that they’re our parents, and that we’re one of the species called *Homo sapiens*, etc.

Jacinta: Well it’s good that you’ve gone back to earliest childhood, because it makes a mockery, in a way, of inferring ‘the most likely cause for the observed data’, to quote Seth, as obviously infants don’t ‘think’ that way.

Canto: And neither do adults – it’s more automatic than ‘thinking’, it’s a way of understanding and surviving in their world…

Jacinta: We need to think of inference as something more basic, far more basic than an intellectual process, of course. Anyway, here’s how Seth describes it. We go from what we already know, which is termed the *prior*, to what we might know in the future (the *posterior*) by means of what we’re now learning (the *likelihood*). The uniting concept here is ‘knowledge’, in its different stages. The *prior* isn’t necessarily stable; it can be modified or overturned by new learning. You could also describe the prior as a belief. You may believe that, say, Ukraine will win the current war – whatever winning means in this context – but further learning may alter that belief one way or another. We’re looking for the best posterior probability, and so, in the Ukrainian example, we’re thoroughly examining future likelihoods – media sources and expert opinions as to the current state of events and what they might lead to – as well as battling with particular tendencies to be optimistic or pessimistic.

Canto: But doesn’t Bayesian inference, or probability, have a mathematical aspect? It doesn’t seem, from what you’ve said, that there’s anything remotely quantifiable here. How can you quantify beliefs or knowledge?

Jacinta: Well, Seth is looking at quantities here only in terms of some percept, say, as being more or less likely to be of a particular thing-in-the-world, say a particular species of bird, based on experience, the likelihood of that species being spotted in that place, at that time, and so on. I know that mathematics is involved in Bayesian probability – just look it up online – but the concept of inferring to the most likely conclusion from best current and past data seems to be mathematical only in that broadest sense. And I must admit I’m more interested in Seth’s concept of consciousness than in the mathematics of probability, Bayesian or otherwise.

Canto: Ah, but I’m wondering if, since all the physicists are telling me the universe is, if not mathematical, inexplicable without mathematics, maybe the full comprehension of consciousness requires maths too?

Jacinta: Okay, since our topic is Bayesian inference we might need to wade into the mathematical shallows here. So Thomas Bayes presented an alternative to what is now, and maybe then, called *frequentist* statistical analysis. Here’s a rough example taken from a video referenced below. A ‘frequentist’ GP would use basic statistics derived from a model, say ‘a certain number/percentage of my male patients above a particular age have heart problems’, to infer that the symptoms of the patient before her are quite likely the result of a heart condition. A Bayesian GP would have a similar model but would also take into account her prior knowledge of this particular patient, which would make the diagnosis more likely or unlikely depending on the content of this prior knowledge.

Canto: Yeah that’s the mathematical shallows all right.

Jacinta: Well it might surprise you how mathematical even examples like this can be made. But put another way, the Bayesian approach is experiential rather than simple statistical number-crunching. ‘Frequentist’ is given away by the title, so maybe it strives to be objective.

Canto: Quantitative vs qualitative?

Jacinta: Well, yes that’s part of it, but there is a Bayesian theorem, which I may as well stick in here for completeness’ sake.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

There are different descriptions of the theorem – this one doesn’t give much indication of the importance of prior knowledge/experience. Anyway, returning to Seth and consciousness, these Bayesian inferences would be constantly updated in the case of infants as you say, as new knowledge is being produced at a rapid clip, that this animal is a dog, say, and is mostly harmless but not always, and this item isn’t food though it’s nice to suck on, but that item tastes horrible – though they wouldn’t know what *taste* is…

Canto: Which really explains why all these neural connections are laid down so quickly in early childhood – they’re really essential for survival.

Jacinta: And, as Seth points out, the best scientific methods involve Bayesian inference – theories upgraded or discarded by experimental evidence or new discoveries that don’t fit. But our thinking – that, when we’re infants, these people constantly around us are more significant, for us, than the people who pass by or occasionally visit – doesn’t have to rise to the level of theory. They’re just understandings, more or less accurate, and constantly updated – we might learn, for example, that these adults or pets aren’t always on our side, say when we try to eat the dog, or whatever. Anyway, we could go into a little bit of detail about the probabilities, from zero to one, of priors, likelihoods and posteriors, and about probability distributions, of the Gaussian kind, which shift as more information comes to mind, but maybe we’ll come back to it in a future post. My head hurts already.
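For readers who want to see the prior, likelihood and posterior as numbers, here is a minimal sketch of the bird-spotting case mentioned earlier. Everything in it is an assumption made for illustration: the two hypotheses, the evidence (a distinctive call) and all of the probabilities are invented, and the `update` helper is not taken from Seth’s book. It only shows the general shape of the process, with each posterior becoming the prior for the next observation.

```python
def update(prior, likelihood):
    """One Bayesian update over a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(evidence | hypothesis)
    Returns the posterior, dict mapping hypothesis -> P(hypothesis | evidence).
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())  # this is P(evidence)
    return {h: p / total for h, p in unnormalised.items()}


# Hypothetical prior: how often each species turns up in this place.
prior = {"common_bird": 0.7, "rare_bird": 0.3}

# Hypothetical likelihood: chance of hearing the distinctive call from each species.
call_heard = {"common_bird": 0.1, "rare_bird": 0.9}

after_one_call = update(prior, call_heard)
after_two_calls = update(after_one_call, call_heard)  # posterior reused as the new prior

print(round(after_one_call["rare_bird"], 3))   # 0.794
print(round(after_two_calls["rare_bird"], 3))  # 0.972
```

With these made-up figures, a single surprising percept is enough to overturn the prior, and a second one makes the new belief fairly firm.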

**References**

Anil Seth, *Being You: a new science of consciousness*, 2021

Bayesian vs frequentist statistics (video), Ox Educ

Frequentism and Bayesianism: What’s the Big Deal? | SciPy 2014 | Jake VanderPlas (video)

## nothing so simple? the gambler’s fallacy

Humans are capable of reasoning, but not always or often very well. Daniel Kahneman’s famous book *Thinking, fast and slow* provides us with many examples, and not being much of a clear thinker myself, where probability and all that Bayesian stuff is concerned, I’ll start with something really simple before ascending, one day, to the simply simple. And not being much of a gambler, I’d never heard of the gambler’s fallacy before. It appears to be a simple and obvious fallacy, but I’m sure I can succeed in making it more confusing than it should be.

The fallacy involves believing that what has occurred before might dictate what happens in the future, in a particular context. It’s best explained by the tossing of a coin. With a fair coin, the probability of it landing tails up, *on any toss*, is .5, given that, in probability language, absolute certainty is given a value of 1, and no possibility at all is given 0. The key here is what I’ve italicised – the fallacy lies in believing that the coin, as if it’s a thinking being, has an interest in maintaining a result, *over many tosses*, of 50% tails – so that if the share of tails skews towards zero, say after 6 heads in a row, the probability of the next toss being tails will rise above .5.

Put another way: assuming a fair coin, the probability of it landing heads on one toss is .5. That should mean that over time, with x number of tosses, assuming x to be a very large number, the result for a heads should approach 50%. So it would seem quite reasonable, if you were keeping count, to bet on a result that brings the average closer to 50%. That’s without imagining that the coin *wants* to get to 50%. It just *should*, shouldn’t it?

The clear answer is *no*. There can be no influence from the past on any new coin toss. How can there be? That would be truly weird if you think about it. The overall results may approach 50%, according to the law of large numbers, but that’s *independent* of particular tosses. Suppose you try to create a dependency by betting on a pair of tosses. It could be HH, TT, HT or TH. Those are the only four options and the probability of each of them is .25 (i.e. .5 x .5). So you might think that, after two heads in a row, it would be wise to bet on tails. But this bet would still have a .5 probability of succeeding, and the result HHT, taken together, would be .5 x .5 x .5, which is .125 or one eighth, the same as all the other seven results of three coin tosses. The probability doesn’t change before each toss, no matter the result of the previous toss.
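The arithmetic above can be checked by listing every possible run of three tosses. This is only a small sketch, assuming a fair coin and independent tosses as the paragraph does; the names `P_HEADS`, `sequence_probability` and `outcomes` are invented for the example. It uses exact fractions so that nothing is blurred by rounding.

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 2)  # fair coin, as assumed in the text


def sequence_probability(sequence):
    """Probability of one particular run of independent tosses, e.g. 'HHT'."""
    probability = Fraction(1)
    for toss in sequence:
        probability *= P_HEADS if toss == "H" else 1 - P_HEADS
    return probability


# All eight possible results of three tosses, each with its probability.
outcomes = {
    "".join(seq): sequence_probability(seq) for seq in product("HT", repeat=3)
}

# Chance that the third toss is tails, given that the first two were heads:
# P(HHT) divided by P(first two are HH) = P(HHT) / (P(HHH) + P(HHT)).
p_tails_after_hh = outcomes["HHT"] / (outcomes["HHH"] + outcomes["HHT"])

print(outcomes["HHT"])    # 1/8
print(p_tails_after_hh)   # 1/2
```

Every one of the eight sequences comes out at one eighth, and the chance of tails after two heads is still one half, which is the whole point.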

So far, so clear, but it would be hard not to be influenced into betting against a run continuing. That’s not irrational, is it? But nor is it rational, considering there’s always a 50/50 chance with each toss. It’s just a bet. And yet… I’m reminded of Swann in *À la recherche du temps perdu*, as my mind clouds over…

## Bayesian probability, sans maths (mostly)

Okay time to get back to sciency stuff, to try to get my head around things I should know more about. Bayesian statistics and probability have been brought to the periphery of my attention many times over the years, but my current slow reading of Daniel Kahneman’s *Thinking, fast and slow* has challenged me to master it once and for all (and then doubtless to forget about it forevermore).

I’ve started a couple of pieces on this topic in the past week or so, and abandoned them along with all hope of making sense of what is no doubt a doddle for the cognoscenti, so I clearly need to keep it simple for my own sake. The reason I’m interested is because critics and analysts of both scientific research and political policy-making often complain that Bayesian reasoning is insufficiently utilised, to the detriment of such activities. I can’t pretend that I’ll be able to help out though!

So Thomas Bayes was an 18th century English statistician who left a theorem behind in his unpublished papers, apparently underestimating its significance. The person most responsible for utilising and popularising Bayes’ work was the French polymath Pierre-Simon Laplace. The theorem, or rule, is captured mathematically thusly:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where *A* and *B* are events, and *P(B)*, that is, the probability of event *B*, is not equal to zero. In statistics, the probability of an event’s occurrence ranges from 0 to 1 – meaning zero probability to total certainty.

I do, at least, understand the above equation, which, wordwise, means that the probability of *A* occurring, given that *B* has occurred, is equal to the probability of *B* occurring, given that *A* has occurred, multiplied by the probability of *A*’s occurrence, all divided by the probability of *B*’s occurrence. However, after tackling a few video mini-lectures on the topic I’ve decided to give up and focus on Kahneman’s largely non-mathematical treatment with regard to decision-making. The theorem, or rule, presents, as Kahneman puts it, ‘the logic of how people should change their mind in the light of evidence’. Here’s how Kahneman first describes it:

Bayes’ rule specifies how prior beliefs… should be combined with the diagnosticity of the evidence, the degree to which it favours the hypothesis over the alternative.

D Kahneman, *Thinking, fast and slow*, p154

In the most simple example – if you believe that there’s a 65% chance of rain tomorrow, you really need to believe that there’s a 35% chance of no rain tomorrow, rather than any alternative figure. That seems logical enough, but take this example re US Presidential elections:

… if you believe there’s a 30% chance that candidate x will be elected President, and an 80% chance that he’ll be re-elected if he wins first time, then you must believe that the chances that he will be elected twice in a row are 24%.

This is also logical, but not obvious to a surprisingly large percentage of people. What appears to ‘throw’ people is a story, a causal narrative. They imagine a candidate winning, somewhat against the odds, then proving her worth in office and winning easily next time round – this story deceives them into defying logic and imagining that the chance of her winning twice in a row is greater than that of winning first time around – which is a logical impossibility. Kahneman places this kind of irrationalism within the frame of system 1 v system 2 thinking – roughly equivalent to intuition v concentrated reasoning. His solution to the problem of this kind of suasion-by-story is to step back and take greater stock of the ‘diagnosticity’ of what you already know, or what you have predicted, and how it affects any further related predictions. We’re apparently very bad at this.
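The election figures can be worked through in a few lines. This is just a sketch of the multiplication involved, using the 30% and 80% quoted above; the variable names are made up for the example, and exact fractions are used to keep the result tidy.

```python
from fractions import Fraction

# Figures quoted in the text above.
p_elected = Fraction(30, 100)                  # chance of winning the first time
p_reelected_given_elected = Fraction(80, 100)  # chance of re-election, if elected

# Chain rule: P(elected and re-elected) = P(elected) * P(re-elected | elected)
p_twice = p_elected * p_reelected_given_elected
print(float(p_twice))  # 0.24

# Whatever conditional probability you pick between 0 and 1,
# winning twice can never be more probable than winning once.
for percent in range(0, 101, 10):
    assert p_elected * Fraction(percent, 100) <= p_elected
```

The loop at the end is the ‘logical impossibility’ in miniature: multiplying by a number no greater than 1 can only keep the probability the same or shrink it, however good the story sounds.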

There are many examples throughout the book of failure to reason effectively from information about *base rates*, often described as ‘base-rate neglect’. A base rate is a statistical fact which should be taken into account when considering a further probability. For example, when given information about the character of a fictional person T, information that was deliberately designed to suggest he was stereotypical of a librarian, research participants gave the person a much higher probability of being a librarian rather than a farmer, even though they knew, or should have known, that the number of persons employed as farmers was higher by a large factor than those employed as librarians (the base rate of librarians in the workforce). Of course the degree to which the base rate was made salient to participants affected their predictions.

Here’s a delicious example of the application, or failure to apply, Bayes’ rule:

A cab was involved in a hit-and-run at night. Two cab companies, Green Cabs and Blue Cabs, operate in the city. You’re given the following data:

– 85% of the cabs in the city are Green, 15% are Blue.

– A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colours 80% of the time and failed 20% of the time.

What is the probability that the car involved in the accident was Blue rather than Green?

D Kahneman, *Thinking, fast and slow*, p166

It’s an artificial scenario, granted, but if we accept the accuracy of those probabilities, we can say this: the base rate of Blue cabs is 15%, so the prior odds of Blue against Green are .15/.85, and the witness is right 80% of the time, so a ‘Blue’ identification is four times more likely if the cab really was Blue (.8/.2). Multiplying the two gives the posterior odds – (.15/.85) x (.8/.2) = .706. Converting odds to a probability, by dividing the odds by one plus the odds (.706/1.706), gives approximately 41%.
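The cab calculation can be checked two ways: directly with Bayes’ rule, and via the odds shortcut used above. The sketch below assumes only the figures given in the problem (15% Blue cabs, a witness who is right 80% of the time for either colour); the function name `posterior_blue` and the variable names are invented for the example.

```python
def posterior_blue(prior_blue, witness_accuracy):
    """P(cab was Blue | witness says Blue), by Bayes' rule."""
    # Witness says Blue and is right: the cab really was Blue.
    says_blue_and_blue = prior_blue * witness_accuracy
    # Witness says Blue and is wrong: the cab was actually Green.
    says_blue_and_green = (1 - prior_blue) * (1 - witness_accuracy)
    return says_blue_and_blue / (says_blue_and_blue + says_blue_and_green)


p_direct = posterior_blue(prior_blue=0.15, witness_accuracy=0.80)

# The same answer via odds: prior odds times likelihood ratio.
prior_odds = 0.15 / 0.85
likelihood_ratio = 0.80 / 0.20
posterior_odds = prior_odds * likelihood_ratio       # about 0.706
p_from_odds = posterior_odds / (1 + posterior_odds)  # odds back to probability

print(round(posterior_odds, 3))  # 0.706
print(round(p_direct, 3))        # 0.414
print(round(p_from_odds, 3))     # 0.414
```

Both routes land on roughly 41%, about half the 80% that the witness’s reliability alone would suggest.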

So how close were the research participants to this figure? Most participants ignored the statistical data – the base rates – and gave the figure of 80%. They were more convinced by the witness. However, when the problem was framed differently, by providing causal rather than statistical data, participants’ guesses were more accurate. Here’s the alternative presentation of the scenario:

You’re given the following data:

– the two companies operate the same number of cabs, but Green cabs are involved in 85% of accidents

– the information about the witness is the same as previously presented

The mathematical result is the same, but this time the guesses were much closer to the correct figure. The difference lay in the framing. Green cabs *cause* accidents. That was the fact that jumped out, whereas in the first scenario, the fact that most clearly jumped out was that the witness identified the offending car as Blue. The statistical data in scenario 1 was largely ignored. In the second scenario, the witness’s identification of the Blue car moderated the tendency to blame the Green cars, whereas in scenario 1 there was no ‘story’ about Green cars causing accidents and the blame shifted almost entirely to the Blue cars, based on the witness’s story. Kahneman named his chapter about this tendency ‘Causes trump statistics’.

So there are *causal* and *statistical* base rates, and the lesson is that in much of our intuitive understanding of probability, we simply pay far more attention to causal base rates, largely to our detriment. Also, our causal inferences tend to be stereotyped, so that only if we are faced with surprising causal rates, in particular cases and not presented statistically, are we liable to adjust our probabilistic assessments. Kahneman presents some striking illustrations of this in the research literature. Causal information creates bias in other areas of behaviour assessment too, of course, as in the phenomenon of regression to the mean, but that’s for another day, perhaps.