Bayesian Averaging
This article is part of the series on distributed Bayesian reasoning. It assumes you have read the previous article on the Basic Math.
The Problem: Small Samples
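In Basic Math, we used the informed probability, computed as a ratio of vote counts:
$$P(A=1 \mid B=1) = \frac{c(A=1, B=1)}{c(B=1)}$$
But this is actually a little naive. Suppose only one person voted on both 𝐴 and 𝐵, and that person accepts both. Then this ratio is 1/1, so the informed probability is 100%. Should we really be certain that the average juror who accepts 𝐵 also accepts 𝐴?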
Certainly not. A single vote from a single user is not a great deal of information. We need a more sophisticated way of estimating the priors of the meta-reasoner based on the evidence we have in the form of arguments and votes.
The Bayesian approach to answering this question requires us to have priors: we need to start with an estimate of this probability (or rather, a distribution of possible probabilities) even before we have any data! Then we can use the rules of Bayesian belief updating to combine our priors with our data to come up with a posterior belief.
The Beta-Bernoulli Model
It turns out, we are actually dealing with a textbook example of a problem that can be solved with a simple Bayesian hierarchical model. The solution, using a beta-Bernoulli distribution, is amply described elsewhere (I learned about them from this book). Here is the solution:
Let:
- ω = our prior estimate of the probability that the average juror accepts 𝐴 before getting any vote data
- κ = our prior estimate of the concentration of likely values around ω (high κ means low variance)
- N = the number of users who have voted on 𝐴
- z = the number of those users who also agree with 𝐴

Then our posterior estimate of the probability that the average user accepts 𝐴, given that they have voted on it, is:
$$\frac{\omega\kappa + z}{\kappa + N}$$
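This estimate is the posterior mean of a beta-Bernoulli model. The sketch below assumes the mean-and-concentration parameterization Beta(ωκ, (1 − ω)κ). The text does not spell out its parameterization, but this is the one whose posterior mean is (ωκ + z)/(κ + N). The sketch updates that prior with z accepts out of N votes and checks the result against the closed form. The names `Beta`, `prior_from_mean_concentration`, `update` and `bayesian_average` are hypothetical. The example revisits the single-vote case: with ω = 0.8 and κ = 10, one accepting vote moves the estimate from 0.8 to 9/11, not to 100%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over the acceptance probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_mean_concentration(omega: float, kappa: float) -> Beta:
    # Assumed parameterization: prior mean = omega, alpha + beta = kappa.
    return Beta(alpha=omega * kappa, beta=(1.0 - omega) * kappa)


def update(prior: Beta, z: int, n: int) -> Beta:
    # Beta-Bernoulli conjugacy: each of the z accepts adds 1 to alpha,
    # each of the n - z rejects adds 1 to beta.
    return Beta(alpha=prior.alpha + z, beta=prior.beta + (n - z))


def bayesian_average(z: int, n: int, omega: float, kappa: float) -> float:
    # Closed form of the same posterior mean.
    return (omega * kappa + z) / (kappa + n)


# One voter, who accepts: the raw ratio z/N would be 1/1 = 100%.
posterior = update(prior_from_mean_concentration(0.8, 10), z=1, n=1)
print(round(posterior.mean, 4))                   # 0.8182
print(round(bayesian_average(1, 1, 0.8, 10), 4))  # 0.8182
```

Working through the full distribution and using the closed form give the same number. The closed form is all that is needed for the estimates that follow.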
What should we use as our prior ω? That depends on the context. If this method is implemented in a social platform, then ω can be based on historical data. For example, suppose the average acceptance rate for arguments previously submitted to the platform was 80%. Having nothing else to go on, 80% is then a good estimate of ω. Our estimate of κ can also be made using historical data.
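The text does not say how κ should be derived from historical data. One standard option is the method of moments, which is an assumption here rather than the article's method. Treat past arguments' acceptance rates as draws from Beta(ωκ, (1 − ω)κ), whose variance is ω(1 − ω)/(κ + 1). Then ω is the mean of those rates, and κ = ω(1 − ω)/variance − 1, so high historical variance gives a low κ. The function name and the sample rates are hypothetical.

```python
from statistics import fmean, pvariance


def estimate_prior(acceptance_rates: list[float]) -> tuple[float, float]:
    """Method-of-moments estimate of (omega, kappa) from historical acceptance rates.

    Assumes each past argument's acceptance rate is a draw from
    Beta(omega * kappa, (1 - omega) * kappa), whose variance is
    omega * (1 - omega) / (kappa + 1). Sampling noise within each
    past rate is ignored, which biases kappa downward.
    """
    omega = fmean(acceptance_rates)
    variance = pvariance(acceptance_rates, mu=omega)
    if variance == 0:
        raise ValueError("historical rates show no variation; kappa is unbounded")
    kappa = omega * (1 - omega) / variance - 1
    return omega, kappa


# Hypothetical historical acceptance rates for three past arguments.
omega, kappa = estimate_prior([0.7, 0.8, 0.9])
print(round(omega, 3), round(kappa, 1))  # 0.8 23.0
```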
What we have done here is sometimes called Bayesian Averaging. The above formula essentially gives us a weighted average of our prior ω and the observed ratio z/N, with the data z/N getting more weight the larger N is relative to κ.
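Assume the Bayesian average takes the form (ωκ + z)/(κ + N), which is the posterior mean under a Beta(ωκ, (1 − ω)κ) prior. Then, for N > 0, splitting the numerator makes the two weights explicit:

```latex
\frac{\omega\kappa + z}{\kappa + N}
  = \underbrace{\frac{\kappa}{\kappa + N}}_{\text{weight on prior}}\,\omega
  + \underbrace{\frac{N}{\kappa + N}}_{\text{weight on data}}\cdot\frac{z}{N}
```

The two weights sum to 1, and prior and data count equally when N = κ. With κ = 10, for example, ten votes give the observed ratio half the weight.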
The Bayesian-Average Probability Function
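When calculating values of 𝑃 up to this point, we have just taken ratios of counts from our votes table (the c function). For example, the formula for 𝑃(𝐴=a) is just:
$$P(A=a) = \frac{c(A=a)}{c()}$$
where c() is the total number of voters. To use a Bayesian approach to estimating probabilities, we plug these same two counts into the Bayesian averaging formula instead of taking a ratio, with z = c(𝐴=a) and N = c().
Let's define a new function 𝑃ᵥ that does this for us. So where, by definition, 𝑃(𝐴=a) = c(𝐴=a)/c(), we have instead:
$$P_v(A=a) = \frac{\omega\kappa + c(A=a)}{\kappa + c()}$$
And where, by the definition of conditional probability,
$$P(A=a \mid B=b) = \frac{c(A=a, B=b)}{c(B=b)}$$
we have instead:
$$P_v(A=a \mid B=b) = \frac{\omega\kappa + c(A=a, B=b)}{\kappa + c(B=b)}$$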
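Here is a minimal sketch of 𝑃ᵥ computed directly from a votes table, next to the raw ratio 𝑃. It uses 𝑃ᵥ(𝐴=a) = (ωκ + c(𝐴=a))/(κ + c()) and 𝑃ᵥ(𝐴=a|𝐵=b) = (ωκ + c(𝐴=a,𝐵=b))/(κ + c(𝐵=b)). Several choices are hypothetical illustrations, not the platform's actual schema: the row format (one dict per voter, mapping a claim id to 1 for accept or 0 for reject), the helper names `c`, `p` and `p_v`, and the votes themselves. The sketch also assumes ω is the prior probability of the specific event claim = value, so for a rejection (a = 0) you would pass 1 − ω.

```python
from typing import Mapping, Optional, Sequence

# A voter's row in the votes table: claim id -> 1 (accept) or 0 (reject).
# Claims the voter did not vote on are simply absent.
Vote = Mapping[str, int]


def c(votes: Sequence[Vote], **conditions: int) -> int:
    """Count voters matching every condition, e.g. c(votes, A=1, B=1).

    With no conditions this is c(), the total number of voters.
    """
    return sum(
        all(vote.get(claim) == value for claim, value in conditions.items())
        for vote in votes
    )


def p(votes: Sequence[Vote], claim: str, value: int,
      given: Optional[Mapping[str, int]] = None) -> float:
    """Raw ratio estimate, e.g. P(A=a) = c(A=a) / c()."""
    given = dict(given or {})
    return c(votes, **given, **{claim: value}) / c(votes, **given)


def p_v(votes: Sequence[Vote], claim: str, value: int, omega: float,
        kappa: float, given: Optional[Mapping[str, int]] = None) -> float:
    """Bayesian-average estimate (omega * kappa + z) / (kappa + N)."""
    given = dict(given or {})
    z = c(votes, **given, **{claim: value})   # e.g. c(A=a, B=b)
    n = c(votes, **given)                     # e.g. c(B=b), or c() if unconditional
    return (omega * kappa + z) / (kappa + n)


# Hypothetical votes table (not data from the article).
votes = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 0, "B": 0},
    {"A": 0},
    {"A": 1},
]

print(p(votes, "A", 1))                                          # 0.6
print(p_v(votes, "A", 1, omega=0.8, kappa=10))                   # (8 + 3) / 15
print(p(votes, "A", 1, given={"B": 1}))                          # 1.0
print(p_v(votes, "A", 1, omega=0.8, kappa=10, given={"B": 1}))  # (8 + 2) / 12
```

With only two voters accepting 𝐵, the raw conditional ratio jumps to 1.0. The Bayesian-average version stays near the prior.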
Now let's compute an actual value of 𝑃ᵥ(𝐴=1). First, we need to choose priors. Let's suppose that historically, on average, 80% of voters initially accept root claims. So ω = 80%. And let's suppose the variation in this distribution can be represented by κ = 10. So:
$$P_v(A=1) = \frac{0.8 \times 10 + c(A=1)}{10 + c()}$$
In this case, the large number of votes overwhelms our relatively weak prior, so our result is very close to the raw ratio 𝑃(𝐴=1) = c(𝐴=1)/c().
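To see how quickly the data overwhelm the prior, this sketch holds the observed acceptance ratio fixed at a hypothetical 60% and increases N, using ω = 0.8 and κ = 10 as in the example. The vote counts and the helper name `bayesian_average` are illustrative only.

```python
def bayesian_average(z: int, n: int, omega: float = 0.8, kappa: float = 10) -> float:
    """Posterior mean (omega * kappa + z) / (kappa + n); defaults are the example's priors."""
    return (omega * kappa + z) / (kappa + n)


KAPPA = 10

for n in (5, 50, 500, 5000):
    z = 3 * n // 5                      # hypothetical counts: 60% of voters accept A
    weight_on_data = n / (KAPPA + n)    # share of the estimate driven by z / n
    estimate = bayesian_average(z, n, kappa=KAPPA)
    print(f"N={n:>5}  z/N={z / n:.2f}  weight on data={weight_on_data:.3f}  P_v={estimate:.4f}")
```

At N = 5000 the prior carries well under 1% of the weight. This is the sense in which a large number of votes overwhelms a weak prior.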
Two-Level Bayesian Averaging
To review where we are going with this, recall from the Basic Math article that the justified opinion formula for an argument tree with a single premise argument is:
Now we are saying that
But what are our priors ω and κ?
Recall that we have just used Bayesian averaging to estimate the probability that the average person accepts 𝐴 (𝑃ᵥ(𝐴=a)).
However if we use
The priors for
Now we can set ω = 𝑃ᵥ(𝐴=a).
What is our prior estimate of κ? We might think that it should be proportional to the number of people who voted on 𝐴, but this is mistaken. A large number of votes on 𝐴 provides strong evidence for estimating ω = 𝑃ᵥ(𝐴=a). But our estimate of κ is based on our prior expectations about the degree to which people are influenced by arguments. This information can come from the variance actually observed in past arguments. If this variance has historically been very high, then κ should be low, and vice versa.
For simplicity, let's use the same prior κ = 10 that we used before.
We can now finally calculate:
This is slightly lower than
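The example's vote counts are not reproduced here, so this minimal sketch of the two-level calculation uses hypothetical counts. Level one estimates 𝑃ᵥ(𝐴=1) from all votes, with ω = 0.8 and κ = 10. Level two reuses that estimate as ω for 𝑃ᵥ(𝐴=1|𝐵=1), again with κ = 10. Variable names are illustrative.

```python
def bayesian_average(z: int, n: int, omega: float, kappa: float) -> float:
    """Posterior mean (omega * kappa + z) / (kappa + n)."""
    return (omega * kappa + z) / (kappa + n)


KAPPA = 10  # same concentration at both levels, as in the text

# Hypothetical counts from a votes table (not the article's data).
c_all, c_a1 = 200, 150      # c() and c(A=1)
c_b1, c_a1_b1 = 4, 2        # c(B=1) and c(A=1, B=1)

# Level 1: P_v(A=1), shrinking the raw ratio 150/200 toward the platform prior 0.8.
p_v_a1 = bayesian_average(c_a1, c_all, omega=0.8, kappa=KAPPA)

# Level 2: P_v(A=1 | B=1), shrinking the raw ratio 2/4 toward P_v(A=1) instead.
p_v_a1_given_b1 = bayesian_average(c_a1_b1, c_b1, omega=p_v_a1, kappa=KAPPA)

print(round(p_v_a1, 4), round(p_v_a1_given_b1, 4))  # 0.7524 0.6803
```

With only four voters accepting 𝐵, the conditional estimate lands between the raw ratio 0.5 and 𝑃ᵥ(𝐴=1). As c(𝐵=1) grows, it moves toward the raw ratio.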
Clearly, we can extend this reasoning to long argument threads, though we will not do this here.
Further Development
This document is a work in progress: these models have not been fully developed. In fact, we are looking for collaborators. If you are an expert in Bayesian hierarchical models and causal inference, please contact collaborations@deliberati.io.