Distributed Bayesian Reasoning Basic Math

14 minute read

In this article we develop the basic mathematical formula for calculating the opinion of the meta-reasoner in arguments involving a single main argument thread.

Background Reading

To understand this article you should first read:

For a deeper understanding of some of the assumptions we make in this article, it may also help to read:

You also need to be familiar with basic syntax and concepts from probability theory, specifically:

Sample Argument

Simplified Argument

Suppose a sensational murder trial is being discussed in an online platform that allows the general public to vote on what they think the verdict should be and why.

Initially, 1,000 users vote on the root claim (𝐴) the defendant is guilty, before any discussion has taken place on the platform. Then after this initial vote, somebody submits an argument claiming (𝐵) the defendant signed a confession, and users are asked to vote on this claim.

150 out of the 1,000 users vote on 𝐵. Of these 150 users, a small number changed their vote on 𝐴 after voting on 𝐵 (presumably, because they found 𝐵 convincing).

The final votes are tabulated in the following table. We represent votes using the numeric values 0=reject, 1=accept, and -1=didn’t vote.

       A=-1   A=0   A=1    SUM
B=-1      0   455   395    850
B=0       0    25    25     50
B=1       0    20    80    100
SUM       0   500   500   1000
B≥0       0    45   105    150

According to this table, all 1,000 users voted on 𝐴, with 500 rejecting 𝐴 (𝐴=0) and 500 accepting 𝐴 (𝐴=1). But only 150 users voted on 𝐵 (𝐵≥0).

Raw Probabilities

Our first step is to convert these counts into probabilities.

Let’s define a function 𝑐 that returns the values of a cell in this table. For example:

c(A=1) = 500
c(A=1, B=0) = 25
c() = 1000

From this, we can define a function 𝑃 that tells us the probability that a random user voted in some way. For example:

P(A=1) = c(A=1) ÷ c() = 500 ÷ 1000 = 50%

We can also define conditional probabilities, for example the probability that a random user accepts 𝐴 given they accept 𝐵 is:

P(A=1|B=1) = P(A=1, B=1) / P(B=1)

We can calculate conditional probabilities just by taking the ratio of counts, because for example:

P(A=1|B=1) = P(A=1, B=1) / P(B=1) = [c(A=1, B=1) ÷ c()] / [c(B=1) ÷ c()] = c(A=1, B=1) / c(B=1)

So

P(A=1|B=1) = 80/100 = 80%
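To make the bookkeeping concrete, here is a minimal Python sketch of these two functions (the data layout and the names `counts` and `c` are ours, not part of the model), assuming the vote tallies from the table above:

```python
from fractions import Fraction

# Vote counts from the table above, keyed by (a, b) where
# 1 = accept, 0 = reject, -1 = didn't vote on that claim.
# (The A=-1 column is all zeros, so it is omitted.)
counts = {
    (0, -1): 455, (1, -1): 395,
    (0,  0):  25, (1,  0):  25,
    (0,  1):  20, (1,  1):  80,
}

def c(a=None, b=None):
    """Count of users whose votes match the given values (None = any value)."""
    return sum(n for (va, vb), n in counts.items()
               if (a is None or va == a) and (b is None or vb == b))

print(Fraction(c(a=1), c()))          # P(A=1)     = 500/1000 = 50%
print(Fraction(c(a=1, b=1), c(b=1)))  # P(A=1|B=1) = 80/100   = 80%
```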

Note that P represents the probability that a randomly selected user, from among those who voted, votes in some way. If this sample of users is small or biased, P may not be a good estimate of what an average person actually believes. We will ignore this detail in this article, but address it in Bayesian Averaging.

Informed Probabilities

P(A=1) is only 50%, but P(A=1|B=1) is 80%. This means that users who accept claim 𝐵 are more likely to accept claim 𝐴, so 𝐵 is apparently an effective supporting argument for 𝐴. On the other hand, P(A=1|B=0) = 25/50 = 50%: users who reject 𝐵 are no more likely to accept 𝐴 than users overall.

Notably, among users who either accept OR reject 𝐵, 70% of users accept 𝐴:

P(A=1|B≥0) = c(A=1, B≥0) / c(B≥0) = 105/150 = 70%

while only 50% of users accept 𝐴 overall. Apparently, simply voting on 𝐵 made users more likely to accept 𝐴.

What’s happening here is that, among users who voted on B, a large number accept B as true, and as we’ve seen users who accept B are more likely to accept A. What makes the group of users who voted on B different is that all of them are informed about 𝐵. Whether they accept it as true or not, they have at least been presented with the claim that (𝐵) the defendant signed a confession and had a chance to reject it, or to accept it and revise their belief accordingly. This is not necessarily the case for the larger group of users: perhaps the media coverage of the murder never mentioned any confession, and most users never learned about it until they were asked to vote on claim B.

This is just made-up data, but it is meant to illustrate something that is often the case in reality: arguments can change minds, especially if they provide new information.

Our goal is to calculate the beliefs of the Meta-Reasoner: a hypothetical fully-informed user who shares the knowledge of all the other users. So the opinion of users who voted on 𝐵 is probably a better estimate of a fully-informed opinion.

So we’ll call the users who voted on 𝐵 the informed users, and our first step in estimating the beliefs of the meta-reasoner will be to represent the opinion of the average informed user with the informed probability function Pi:

Pi(A) = P(A|B≥0)

So

Pi(A=1) = P(A=1|B≥0)

Which we have already calculated to be 70%.

The Law of Total Probability

The informed opinion on 𝐴 depends on 1) the probability that an informed user actually accepts 𝐵, and 2) the probability that a user who accepts 𝐵 also accepts 𝐴. In fact we can rewrite the equation for Pi(A) in terms of these probabilities. Since the set of users who accept 𝐵 and the set that reject 𝐵 partition the set of users who voted on 𝐵, the law of total probability says that:

(1)  Pi(A) = P(A=1|B≥0) = Σ_{b≥0} Pi(B=b)·Pi(A|B=b) = Pi(B=0)·P(A|B=0) + Pi(B=1)·P(A|B=1)

We have already calculated P(A=1|B=0)=50% and P(A=1|B=1)=80% above, so it remains only to calculate Pi(B=1):

Pi(B=1) = P(B=1|B≥0) = c(B=1) / c(B≥0) = 100/150 = 2/3 ≈ 66.7%

Plugging these values into (1), we again get 70%.

Pi(A=1) = Pi(B=0)·P(A=1|B=0) + Pi(B=1)·P(A=1|B=1) = (1/3)(50%) + (2/3)(80%) = 70%
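As a sanity check, the decomposition in (1) is easy to verify numerically. The following sketch reuses the counts from the table above (the variable names are ours):

```python
from fractions import Fraction

# Counts restricted to users who voted on B (B >= 0), from the table above.
c_b0, c_b1 = 50, 100               # c(B=0), c(B=1)
c_a1_b0, c_a1_b1 = 25, 80          # c(A=1,B=0), c(A=1,B=1)

pi_b1 = Fraction(c_b1, c_b0 + c_b1)       # Pi(B=1) = 100/150 = 2/3
p_a1_given_b0 = Fraction(c_a1_b0, c_b0)   # P(A=1|B=0) = 25/50  = 50%
p_a1_given_b1 = Fraction(c_a1_b1, c_b1)   # P(A=1|B=1) = 80/100 = 80%

# Law of total probability, formula (1):
pi_a1 = (1 - pi_b1) * p_a1_given_b0 + pi_b1 * p_a1_given_b1
print(pi_a1)   # 7/10, i.e. 70%, matching P(A=1|B>=0) = 105/150
```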

Formula (1) is important because it shows us exactly how the probability that users accept 𝐵 determines the probability that they accept 𝐴. And critically, it shows us what the probability of accepting 𝐴 would be if the probability of accepting 𝐵 were different.

Distributed Reasoning

Now suppose a second group of 10 users holds an argument about whether to accept 𝐵, and during this argument users voted on claim 𝐺, the signature was forged. And suppose these users unanimously accept 𝐺 and found it very convincing: only 1/10 users accept 𝐵 after accepting 𝐺.

Clearly, the opinion of the meta-reasoner about 𝐵 will be equal to the opinion of the second group of voters, since this opinion is more informed, reflecting any new information conveyed by 𝐺.

Let’s define a function Ph that gives us the beliefs of the meta-reasoner. The meta-reasoner’s belief about 𝐵 is the informed opinion on 𝐵, which is the opinion of users who also voted on 𝐺:

(2)  Ph(B) = P(B|G≥0)

Let’s put the vote counts from the sub-jury in a table:

       B=0   B=1   B≥0
G=0      0     0     0
G=1      9     1    10
G≥0      9     1    10

And now we can calculate:

Ph(B=1) = P(B=1|G≥0) = c(B=1, G≥0) / c(G≥0) = 1/10 = 10%

Recall that (1) tells us how belief in 𝐵 determines the first group of users’ belief in 𝐴. So to calculate the probability that a member of the first jury would accept 𝐴 if they held the beliefs of the second jury about 𝐵, we simply substitute Ph(B=b) in place of Pi(B=b) in (1):

(3)  Ph(A) = [Σ_{b≥0} Pi(B=b)·P(A|B=b)]|_{Pi=Ph} = Σ_{b≥0} Ph(B=b)·P(A|B=b)

Plugging in the numbers:

Ph(A=1) = Ph(B=0)·P(A=1|B=0) + Ph(B=1)·P(A=1|B=1) = (1 − 10%)(50%) + (10%)(80%) = 53%

The meta-reasoner’s belief Ph(A=1) = 53% is very close to P(A=1|B=0) = 50% – the average belief of users who voted on 𝐵 but rejected it – because a fully-informed user would probably reject 𝐵.
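The substitution in (3) amounts to a couple of lines of arithmetic. Here is a small sketch using the probabilities derived above (the variable names are ours):

```python
from fractions import Fraction

# Beliefs of the informed users in the first group (from the main vote table).
p_a1_given_b0 = Fraction(1, 2)    # P(A=1|B=0) = 50%
p_a1_given_b1 = Fraction(4, 5)    # P(A=1|B=1) = 80%

# Belief of the sub-jury about B (from the sub-jury table): Ph(B=1) = 1/10.
ph_b1 = Fraction(1, 10)

# Formula (3): substitute Ph(B=b) for Pi(B=b) in the total-probability formula.
ph_a1 = (1 - ph_b1) * p_a1_given_b0 + ph_b1 * p_a1_given_b1
print(ph_a1)   # 53/100 = 53%
```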

Causal Assumptions

Conditional Independence

Formula (3) is only valid if we assume the meta-reasoner forms their belief about (𝐴) the defendant is guilty entirely based on their belief about (𝐵) the defendant signed a confession. So their belief in (𝐺) the signature was forged does not affect their belief in 𝐴 directly, but only indirectly through 𝐵. In other words, 𝐴 is conditionally independent of 𝐺 given 𝐵. We discuss the justification for making these causal assumptions in the Meta-Reasoner.

Unfortunately, we can’t make the same sort of assumption about (𝐶) the defendant retracted her confession. 𝐶 does not affect belief in 𝐴 only through 𝐵: learning that the defendant retracted her confession may make less of an impression on a user who never believed the defendant signed a confession in the first place. So the effect of accepting 𝐶 on a user’s acceptance of 𝐴 depends on whether or not that user accepts 𝐵.

The reason we can make the conditional independence assumption about 𝐺 and not 𝐶 is that 𝐺 is the premise of a premise argument, whereas 𝐶 is the premise of a warrant argument. The difference between premise arguments and warrant arguments is discussed in more detail in the Argument Model.

Formula for a 2-Argument Thread

Our next task is to calculate the opinion of the meta-reasoner after argument 𝐶 has been made.

First, we need to update our definition of the informed opinion. Previously, we defined the informed opinion as the opinion of users who voted on 𝐵; now that we have a second premise 𝐶 in the argument thread, we should include 𝐶 in the definition of informed opinion.

However, for users who reject 𝐵, what they think about 𝐶 is irrelevant, because (𝐶) the defendant retracted her confession is only argued as a way of convincing people who accept (𝐵) the defendant signed a confession that they still shouldn’t accept 𝐴. We discuss this important concept in the section Argument Threads are Dialogs in the argument model.

So we’ll define the informed opinion as the opinion of users who either reject 𝐵, or accept 𝐵 and have voted on 𝐶:

Pi(A) = P(A | B=0 ∨ (B=1 ∧ C≥0))

We can then rewrite the formula for Pi(A) using the law of total probability and some probability calculus. The derivation is similar to the derivation of (1) and is shown in the appendix:

(4)  Pi(A) = Pi(B=0)·P(A|B=0) + Pi(B=1)·Σ_{c≥0} Pi(C=c|B=1)·P(A|B=1, C=c)

Now, suppose a third sub-jury holds a sub-trial about whether to accept 𝐶, giving us Ph(C=c). We can then plug in the opinions of the sub-juries Ph(B=b) and Ph(C=c) in place of Pi(B=b) and Pi(𝐶=𝑐|B=1) in (4):

(5)  Ph(A) = Ph(B=0)·P(A|B=0) + Ph(B=1)·Σ_{c≥0} Ph(C=c)·P(A|B=1, C=c)

This gives us the posterior belief of the meta-reasoner Ph(A) as a function of the prior probability function P and the evidence from the sub-juries Ph(B=b) and Ph(C=c).

Using the shorthand F[P, Ph(B=b), Ph(C=c)] to refer to the formula in (5), we illustrate this calculation in the chart below:

Argument Thread

To show a sample calculation, suppose we obtain the following probabilities for users that have voted on 𝐴, 𝐵, and 𝐶.

𝐵    𝐶    𝑃(𝐴|𝐵,𝐶)
0   -1    50%
1    0    80%
1    1    65%

And suppose that the beliefs from the sub-juries are Ph(B=1)=80% and Ph(𝐶=1)=60%. Plugging these into (5):

Ph(A=1) = (1 − 80%)×50% + 80%×(1 − 60%)×80% + 80%×60%×65% = 66.8%

Intuitively, this result reflects the fact that, although B is an effective argument (P(A=1|B=1)=80%) and the sub-jury mostly accepts it (Ph(B=1)=80%), C is a fairly effective counter-argument (P(A=1|B=1,C=1)=65%).
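Here is the same calculation as a short Python sketch, using the hypothetical probabilities from the table above (the variable names are ours):

```python
from fractions import Fraction

# Prior conditional probabilities for the first group of users (table above).
p_a1 = {
    (0, -1): Fraction(1, 2),    # P(A=1|B=0)     = 50%
    (1,  0): Fraction(4, 5),    # P(A=1|B=1,C=0) = 80%
    (1,  1): Fraction(13, 20),  # P(A=1|B=1,C=1) = 65%
}

# Sub-jury beliefs about the premises.
ph_b1 = Fraction(4, 5)   # Ph(B=1) = 80%
ph_c1 = Fraction(3, 5)   # Ph(C=1) = 60%

# Formula (5).
ph_a1 = (
    (1 - ph_b1) * p_a1[(0, -1)]
    + ph_b1 * ((1 - ph_c1) * p_a1[(1, 0)] + ph_c1 * p_a1[(1, 1)])
)
print(float(ph_a1))   # 0.668, i.e. 66.8%
```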

Formula for Long Threads

To generalize (5), we first rewrite it in the more easily-generalizable form:

Ph(A) = Σ_{b≥0} Ph(B=b) ×
          if b=0 then P(A|B=0)
          else Σ_{c≥0} Ph(C=c|B=1) × P(A|B=1, C=c)

Now suppose underneath the claim α there is a thread with 𝑛 premises β={β1,β2,...,βn}. Then:

(6)  Ph(α=1) = Σ_{b1≥0} Ph(β1=b1) ×
        if b1=0 then P(α=1|β1=0)
        else Σ_{b2≥0} Ph(β2=b2) ×
          if b2=0 then P(α=1|β1=1, β2=0)
          else ...
            Σ_{bn≥0} Ph(βn=bn) × P(α=1|β1=1, β2=1, ..., βn=1)

Note this function Ph is recursive. The recursion terminates when it reaches a terminal claim in the argument graph – a claim without any premise arguments underneath it – in which case β will be ∅ and the function will therefore return

Pi(α=1) = P(α=1|∅) = P(α=1)

or the raw probability that a user accepts the terminal claim α.
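To illustrate the recursion, here is a minimal Python sketch of (6). It assumes the conditional probabilities P(α=1|β1=1, ..., βk−1=1, βk=b) and the sub-juries’ beliefs Ph(βk=1) have already been estimated and are passed in; the function names and data layout are ours, and in the full model each Ph(βk=1) would itself be computed by applying the same formula to βk’s own premise thread.

```python
from fractions import Fraction

def ph_accept(p_cond, ph_premises):
    """
    Formula (6): the meta-reasoner's probability of accepting a claim alpha,
    given an argument thread with premises beta_1 ... beta_n underneath it.

    p_cond(k, b) returns P(alpha=1 | beta_1=1, ..., beta_{k-1}=1, beta_k=b),
    estimated from the votes; p_cond(0, 1) is just the raw probability P(alpha=1).
    ph_premises[k-1] is Ph(beta_k=1), the sub-jury's belief in premise beta_k
    (itself computed by applying this same function to beta_k's own thread).
    """
    n = len(ph_premises)

    def go(k):
        if k > n:                     # every premise accepted (or no premises at all):
            return p_cond(n, 1)       #   P(alpha=1 | beta_1=1, ..., beta_n=1)
        p1 = ph_premises[k - 1]       # Ph(beta_k = 1)
        return (1 - p1) * p_cond(k, 0) + p1 * go(k + 1)

    return go(1)

# The two-premise thread from the previous section: beta_1 = B, beta_2 = C.
table = {(1, 0): Fraction(1, 2),     # P(A=1 | B=0)
         (2, 0): Fraction(4, 5),     # P(A=1 | B=1, C=0)
         (2, 1): Fraction(13, 20)}   # P(A=1 | B=1, C=1)
ph = ph_accept(lambda k, b: table[(k, b)], [Fraction(4, 5), Fraction(3, 5)])
print(float(ph))                     # 0.668, matching the calculation above
```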

Next in this Series

We can now calculate the posterior beliefs of the meta-reasoner for fairly complex argument trees, comprising arbitrarily long argument threads, and arbitrarily deep nesting of juries and sub-juries. But what about cases where there are multiple premise arguments under a claim (each starting a thread), or even multiple warrant arguments under a premise argument?

We will address this issue, as well as the problem of sampling error, in the article on Bayesian Averaging.

Appendix

Derivation 1

Let’s define a new variable J that indicates that a user has participated in the sub-jury and voted on G:

J ≡ (G ≥ 0)

Note also that all participants in the sub-jury vote on 𝐵, so

J ≡ (G ≥ 0) ⇒ (B ≥ 0)

Our causal assumptions are that:

  1. simply voting on 𝐵 (and thus being informed of the arguments for/against 𝐴) affects the probability of accepting 𝐴, and

  2. 𝐵 is the only variable that directly affects the probability of accepting 𝐴 (the conditional independence assumption).

These assumptions give us this causal graph:

𝐽 → 𝐵 → 𝐴

We previously defined

Ph(B) = P(B|G≥0) = P(B|J)

We now want to calculate

Ph(A)=Pi(A|do(J))

That is, the probability that a user who voted on B would accept A if they voted on G (even though no user has actually done so).

Ph(A) = Pi(A|do(J))
      = Σ_b Pi(B=b|J) × Σ_j Pi(A|J=j, B=b)·Pi(J=j)      (front-door adjustment formula)
      = Σ_b Pi(B=b|J)·Pi(A|B=b)                         (A is independent of J given B, per the causal graph)
      = Σ_{b≥0} P(B=b|J)·P(A|B=b)                       (definition of Pi)
      = Σ_{b≥0} Ph(B=b)·P(A|B=b)                        (definition of Ph)

Which is (3).

Derivation 2

Given this definition for Pi

Pi(A) = P(A | B=0 ∨ (B=1 ∧ C≥0))

We can rewrite Pi(A) as:

Pi(A) = Pi(A|B≥0)                                                     (definition of Pi)
      = Σ_{b≥0} Pi(B=b)·Pi(A|B=b)                                     (law of total probability)
      = Pi(B=0)·P(A|B=0) + Pi(B=1)·P(A|B=1 ∧ C≥0)                     (definition of Pi)
      = Pi(B=0)·P(A|B=0) + Pi(B=1)·Σ_{c≥0} P(C=c|B=1)·P(A|B=1, C=c)   (law of total probability)

Which is (4).
