Bayes_Rule

Peiyuan Zhu

2023-11-02

Introduction

In A Mathematical Theory of Evidence, Glenn Shafer discusses how Dempster’s rule of combination generalizes Bayesian conditioning. In this document we show numerically how a simple Bayesian model can be encoded in the language of belief functions.

Recall Bayes’ rule of conditioning in simple terms:

\[P(H|E) = \dfrac{P(H) \cdot P(E|H)}{P(E)}\] Let’s see how this translates into the belief-function setup.

1. Simple Bayes Example

In particular, a Bayesian belief function concentrates its masses on singletons only, unlike more general basic mass assignments. For instance, on the frame \(\Theta=\{a,b,c\}\), the basic mass assignment \(m(\{a\})=0.2\), \(m(\{b\})=0.3\) and \(m(\{c\})=0.5\) defines a Bayesian belief function.

In the Bayesian language, this is the prior distribution \(P(H)\). Function bca is used to set the distribution of H.

#> The prior distribution H
#>   H specnb mass
#> 1 a      1  0.2
#> 2 b      2  0.3
#> 3 c      3  0.5
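
The vignette does not show the code, but a call along these lines could set up this distribution. This is a minimal sketch: the argument names tt, m, cnames, varnames and idvar are assumptions about the interface of bca, not taken from the document.

library(dst)
# One row of tt per focal set: the singletons {a}, {b}, {c}.
H <- bca(tt = matrix(c(1, 0, 0,
                       0, 1, 0,
                       0, 0, 1), nrow = 3, byrow = TRUE),
         m = c(0.2, 0.3, 0.5),
         cnames = c("a", "b", "c"),
         varnames = "H", idvar = 1)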

Bayesian conditioning is a special case of Dempster’s rule of combination in which one basic mass assignment focuses all of its mass on the conditioning event. Here, this basic mass assignment puts all of the mass on the subset \(E =\{b,c\}\). Hence, using function bca, we set \(m(\{b,c\})=1\).

#> Setting an Event E = {b,c} with mass = 1
#>   Event specnb mass
#> 1 b + c      4    1
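
A corresponding sketch for the conditioning event, with the same caveat about argument names: a single focal set \(\{b,c\}\) carrying all the mass, defined on the same frame as H so that the two can be combined.

# All the mass is allotted to the subset {b, c} of the frame of H.
Event <- bca(tt = matrix(c(0, 1, 1), nrow = 1, byrow = TRUE),
             m = 1,
             cnames = c("a", "b", "c"),
             varnames = "H", idvar = 1)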

Now we set the computation of Bayes’s Theorem in motion.

As a first step, we use function dsrwon to combine our two basic mass assignments H and Event. The non-normalized Dempster’s rule of combination gives a mass distribution H_Event composed of two parts:

  1. the distribution of the product \(P(H) \cdot P(E|H)\) on \(\Theta\);
  2. a mass allotted to the empty set, \(m(\varnothing)\).

#> The combination of H and Event E
#>   H_Event specnb mass
#> 1       ø      1  0.2
#> 2       b      3  0.3
#> 3       c      4  0.5
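
A sketch of this combination step, with the intersection arithmetic spelled out in the comments; the masses follow directly from the two assignments above.

# Non-normalized Dempster's rule of combination:
#   {a} ∩ {b,c} = ∅    -> 0.2 * 1 = 0.2, assigned to the empty set
#   {b} ∩ {b,c} = {b}  -> 0.3 * 1 = 0.3
#   {c} ∩ {b,c} = {c}  -> 0.5 * 1 = 0.5
H_Event <- dsrwon(H, Event)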

It turns out that we can obtain the marginal \(P(E)\) from \(m(\varnothing)\): \[P(E) = 1 - m(\varnothing).\]

Hence, \(P(E)\) is nothing other than the normalization constant of Dempster’s rule of combination.
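
With the numbers of this example, \(m(\varnothing) = 0.2\), so

\[P(E) = 1 - 0.2 = 0.8, \qquad P(b \mid E) = \frac{0.3}{0.8} = 0.375, \qquad P(c \mid E) = \frac{0.5}{0.8} = 0.625.\]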

In the second step of the computation, we use function nzdsr to apply the normalization constant to the distribution H_Event, which gives the posterior distribution \(P(H|E)\).

#> The posterior distribution P(H|E)
#>   H_given_E specnb  mass
#> 1         b      2 0.375
#> 2         c      3 0.625
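
A sketch of the normalization step (passing the combined distribution as the only argument to nzdsr is an assumption about its interface):

# nzdsr divides each remaining mass by the normalization constant
# 1 - m(∅) = 0.8 and discards the mass of the empty set.
H_given_E <- nzdsr(H_Event)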

Note that H_given_E is defined only on singletons and the mass allocated to \(\Theta\) is zero. Hence \(bel(\cdot) = P(\cdot) = Pl(\cdot)\), as shown by the following table.

#>         bel disbel unc  plau rplau
#> a     0.000  1.000   0 0.000 0.000
#> b     0.375  0.625   0 0.375 0.600
#> c     0.625  0.375   0 0.625 1.667
#> frame 1.000  0.000   0 1.000   Inf
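
The document does not name the function that produced this table; a table of this form is what dst’s belplau function computes, so a sketch of the call could be:

# Belief, disbelief, uncertainty, plausibility and plausibility ratio
# for each element of the frame, plus the frame itself.
belplau(H_given_E)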

2. Example with two variables

In the first example, the conditioning event was a subset of the frame \(\Theta\) of variable H. We now show the computation of Bayes’s rule of conditioning by Dempster’s Rule in the case of two variables.

Let’s say we have a variable X defined on \(\Theta = \{a, b, c\}\), with the same prior distribution as H had before.

#> The prior distribution
#>   X specnb mass
#> 1 a      1  0.2
#> 2 b      2  0.3
#> 3 c      3  0.5
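
As in the first example, a sketch of the call that could define this prior (same assumed argument names as before), now labelling the variable X:

# Prior for X on the frame {a, b, c}; X is variable number 1.
X <- bca(tt = matrix(c(1, 0, 0,
                       0, 1, 0,
                       0, 0, 1), nrow = 3, byrow = TRUE),
         m = c(0.2, 0.3, 0.5),
         cnames = c("a", "b", "c"),
         varnames = "X", idvar = 1)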

Let’s add a second variable E with three outcomes, defined on the frame \(\Lambda = \{d, e, f\}\). Suppose the conditional probabilities of the outcome d are

\(P(d \mid a)=0.1\), \(P(d \mid b)=0.2\) and \(P(d \mid c)=0.7\).

This distribution will be encoded in the product space \(\Theta \times \Lambda\) by setting

\(m(\{(a,d)\}) = 0.1\); \(m(\{(b,d)\}) = 0.2\); \(m(\{(c,d)\}) = 0.7\).

We now do this using function bcaRel.

#> Specify information on variables, description matrix and mass vector
#> Identifying variables and frames
#>      varnb size
#> [1,]     1    3
#> [2,]     4    3
#> Note that variable numbers must be in increasing order
#> The description matrix of the relation between X and E
#>      a b c d e f
#> [1,] 1 0 0 1 0 0
#> [2,] 0 1 0 1 0 0
#> [3,] 0 0 1 1 0 0
#> [4,] 1 1 1 1 1 1
#> Note Columns of matrix must follow variables ordering.
#> Mass specifications
#>      specnb mass
#> [1,]      1  0.1
#> [2,]      2  0.2
#> [3,]      3  0.7
#> [4,]      4  0.0
#> The relation between Evidence E and X
#>   rel_EX specnb mass
#> 1    a d      1  0.1
#> 2    b d      2  0.2
#> 3    c d      3  0.7
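
A sketch of the bcaRel call that could build this relation. The shapes of tt, spec and infovar are read off the output above; the argument names themselves are assumptions.

# Description matrix over the product frame (columns a, b, c, d, e, f).
# Rows 1-3 are the focal sets (a, d), (b, d), (c, d); row 4 is the frame.
tt_EX <- matrix(c(1, 0, 0, 1, 0, 0,
                  0, 1, 0, 1, 0, 0,
                  0, 0, 1, 1, 0, 0,
                  1, 1, 1, 1, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c", "d", "e", "f")))
spec_EX <- cbind(specnb = 1:4, mass = c(0.1, 0.2, 0.7, 0))  # mass of each row of tt
infovar_EX <- cbind(varnb = c(1, 4), size = c(3, 3))        # X is variable 1, E is variable 4
rel_EX <- bcaRel(tt = tt_EX, spec = spec_EX, infovar = infovar_EX,
                 varnames = c("X", "E"), relnb = 1)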

Now we combine the prior \(P(X)\) with rel_EX. But first, we need to extend X to the product space \(\Theta \times \Lambda\).

#> Prior X extended in product space of (X,E)
#>            X_xtnd specnb mass
#> 1 a d + a e + a f      1  0.2
#> 2 b d + b e + b f      2  0.3
#> 3 c d + c e + c f      3  0.5
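
The document does not name the function used for this extension; dst provides extmin for extending a basic chance assignment to a larger product space, so a sketch could look like this (using rel_EX as the reference relation that defines the product space is an assumption):

# Each focal set {x} of the prior becomes {x} x Λ, e.g. {a} becomes
# {a d, a e, a f}, and keeps its original mass.
X_xtnd <- extmin(X, rel_EX)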

We now combine the extended X with rel_EX in the product space \(\Theta \times \Lambda\).

#> Mass distribution of the combination of X extended and E_X
#>   comb_X_EX specnb mass
#> 1         ø      1 0.57
#> 2       a d      2 0.02
#> 3       b d      3 0.06
#> 4       c d      4 0.35
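
As in the first example, this is the non-normalized combination, now carried out in the product space; a sketch with the product arithmetic in the comments:

# Products of the masses: 0.2 * 0.1 = 0.02 on {a d}, 0.3 * 0.2 = 0.06 on {b d},
# 0.5 * 0.7 = 0.35 on {c d}; the remaining 1 - 0.43 = 0.57 falls on the empty set.
comb_X_EX <- dsrwon(X_xtnd, rel_EX)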

As we can see, we have

  1. the distribution of the product \(P(X) \cdot P(E|X)\) on \(\Theta \times \Lambda\);

  2. a mass allotted to the empty set, \(m(\varnothing)\), which is \(1 - P(E = d)\).

Using function nzdsr, we apply the normalization constant to obtain the desired result. Then, using function elim, we obtain the marginal of X, which turns out to be \(P(X | E = d)\).

#> The normalized mass distribution of the combination of X extended and E_X
#>   norm_comb_X_EX specnb               mass
#> 1            a d      1 0.0465116279069768
#> 2            b d      2   0.13953488372093
#> 3            c d      3  0.813953488372093
#> The posterior distribution P(X|E) for (a,d), (b,d), (c,d), after eliminating variable E
#>   dist_XgE specnb               mass
#> 1        a      1 0.0465116279069768
#> 2        b      2   0.13953488372093
#> 3        c      3  0.813953488372093
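
The last two steps could be sketched as follows; the second argument of elim, naming the variable to remove (here variable number 4, i.e. E), is an assumption about its interface.

# Normalize: divide each mass by 1 - m(∅) = 1 - 0.57 = 0.43,
# e.g. 0.35 / 0.43 = 0.8140 for {c d}.
norm_comb_X_EX <- nzdsr(comb_X_EX)
# Marginalize out E to obtain P(X | E = d) on the frame {a, b, c}.
dist_XgE <- elim(norm_comb_X_EX, 4)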