In this post, I we'll briefly learn how to use Bayesian networks with the bnlearn package in R. You may find more information about the package please refer to this page.
We'll start by loading the bnlearn package.
> library(bnlearn)
The bnlearn package implements the following learning algorithms.
- Constrain-based: PC, GS, IAMB, MMPC, Hilton-PC
- Score-based: Hill-Climbing, Tabu Search
- Pairwise: ARACNE, Chow-Liu
- Hybrid: MMHC, RSMAX2
We use score based learning algorithm, Hill-Climbing with hc() function. First, we'll generate simple dataset for this tutorial.
> print(simd)
state element type color accept 1 Target long A red yes 2 Target long A red yes 3 Target normal C yellow yes 4 Target normal B green yes 5 Outlier short A red no 6 Target long B green yes 7 Outlier short A red no 8 Outlier long C yellow no 9 Target long C yellow yes 10 Outlier normal A red no
In this dataset, the 'state' has relationships with 'element' and 'accept' columns. And 'type' with 'color' column. When you create a data frame with categorical data, column should be a factor type. Otherwise, the data frame cannot be used in BN structure creation.
> str(simd)
'data.frame': 10 obs. of 5 variables:
$ state : Factor w/ 2 levels "Outlier","Target": 2 2 2 2 1 2 1 1 2 1
$ element: Factor w/ 3 levels "long","normal",..: 1 1 2 2 3 1 3 1 1 2
$ type : Factor w/ 3 levels "A","B","C": 1 1 3 2 1 2 1 3 3 1
$ color : Factor w/ 3 levels "green","red",..: 2 2 3 1 2 1 2 3 3 2
$ accept : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 1 1 2 1
Next, we'll create learning structure with hc() function.
> hc_simd <- hc(simd)
> class(hc_simd)
[1] "bn"
> hc_simd
Bayesian network learned via Score-based methods
model:
[state][type][element|state][color|type][accept|state]
nodes: 5
arcs: 3
undirected arcs: 0
directed arcs: 3
average markov blanket size: 1.20
average neighbourhood size: 1.20
average branching factor: 0.60
learning algorithm: Hill-Climbing
score: BIC (disc.)
penalization coefficient: 1.151292546
tests used in the learning procedure: 22
optimized: TRUE
We can see the structure in a plot.
> plot(hc_simd)
In this plot, state, element, accept, type, and color are called nodes. Directions between nodes are described in arcs that is a matrix containing a data of from-to elements direction.
> hc_simd$arcs
from to
[1,] "type" "color"
[2,] "state" "accept"
[3,] "state" "element"
As above arcs show that 'type' to 'color', and 'state' to 'accept' and 'element' relationships exist in our data. 'type' and 'state' are two independent groups that are not dependent on each other.
Next, we'll fit the model with data.
> simd_fitted <- bn.fit(hc_simd, data=simd)
> simd_fitted
Bayesian network parameters
Parameters of node state (multinomial distribution)
Conditional probability table:
Outlier Target
0.4 0.6
Parameters of node element (multinomial distribution)
Conditional probability table:
state
element Outlier Target
long 0.2500000000 0.6666666667
normal 0.2500000000 0.3333333333
short 0.5000000000 0.0000000000
Parameters of node type (multinomial distribution)
Conditional probability table:
A B C
0.5 0.2 0.3
Parameters of node color (multinomial distribution)
Conditional probability table:
type
color A B C
green 0 1 0
red 1 0 0
yellow 0 0 1
Parameters of node accept (multinomial distribution)
Conditional probability table:
state
accept Outlier Target
no 1 0
yes 0 1
Based on the above train data, we can perform conditional probability queries.
Let's say element="short", and accept="no", and in this condition we check state probability of "Outlier" and "Target". To query this condition we use cpquery() function.
> cpquery(simd_fitted,
event=((element=="short")&(accept=="no")),
evidence=(state=="Outlier"))
[1] 0.4856262834
state's probability of becoming "Outlier" is 48%.
> cpquery(simd_fitted,
event=((element=="short")&(accept=="no")),
evidence=(state=="Target"))
[1] 0
state's probability of becoming "Target" is 0%.
In this post, we've briefly learned how to apply Bayesian networks with bnlearn package in R.
No comments:
Post a Comment