DAGs with NO TEARS
NOTE: This post assumes you are acquainted with graphical models
Motivation
One of the challenging problems in graphical models is estimating the structure of Bayesian networks (DAGs) from data. This is called structure learning. The goal is to construct a DAG (Directed Acyclic Graph) that best explains the data. Until now this problem has been treated as combinatorial optimisation (due to the acyclicity constraint), and thus most solutions involve some kind of search over the space of DAGs.
Novelty
The authors propose a method which essentially converts this combinatorial problem into a continuous one.

The key contribution is characterising the acyclicity constraint as a smooth function, which turns structure learning into a continuous constrained optimisation problem.
Problem Formulation
Given: n d-dimensional data points \(\{x_{i}\}_{i=1}^{n}\), or equivalently a matrix \(X_{n \times d}\).
A DAG over the d variables can be represented by its weighted adjacency matrix \(W\). So, our

Aim: estimate a matrix \(W_{d \times d}\) whose nonzero entries encode a DAG.
Objective:
$$\min_{W \in \mathbb{R}^{d \times d}} F(W) \quad \text{subject to} \quad h(W) = 0$$
Here,
- \(h\) is the function which captures acyclicity constraint
- \(F\) is the score function
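For concreteness, the paper instantiates \(F\) as a least-squares loss plus an \(\ell_1\) penalty to encourage sparsity. A minimal sketch (the function name and the `lam` default are illustrative choices, not from the paper):

```python
import numpy as np

def least_squares_score(W, X, lam=0.1):
    """Score F(W): least-squares fit of each column of X on the others
    via W, plus an l1 penalty on the edge weights."""
    n = X.shape[0]
    residual = X - X @ W                    # column j modelled by its parents in W
    loss = (0.5 / n) * np.sum(residual ** 2)
    return loss + lam * np.abs(W).sum()
```

Minimising this score alone would not yield a DAG; that is exactly what the constraint \(h(W) = 0\) below enforces.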
Acyclicity Constraint
This is probably the most important part of the paper. The authors provide a closed-form expression for the function \(h\): $$h(W) = tr(e^{W \circ W}) - d$$

And \(h(W) = 0\) if and only if \(W\) is the adjacency matrix of a DAG.
Where,
- \(\circ\) denotes the Hadamard product (element-wise multiplication)
- \(tr\) is the matrix trace
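This function is easy to check numerically. A quick sketch using `scipy.linalg.expm` for the matrix exponential (the two 2-node matrices are made-up examples):

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """Acyclicity measure: tr(e^{W ∘ W}) - d. Zero iff W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the Hadamard product

W_dag = np.array([[0.0, 1.0],
                  [0.0, 0.0]])        # single edge 1 -> 2: acyclic
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])        # 1 -> 2 -> 1: a 2-cycle
```

Intuitively, the (i, i) entry of \((W \circ W)^k\) sums the (squared) weights of length-k cycles through node i, so the trace of the exponential series exceeds d exactly when some cycle exists.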
Score Function and Optimisation
[WIP]
To-Do
- Score function and optimisation
- Algorithm
- Thresholding
- grammar check