Here we will explore the relationship between maximizing the likelihood p.d.f. $p(\vec{t}\mid\vec{w})$, maximizing the posterior p.d.f. $p(\vec{w}\mid\vec{t})$, minimizing the sum-of-squares error function $E_D(\vec{w})$, and the regularization technique.
When we maximize the posterior probability density function with respect to the parameter vector using the "Bayesian technique", we need both the likelihood function and the prior. (The denominator in Bayes' theorem is just a normalization constant, so it does not affect the maximization.)

$$p(\vec{w}\mid\vec{t}) \propto p(\vec{t}\mid\vec{w})\,p(\vec{w})$$
The model
We have a set of inputs $\vec{x} = [x_1, \ldots, x_n]^T$ with corresponding target values $\vec{t} = [t_1, \ldots, t_n]^T$.
We assume that there exists some deterministic function $y$ such that each target can be modelled as $y(x_i, \vec{w})$ plus additive Gaussian noise,

$$t_i = y(x_i, \vec{w}) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \beta^{-1})$$

Here $\beta$ is the precision (inverse variance) of the additive univariate Gaussian noise.
We define $y$ as a linear combination of basis functions,

$$y(x_i, \vec{w}) = w_1\phi_1(x_i) + w_2\phi_2(x_i) + \cdots + w_p\phi_p(x_i) = \vec{w}^T\vec{\phi}(x_i)$$

We define the parameter vector as $\vec{w}_{p\times 1} = [w_1, w_2, \ldots, w_p]^T$ and the basis vector as $\vec{\phi}(x_i) = [\phi_1(x_i), \phi_2(x_i), \ldots, \phi_p(x_i)]^T$.
This parameter vector is central to everything that follows: the posterior p.d.f. is the updated probability of $\vec{w}$ given some training data, obtained by combining the prior over $\vec{w}$ with the likelihood p.d.f. of observing that training data given $\vec{w}$.

We usually choose $\phi_1(x) = 1$ because we need a bias term in the model (it controls a fixed shift in $y$ itself; check this answer out).
For the data set as a whole we can write the set of model outputs as a vector $\vec{y}_{n\times 1}$,

$$\vec{y}(\vec{x}, \vec{w}) = \Phi\vec{w}$$

Here the basis matrix $\Phi_{n\times p}$ is a function of $\vec{x}$ and is defined with its $i$-th row equal to $[\phi_1(x_i), \phi_2(x_i), \ldots, \phi_p(x_i)]$, for $n$ such rows.
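To make the shapes concrete, here is a minimal sketch in Python, assuming polynomial basis functions $\phi_j(x) = x^{j-1}$ (the model itself does not fix the basis; any other choice works the same way):

```python
import numpy as np

def design_matrix(x, p):
    """Build the n x p basis matrix Phi.

    Row i is [phi_1(x_i), ..., phi_p(x_i)]; here we assume polynomial
    basis functions phi_j(x) = x**(j - 1), so the first column is the
    constant 1, i.e. the bias term phi_1(x) = 1.
    """
    x = np.asarray(x, dtype=float)
    return np.column_stack([x**j for j in range(p)])  # shape (n, p)

# toy data: n = 5 inputs, p = 3 basis functions (made-up values)
x = np.linspace(0.0, 1.0, 5)
w = np.array([0.5, -1.0, 2.0])          # parameter vector, shape (p,)
Phi = design_matrix(x, p=3)             # basis matrix, shape (n, p)
y = Phi @ w                             # model outputs, y_i = w^T phi(x_i)
print(Phi.shape, y.shape)               # (5, 3) (5,)
```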
Likelihood function
Since we assume that the data points $(x_i, t_i)$ are drawn independently from this distribution, the likelihood is the product of the individual data points' p.d.f.s, each of which is Gaussian,

$$p(\vec{t}\mid\vec{x}, \vec{w}, \beta) = \prod_{i=1}^{n}\mathcal{N}\bigl(t_i \mid \vec{w}^T\vec{\phi}(x_i),\, \beta^{-1}\bigr)$$

Note that the $i$-th data point's p.d.f. is centred on $\vec{w}^T\vec{\phi}(x_i)$ as its mean.
Does the product of $n$ univariate Gaussians form a multivariate distribution in $\{t_i\}$? It does: because the $t_i$ are independent, the product is a multivariate Gaussian in $(t_1, \ldots, t_n)$ with mean $\Phi\vec{w}$ and diagonal covariance $\beta^{-1}I$. Just as importantly, viewed as a function of $\vec{w}$ the likelihood is the exponential of a quadratic form, which is what makes a Gaussian prior conjugate.
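A quick numerical check of that claim, with assumed toy values for the basis matrix, $\vec{w}$ and $\beta$: the sum of the $n$ univariate Gaussian log-densities matches the log-density of a single multivariate Gaussian with mean $\Phi\vec{w}$ and covariance $\beta^{-1}I$.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)
n, p, beta = 5, 3, 25.0                          # assumed sizes and noise precision
x = np.linspace(0.0, 1.0, n)
Phi = np.column_stack([x**j for j in range(p)])  # polynomial basis, bias column first
w = np.array([0.5, -1.0, 2.0])
t = Phi @ w + rng.normal(0.0, beta**-0.5, size=n)

# likelihood as a product of n univariate Gaussians (sum of log-pdfs) ...
loglik_prod = norm.logpdf(t, loc=Phi @ w, scale=beta**-0.5).sum()
# ... equals one multivariate Gaussian in (t_1, ..., t_n) with covariance beta^{-1} I
loglik_mvn = multivariate_normal.logpdf(t, mean=Phi @ w, cov=np.eye(n) / beta)
print(np.isclose(loglik_prod, loglik_mvn))       # True
```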
Prior
Since the likelihood function (the product of the Gaussians above) is the exponential of a quadratic function of $\vec{w}$, we choose the corresponding conjugate prior, which is also Gaussian.
Thus the prior p.d.f. is a normal distribution, $p(\vec{w}) = \mathcal{N}(\vec{w}\mid m_0, S_0)$.
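One way to build intuition for this prior is to sample parameter vectors from it and look at the corresponding model outputs $\Phi\vec{w}$; a small sketch with assumed values for $m_0$ and $S_0$ (and the same toy polynomial basis as above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3
m0 = np.zeros(p)                 # prior mean, p x 1
S0 = 2.0 * np.eye(p)             # prior covariance, p x p (assumed value)

x = np.linspace(0.0, 1.0, n)
Phi = np.column_stack([x**j for j in range(p)])

# each draw of w from the prior corresponds to one candidate function y = Phi w
for w_sample in rng.multivariate_normal(m0, S0, size=3):
    print(np.round(Phi @ w_sample, 2))
```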
Posterior
The posterior p.d.f. is also a Gaussian, $\mathcal{N}(m_N, S_N)$ (as we chose a conjugate prior).

After solving for $m_N$ and $S_N$ we get,

(The complete derivation is available in Bishop, eq. (2.116) - coming soon)
$$m_N = S_N\bigl(S_0^{-1}m_0 + \beta\,\Phi^T\vec{t}\bigr)$$

$$S_N^{-1} = S_0^{-1} + \beta\,\Phi^T\Phi$$
The sizes are:

The mean vectors $m_N$ and $m_0$ are both $p\times 1$; they can be thought of as the optimal parameter vector and the pseudo-observations respectively.

The covariance matrices $S_N$ and $S_0$ are both $p\times p$.
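Until the full derivation is written up, here is the shape of the argument (Bishop's completing-the-square step): add the log-likelihood to the log-prior, collect the terms that are quadratic and linear in $\vec{w}$, and match them against the exponent of $\mathcal{N}(\vec{w}\mid m_N, S_N)$.

$$\begin{aligned}
\ln p(\vec{w}\mid\vec{t})
  &= -\tfrac{\beta}{2}(\vec{t}-\Phi\vec{w})^T(\vec{t}-\Phi\vec{w})
     -\tfrac{1}{2}(\vec{w}-m_0)^T S_0^{-1}(\vec{w}-m_0) + \text{const} \\
  &= -\tfrac{1}{2}\vec{w}^T\bigl(S_0^{-1}+\beta\,\Phi^T\Phi\bigr)\vec{w}
     +\vec{w}^T\bigl(S_0^{-1}m_0+\beta\,\Phi^T\vec{t}\bigr) + \text{const}
\end{aligned}$$

Comparing the quadratic term with $-\tfrac{1}{2}(\vec{w}-m_N)^T S_N^{-1}(\vec{w}-m_N)$ gives $S_N^{-1} = S_0^{-1} + \beta\,\Phi^T\Phi$, and matching the linear term gives $S_N^{-1}m_N = S_0^{-1}m_0 + \beta\,\Phi^T\vec{t}$, i.e. exactly the two formulas above.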
We shall consider a particular form of Gaussian prior in order to simplify the treatment. Specifically, we assume a zero-mean isotropic Gaussian governed by a single precision parameter $\alpha$,

$$p(\vec{w}) = \mathcal{N}(\vec{w}\mid\vec{0},\, \alpha^{-1}I)$$

So we basically take $m_0 = \vec{0}$ and $S_0 = \alpha^{-1}I_{p\times p}$.
Thus, if we use this prior, the mean vector and the covariance matrix of the posterior p.d.f. simplify to,

$$m_N = \beta\, S_N \Phi^T\vec{t}$$

$$S_N^{-1} = \alpha I + \beta\,\Phi^T\Phi$$
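A minimal numerical sketch of these two formulas, with assumed toy values for the basis, $\alpha$ and $\beta$ (none of these numbers come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
alpha, beta = 2.0, 25.0                          # assumed prior / noise precisions

x = np.linspace(0.0, 1.0, n)
Phi = np.column_stack([x**j for j in range(p)])  # n x p design matrix
w_true = np.array([0.5, -1.0, 2.0])
t = Phi @ w_true + rng.normal(0.0, beta**-0.5, size=n)

# posterior covariance and mean for the zero-mean isotropic prior
S_N = np.linalg.inv(alpha * np.eye(p) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

print(np.round(m_N, 3))                          # posterior mean, close to w_true
```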
Now if we take the log of the posterior p.d.f. $\mathcal{N}(m_N, S_N)$ in order to maximize it with respect to $\vec{w}$, we find that this is equivalent to the minimization of the sum-of-squares error function with the addition of a quadratic regularization term, corresponding to $\lambda = \alpha/\beta$.

$$\ln p(\vec{w}\mid\vec{t}) = -\frac{\beta}{2}\sum_{i=1}^{n}\bigl(t_i - \vec{w}^T\vec{\phi}(x_i)\bigr)^2 - \frac{\alpha}{2}\vec{w}^T\vec{w} + \text{const} = -\beta E_D(\vec{w}) - \frac{\alpha}{2}\vec{w}^T\vec{w} + \text{const}$$

Here $E_D(\vec{w}) = \frac{1}{2}\sum_{i=1}^{n}\bigl(t_i - \vec{w}^T\vec{\phi}(x_i)\bigr)^2$ is the sum-of-squares error function, so (dividing through by $\beta$) maximizing the posterior is the same as minimizing $E_D(\vec{w}) + \frac{\lambda}{2}\vec{w}^T\vec{w}$ with $\lambda = \alpha/\beta$.
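A quick check that the posterior mean $m_N$ (the MAP estimate) coincides with the minimizer of this regularized error, i.e. the ridge-regression solution $(\Phi^T\Phi + \lambda I)^{-1}\Phi^T\vec{t}$ with $\lambda = \alpha/\beta$, again with assumed toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
alpha, beta = 2.0, 25.0
lam = alpha / beta                               # regularization strength lambda

x = np.linspace(0.0, 1.0, n)
Phi = np.column_stack([x**j for j in range(p)])
t = Phi @ np.array([0.5, -1.0, 2.0]) + rng.normal(0.0, beta**-0.5, size=n)

# MAP estimate: posterior mean m_N = beta * S_N * Phi^T t
S_N = np.linalg.inv(alpha * np.eye(p) + beta * Phi.T @ Phi)
w_map = beta * S_N @ Phi.T @ t

# minimizer of E_D(w) + (lam / 2) * w^T w, i.e. the ridge solution
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ t)

print(np.allclose(w_map, w_ridge))               # True
```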
Thus we conclude that while maximizing the likelihood function is equivalent to minimizing the sum-of-squares error function, maximizing the posterior p.d.f. is equivalent to minimizing the regularized sum-of-squares error function.
The regularization technique is used to control the over-fitting phenomenon by adding a penalty term to the error function in order to discourage the coefficients from reaching large values.
This penalty term arises naturally when we maximize the posterior p.d.f. w.r.t. $\vec{w}$.
The minimization of the sum-of-squares error function $E_D(\vec{w})$ is in turn the same as the maximization of the likelihood p.d.f. Taking the log of the likelihood $p(\vec{t}\mid\vec{w},\beta)$ we get,

$$\ln p(\vec{t}\mid\vec{w},\beta) = \frac{n}{2}\ln\beta - \frac{n}{2}\ln(2\pi) - \beta E_D(\vec{w})$$

Thus maximizing the likelihood is equal to maximizing $-E_D(\vec{w})$, i.e. minimizing $E_D(\vec{w})$ (the remaining terms are constant w.r.t. $\vec{w}$).
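Making the last step explicit: setting the gradient of the log-likelihood with respect to $\vec{w}$ to zero (the $\beta$ and $2\pi$ terms drop out) gives the ordinary least-squares solution, assuming $\Phi^T\Phi$ is invertible,

$$\nabla_{\vec{w}}\ln p(\vec{t}\mid\vec{w},\beta) = -\beta\,\nabla_{\vec{w}}E_D(\vec{w}) = \beta\,\Phi^T\bigl(\vec{t}-\Phi\vec{w}\bigr) = \vec{0} \quad\Rightarrow\quad \vec{w}_{\mathrm{ML}} = \bigl(\Phi^T\Phi\bigr)^{-1}\Phi^T\vec{t}$$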