Infinity and beyond!: November 2017

Friday 24 November 2017

Moore-Penrose Pseudoinverse

Generalization of the inverse of a matrix.

I believe we pay too much attention to implementation, and too little attention to studying the concept that is implemented. I have been on the teams of many Data Science and Machine Learning projects, and I would always reiterate one simple idea: “If you do not know the math, you don’t know it at all.”

This is a piece of philosophy I deeply believe in. With the advent of packages like NumPy, Matplotlib, scikit-learn, etc., implementing a machine learning model for a moderately difficult data set and problem is fairly simple.
The magic, then, lies in being able to tweak the algorithm and get something new (or weird) out of the model. To be capable of doing so, you have to know the mechanism behind it.
The Moore-Penrose pseudoinverse is the soul of PCA (Principal Component Analysis), one of the most popular dimensionality reduction techniques.

How do we define the inverse of a matrix?
Provided that the matrix is square and non-singular, we simply divide the adjoint (adjugate) of the matrix by its determinant.
Mathematically, for $A_{m \times m}$ and $|A| \neq 0$ , the inverse of $A$ is defined as,
$A^{-1} = \frac{ \text{adj.} A}{|A|} \tag*{(1)}$
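Formula (1) can be sketched in Python with NumPy by building the adjugate as the transpose of the cofactor matrix. This is a minimal illustration assuming a small, real, square matrix; the helper names `adjugate` and `inverse_via_adjugate` are hypothetical. Note that every cofactor needs its own determinant, which is where the cost comes from.

```python
import numpy as np


def adjugate(A):
    """Adjugate of a square matrix: the transpose of its cofactor matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Minor M_ij: delete row i and column j, then take the determinant.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T


def inverse_via_adjugate(A):
    """Equation (1): A^{-1} = adj(A) / |A|, defined only when |A| != 0."""
    det = np.linalg.det(A)
    if np.isclose(det, 0.0):
        raise np.linalg.LinAlgError("matrix is singular (|A| = 0)")
    return adjugate(A) / det


A = np.array([[4.0, 7.0], [2.0, 6.0]])
print(inverse_via_adjugate(A))
print(np.allclose(inverse_via_adjugate(A), np.linalg.inv(A)))  # True
```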

Of course, the above method is computationally very expensive. Alternatively, we can get the inverse of the matrix recursively using the Faddeev-LeVerrier algorithm (read about that in this blog of mine).
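For reference, here is a minimal sketch of the Faddeev-LeVerrier recursion as it is commonly stated; the function name is hypothetical. Starting from $M_0 = 0$ and $c_n = 1$, it builds $M_k = AM_{k-1} + c_{n-k+1}I$ and the characteristic-polynomial coefficients $c_{n-k} = -\frac{1}{k}\operatorname{tr}(AM_k)$, and then, by the Cayley-Hamilton theorem, $A^{-1} = -M_n/c_0$.

```python
import numpy as np


def faddeev_leverrier_inverse(A):
    """Invert a square matrix with the Faddeev-LeVerrier recursion (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    M = np.zeros((n, n))  # M_0 = 0
    c = 1.0               # c_n = 1 (monic characteristic polynomial)
    for k in range(1, n + 1):
        M = A @ M + c * I           # M_k = A M_{k-1} + c_{n-k+1} I
        c = -np.trace(A @ M) / k    # c_{n-k}
    # After the loop, M = M_n and c = c_0 = (-1)^n |A|.
    if np.isclose(c, 0.0):
        raise np.linalg.LinAlgError("matrix is singular")
    return -M / c


A = np.array([[4.0, 7.0], [2.0, 6.0]])
print(np.allclose(faddeev_leverrier_inverse(A), np.linalg.inv(A)))  # True
```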

Now, how do we deal with matrices that are non-square? How do you find the inverse of a matrix that looks like this,
$B = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22}\\ x_{31} & x_{32} \end{bmatrix}_{3\times 2}$
This is where the generalization of the inverse of a matrix, named the Moore-Penrose pseudoinverse, comes in.

For every $A_{m \times n}$ , there exists a pseudoinverse $A^{\dagger}_{n \times m}$ . ( $A^\dagger$ is read as “A dagger”).
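When $A$ has full column rank (its $n$ columns are linearly independent, so $A^{T}A$ is invertible), $A^{\dagger}$ is given by,
$A^{\dagger}=(A^{T}A)^{-1}A^{T} \tag*{(2)}$
This is dimensionally consistent: $A^{T}A$ is $n \times n$ and $A^{T}$ is $n \times m$, so $A^{\dagger}$ is $n \times m$. (When $A$ instead has full row rank, $A^{\dagger}=A^{T}(AA^{T})^{-1}$.)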
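A quick numerical check of $(2)$ and of the dimensions, assuming NumPy. The test matrix is arbitrary and the helper name is hypothetical; `np.linalg.pinv`, which NumPy computes from the singular value decomposition, serves as the reference.

```python
import numpy as np


def pinv_full_column_rank(A):
    """(A^T A)^{-1} A^T; valid only when A has full column rank."""
    A = np.asarray(A, dtype=float)
    # Solve (A^T A) X = A^T rather than forming the inverse explicitly.
    return np.linalg.solve(A.T @ A, A.T)


A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 x 2, rank 2
A_dag = pinv_full_column_rank(A)
print(A_dag.shape)                             # (2, 3)
print(np.allclose(A_dag, np.linalg.pinv(A)))   # True
print(np.allclose(A_dag @ A, np.eye(2)))       # True: a left inverse
```

For a tall matrix like this, $A^{\dagger}A$ is the $n \times n$ identity, while $AA^{\dagger}$ is not the $m \times m$ identity.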

Now, say we have,
$Q = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$
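It is impossible to find $Q^{-1}$ by the conventional method $(1)$, since $Q$ is not square. So, we use the generalized inverse $(2)$, which applies here because the single column of $Q$ is non-zero (full column rank).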
So,
$Q^{T} = \begin{bmatrix} 1 & 2 \end{bmatrix}$
So, $Q^{T}Q$ comes out to be $[5]_{1 \times 1}$ . So, $(Q^{T}Q)^{-1}$ comes out to be $\frac{1}{5}$ .
Hence,
$Q^{\dagger}=(Q^{T}Q)^{-1}Q^{T} =\frac{1}{5} \begin{bmatrix} 1 & 2 \end{bmatrix} = \begin{bmatrix} \frac{1}{5} & \frac{2}{5} \end{bmatrix}$ which is the pseudoinverse or the generalized inverse.
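The same worked example in NumPy, as a sanity check (a sketch; `np.linalg.pinv` is used only as a reference):

```python
import numpy as np

Q = np.array([[1.0], [2.0]])             # 2 x 1 column vector
QtQ = Q.T @ Q                            # [[5.]]
Q_dag = np.linalg.inv(QtQ) @ Q.T         # (1/5) [1, 2]
print(Q_dag)                             # [[0.2 0.4]]
print(np.allclose(Q_dag, np.linalg.pinv(Q)))  # True
print(np.allclose(Q_dag @ Q, [[1.0]]))        # True: Q_dag is a left inverse of Q
```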

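For a square, non-singular matrix (i.e., $m \times m$ with $|A| \neq 0$),
$A^{\dagger}=A^{-1}$
In detail, since $(A^{T}A)^{-1}=A^{-1}(A^{T})^{-1}$,
$(A^{T}A)^{-1}A^{T}=A^{-1}(A^{T})^{-1}A^{T}=A^{-1}=\frac{\text{adj.} A}{|A|}$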
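A small check of the square, non-singular case, assuming NumPy; the matrix is an arbitrary example with $|A| = 5$:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])       # square with |A| = 5, so A^{-1} exists
via_formula = np.linalg.solve(A.T @ A, A.T)  # (A^T A)^{-1} A^T without an explicit inverse
print(np.allclose(via_formula, np.linalg.inv(A)))        # True
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A)))  # True
```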

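Some properties of the generalized inverse (the Penrose conditions) are,
1. $AA^{\dagger}A=A$
2. $A^{\dagger}AA^{\dagger}=A^{\dagger}$
3. $(AA^{\dagger})^{T}=AA^{\dagger}$
4. $(A^{\dagger}A)^{T}=A^{\dagger}A$
One important point to remember is that $A^{\dagger}$ always exists and is unique: it is the only matrix satisfying all four properties, even when $A^{T}A$ is singular and $(2)$ cannot be used.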
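To see that the pseudoinverse exists even when $(A^{T}A)^{-1}$ does not, the sketch below (assuming NumPy; the matrix is a hypothetical rank-deficient example) checks the four Penrose conditions $AA^{\dagger}A=A$, $A^{\dagger}AA^{\dagger}=A^{\dagger}$, $(AA^{\dagger})^{T}=AA^{\dagger}$ and $(A^{\dagger}A)^{T}=A^{\dagger}A$ numerically.

```python
import numpy as np

# Hypothetical rank-deficient 3 x 2 matrix: column 2 is twice column 1,
# so A^T A is singular and (A^T A)^{-1} A^T cannot be formed.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(abs(np.linalg.det(A.T @ A)) < 1e-12)   # True: A^T A is singular

A_dag = np.linalg.pinv(A)   # NumPy computes this from the SVD of A

print(np.allclose(A @ A_dag @ A, A))           # A A+ A = A
print(np.allclose(A_dag @ A @ A_dag, A_dag))   # A+ A A+ = A+
print(np.allclose((A @ A_dag).T, A @ A_dag))   # A A+ is symmetric
print(np.allclose((A_dag @ A).T, A_dag @ A))   # A+ A is symmetric
```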

Cheers!

Friday 17 November 2017

The Linear Quadratic Regulator

Optimal Control and Linear-Quadratic-Regulator (LQR)

Today, I will not write an introductory passage to start off my blog, because writing an introduction to Optimal Control would itself require a blog. However, I will add small tidbits as and when needed.
To understand the topic, we need some basic definitions with us.

1.
A control system can be represented in terms of State Space, as follows,
$\boxed{\begin{aligned} \dot{x}(t) &= A(t)x(t)+B(t)u(t) \\ y(t) &= C(t)x(t)+D(t)u(t) \end{aligned}}$
In the above formulation,
$x(.)$ is the state vector; $x(t) \in \mathcal{R}^{n}$ .
$y(.)$ is the output vector; $y(t) \in \mathcal{R}^{q}$ .
$u(.)$ is the input vector; $u(t) \in \mathcal{R}^{p}$ .
$A(.)$ is the System Matrix; $\mathrm{dim}[A(.)]=n\times n$ .
$B(.)$ is the Input Matrix; $\mathrm{dim}[B(.)]=n\times p$ .
$C(.)$ is the Output Matrix; $\mathrm{dim}[C(.)]=q\times n$ .
$D(.)$ is the Feed-forward Matrix; $\mathrm{dim}[D(.)]=q\times p$.
Now, to check whether a (time-invariant) system is controllable, we first define a matrix $\mathcal{Q}_c$, called the controllability matrix, such that,
$\boxed{\mathcal{Q}_c=\begin{bmatrix} B & AB & A^{2}B & \ldots & A^{n-1}B \\ \end{bmatrix}}$
The system is controllable if $\mathcal{Q}_c$ has full row rank (i.e. rank( $\mathcal{Q}_c$ ) $=n$ ).
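The rank test above is easy to sketch with NumPy, assuming constant $A$ and $B$. The function names and the double-integrator example are illustrative choices, not from the post; `np.linalg.matrix_rank` decides the rank from singular values with a tolerance, so it is a numerical rank.

```python
import numpy as np


def controllability_matrix(A, B):
    """Q_c = [B, AB, A^2 B, ..., A^{n-1} B] for constant A (n x n) and B (n x p)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])   # each block is A times the previous block
    return np.hstack(blocks)


def is_controllable(A, B):
    """True when Q_c has full row rank n (numerical rank via SVD)."""
    n = np.asarray(A).shape[0]
    return bool(np.linalg.matrix_rank(controllability_matrix(A, B)) == n)


# Hypothetical example: the double integrator x1' = x2, x2' = u.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(controllability_matrix(A, B))
print(is_controllable(A, B))                          # True
print(is_controllable(A, np.array([[1.0], [0.0]])))   # False: input never reaches x2
```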

We will assume that we deal with Controllable systems only.

Usually, the state feedback controller of a single-input system is designed using the eigenvalue assignment method, also called the pole placement method.

2.
The pole placement method finds the control input $u$ in the form $u=-kx$.
So, the closed-loop state space representation becomes,
$\dot{x}=(A-Bk)x$
$k$ is found by matching the coefficients of both sides of,
$|sI-(A-Bk)|=(s-\mu_1)(s-\mu_2)...(s-\mu_n)$
Here, $\mu_1, \ldots, \mu_n$ are the desired pole locations. Note that $k$ is a row vector, $k=\begin{bmatrix} k_1 & k_2 & k_3 & \ldots & k_n \end{bmatrix}$.
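Matching coefficients by hand gets tedious beyond two states. For a controllable single-input system, a standard closed form is Ackermann's formula, $k = \begin{bmatrix} 0 & \ldots & 0 & 1 \end{bmatrix}\mathcal{Q}_c^{-1}\phi(A)$, where $\phi(s)=(s-\mu_1)\ldots(s-\mu_n)$. The sketch below swaps in that formula (it is not derived in this post); the function name and the double-integrator example are assumptions for illustration.

```python
import numpy as np


def ackermann_gain(A, B, poles):
    """Single-input pole placement via Ackermann's formula (illustrative sketch).

    k = [0 ... 0 1] Q_c^{-1} phi(A), with phi the desired characteristic polynomial.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float).reshape(-1, 1)
    n = A.shape[0]
    # Controllability matrix [B, AB, ..., A^{n-1} B]; must be invertible.
    Qc = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
    # Coefficients of (s - mu_1)...(s - mu_n), highest power first.
    coeffs = np.real(np.poly(poles))
    # Evaluate phi(A) with Horner's scheme.
    phi_A = np.zeros((n, n))
    for c in coeffs:
        phi_A = phi_A @ A + c * np.eye(n)
    e_n = np.zeros((1, n))
    e_n[0, -1] = 1.0
    return e_n @ np.linalg.solve(Qc, phi_A)   # 1 x n row vector k


# Hypothetical example: double integrator, desired poles at -1 and -2.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
k = ackermann_gain(A, B, [-1.0, -2.0])
print(k)                                            # [[2. 3.]]
print(np.sort(np.linalg.eigvals(A - B @ k).real))   # approximately [-2, -1]
```

For this example, coefficient matching gives the same answer: $A-Bk$ has characteristic polynomial $s^2+k_2s+k_1$, and matching it to $s^2+3s+2$ gives $k=\begin{bmatrix} 2 & 3 \end{bmatrix}$.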

However, for a multi-input system, the feedback gain $k$ that places the poles at the desired locations is not unique.
The Linear Quadratic control strategy is used to deal with this issue.

Now, we dive into the Linear Quadratic Regulator (LQR) formulation, for an $m$ -input and $n$ -state system with $x \in \mathcal{R}^n$ , $u \in \mathcal{R}^m$ . Consider a system,
$\dot{x}=A(t)x(t)+B(t)u(t) \text{ provided }x(0)=x_0 \tag*{(1)}$
Our aim is to find an open loop control $u(\tau)$ , for $\tau \in [t_0, t_f]$ such that we minimize:
$\boxed{J(u, x_0, t_0, t_f) = \int_{t_0}^{t_f}[x^{T}(t)Q(t)x(t)+u^{T}(t)R(t)u(t)]dt+x(t_f)^{T}Sx(t_f) } \tag*{(2)}$
where $Q(t)$ and $S$ are symmetric positive semi-definite $n \times n$ matrices.
$R(t)$ is a symmetric positive definite $m \times m$ matrix. Note that $x_0$ , $t_0$ and $t_f$ are fixed and given data.
The aim of the controller is to keep $x(t)$ close to $0$, especially at the final time $t_f$.
In $(2)$ ,

$x^T(t)Q(t)x(t)$ penalizes the transient response (deviation of the state along the way).
$x^T(t_f)Sx(t_f)$ penalizes the final state.
$u^T(t)R(t)u(t)$ penalizes the control effort.
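To make $(2)$ concrete, here is a minimal sketch that approximates $J$ for a given control by simulating $(1)$ with forward Euler steps and summing the running cost with a left Riemann sum. The function name, the step count, constant $A$, $B$, $Q$, $R$, and passing the control as a function `u_fn(t, x)` are all assumptions made for illustration; a feedback law $u=-Kx$ could be passed as `lambda t, x: -K @ x`.

```python
import numpy as np


def lqr_cost(A, B, Q, R, S, u_fn, x0, t0, tf, steps=10_000):
    """Approximate the cost J in (2) for a given control u_fn(t, x).

    Sketch only: A, B, Q, R are taken as constant, the state equation (1) is
    integrated with forward Euler, and the running cost with a left Riemann sum.
    """
    A, B, Q, R, S = (np.asarray(M, dtype=float) for M in (A, B, Q, R, S))
    x = np.asarray(x0, dtype=float)
    dt = (tf - t0) / steps
    J = 0.0
    for i in range(steps):
        t = t0 + i * dt
        u = np.asarray(u_fn(t, x), dtype=float)
        J += (x @ Q @ x + u @ R @ u) * dt   # transient + control-effort terms
        x = x + dt * (A @ x + B @ u)        # Euler step of x' = Ax + Bu
    return J + x @ S @ x                    # final-state term x(tf)^T S x(tf)


# Hypothetical example: scalar system x' = -x + u with no input (u = 0).
# Exact cost: int_0^5 e^{-2t} dt = (1 - e^{-10}) / 2, i.e. about 0.49998.
J = lqr_cost(A=[[-1.0]], B=[[1.0]], Q=[[1.0]], R=[[1.0]], S=[[0.0]],
             u_fn=lambda t, x: np.zeros(1), x0=[1.0], t0=0.0, tf=5.0)
print(J)
```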

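The above formulation can also regulate the output $y(t) = C(t)x(t)$ near $0$.
Since $y^{T}Wy=x^{T}C^{T}WCx$, this is done by choosing $Q(t)=C^T(t)W(t)C(t)$ and $S=C^T(t_f)W(t_f)C(t_f)$, where $W(t) \in \mathcal{R}^{r \times r}$ is a symmetric positive semi-definite weight and $r$ is the dimension of $y$.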
We can now state a theorem as follows,

For a system $\dot{x}=f(x,u,t)$ with a fixed initial state $x(t_0)=x_0$ and fixed initial and final times, we define our time horizon as $[t_0,t_f]$, so that $t \in [t_0,t_f]$. We find $u(t)$ such that our cost function $J$ is minimized. $J$ is defined as,
$J(u(.),x_0)= \phi(x(t_f))+\int_{t_0}^{t_f}L(x(t), u(t), t)dt$
Here, the first term of $J$ is the final cost and the second term is the recurring cost.

Now, we will formulate some important functions that will convert the minimization of $J$, which is a constrained optimal control problem, into an unconstrained optimal control problem. [THIS MAY NOT MAKE SENSE TO YOU, WHICH IS NATURAL. HOLD ON].
$\boxed{\dot{\lambda}^T=-H_x=-\frac{\partial{L}}{\partial{x}}-\lambda^T\frac{\partial{f}}{\partial{x}}} \tag*{(3)}$
Note that $\lambda(t)$ ($\in \mathcal{R}^n$) is called the Lagrange multiplier (or costate).
$H$ is the Hamiltonian function, and $H_x$ in $(3)$ is its partial derivative with respect to $x$. It is defined in terms of $L$, $f$ and $\lambda$ as,
$\boxed{H(x,u,t) := L(x,u,t)+\lambda^T(t)f(x,u,t)} \tag*{(4)}$
Here, $L$ and $f$ are as defined in the theorem. Along the same lines, and just for convenience of computation, we fix the value of $\lambda$ at the final time using the final cost $\phi$:
$\boxed{\lambda^T(t_f)=\frac{\partial \phi}{\partial x}(x(t_f))} \tag*{(5)}$
$(5)$ can be written as $\lambda^T(t_f)=\phi_x(x(t_f))$
Equations $(3)$, $(4)$ and $(5)$, together with the state equation $\dot{x}=f(x,u,t)$ and $x(t_0)=x_0$, form a set of $2n$ differential equations (in $x$ and $\lambda$) with split boundary conditions at $t_0$ and $t_f$. Now, we can define $u(t)$ in terms of $x$ and/or $\lambda$.
As mentioned earlier, the solution is found by converting $J$ from a constrained optimal problem to an unconstrained optimal problem using a Lagrange multiplier function $\lambda(t)$:
$\boxed{\bar{J}(u,x_0)=J(u(.),x_0)+\int_{t_0}^{t_f} \lambda^T [f(x,u,t)-\dot{x}]dt} \tag*{(6)}$
Notice that,
$\frac{d}{dt}(\lambda^T(t)x(t))=\dot{\lambda}^T(t)x(t)+\lambda^T(t)\dot{x}(t) \tag*{(7)}$
Therefore, integrating both sides from $t_0$ to $t_f$,
$\int_{t_0}^{t_f}\lambda^T\dot{x}\ dt=\lambda^T(t_f)x(t_f)-\lambda^T(t_0)x(t_0)-\int_{t_0}^{t_f}\dot{\lambda}^Tx\ dt \tag*{(8)}$
Substituting $(8)$ into $(6)$ and using the Hamiltonian defined in $(4)$, we get,
$\boxed{\bar{J}=\phi(x(t_f))-\lambda^T(t_f)x(t_f) + \lambda^T(t_0)x(t_0)+\int_{t_0}^{t_f}[H(x(t),u(t),t)+\dot{\lambda}^T(t)x(t)] \ dt} \tag*{(9)}$
A necessary condition for an optimal solution is that the first variation $\delta \bar{J}$ of the modified cost vanish for all admissible variations of $x$ and $u$ over $[t_0, t_f]$.
We will define $\delta \bar{J}$ analytically in the next post and formulate the Riccati Equation, which will lay the foundation for some amazing control strategies.
Cheers!

Sunday 5 November 2017

i!

Define the Factorial of a Complex number.

In the usual sense, the factorial is defined as,
$n! = \prod_{k=1}^{n} k=1 \cdot2\cdot3\ldots(n-2)\cdot(n-1)\cdot n \tag*{(1)}$
Now, the not-so-usual definition is based on the famous Gamma function, defined for $\text{Re}(t) > 0$ as,
$\Gamma(t) = \int_0^\infty e^{-x}x^{t-1}\mathrm{d}x \tag*{(2)}$
It has a very useful property: for every non-negative integer $n$,
$\Gamma(n+1)=n! \tag*{(3)}$
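As a numerical sanity check of $(3)$, the sketch below approximates the integral $(2)$ with the trapezoidal rule on a truncated interval $[0, 50]$ and compares it with `math.factorial`. The cut-off, the number of points and the function name are assumptions; the sketch is restricted to real $t \ge 1$ so that the integrand stays finite at $x = 0$.

```python
import math

import numpy as np


def gamma_by_integral(t, upper=50.0, num=100_001):
    """Trapezoidal-rule approximation of (2) on [0, upper], for real t >= 1.

    Restricting to t >= 1 keeps the integrand finite at x = 0; the cut-off at
    `upper` drops a tail that is negligible for the small t used here.
    """
    x = np.linspace(0.0, upper, num)
    f = np.exp(-x) * x ** (t - 1.0)
    h = x[1] - x[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))


for n in range(6):
    print(n, gamma_by_integral(n + 1), math.factorial(n))
```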
To extend $(3)$ into the complex domain, we first have to go through analytic continuation (please read about analytic continuation here).
Therefore, after analytic continuation, we can write it as,
$z! \overset{\text{def}}{=} \Gamma(z+1) \tag*{(4)}$
for $z \in \mathcal{C}\setminus\{-1,-2,\ldots\}$.
So, now, clearly,
$i!= \Gamma(i+1) \tag*{(5)}$
By $(2)$,
$\Gamma(i+1)=\int_0^\infty e^{-x} x^{(i+1)-1}\mathrm{d}x \tag*{(6)}$
Clearly,
$\Gamma(i+1)=\int_0^\infty e^{-x} x^{i}\mathrm{d}x \tag*{(7)}$
For easier computation, note that for $x > 0$, by Euler's formula,
$x^i = e^{i \ln x} = \cos ( \ln x)+i \sin( \ln x)$
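A quick check of this identity with Python's standard library (a sketch; the helper name `x_to_the_i` is hypothetical). It also shows that $|x^i| = 1$ for $x > 0$, so $x^i$ is a pure rotation in the complex plane.

```python
import cmath
import math


def x_to_the_i(x):
    """x^i for real x > 0, written as e^{i ln x} = cos(ln x) + i sin(ln x)."""
    theta = math.log(x)
    return complex(math.cos(theta), math.sin(theta))


for x in (0.5, 1.0, 2.0, 10.0):
    direct = x ** 1j                       # Python's built-in complex power
    via_exp = cmath.exp(1j * math.log(x))  # e^{i ln x}
    print(x, x_to_the_i(x),
          abs(x_to_the_i(x) - direct) < 1e-12,
          abs(via_exp - direct) < 1e-12,
          abs(abs(direct) - 1.0) < 1e-12)  # |x^i| = 1
```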
Let’s break it down,
$i! = \int_0^\infty e^{-x} (\cos ( \ln x)+i \sin( \ln x))\mathrm{d}x$
$\implies i! = \int_0^\infty e^{-x} \cos ( \ln x)\mathrm{d}x+i\int_0^\infty e^{-x} \sin ( \ln x)\mathrm{d}x$
If you have reached this far, you can evaluate the two integrals above (numerically, at least); the result is $i! \approx 0.4980 - 0.1549i$.
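Here is one way to do the numerical evaluation, as a sketch assuming NumPy; the function name, truncation interval and number of points are choices made for illustration. Because $\cos(\ln x)$ oscillates ever faster as $x \to 0$, it substitutes $x = e^{t}$, which turns the integral for $\Gamma(i+1)$ into $\int_{-\infty}^{\infty} e^{t-e^{t}}(\cos t + i\sin t)\,\mathrm{d}t$, an integrand that decays quickly at both ends, and then applies the trapezoidal rule on a truncated interval. As a cross-check it uses $|\Gamma(1+i)|^2 = \pi/\sinh\pi$, a consequence of Euler's reflection formula.

```python
import numpy as np


def factorial_of_i(t_min=-40.0, t_max=5.0, num=20_001):
    """Approximate i! = Gamma(1 + i) = int_0^inf e^{-x} x^i dx numerically.

    Substituting x = e^t gives int_{-inf}^{inf} e^{t - e^t} (cos t + i sin t) dt,
    whose integrand decays quickly at both ends, so a truncated trapezoidal rule works well.
    """
    t = np.linspace(t_min, t_max, num)
    h = t[1] - t[0]
    weight = np.exp(t - np.exp(t))   # e^{-x} dx, with x = e^t
    real_part = weight * np.cos(t)   # e^{-x} cos(ln x) dx
    imag_part = weight * np.sin(t)   # e^{-x} sin(ln x) dx

    def trapezoid(f):
        return h * (f.sum() - 0.5 * (f[0] + f[-1]))

    return complex(trapezoid(real_part), trapezoid(imag_part))


z = factorial_of_i()
print(z)
# Cross-check with |Gamma(1 + i)|^2 = pi / sinh(pi).
print(abs(abs(z) ** 2 - np.pi / np.sinh(np.pi)) < 1e-10)  # True
```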
Cheers!