Ideas: September 2015

Saturday, September 19, 2015

Discrete images and image transforms

Here I will talk about Images and the forms in which we manipulate it.

An image can be defined as a two-dimensional function $f(x , y)$ , where $x$ and $y$ are spatial (plane) coordinates, and the amplitude of/at any pair of coordinates is called the intensity of the image at that point.

The term gray level is used often to refer to the intensity of monochrome images. Monochrome images contain just shades of a single colour.

Color images are formed by a combination of individual such images.
For example in the RGB colour system a color image consists of three individual monochrome images, referred to as the red (R), green (G), and blue (B) primary (or component) images. So many of the techniques developed for monochrome images can be extended to color images by processing the three component images individually.

A interesting thing that I came across was that the additive color space based on the RGB color, doesn’t actually cover all the colours, we used this model to fool the human eyes which have 3 types of cones which are (due to evolutionary reasons) sensitive to Red, Green and Blue light.
If we mixed Red and Green, it would look yellowish to us, but a spectrum detector would not be fooled.
Additive color is a result of the way the eye detects color, and is not a property of light.

Lets check out some basic MATLAB code to see how the 3 channels of red blue and green make up a colour image.

Images are read into the MATLAB environment using function imread,
we use % to make comments.

img = imread('lenna.png'); % img has a RGB color image
imtool(img)

with imshow command we can view the image,

Original Image

using the imtool command we can see how each pixel has the 3 corresponding intensities, the variable img will also tell us how there are 512*512 pixels, and each pixel has 3 values.

imtool in use

As you can see each pixel has an intensity value for each of the 3 channels (RGB), we can use the following MATLAB code to see the 3 monochrome images which make up the original image.

red = img(:,:,1); % Red channel
green = img(:,:,2); % Green channel
blue = img(:,:,3); % Blue channel
a = zeros(size(img, 1), size(img, 2));
just_red = cat(3, red, a, a);
just_green = cat(3, a, green, a);
just_blue = cat(3, a, a, blue);
back_to_original_img = cat(3, red, green, blue);
figure, imshow(img), title('Original image')
figure, imshow(just_red), title('Red channel')
figure, imshow(just_green), title('Green channel')
figure, imshow(just_blue), title('Blue channel')
figure, imshow(back_to_original_img), title('Back to original image')

Now with this code red, blue and green are 512*512 matrices, the 3 of which together was img. a is just a black image of same size, it’s used to make the grey monochrome images red, blue and green into shades of the colors red, blue and green respectively. The cat function concatenates arrays along specified dimension (the first argument can be 1,2,3… and its for concatenation along rows, columns and as an array of matrices in the case of 3 ) so when they use 3 they are doing the reverse of what they did in decomposing the original image which had three 512*512’s matrices.

Blue Channel

Green Channel

Red Channel

But to really get an idea of much each color contributes to the final image we need to look at the 3 images in grey monochrome, just add these 3 lines of code,

figure, imshow(red), title('Red channel - grey monochrome')
figure, imshow(green), title('Green channel - grey monochrome')
figure, imshow(blue), title('Blue channel - grey monochrome')

and we get an image where the darker it is, the less the intensity value and vice versa.

Blue In grey monochrome

Green In grey monochrome

Red In grey monochrome

Sampling and Quantization

The function we talked about above, when you consider a real life image is actually continuous with respect to the x- and y-coordinates, and also in amplitude. Converting such an image to digital form requires that the coordinates $x,y$ , as well as the amplitude $f(x,y)$ , be digitized. Digitizing the coordinate values is called sampling; digitizing the amplitude values is called quantization. Thus, when x,y, and the amplitude values off are all finite, discrete quantities, we call the image a digital image.

I recommend Digital Image Processing Using MATLAB® by Ralph Gonzalez (Author), Richard Woods (Author), Steven Eddins (Author). Chapter 2 should bring anyone up to speed with using MATLAB in image processing.

Frequency Domain of Images

Now that we have seen how we store images in discrete form, when we come to transforming it.
From what was taught to me in class we went directly into playing with the image in the frequency domain. Spatial filtering comes later, so for now let me tell you about a idea which I really wanted to write about. How the frequency domain is so important.

First of all let me refresh your memory about vector spaces, I will be drawing an analogy between a vector and a image,
the key idea is that an image can be considered to be the linear combination of the basis images just like how a vector can also be considered to be linear combination of the basis vectors!

now lets the extend the analogy, consider the vector space $R^3$ consisting of vectors of the form $\; (x,y,z) \;\; \forall \;x,y,z \in Z$

Similarly consider the set of all monochrome images, for simplicity’s sake let it be a square image with 9 pixels. we can represent such images using matrices of size $3 \times 3$ , say with a value of 0 indicating black
1 indicating white and the values in between denoting a shade of grey.

Now that the foundations have been laid, lets ask ourselfs how exactly is the “image” being stored? well its stored in a matrix with values that indicate the varying shades of grey for each pixel. The point I am trying to make here is that the image data is distributed spatially, we basically take the entity known as a image and fragment it into multiple pieces known as pixels and then we encode the data in the form of intensity of the various pixels.

Now lets take this same thing into the vector space discussion, a vector (in $R^3$ at least) has 3 components, when we store it we separate it into the 3 integer values that make it up.

Similarly lets look at the standard basis of $R^3$ ,
$\; (1,0,0) \;\; (0,1,0) \;\; (0,0,1) \;$

so, yeah we can construct any vector in $R^3$ by a linear combination of the 3 Standard basis vectors given above..

But what does that mean? and why are these 3 standard when we can take any 3 orthogonal vectors to do this job!

See what this means is that these three Standard basis vectors point in the direction of the axes of a Cartesian coordinate system and when we represent a vector $\vec{r} \in R^3$ as a linear combination of these 3 vectors, then the scalars we get as coefficients to each the basis vectors represents the amount the vector $\vec{r}$ contributes to the direction of the respective basis vector.

This has practical importance as when we have a vector which tells us say for example the velocity of a object, then it’s important to know what the component of that velocity along the three axises are, this is known as resolving into components.

So the point is when we choose a set of basis for our vector space what we are essentially doing is choosing in what way we want to represent the vectors themselves. We choose the standard basis vectors in the case of the $R^3$ vector space because the information we get (when we break any vector in that space into a linear combination of these basis vectors) is the coefficients of these basis vectors, these scalar values encode the information about the vector in a way that is useful to us!

Whew that was awesome right?!

Now comes the Image’s frequency domain part, this is not as straightforward as the vector space.

So the matrix we talked about is used to encode the image discretely in the spatial domain, but when we talk about the frequency domain what do we mean? And also what are the basis images we use in the monochrome images vector space?

Okay to answer these questions let’s think about natural images, instead of defining the shade of each and every pixel in the image what could be a better way of thinking of/encoding a image.

well the idea that DCT uses is that in normal natural images we will have portions of slowly varying pixels (The sky, ocean, skin etc). Because the pixels of a region are similar in intensity these pixels are said to be correlated, this sort of correlation makes the spatial domain method of storing the image kind of redundant right? we should stop looking at an image as a collection of pixels, instead think of a image as a literal weighted superposition of a set of basis images, where each of the basis images will be orthogonal to each other. The weights used in this linear combination will signify the amount of contribution each of the basis image makes towards the final image.

To illustrate this let’s consider a small image, with 64 pixels, a square $8 \times 8$ matrix could represent it. To see it clearly we zoom in 10x times.

The letter A

Now there are various transforms such as DCT, DFT, Hadamard, K-L etc
each of these provide us with a different way to arrive at the basis images.
Each of them have their own pro’s and con’s, For now let me show you the basis images used in the DCT transform.

The first 64 DCT basis images

If you notice you can see how there are 64 basis images, also towards the top left corner the images are smoothly varying while as we go towards the bottom left corner we can see that the image is rapidly varying. Thus when we find the scalar value corresponding to each of these basis image such that the linear combination is the small $8 \times 8$ image, then the low frequency coefficients are the coefficients of the smoothly varying images, while the high frequency coefficients are that of the rapidly varying basis images in the bottom right corner.

Thus the array of these coefficients are known as the DCT of the image and it encodes the image as the weighted superposition of a set of known basis images. For the $8 \times 8$ image given above the DCT would be

DCT of 8x8 image of A

Superimposing the coefficients over the basis images we get a proper idea of how much each basis image contributes to the final image.

DCT coefficients superimposed over basis images

Thus to get the image back we do the weighted sum, as shown below

Doing the weighted sum

The image on the left is seen to grow closer and finally become the actual $8 \times 8$ image, which is actually a pretty blurred A. The image on the middle is the product of the coefficient and the corresponding basis image (shown on the right).

So now I hope you understand what we mean by the frequency domain, these coefficients are the representation of the image in the frequency domain, and the coefficient matrix is the transformed image itself.

We saw the basis images used in the DCT transform, here as our objective was to segregate the image in terms of its smoothly varying and sharply changing patterns we used these basis states. Thus the key point I wanted to share was that these basis states are chosen out of a wish to look at the image from a particular perspective.

Just as how we used the standard basis when we wanted the vectors in terms of their components in the $x, y, z$ axes, we choose basis states for images because we want to want to understand it terms of some idea.

In brief the idea which motivates DCT is smoothness,
DFT is periodicity in the image,
Walsh–Hadamard transform is also about periodicity but has a computational edge over DFT as it doesn’t require multiplication, (only addition and subtraction needed) when computing it.

But the idea behind KL transform is brilliant and I think I’ll cover that in detail on my post on the Image Transforms.

Well as usual, any and ALL feedback is welcome.

Many thanks to,

Digital Image Processing Using Matlab,
Ralph Gonzalez, Richard Woods, Steven Eddins

Hanakus

Written with StackEdit.

Thursday, September 17, 2015

Image Processing Introduction

IC 040 IMAGE PROCESSING

This is my elective of my fifth Semester, I had to choose between Image Processing, Power Electronics, and Digital Signal Processing.

I had pegged P.E to be a really theoretical subject with definitions etc, but it turns out they have got a really awesome temp faculty who uses graphs to talk in detail about the subject.

In I.P (Image processing - an acronym I will be using from now on)
I plan to use code samples to illustrate all points that I learn, for this I will be using MATLAB, and python libraries such as PIL, numpy, mathplotlib, scipy etc. I recommend getting a mathematical package like Python(x,y) rather than installing these packages separately on your python installation.

These are my portions, and just as no subject is boring, I don’t think any subject for any course in my college has a seriously out of date syllabus. So here goes :-

Linearity and space-invariance, PSF, Discrete images and image transforms, 2-D sampling and reconstruction, Image quantization, 2-D transforms and properties.

Image enhancement - Histogram modelling, equalization and modification. Image smoothing , Image crispening. Spatial filtering, Replication and zooming, Generalized cepstrum and homomorphic filtering.

Image restoration - image observation models. Inverse and Wiener filtering. Filtering using image transforms. Constrained least-squares restoration. Generalized inverse, SVD and interactive methods. Recursive filtering.Maximum entropy restoration. Bayesian methods.

Image data compression - sub sampling, Coarse quantization and frame repetition. Pixel coding - PCM, entropy coding, runlength coding Bit-plane coding. Predictive coding. Transform coding of images. Hybrid coding and vector DPCM. Interframe hybrid coding.

Image analysis - applications, Spatial and transform features. Edge detection, boundary extraction, AR models and region representation. Moments as features. Image structure. Morphological operations and transforms. Texture Scene matching and detection. Segmentation and classification.

Yeah so when I saw the portions it looked really daunting to me too!
But here goes,

Imaging Systems and their relationship with Point Spread Function

Discrete images and image transforms

Written with StackEdit.

Point Spread Function

Imaging Systems and their relationship with Point Spread Function

Here I will be talking about what a PSF (Point Spread Function) is and what its relationship with imaging systems are, later I will talk about the space invariance property of imaging systems and how it helps us.

First of all perfect imaging systems just isn’t possible, to understand why watch this.

They explain with a simple single lens system, which focuses the light rays coming from an object (kept really far away so that the light rays hitting the lens can be assumed to be parallel to each other) to the focal point of the lens on the other side.

Even using aberration free lens (such that the point of convergence would be an actual point instead of a blurred point shape) the intensity of the light at the point would tend to infinity and the electric field would easily be large enough to ionize the surrounding air.

Here the lens is the imaging system, the object can be thought of as the input to the imaging system which gives us the image as the output.

Now that we know we will have some lower limit on the size of the image formed, what does this mean?

if we have 2 objects which are too close to each other, our imaging system may not be able to resolve these 2 distinct objects into 2 distinct images!
so the size of the smallest image it can resolve tells us how far apart these objects must be in order for it to be able to tell them apart. This ability of the imaging system to resolve details is known as Optical resolution

The image formed by a point source of light kept really far away from the imaging should give us that right? and that is the PSF!!

The point spread function (PSF) describes the response of an imaging system to a point source or point object.

The PSF in many contexts can be thought of as the extended blob in an image that represents an unresolved object.
The PSF is the impulse response of a focused optical system
The PSF is in functional terms the spatial domain version of the Optical transfer function of the imaging system.

The first point explores the fact that the distortion present even with the point source will be present in even amounts with a larger object image combination. The way I think of it is that the image formed will be resolved to about the precision set by the PSF. You will have a fuzzy region around the formed image at about the size of the PSF.

Yeah we supply the impulse input to the imaging system and record its response which is the impulse response.

The optical transfer function is defined as the Fourier transform of the impulse-response of the optical system, also called the point spread function. So as we all know the Fourier transform of the impulse response of a LTI system gives us its frequency response, and here the Fourier transform takes the PSF from its spatial domain into the frequency domain.
The optical transfer function provides a comprehensive and well-defined characterization of optical systems, so the PSF also plays a important role here.

And this video explains how Aberrations increases as $D$ increases,

$Aberrations \propto D \;\;\;\;\;\;$ because of the diffraction of light

So if $D$ is too large aberrations become a problem,

But also how $\Delta l$ decreases ( better resolution) with larger $D$ .

$\Large \Delta l = \frac{1.22f\lambda}{D}$

here,
$\Delta l$ is the size of blurred point formed (essentially the resolution)
$f$ is the focal length of the lens
$\lambda$ is the wavelength of the light
and $D$ is the size of the aperture.

Therefore we can see how the PSF gives us the measure of the imaging system, if the PSF of a imaging system is large and spread out then that means that imaging system has more aberrations etc.
But if the PSF is well contained it means the opposite and can tell us how the imaging system has negligible aberrations.

Thus the degree of spreading (blurring) of the point object is a measure for the quality of an imaging system.

But actually the PSF is much more than that, as we can see from the 4 points noted above.

As the PSF is the impulse response of the imaging system, this lets us calculate the output of the imaging system as the convolution integral of the system input with the PSF.
That is of course only if the imaging system in question is a additive linear two-dimensional imaging system. Oh and it should also be space invariant!

$\\ \Large g(x,y) = f(x,y) \ast h(x,y)$

Here $h$ is the impulse response or the PSF, $f$ is the system input and $g$ is the output, but lets look at the mathematics behind this relationship.

$P{}$ is a two dimensional system, in its most general form, is simply a mapping of some input set of two-dimensional functions $F_1(x, y), F_2(x, y),..., F_n(x, y)$ to a set of output two-dimensional functions $G_1(x, y), G_2(x, y),..., G_m(x, y)$ where $x,y$ are spatial variables.

This is the definition for a imaging system P

Also note we consider $P{}$ to be a additive linear two-dimensional imaging system.
This assumption of linearity is well founded as in non-coherent imaging systems such as fluorescent microscopes, telescopes or optical microscopes, the image formation process is linear in power and described by linear system theory. This means that when two objects A and B are imaged simultaneously, the result is equal to the sum of the independently imaged objects. In other words: the imaging of A is unaffected by the imaging of B and vice versa, owing to the non-interacting property of photons.

We can write the output G in terms of the superposition integral as follows,
Note that s and t are dummy variables used in the integral.
$\delta$ is the Dirac delta function

Output using convolution integral

The input $f(x,y)$ is written as the sum of amplitude weighted
Dirac delta functions by the sifting integral,
The imaging system’s response to the impulse input given by $\delta(x-s,y-t)$ is $h(x,y,s,t)$ (Impulse Response or The PSF)

Space Variance

Now generally if the impulse response is space variant then we have to stop at the superposition integral as the extent to which we can relate these quantities, But in the special case that the additive linear two-dimensional imaging system is space invariant, then the superposition integral reduces to the convolution integral.

enter image description here

Now comes the question what does it mean when we say the imaging system is Space Invariant?

well mathematically speaking we can just say - In the case when,

$\large h(x,y,s,t) = h( x-s, y-t)$

i.e the impulse response $h$ depends only on the factors $x-s$ & $y-t$ .

Intuitively, in an optical system this implies that the image of a point source in the focal plane will change only in location, not in functional form, as the placement of the point source moves in the object plane.

That’s it folks looks like I wrote more than a thousand words, Any sort of feedback is welcome.

Many thanks to,

Jiří Jan
Department of Biomedical Engineering
Brno University of Technology
Czech Republic

William K Pratt
Author of Digital Image Processing: PIKS Scientific Inside

And of course Wikipedia!

Written with StackEdit.

Saturday, September 12, 2015

Hello World

I am getting back to blogging after a kind of sabbatical,
lets get to learning and sharing.

I plan to use this post to test MarkDown and pagedown-extra
and also get familiar with StackEdit.

we can use control+B to make Bold text

while control+I is for Italic

I made a link to google using control+L
I guess cause L stands for Link

ctrl+B gets a Blockquote

ctrl+K is to write Code

ctrl+O for Ordered lists
which are numbered as we can see.

ctrl+R to add a horizontal Rule

ctrl+U for unordered lists
for which bullets are used.

and ctrl+H for a Heading

This is a double hashed heading

We use latex to write equations given below

integrals such as the Gamma function, fractions and powers

$\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}dt \ f(x) = \frac{1}{x^6}.$

we can have sigma with limits and variables with subscripts

$g(n) = \sum\limits_{i=1}^n x_i \ h(n) = \prod\limits_{i=1}^n x_i$

Other symbols include

$10\div90 = \frac{1}{9} = 0.\overline{11} \ \frac{\partial x}{dt} = \infty \; \; \forall ,x$

We can also add comments to the document.

I used ctrl+G to add this image

enter image description here

Well my publishing of the document seems to be a problem,
on my wordpress site the Latex isn’t loading!
But on my blogger I got it working by using a dynamic template

Hope to add more later, signing off

Aditya A Prasad

Written with StackEdit.