Ideas: Homomorphic filtering

Prerequisites: basics of the Fourier transform, and high-pass filters in the context of image processing.

I will be talking about this filtering method in terms of using it as an enhancement technique, but it can be leveraged for other uses also.

The Wikipedia page and this blog explain this entire concept thoroughly.

First let me start with the illumination-reflectance model of image formation which says that the intensity of any pixel in an image (which is the amount of light reflected by a point on the object and captured by an imaging system) is the product of the illumination of the scene and the reflectance of the object(s) in the scene.

$\Large i(x,y) = l(x,y)\ r(x,y)$

where $i$ is the image, $l$ is scene illumination, and $r$ is the scene reflectance. Reflectance $r$ arises from the properties of the scene objects themselves, but illumination $l$ results from the lighting conditions at the time of image capture.
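To make the model concrete, here is a small NumPy sketch of a synthetic scene. Everything in it is made up for illustration: the `make_scene` name, the bright square standing in for an object, the ramp-shaped illumination, and the specific values are all assumptions, not something taken from a real photo.

```python
import numpy as np

def make_scene(size=64):
    """Build a toy scene that follows i(x, y) = l(x, y) * r(x, y)."""
    # Reflectance: dark background (0.2) with a bright square object (0.9).
    r = np.full((size, size), 0.2)
    r[size // 4: 3 * size // 4, size // 4: 3 * size // 4] = 0.9

    # Illumination: a smooth left-to-right ramp from bright (1.0) to dim (0.2).
    ramp = np.linspace(1.0, 0.2, size)
    l = np.tile(ramp, (size, 1))

    # The captured image is the element-wise product of the two.
    return l * r, l, r

i, l, r = make_scene()
```

Inside the square the reflectance is the same everywhere, yet the left side of the square comes out brighter in `i` than the right side, purely because of the illumination.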

This model should make sense because reflectance is basically the ratio of the light reflected to the light received (the illumination). But in case the equation still doesn't feel right, consider this image.

Golf ball

Here the image is the collection of values that tell us the intensity of light reflected from each point. For example the golf ball reflects more light, while the hole reflects less. These values form a matrix in the shape of the image - $i(x,y)$ .

Now what causes the golf ball to reflect more light than the grass? It's a property of the material, which decides which wavelengths of light are reflected and which are absorbed. We call the measure of this property reflectance, and the matrix $r(x,y)$ holds its value for every point.

The imaging system - our camera, captures the light reflected from each point which is the product of the illumination hitting each point with the “reflectance” property of the material at these respective points.

So what we actually want is only that property, $r(x,y)$, which will be faithfully captured in the image matrix $i$ if the illumination on the objects is uniform. If we hide the golf ball under a shadow while shining light on the grass, the golf ball will look less white in the image than it actually is.

Now, in images taken under non-uniform illumination, the illumination typically varies slowly across the image, whereas the reflectance can change quite abruptly at object edges.

If the light source is placed so that it illuminates the scene evenly, then we won't have the problem of irregular illumination. But consider it being placed in a direction such that part of the scene is highly illuminated while the remaining part is in shadow. Because of the way light spreads, the two areas are separated by a fuzzy region that is neither fully illuminated nor fully in shadow. This is what I mean by "illumination typically varies slowly across the image". For an example, check out the photo at the bottom of the post.

Objects, on the other hand, stand out from the background because the intensity of reflected light changes abruptly at their edges. In the photo, for example, the black text has a high contrast with the white background.

So as far as the image $i$ is concerned, the uneven illumination $l$ is the noise that we wish to remove (multiplicative noise, because $l$ is multiplied with $r$). The illumination term is what causes part of the scene to be in shadow, while $r$ is the signal that we need: it holds the information about the visual properties of everything in the scene (how much a particular point absorbs or reflects light).

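In order to make this removal easy we take the logarithm of the image. Note that the log is applied to the pixel intensities, not to the coordinates $(x,y)$, so we are still in the spatial domain, just working with $\ln(i(x,y))$ instead of $i(x,y)$. Here,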

$\ln(i(x,y)) = \ln(l(x,y)\ r(x,y))$
$\ln(i(x,y)) = \ln(l(x,y)) + \ln(r(x,y))$

The properties of the log turn the product into a sum, so that these multiplicative components can be separated linearly in the frequency domain.

Illumination variations can be thought of as multiplicative noise in the original image, while in the log domain they become additive noise.

Now the question is: why do we want a sum? The point is that we need to be able to separate the illumination term from the reflectance term, so that we can remove the former without harming the latter, and linear filtering can only do that for components that are added together.

Taking the Fourier transform $\mathcal{F}$ of the image directly (shifting it from the spatial domain $(x,y)$ to the frequency domain $(u,v)$) does not separate the illumination and reflectance components. Rather, the multiplicative components become convolved in the frequency domain, as per the convolution theorem.

$\mathcal{F} \{i(x,y)\} = \mathcal{F} \{l(x,y) \cdot r(x,y)\} = \mathcal{F} \{l(x,y)\} * \mathcal{F} \{r(x,y)\}$

$\Rightarrow \mathcal{F} \{i(x,y)\} \neq \mathcal{F} \{l(x,y)\} \cdot \mathcal{F} \{r(x,y)\}$
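If you want to convince yourself of this numerically, here is a quick 1-D check with NumPy. It is only a sketch: the random signals and the `circular_convolve` helper are my own illustration, and the division by the signal length comes from the scaling convention of NumPy's discrete Fourier transform.

```python
import numpy as np

def circular_convolve(a, b):
    """Circular convolution of two equal-length 1-D sequences."""
    n = len(a)
    return np.array(
        [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]
    )

rng = np.random.default_rng(0)
l = rng.uniform(0.1, 1.0, 8)  # stand-in for illumination samples
r = rng.uniform(0.1, 1.0, 8)  # stand-in for reflectance samples

# Transform of the product ...
lhs = np.fft.fft(l * r)
# ... equals the convolution of the transforms (scaled by 1/N for the DFT) ...
conv = circular_convolve(np.fft.fft(l), np.fft.fft(r)) / len(l)
# ... and is not the product of the transforms.
prod = np.fft.fft(l) * np.fft.fft(r)

print(np.allclose(lhs, conv))
print(np.allclose(lhs, prod))
```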

But if we take the log of the image first and then shift to the frequency domain, we can express the image (now $I$) as the linear sum of $L$ and $R$, because the Fourier transform is linear.

$\mathcal{F} \{\ln(i(x,y))\} = \mathcal{F} \{\ln(l(x,y))\}+\mathcal{F} \{\ln(r(x,y))\}$
$\Rightarrow I(u,v) = L(u,v) + R(u,v)$
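The same kind of sanity check works for the log version. In this sketch the two random arrays are just placeholders for an illumination and a reflectance map (assumed strictly positive so that the log is defined).

```python
import numpy as np

rng = np.random.default_rng(0)
l = rng.uniform(0.1, 1.0, (8, 8))  # placeholder illumination, strictly positive
r = rng.uniform(0.1, 1.0, (8, 8))  # placeholder reflectance, strictly positive

I = np.fft.fft2(np.log(l * r))  # Fourier transform of the log of the image
L = np.fft.fft2(np.log(l))      # Fourier transform of the log of the illumination
R = np.fft.fft2(np.log(r))      # Fourier transform of the log of the reflectance

print(np.allclose(I, L + R))
```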

Now that separability has been achieved, we can apply high pass filtering to $I$ ,

$H(u,v)I(u,v) = H(u,v)L(u,v) + H(u,v)R(u,v)$

$L(u,v)$ will have its energy compacted in the low-frequency components (illumination varies slowly), while $R(u,v)$ will have much of its energy in the high-frequency components (reflectance changes abruptly at edges). So the $H(u,v)L(u,v)$ term will be attenuated to a great degree, depending on the filter $H$ and the cutoff frequency we set.
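As one possible choice of $H$, here is a sketch of a Gaussian high-pass filter. The Gaussian shape, the `gaussian_highpass` name and the `sigma` cutoff parameter are my assumptions for illustration; any high-pass filter would fit the argument. The array is laid out to match `np.fft.fft2`, which puts the zero-frequency (DC) term at index `[0, 0]`.

```python
import numpy as np

def gaussian_highpass(shape, sigma=10.0):
    """Return H(u, v) = 1 - exp(-D^2 / (2 * sigma^2)) in np.fft.fft2 layout."""
    rows, cols = shape
    # Integer frequency indices in FFT order (cycles per image).
    u = np.fft.fftfreq(rows) * rows
    v = np.fft.fftfreq(cols) * cols
    # Squared distance of every (u, v) from the zero frequency.
    d2 = u[:, None] ** 2 + v[None, :] ** 2
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))

H = gaussian_highpass((64, 64))
```

`H` is exactly 0 at the DC term, stays close to 0 for the lowest frequencies, and approaches 1 at the highest ones. A smaller `sigma` lets more of the low frequencies through.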

So we will have

$H(u,v)L(u,v) + H(u,v)R(u,v) \approx R(u,v)$

$\Rightarrow H(u,v)I(u,v) \approx R(u,v)$

So now, after the high-pass filtering of the Fourier transform of the log of the actual image (whew, that was a stretch! lol), we get approximately the Fourier transform of the log of the reflectance component $r(x,y)$, which is what we were trying to isolate all along.

To get the reflectance back in the spatial domain $(x,y)$, we take the exponential of the inverse Fourier transform of $H(u,v)I(u,v)$, which gives us a close approximation to $r(x,y)$.

$\exp\{\mathcal{F}^{-1}\{H(u,v)I(u,v)\}\} \approx \exp\{\mathcal{F}^{-1}\{R(u,v)\}\} = r(x,y)$

To clarify why we take the inverse Fourier transform (IFT) and then the exponential: we are just backtracking. The IFT takes us from the frequency domain back to the log of the intensities, and the exponential then undoes the log. Which inverse you need depends on the base of the log: I assumed the natural logarithm, so I used the exponential; if you used base 10, you would raise 10 to the power of the result instead.

And so we have successfully filtered out the multiplicative noise that is the slowly varying illumination - $l(x,y)$ from the image - $i(x,y)$ .
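Putting the steps together, a minimal sketch of the whole pipeline could look like this. It is not the code from the blog post I mention below; the `homomorphic_filter` name, the Gaussian high-pass, the `sigma` value and the small `eps` offset (to avoid taking the log of zero) are all assumptions on my part.

```python
import numpy as np

def homomorphic_filter(image, sigma=10.0, eps=1e-6):
    """Estimate reflectance via log -> FFT -> high-pass -> inverse FFT -> exp."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape

    # 1. Log of the intensities; eps (an assumption) avoids log(0).
    log_image = np.log(image + eps)

    # 2. Into the frequency domain: I(u, v).
    I = np.fft.fft2(log_image)

    # 3. Gaussian high-pass H(u, v) in FFT layout (DC term at [0, 0]).
    u = np.fft.fftfreq(rows) * rows
    v = np.fft.fftfreq(cols) * cols
    d2 = u[:, None] ** 2 + v[None, :] ** 2
    H = 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))

    # 4. Filter, go back to log intensities, then undo the log.
    filtered = np.real(np.fft.ifft2(H * I))
    return np.exp(filtered) - eps

# Toy input: flat reflectance (r = 1) under smooth, uneven illumination.
x = np.arange(64)
illumination = 1.0 + 0.5 * np.cos(2 * np.pi * x / 64)
image = np.tile(illumination, (64, 1))
result = homomorphic_filter(image)
```

On this toy input the slowly varying illumination is almost completely flattened out. Note that because this particular `H` is zero at the DC term, the overall brightness level is removed as well, so the output is the reflectance only up to a constant scale.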

I got the code from the blog post I mentioned at the top of the post. My modifications were minor: I just added a line to convert the image to grayscale, in case you want to run homomorphic filtering on a color image.
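The actual line isn't reproduced here, but a grayscale conversion can be sketched in plain NumPy as below. The `to_grayscale` helper is hypothetical, and the weights are the standard ITU-R BT.601 luma coefficients, which may differ from what the original code uses.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB array to H x W using BT.601 luma weights."""
    rgb = np.asarray(rgb, dtype=float)
    if rgb.ndim == 2:
        # Already a single-channel image; nothing to do.
        return rgb
    weights = np.array([0.299, 0.587, 0.114])
    # Weighted sum over the colour channels (any alpha channel is ignored).
    return rgb[..., :3] @ weights

gray = to_grayscale(np.ones((2, 3, 3)))
```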

The results when I ran it on an irregularly illuminated image were pretty awesome:

Output of homomorphic filtering with HPF

I know it's tough to argue that the right image looks visually enhanced to the human eye, but as you can see the "irregular illumination" has disappeared. This shows how homomorphic filtering is just another tool in our toolbox, with its own use cases.


Wednesday, October 7, 2015
