Sunday, April 15, 2007

Investigating the GIMP Fourier transform

In my previous post I began to work up how we might use the Fourier transform to help align two images that form a 3D stereoscopic image pair.

A more detailed investigation reveals that we need to ask a few more questions.

I sort of understand what the Fourier transform means for scalar data. But in an image, there are three different channels of color information, usually decomposed in one of two ways.

Two different representations of three-dimensional color data in an image pixel:
  1. red, green, and blue (RGB), or alternatively as
  2. hue, saturation, and brightness. (HSV)
For any ONE of these channels (e.g. "red"), I can kind of understand what the Fourier transform is. The transform for any single channel should result in a complex number in each pixel of the transform. Complex numbers have two components. These two components of a complex number can be represented in at least two different ways.

Two different representations of a two dimensional complex number:
  1. Real component and Imaginary component
  2. Magnitude and phase


Two ways of representing a complex number: magnitude/phase and real/imaginary

The bottom line here is that is seems to me that the Fourier transform should have twice as much data as the original image, since the Fourier transform takes regular real numbers, and generates complex numbers. So a regular 3-channel image should create a Fourier transform with 6 channels. So what exactly is in the Fourier transform generated by the GIMP plug-in?

Unfortunately the documentation for the plug-in is in French, and I have not studied French since the mid-1970s.

Understanding how the transform data are represented is especially important at this point for two reasons:
  1. The whole trick of using the Fourier transform to ignore the horizontal/vertical translation component requires that we use only the magnitude of the complex numbers (which does not depend upon the image translation), and ignore the phase component (which depends exquisitely upon the image translation).
  2. Where are the six channels of data that should be coming from the Fourier transform?
So we need to determine whether the complex Fourier transform is stored as real/imaginary components, or if it is stored as magnitude/phase components. More fundamentally, we need to know how six channels of information are being stored in the seemingly 3 or 4 channeled image data (transparency can provide an additional channel).

I did some experimentation and determined that the red channel of the Fourier transform corresponds to the red channel of the original image, etc. Excellent.

Further, the French documentation is surprisingly intelligible when filtered through AltaVista babelfish. I still don't quite understand all of the details, but it appears that the complex values are stored in pairs of subsequent pixels, representing the logarithm of the real component, followed by the logarithm of the imaginary component. This is bad news. I want the magnitude of the complex number, which is equal to the square root of the sum of the squares of the real and imaginary components (using Pythagoras' theorem). It will be hairy to extract that information. So I need to either a) find another Fourier transform image filter, b) write a GIMP plug-in that further processes these Fourier transform images, c) think of some other trick, or d) abandon this project.

By the way, if you read the English translation of the French documentation, there is a good explanation of why, near the end of the article, he compares his simulated image to a "moose". It turns out that the French word for "moose" is "orignal", while the French word for "original" is "original". The author made a typo, misspelling "original" to accidentally type another actual French word. Thus his spell-checker did not catch it. I believe he meant to say that the simulated image resembles the original image, not that it resembles a moose. Or not. Who knows?

I will cogitate some more on what to do next. More next time...

6 comments:

vegas said...

hey, I'd be interested to hear how this ended up for you.

Biospud said...

Hey vegas. What you see here in these posts is pretty much as far as I got. I hope to return to this project one of these days.

Daniel said...

TLDR, but no, the fourier transform will not contain 'twice as much information', it will contain exactly the same amount of information. The key is that the imaginary component of the input data is always zero (images are real only). Thus you can find a representation of the FT which as well is real only.

Biospud said...

"Thus you can find a representation of the FT which as well is real only."

You flatter me Daniel. I'm sure you will agree that the Fourier transform of a real-valued function has complex values. But I do not know how to find a "representation" of that Fourier transform that is real only. Unless you mean that the complex Fourier coefficients can be represented as either (phase, amplitude) or (r cos theta, r sin theta) or (real, imaginary). But that would be using twice the number of data values. Which is exactly what you are saying is unnecessary. Please explain further. I would like to know how to "represent" the Fourier transform of a real valued function using only real values.

Adriano Ferrari said...

Hi,
What Daniel means is that half of the transformed data is superfluous, because it must cancel out the other half in order to end up with a purely real image when you transform back.

Another way to think of this is that you are giving it the same amount of information, except the "imaginary channel" of the original image is all zeros.

Good luck!
Adriano

Anthony Thyssen said...

Quote: "The bottom line here is that is seems to me that the Fourier transform should have twice as much data as the original image"

Well in one sense as two images are produced for a FFT, their is twice as much data.

But in a practical sense that data is repeated twice! Only half (say the left half) of each magnitude and phase image is needed, as the other half is simply a repeat of the other (rotated 180 degrees) about the origin.

Note that origin itself only needs the magnitude component (DC value) as the Phase value for this component is always zero (it is the constant average color of the image).

So while twice as much data is generated, only half is actually needed, or used.

Anthony Thyssen