Sunday, November 22, 2009

YouTube supports 3D stereoscopic video

Google's video service YouTube now supports stereoscopic video. This is great news. I predict that soon it will be possible to stream stereoscopic YouTube videos to stereoscopic monitors.

The only technical information so far is one very long help thread. The Google engineer behind 3D YouTube, "YouTube Pete", participates in that thread.

I would like to take a moment to thank YouTube Pete for his beautiful work on the 3D YouTube project. Kudos to Pete. It is much appreciated.

My hummingbird video



I made a hummingbird video to test out the 3D features myself. The embedded video below does not show the 3D interface. You must go to the YouTube page itself to see the full range of possibilities. Grab a pair of red/blue 3D glasses if you have one.



Remember to check out the original movie to see all of the 3D viewing options.

This hummingbird movie could be improved in several ways
  1. The left side is out of focus. I meant to set the focus for both cameras to 15 cm, but it looks like the focus distance of the left camera was set too short.
  2. The sound doesn't seem to work. I plugged in a microphone, and selected the one audio option that was available in AmCap, but I don't hear any sound in the video. This needs to be investigated.
  3. I should register Stereoscopic Multiplexer, to avoid those watermarks on the video. It will cost about $90. Ouch.
  4. It would be good to get more light on the bird. Unfortunately, the sun won't shine on my patio until summer.
  5. The format is Left-Right (parallel), but the emerging YouTube standard is Right-Left (cross-eye), so I should use the Right-Left convention in the future. Plus I have an easier time free-viewing cross-eye, so it will be more convenient for me when viewing embedded videos like the one above. I used the YouTube tag "yt3d:swap=true" to correct for this inversion.
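
The Left-Right versus Right-Left distinction in item 5 comes down to which half of each side-by-side frame carries which eye's image. As a rough illustration only (this is not YouTube's implementation, and the helper name swap_halves is made up for this sketch), swapping the two halves of a frame could look like this in Python, treating a frame as a list of pixel rows with an even width:

```python
def swap_halves(frame):
    """Swap the left and right halves of a side-by-side stereo frame.

    `frame` is a list of rows; each row is a list of pixels. The width is
    assumed to be even, with one eye's image in each half.
    """
    swapped = []
    for row in frame:
        half = len(row) // 2
        # Right half first, then left half.
        swapped.append(row[half:] + row[:half])
    return swapped


# A tiny 2x4 "frame": L marks left-eye pixels, R marks right-eye pixels.
frame = [["L", "L", "R", "R"],
         ["L", "L", "R", "R"]]
print(swap_halves(frame))
```

Applying the swap twice gives back the original frame, which is why a single flag such as "yt3d:swap=true" is enough to describe the correction.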

Other YouTube 3D videos



The following are examples of other stereoscopic videos on YouTube, created by others:

This so-called biodiversity documentary contains professional-quality footage of domesticated ducks, geese, and honeybees in India. The narration is done with a top-quality computer generated voice. The voice is only slightly creepy.



This next one is taken with a helmet camera. It is interesting and entertaining. It includes some cityscape images. Unfortunately, a cityscape shows little depth when using a normal human interpupillary distance of 60 mm or so. Hyperstereo might have been nice here.



There are many other stereoscopic videos on YouTube. Search for "yt3d" on YouTube.

How I made the hummingbird video



I created my hummingbird video using two USB pen cameras, so that I could get the two lenses as close together as possible. This setup is suited for small, close subjects, such as hummingbirds. Because the two cameras are only 14 mm apart, as opposed to the 60 mm separation of human eyes, my setup yields a view as seen by another hummingbird, rather than what would be seen by a person. This is called hypostereo.

Two USB pen cameras and a portable netbook style computer are the basis of my stereoscopic video system. I created a custom bracket for the cameras so I can mount them on a tripod. The bracket is carefully shaped to compensate for the idiosyncrasies of these particular cameras. These very cheap cameras do not point in exactly the same direction.



The narrow 14 mm distance between the camera lenses is crucial to producing a subtle 3D effect with small close subjects such as the hummingbird. I chose these pen cameras because this form factor permits the smallest camera separation I could find.
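
To put rough numbers on why the narrow separation matters, here is a small geometric sketch. It computes the angle that the two lenses (or two eyes) subtend at the subject. The 150 mm subject distance is an assumption, borrowed from the 15 cm focus setting mentioned earlier, and the function name is made up for this example.

```python
import math


def convergence_angle_deg(baseline_mm, distance_mm):
    """Angle subtended at the subject by the two lenses (or eyes), in degrees."""
    return math.degrees(2.0 * math.atan((baseline_mm / 2.0) / distance_mm))


subject_distance_mm = 150.0  # assumed: the 15 cm focus distance

print(convergence_angle_deg(14.0, subject_distance_mm))  # pen cameras
print(convergence_angle_deg(60.0, subject_distance_mm))  # human eye spacing
```

At that distance the pen cameras converge by a little over 5 degrees, while a 60 mm separation would give more than 22 degrees, which is far more than is comfortable to view.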



Here is a shot of the whole setup prepared to take hummingbird videos.


Saturday, August 22, 2009

Tk 8.5 is better than wxWidgets on Windows

UPDATE: It appears this issue might be fixed in a future release of wxWidgets.

I frequently write computer programs with graphical user interfaces ("GUI"s). I insist that the interfaces look good on Windows, Mac, and Linux computers. By "good", I mean that the widgets (the buttons, sliders, and what-not), look exactly like those found on most other applications developed specifically for that particular platform. For example, buttons and progress bars on Mac must have that clear blue "Aqua" look.

There are several programming tool kits which help to create native-looking user interfaces on multiple platforms. The three platforms I pay particular attention to are Windows, Mac, and Linux. Cross-platform GUI tool kits include wxWidgets, Tk, and Java Swing. This post documents the failure of wxWidgets and Java Swing to respect Windows font sizes.

Look at the following picture to see the failure of wx and Java to respect the Windows font sizes. From left to right, the test programs are in Visual Basic, python/Tk, python/wx, and Java Swing.



wxWidgets looks nice in some cases, but it has some ways to go to support native look and feel on Windows. I am working on several Windows XP systems, on which I routinely select "Large Fonts" in my desktop preferences. wxWidgets does not respect those preferences.

To see the difference, first set extra large fonts on your desktop:

Right-click desktop -> Properties -> Appearance -> Font Size -> Extra Large Fonts

Next, write an application using wxWidgets and check whether it respects your font choice. In my tests, it does not.

If it's any consolation, Java doesn't respect the Windows font size either.

If you want to use a cross-platform widget tool kit, and your definition of "cross-platform" includes Windows, my recommendation is to use Tk 8.5.

The table below summarizes the results for the four test programs I wrote:


















GUI tool kits on Windows

Tool Kit       Native look-and-feel?   Respects font size?
Visual Basic   No(!)                   Yes
Tk 8.5         Yes                     Yes
wx 2.8.10      Yes                     No
Java 1.6.0     Yes                     No


Below are the test programs I wrote to create the windows shown at the beginning of this post.


  • Visual Basic

    ' "Hello, World!" program in Visual Basic.
    Module Hello
    Sub Main()
    MsgBox("Hello, World! (VB)") ' Display message on computer screen.
    End Sub
    End Module


  • Tk 8.5 (tkinter in python 3.1)

    # Note - requires python 3.1 for ttk 8.5 support
    import tkinter as tk
    import tkinter.ttk as ttk

    root = tk.Tk()
    padding = 10
    # pack() returns None, so create the frame first and pack it separately
    panel = ttk.Frame(root, padding=padding)
    panel.pack()
    label = ttk.Label(panel, text="Hello, World! (Tk)")
    label.pack(padx=padding, pady=padding)
    button = ttk.Button(panel, text="Hello", default="active")
    button.pack(padx=padding, pady=padding)
    root.mainloop()


  • wx 2.8.10 (in python 2.6 with wxpython)

    import wx

    padding = 10
    app = wx.App(0)
    frame = wx.Frame(None, -1, "Hello")
    panel = wx.Panel(frame)
    sizer = wx.BoxSizer(wx.VERTICAL)
    panel.SetSizer(sizer)
    text = wx.StaticText(panel, -1, "Hello, World! (wx)")
    sizer.Add(text, 0, wx.ALL, padding)
    button = wx.Button(panel, -1, "Hello")
    sizer.Add(button, 0, wx.ALL, padding)
    frame.Centre()
    frame.Show(True)
    app.MainLoop()


  • Java swing 1.6.0

    import javax.swing.*;
    import java.awt.Dimension;

    public class HelloWorldFrame extends JFrame
    {
        public static void main(String args[])
        {
            new HelloWorldFrame();
        }

        HelloWorldFrame()
        {
            try {
                UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
            } catch(Exception e) {}
            JPanel panel = new JPanel();
            add(panel);
            panel.setLayout(new BoxLayout(panel, BoxLayout.PAGE_AXIS));
            panel.setBorder(BorderFactory.createEmptyBorder(10,10,10,10));
            JLabel label = new JLabel("Hello, World! (java)");
            panel.add(label);
            panel.add(Box.createRigidArea(new Dimension(0, 10)));
            JButton button = new JButton("Hello");
            panel.add(button);
            pack();
            setVisible(true);
        }
    }


The wx bug tracker has had a couple of bug reports for this problem, one open for five years. Somehow I doubt they are itching to fix this problem.

The Tk code that picks up the Windows font settings correctly appears to be near line 418 of file win/tkWinFont.c:


if (SystemParametersInfo(SPI_GETNONCLIENTMETRICS,
        sizeof(ncMetrics), &ncMetrics, 0)) {
    CreateNamedSystemLogFont(interp, tkwin, "TkDefaultFont",
            &ncMetrics.lfMessageFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkHeadingFont",
            &ncMetrics.lfMessageFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkTextFont",
            &ncMetrics.lfMessageFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkMenuFont",
            &ncMetrics.lfMenuFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkTooltipFont",
            &ncMetrics.lfStatusFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkCaptionFont",
            &ncMetrics.lfCaptionFont);
    CreateNamedSystemLogFont(interp, tkwin, "TkSmallCaptionFont",
            &ncMetrics.lfSmCaptionFont);
}


The wx source code has similar code in a few locations. But it appears that this technique may be only used for menu fonts and message dialog fonts.

The main problem might be that the method wxGetCCDefaultFont() in the wx source code uses SPI_GETICONTITLELOGFONT instead of SPI_GETNONCLIENTMETRICS.

Microsoft has documentation for the NONCLIENTMETRICS data structure.
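
For anyone who wants to see what Windows reports through that structure, the query can be sketched from Python with ctypes. This is a minimal sketch under stated assumptions: it targets Python 3, it uses the pre-Vista structure layout (no iPaddedBorderWidth field) to match Windows XP, and message_font() and face_name() are names invented for this example. On anything other than Windows it simply returns None.

```python
import ctypes
import sys

SPI_GETNONCLIENTMETRICS = 0x0029  # constant from the Win32 headers


class LOGFONTW(ctypes.Structure):
    _fields_ = [
        ("lfHeight", ctypes.c_int32),
        ("lfWidth", ctypes.c_int32),
        ("lfEscapement", ctypes.c_int32),
        ("lfOrientation", ctypes.c_int32),
        ("lfWeight", ctypes.c_int32),
        ("lfItalic", ctypes.c_uint8),
        ("lfUnderline", ctypes.c_uint8),
        ("lfStrikeOut", ctypes.c_uint8),
        ("lfCharSet", ctypes.c_uint8),
        ("lfOutPrecision", ctypes.c_uint8),
        ("lfClipPrecision", ctypes.c_uint8),
        ("lfQuality", ctypes.c_uint8),
        ("lfPitchAndFamily", ctypes.c_uint8),
        ("lfFaceName", ctypes.c_uint16 * 32),  # 32 UTF-16 code units
    ]


class NONCLIENTMETRICSW(ctypes.Structure):
    # Assumed layout: the pre-Vista version of the structure.
    _fields_ = [
        ("cbSize", ctypes.c_uint32),
        ("iBorderWidth", ctypes.c_int32),
        ("iScrollWidth", ctypes.c_int32),
        ("iScrollHeight", ctypes.c_int32),
        ("iCaptionWidth", ctypes.c_int32),
        ("iCaptionHeight", ctypes.c_int32),
        ("lfCaptionFont", LOGFONTW),
        ("iSmCaptionWidth", ctypes.c_int32),
        ("iSmCaptionHeight", ctypes.c_int32),
        ("lfSmCaptionFont", LOGFONTW),
        ("iMenuWidth", ctypes.c_int32),
        ("iMenuHeight", ctypes.c_int32),
        ("lfMenuFont", LOGFONTW),
        ("lfStatusFont", LOGFONTW),
        ("lfMessageFont", LOGFONTW),
    ]


def face_name(logfont):
    """Decode the NUL-terminated face name (good enough for ordinary names)."""
    units = []
    for unit in logfont.lfFaceName:
        if unit == 0:
            break
        units.append(unit)
    return "".join(chr(u) for u in units)


def message_font():
    """Return (face name, lfHeight) of the system message font, or None."""
    if sys.platform != "win32":
        return None
    metrics = NONCLIENTMETRICSW()
    metrics.cbSize = ctypes.sizeof(NONCLIENTMETRICSW)
    ok = ctypes.windll.user32.SystemParametersInfoW(
        SPI_GETNONCLIENTMETRICS, metrics.cbSize, ctypes.byref(metrics), 0)
    if not ok:
        return None
    return face_name(metrics.lfMessageFont), metrics.lfMessageFont.lfHeight


print(message_font())
```

The lfHeight value is in logical units rather than points, and a negative value means it describes the character height. Running this before and after changing the desktop font size should show whether the message font changes, which is the value Tk picks up in the snippet above.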

Even if the wx authors fix this today, I fear it will be a long time before the change trickles down into a wxPython release.

Saturday, August 15, 2009

Write your own stereoscopic 3D program using nVidia's "consumer" stereo driver


I have always been a fan of nVidia graphics boards because of their support for 3D stereoscopic games. But the "consumer level" (non-Quadro) stereoscopic drivers only seem to work with games. I have always wondered how to create my own applications that can use the stereoscopic drivers on less-expensive gaming video boards. Now I have found a way.

The "consumer" stereoscopic driver from nVidia only works with "full screen" games. When I started experimenting with OpenGL, I assumed that using the call "glutFullScreen()" might be enough to get the stereoscopic drivers to kick in. But it is not.

The trick is to use the glutEnterGameMode() call. I did a lot of searching on the internet, and nowhere is it mentioned that you must call glutEnterGameMode() to get the nVidia "consumer level" stereoscopic drivers to work. That is why I am sharing this blog post.

My working system is on Windows XP. I am uncertain if this approach will work with Windows Vista/7. I am a bit concerned because nVidia seems to be selling a hardware stereoscopic product these days. I am worried that my custom stereoscopic theater, which uses a pair of polarized video projectors, won't work if I upgrade my Windows version.

Here is how you can do it too, on Windows XP:
  1. Ensure you have a supported nVidia graphics board in your computer. See the stereoscopic driver users' guide for more details.
  2. Get the stereoscopic driver from nVidia. The most recent version (91.31) released for Windows XP is from 2006. That is the one I am using. Consult this driver guide for more details.
  3. Install Python 2.6 and PyOpenGL version 3.0.0, so you can conveniently create OpenGL programs in python.
  4. Familiarize yourself with OpenGL programming. I got started by following the examples of the "red book", the OpenGL Programming Guide.
  5. Study my example program, below, to learn how to call glutGameModeString() and glutEnterGameMode().

Below is the text of a complete working Python program that works with the nVidia "consumer level" stereoscopic driver on my Windows XP computer. (The stereoscopic presentation only appears in the full screen gaming mode):

Modify the display() method and the animate() method to show whatever you want!

#!/cygdrive/c/Python26/python

from OpenGL.GL import *
from OpenGL.GLU import *
from OpenGL.GLUT import *
import sys


def do_nothing(*args):
    """
    Empty method for glutDisplayFunc during risky transition to game mode.
    """
    pass


class HelloOpenGL(object):
    """
    Creates a rotating wire frame cube using OpenGL.

    Pressing the "f" key toggles full screen game mode.
    This full screen mode works with nVidia stereoscopic
    driver for Windows XP.
    """
    def __init__(self):
        self.animation_interval = 100 # milliseconds
        self.rotation_angle = 0.0 # degrees, starting point
        glutInit("Cube.py")
        glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH)
        glEnable(GL_DEPTH_TEST)
        glutInitWindowSize(200, 200)
        # Remember window id for when we return from game mode.
        self.window_id = glutCreateWindow('Wire Cube')
        self.initialize_gl_context()
        # glutTimerFunc remains when GL context is replaced,
        # so it does not go into self.initialize_gl_context()
        glutTimerFunc(self.animation_interval, self.animate, 1)
        glutMainLoop() # never returns

    def clear_gl_callbacks(self):
        """
        Set innocuous callbacks during times when no valid context may be available.
        """
        glutDisplayFunc(do_nothing)
        glutMotionFunc(None)
        glutKeyboardFunc(None)

    def initialize_gl_context(self):
        """
        When switching between full-screen and windowed modes,
        initialize_gl_context() reinitializes state.
        """
        glClearColor(0.5,0.5,0.5,0.0)
        glutDisplayFunc(self.display)
        # glutPassiveMotionFunc(self.mouse_motion)
        glutMotionFunc(self.mouse_motion)
        glutKeyboardFunc(self.keypress)
        # establish the projection matrix (perspective)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        x,y,width,height = glGetDoublev(GL_VIEWPORT)
        gluPerspective(
            45, # field of view in degrees
            width/float(height or 1), # aspect ratio
            .25, # near clipping plane
            200, # far clipping plane
        )

    def start_game_mode(self):
        if glutGameModeGet(GLUT_GAME_MODE_ACTIVE):
            return # already in game mode
        glutGameModeString("800x600:16@60")
        if glutGameModeGet(GLUT_GAME_MODE_POSSIBLE):
            self.clear_gl_callbacks()
            glutEnterGameMode()
            self.initialize_gl_context()

    def start_windowed_mode(self):
        if glutGameModeGet(GLUT_GAME_MODE_ACTIVE):
            self.clear_gl_callbacks()
            glutLeaveGameMode()
            # Remember the window we created at start up?
            glutSetWindow(self.window_id)
            self.initialize_gl_context()

    def display(self):
        """
        "display()" method is called every time OpenGL updates the display.
        """
        # Erase the old image
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        # Modelview must be set before geometry is sent
        # or else crash when entering stereoscopic mode.
        glMatrixMode(GL_MODELVIEW)
        glLoadIdentity()
        gluLookAt(
            0,-0.5,5, # eyepoint
            0,0,0, # center-of-view
            0,1,0, # up-vector
        )
        # Rotate about the origin as animation progresses
        glRotate(self.rotation_angle, 0, 1, 0)
        glPushMatrix()
        try:
            # Draw the cube
            glutWireCube(2.0)
        finally:
            glPopMatrix()
        glutSwapBuffers()

    def mouse_motion(self, x, y):
        pass

    def keypress(self, key, x, y):
        if key == '\033':
            # Escape key leaves full screen mode
            if glutGameModeGet(GLUT_GAME_MODE_ACTIVE):
                self.start_windowed_mode()
        elif key == "f":
            # "f" key toggles between full screen and windowed mode.
            if glutGameModeGet(GLUT_GAME_MODE_ACTIVE):
                self.start_windowed_mode()
            else:
                self.start_game_mode()

    def animate(self, value):
        """
        Periodically change the rotation angle for the cube animation.

        This animate() method is called as a glutTimerFunc().
        """
        self.rotation_angle += 1.0
        while self.rotation_angle > 360.0:
            self.rotation_angle -= 360.0
        glutPostRedisplay()
        # Be sure to come back for more
        glutTimerFunc(self.animation_interval, self.animate, value+1)


# Run the HelloOpenGL application when this script is run directly.
if (__name__ == '__main__'):
    HelloOpenGL()
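
The string passed to glutGameModeString() in start_game_mode() packs the resolution, colour depth, and refresh rate into a single "WIDTHxHEIGHT:BPP@HZ" value. If you want to try other display modes, a small helper along these lines may be convenient. Note that game_mode_string() is a made-up name for this sketch, not part of GLUT or PyOpenGL.

```python
def game_mode_string(width, height, bits_per_pixel=None, refresh_hz=None):
    """Build a string in the "WxH:bpp@hz" form used by glutGameModeString()."""
    text = "%dx%d" % (width, height)
    if bits_per_pixel is not None:
        text += ":%d" % bits_per_pixel
    if refresh_hz is not None:
        text += "@%d" % refresh_hz
    return text


print(game_mode_string(800, 600, 16, 60))
```

Whether a given mode is actually available still has to be checked with glutGameModeGet(GLUT_GAME_MODE_POSSIBLE), as the program above does.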

Friday, July 13, 2007

Phantom cell phone vibrations

I have had a cell phone (Treo 680) for about 8 months. I keep it in my front right pants pocket. I always have it set to "vibrate". Lately, my leg has begun to vibrate right where the phone is, causing me to think that the phone is ringing.

It's really creepy.

One time I moved the cell phone away from my leg, but I could still feel the vibration in my leg. I could feel my leg actually vibrating with my hand. I couldn't get it to stop. My leg kept on "ringing" occasionally for quite some time. I have started keeping the phone in a different pocket.

Judging by the number of "I thought it was just me..." responses in a forum I found online, this is a surprisingly common phenomenon. According to this article, it happens when you are expecting a call. I don't get very many calls, so it does not take much!

Yikes!

Who's calling? Is it your leg or your cell phone? — JSCMS
Good vibrations? Bad? None at all? - USATODAY.com
Digg - Have You Noticed the Cell Phone "Phantom Vibration Syndrome"?

Monday, July 09, 2007

Using rxvt in cygwin

I don't like the default cygwin bash window.

To get a nice terminal in cygwin, I have been typing "rxvt" from the cygwin bash shell for years. After several previous abortive attempts, I have finally succeeded in creating a clickable icon that directly launches a nice rxvt (xterm-like) terminal window under Windows.

The solution I found is described at http://freemode.net/archives/000121.html.


I made a few modifications, because I like a larger font, and the batch file did not work for me without modification.

My ~/.Xdefaults file looks like this now:

! ~/.Xdefaults - X default resource settings
Rxvt*geometry: 120x40
Rxvt*background: #000020
Rxvt*foreground: #ffffbf
!Rxvt*borderColor: Blue
!Rxvt*scrollColor: Blue
!Rxvt*troughColor: Gray
Rxvt*scrollBar: True
Rxvt*scrollBar_right: True
! Rxvt*font: Lucida Console-12
Rxvt*font: fixedsys
Rxvt*SaveLines: 10000
Rxvt*loginShell: True
! VIM-like colors
Rxvt*color0: #000000
Rxvt*color1: #FFFFFF
Rxvt*color2: #00A800
Rxvt*color3: #FFFF00
Rxvt*color4: #0000A8
Rxvt*color5: #A800A8
Rxvt*color6: #00A8A8
Rxvt*color7: #D8D8D8
Rxvt*color8: #000000
Rxvt*color9: #FFFFFF
Rxvt*color10: #00A800
Rxvt*color11: #FFFF00
Rxvt*color12: #0000A8
Rxvt*color13: #A800A8
Rxvt*color14: #00A8A8
Rxvt*color15: #D8D8D8
! eof


My replacement for the default "cygwin.bat", which I call "cygwin-rxvt.bat", is as follows:

@echo off
C:
chdir C:\cygwin\bin
set EDITOR=vi
set VISUAL=vi
set CYGWIN=codepage:oem tty binmode title
set HOME=\cygwin\home\spud
rxvt -e /bin/tcsh -l


You will notice that I use tcsh rather than bash. Yes, yes, I know that hard-core UNIX geeks disdain tcsh and only use bash. Shut up. I don't care about you.

Finally, I use a simple prompt with tcsh, which has the side effect of setting the title bar for xterm-like terminals (including rxvt). I add the following line to my .tcshrc file:

set prompt="%{\033]0;%~%L\007%}\[%h\]> "
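
The part of that prompt between \033]0; and \007 is the xterm-style control sequence for setting the window title. To show that nothing about it is specific to tcsh, here is a minimal Python sketch that builds the same sequence. The function name title_sequence and the example title are invented for this illustration.

```python
import sys


def title_sequence(title):
    """Return the xterm-style escape sequence that sets the window title."""
    return "\033]0;" + title + "\007"


if sys.stdout.isatty():
    # Only emit the control sequence when writing to a real terminal.
    sys.stdout.write(title_sequence("hello from rxvt"))
    sys.stdout.flush()
```

Run from inside rxvt, this should change the title bar text until the next prompt is drawn, at which point the tcsh prompt sets the title again.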

Sunday, April 22, 2007

Extracting the magnitude component of an image Fourier transform

New result!

I finally succeeded in extracting the magnitude component of the image Fourier transform (shown at right).


Recapping the story so far

I previously created a picture of a bird, and a slightly translated version of the same image. I intend to use these images to test ideas about using the Fourier transform to automatically align pairs of images to create aligned stereoscopic pairs.

The input images, shown in the previous post, are summarized below:



Original image


Translated version of the original image, for testing my hypothesis.


Fourier transform of original, masked image.


Fourier transform of translated, masked image


I took the plunge and learned to write a filter using the pbmplus environment (see previous post). Here is the program as I wrote and used it for this post:


The new PGM filter I made

I understand that it is tedious to mix GIMP and PBM tools in an image processing pipeline. Perhaps I will port the FFT image processing to PBM later...

What follows next is C language source code I just now wrote for a new image filter in the PBMPlus or NetPBM image processing tool kit:

/* pgm_fourier_recast.c - read a portable graymap produced by the
** GIMP Fourier plug-in, and extract magnitude and phase components
**
** Copyright (C) 2007 by biospud@blogger.com
**
** Permission to use, copy, modify, and distribute this software and its
** documentation for any purpose and without fee is hereby granted, provided
** that the above copyright notice appear in all copies and that both that
** copyright notice and this permission notice appear in supporting
** documentation. This software is provided "as is" without express or
** implied warranty.
*/


/*
** 1) Place source file pgm_fourier_recast.c in directory with working build of netpbm/editor
** 2) Add "pgm_fourier_recast" to list of files in Makefile
** 3) "make pgm_fourier_recast" from netpbm/editor directory
*/


#include <stdio.h>
#include <stdlib.h> /* abs(), exit() */
#include <math.h>
#include "pgm.h"

typedef struct pgm_image_struct {
int height;
int width;
gray maximumValue;
gray** data;
} PgmImage;

PgmImage getInputImage( int argc, char *argv[] );
PgmImage convertFourierToPhaseMagnitude(PgmImage inputImage);
void writeImageAndQuit(PgmImage outputImage);
double gimpFourierPixelToDouble(PgmImage image, int x, int y);
double getNormalizationFactor(PgmImage image, int x, int y);
gray doubleToGimpFourierPixel(double value, PgmImage image, int x, int y);

int main( int argc, char *argv[] )
{
PgmImage inputImage;
PgmImage outputImage;

inputImage = getInputImage(argc, argv);
outputImage = convertFourierToPhaseMagnitude(inputImage);
writeImageAndQuit(outputImage);
}

PgmImage getInputImage( int argc, char *argv[] ) {
const char* const usage = "[pgmfile]";
int argn;
FILE* inputFile;

PgmImage answer;

pgm_init( &argc, argv );

argn = 1;

if ( argn < argc ) {
inputFile = pm_openr( argv[argn] );
++argn;
} else {
inputFile = stdin;
}

if ( argn != argc )
pm_usage( usage );

answer.data = pgm_readpgm(
inputFile,
&answer.width,
&answer.height,
&answer.maximumValue
);

pm_close( inputFile );

return answer;
}

double gimpFourierPixelToDouble(PgmImage image, int x, int y) {
/*
** based on source code at
** http://people.via.ecp.fr/~remi/soft/gimp/gimp_plugin_en.php3
*/


gray pixel = image.data[x][y];

/*
** renormalize
** from (range 0 -> 255)
** to range (-128 -> +127),
*/

double d128 = (double)(pixel) - 128.0; /* double128() */

double bounded = (d128 / 128.0); /* unboost() */
double unboosted0 = 160 * (bounded * bounded); /* unboost() */
double unboosted = d128 > 0 ? unboosted0 : -unboosted0; /* unboost() */

double answer = unboosted / getNormalizationFactor(image, x, y);

return answer;
}

/* Normalization factor that corrects scale of Fourier transform
** pixel based upon distance from origin
*/

double getNormalizationFactor(PgmImage image, int x, int y) {
/*
** based on source code at
** http://people.via.ecp.fr/~remi/soft/gimp/gimp_plugin_en.php3
*/

double cx = (double)abs(x - (image.width + 1)/2 + 1);
double cy = (double)abs(y - (image.height + 1)/2 + 1);
double energy = (sqrt(cx) + sqrt(cy));
return energy*energy;
}

gray doubleToGimpFourierPixel(double value, PgmImage image, int x, int y) {

double normalized = value * getNormalizationFactor(image, x, y);
double bounded = fabs( normalized / 160.0 );
double boosted0 = 128.0 * sqrt (bounded);
double boosted = (value > 0) ? boosted0 : -boosted0;

/*
** renormalize
** from range (-128 -> +127),
** to (range 0 -> 255)
*/

int answer = (int)boosted + 128;
if (answer >= 255) return 255;
if (answer <= 0) return 0;
return answer;
}

PgmImage convertFourierToPhaseMagnitude(PgmImage inputImage) {
PgmImage answer;
int outRows = inputImage.height;
int outCols = inputImage.width;
int row, col;

double realDouble, imaginaryDouble;
double magnitudeDouble, phaseDouble;
gray realPixel, imaginaryPixel;
gray magnitudePixel, phasePixel;

int doUsePhase = 0;

answer.height = outRows;
answer.width = outCols;
answer.maximumValue = inputImage.maximumValue;
answer.data = pgm_allocarray( outCols, outRows );

for ( row = 0; row < outRows; ++row ) {
for ( col = 0; col < outCols; col += 2) {
/* get pixel values from image */
realPixel = inputImage.data[row][col];
imaginaryPixel = inputImage.data[row][col + 1];

/* convert to doubles */
realDouble = gimpFourierPixelToDouble(inputImage, row, col);
imaginaryDouble = gimpFourierPixelToDouble(inputImage, row, col + 1); /* imaginary part is in the next column */

/* convert real/imaginary to magnitude/phase */
magnitudeDouble = sqrt(
realDouble * realDouble +
imaginaryDouble * imaginaryDouble
);

/* convert to pixel values */
magnitudePixel = doubleToGimpFourierPixel(
magnitudeDouble,
inputImage, row, col
);

if (doUsePhase) {
phaseDouble = atan2(imaginaryDouble, realDouble);

phasePixel = (int)(256.0 * phaseDouble / (2.0 * 3.14159));
while (phasePixel > 255) phasePixel -= 256;
while (phasePixel < 0) phasePixel += 256;
}

/*
i1 = inputImage.data[row][col];
v = gimpFourierPixelToDouble(inputImage, row, col);
i2 = doubleToGimpFourierPixel(v, inputImage, row, col);
printf("%.3g\t%.3g\t%.3g\t%.3g\n",
realDouble, imaginaryDouble, magnitudeDouble, phaseDouble);
*/


answer.data[row][col] = magnitudePixel;

if (doUsePhase)
answer.data[row][col + 1] = phasePixel;
else
answer.data[row][col + 1] = magnitudePixel;

}
}

return answer;
}

void writeImageAndQuit(PgmImage outputImage) {
/* Write resulting image */
pgm_writepgm(
stdout,
outputImage.data,
outputImage.width,
outputImage.height,
outputImage.maximumValue,
0
);

/* and clean up */
pm_close( stdout );
pgm_freearray(
outputImage.data,
outputImage.height
);

exit( 0 );
}
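
The heart of the filter is the conversion from a (real, imaginary) pair to a (magnitude, phase) pair. Stripped of the PGM handling and the GIMP plug-in scaling, that step can be sketched in a few lines of Python. The function names here are invented for the sketch, and the phase-to-byte mapping mirrors the idea in the C code rather than any format defined by the GIMP plug-in.

```python
import math


def to_magnitude_phase(real, imaginary):
    """Convert one complex Fourier coefficient to (magnitude, phase in radians)."""
    return math.hypot(real, imaginary), math.atan2(imaginary, real)


def phase_to_byte(phase):
    """Map a phase in [-pi, pi] onto 0..255, wrapping negative angles around."""
    return int(256.0 * phase / (2.0 * math.pi)) % 256


magnitude, phase = to_magnitude_phase(3.0, 4.0)
print(magnitude, phase_to_byte(phase))
```

Only the magnitude is used for the images in this post; the phase is where the translation information ends up.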


Original vs. translated images in Fourier magnitude space:

Phew! After writing this filter, I created the following "magnitude only" versions of the test images:


Original: Magnitude component of Fourier transform of original image


Translated: Magnitude component of Fourier transform of translated image

A superficial look suggests that the magnitude component is in fact very similar between the two images. But for automation, I need a quantitative measure to decide how similar two images are. More next time...
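
One candidate for such a measure, offered only as a sketch and not necessarily the measure that later posts settle on, is the Pearson correlation between the pixel values of two equal-sized gray images. It is 1.0 for identical images and falls toward 0 as the images become unrelated. The images are represented as plain lists of rows here, which is an assumption for the sake of a self-contained example.

```python
import math


def correlation(image_a, image_b):
    """Pearson correlation between two equal-sized gray images (lists of rows)."""
    a = [float(p) for row in image_a for p in row]
    b = [float(p) for row in image_b for p in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if spread_a == 0.0 or spread_b == 0.0:
        return 0.0  # a flat image has no pattern to correlate with
    return covariance / (spread_a * spread_b)


image = [[0, 10], [20, 30]]
print(correlation(image, image))
```

Comparing an image with itself serves as the control case: the result should be 1.0, give or take floating-point rounding.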

Thursday, April 19, 2007

Testing my Fourier transform hypothesis

In the past few posts I have repeatedly assumed that the magnitude component of the Fourier transform of an image will be relatively unchanged when the original image is translated vertically and/or horizontally. My next task should be to either prove or disprove this hypothesis before going much further.

Let's start with two gray-scale images that differ only in horizontal alignment for testing. If my intuition is correct, the magnitude portion of the Fourier transform should differ only slightly between the two images.
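
The intuition can be checked on a toy example before touching real images. For a discrete Fourier transform of length N, shifting the input by d samples with wrap-around multiplies coefficient k by exp(-2*pi*i*k*d/N), which has magnitude 1, so the magnitudes are unchanged. The sketch below uses a naive DFT on one made-up row of gray values. It is an illustration in one dimension, not the GIMP plug-in's transform.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitudes(signal):
    """Magnitude of each Fourier coefficient."""
    return [abs(c) for c in dft(signal)]


row = [0, 0, 3, 7, 9, 4, 1, 0]   # one made-up row of gray values
shifted = row[-2:] + row[:-2]     # same row, moved right by 2 with wrap-around

for m_orig, m_shift in zip(magnitudes(row), magnitudes(shifted)):
    print(round(m_orig, 6), round(m_shift, 6))
```

The two columns match. The match is exact only because the shift wraps around; a translated photograph gains new content at one edge and loses some at the other, which is why the magnitudes of real images are expected to differ slightly rather than not at all.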

I downloaded and installed NetPBM, to facilitate command line processing of images. I suspect that it will be easier for me to write new pbm filters than to write GIMP plug-ins.

One infuriating thing about NetPBM is that one of the maintainers has destroyed many of the original man pages in an effort to "simplify" the distribution. I genuinely appreciate this dude taking on the responsibility to maintain the code, but this one horrible documentation decision has caused me to curse out loud many times in the past several years. My feelings are neatly summed up by the observations of another user on the netbsd packaging discussion list:


"...I want the manual as released with the code I'm using, no changes after the fact. Release your manuals, don't blog them. it is *IMPOSSIBLE* for me to get that manual, no matter how many hoops I jump through, because you cannot (as they suggest) 'wget' an old version of the manual, one which still has manual pages instead of links to other non-Netpbm projects featured on the top page, one which has actual documentation for pnmscale rather than a three-page rant about why I should switch to Netpam..."


Hear hear.

In any case, here is a visual overview of the experiment set-up:



Original image

One thing I will need is a method to compare how similar two images are. As a control, I will be comparing the original image to itself.



Translated version of the original image, for testing my hypothesis.

If I am right about the Fourier transform, the magnitudes of the Fourier transform will be almost the same between the original image and the translated one. This will simulate the comparison of stereo pairs that do not perfectly line up.




Gray version of the translated image

To simplify the analysis, I created a gray-scale version of the images, so the issue of the color channels does not complicate the analysis.




The mask I used to "remove" the edges of the images

Recall from my earlier posting that the blurry circle mask is used to reduce edge artifacts in the Fourier transform.
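
The reason a mask helps is that the Fourier transform treats the image as if it repeated endlessly, so a hard image border acts like a sharp edge and shows up as streaks in the transform. Fading the image out toward the border removes that edge. The exact falloff of the mask pictured here is not specified, so the raised-cosine falloff and the inner_fraction parameter below are assumptions chosen for this sketch.

```python
import math


def soft_circle_mask(size, inner_fraction=0.6):
    """Return a size x size grid of weights: 1.0 in the centre, 0.0 at the rim.

    Weights stay at 1.0 out to `inner_fraction` of the radius, then fall off
    smoothly (raised cosine) to 0.0 at the full radius and beyond.
    """
    centre = (size - 1) / 2.0
    radius = size / 2.0
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            r = math.hypot(x - centre, y - centre) / radius
            if r <= inner_fraction:
                weight = 1.0
            elif r >= 1.0:
                weight = 0.0
            else:
                t = (r - inner_fraction) / (1.0 - inner_fraction)
                weight = 0.5 * (1.0 + math.cos(math.pi * t))
            row.append(weight)
        mask.append(row)
    return mask


mask = soft_circle_mask(9)
print(["%.2f" % w for w in mask[4]])  # weights along the middle row
```

Multiplying each pixel by the corresponding weight gives the masked image.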




Apply circle mask to untranslated image


Masked version of translated image

Finally, create the two Fourier transforms, one for the untranslated image and one for the translated image:


Fourier transform of original, masked image.


Fourier transform of translated, masked image

Next I need to extract the magnitudes of the Fourier transforms and compute the similarities between the images. I have some ideas of how to do this, but it will require more work. I expect that the PBM tools will come in handy here. More next time...