I did this talk at the Big Data SG meetup in Feb 2012. It was a good session.
I did this interview with my O’Reilly editor, Andy Oram, as a promotional video for my book ‘Exploring Everyday Things with R and Ruby’.
I did this talk in Pune, India in Mar 2012.
3D has always fascinated me. When I was young my brother and I had a ViewMaster and a Pan-Pet Panorama Stereo Viewer, both of which totally bowled us over when we first saw them. My brother, as usual, took them apart and fixed them up again multiple times, while I simply spent hours goggling at them. I have no idea where they are now, but thinking back, they were my first encounter with stereoscopy.
A bit of history
It’s probably surprising to most people (at least it was to me) that the modern techniques of 3D imaging and stereoscopy date back to before photography even existed. In fact, the first few stereoscopic images were drawings. The picture below shows one of the earliest stereoscopic drawings, by Jacopo Chimenti, a painter from Florence, Italy.
In 1838, Charles Wheatstone, a British inventor, published a paper that provided the scientific basis for stereography. He showed that the brain unifies the slightly different two-dimensional images from each eye into a single object of three dimensions. Wheatstone’s early stereographs were also drawings rather than photographs.
Photographic stereographs were first produced in 1849 by the Scottish physicist David Brewster, who improved the stereoscope and built the first true stereo camera with two lenses.
3D/stereographic imaging techniques
The principles of stereoscopy are quite simple. We see things in 3 dimensions (i.e. we perceive depth) because each of our 2 eyes actually sees a slightly different image. This is because our eyes are positioned apart from each other, which generates what is called binocular disparity. Recreating this effect with 2-dimensional images then allows us to ‘see’ the image in 3D.
There are a number of ways to do this, but generally the principle revolves around creating a set of 2 images, one for each eye, and ‘forcing’ the left eye to view the left image and the right eye to view the right image.
The cross-eyed method places the left image on the right side and the right image on the left side. To view the image in stereo, force your eyes to go cross-eyed, which will produce 3 images, then slowly ease the eyes until the middle image resolves in 3D. This is not as silly as it sounds, and it actually works, though it can be a strain on the eyes (obviously).
The wiggle method surprises a lot of people (including me when I first read about it) but it can sometimes be pretty effective. Basically you take the 2 images and create an animated GIF that alternates between them.
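Since the rest of this post is in Ruby anyway, here’s a minimal sketch of how you could build such a wiggle GIF with RMagick. The filenames left.jpg, right.jpg and wiggle.gif are just placeholders.

require 'rubygems'
require 'rmagick'
include Magick

# Load the left and right images as the two frames of the animation
wiggle = ImageList.new('left.jpg', 'right.jpg')

# Show each frame for 20/100ths of a second before flipping to the other
wiggle.delay = 20

# Write the pair out as an animated GIF
wiggle.write('wiggle.gif')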
The viewer method uses various kinds of viewers, from the 19th-century Wheatstone stereoscope to the popular Holmes American stereoscope and transparency viewers like the ViewMaster and the Pan-Pet that I grew up with. It also includes high-tech head-mounted displays.
Parallax barrier and lenticular printing
These 2 methods are similar, though the parallax barrier is pretty high-tech while lenticular printing is about as low-tech as it gets. A parallax barrier essentially places a barrier with a series of precision slits in front of an image source, usually an LCD display, allowing each eye to see a different set of pixels. This is famously used in the Nintendo 3DS.
Lenticular printing uses a similar technique but with lenticular lenses. Lenticular prints are popular as novelty items and you’ve probably encountered them in many places without knowing what they were called.
These 2 methods are often also classified as ‘autostereoscopy’ or glasses-free 3D.
3D glasses are probably the method you’re most likely to encounter nowadays, in movies and in 3D TVs. I classify both passive and active glasses in this category, though the actual technologies can be vastly different, such as alternating frames with special projectors or using polarized light.
Which brings us to the type of 3D image we’ll be trying out today — anaglyphs.
The idea of anaglyphs is simple. We start with the 2 left and right images again. This time they are superimposed on each other, but the left is filtered to show only red while the right is filtered to show only cyan. We can actually use other colors besides red and cyan, but these 2 are the most popular (and patent-free).
The image is then viewed with a pair of glasses that have a red filter on the left lens and a cyan filter on the right lens. The result is that the left eye sees only the left image and the right eye only the right image, therefore generating the perception of depth.
The main problem with this technique (besides the necessity of wearing glasses) is that the colors are a bit wonky. Also, if some color from the left image gets into the right eye (and vice versa), a faintly colored “ghost” will be seen. And if the filters on the two lenses pass different amounts of light, the resulting luminance imbalance can easily cause headaches (it happened to me lots of times during the experiments below).
However, anaglyphs have plenty of advantages. Firstly, there’s no need for fancy high-tech equipment. Anaglyph red-cyan glasses can be easily made at home or bought cheaply, and as you will see below, creating anaglyphs is child’s play.
Creating anaglyphs with Ruby
Creating anaglyphs is ridiculously easy with RMagick. This is the whole script I used.
#!/usr/bin/env ruby
require 'rubygems'
require 'rmagick'

include Magick

# Keep only the reds in the left image, and only the greens and blues (cyan) in the right
left  = ImageList.new(ARGV[0]).gamma_correct(1,0,0)
right = ImageList.new(ARGV[1]).gamma_correct(0,1,1)

# Superimpose the two filtered images using the screen blending mode
anaglyph = left.composite right, CenterGravity, ScreenCompositeOp
anaglyph.write('anaglyph.jpg')
As you can see, the real work is done in only 3 lines of code. First I create an ImageList object from the first command-line argument (the file name of the left image). Then I use #gamma_correct to filter off the greens and blues of the left image while keeping the reds. Then for the right image, I do the same thing, except this time I filter off the reds while keeping the greens and blues. Finally I use #composite to blend the 2 images together using the screen blending mode (which lightens the image after blending). I used CenterGravity to place the right image at the center of the left image, but it really doesn’t matter since both images are supposed to be the same size anyway. All that remains is to write the anaglyph back into a file.
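To use the script, save it under any name you like (I’ll assume anaglyph.rb here) and pass it the left and right image files on the command line, something like ruby anaglyph.rb left.jpg right.jpg. The combined image is then written out as anaglyph.jpg.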
Of course, all of this means nothing if we can’t capture the left and right images. For this there are stereo cameras, ranging from the amazing to the weird to the totally slapped-together.
Alternatively you can do the same thing with a single camera, although I wouldn’t recommend it for anything except still shots. To do this, some recommend using what is known as the ‘cha-cha’ technique: the photographer snaps a picture, then shifts their weight slightly to the left or right, moving the camera a few centimeters, and takes a second image.
Me? I didn’t want to buy a fancy 3D camera and wasn’t inclined to do the cha-cha, so I applied a bit of MacGyverism to my primary camera.
It’s not perfect but it does take a reasonably good picture.
I was never much into image processing. Sure, like most programmers I dabbled in it, cropping images or doing some fancy-schmancy filtering effects. I even wrote a Flickr clone for my last book which has a rather impressive photo editor (mashed up from Pixlr, not mine). But I never thought much about how those effects were done or who came up with them in the first place. That is, until I met Irwin Sobel.
For those who know their image processing, this should ring bells immediately. Yes, it’s that Sobel. But let me take a minute to give some background: Irwin is a colleague of mine working in the Mobile and Immersive Experience Lab in HP Labs. I was visiting about two weeks ago and was introduced to him and his current projects. Inevitably someone talked about the Sobel operator, a commonly used algorithm for edge detection. I was, unfortunately, totally clueless about what it was. Not good. So, not surprisingly, I ended up Googling ‘Sobel operator’ at the first possible chance and found out what it was.
The Sobel operator is an algorithm for edge detection in images. Edge detection, for those who are not familiar with the term, is an image processing technique to discover the boundaries between regions in an image. It’s an important part of detecting features and objects in an image. Simply put, edge detection algorithms help us determine and separate objects from the background in an image.
The Sobel operator does this in a rather clever way. An image gradient is a change in intensity (or color) of an image (I’m oversimplifying, but bear with me). An edge in an image occurs where the gradient is greatest, and the Sobel operator makes use of this fact to find the edges in an image. The Sobel operator calculates the approximate image gradient of each pixel by convolving the image with a pair of 3×3 filters. These filters estimate the gradients in the horizontal (x) and vertical (y) directions, and the magnitude of the gradient is then computed by combining the two.
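For reference, here are the two filters themselves. They are the same matrices that appear as sobel_x and sobel_y in the Ruby code further down:

Gx = | -1  0  +1 |      Gy = | -1  -2  -1 |
     | -2  0  +2 |           |  0   0   0 |
     | -1  0  +1 |           | +1  +2  +1 |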
The magnitude of the gradient, which is what we use, is calculated as G = sqrt(Gx^2 + Gy^2), where Gx and Gy are the gradient estimates from the x and y filters respectively.
That’s the simplified, 2-paragraph theory behind the algorithm. If this fascinates you, you should grab a couple of books on image processing and computer vision and go through them.
Let’s look at how to implement the Sobel operator. We simply create the 2 filters and run them over each pixel in the image, going from left to right, row by row. Note that because each filter is a 3×3 matrix, the pixels in the first and last rows as well as the first and last columns cannot be estimated, so the output image loses a 1-pixel border around the edges compared to the original image.
To calculate the output pixel at coordinates [1,1] (taking the x filter as an example), the following equation is used, where [r,c] refers to the input pixel at row r, column c:
output pixel [1,1] = ([0,0] x -1) + ([0,1] x 0) + ([0,2] x 1) + ([1,0] x -2) + ([1,1] x 0) + ([1,2] x 2) + ([2,0] x -1) + ([2,1] x 0) + ([2,2] x 1)
To simplify matters even more, the grayscale version of the original image is usually used.
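To make the arithmetic above concrete, here’s a minimal sketch of that 3×3 convolution as a standalone Ruby method. The method name and the kernel parameter are mine, and it assumes an at(x,y) accessor that returns the grayscale value of a pixel (the full implementation below adds exactly such a method to ChunkyPNG::Image).

# Convolve a 3x3 kernel with the 3x3 neighbourhood centred on (x,y).
# img.at(x,y) is assumed to return the grayscale value of that pixel.
def convolve3x3(img, kernel, x, y)
  sum = 0
  (-1..1).each do |dy|
    (-1..1).each do |dx|
      sum += kernel[dy + 1][dx + 1] * img.at(x + dx, y + dy)
    end
  end
  sum
end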
Now let’s look at the Ruby implementation.
require 'chunky_png'

class ChunkyPNG::Image
  # Returns the grayscale value (0-255) of the pixel at (x,y)
  def at(x,y)
    ChunkyPNG::Color.to_grayscale_bytes(self[x,y]).first
  end
end

img = ChunkyPNG::Image.from_file('engine.png')

# The two Sobel filters, for the horizontal (x) and vertical (y) gradients
sobel_x = [[-1,0,1],
           [-2,0,2],
           [-1,0,1]]

sobel_y = [[-1,-2,-1],
           [ 0, 0, 0],
           [ 1, 2, 1]]

edge = ChunkyPNG::Image.new(img.width, img.height, ChunkyPNG::Color::TRANSPARENT)

# Skip the 1-pixel border, where the 3x3 neighbourhood is incomplete
for x in 1..img.width-2
  for y in 1..img.height-2
    pixel_x = (sobel_x[0][0] * img.at(x-1,y-1)) + (sobel_x[0][1] * img.at(x,y-1)) + (sobel_x[0][2] * img.at(x+1,y-1)) +
              (sobel_x[1][0] * img.at(x-1,y))   + (sobel_x[1][1] * img.at(x,y))   + (sobel_x[1][2] * img.at(x+1,y)) +
              (sobel_x[2][0] * img.at(x-1,y+1)) + (sobel_x[2][1] * img.at(x,y+1)) + (sobel_x[2][2] * img.at(x+1,y+1))

    pixel_y = (sobel_y[0][0] * img.at(x-1,y-1)) + (sobel_y[0][1] * img.at(x,y-1)) + (sobel_y[0][2] * img.at(x+1,y-1)) +
              (sobel_y[1][0] * img.at(x-1,y))   + (sobel_y[1][1] * img.at(x,y))   + (sobel_y[1][2] * img.at(x+1,y)) +
              (sobel_y[2][0] * img.at(x-1,y+1)) + (sobel_y[2][1] * img.at(x,y+1)) + (sobel_y[2][2] * img.at(x+1,y+1))

    # Gradient magnitude, clamped to the valid grayscale range
    val = Math.sqrt((pixel_x * pixel_x) + (pixel_y * pixel_y)).ceil
    val = 255 if val > 255
    edge[x,y] = ChunkyPNG::Color.grayscale(val)
  end
end

edge.save('engine_edge.png')
The first thing you’d notice is that I used a library called ChunkyPNG, a PNG manipulation library implemented in pure Ruby. While wrappers over ImageMagick (like RMagick) are probably the de facto image processing and manipulation libraries in Ruby, I thought it was kind of pointless to do a Sobel operator with ImageMagick since it already has its own edge detection implementation.
To simplify the implementation, I opened up the Image class in ChunkyPNG and added a new method that returns the grayscale value of a pixel at a specific location. Then I created the 2 Sobel filters as arrays of arrays, and used 2 nested loops to iterate through each pixel, column by column, then row by row. At each pixel I used the equation above to calculate the gradient, applying the x filter and then the y filter. Finally I used the gradient magnitude to set a grayscale pixel on a new image.
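As an aside, the x-only and y-only images shown below can be produced with what I’d assume is the obvious tweak: compute val from just one of the two responses, for example val = pixel_x.abs (or pixel_y.abs), instead of the combined magnitude.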
Here you can see the original image, which I reused from the Wikipedia entry on the Sobel operator.
And the edge detected image with the x filter applied only.
This is the edge detected image with the y filter only.
Finally this is the edge detected image with both x and y filters applied.
This short exercise might not be technically challenging but it made me appreciate the pioneers who invented things that we now take for granted. Here’s a final picture, one with myself and Irwin (he is the guy who’s sitting opposite me), and a bunch of other colleagues at HP Labs Palo Alto over lunch. Thanks Irwin, for the Sobel operator!