HTML Forms and Go

This is an excerpt from my new book, Go Web Programming, on using Go to process HTML forms sent from the browser. This sounds pretty trivial, but as in much of web programming (and much of programming in general), it’s often the trivial things that trip us up.

Go Web Programming

Before we get into getting form data from a POST request, let’s take a closer look at HTML forms and see what they are. Most of the time, POST requests come in the form (pun intended) of an HTML form and often look like this:

<form action="/process" method="post">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

Within the form tag, we place a number of HTML form elements like text input, text area, radio buttons, checkboxes, file uploads and so on. These elements allow users to enter data to be submitted to the server. Data is submitted to the server when the user clicks a button or somehow triggers the form submission.

We know the data is sent to the server through an HTTP POST request, and is placed in the body of the request. But how is the data formatted? HTML form data is always sent as name-value pairs, but how are these name-value pairs formatted in the POST body? It is important for us to know this because when we receive the POST request from the browser, we need to be able to parse the data and extract the name-value pairs.

The format of the name-value pairs sent through a POST request is specified by the content type of the HTML form. This is defined using the enctype attribute like this:

<form action="/process" method="post" enctype="application/x-www-form-urlencoded">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

The default value for enctype is application/x-www-form-urlencoded. Browsers are required to support at least application/x-www-form-urlencoded and multipart/form-data (HTML5 also supports a text/plain value).

If we set enctype to application/x-www-form-urlencoded, the browser will encode the HTML form data as a long query string, with the name-value pairs separated by an ampersand (&) and each name separated from its value by an equals sign (=). This is the same as URL encoding, hence the name. In other words, the HTTP body will look something like this:

first_name=sau%20sheong&last_name=chang
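The encoding itself is independent of Go. As a quick illustration of the wire format only, here is a small sketch in Ruby (the language used for the other examples on this blog), relying on nothing but its standard uri library. It decodes the body above back into name-value pairs and then re-encodes them. Note that the encoder writes the space as +, which is how this format usually represents it; %20 decodes to the same value.

```ruby
require 'uri'

# Decode a URL-encoded form body into [name, value] pairs.
body  = 'first_name=sau%20sheong&last_name=chang'
pairs = URI.decode_www_form(body)
puts pairs.inspect   # [["first_name", "sau sheong"], ["last_name", "chang"]]

# Re-encode the same pairs; a space comes out as "+" in this format.
encoded = URI.encode_www_form(pairs)
puts encoded         # first_name=sau+sheong&last_name=chang
```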

If we set enctype to multipart/form-data, each name-value pair is converted into a MIME message part, each with its own content type and content disposition. For example, the same form data as above will now look something like this:

------WebKitFormBoundaryMPNjKpeO9cLiocMw
Content-Disposition: form-data; name="first_name"

sau sheong
------WebKitFormBoundaryMPNjKpeO9cLiocMw
Content-Disposition: form-data; name="last_name"

chang
------WebKitFormBoundaryMPNjKpeO9cLiocMw--

When would we use one or the other? If we’re sending simple text data, the URL-encoded form is better as it is simpler and more efficient, and less processing is needed. If we’re sending large amounts of data, especially when uploading files, the multipart MIME form is better. We can even specify base64 encoding to send binary data as text.

So far we’ve only talked about POST requests. What about GET requests in an HTML form? HTML allows the method attribute to be either POST or GET, so this is also a valid form:

<form action="/process" method="get">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

In this case there is no request body (GET requests have no request body), so all the data is set in the URL as name-value pairs.

Now that we know how data is sent from an HTML form to the server, let’s go back to the server and see how we use net/http to process the request.

Form

One way to extract data from the HTTP request is to take it from the URL and the body in raw form, which requires us to parse the data ourselves. However, we normally do not need to, because the net/http library provides a rather comprehensive set of functions that, although not always aptly named, normally give us all we need. Let’s talk about each of them in turn.

The functions in Request that allow us to extract data from the URL and/or the body revolve around the Form, PostForm and MultipartForm fields. The data is in the form of key-value pairs (which is what we normally get from a POST request anyway). The general algorithm is:

  • Call ParseForm or ParseMultipartForm to parse the request
  • Access Form, PostForm or MultipartForm accordingly

Let’s take a look at some code.

package main

import (
	"fmt"
	"net/http"
)

// process parses the request and writes the parsed form data to the response.
func process(w http.ResponseWriter, r *http.Request) {
	r.ParseForm()
	fmt.Fprintln(w, r.Form)
}

func main() {
	server := http.Server{
		Addr: "127.0.0.1:8080",
	}
	http.HandleFunc("/process", process)
	server.ListenAndServe()
}

The focus of this server is on these 2 lines:

r.ParseForm()
fmt.Fprintln(w, r.Form)

As mentioned earlier, we need to first parse the request using ParseForm, and then access the Form field.

Let’s take a look at the client that is going to call this server. We’ll create a simple, minimal HTML form to send the request to the server. Place the code in a file named client.html.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Go Web Programming</title>
</head>
<body>
<form action="http://127.0.0.1:8080/process?hello=world&thread=123" method="post" enctype="application/x-www-form-urlencoded">
<input type="text" name="hello" value="sau sheong"/>
<input type="text" name="post" value="456"/>
<input type="submit"/>
</form>
</body>
</html>

In this form we are:

  • Sending the request to the URL http://127.0.0.1:8080/process?hello=world&thread=123 using the POST method
  • Specifying the content type (in the enctype attribute) to be application/x-www-form-urlencoded
  • Sending 2 HTML form key-value pairs, hello=sau sheong and post=456, to the server

Note that we have 2 values for the key hello. One of them is world in the URL and the other is sau sheong in the HTML form.

Open the client.html file directly in your browser (you don’t need to serve it from a web server; opening it locally in your browser is fine) and click the submit button. What you will see in the browser is:

map[thread:[123] hello:[sau sheong world] post:[456]]

This is the string representation of the Form field of the request, after the request has been parsed. Form is a map whose keys are strings and whose values are slices of strings. Notice that the map is not sorted, so you might get the values in a different order (since Go 1.12, fmt prints maps sorted by key, so newer versions give a stable order). Nonetheless, what we get is the combination of the query values hello=world and thread=123 as well as the form values hello=sau sheong and post=456. As you can see, the values are URL decoded (there is a space between sau and sheong).

PostForm

Of course, if you just wanted the value for the key post, you can use r.Form["post"], which will give you a slice with 1 element: [456]. If the form and the URL have the same key, both values will be placed in a slice, with the form value always placed before the URL value.

What if we need just the form key-value pairs and want to totally ignore the URL key-value pairs? For this we have the PostForm, which only provides key-value pairs for the form and not the URL. If we change from using r.Form to using r.PostForm in the code this is what we get:

map[post:[456] hello:[sau sheong]]
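To make the difference between the two outputs concrete, here is a rough model of the behaviour described above. It is written in Ruby (like the other examples on this blog) and is only an imitation, not how net/http is implemented; collect_values is a hypothetical helper of my own. The body pairs are collected first and the URL query pairs second, which reproduces the ordering seen in r.Form, while using the body alone mimics r.PostForm.

```ruby
require 'uri'

# Hypothetical helper, not part of any library: groups repeated keys into
# arrays, preserving the order in which the pairs are given.
def collect_values(pairs)
  pairs.each_with_object({}) do |(key, value), values|
    (values[key] ||= []) << value
  end
end

query = 'hello=world&thread=123'     # from the URL
body  = 'hello=sau+sheong&post=456'  # from the POST body

# Body pairs only, like r.PostForm.
post_form = collect_values(URI.decode_www_form(body))

# Body pairs first, then query pairs, like r.Form.
form = collect_values(URI.decode_www_form(body) + URI.decode_www_form(query))

puts form['hello'].inspect       # ["sau sheong", "world"]
puts post_form['hello'].inspect  # ["sau sheong"]
```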

We used application/x-www-form-urlencoded for the content type. What happens if we use multipart/form-data? Make the change to the client HTML form, switch back to using r.Form and let’s find out:

map[hello:[world] thread:[123]]

What happened here? We only get the URL query key-value pairs this time and not the form key-value pairs, because ParseForm only parses a body encoded as application/x-www-form-urlencoded. To get multipart key-value pairs from the body, we need to use MultipartForm.

MultipartForm

Instead of using ParseForm and then reading the Form field of the request, we have to use ParseMultipartForm and then read the MultipartForm field. ParseMultipartForm also calls ParseForm when necessary.

r.ParseMultipartForm(1024)
fmt.Fprintln(w, r.MultipartForm)

We need to tell ParseMultipartForm how many bytes of the multipart form it may hold in memory; file parts beyond that limit are stored in temporary files on disk. Now let’s see what happens:

&{map[hello:[sau sheong] post:[456]] map[]}

This time we see the form key-value pairs, but not the URL key-value pairs. This is because MultipartForm only contains the form key-value pairs. Notice that the returned value is no longer a map, but a pointer to a struct that contains 2 maps. The first map has keys that are strings and values that are slices of strings. The second map, which has string keys and uploaded files as values, is empty because our form did not upload any files.

There is one last set of functions that lets us access the key-value pairs even more easily than what we’ve just gone through. The FormValue method allows us to access the key-value pairs just like in Form, except that it is for a specific key and we don’t need to call ParseForm or ParseMultipartForm beforehand: FormValue does that for us.

From our previous example, this means if we do this in our handler function:

fmt.Fprintln(w, r.FormValue("hello"))

And we set the client’s form enctype to application/x-www-form-urlencoded, we will get this:

sau sheong

We get only sau sheong because FormValue only retrieves the first value, even though we actually have both values in the Form field. To prove this, let’s add another line below the earlier line of code, like this:

fmt.Fprintln(w, r.FormValue("hello"))
fmt.Fprintln(w, r.Form)

This time we’ll see:

sau sheong
map[post:[456] hello:[sau sheong world] thread:[123]]

The PostFormValue function does the same thing, except that it is for PostForm instead of Form. Let’s make some changes to the code to use the PostFormValue function:

fmt.Fprintln(w, r.PostFormValue("hello"))
fmt.Fprintln(w, r.PostForm)

This time we get this instead:

sau sheong
map[hello:[sau sheong] post:[456]]

As you can see we get only the form key-value pairs.

Both FormValue and PostFormValue call ParseMultipartForm for us so we don’t need to call it ourselves, but there’s a slightly confusing gotcha that you should be careful with (at least as of Go 1.4). If we set the client form’s enctype to multipart/form-data and try to get the value using either FormValue or PostFormValue, we won’t be able to get it even though ParseMultipartForm has been called!

To be clearer, let’s make some changes to the server’s handler function again:

fmt.Fprintln(w, "(1)", r.FormValue("hello"))
fmt.Fprintln(w, "(2)", r.PostFormValue("hello"))
fmt.Fprintln(w, "(3)", r.PostForm)
fmt.Fprintln(w, "(4)", r.MultipartForm)

This is our result from using our form with enctype set to multipart/form-data:

(1) world
(2)
(3) map[]
(4) &{map[hello:[sau sheong] post:[456]] map[]}

The first line in the results gives us the value for hello that’s found in the URL and not the form. The second and third lines tell us why: if we just take the form key-value pairs, we actually get nothing. That’s because FormValue and PostFormValue correspond to Form and PostForm, and not MultipartForm. The last line in the results proves that ParseMultipartForm was actually called, which is why we get the data when we access MultipartForm.

We covered quite a bit in this blog post so let’s recap how these functions are different, in a nice table.

[Table: comparison of Form, PostForm, MultipartForm, FormValue and PostFormValue]

Undoubtedly the naming convention leaves much to be desired!

Go Web Programming

I’ve gone and done it. I’ve started writing another book. And not just any book, but a book on good old honest-to-goodness web programming. If I’m qualified to write about any programming topic that’s probably it. I’ve been doing web application programming so long now that almost the first thing I check out in any new programming language I get to know is its HTTP library. I’ve written web applications in almost every programming language I know, and some I’m not even sure I actually know.

So now this. And on the Go language. We’ll see.

http://www.manning.com/chang/

Create 3D anaglyph images with 3 lines of Ruby code

3D has always fascinated me. When I was young my brother and I had a ViewMaster and a Pan-Pet Panorama Stereo Viewer, both of which totally bowled us over when we first saw them. My brother, as usual, totally took them apart and fixed them up again, multiple times, while I simply spent hours goggling at the things. I have no idea where they are now, but thinking back, they are my first recollection of understanding what stereoscopy is.

ViewMaster
Pan-Pet Panorama Stereo Viewer

A bit of history

It’s probably surprising to most people (at least it was to me) that the modern techniques of 3D imaging and stereoscopy date back to before photography. In fact the first few stereoscopic images were drawings. The picture below shows one of the earliest stereoscopic drawings by Jacopo Chimenti, a painter from Florence, Italy.

Jacopo Chimenti’s first stereoscopic image

In 1838, Charles Wheatstone, a British inventor, published a paper that provided the scientific basis for stereography. He showed that the brain unifies the slightly different two-dimensional images from each eye into a single object of three dimensions. Wheatstone’s early stereographs were also drawings rather than photographs.

Wheatstone’s stereoscope

Photographic stereographs were first produced in 1849 by the Scottish physicist David Brewster, who improved the stereoscope and, in the same year, devised the first true stereo camera with two lenses.

3D/stereographic imaging techniques

The principles of stereoscopy are quite simple. We see things in 3 dimensions (i.e. being able to see 3-dimensional depth) because each of our 2 eyes actually sees a slightly different image. This is because our eyes are positioned apart from each other, which generates what is called binocular disparity. Recreating this effect with a 2-dimensional image then allows us to ‘see’ the image in 3D.

There are a number of ways to do this but generally the principle revolves around creating a set of 2 images, one for each eye and ‘forcing’ the left eye to view the left image and the right eye to view the right image.

Freeviewing

This method places the left image on the right side and the right image on the left side. To view the image in stereo, force your eyes to go cross-eyed, which will produce 3 images. Then slowly ease the eyes to view the middle image in 3D. This is not as silly as it sounds, and actually works though it can be a strain on the eyes (obviously).

Stereogram with cross-eyed method

Wiggle method

Stereogram with wiggle method

The wiggle method surprises a lot of people (including me when I first read about it) but it can sometimes be pretty effective. Basically you take the 2 images and create a GIF that alternates between them.

Viewers

This method uses various kinds of viewers, from the 19th century Wheatstone stereoscope to the popular Holmes American stereoscope and the transparency viewers like the ViewMaster and the Pan-Pet that I grew up with. It also includes high-tech head-mounted displays.

Holmes’ American Stereoscope (reproduction)

Parallax barrier and lenticular printing

These 2 methods are similar, though parallax barrier is pretty high-tech while lenticular printing is as low-tech as it can be. A parallax barrier essentially places a barrier with a series of precision slits in front of an image source, usually an LCD display, allowing each eye to see a different set of pixels. This is famously used in the Nintendo 3DS.

Nintendo 3DS

Lenticular printing uses a similar technique but with lenticular lenses. Lenticular prints are popular as novelty items and you’ve probably encountered them in many places without knowing what they were called.

These 2 methods are often also classified as ‘autostereoscopy’ or glasses-free 3D.

Difference between parallax barrier and lenticular printing
Lenticular print of a promotion item

3D glasses

This is probably the method you’re most likely to encounter nowadays in movies and in 3D TVs. I classify both passive and active glasses in this category, though the actual technologies can be vastly different, such as alternating frames with special projectors or using polarized light.

Which brings us to the type of 3D image we’ll be trying out today — anaglyphs.

The idea of anaglyphs is simple. We start with the 2 left and right images again. This time they are superimposed on each other, but the left would be corrected to show only red color while the right would be corrected to show cyan color. Actually we can use other colors besides red and cyan but these 2 colors are the most popular (and patent-free).

The image is then viewed with a pair of glasses that has a red filter on the left lens and a cyan filter on the right lens. The result is that the left eye only sees the left image and the right eye only the right image, generating the perception of depth.

Red-cyan anaglyph glasses

The main problem with this technique (besides the necessity of wearing glasses) is that the colors are a bit wonky. Also, if some color from the left image gets into the right eye (and vice versa), a faintly colored “ghost” will be seen. And if the two lenses filter off different amounts of light, resulting in luminance imbalance, it can easily cause headaches (this happened to me lots of times during the experiments I did below).

However there are plenty of advantages of anaglyphs. Firstly there isn’t a need for fancy high-tech equipment. Anaglyph red-cyan glasses can be easily created at home or bought cheaply and as you will see below, creating anaglyphs is child’s play.

Creating anaglyphs with Ruby

Creating anaglyphs is ridiculously easy with RMagick. This is the whole script I used.

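#!/usr/bin/env ruby

require 'rubygems'
require 'rmagick'
include Magick

# Keep only the red channel of the left image
left = ImageList.new(ARGV[0]).gamma_correct(1,0,0)
# Keep only the green and blue (cyan) channels of the right image
right = ImageList.new(ARGV[1]).gamma_correct(0,1,1)
# Blend the two images using the screen compositing mode
anaglyph = left.composite right, CenterGravity, ScreenCompositeOp

anaglyph.write('anaglyph.jpg')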

As you can see, the real work is done in only 3 lines of code. First I create an ImageList object (assuming the first parameter is the file name of the left image). Then I use #gamma_correct to filter off the greens and blues of the left image while keeping the reds. Then for the right image, I do the same thing, except this time I filter off the reds while keeping the greens and blues. Finally I use #composite to blend the 2 images together using the screen blending mode (which lightens the image after blending). I used CenterGravity to place the right image at the center of the left image here, but it really doesn’t matter since both images are supposed to be the same size anyway. What remains is just to write the anaglyph back into a file.
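To see what this does to a single pixel, here is a rough sketch in plain Ruby, without RMagick. It is only a model of the steps described above, not what RMagick does internally: it assumes 8-bit channels, pixels given as [r, g, b] arrays, and the usual screen formula (invert, multiply, invert again). The names screen and anaglyph_pixel are my own, for illustration.

```ruby
# Screen blend of two 0..255 channel values: invert, multiply, invert again.
def screen(a, b)
  255 - ((255 - a) * (255 - b)) / 255
end

# Hypothetical helper: combine one left-image pixel and one right-image pixel
# (each an [r, g, b] array) into a red-cyan anaglyph pixel.
def anaglyph_pixel(left, right)
  left_red   = [left[0], 0, 0]           # keep red, drop green and blue
  right_cyan = [0, right[1], right[2]]   # drop red, keep green and blue
  left_red.zip(right_cyan).map { |l, r| screen(l, r) }
end

p anaglyph_pixel([200, 10, 20], [30, 100, 150])  # => [200, 100, 150]
```

Because each channel is zero in one of the two filtered images, the screen blend simply passes the other image’s value through: the red comes from the left image and the green and blue from the right.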

Of course, all of this means nothing if we can’t capture the left and right images. For this there are the stereo cameras, ranging from the amazing to the weird and the totally slap-together.

3D World 120 Tr-Lens: Stereoscopic Three Lenses 3D Camera
Fujifilm FinePix REAL 3D W3
2 instant cameras as conjoined stereo camera

Alternatively you can do the same thing with a single camera, although I wouldn’t recommend it for anything except still shots. To do this, some recommend what is known as the ‘cha-cha’ technique. This requires the photographer to snap a picture, then shift weight slightly to the left or right, moving a few centimeters to take a reasonably good second image.

Me? I didn’t want to buy a fancy 3D camera and wasn’t inclined to do the cha-cha so I applied a bit of MacGyverism to my primary camera.

MacGyver’ed dual iPhones with rubber-bands

It’s not perfect but it does take a reasonably good picture.

Left image of Kai Wen
Right image of Kai Wen
Anaglyphic image of Kai Wen

Edge detection with the Sobel operator in Ruby

I was never much into image processing. Sure, like most programmers I dabbled in it for cropping images or doing some fancy-schmancy filtering effects. I even wrote a Flickr clone for my last book which has a rather impressive photo editor (mashed up from Pixlr, not mine). But I never thought much about how those effects were done or who came up with them in the first place. That is, until I met Irwin Sobel.

For those who know their image processing, this should ring bells immediately. Yes, it’s that Sobel. But first, some background: Irwin is a colleague of mine working in the Mobile and Immersive Experience Lab in HP Labs. I was visiting about two weeks ago and was introduced to him and his current projects. Inevitably someone talked about the Sobel operator, a commonly used algorithm for edge detection. I was, unfortunately, totally clueless about what it was. Not good. So, not surprisingly, I ended up Googling ‘Sobel operator’ at the first possible chance and found out what it was.

The Sobel operator is an algorithm for edge detection in images. Edge detection, for those who are not familiar with the term, is an image processing technique to discover the boundaries between regions in an image. It’s an important part of detecting features and objects in an image. Simply put, edge detection algorithms help us to determine and separate objects from the background in an image.

The Sobel operator does this in a rather clever way. An image gradient is a change in intensity (or color) of an image (I’m oversimplifying but bear with me). An edge in an image occurs where the gradient is greatest, and the Sobel operator makes use of this fact to find the edges in an image. The Sobel operator calculates the approximate image gradient at each pixel by convolving the image with a pair of 3×3 filters. These filters estimate the gradients in the horizontal (x) and vertical (y) directions, and the magnitude of the gradient is then computed from these 2 gradients.

The magnitude of the gradient, which is what we use, is calculated using:

$G = \sqrt{G_x^2 + G_y^2}$

where G_x and G_y are the gradients in the x and y directions.

That’s the simplified, 2-paragraph theory behind the algorithm. If this fascinates you, you should grab a couple of books on image processing and computer vision and go through them.

Let’s look at how to implement the Sobel operator. This is done simply by creating the 2 filters and running them over each pixel in the image, starting from the left and going right. Note that because the filter is a 3×3 matrix, the pixels in the first and last rows as well as the first and last columns cannot be estimated, so the edge data covers an area 1 pixel smaller on each side than the original image.

To calculate the output pixel at coordinates 1,1 (the centre of a 3×3 block of input pixels) with the x filter, the following equation is used:

output pixel [1,1] = ([0,0] x -1) + ([0,1] x 0) + ([0,2] x 1) + ([1,0] x -2) + ([1,1] x 0) + ([1,2] x 2) + ([2,0] x -1) + ([2,1] x 0) + ([2,2] x 1)

To simplify matters even more, the grayscale version of the original image is usually used.

Now let’s look at the Ruby implementation:


require 'chunky_png'

class ChunkyPNG::Image
  def at(x,y)
    ChunkyPNG::Color.to_grayscale_bytes(self[x,y]).first
  end
end

img = ChunkyPNG::Image.from_file('engine.png')

sobel_x = [[-1,0,1],
           [-2,0,2],
           [-1,0,1]]

sobel_y = [[-1,-2,-1],
           [0,0,0],
           [1,2,1]]

edge = ChunkyPNG::Image.new(img.width, img.height, ChunkyPNG::Color::TRANSPARENT)

for x in 1..img.width-2
  for y in 1..img.height-2
    pixel_x = (sobel_x[0][0] * img.at(x-1,y-1)) + (sobel_x[0][1] * img.at(x,y-1)) + (sobel_x[0][2] * img.at(x+1,y-1)) +
              (sobel_x[1][0] * img.at(x-1,y))   + (sobel_x[1][1] * img.at(x,y))   + (sobel_x[1][2] * img.at(x+1,y)) +
              (sobel_x[2][0] * img.at(x-1,y+1)) + (sobel_x[2][1] * img.at(x,y+1)) + (sobel_x[2][2] * img.at(x+1,y+1))

    pixel_y = (sobel_y[0][0] * img.at(x-1,y-1)) + (sobel_y[0][1] * img.at(x,y-1)) + (sobel_y[0][2] * img.at(x+1,y-1)) +
              (sobel_y[1][0] * img.at(x-1,y))   + (sobel_y[1][1] * img.at(x,y))   + (sobel_y[1][2] * img.at(x+1,y)) +
              (sobel_y[2][0] * img.at(x-1,y+1)) + (sobel_y[2][1] * img.at(x,y+1)) + (sobel_y[2][2] * img.at(x+1,y+1))

    # Clamp to 255 so the value stays a valid grayscale level
    val = [Math.sqrt((pixel_x * pixel_x) + (pixel_y * pixel_y)).ceil, 255].min
    edge[x,y] = ChunkyPNG::Color.grayscale(val)
  end
end

edge.save('engine_edge.png')

The first thing you’d notice is that I used a library called ChunkyPNG, which is a PNG manipulation library implemented in pure Ruby. While wrappers over ImageMagick (like RMagick) are probably the de facto image processing and manipulation libraries in Ruby, I thought it would be kind of pointless to do a Sobel operator with ImageMagick since it already has its own edge detection implementation.

To simplify the implementation, I opened up the Image class in ChunkyPNG and added a new method that will return a grayscale pixel at a specific location. Then I created the 2 Sobel filters with arrays of arrays. I created 2 nested loops to iterate through each pixel column by column, then row by row and at each pixel I used the equation above to calculate the gradient by applying the x filter then the y filter. Finally I used the gradient and set a grayscale pixel based on the gradient value, on a new image.
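The nine-term sums can also be written as a loop over the filter. The following is a minimal sketch on plain arrays rather than ChunkyPNG images, so it can be run on its own; convolve_at and sobel_magnitude are names I made up for illustration, and the image is assumed to be an array of rows of grayscale values indexed as gray[y][x]. The clamp to 255 is my own addition to keep the result inside the grayscale range.

```ruby
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]].freeze

SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]].freeze

# Apply one 3x3 filter centred on (x, y). `gray` is an array of rows of
# grayscale values (0..255), indexed as gray[y][x].
def convolve_at(gray, kernel, x, y)
  sum = 0
  (-1..1).each do |dy|
    (-1..1).each do |dx|
      sum += kernel[dy + 1][dx + 1] * gray[y + dy][x + dx]
    end
  end
  sum
end

# Gradient magnitude at (x, y), clamped to the 0..255 grayscale range.
def sobel_magnitude(gray, x, y)
  gx = convolve_at(gray, SOBEL_X, x, y)
  gy = convolve_at(gray, SOBEL_Y, x, y)
  [Math.sqrt(gx * gx + gy * gy).ceil, 255].min
end

# A 3x3 patch with a vertical edge: dark on the left, bright on the right.
patch = [[0, 0, 255],
         [0, 0, 255],
         [0, 0, 255]]

p convolve_at(patch, SOBEL_X, 1, 1)  # => 1020
p convolve_at(patch, SOBEL_Y, 1, 1)  # => 0
p sobel_magnitude(patch, 1, 1)       # => 255
```

The vertical edge produces a large x gradient and no y gradient, which is why applying only the x filter picks out vertical edges and applying only the y filter picks out horizontal ones.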

Here you can see the original image, which I reused from the Wikipedia entry on Sobel operator.

And the edge detected image with the x filter applied only.

This is the edge detected image with the y filter only.

Finally this is the edge detected image with both x and y filters applied.

This short exercise might not be technically challenging but it made me appreciate the pioneers who invented things that we now take for granted. Here’s a final picture, one with myself and Irwin (he is the guy who’s sitting opposite me), and a bunch of other colleagues at HP Labs Palo Alto over lunch. Thanks Irwin, for the Sobel operator!

Generate MP3 waveforms with Ruby and R

I blame Rully for this. If it wasn’t for him I wouldn’t have been obsessed with this and spent a good few hours at night figuring it out last week. It all started when Rully mentioned that he knew how many beeps there are in the Singapore MRT (subway system) ‘doors closing’ warning. There are 13 beeps, he explained, and he found out by running a WAV recording of it through a MATLAB package, which generated a waveform that allowed him to count the beeps accurately (they are normally too fast for the ear to count). Naturally such talk triggered the inner competitive geek in me. I happened to be doing Ruby and R integration at the moment (watch out for my talk at RedDotRubyConf at the end of April) so I had no choice but to retrace his steps using my new toys. Really.

MP3 is a compressed and lossy audio encoding format, and to generate a waveform I decided to convert it to WAV first. Doing this is relatively simple: there is a library called icanhasaudio that wraps around LAME to encode and decode audio. Naturally you need to have LAME installed on your machine before you can do this, but once you have done that, decoding the MP3 is a breeze:

require 'icanhasaudio'

reader = Audio::MPEG::Decoder.new
File.open('mrt_closing.mp3', 'rb') do |input|
  File.open('out.wav', 'wb')  do |output|
    reader.decode(input, output)
  end
end

That was easy enough. The next step was a bit trickier though. To understand how to create a waveform from a WAV file, let’s digress a bit into what a WAV file is. WAV is an audio file format, originally from IBM and Microsoft, used to store audio bitstreams. WAV is an extended RIFF format, which is a little-endian version of the AIFF format (which is big-endian). In RIFF, data is stored in ‘chunks’, and for WAV there are basically 2 types of chunks: a format chunk and a sound data chunk. The format chunk contains the parameters describing the waveform, for example its sample rate, and the data chunk contains the actual waveform data. There are other chunks but because I’m really only interested in the waveform I’ll conveniently ignore them. This is what a minimal WAV file looks like:

The data chunk has a chunk ID, which is always ‘data’, and a chunk size that is a long integer. Data in the data chunk is stored in sample points. A sample point is a value that represents a sample of a sound at a given moment in time. Each sample point is stored as a linear 2’s-complement value from 9 to 32 bits wide, as specified in the BitsPerSample field in the format chunk. Sounds in a WAV file can also come in multiple channels (e.g. a stereo sound will come in 2 channels, like our file). For such multi-channel sounds, the sample points are interleaved, one from each channel. A grouping of sample points for a single point in time for all the channels is called a sample frame. This graphic explains it best.
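The interleaving can also be shown in a few lines of Ruby. This is only a sketch on made-up sample values, assuming 16-bit stereo data like our file; unpack_frames is a hypothetical helper. It packs two sample frames into the byte layout described above and then splits the bytes back into frames.

```ruby
# Two stereo sample frames of 16-bit sample points: [channel 1, channel 2].
frames = [[1000, -1000], [32767, -32768]]

# Interleave the sample points and pack them as little-endian signed 16-bit
# integers ('s<'), the layout used for 16-bit PCM data in a WAV file.
data = frames.flatten.pack('s<*')

# Each frame is channels * bytes per sample point = 2 * 2 = 4 bytes.
def unpack_frames(data, frame_size = 4)
  (0...data.bytesize).step(frame_size).map do |offset|
    data.byteslice(offset, frame_size).unpack('s<s<')
  end
end

p data.bytesize        # => 8
p unpack_frames(data)  # => [[1000, -1000], [32767, -32768]]
```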

If you open the WAV up with a hex editor it will look something like this:

I won’t go through the format chunk; there is an easier way to find out the format, which is to open the WAV file in QuickTime and inspect it.

For more information you can get the WAV specs here. This is the information we found of the WAV file that we will use in a while:

  • Format : Linear PCM
  • Number of channels : 2
  • Number of bits per sample : 16

In order to create the waveform, I open the WAV file, collect each sample point from each channel and convert it into an integer. The result is the data file I will use later in R to generate the waveform. Let’s take a look at the code we use to generate the data:

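# On Ruby 1.9 and later, the standard csv library (CSV) replaces FasterCSV
require 'fastercsv'
require 'stringio'

FasterCSV.open('wavdata.csv', 'w') do |csv|
  csv << %w(ch1 ch2 combined)
  # Open in binary mode so the bytes are read unmodified
  File.open('out.wav', 'rb') do |file|
    while !file.eof?
      # Look for the start of the data chunk
      if file.read(4) == 'data'
        # Chunk size: a 4-byte long integer
        length = file.read(4).unpack('l').first
        wavedata = StringIO.new file.read(length)
        while !wavedata.eof?
          # One sample frame: two 16-bit sample points, one per channel
          ch1, ch2 = wavedata.read(4).unpack('ss')
          csv << [ch1, ch2, ch1 + ch2]
        end
      end
    end
  end
end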

Note that I didn’t read the number of channels from the WAV file but instead assumed it has 2 channels (stereo), to simplify the code. Firstly I open up the WAV file. I ignored the format chunk completely and looked for the data chunk only. Then I read the length of the data chunk by reading the next 4 bytes and unpacking it as a long integer (hence ‘l’ format in the String#unpack method). This gives me the length of the data chunk that I will need to read next.

Next, for ease of reading I wrap the returned data string in a StringIO object. As we found out earlier, each sample frame has 2 channels and each sample point has 16 bits, so we need to retrieve 32 bits or 4 bytes at a time. Since each sample point has 16 bits, which is a short integer, we unpack the 4 bytes that are read into 2 short integers, and this gives us the 2 sample points of the 2 channels of that sample frame.

After that it’s a simple matter of stuffing the sample points into a CSV file.
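The script above assumes 2 channels and looks for ‘data’ every 4 bytes, which is fine for this file. For comparison, here is a rough sketch of a slightly more general approach that walks the RIFF chunks one by one and reads the channel count from the format chunk. It is not the code I used: tiny_wav and read_wav are hypothetical helpers, the in-memory file is made-up test data, and 16-bit PCM with the format chunk before the data chunk is assumed.

```ruby
require 'stringio'

# Build a tiny in-memory WAV file (made-up test data) so the reader below can
# be run without a real recording: 16-bit PCM, stereo by default.
def tiny_wav(frames, channels: 2, rate: 44_100, bits: 16)
  data = frames.flatten.pack('s<*')
  block_align = channels * bits / 8
  fmt  = [1, channels, rate, rate * block_align, block_align, bits].pack('vvVVvv')
  body = 'WAVE' + 'fmt ' + [fmt.bytesize].pack('V') + fmt +
         'data' + [data.bytesize].pack('V') + data
  'RIFF' + [body.bytesize].pack('V') + body
end

# Walk the chunks and return the format fields plus the sample frames.
# Assumes 16-bit PCM and that the 'fmt ' chunk comes before the 'data' chunk.
def read_wav(io)
  riff, _size, wave = io.read(12).unpack('a4Va4')
  raise 'not a WAV file' unless riff == 'RIFF' && wave == 'WAVE'

  info = {}
  until io.eof?
    id, size = io.read(8).unpack('a4V')   # chunk ID and chunk size
    chunk = io.read(size)
    io.read(1) if size.odd?               # chunks are padded to an even length
    case id
    when 'fmt '
      _format, info[:channels], info[:rate], _, _, info[:bits] = chunk.unpack('vvVVvv')
    when 'data'
      samples = chunk.unpack('s<*')
      info[:frames] = samples.each_slice(info[:channels]).to_a
    end
  end
  info
end

wav = read_wav(StringIO.new(tiny_wav([[1, -1], [300, -300]])))
p wav[:channels]  # => 2
p wav[:frames]    # => [[1, -1], [300, -300]]
```

Chunks other than the format and data chunks are simply skipped, since each chunk carries its own size.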

Finally, to generate the waveform from the data file, I run it through a simple R script, which I integrated with Ruby using the Ruby Rserve client.

require 'rserve'

# The sample values plotted here are amplitudes, so the y axis is labelled accordingly
script=<<-EOF
  png(file='/Users/sausheong/projects/wavform/mrtplot.png', height=800, width=600, res=72)
  par(mfrow=c(3,1),cex=1.1)
  wav_data <- read.csv(file='/Users/sausheong/projects/wavform/wavdata.csv', header=TRUE)
  plot(wav_data$combined, type='n', main='Channel 1', xlab='Time', ylab='Amplitude')
  lines(wav_data$ch1)
  plot(wav_data$combined, type='n', main='Channel 2', xlab='Time', ylab='Amplitude')
  lines(wav_data$ch2)
  plot(wav_data$combined, type='n', main='Channel 1 + Channel 2', xlab='Time', ylab='Amplitude')
  lines(wav_data$combined)
  dev.off()
EOF
Rserve::Connection.new.eval(script)

The script generates the following PNG file:

As you can see from the waveform (ignoring the first 2 bulges, which are ‘doors’ and ‘closing’ respectively) there are 13 sharp pointy shapes, which represent a beep each.

You can get the code and images from my GitHub repository. If you’re interested in hearing more about Ruby and R integration, do come down to the RedDotRubyConf on 22 and 23 April 2011!

My new book is out!

It’s been a while since the day I started writing Cloning Internet Applications with Ruby but it’s finally out! You can get it from bookstores, Amazon or its main site at Packt. It’s available in both a paper and a digital version (PDF), so get it now!

The main idea behind this book is actually quite simple and it started out in this blog. The first ‘clone’ I wrote was the Internet search engine in 200 lines of code, which was really very much scratching an itch that I had while I was in Yahoo, about a year and a half ago. I was interested in search engines then and I wanted to write a very simple search engine to illustrate the principles behind an Internet search engine. That gave me a chance to try out Sinatra, the minimalist web application framework, which worked out really well for me eventually. In turn, that kickstarted me into a whimsical challenge to do the same with Twitter in the same number of lines of code using Sinatra, and later TinyURL in 40 lines of code. After that it was only a short leap to writing a whole book about it.

While the original idea revolved around writing clones with the smallest codebase possible, eventually the book evolved to be about writing minimal-feature clones written using the Ruby libraries that I now love to use i.e. Sinatra, DataMapper and Haml. The fundamental premise of the book still remained though, that is to illustrate how clones of popular Internet applications can be written with Ruby.

While this is a highly technical book with lots of code, I added in plenty of the reasons and rationale (according to me, that is) for why and how certain features of those applications work. For example, Twitter’s and Facebook’s features for connecting their users (‘friending’ features) in a social network are different, because they target users differently. Twitter’s friending features are primarily one-way and do not need explicit approval while Facebook’s friending features are two-way and need explicit approval from both parties. This means design and implementation differences, which are explained in detail in the book.

The experience of writing this book was good, and I learnt tremendously in the process, though it was a struggle. I can say this now that it’s published, but there were times I wanted to throw in the towel because of the mess my career was in then. I was still in Yahoo when I started, and I continued while doing my consulting work which eventually led to Garena, then wrapped up before I left Garena, and it is finally being published now that I’m in HP Labs. It took longer to finish than my first book, because of the upheaval in my career in the past year or so and also because overall I wanted to come up with a better book. This resulted in a book that has been revised repeatedly as companies, statistics and technologies changed. When I started, TinyURL was the king of the hill of URL shorteners while bit.ly was up and coming, having just taken over as the default URL shortener in Twitter. TinyURL is now one of the players, with bit.ly probably the largest, but Twitter has come out with its own shortener. Facebook Connect was the way to go when I wrote the chapter on Facebook, but the Open Graph APIs have taken over since then. Twitter used both HTTP Basic Authentication and OAuth when I was writing, but has switched over completely to OAuth now. And I expect the list to go on and on.

Still, it has been a good journey and a good fight. Finally publishing it is a grand feeling second to none (except when I had my first child). Hope you enjoy the book!

Toilets and engineers

It started with a lunch conversation that slowly turned into a coffee time mini-debate. You see, I moved into a new job recently at the Applied Cloud Computing Lab in HP Labs Singapore. My first task was to hire engineers to staff the lab and while the final numbers were not absolute, the number 70 was bandied around for the population of the facilities. We have half a floor at Fusionopolis, plenty of space really but the bone of contention was more delicately focused on more sanitary facilities i.e. the toilets.

The problem was that the whole floor shares a single pair of toilets (one for male and another for female of course). I was totally convinced that with 70 people in the office, we would reach bladder or bowel apocalypse, a disaster in waiting. My other colleagues however were less worried; the other floors seem to be doing fine. It wasn’t as if we could do anything about it, no one’s going to magically add in new toilets for us no matter how long the toilet queue was; but I had to know. I was also curious of course: what is the algorithm used to determine the number of toilets per square meter of occupancy?

Looking up building regulations for Singapore turned up nothing (at least not online), but the UK has some regulations covered by the Health and Safety Executive.

Looking at facilities used by men only:

Number of men at work   Number of toilets   Number of urinals
1-15                    1                   1
16-30                   2                   1
31-45                   2                   2
46-60                   3                   2
61-75                   3                   3
76-90                   4                   3
91-100                  4                   4

Well, the toilets seem to almost fit into the UK regulations, but these are just regulations. How did these numbers come about and how realistic are they? To better model the usage scenarios, I did a Monte Carlo simulation on the usage pattern of the toilets, based on a simple set of assertions.

Taking an educated guess that 1 in 7 engineers is female, that leaves us with 60 male engineers sharing a single toilet facility. The toilet facility has 3 urinals and 2 stalls, one of which is a flush toilet and the other a squat toilet (which in general no one uses any more).
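To see how close the facility comes to the UK figures, the table can be turned into a small lookup. This is only a sketch: the rows are copied from the table above, while `UK_MEN_ONLY_FACILITIES` and `required_facilities` are hypothetical names introduced here for illustration.

```ruby
# Rows copied from the table above: [range of men, toilets, urinals]
UK_MEN_ONLY_FACILITIES = [
  [1..15, 1, 1],
  [16..30, 2, 1],
  [31..45, 2, 2],
  [46..60, 3, 2],
  [61..75, 3, 3],
  [76..90, 4, 3],
  [91..100, 4, 4]
].freeze

# Hypothetical helper: find the row covering a head count (nil if outside the table).
def required_facilities(men)
  row = UK_MEN_ONLY_FACILITIES.find { |range, _toilets, _urinals| range.include?(men) }
  row && { toilets: row[1], urinals: row[2] }
end

required = required_facilities(60)
actual   = { toilets: 2, urinals: 3 }   # the 2 stalls and 3 urinals on our floor
puts "toilets: required #{required[:toilets]}, actual #{actual[:toilets]}"
puts "urinals: required #{required[:urinals]}, actual #{actual[:urinals]}"
```

For 60 men the table asks for 3 toilets and 2 urinals, while the floor has 2 stalls and 3 urinals: the same number of fixtures, split differently.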

An important piece of information is how many times an average person needs to urinate a day. It turns out that our bladder will signal us to make a trip to the toilet when it fills up to about 200ml. Combined with the information that the average urine output of an adult is 1.5 liters a day, this means that we need to go roughly 8 times a day (1,500ml / 200ml = 7.5). This in turn means 8 times in 24 hours, or once every 3 hours on average. The normal working hours in HP are 8:30am – 5:30pm, which is 9 hours, so on average a person makes a beeline to the toilet 3 times during the course of the working day in the office.

I modeled the scenario discretely, per minute. This means a person in the office decides whether or not to go to the loo every minute within the 9 hours in the office, or 540 minutes. Coupled with an average of 3 times in 540 minutes, this means the probability that a person would go to the toilet in any given minute is 3 out of 540. In the case of urinating, my estimate is that a person will be able to complete the necessary job within 2 minutes on average and vacate the facility.
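The arithmetic behind these numbers can be checked in a few lines. This is a sketch only: the constants are the figures quoted in the text, the method names are made up for illustration, and rounding 7.5 trips up to 8 is an assumption about how the "roughly 8" was reached.

```ruby
# Figures taken from the text above.
DAILY_OUTPUT_ML    = 1500.0   # average urine output per day
BLADDER_TRIGGER_ML = 200.0    # volume at which the bladder signals
WORK_MINUTES       = 9 * 60   # 8:30am to 5:30pm
POPULATION         = 60
URINALS            = 3
SERVICE_MINUTES    = 2

def trips_per_day
  (DAILY_OUTPUT_ML / BLADDER_TRIGGER_ML).ceil           # 7.5 rounded up to 8 (assumed rounding)
end

def trips_per_workday
  (trips_per_day * WORK_MINUTES / (24 * 60.0)).round    # 8 * 540 / 1440 = 3
end

def probability_per_minute
  trips_per_workday / WORK_MINUTES.to_f                 # 3 out of 540
end

def expected_arrivals_per_minute
  POPULATION * probability_per_minute                   # 60 * 3 / 540
end

def capacity_per_minute
  URINALS / SERVICE_MINUTES.to_f                        # 3 urinals, 2 minutes each
end

puts "trips per working day: #{trips_per_workday}"
puts format('probability per minute: %.4f', probability_per_minute)
puts format('expected arrivals per minute: %.2f', expected_arrivals_per_minute)
puts format('capacity per minute: %.2f', capacity_per_minute)
```

On these figures, about one person in the office heads for the urinals every 3 minutes, against a capacity of 1.5 people per minute, so on average there is plenty of slack. The simulation is still useful for seeing how often arrivals bunch up.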

Armed with this model, let’s look at the simulation code:

require 'rubygems'
require 'faster_csv'

# population of 60 people on the same floor
population = 60
# period of time from 8:30am -> 5:30pm is 9 hours
period = 9
# 8 times a day -> 8 times in 24 hours -> 1 time every 3 hours
# within a period of 9 hours -> 3 times
# probability in every min -> 3 out of 9 * 60
times_per_period = 3
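# rate of servicing: everyone is cleared every 3 minutes
# (1.5 times the estimated 2 minutes, see the note after the code)
mu = 3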
# number of people in the queue
queue = 0
# number of stalls available
stalls = 3
# number of occupied stalls
occupied = 0

# total of all queue sizes for calculating average queue size
queue_total = 0
# max queue size for the day
max_queue_size = 0

FasterCSV.open('queue.csv', 'w') do |csv|
  csv << %w(t queue occupied)
  (period * 60).times do |t|
    # at every rate of service interval, vacate any and all people from the toilet and replace them with people in the queue
    if t % mu == 0
      occupied = 0
      while occupied < stalls and queue > 0
        occupied += 1
        queue -= 1
      end
    end
    population.times do
      # does the engineer want to relieve himself?
      if (rand(period * 60) + 1) <= times_per_period
        if occupied < stalls
          occupied += 1
        else
          queue += 1
        end
      end
    end

    csv << [t, queue, occupied]
    max_queue_size = queue if queue > max_queue_size
    queue_total = queue_total + queue
  end
end

puts "** Max queue size : #{max_queue_size}"
puts "** Total mins in queue by all : #{queue_total} mins"

Most of the code is quite self-explanatory. However, notice that mu is 3 (minutes) instead of the 2 we discussed earlier. The reason is that, to keep the simulation code simple, I’ve not tracked when each person entered: if we cleared the toilets every 2 minutes, people who entered during the 2nd minute (according to the simulation) would be cleared after just 1 minute. To compensate for this, I use a mu that is 1.5 times the estimated service time.
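A quick way to check the compensation is to work out how long people actually stay when everyone is cleared every mu minutes. The sketch below assumes arrivals are equally likely in every minute of the clearing cycle; `stay_length` and `average_stay` are illustrative names, not part of the simulation script.

```ruby
# Minutes a person occupies a urinal if they walk in at minute `t`
# and everyone is cleared at the next multiple of `mu`.
# The simulation clears first and admits arrivals afterwards, so someone
# arriving exactly on a clearing minute stays for the full `mu` minutes.
def stay_length(t, mu)
  mu - (t % mu)
end

# Average stay over one clearing cycle, assuming arrivals are equally
# likely in every minute of the cycle.
def average_stay(mu)
  (0...mu).map { |t| stay_length(t, mu) }.sum / mu.to_f
end

[2, 3].each do |mu|
  puts "mu = #{mu}: average stay #{average_stay(mu)} minutes"
end
```

Under that assumption, clearing every 2 minutes gives an average stay of only 1.5 minutes, while clearing every 3 minutes gives an average of 2 minutes, which matches the estimate for a single visit.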

The simulation writes to queue.csv, which contains the queue size and toilet occupancy at every interval. These are the results:

** Max queue size : 3
** Total mins in queue by all : 17 mins

I opened the CSV up and used the data to create a simple histogram:

Queue size   Number of minutes   Percentage
0            526                 97.6%
1            9                   1.8%
2            3                   0.4%
3            2                   0.2%
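A tally like this can also be produced in Ruby straight from the contents of queue.csv. The following is a minimal sketch rather than the method used for the table above: `queue_histogram` is a hypothetical helper, it splits lines on commas by hand because the file only holds integers, and the sample rows are made up rather than taken from the real run.

```ruby
# Tally how many minutes were spent at each queue size.
# `csv_text` is the content of queue.csv: a header row, then
# one "t,queue,occupied" row per minute.
def queue_histogram(csv_text)
  rows = csv_text.lines.drop(1)
                 .reject { |line| line.strip.empty? }
                 .map { |line| line.strip.split(',').map(&:to_i) }
  counts = Hash.new(0)
  rows.each { |_t, queue, _occupied| counts[queue] += 1 }
  total = rows.size
  # one [queue size, minutes, percentage] entry per queue size, smallest first
  counts.sort.map do |size, minutes|
    [size, minutes, (100.0 * minutes / total).round(1)]
  end
end

# Made-up sample: five minutes of data, not the real simulation output.
sample = <<~CSV
  t,queue,occupied
  0,0,1
  1,0,2
  2,1,3
  3,2,3
  4,0,1
CSV

queue_histogram(sample).each do |size, minutes, percent|
  puts format('%-12d %-19d %.1f%%', size, minutes, percent)
end
```

For the real data, the text would come from `File.read('queue.csv')`.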

From the results it’s pretty obvious that the architect and engineers who built Fusionopolis knew their stuff. These were the quantitative conclusions:

  1. Queues were formed less than 3% of the time of the day
  2. The maximum number of people in the queue was 3, and that happened for no more than 2 minutes in a day
  3. The total time spent by everyone in the office, in a queue to the toilet was 17 mins

That was for the men’s urinals; let’s look at the stalls now. The same simulation script can be used with a few parameters changed: instead of mu = 3 we use mu = 8, instead of 3 urinals we have 2 stalls, and instead of 3 times over the period we use 1 time over the working period.
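Since only three numbers change between the two runs, the script can be wrapped in a method. The sketch below keeps the same logic but is not the original script: `simulate`, `summarize` and the keyword arguments are names introduced here, the CSV output is left out, and a seeded `Random` is passed in so runs are repeatable, which means its numbers will not match the results quoted in this post.

```ruby
# Same logic as the simulation script, wrapped in a method so the
# parameters can be swapped. Returns the queue size recorded for every minute.
def simulate(stalls:, mu:, times_per_period:, population: 60, period: 9, rng: Random.new)
  minutes = period * 60
  queue = 0
  occupied = 0
  history = []
  minutes.times do |t|
    # every mu minutes, vacate everyone and let queued people in
    if (t % mu).zero?
      occupied = 0
      while occupied < stalls && queue > 0
        occupied += 1
        queue -= 1
      end
    end
    population.times do
      # does this person need to go in this minute?
      next unless rng.rand(minutes) + 1 <= times_per_period
      if occupied < stalls
        occupied += 1
      else
        queue += 1
      end
    end
    history << queue
  end
  history
end

def summarize(label, history)
  puts "#{label}: max queue #{history.max}, total mins in queue #{history.sum}"
end

summarize('urinals', simulate(stalls: 3, mu: 3, times_per_period: 3, rng: Random.new(1)))
summarize('stalls',  simulate(stalls: 2, mu: 8, times_per_period: 1, rng: Random.new(1)))
```

Because each run is random, it is worth calling `simulate` with several different seeds and comparing the summaries before reading too much into any single day.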

These are the results:

** Max queue size : 3
** Total mins in queue by all : 81 mins

Queue size   Number of minutes   Percentage
0            485                 89.8%
1            33                  6.1%
2            21                  3.9%
3            1                   0.2%

The problem seems a bit more serious now, though not significantly so: the chances of being in a queue are higher (about 10%).

In conclusion, I was wrong but I’m glad to be wrong in this case. Waiting in a toilet queue is seriously not a fun thing to do.