Creating a photo mosaic web app

A few months ago, my good friend Satish Talim had this great idea to create a series of Go challenges to help Go programmers to up their game. The idea was to create a programming problem every month (or so) that presents a fresh and interesting challenge to the Go community. There would be prizes to win, but more importantly, it was a community effort to help each other to improve ourselves. Satish asked me to write a challenge, and I readily agreed to create challenge #3.

Being a web application programmer for most of my career, it was the most natural thing to do to create a challenge that’s based on creating a web application. And some time back, I wrote a mosaic-generating script using Ruby during a hackathon, so I thought to marry both ideas together to create a challenge to create photo mosaic web app.

To be honest, at the time of issuing the challenge I haven’t actually written the photo mosaic web app yet. In fact, I only started writing it after the challenge was over. It took me the better part of 2 days to complete the photo mosaic web app. But I wasn’t finished yet, and I wanted to go a bit further and use Go’s concurrency to improve its performance. This blog post is the results of what I did.

The code for this blog post is found at https://github.com/sausheong/mosaic. Note that the code in GitHub has slight variations from the code described below.

The live site is found at http://mosaic.saush.com. It is deployed using Docker to Digital Ocean, through Tutum. The performance on the live site is not as good as described here, as it is only running on 1 CPU VM with 512MB.

Creating the photo mosaic

A photographic mosaic, or a photo mosaic is a picture (usually a photograph) that has been divided into (usually equal sized) rectangular sections, each of which is replaced with another picture (called a tile picture). If we view it from far away or if you squint at it, then the original picture can be seen. If we look closer though, we will see that the picture is in fact made up of many hundreds or thousands of smaller tile pictures.

The basic idea is simple – the web application allows a user to upload a target picture, which will be used to create a photo mosaic. To make things simple, I will assume that tile pictures are already available and are correctly sized.

Let’s start with the photo mosaic algorithm. The steps are simple and the whole web application can be written without the use of any third-party libraries.

  1. Build up a tile database, hash of tile pictures, by scanning a directory of pictures, then using the file name as the key and the average color of the picture as the value. The average color is a 3-tuple calculated from getting the red, green and blue (RGB) of every pixel and adding up all the reds, greens and blues, then divided by the total number of pixels
  2. Cut the target picture into smaller pictures of the correct tile size
  3. For every tile-sized piece of the target picture, assume the average color to the the color of the top left pixel of that piece
  4. Find the corresponding tile in the tile database that is the nearest match to the average color of the piece of the target picture, and place that tile in the corresponding position in the photo mosaic. To find the nearest match, we calculate the Euclidean distance between the two color 3-tuples by converting each color 3-tuple into a point in a 3-dimensional space
  5. Remove the tile from the tile database so that each tile in the photo mosaic is unique

I placed all the mosaic creating code in a single source file named mosaic.go. Let’s look at each function in this file.

// find the average color of the picture
func averageColor(img image.Image) [3]float64 {
    bounds := img.Bounds()
    r, g, b := 0.0, 0.0, 0.0
    for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
        for x := bounds.Min.X; x < bounds.Max.X; x++ {
            r1, g1, b1, _ := img.At(x, y).RGBA()
            r, g, b = r+float64(r1), g+float64(g1), b+float64(b1)
        }
    }
    totalPixels := float64(bounds.Max.X * bounds.Max.Y)
    return [3]float64{r / totalPixels, g / totalPixels, b / totalPixels}
}

First is the averageColor function, which takes the red, green and blue of each pixel in the image, adds up all the reds, greens and blues and then divides each sum by the total number of pixels in the image. Then we create a 3-tuple (actually a 3 element array) consisting of these numbers.
Next, we have the resize function. The resize function resizes an image to a new width.

// resize an image to its new width
func resize(in image.Image, newWidth int) image.NRGBA {
    bounds := in.Bounds()
    width := bounds.Max.X - bounds.Min.X
    ratio := width / newWidth
    out := image.NewNRGBA(image.Rect(bounds.Min.X/ratio, bounds.Min.X/ratio, bounds.Max.X/ratio, bounds.Max.Y/ratio))
    for y, j := bounds.Min.Y, bounds.Min.Y; y < bounds.Max.Y; y, j = y+ratio, j+1 {
        for x, i := bounds.Min.X, bounds.Min.X; x < bounds.Max.X; x, i = x+ratio, i+1 {
            r, g, b, a := in.At(x, y).RGBA()
            out.SetNRGBA(i, j, color.NRGBA{uint8(r), uint8(g), uint8(b), uint8(a)})
        }
    }
    return *out
}

The tileDB function creates a database of the tile picture by scanning the directory where the tile pictures are located.

// populate a tiles database in memory
func tilesDB() map[string][3]float64 {
    fmt.Println("Start populating tiles db ...")
    db := make(map[string][3]float64)
    files, _ := ioutil.ReadDir("tiles")
    for _, f := range files {
        name := "tiles/" + f.Name()
        file, err := os.Open(name)
        if err == nil {
            img, _, err := image.Decode(file)
            if err == nil {
                db[name] = averageColor(img)
            } else {
                fmt.Println(":", err, name)
            }
        } else {
            fmt.Println("cannot open file", name, err)
        }
        file.Close()
    }
    fmt.Println("Finished populating tiles db.")
    return db
}

The tile database is a map with a string as the key a 3-tuple (in this case a 3-element array) as the value. I open up each image file in the directory and then get the average color of the image to create an entry in the map. The tile database is used to find the correct tile picture in the tile picture directory. It is passed into the nearest function, along with the target color 3-tuple.

// find the nearest matching image
func nearest(target [3]float64, db *map[string][3]float64) string {
    var filename string
    smallest := 1000000.0
    for k, v := range *db {
        dist := distance(target, v)
        if dist < smallest {
            filename, smallest = k, dist
        }
    }
    delete(*db, filename)
    return filename
}

Each entry in the tile database is compared with the target color and the entry with the smallest distance is returned as the nearest tile, and also removed from the tile database. The distance function calculates the Euclidean distance between two 3-tuples.

// find the Eucleadian distance between 2 points
func distance(p1 [3]float64, p2 [3]float64) float64 {
    return math.Sqrt(sq(p2[0]-p1[0]) + sq(p2[1]-p1[1]) + sq(p2[2]-p1[2]))
}

// find the square
func sq(n float64) float64 {
    return n * n
}

Finally, scanning and loading the tile database every time a photo mosaic is created can be pretty cumbersome. I want to do that only once, and clone the tile database every time a photo mosaic is created. The source tile database, TILEDB is then created as a global variable and populated on the start of the web application.

var TILESDB map[string][3]float64

// clone the tile database each time we generate the photo mosaic
func cloneTilesDB() map[string][3]float64 {
    db := make(map[string][3]float64)
    for k, v := range TILESDB {
        db[k] = v
    }
    return db
}

The photo mosaic web application

With the mosaic-generating functions in place, I can start writing my web application. The web application is placed in a source code file named main.go.

package main

import (
    "fmt"
    "html/template"
    "net/http"
    "bytes"
    "encoding/base64"
    "image"
    "image/draw"
    "image/jpeg"
    "os"
    "strconv"
)

func main() {
    mux := http.NewServeMux()
    files := http.FileServer(http.Dir("public"))
    mux.Handle("/static/", http.StripPrefix("/static/", files))
    mux.HandleFunc("/", upload)
    mux.HandleFunc("/mosaic ", mosaic)
    server := &http.Server{
        Addr:    "127.0.0.1:8080",
        Handler: mux,
    }
// building up the source tile database
    TILESDB = tilesDB()
    fmt.Println("Mosaic server started.")
    server.ListenAndServe()
}

// to display the upload page
func upload(w http.ResponseWriter, r *http.Request) {
    t, _ := template.ParseFiles("upload.html")
    t.Execute(w, nil)
}

// the HandlerFunc that contains all the photo mosaic generating algorithms
func mosaic(w http.ResponseWriter, r *http.Request) {
    t0 := time.Now()
    // get the content from the POSTed form
    r.ParseMultipartForm(10485760) // max body in memory is 10MB
    file, _, _ := r.FormFile("image")
    defer file.Close()
    tileSize, _ := strconv.Atoi(r.FormValue("tile_size"))
    // decode and get original image
    original, _, _ := image.Decode(file)
    bounds := original.Bounds()
    // create a new image for the mosaic
    newimage := image.NewNRGBA(image.Rect(bounds.Min.X, bounds.Min.X, bounds.Max.X, bounds.Max.Y))
    // build up the tiles database
    db := cloneTilesDB()
    // source point for each tile, which starts with 0, 0 of each tile
    sp := image.Point{0, 0}
    for y := bounds.Min.Y; y < bounds.Max.Y; y = y + tileSize {
        for x := bounds.Min.X; x < bounds.Max.X; x = x + tileSize {
            // use the top left most pixel as the average color
            r, g, b, _ := original.At(x, y).RGBA()
            color := [3]float64{float64(r), float64(g), float64(b)}
            // get the closest tile from the tiles DB
            nearest := nearest(color, &db)
            file, err := os.Open(nearest)
            if err == nil {
                img, _, err := image.Decode(file)
                if err == nil {
                    // resize the tile to the correct size 
                    t := resize(img, tileSize)
                    tile := t.SubImage(t.Bounds())
                    tileBounds := image.Rect(x, y, x+tileSize, y+tileSize)
                    // draw the tile into the mosaic
                    draw.Draw(newimage, tileBounds, tile, sp, draw.Src)
                } else {
                    fmt.Println("error:", err, nearest)
                }
            } else {
                fmt.Println("error:", nearest)
            }
            file.Close()
        }
    }

    buf1 := new(bytes.Buffer)
    jpeg.Encode(buf1, original, nil)
    originalStr := base64.StdEncoding.EncodeToString(buf1.Bytes())

    buf2 := new(bytes.Buffer)
    jpeg.Encode(buf2, newimage, nil)
    mosaic := base64.StdEncoding.EncodeToString(buf2.Bytes())
    t1 := time.Now()
    images := map[string]string{
        "original": originalStr,
        "mosaic":   mosaic,
        "duration": fmt.Sprintf("%v ", t1.Sub(t0)),
    }
    t, _ := template.ParseFiles("results.html")
    t.Execute(w, images)
}

The main logic for creating the photo mosaic is in the mosaic function, which is a handler function. First, I get the uploaded file and also the tile size from the form.

// get the content from the POSTed form
r.ParseMultipartForm(10485760) // max body in memory is 10MB
file, _, _ := r.FormFile("image")
defer file.Close()
tileSize, _ := strconv.Atoi(r.FormValue("tile_size"))

Next, I decode the uploaded target image, and also create a new photo mosaic image.

// decode and get original image
original, _, _ := image.Decode(file)
bounds := original.Bounds()
// create a new image for the mosaic
newimage := image.NewNRGBA(image.Rect(bounds.Min.X, bounds.Min.X, bounds.Max.X, bounds.Max.Y))

I also clone the source tile database, and set up the source point for each tile (this is needed by the image/draw package later).

// build up the tiles database
db := cloneTilesDB()
// source point for each tile, which starts with 0, 0 of each tile
sp := image.Point{0, 0}

I am now ready to iterate through each tile-sized piece of the target image.

for y := bounds.Min.Y; y < bounds.Max.Y; y = y + tileSize {
    for x := bounds.Min.X; x < bounds.Max.X; x = x + tileSize {
        // use the top left most pixel color as the average color
        r, g, b, _ := original.At(x, y).RGBA()
        color := [3]float64{float64(r), float64(g), float64(b)}
        // get the closest tile from the tiles DB
        nearest := nearest(color, &db)
        file, err := os.Open(nearest)
        if err == nil {
            img, _, err := image.Decode(file)
            if err == nil {
                // resize the tile to the correct size 
                t := resize(img, tileSize)
                tile := t.SubImage(t.Bounds())
                tileBounds := image.Rect(x, y, x+tileSize, y+tileSize)
                // draw the tile into the mosaic
                draw.Draw(newimage, tileBounds, tile, sp, draw.Src)
            } else {
                fmt.Println("error:", err, nearest)
            }
        } else {
            fmt.Println("error:", nearest)
        }
        file.Close()
    }
}

For every piece, I pick the top left pixel and assume that’s the average color. Then I find the nearest tile in the tile database that matches this color. The tile database gives me a filename, so I open up the tile picture, and resize it to the tile size. The resultant tile is drawn into the photo mosaic I created earlier.

Once the photo mosaic is created, I encode it into JPEG format, then encode it once again into a base64 string.

    buf1 := new(bytes.Buffer)
    jpeg.Encode(buf1, original, nil)
    originalStr := base64.StdEncoding.EncodeToString(buf1.Bytes())

    buf2 := new(bytes.Buffer)
    jpeg.Encode(buf2, newimage, nil)
    mosaic := base64.StdEncoding.EncodeToString(buf2.Bytes())
    t1 := time.Now()
    images := map[string]string{
        "original": originalStr,
        "mosaic":   mosaic,
        "duration": fmt.Sprintf("%v ", t1.Sub(t0)),
    }
    t, _ := template.ParseFiles("results.html")
    t.Execute(w, images)

The original target picture and the photo mosaic are then sent to the results.html template to be displayed on the next page. As you can see, the image is displayed with using a data URL with the base64 content that is embedded in the web page itself.

<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Mosaic</title>
    ...
  </head>   
  <body>
    <div class='container'>
        <div class="col-md-6">
          <img src="data:image/jpg;base64,{{ .original }}" width="100%">
          <div class="lead">Original</div>
        </div>
        <div class="col-md-6">
          <img src="data:image/jpg;base64,{{ .mosaic }}" width="100%">
          <div class="lead">Mosaic – {{ .duration }} </div>
        </div>
        <div class="col-md-12 center">
          <a class="btn btn-lg btn-info" href="/">Go Back</a>
        </div>
    </div>   
    <br>
  </body>
</html>

Here’s a screenshot of the mosaic that’s created.

09-01
Basic photo mosaic web app

Now that we have the basic mosaic generating web application, let’s create the concurrent version of it.

Concurrent photo mosaic web application

One of the more frequent use of concurrency is to improve performance. The web application I just showed created a mosaic from a 151KB JPEG image in about 2.25 seconds. The performance is not really fantastic and can definitely be improved using some concurrency. The algorithm I am using in this example to build some concurrency into the photo mosaic web application is quite straightforward.

  1. Split the original image into 4 quarters
  2. Process them at the same time
  3. Combine the results back into a single mosaic

From a diagrammatic point of view:

Concurrency algorithm
Concurrency algorithm

A word of caution – this is not the only way that performance can be improved or concurrency can be achieved, but only one relatively simple and straightforward way.

The only function that changes in this is the mosaic handler function. In the earlier program, I had a single handler function that created the photo mosaic. In the concurrent version of the photo mosaic web application, I need to break up that function into two separate functions, called cut and combine respectively. Both the cut and the combine functions are called from the mosaic handler function.

func mosaic(w http.ResponseWriter, r *http.Request) {
    t0 := time.Now()
    r.ParseMultipartForm(10485760) // max body in memory is 10MB
    file, _, _ := r.FormFile("image")
    defer file.Close()
    tileSize, _ := strconv.Atoi(r.FormValue("tile_size"))
    original, _, _ := image.Decode(file)
    bounds := original.Bounds()
    db := cloneTilesDB()

    // fan-out
    c1 := cut(original, &db, tileSize, bounds.Min.X, bounds.Min.Y, bounds.Max.X/2, bounds.Max.Y/2)
    c2 := cut(original, &db, tileSize, bounds.Max.X/2, bounds.Min.Y, bounds.Max.X, bounds.Max.Y/2)
    c3 := cut(original, &db, tileSize, bounds.Min.X, bounds.Max.Y/2, bounds.Max.X/2, bounds.Max.Y)
    c4 := cut(original, &db, tileSize, bounds.Max.X/2, bounds.Max.Y/2, bounds.Max.X, bounds.Max.Y)

    // fan-in
    c := combine(bounds, c1, c2, c3, c4)

    buf1 := new(bytes.Buffer)
    jpeg.Encode(buf1, original, nil)
    originalStr := base64.StdEncoding.EncodeToString(buf1.Bytes())

    t1 := time.Now()
    images := map[string]string{
        "original": originalStr,
        "mosaic":   <-c,
        "duration": fmt.Sprintf("%v ", t1.Sub(t0)),
    }

    t, _ := template.ParseFiles("results.html")
    t.Execute(w, images)
}

Cutting up the image is handled by the cut function, in what is known as the fan-out pattern.

Splitting the target picture into 4 quadrants
Splitting the target picture into 4 quadrants

The original image is cut up into 4 quadrants to be processed separately.

c1 := cut(original, &db, tileSize, bounds.Min.X, bounds.Min.Y, bounds.Max.X/2, bounds.Max.Y/2)
c2 := cut(original, &db, tileSize, bounds.Max.X/2, bounds.Min.Y, bounds.Max.X, bounds.Max.Y/2)
c3 := cut(original, &db, tileSize, bounds.Min.X, bounds.Max.Y/2, bounds.Max.X/2, bounds.Max.Y)
c4 := cut(original, &db, tileSize, bounds.Max.X/2, bounds.Max.Y/2, bounds.Max.X, bounds.Max.Y)

You might notice that these are regular functions and not goroutines, how can they run concurrently? The answer is because the cut function creates an anonymous goroutine and returns a channel.

func cut(original image.Image, db *map[string][3]float64, tileSize, x1, y1, x2, y2 int) <-chan image.Image {
    c := make(chan image.Image)
    sp := image.Point{0, 0}
    go func() {
        newimage := image.NewNRGBA(image.Rect(x1, y1, x2, y2))
        for y := y1; y < y2; y = y + tileSize {
            for x := x1; x < x2; x = x + tileSize {
                r, g, b, _ := original.At(x, y).RGBA()
                color := [3]float64{float64(r), float64(g), float64(b)}
                nearest := nearest(color, db)
                file, err := os.Open(nearest)
                if err == nil {
                    img, _, err := image.Decode(file)
                    if err == nil {
                        t := resize(img, tileSize)
                        tile := t.SubImage(t.Bounds())
                        tileBounds := image.Rect(x, y, x+tileSize, y+tileSize)
                        draw.Draw(newimage, tileBounds, tile, sp, draw.Src)
                    } else {
                        fmt.Println("error:", err)
                    }
                } else {
                    fmt.Println("error:", nearest)
                }
                file.Close()
            }
        }
        c <- newimage.SubImage(newimage.Rect)
    }()
    return c
}

The logic is exactly the same as in the original photo mosaic web application. I created a channel in the cut function and started up an anonymous goroutine that sends the results to this channel, then return the channel. This way, the channel is immediately returned to the mosaic handler function, and the completed photo mosaic segment is sent to the channel when the processing is done. You might notice that while I’ve created the return channel as a bi-directional channel, I can typecast it to be returned as a receive-only channel.

I’ve cut the original image into 4 separate pieces and convert each piece into a part of a photo mosaic. It’s time to put them together again, using what is commonly known as the fan-in pattern, in the combine function.

func combine(r image.Rectangle, c1, c2, c3, c4 <-chan image.Image) 
<-chan string {
    c := make(chan string)
    // start a goroutine
    go func() {
        var wg sync.WaitGroup
        img:= image.NewNRGBA(r)
        copy := func(dst draw.Image, r image.Rectangle, 
src image.Image, sp image.Point) {
            draw.Draw(dst, r, src, sp, draw.Src)
            wg.Done()
        }
        wg.Add(4)
        var s1, s2, s3, s4 image.Image
        var ok1, ok2, ok3, ok4 bool
        for  {
            select {
            case s1, ok1 = <-c1:
                go copy(img, s1.Bounds(), s1,
                    image.Point{r.Min.X, r.Min.Y})
            case s2, ok2 = <-c2:
                go copy(img, s2.Bounds(), s2, image.Point{r.Max.X / 2, r.Min.Y})
            case s3, ok3 = <-c3:
                go copy(img, s3.Bounds(), s3, image.Point{r.Min.X, r.Max.Y/2})
            case s4, ok4 = <-c4:
                go copy(img, s4.Bounds(), s4, image.Point{r.Max.X / 2, r.Max.Y / 2})
            }
            if (ok1 && ok2 && ok3 && ok4) {
                break
            }
        }
        // wait till all copy goroutines are complete
        wg.Wait()
        buf2 := new(bytes.Buffer)
        jpeg.Encode(buf2, newimage, nil)
        c <- base64.StdEncoding.EncodeToString(buf2.Bytes())
    }()
    return c
}

As in the cut function, the main logic in combining the images are in an anonymous goroutine, and I create and return a receive-only channel. As a result, I can encode the original image while combining the 4 photo mosaic segments.

In the anonymous goroutine, I create another anonymous function and assign it to a variable copy. This function copies a photo mosaic segment into the final photo mosaic and will be run as a goroutine later. Because the copy function is called as a goroutine, I will not be able to control when they complete. To synchronize the completion of the copying, I use a WaitGroup. I create a WaitGroup wg, then set the counter to 4 using the Add method. Each time the copy function completes, it will decrement the counter using the Done method. I call the Wait method just before encoding the image to allow all the copy goroutines to complete and I actually have a complete photo mosaic image.

Remember that the input to the combine function includes the 4 channels coming from the cut function containing the photo mosaic segments, and I don’t know when the channels actually have segments. I could try to receive each one of those channels in sequence, but that wouldn’t be very concurrent. What I would like to do is to start processing whichever segment that comes first and the select statement fits the bill nicely.

var s1, s2, s3, s4 image.Image
var ok1, ok2, ok3, ok4 bool
for  {
select {
    case s1, ok1 = <-c1:
        go copy(img, s1.Bounds(), s1, image.Point{r.Min.X, r.Min.Y})
    case s2, ok2 = <-c2:
        go copy(img, s2.Bounds(), s2, image.Point{r.Max.X / 2, r.Min.Y})
    case s3, ok3 = <-c3:
        go copy(img, s3.Bounds(), s3, image.Point{r.Min.X, r.Max.Y / 2})
    case s4, ok4 = <-c4:
        go copy(img, s4.Bounds(), s4, image.Point{r.Max.X / 2, r.Max.Y / 2})
    }
    if (ok1 && ok2 && ok3 && ok4) {
        break
    }
}

I loop indefinitely and in each iteration, I select the channel that is ready with a value (or if more than one is available, Go randomly assigns me one). I use the image from this channel and start a goroutine with the copy function. Note that I’m using the multi-value format for receiving values from the channel, meaning the second variable (ok1, ok2, ok3 or ok4) tells me if I have successfully received from the channel. The for loop breaks once I have received successfully on all channels.

Moving on, and referring to the WaitGroup wg I used earlier, remember that even though I received all the photo mosaic segments successfully, I have in turn started 4 separate goroutines, which might not have completed at that point in time. The Wait method on the WaitGroup wg blocks the encoding of the assembled photo mosaic until the photo mosaic is completed.

Here’s a screenshot of the results, using the same target picture and tile pictures.

Photo mosaic web application with concurrency
Photo mosaic web application with concurrency

If you’re sharp-eyed, you might see the slight differences in the photo mosaic that’s generated. The final photo mosaic is assembled from 4 separate pieces and the algorithm doesn’t smoothen out the rough edges. However, you can see the difference in performance – where the basic photo mosaic web application took 2.25 seconds, this one using concurrency only takes almost a quarter of that time, around 646 miliseconds.

For readers who are really observant you might realize that both web applications are actually running on just one CPU! As mentioned by Rob Pike, concurrency is not parallelism – what I’ve shown you is how we can take a simple algorithm and break it down into a concurrent one, with no parallelism involved! None of the goroutines are actually running in parallel (since there is only one CPU) even though they are running independently.
Of course it would be cruel not to go the last step and show how it actually runs using multiple CPUs. To do this, I simply need to set the GOMAXPROCS in the runtime to the actual number of CPUs running on my system. The changes are in the main.go file. Remember to import the runtime package before making the change below.

func main() {
    // this prints out the number of CPUs in my system
    fmt.Println("Number of CPUs:", runtime.NumCPU())
    runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Println("Starting mosaic server ...")
    mux := http.NewServeMux()
    files := http.FileServer(http.Dir("public"))
    mux.Handle("/static/", http.StripPrefix("/static/", files))
    mux.HandleFunc("/", upload)
    mux.HandleFunc("/mosaic", mosaic)
    server := &http.Server{
        Addr:    "127.0.0.1:8080",
        Handler: mux,
    }
    TILESDB = tilesDB()
    fmt.Println("Mosaic server started.")
    server.ListenAndServe()
}

I compile and then upload the same target picture again.

Photo mosaic web application with concurrency and 8 CPUs
Photo mosaic web application with concurrency and 8 CPUs

As you can see, the performance has improved 3 times, from 646 milliseconds to 216 milliseconds! And if we compare that with our original photo mosaic web application with 2.25 seconds, that’s a 10 times performance improvement! That is an actual comparison, even though we did not run it with 8 CPUs previously, our original photo mosaic web application is not concurrent and therefore can only use one CPU – giving it 8 CPUs make no difference at all.
What is also interesting to note is that there is no difference between the original and the concurrent web applications, in terms of the photo mosaic algorithm. In fact, between the two applications, the mosaic.go source file was not modified at all. The only difference is concurrency, and that is a testament to how powerful it is.

Advertisements

HTML Forms and Go

This is an excerpt out of my new Go Web Programming book that talks about using Go to process HTML forms sent from the browser. This sounds pretty trivial but as in much of web programming (and much of programming per se), it’s often the trivial things that stumble us.

Go Web Programming

Before we get into getting form data from a POST request, let’s take a closer look into HTML forms and see what they are. Most of the time, POST requests come in the form (pun intended) of a HTML form and often look like this:

<form action="/process" method="post">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

Within the form tag, we place a number of HTML form elements like text input, text area, radio buttons, checkboxes, file uploads and so on. These elements allow users to enter data to be submitted to the server. Data is submitted to the server when the user clicks a button or somehow triggers the form submission.

We know the data is sent to the server through a HTTP POST request, and is placed in the body of the request. But how is the data formatted? The HTML form data is always sent as name-value pairs but how are these name-value pairs formatted in the POST body? It is important for us to know this because as we receive the POST request from the browser, we need to be able to parse the data and extract the name-value pairs.
The format of the name-value pairs sent through a POST request is specified by the content type of the HTML form. This is defined using the enctype attribute like this:

<form action="/process" method="post" enctype="application/x-www-form-urlencoded">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

The default values for enctype is application/x-www-form-urlencoded but browsers are required to support at least application/x-www-form-urlencoded and multipart/form-data (HTML5 also supports a text/plain value).

If we set enctype to application/x-www-form-urlencoded, the browser will encode the HTML form data a long query string with the name-value pairs separated by an ampersand (&) and the name is separated from the values by an equal (=), that is the same as URL encoding, hence the name. In other words, the HTTP body will look something like this:

first_name=sau%20sheong&last_name=chang

If we set enctype to multipart/form-data, each name-value pair is converted into a MIME message part, each with its own content type and content disposition. For example, the same form data as above will now look something like this:

------WebKitFormBoundaryMPNjKpeO9cLiocMw
Content-Disposition: form-data; name="first_name"

sau sheong
------WebKitFormBoundaryMPNjKpeO9cLiocMw
Content-Disposition: form-data; name="last_name"

chang
------WebKitFormBoundaryMPNjKpeO9cLiocMw--

When would we use either one or the other? If we’re sending simple text data, the URL encoded form is better as it is simpler, more efficient and less processing is needed. If we’re sending large amounts of data, especially when uploading files the multipart-MIME form is better. We can even specify to do base64 encoding to send binary data as text.

So far we’ve only talked about POST requests, what about GET requests in a HTML form? HTML allows the method attribute to be either POST or GET, so this is also a valid format.

<form action="/process" method="get">
<input type="text" name="first_name"/>
<input type="text" name="last_name"/>
<input type="submit"/>
</form>

In this case, there is no request body (GET requests have no request body), all the data are set in the URL as name-value pairs.

Now that we know how data is sent from a HTML form to the server, let’s go back to the server and see how we use net/http to process the request.

Form

One way to extract data from the HTTP request is to extract data from the URL and the body in the raw form, which requires us to parse the data ourselves. However we normally do not need to, because the net/http library provides us with a rather comprehensive set of functions, although not named entirely correctly, normally provides us with all we need. Let’s talk about each one of them in turn.

The functions in Request that allows us to extract data from the URL and/or the body revolve around the Form, PostForm and MultipartForm fields. The data are in the form of key-value pairs (which is what we normally get from a POST request anyway). The general algorithm is:

  • Call ParseForm or ParseMultipartForm to parse the request
  • Access Form, PostForm or MultipartForm accordingly

Let’s take a look at some code.

package main

import (
"fmt"
"net/http"
)

func process(w http.ResponseWriter, r *http.Request) {
r.ParseForm()
fmt.Fprintln(w, r.Form)
}

func main() {
server := http.Server{
Addr: "127.0.0.1:8080",
}
http.HandleFunc("/process", process)
server.ListenAndServe()
}

The focus of this server is on these 2 lines:

r.ParseForm()
fmt.Fprintln(w, r.Form)

As mentioned earlier, we need to first parse the request using ParseForm, and then access the Form field.

Let’s take a look at the client that is going to call this server. We’ll create a simple, minimal HTML form to send the request to the server. Place the code in a file named client.html.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Go Web Programming</title>
</head>
<body>
<form action="http://127.0.0.1:8080/process?hello=world&thread=123" method="post" enctype="application/x-www-form-urlencoded">
<input type="text" name="hello" value="sau sheong"/>
<input type="text" name="post" value="456"/>
<input type="submit"/>
</form>
</body>
</html>

In this form we are:

  • Sending the URL http://localhost:8080/process?hello=world&thread=123 to the server using the POST method
  • Specifying the content type (in the enctype field) to be application/x-www-form-urlencoded
  • Sending 2 HTML form key-value pairs – hello=sau sheong and post=456 to the server

Note that we have 2 values for the key hello. One of them is world in the URL and the other is sau sheong in the HTML form.

Open the client.html file directly in your browser (you don’t need to serve it out from a web server, just running it locally on your browser is fine) and click on the submit button. What you will see on the browser is:

map[thread:[123] hello:[sau sheong world] post:[456]]

This is the raw string converted version of the Form struct in the POST request, after the request has been parsed. The Form struct is a map, which keys are strings and values are a slice of strings. Notice that the map is not sorted so you might get a different sorting of the returned values. Nonetheless what we get is the combination of the query values hello=world and thread=123 as well as form values hello=sau sheong and post=456. As you can see, the values are URL decoded (there is a space between sau and sheong).

PostForm

Of course if you wanted to just get the value to the key post, you can use r.Form["post"] which will give you a map with 1 element – [456]. If the form and the URL have the same key, both of them will be placed in a slice, with the form value always prioritized before the URL value.

What if we need just the form key-value pairs and want to totally ignore the URL key-value pairs? For this we have the PostForm, which only provides key-value pairs for the form and not the URL. If we change from using r.Form to using r.PostForm in the code this is what we get:

map[post:[456] hello:[sau sheong]]

We used application/x-www-form-urlencoded for the content type. What happens if we use multipart/form-data? Make the change to the client HTML form, switch back to using r.Form and let’s find out:

map[hello:[world] thread:[123]]

What happened here? We only get the URL query key-value pairs this time and not the form key-value pairs, because PostForm only supports application/x-www-form-urlencoded. To get multipart key-value pairs from the body, we need to use the MultipartForm.

MultipartForm

Instead of using ParseForm and then calling Form on the request, we have to use ParseMultipartForm then use MultipartForm on the request. ParseMultipartForm also calls ParseForm when necessary.

r.ParseMultipartForm(1024)
fmt.Fprintln(w, r.MultipartForm)

We need to tell ParseMultipartForm how much data we want to extract from the multipart form, in bytes. Now let’s see what happens:

&{map[hello:[sau sheong] post:[456]] map[]}

This time we see the form key-value pairs, but not the URL key-value pairs. This is because MultipartForm only contains the form key-value pairs. Notice that the returned value is no longer a map, but a struct that contains 2 maps. The first map has keys that are strings and values that are slices of string while the second map is empty. It’s empty because it’s a map with keys that are strings but values that are files.

There is one last set of functions that allows us to access the key-value pairs even easier than what we’ve just went through. The FormValue function allows us to access the key-value pairs just like in Form, except that it is for a specific key and we don’t need to call ParseForm or ParseMultipartForm beforehand – the FormValue function does that for us.

From our previous example, this means if we do this in our handler function:

fmt.Fprintln(w, r.FormValue("hello"))

And we set the client’s form enctype to application/x-www-form-urlencoded, we will get this:

sau sheong

We get only sau sheong because FormValue only retrieves the first value, even though we actually have both values in the Form struct. To prove this, let’s add another line below the earlier line of code, like this:

fmt.Fprintln(w, r.FormValue("hello"))
fmt.Fprintln(w, r.Form)

This time we’ll see:

sau sheong
map[post:[456] hello:[sau sheong world] thread:[123]]

The PostFormValue function does the same thing, except that it is for PostForm instead of Form. Let’s make some changes to the code to use the PostFormValue function:

fmt.Fprintln(w, r.PostFormValue("hello"))
fmt.Fprintln(w, r.PostForm)

This time we get this instead:

sau sheong
map[hello:[sau sheong] post:[456]]

As you can see we get only the form key-value pairs.

Both FormValue and PostFormValue call ParseMultipartForm for us so we don’t need to call it ourselves, but there’s a slightly confusing gotcha that you should be careful with (at least as of Go 1.4). If we set the client form’s enctype to be multipart/form-data and try to get the value using either FormValue or PostFormValue, we won’t be able to get it even though MultipartForm has been called!

To be clearer, let’s make some changes to the server’s handler function again:

fmt.Fprintln(w, "(1)", r.FormValue("hello"))
fmt.Fprintln(w, "(2)", r.PostFormValue("hello"))
fmt.Fprintln(w, "(3)", r.PostForm)
fmt.Fprintln(w, "(4)", r.MultipartForm)

This is our result from using our form with enctype set to multipart/form-data:

(1) world
(2)
(3) map[]
(4) &{map[hello:[sau sheong] post:[456]] map[]}

The first line in the results gives us the value for hello that’s found in the URL and not the form. The second line and third line tells us why, because if we just take the form key-value pairs, we actually get nothing. That’s because FormValue and PostFormValue corresponds to Form and PostForm, and not MultipartForm. The last line in the results proves to us that ParseMultipartForm was actually called, that’s why if we try to access the MultipartForm we’ll get the data there.

We covered quite a bit in this blog post so let’s recap how these functions are different, in a nice table.

html_form_and_go

Undoubtedly the naming convention leaves much to be desired!