saush

Downloading documents programmatically with Google Docs API

Posted in Ruby by sausheong on January 28, 2008

An interesting and little-known feature of the Google Documents List Data API (perhaps because it’s currently undocumented!) is that you can use it to programmatically download word processor documents in various formats. This means that if you have a document in your folder (or anyone’s folder for that matter), you can use the API to save it to your computer. Why is this useful, since we can already do this from the browser?

It’s the same reason the Google Documents List Data API was created in the first place: it allows you to write an application that can interact with Google Docs. For example, you might want to save the document into a database instead of a file, or let your users bring their Google Docs documents into your application. It’s a cool little feature with quite interesting possibilities.

Let me show you how this can be done easily in Ruby. For this I’ve written a small Ruby script that allows you to take a file from your Google Docs account and save it as a file on your computer.

require 'rubygems'
require 'net/https'
require 'open-uri'

GOOGLE_URL = 'www.google.com'
GOOGLE_DOCS_URL = 'docs.google.com'

# convenience module to use HTTPS (mainly used for login)
module Net
  class HTTPS < HTTP
    def initialize(address, port = nil)
      super(address, port)
      self.use_ssl = true
    end
  end
end

class GCommander

  # Login into the Google Docs APIs using a Google account
  def login(email, pwd)
    params = { 'Email' => email,
      'Passwd' => pwd,
      'source' => 'saush-docs-01',
      'accountType' => 'HOSTED_OR_GOOGLE',
      'service' => 'writely' # this is the Google Docs service ID
    }

    response = Net::HTTPS.post_form(URI.parse("https://#{GOOGLE_URL}/accounts/ClientLogin"), params)
    response.error! unless response.kind_of? Net::HTTPSuccess
    # the body holds SID=, LSID= and Auth= lines; only the Auth value is the token
    @token = response.body[/^Auth=(.+)$/, 1]
  end

  # Fetches the document, only works with word processor documents for now
  def fetch_document(saveas, url)
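    # take the extension after the last dot, so names like my.report.pdf work
    exfmt = File.extname(saveas).delete('.')
    raise "Unsupported format" unless %w(doc pdf).include? exfmt
    # pull the docid parameter out of the document URL's query string
    doc_id = URI.parse(url).query.to_s.split('&').map { |p| p.split('=', 2) }.assoc('docid').to_a[1]
    raise "No docid found in #{url}" if doc_id.nil?
    doc_url = "http://#{GOOGLE_DOCS_URL}/MiscCommands?command=saveasdoc&exportformat=#{exfmt}&docID=#{doc_id}"
    # URI.open comes from open-uri (newer Rubies no longer accept a bare open for URLs);
    # write in binary mode and create the file only once the download has succeeded
    URI.open(doc_url, 'Authorization' => "GoogleLogin auth=#{@token}") { |data|
      File.open(saveas, 'wb') { |file| file.write data.read }
    }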
  end

end

if __FILE__ == $0 then
  gcmd = GCommander.new
  gcmd.login('YOUR GOOGLE ACCOUNT ID', 'YOUR GOOGLE ACCOUNT PASSWORD')
  gcmd.fetch_document(ARGV[0], ARGV[1])
end
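The login step depends on picking the right value out of the ClientLogin response, so here is a minimal sketch of that parsing on its own. It assumes the response body is a series of Key=value lines (SID, LSID and Auth); the helper name parse_client_login and the sample values are made up for illustration and are not part of the script above.

```ruby
# Parse a ClientLogin-style body ("Key=value" on each line) into a Hash.
# Hypothetical helper, shown only to illustrate the parsing step.
def parse_client_login(body)
  body.each_line.with_object({}) do |line, fields|
    # split on the first '=' only, since the value itself may contain '='
    key, value = line.chomp.split('=', 2)
    fields[key] = value if key && value
  end
end

# Made-up sample body; a real response would carry much longer values.
sample = "SID=aaa\nLSID=bbb\nAuth=ccc=ddd\n"
token = parse_client_login(sample).fetch('Auth')

# This is the header value the script sends when fetching the document.
puts "GoogleLogin auth=#{token}"
```

Splitting on the first = and chomping each line keeps stray newlines and any = characters inside the token from ending up in the Authorization header.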

In the script, just replace the two placeholders with your actual Google account ID and password. How do you use it?

ruby gcommander.rb some_file.pdf 'http://docs.google.com/Doc?docid=GOOGLE_DOC_ID&hl=en'

Replace the document URL with the actual URL of the Google document, which you can copy and paste from the browser. Keep the quotes around it, otherwise the shell will treat the & and ? in the URL as its own syntax. I’ve only tested this with MS Word (.doc) and Adobe Acrobat (.pdf) documents though.
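If you want to see how the pasted URL turns into the export request without logging in, the sketch below isolates that step. It reuses the MiscCommands URL pattern from the script; the method name export_url, the SUPPORTED_FORMATS constant and the sample docid are my own illustrative choices, and the endpoint itself is undocumented, so treat this as a sketch rather than a reference.

```ruby
require 'uri'

# Formats the original script was tested with.
SUPPORTED_FORMATS = %w[doc pdf].freeze

# Build the export URL from the target file name and the document's browser URL.
# Hypothetical helper; the URL pattern is the one used in the script.
def export_url(save_as, doc_url)
  # extension after the last dot, lowercased (report.v2.PDF -> pdf)
  format = File.extname(save_as).delete('.').downcase
  raise ArgumentError, "Unsupported format: #{format}" unless SUPPORTED_FORMATS.include?(format)

  # decode the query string into pairs and look up the docid parameter
  query = URI.parse(doc_url).query.to_s
  doc_id = URI.decode_www_form(query).to_h['docid']
  raise ArgumentError, 'No docid parameter in URL' if doc_id.nil? || doc_id.empty?

  "http://docs.google.com/MiscCommands?command=saveasdoc&exportformat=#{format}&docID=#{doc_id}"
end

puts export_url('report.v2.pdf', 'http://docs.google.com/Doc?docid=abc123&hl=en')
```

Raising early on a missing docid or an unsupported extension means a bad command line fails before any request is made.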

For more information on how to use Google Docs for mashups, buy my book when it comes out in March :)


3 Responses

  1. Naveen said, on July 16, 2008 at 5:51 pm

    That’s a pretty useful writeup. Could you please let me know whether there is any documentation available for retrieving the contents of a document in Java?
    Any other pointers would be very helpful.

    Naveen. V

  2. Dan said, on November 15, 2011 at 2:01 pm

    Do you have an updated version that works with Google’s new document format?

  3. Burke said, on October 19, 2013 at 1:51 am

    Hello there! This post could not be written any better!
    Reading through this post reminds me of my good old room mate!
    He always kept talking about this. I will forward this write-up
    to him. Fairly certain he will have a good read.
    Thanks for sharing!

