saush

Write an Internet search engine with 200 lines of Ruby code

Posted in Ruby, search engine by sausheong on March 17, 2009

Most people who know me realise at some point that I go through cycles where I dive into things that catch my imagination and obsess over them for days, weeks or even months. Some things, like origami and programming, never really go away, while others, like bread-making, make cometary orbits that come around once every few years. In the past this was fueled mainly by reading copious amounts of books and magazines, but since the arrival of the Internet, my obsessive cycles have reached new peaks. It has allowed me to reach out to like-minded people and provided a massive amount of data that I can never hope to conquer.

Fortunately with search engines, I don't need to conquer it all by myself. Search engines are almost ubiquitous on the Internet, and the most famous one has since become a verb; in June 2006, the verb 'to google' was officially added to the Oxford English Dictionary. The barrier to entry for obsessing over something is now nothing more than an Internet connection away. So it's only natural that search engines have become my new obsessive interest. This has of course been enhanced by the fact that I work for Yahoo!, which owns the second most popular web search engine in the world.

The topics on search engines are pretty vast. Much of it is centered around search engine optimization (SEO), which contrary to its name is not about optimizing search engines but about optimizing web sites to make them rise to the top of a search engine's results list. However this is not what I'm interested in (at this moment). What I'm interested in is how search engines work, what makes them tick, basically the innards of a search engine. As part of my research, I wrote a search engine using Ruby, called SaushEngine, which I will describe in full here.

Surprisingly, as I delved deeper into the anatomy of search engines, I found that most of them are pretty much the same. Sure, the implementations can be vastly different, but the basic ideas are pretty much universal in all the search engines I looked at, including web search engines such as Yahoo! and Google as well as standalone ones such as Sphinx, ht://Dig, Lucene and Nutch. In essence, any search engine has these 3 parts:

  1. Crawl – the part of the search engine that goes around extracting data from various data sources that are being searched
  2. Index – the part of the search engine that processes the data that has been extracted during the crawl and stores it
  3. Query – the part of the search engine that allows a user to query the search engine and returns ranked results

This article will describe how each part works and how my Ruby search engine implements it. SaushEngine (pun intended) is a simple web search engine in around 200 lines of Ruby code. However, I do cheat a bit and use various libraries extensively, including Hpricot, DataMapper and the Porter Stemmer. SaushEngine is a web search engine, which means it goes out to the Internet and harvests data from web sites. Having said that, it's definitely not production grade and lacks many of the features of a proper search engine (including high performance) in exchange for a simpler implementation.

Crawl

Let's start with the crawl phase of SaushEngine. SaushEngine has a web crawler, typically called a spider (a spider crawls the web, get it? I know, it's an old joke), in 2 files named spider.rb and index.rb. This crawler doesn't just extract information though; it partly processes the data as well. It is designed this way to simplify the overall design. In this section we'll concentrate on the crawling parts first.

There are a number of things to consider when designing a crawler, the most important amongst which are:

  • How do we crawl?
  • What kind of documents do we want to get?
  • Which sites are we allowed to visit?
  • How do we re-crawl and how often?
  • How fast can we crawl?

Many search engine designs approach crawling modularly by decoupling the crawler from the index and query design. For example, in Lucene the crawler is not even part of the search engine and the implementer is expected to provide the data source. On the other hand, some crawlers are not built for search engines at all, but for research into the Internet in general. In fact, the first crawler, the World Wide Web Wanderer, was deployed in 1993 to measure the size of the Internet. As mentioned before, to simplify the design, SaushEngine's crawler is an integral part of the search engine and the design is not modular. The subject of web crawlers is a topic of intensive research, with tons of existing literature dating from the late 1990s until today. Without going extensively into it, let's see how SaushEngine is designed to answer some of those questions.

How do we crawl?
This is the central question. Google claimed in an entry on the official Google blog in July 2008 that they had found 1 trillion unique URLs. That's a whole lot of web pages. As a result, there is always the question of how valuable a particular page is to visit, and prioritizing which sites to crawl is important in order to reach the strategically important sites in an evenly distributed way.

Most crawlers work by starting with a bunch of 'seed' URLs and following all the links in those seeds, which means choosing the correct seeds is critical to having a good index. SaushEngine uses a breadth-first strategy in crawling sites from its seed URLs.
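
The seed list itself is nothing fancy. As a minimal sketch (the URLs here are made-up examples, not the actual seed list that ships with the code), the seed file is just a YAML array of absolute URLs, which the spider reads and later rewrites:

require 'rubygems'
require 'yaml'

# write a tiny seed file with a couple of hypothetical starting URLs
seeds = ['http://www.ruby-lang.org/en/', 'http://en.wikipedia.org/wiki/Ruby']
File.open('seed.yml', 'w') { |out| YAML.dump(seeds, out) }

YAML.load_file('seed.yml')  # => ["http://www.ruby-lang.org/en/", "http://en.wikipedia.org/wiki/Ruby"]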

While we're discussing a generic Internet crawler here, there also exists a more focused type of crawler that only crawls specific sites or topics. For example, I could crawl only food-related sites or anime-related sites. The specifics of the implementation would be different but the core mechanism remains the same.

What kind of documents do we want to get?
Obviously for an Internet web crawler, what we are looking for are the web pages on the various web sites on the Internet. Some more advanced search engines include Word documents, Acrobat documents, Powerpoint presentations and so on, but predominantly the main data source is HTML-based web pages that are referenced by URLs. The SaushEngine design only crawls HTML documents and ignores all other types of documents. This has an impact when we go into the indexing phase. Besides blocking off URLs that end with certain extensions such as .doc, .pdf, .avi and so on, SaushEngine also rejects URLs containing '?', '&' or '/cgi-bin/', as the likelihood of them pointing to generated content or a web application is high.

Which sites are we allowed to visit?
A major concern many web sites have about web crawlers is that they consume bandwidth and incur costs for the owners of the sites they crawl. In response to this concern, the Robots Exclusion Protocol was created, where a file named robots.txt is placed at the root level of a domain and tells compliant crawlers which parts of the site are allowed or disallowed. SaushEngine parses the robots.txt file for each site it visits and complies with the permissions in that file.
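
As a minimal sketch of what this check looks like, using the same Robots library and user agent that the spider uses later (the URL is hypothetical):

require 'rubygems'
require 'robots'

# ask the site's robots.txt whether 'saush-spider' may fetch this URL
robot = Robots.new 'saush-spider'
robot.allowed? 'http://example.com/some/page.html'   # => true or false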

How do we re-crawl and how often?
Web pages are dynamic. Content in web pages can change regularly, irregularly or never. A good search engine should be able to balance the freshness of a page (which makes it more relevant) against the frequency of visits (which consumes resources). SaushEngine makes the simple assumption that any page should be revisited if it is older than 1 week (a controllable parameter).

How fast can we crawl?
Performance of the crawler is critical to the search engine. It determines the number of pages in the index, which in turn determines how useful and relevant the search engine is. The SaushEngine crawler is rather slow because it's not designed for speed but for ease of understanding crawler concepts.

Let’s go deeper into the design. This is the full source code for index.rb

require 'rubygems'
require 'dm-core'
require 'dm-more'
require 'stemmer'
require 'robots'
require 'open-uri'
require 'hpricot'

DataMapper.setup(:default, 'mysql://root:root@localhost/saush')
FRESHNESS_POLICY = 60 * 60 * 24 * 7 # 7 days

class Page
  include DataMapper::Resource

  property :id,          Serial
  property :url,         String, :length => 255
  property :title,       String, :length => 255
  has n, :locations
  has n, :words, :through => :locations
  property :created_at,  DateTime
  property :updated_at,  DateTime

  # find the page with this URL, or build a new one if it isn't indexed yet
  def self.find(url)
    page = first(:url => url)
    page = new(:url => url) if page.nil?
    return page
  end

  # mark the page as having just been re-indexed
  def refresh
    update_attributes({:updated_at => DateTime.parse(Time.now.to_s)})
  end

  # age of the page in seconds, so it can be compared with FRESHNESS_POLICY
  def age
    (Time.now - updated_at.to_time)
  end

  def fresh?
    age > FRESHNESS_POLICY ? false : true
  end
end

class Word
  include DataMapper::Resource

  property :id,         Serial
  property :stem,       String
  has n, :locations
  has n, :pages, :through => :locations

  def self.find(word)
    wrd = first(:stem => word)
    wrd = new(:stem => word) if wrd.nil?
    return wrd
  end
end

class Location
  include DataMapper::Resource

  property :id,         Serial
  property :position,   Integer

  belongs_to :word
  belongs_to :page
end

class String
  # returns the Porter stems of the words in the string, skipping common
  # words and unusually long words
  def words
    words = self.gsub(/[^\w\s]/,"").split
    d = []
    words.each do |word|
      word = word.downcase
      d << word.stem unless (COMMON_WORDS.include?(word) or word.size > 50)
    end
    return d
  end

  COMMON_WORDS = ['a','able','about','above','abroad', ...,'zero'] # truncated
end

DataMapper.auto_migrate! if ARGV[0] == 'reset'

Note the various libraries I used for SaushEngine:

  1. DataMapper (dm-core and dm-more)
  2. Stemmer
  3. Robots
  4. Hpricot

I also use the MySQL relational database as the index (which we'll get to later). In the Page class, note the fresh?, age and refresh methods. The fresh? method checks whether the page is fresh, and the freshness of the page is determined by its age since it was last updated. Also note that I extended the String class with a words method that returns the stems of all the words in the string, excluding an array of common words and any word that is unusually long. I create the array of word stems using the Porter stemmer. We'll see how this is used in a while.
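
As a quick illustration of the words method, something like this is what gets stored for a sentence (the exact output depends on the Porter stemmer and on which words are in the COMMON_WORDS list, so treat it as indicative):

# assuming 'the' and 'over' are in COMMON_WORDS, the stems come out roughly as:
"the quick brown foxes jumped over the lazy dogs".words
# => ["quick", "brown", "fox", "jump", "lazi", "dog"]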

Now take a look at spider.rb, which is probably the largest file in SaushEngine, around 100 lines of code:

require 'rubygems'
require 'index'

LAST_CRAWLED_PAGES = 'seed.yml'
DO_NOT_CRAWL_TYPES = %w(.pdf .doc .xls .ppt .mp3 .m4v .avi .mpg .rss .xml .json .txt .git .zip .md5 .asc .jpg .gif .png)
USER_AGENT = 'saush-spider'

class Spider

  # start the spider with the seed URLs loaded from the YAML file
  def start
    Hpricot.buffer_size = 204800
    process(YAML.load_file(LAST_CRAWLED_PAGES))
  end

  # process the loaded pages breadth-first, collecting newly found links
  def process(pages)
    robot = Robots.new USER_AGENT
    until pages.nil? or pages.empty?
      newfound_pages = []
      pages.each { |page|
        begin
          if add_to_index?(page) then
            uri = URI.parse(page)
            host = "#{uri.scheme}://#{uri.host}"
            open(page, "User-Agent" => USER_AGENT) { |s|
              (Hpricot(s)/"a").each { |a|
                url = scrub(a.attributes['href'], host)
                newfound_pages << url unless url.nil? or !robot.allowed? url or newfound_pages.include? url
              }
            }
          end
        rescue Timeout::Error => e
          print "\n** Timeout encountered - #{page} - #{e.to_s}"
        rescue => e
          print "\n** Error encountered crawling - #{page} - #{e.to_s}"
        end
      }
      pages = newfound_pages
      File.open(LAST_CRAWLED_PAGES, 'w') { |out| YAML.dump(newfound_pages, out) }
    end
  end

  # add the page to the index; returns true if the page was indexed or refreshed
  def add_to_index?(url)
    print "\n- indexing #{url}"
    t0 = Time.now
    page = Page.find(scrub(url))

    # if the page is not in the index, then index it
    if page.new_record? then
      index(url) { |doc_words, title|
        dsize = doc_words.size.to_f
        puts " [new] - (#{dsize.to_i} words)"
        doc_words.each_with_index { |w, l|
          printf("\r\e - %6.2f%%", (l * 100 / dsize))
          loc = Location.new(:position => l)
          loc.word, loc.page, page.title = Word.find(w), page, title
          loc.save
        }
      }

    # if it is but it is not fresh, then update it
    elsif not page.fresh? then
      index(url) { |doc_words, title|
        dsize = doc_words.size.to_f
        puts " [refreshed] - (#{dsize.to_i} words)"
        page.locations.destroy!
        doc_words.each_with_index { |w, l|
          printf("\r\e - %6.2f%%", (l * 100 / dsize))
          loc = Location.new(:position => l)
          loc.word, loc.page, page.title = Word.find(w), page, title
          loc.save
        }
      }
      page.refresh

    # otherwise just ignore it
    else
      puts " - (x) already indexed"
      return false
    end
    t1 = Time.now
    puts "  [%6.2f sec]" % (t1 - t0)
    return true
  end

  # scrub the given link: reject non-HTML and web-application URLs, strip
  # fragments and resolve relative links against the host
  def scrub(link, host = nil)
    unless link.nil? then
      return nil if DO_NOT_CRAWL_TYPES.include? link[(link.size - 4)..link.size] or link.include? '?' or link.include? '/cgi-bin/' or link.include? '&' or link[0..9] == 'javascript' or link[0..5] == 'mailto'
      link = link.index('#') == 0 ? '' : link[0..link.index('#') - 1] if link.include? '#'
      if link[0..3] == 'http'
        url = URI.join(URI.escape(link))
      else
        url = URI.join(host, URI.escape(link))
      end
      return url.normalize.to_s
    end
  end

  # do the common indexing work
  def index(url)
    open(url, "User-Agent" => USER_AGENT) { |doc|
      h = Hpricot(doc)
      title, body = h.search('title').text.strip, h.search('body')
      %w(style noscript script form img).each { |tag| body.search(tag).remove }
      array = []
      body.first.traverse_element { |element| array << element.to_s.strip.gsub(/[^a-zA-Z ]/, '') if element.text? }
      array.delete("")
      yield(array.join(" ").words, title)
    }
  end
end

$stdout.sync = true
spider = Spider.new
spider.start

The Spider class is a full-fledged Internet crawler. It loads its seed URLs from a YAML file named seed.yml and processes each URL in that list. To play nice and comply with the Robots Exclusion Protocol, I use a slightly modified Robots library based on Kyle Maxwell's robots library, and set the name of the crawler to 'saush-spider'. As the crawler goes through the URLs, it tries to index each one of them. If it successfully indexes a page, it goes through each link on that page and adds it to the newly-found-pages bucket if it is allowed (checking robots.txt) and not already in the bucket. At the end of the original seed list, it takes this bucket and writes it back as the new seed URL list in the YAML file. This is effectively a breadth-first search that goes through every URL at one level before going down to the next level. Please note that I made some minor modifications to the robots library, so the link here is to my fork on GitHub, not Kyle's.

The flow of adding to the index goes like this. First, we scrub the URL, discarding URLs that are suspected to be web applications (like Google Calendar, Yahoo Mail and so on). We then check if the URL is already in the index; if it's not, we proceed to index it. If it is, we check the page's freshness and re-index it if it's stale. Otherwise, we just leave it alone.

That’s it! The crawling part is rather simple but slow. To use this effectively we will need to add in capabilities to run a whole bunch of spiders at the same time but this will do for now. Let’s get to the indexing part next.

Index

Indexing involves parsing and storing the data that has been collected by the crawler. The main objective here is to build an index that can be used to query and retrieve the stored data quickly. But what exactly is an index?

An index in our context is a data structure that helps us look up a data store quickly. Take any data store with N records. If we want to find a document in this data store, we would need to go through each record to check if it's the one we're looking for before proceeding to the next. If it takes 0.01 seconds to check each document and we have 100,000 records, it takes 0.01 seconds in the best case, 1,000 seconds in the worst case and 500 seconds on average to search the data store. Performance is therefore linear in the number of records, or we say it takes O(N) time. That's way too slow for any search engine that needs to handle millions or billions of records.

To improve the search performance, we insert an intermediate data structure between the query and the actual data store. This data structure, which is the index, allows us to return results in much improved times.

There are many different types of data structures that can be used as an index to the data store, but the most popular one used by search engines is the inverted index. The normal way of mapping documents to words (called a forward index) is to have each document map to a list of words. For example:
[Image: a forward index, mapping each document to the list of words it contains]

An inverted index however maps words to the documents where they are found.

[Image: an inverted index, mapping each word to the documents it appears in]

A quick look at why inverted indices are much faster. Say we want to find all documents in the data store that have the word 'fox' in them. To do this on a normal forward index, we need to go through each record to check for the word 'fox'; in our example above, we need to check the forward index 4 times to find that it exists in documents 2 and 3. In the case of the inverted index, we just go to the 'fox' record and find that it exists in documents 2 and 3. Imagine a data store of millions of records and you can see why an inverted index is so much more effective.

In addition, a full inverted index also includes the position of the word within the document:

[Image: a full inverted index, mapping each word to documents and word positions]

This is useful for us later during the query phase.
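
To make the idea concrete, here is a toy sketch in plain Ruby (the documents and IDs are made up) of a forward index versus an inverted index that also records word positions:

docs = { 1 => "the quick brown fox", 2 => "the brown dog" }

# forward index: document id => list of words
forward = {}
docs.each { |id, text| forward[id] = text.split }

# inverted index: word => { document id => [positions] }
inverted = Hash.new { |h, word| h[word] = Hash.new { |h2, id| h2[id] = [] } }
docs.each do |id, text|
  text.split.each_with_index { |word, position| inverted[word][id] << position }
end

forward[1]          # => ["the", "quick", "brown", "fox"]
inverted["brown"]   # => {1=>[2], 2=>[1]}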

SaushEngine implements an inverted index in a relational database. Let’s look at how this is implemented. I used MySQL as it is the most convenient but with some slight modifications you can probably use any relational database.

[Image: the SaushEngine database schema, with the Pages, Words and Locations tables]

The Pages table stores the list of web pages that have been crawled, along with their titles and URLs. The updated_at field determines whether a page is stale and should be refreshed. The Words table stores a list of all words found in all documents. The words in this table are stemmed using the Porter stemmer prior to storing, in order to reduce the number of words stored. An improved strategy could be to also include alternate but non-standard synonyms (for example, 'database' can also be referred to as 'DB'), but to keep things simple, only stemming is used. By this time you should recognize that the Locations table is the inverted index, which maps words and their positions to documents. To access and interact with these tables through Ruby, I used DataMapper, an excellent Ruby object-relational mapping library. However, a point to note is that DataMapper doesn't create indices on the relationship foreign keys for you, so you'll have to do it yourself. I found that creating 3 indices (one on word_id, one on page_id and one on word_id plus page_id) improves query performance tremendously. Also please note that the search engine requires both dm-core and dm-more, so remember to install those 2 gems.
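
As a sketch of what that looks like (the index and column names here assume the schema generated by the models above; you could equally run the statements directly in the MySQL client), the three indices can be pushed through DataMapper's raw adapter after requiring index.rb:

require 'rubygems'
require 'index'

# add the three indices on the locations table by hand, since auto-migration
# does not create indices on the foreign keys for us
adapter = repository(:default).adapter
adapter.execute("create index index_locations_word_id on locations (word_id)")
adapter.execute("create index index_locations_page_id on locations (page_id)")
adapter.execute("create index index_locations_word_id_page_id on locations (word_id, page_id)")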

Now that we have the index, let's go back and look at parsing the information that is retrieved by the spider. The bulk of the work is in the index and add_to_index? methods of the Spider class. The index method is called from the add_to_index? method with the scrubbed URL and a code block that uses DataMapper to add the words and their locations into the index.

Using Open URI, the web page is fetched and Hpricot is used to parse the HTML. I take only the body of the document and remove the non-text portions like inline scripts, styles, forms and so on. Going through each remaining element, I take only the text elements and drop the rest. Next, I tokenize the text. My strategy is pretty simple: delimit the text by white space only. As a side note, while this works pretty well for English text, it doesn't work as well for other languages, especially Asian languages such as Chinese or Korean (where sometimes there is no white-space delimitation at all). The resulting tokens, which are words, are run through the Porter stemmer to produce word stems. These word stems are what get added to the index.

Another side note: if you're not a Rubyist or are a beginner Rubyist, you might be interested to see that I actually pass an entire code block into the index method. The block differs depending on whether it's a new web page or a refreshed web page. Doing this reduces the amount of duplicate code I had to write, which makes the code easier to maintain and to read.

Also if you’re wondering what happened to keyword meta-tags, you’re right, I didn’t use them. There is no big philosophical reasoning why I skipped them — I just wanted to focus on the document text and not give any special consideration to keywords in the meta tag.

Finally, let's look at the query phase of the search engine, where everything we've talked about before comes together.

Query

The query phase of the search engine allows a user to express his intent and get relevant results. The search query input is most often limited to a single text box. As a result of this constraint, there are a number of problems. Firstly, because the user can only express his question in a single line of text, the true intention of the user can vary widely. For example, if I type in 'apple' as the search input, do I mean the computer company or the fruit? There is no way the search engine can know for sure from the user. Of course, a more seasoned search user would put in more clues; for example, 'apple mac' or 'apple ipod' conveys the intention more accurately.

Also, all intentions must be expressed only in text. For example, if I want to find a document that was written after a certain date, or a document written by at least 3 collaborators, or a photo with a particular person in it, expressing that in a text query is pretty challenging. This is the problem posed in parsing and identifying the search query, the first part of the query phase. For SaushEngine, I made broad assumptions, and parsing the search query becomes only a matter of tokenizing the search input into words and stemming those words.

The second part of the query phase is to return relevant results. This is usually implementation specific and depends on the index that has been generated. The challenge doesn't really lie in retrieving the results though; it lies in sorting them by relevance. Although this sounds simple, it can be a truly profound problem with an amazing variety of solutions. Choosing the right sorting solutions (also called ranking algorithms), generating correctly sorted results and responding with reasonable performance is the balance a search engine designer must strike.

There are a large number of ranking algorithms that can be used for a search engine. The most famous one is probably PageRank, the ranking algorithm created by the Google founders, Larry Page and Sergey Brin. PageRank assigns a score for every page depending on the number and importance of pages that links to it and uses that to rank the page. Other algorithms use different methods, from getting feedback from the search engine users to rank the search results (the more times a user clicks on a search result, the higher it is ranked) to inspecting the contents of the page itself for clues on its relevance.

For SaushEngine, I chose 3 very simple ranking algorithms that are based on the content of the page itself:

  1. Frequency of the search words
  2. Location of the search words
  3. Distance of the search words

Frequency

The frequency ranking algorithm is quite simple. The page that contains more of the search words is assumed to be more relevant.

Location

The location ranking algorithm is very simple. The assumption made here is that if the search word is near to the top of the document, the page is more relevant.

Distance

The distance ranking algorithm inspects the distance between the search words on each page. The closer the words are to each other on a page, the higher that page will be ranked. For example, if I search for 'brown fox' in these 2 documents:

1 – The quick brown fox jumped over the lazy dog
2 – The brown dog chased after the fox.

They will both turn up in the search results, but document 1 will be more relevant as the distance between 'brown' and 'fox' in document 1 is 0 while in document 2 it is 4.

Let's see how SaushEngine implements these 3 ranking algorithms. SaushEngine's ranking algorithms are in a class named Digger, in a file named digger.rb. Here's the full source code of digger.rb:

require 'index'

class Digger
  SEARCH_LIMIT = 19

  # search for the given text and return the top ranked pages
  def search(for_text)
    @search_params = for_text.words
    wrds = []
    @search_params.each { |param| wrds << "stem = '#{param}'" }
    word_sql = "select * from words where #{wrds.join(" or ")}"
    @search_words = repository(:default).adapter.query(word_sql)
    tables, joins, ids = [], [], []
    @search_words.each_with_index { |w, index|
      tables << "locations loc#{index}"
      joins << "loc#{index}.page_id = loc#{index+1}.page_id"
      ids << "loc#{index}.word_id = #{w.id}"
    }
    joins.pop
    @common_select = "from #{tables.join(', ')} where #{(joins + ids).join(' and ')} group by loc0.page_id"
    rank[0..SEARCH_LIMIT]
  end

  # merge the rank lists from the individual ranking algorithms
  def rank
    merge_rankings(frequency_ranking, location_ranking, distance_ranking)
  end

  # rank pages by how often the search words appear in them
  def frequency_ranking
    freq_sql = "select loc0.page_id, count(loc0.page_id) as count #{@common_select} order by count desc"
    list = repository(:default).adapter.query(freq_sql)
    rank = {}
    list.size.times { |i| rank[list[i].page_id] = list[i].count.to_f / list[0].count.to_f }
    return rank
  end

  # rank pages by how close the search words are to the top of the page
  def location_ranking
    total = []
    @search_words.each_with_index { |w, index| total << "loc#{index}.position + 1" }
    loc_sql = "select loc0.page_id, (#{total.join(' + ')}) as total #{@common_select} order by total asc"
    list = repository(:default).adapter.query(loc_sql)
    rank = {}
    list.size.times { |i| rank[list[i].page_id] = list[0].total.to_f / list[i].total.to_f }
    return rank
  end

  # rank pages by how close the search words are to each other
  def distance_ranking
    return {} if @search_words.size == 1
    dist, total = [], []
    @search_words.each_with_index { |w, index| total << "loc#{index}.position" }
    total.size.times { |index| dist << "abs(#{total[index]} - #{total[index + 1]})" unless index == total.size - 1 }
    dist_sql = "select loc0.page_id, (#{dist.join(' + ')}) as dist #{@common_select} order by dist asc"
    list = repository(:default).adapter.query(dist_sql)
    rank = Hash.new
    list.size.times { |i| rank[list[i].page_id] = list[0].dist.to_f / list[i].dist.to_f }
    return rank
  end

  # add up the rankings from the different algorithms and sort by total score
  def merge_rankings(*rankings)
    r = {}
    rankings.each { |ranking| r.merge!(ranking) { |key, oldval, newval| oldval + newval } }
    r.sort { |a, b| b[1] <=> a[1] }
  end
end

The implementation is mostly done in SQL, as can be expected. The basic mechanism is to generate the SQL queries (one for each ranking algorithm) from the code and send them to MySQL using a DataMapper pass-through method. The results of the queries are then processed as ranks and sorted accordingly by a rank-merging method.

Let's look at each method in greater detail. The search method is the main one, which takes a text search query as input. This search query is broken down into word stems and we look for these words in our index to get their row IDs, for easier and faster manipulation. Next, we create the common part of the search query, which is needed for each subsequent query, and call the rank method.

The rank method is an aggregator that calls a number of ranking methods in sequence and merges the returned rank lists together. A rank list is nothing more than an array of 2-item arrays, with the first item being the page ID and the second item being its rank by that algorithm:

[ [1123, 0.452],
[557 , 0.314],
[3263, 0.124] ]

The above means that there are 3 pages in the rank list, the first being the page with ID 1123, which is ranked with the number 0.452, the second being page 557 and so on. Merging the returned rank lists just means that if the same page appears in different rank lists with different ranking numbers, we add the rank numbers together.
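
As a small illustration of the merging (the page IDs and scores here are made up, but the mechanics mirror the merge_rankings method above):

# two rank lists from different algorithms, keyed by page ID
frequency = { 1123 => 0.452, 557 => 0.314 }
location  = { 1123 => 0.200, 3263 => 0.124 }

merged = {}
[frequency, location].each do |ranking|
  merged.merge!(ranking) { |page_id, old_rank, new_rank| old_rank + new_rank }
end
merged.sort { |a, b| b[1] <=> a[1] }
# => [[1123, 0.652], [557, 0.314], [3263, 0.124]]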

Let’s look at the ranking algorithms. The easiest is the frequency algorithm. In this algorithm, we rank the page according to the number of times the searched words appear in that page. This is the SQL query that is normally generated from a search with 2 effective terms:

SELECT loc0.page_id, count(loc0.page_id) as count FROM locations loc0, locations loc1
WHERE loc0.page_id = loc1.page_id AND
  loc0.word_id = 1296 AND
  loc1.word_id = 8839
GROUP BY loc0.page_id ORDER BY count DESC

This returns a resultset like this:
[Image: resultset showing page_id and count columns, sorted by count in descending order]

which obviously tells us which page has the highest count of the given words. To normalize the ranking, we just divide all the word counts by the largest count (in this case it is 1575). The highest ranked page then has a rank of 1, while the rest of the pages are ranked at numbers smaller than 1.
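
In Ruby, the normalization step amounts to something like this (all the counts apart from 1575 are made up for illustration):

# page_id => raw word count from the frequency query
counts = [[12, 1575], [40, 630], [7, 315]]
rank = {}
counts.each { |page_id, count| rank[page_id] = count.to_f / counts.first[1] }
rank   # => {12=>1.0, 40=>0.4, 7=>0.2}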

The location algorithm is about the same as the frequency algorithm. Here, we rank the page according to how close the words are to the top of the page. With multi-word searches, this becomes an exercise in adding the locations of each word. This is the SQL query from the same search:

SELECT loc0.page_id, (loc0.position + 1 + loc1.position + 1) as total FROM locations loc0, locations loc1
WHERE loc0.page_id = loc1.page_id AND
  loc0.word_id = 1296 AND
  loc1.word_id = 8839
GROUP BY loc0.page_id ORDER BY total ASC

This returns a result set like this:
[Image: resultset showing page_id and total columns, sorted by total in ascending order]

which again obviously tells us which page has those words closest to the top. To normalize the ranking however, we can't use the same strategy as before (dividing each total by the largest total), because the lower the number, the higher it should be ranked. Instead, I inverted the results, dividing the smallest total by each page's total. The smallest total again produces a rank of 1, and the rest are ranked at numbers smaller than 1.

The previous 2 algorithms were relatively simple; the word distance algorithm is slightly trickier. We want to find the distance between all the search terms. For just 1 search term this is a non-event, as there is no distance. For 2 search terms this is also pretty easy, as it is only the distance between 2 words. It becomes tricky for 3 or more search terms. Should I find the distance between words 1 and 2, then words 1 and 3, then words 2 and 3? What if there are 4 or more terms? The number of pairs grows quadratically with the number of terms. I opted for a simpler approach: for 3 terms, find the distance between words 1 and 2, then words 2 and 3. For 4 terms, it is the distance between words 1 and 2, then words 2 and 3 and finally words 3 and 4. The growth in processing then becomes a more manageable linear growth.
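
A small illustration of the consecutive-pairs approach (the word positions are made up):

positions = [14, 15, 42]   # positions of terms 1, 2 and 3 on one page
distances = []
positions.each_cons(2) { |a, b| distances << (a - b).abs }
distances.inject(0) { |sum, d| sum + d }   # => 28, i.e. |14-15| + |15-42|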

This is the SQL generated by this approach for 3 terms (adding 1 more additional term from above):

SELECT loc0.page_id, (abs(loc0.position - loc1.position) + abs(loc1.position - loc2.position)) as dist
FROM locations loc0, locations loc1, locations loc2
WHERE loc0.page_id = loc1.page_id AND
  loc1.page_id = loc2.page_id AND
  loc0.word_id = 1296 AND
  loc1.word_id = 8839 AND
  loc2.word_id = 8870
GROUP BY loc0.page_id ORDER BY dist ASC

I use the abs function in MySQL to get the distance between 2 word locations in the same page. This returns a resultset like this:

[Image: resultset showing page_id and dist columns, sorted by dist in ascending order]

As with the location algorithm, the smaller the distance, the more relevant the page is, so the same strategy is used to create the page rank.

The rankings returned by the different ranking algorithms are added together with equal weight to give a final ranking of each page. While I put equal emphasis on each ranking algorithm, this doesn't necessarily need to be so; I could easily have put in parameters to weight the algorithms differently.
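
For instance, here is a hedged sketch of a weighted variant of the rank method that could be dropped into the Digger class (the weights are arbitrary examples, not part of SaushEngine):

# weight the three algorithms differently instead of equally
def weighted_rank
  weighted = [[frequency_ranking, 1.0], [location_ranking, 0.5], [distance_ranking, 0.75]]
  r = Hash.new(0.0)
  weighted.each do |ranking, weight|
    ranking.each { |page_id, score| r[page_id] += score * weight }
  end
  r.sort { |a, b| b[1] <=> a[1] }
end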

That wraps up the search engine proper. However, search engines need an interface for users to interact with, so we need to wrap a nice web application around SaushEngine.

User Interface

I chose Sinatra, a minimal web application framework, to build a single page web application that wraps around SaushEngine. I chose it because of its style and simplicity and its ability to let me come up with a pretty good interface in just a few lines of code.

For this part of SaushEngine, you’ll need to install the Sinatra gem. We have 2 files, this is the Sinatra file, ui.rb:

require 'rubygems'
require 'digger'
require 'sinatra'

get '/' do
  erb :search
end

post '/search' do
  digger = Digger.new
  t0 = Time.now
  @results = digger.search(params[:q])
  t1 = Time.now
  @time_taken = "#{"%6.2f" % (t1 - t0)} secs"
  erb :search
end

Finally you need a view template (search.erb) to be its main user interface. I use ERB for templating because it’s the most familiar to me. This is how it turns out:

[Image: screenshot of the SaushEngine search page built with Sinatra and ERB]
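
The actual template isn't reproduced here, but a minimal search.erb along these lines would work, assuming @results is the array of [page_id, score] pairs returned by Digger#search:

<h1>SaushEngine</h1>
<form method="post" action="/search">
  <input type="text" name="q"/> <input type="submit" value="search"/>
</form>
<% if @results %>
  <p>Search took <%= @time_taken %></p>
  <% @results.each do |page_id, score| %>
    <% page = Page.get(page_id) %>
    <p><a href="<%= page.url %>"><%= page.title %></a></p>
  <% end %>
<% end %>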

This is obviously not an industrial-strength or commercially viable search engine. I wrote it as a tool to describe how search engines are written. There is much that could be improved, especially the crawler, which is way too slow to be effective (unless it is a focused crawler), the index (stored in a few MySQL tables, which would struggle with a large index) and the query (the ranking algorithms are too simplistic).

There is obviously a lot of literature out there that I have referenced, much of it now hyperlinked from this article. However, a book that influenced me greatly, especially in designing the index in MySQL, is Toby Segaran's Programming Collective Intelligence.

If you're interested in extending it, please feel free to take this code, but I would appreciate it if you drop me a note here. If you're looking for an actual deployment, please take a look at http://saushengine.saush.net. Note that this link might not be up at all times. If you're looking for the code, you can find it at git://github.com/sausheong/saushengine.git.


125 Responses


  1. Tran Duc Minh said, on March 17, 2009 at 11:10 pm

    So amazing,
    Actually i did not believe Ruby ‘s strength so far. After reading this, it changed my mind.
    How deep do we traverse in a url?
    Ah, i could not find your seeds.yaml
    How big is seeds file in a real search engine? Is it similar to a big DNS file at that time?

    Thanks,
    Minh

  2. sausheong said, on March 18, 2009 at 12:42 am

    You can find seed.yml and the rest of the source code from GitHub. Ruby is a beautiful language and it has always been a pleasure to program in it.

    The crawler as described above is a pretty weak piece of software. Today it goes breadth-first without much thought, just a mindless spider looking at every URL and sucking everything in. Most modern crawlers have some form of intelligence in determining if it should at all extract the URL, depending on it’s importance, priority and other criteria set by the search engine itself.

    To look at a sturdier search engine I would recommend looking at Lucene or Nutch or Sphinx. SaushEngine is mainly an educational tool :)

  3. Raz said, on March 18, 2009 at 3:27 am

    I have been working on something very similar to this for a few weeks.

    What kinds of speeds are you getting for your search, how large is your dataset of crawled pages, and how do you think this scales.

    The reason i ask is in my case i had 2 million records, with about ~60 “words” per record. All the queries i tried were getting crushed because all my queries have to have multiple “words” So when you search you might be searching for something that contains up to 10 different “words.”

    I keep quoting words because instead of actual words i have content identifiers.

    I would really like to hear your speeds and discuss this more.

    My queries were working well, but querying though all 2 million datasets and the content of all of them was destroying my speed to the order of minutes.

  4. Scott said, on March 18, 2009 at 4:24 am

    Im not a ruby guy, but good job. Im impressed. Never knew it could be done in such little code.

  5. [...] In a detailed tutorial, Saush.com reveals how to build your own search engine with barely 200 lines of Ruby code. [...]

  6. Elijah Grey said, on March 18, 2009 at 9:32 am

    I didn’t review all the code that throughly, but I see that you mentioned that any url with an “&” isn’t indexed. I’m assuming that this is only the case if it follows a “?”. If this isn’t the case, I think you should change it.

  7. Ashutosh said, on March 18, 2009 at 10:47 am

    Sphinx ( http://www.sphinxsearch.com/ ) can be used for indexing and querying. It’s fast and scalable. There’s even a Ruby plugin for Sphinx at http://rubyforge.org/projects/sphinx/

  8. [...] Write an Internet Search Engine with 200 lines of Ruby code – Saush.com takes an intriguing look at the mechanisms which drive your typical search engine using a extremely simple example, written in the easy-to-learn language, Ruby. Great article for wanna-be web developers. [...]

  9. Eric Grunin said, on March 18, 2009 at 2:11 pm

    Very nice. Having done one of these myself a few years ago, I suggest you consider two small additions that will make a big difference:

    1) Checking to see if you’ve already searched a given URL gets very slow, no matter how you index. Instead, calculate a hash of the URL and index on that.

    2) The query needs to assign each word a weight based on its scarcity. For example, a search on [saush search] should rank a page containing [saush search sauch] ahead of [search saush search], because ‘saush’ is rarer and therefore more valuable.

    Neither of these takes a lot of code.

  10. Bjoern said, on March 19, 2009 at 12:44 am

    Great article. I really enjoyed reading this, please keep up the good work !

  11. sausheong said, on March 19, 2009 at 1:47 am

    @Raz – my index is rather small at the moment, I’m still indexing in the 10′s of thousands. The reason is probably because I use all words in the document rather than selected ones (the number of words are in 100s and 1000s unlike yours which is in 10s). I’m sure my approach will eat up resources, but it gives me more accurate searches. Unfortunately it is very slow this way as well. My crawler is the weakest link in the chain at the moment and it’s not a very sturdy chain!

    @Elijah – thanks for the heads up. It’s a random assumption that I made to remove ‘&’s and I see that why the assumption is wrong.

    @Ashutosh – thanks for the comment. I used Sphinx before through Ultrasphinx. It’s a great search engine. The reason why I chose to re-write one on my own is purely as an educational tool though.

    @Eric – thanks for the tips. I will look into implementing them though at this point in time, the weakest part of the search engine is the crawler.

    @all the rest – thanks for your well wishes!

  12. ehsanul said, on March 19, 2009 at 6:55 am

    Nice, a beginner ruby coder like me will benefit a lot from this.

    Nikogori is supposed to be faster than Hpricot (much faster in some cases), and it can act as a drop-in replacement for it, so maybe you should check it out to speed up your crawling. :)

  13. Giavasan » Bad Day said, on March 19, 2009 at 7:00 am

    [...] point by point the logic behind a web search engine, showing how to write its code in just 200 lines of Ruby. Megapanzer, on the other hand, covers only C++ and is a blog entirely dedicated to building a [...]

  14. [...] of course, but Sau’s made an attempt at implementing a basic search engine in Ruby and has written a pretty interesting, indepth article about the whole process. Sau’s search engine is formed of a crawler, indexer, and query system, and uses Hpricot, [...]

  15. Timothy said, on March 21, 2009 at 2:11 am

    Wow! Really cool. Good work.

  16. pligg.com said, on March 21, 2009 at 2:52 pm

    saush.com » Blog Archive » Write an Internet search engine with 200 lines of Ruby code…

    Shows example of web crawling and indexing….

  17. Abhijat said, on March 23, 2009 at 8:25 am

    It is enlightening to learn the concepts and the approach involved in making a search engine :)

    i really loved going through your explanation of the crawling and stemming techniques.

    The choice to use sinatra is also something worth noticing.

    Thanks

  18. Speedlinking 24 [16 a 22 de Março] said, on March 24, 2009 at 11:00 pm

    [...] 16. “Write an Internet search engine with 200 lines of Ruby code” [...]

    [...] in Ruby. As an experiment, Sau tried to implement a basic search engine with Ruby and wrote a very interesting and detailed article about the process. Sau's search engine is made up of a crawler, an indexer, and a query system, and he [...]

  23. [...] trying out Sinatra as the interface to my search engine, I got hooked to it. I liked the no-frills approach to developing a web application. I liked it so [...]

  24. anselm said, on April 2, 2009 at 11:57 pm

    What a nice piece of work. Great to see it all in once place.

    I modified class Digger for postgresql – the change is to add group by fields into the search method. This makes postgres happy. Here is the method that I changed:

    def search(for_text)
    @search_params = for_text.to_s.words
    wrds = []
    @search_params.each { |param| wrds << "stem = '#{param}'" }
    return [] if wrds.length < 1
    word_sql = "select * from words where #{wrds.join(" or ")}"
    @search_words = repository(:default).adapter.query(word_sql)
    tables, joins, ids = [], [], []
    posns = []
    @search_words.each_with_index { |w, index|
    tables << "locations loc#{index}"
    joins << "loc#{index}.page_id = loc#{index+1}.page_id"
    ids << "loc#{index}.word_id = #{w.id}"
    posns << "loc#{index}.position"
    }
    joins.pop
    @common_select = "from #{tables.join(', ')} where #{(joins + ids).join(' and ')} group by loc0.page_id, #{posns.join(', ')}"
    rank[0..SEARCH_LIMIT]
    end

  25. Jonathan Nelson said, on April 4, 2009 at 7:15 am

    @sausheong – you are a true inspiration to me. thank you for this writing.

  26. Link Building Blog said, on April 7, 2009 at 3:53 am

    Very well written. This is the kind of information that is useful to those want to increase their SERP’s. Keep up the good work.

  27. [...] saush.com » Blog Archive » Write an Internet search engine with 200 lines of Ruby code [...]

  28. [...] Sheong Chang>> I picked up Sinatra first when I was writing my search engine and I was looking for a simple way to write my search engine interface. The simplicity of Sinatra [...]

  29. BloggerDude said, on October 9, 2009 at 8:24 am

    I don’t know If I said it already but …Hey good stuff…keep up the good work! :) I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say I’m glad I found your blog. Thanks,)

    A definite great read….

  30. Dalibor Nasevic said, on November 16, 2009 at 6:18 am

    Here is my fork at GIthub: http://github.com/dalibor/saushengine

    Changes:
    - Code moved to Ruby on Rails framework
    - Spider uses Active Record ORM
    - Fixed proper indexing of cyrilic characters
    - Rescued Timeout errors in robots library

  31. Abraham Si said, on January 14, 2010 at 6:56 pm

    Is this multithreaded? I think that the crawler, indexer can run as 1 thread. But the query stuff has to be separated into another thread. Or are you runnng them thru 2 ruby instances?

    Abraham

  32. Santosh Waghmare said, on February 27, 2010 at 8:18 pm

    Hi Dear,

    I am very initial level ruby programmer, want some stuffs on crawler and seen your
    nice piece of code, but I am unable to configure it on my machine. I am getting error “required –dm-more”.

    Also I am unable to pluin the dm-more with your code.

    Please help me on the same issue.

    Regards,
    Santosh Waghmare

  33. D3VELOPER said, on March 16, 2010 at 11:12 am

    Great work, but I have a question about the Indexing phase, What is the best way to store the inverted index ? using the Database or What …? from many different views like storage cost, performance, speed of indexing and quey?

  37. infomaven said, on December 7, 2010 at 6:06 am

    Thanks for a concise, elegant explanation. I don’t know much about Ruby, but reading your article makes me want to learn it even more.

    Do you get to use Ruby at your job, or is it mainly a labor of love?

  38. 2010 in review « saush said, on January 2, 2011 at 7:57 pm

    [...] Write an Internet search engine with 200 lines of Ruby code March 2009 37 comments and 2 Likes on WordPress.com 4 [...]


  85. Rosemarie said, on July 24, 2013 at 1:03 am

    Because the admin of this website is working, no
    uncertainty very rapidly it will be famous, due to its feature contents.

  86. drinkbar said, on July 24, 2013 at 1:14 am

    Hey there I am so happy I found your weblog, I really found
    you by accident, while I was searching on Digg for something else, Anyhow I am here now
    and would just like to say many thanks for a fantastic post and a all round thrilling blog (I also love the theme/design), I don’t have time to go through it all at the minute but I have book-marked it and also added in your RSS feeds, so when I have time I will be back to read a great deal more, Please do keep up the superb job.

  87. Hey there! I just wanted to ask if you ever have any trouble
    with hackers? My last blog (wordpress) was hacked
    and I ended up losing many months of hard work due to no back up.
    Do you have any solutions to protect against hackers?

  88. Jaime said, on July 24, 2013 at 1:39 am

    What you said was actually very reasonable.
    But, think on this, what if you composed a catchier title?

    I mean, I don’t wish to tell you how to run your website, but suppose you added a post title that grabbed a person’s attention?
    I mean Write an Internet search engine with 200 lines of Ruby code
    | saush is kinda plain. You ought to peek at Yahoo’s home page and see how they create news titles to get people interested. You might try adding a video or a picture or two to get readers interested about everything’ve written.
    Just my opinion, it would bring your website a little livelier.

  89. Icam4.blogspot.com said, on July 28, 2013 at 7:22 am

    Flavonoid aglycones are responsible for the activity.
    Natural products from dietary components such as Indian species and medicinal plants
    are known to possess antioxidant activity.
    I have been using two security camera systems for over a year and have been more pleased with
    the Logitech 750i system over the First Alert system.

  90. 公司清潔 said, on September 3, 2013 at 5:58 pm

    Cool blog! Is your theme custom made or did you download it from
    somewhere? A theme like yours with a few simple tweeks would really make
    my blog stand out. Please let me know where you got your theme.
    Thanks a lot

  91. Amysrobot.Com said, on September 5, 2013 at 9:12 am

    The variations can easily be adapted to envelop stir-fry with an oriental flavor, curries for Indian fare and used to serve spinach and other
    dips for family gatherings and parties. How about making a delightful and easy to make chicken salad.
    If you desire a longer and better quality of life choosing include HEALTHY EATING HABITS into your lifestyle is a smart choice.

  92. microsoft office 2003 compatibility pack said, on October 18, 2013 at 1:55 am

    The final option is to have the panels pressure treated;
    this allows you to never worry about treating the shed yourself as the pressure
    treatment will fully protect the wood. But I will show you that everyone
    can do it with just a couple of tools very fast based on a proper design.
    Using the level, check to make sure the foundation is level.

  93. Librerias en guadalajara said, on October 18, 2013 at 1:56 am

    Hi it’s me, I am also visiting this web site daily, this site
    is in fact good and the viewers are actually sharing good thoughts.

  94. Thanks , I have recently been looking for information
    approximately this subject for a long time
    and yours is the greatest I have discovered till now. But, what
    about the bottom line? Are you positive concerning the supply?

  95. jobcentre.bobbyand.com said, on October 18, 2013 at 6:04 am

    My brother recommended I may like this web site.
    He used to be totally right. This submit actually
    made my day. You cann’t consider just how much time I had spent for this info!
    Thanks!

  96. luxury dog collars said, on October 18, 2013 at 11:07 am

    I like the valuable info you provide in your articles.
    I will bookmark your weblog and check again here regularly.
    I’m quite sure I’ll learn plenty of new stuff right here!

    Good luck for the next!

  97. atkinsa dieta said, on November 28, 2013 at 12:35 pm

    Na stałe. Zdrowie. 4.Nośmy w sąsiedztwie sobie landrynki, tik-taki itp.
    1.Osoba, które asystent. Świetnie. Rywalizacji
    jest niezmiernie silna natomiast większość, że chce cisnąć?
    Świadomość przydatny efektywnie papierosa. 2.Rywalizujcie między wypalane każda punkt programu kwoty paczki papierosa.

    2.Rywalizacja W ciągu jederman uzależnienie, bachnąć
    aż do porzucenia “przyzwyczajenia” na stałe. 4. Zdrowie.Nośmy.

  98. care2.com said, on December 14, 2013 at 8:35 pm

    He is the founder of Philly Hypnosis, The Philadelphia area’s Premiere Neuro-Medical Hypnosis Practice.
    Recent clinical research has shown that L-Theanine is the component in green tea responsible for the feeling of relaxation,
    serenity and focus associated with drinking green
    tea. In this place anxiety, fear, stress and anger do
    not exist.

  99. google.com said, on December 18, 2013 at 10:16 am

    This is a topic which is close to my heart…
    Many thanks! Exactly where are your contact details though?

  100. quanto costa phen375 said, on December 22, 2013 at 2:29 am

    Click on the hyperlink under to read consumer reviews of the Internets prime ranked fat burners, together with New
    Physique New Life, Burn the Fat, Eat Stop Eat and more.

    Phen375 not only controls your appetite, it also helps you preserve a high level of energy to make you burn more calories.
    It was initially launched back in 2009 and shot to fame when several celebrities mentioned that they were firm
    advocates.

  101. youtube online downloader said, on March 1, 2014 at 11:19 pm

    In the early 1990s the mouse became common place and the price started to drop, and
    by the mid 1990s computers were including a second USB port solely for the mouse.
    Employment is a contract between two parties,
    one being the employer and the other being the employee.
    The above described ideas have a great ptential to bring in lot of money as a part of yyour profession.

  102. girls twin bed said, on March 5, 2014 at 3:17 am

    Charlie proposed to her just last Saturday. I’ll just give you the basics
    to start with: Always swallow. Our “intimate” time together began to change drastically.

  103. jugar maquina tragaperras said, on March 11, 2014 at 1:29 pm

    Wow, this post is good, my younger sister is analyzing these things,
    so I am going to inform her.

  104. ชุดคอสเพลย์ said, on March 14, 2014 at 10:29 pm

    hey there and thank you for your information – I have definitely picked up anything new from right here.
    I did however expertise several technical issues using this site, as I experienced to reload the site a lot of times previous to
    I could get it to load correctly. I had been wondering if your hosting is OK?
    Not that I’m complaining, but slow loading instances times will sometimes affect
    your placement in google and can damage your high-quality
    score if ads and marketing with Adwords. Well I am adding this RSS
    to my email and can look out for much more of your respective intriguing content.
    Make sure you update this again very soon.

  105. masseuse said, on April 1, 2014 at 9:08 pm

    Ahaa, its good dialogue regarding this paragraph at this place at this weblog,
    I have read all that, so at this time me also commenting at this place.

  106. primark franchise said, on April 4, 2014 at 5:26 pm

    A number of our products are exclusively manufactured for Essential Beauty.
    When you are planning to franchise your business always ensure that you tie-up with a professional consultant who can
    help you in developing the program and the whole structure.
    Who is expert in setting that up so that there isn’t any negative tax implications.

  107. how to lose belly fat said, on April 19, 2014 at 11:41 am

    These are the respiratory and the cardiovascular systems.
    Because it’s high-intensity, you only need about 30 – 45 minutes
    a day, 3 times a week to achieve your belly fat loss goal.
    That core portion of our physique is accountable for keeping us upright.

  108. kafarek.com.pl said, on May 24, 2014 at 12:57 am

    I don’t even know how I ended up here, but
    I thought this post was good. I don’t know who you are but certainly you’re going to a famous blogger if you are not
    already ;) Cheers!

  109. Santtu said, on May 24, 2014 at 3:41 pm

    Its like you learn my mind! You seem to grasp so
    much about this, like you wrote the e-book in it orr something.
    I believe that you just could do with a few percent to drive the message home
    a little bit, however other than that, this is excellent blog.

    An excellent read. I’ll definitely be back.

  110. Amparo said, on May 26, 2014 at 5:12 pm

    Hey just wanted to give you a quick heads up and let you know a
    few of the images aren’t loading properly. I’m not sure
    why but I think its a linking issue. I’ve tried it in two different web browsers and both show the same results.

  111. ทัวร์ยุโรป said, on June 4, 2014 at 4:39 pm

    Great blog! Is your theme custom made or did you download it from somewhere?
    A theme like yours with a few simple tweeks would really make my blog shine.
    Please let me know where you got your design. Cheers

  112. There are a lot of perpetrators who only endeavor to do more harm rather than to help.

    You try to maintain peace and quiet around you, and help in creating a positive atmosphere around you.
    Also, Cuba sent athletes to France, becoming the first Caribbean country to do
    so in the history of the game. Senegal, a Francophone
    nation in sub-Saharan Black Africa, also had a good performance in Far
    East. La planta es muy robusto, y no requiere fumigacion.
    What I present to you here is just a smattering of what you can explore
    in this landlocked country which spans the extremes between Andean and Amazonian landscapes and cultures.

  113. fat burner said, on June 13, 2014 at 1:37 am

    Everything iss very oen with a cear description of the issues.
    It was definitely informative. Your site is extremely helpful.
    Thank you for sharing!

  114. Free Xbox Gift Cards said, on July 6, 2014 at 3:14 pm

    It is actually ideal time to make a several ideas for the end and it’s also time for you to be very glad. I’ve understand this text of course, if I may simply I need to suggest you actually several exciting problems or recommendations. Perhaps you can create future articles or blog posts in regards to this report. I would like to go through a lot more things about this!

  115. new paleo books said, on July 8, 2014 at 6:05 am

    Hi there to all, for the reason that I am truly eager of reading this
    blog’s post to be updated on a regular basis. It includes fastidious information.

  116. SEO said, on July 10, 2014 at 3:23 am

    It’s an awesome paragraph in favor of all the online viewers; they will take advantage from it I am sure.

  117. Lance said, on July 10, 2014 at 7:30 pm

    Too much else can be a very bad thing so as long
    because you don’t drink too much, you’ll be fine.
    From decaf, to breakfast blends, dark roasts, extra bold, flavored coffees,
    world-class hot chocolates, ciders and more, you will possess scores of options with all the k cup system
    to your single-cup brewing. In this type of coffee brewer, roasted and low beans are set and directly put for the pot.

  118. Otis said, on July 11, 2014 at 12:32 am

    When someone writes an article he/she retains the image of a
    user in his/her mind that how a user can be aware of it.

    Therefore that’s why this post is perfect. Thanks!

  119. ราคาเพชร said, on July 11, 2014 at 7:48 am

    A person necessarily help to make significantly articles I might state.

    That is the very first time I frequented your web page and thus
    far? I surprised with the analysis you made to make this actual publish incredible.
    Great job!

  120. Earlene said, on July 22, 2014 at 1:27 pm

    However, while I support consumer disclosure, transparency, and regulation that insures
    the homeowner is informed about financing choices, this is a
    significant issue that goes a little farther than disclosure.
    Because of this sad truth, a self-employed human being simply cannot waste time overlooking the potential hazards of self-employment.

    With their programs in place, business owners can then keep track of their
    finances.

  121. site said, on July 26, 2014 at 6:06 pm

    Howdy! Quick question that’s completely off topic. Do you know how to make your site mobile friendly?
    My weblog looks weird when browsing from my iphone. I’m trying to find a theme or plugin that might
    be able to fix this issue. If you have any recommendations, please share.
    With thanks!

  122. więcej said, on July 28, 2014 at 9:25 am

    Przedstawiony punkt widzenia nieco różni się
    od mojego zdania na ten temat, ale dziękuję za artykuł

