Word stemming in Ruby


The concept of stemming in natural language processing (NLP), the processing of human languages such as English, is a simple one. In most languages, words are related to one another in regular ways. For example, ‘fish’, ‘fishes’, ‘fishing’ and ‘fisher’ are different forms built on the same underlying word. In NLP it is sometimes convenient to group such words together and treat them as the same, which reduces the number of distinct words to process. To do that, we reduce the different variants of a word to a common root, or ‘stem’; this process is called ‘stemming’.

There are numerous strategies and algorithms for stemming. A widely used algorithm for English is the Porter stemming algorithm, written by Martin Porter in 1980. The Porter stemmer works by suffix stripping: it applies a set of rules, in a fixed sequence of steps, to strip suffixes from a word. For example, a word that ends with ‘-ed’ might have the ‘-ed’ removed.
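To make the idea of suffix stripping concrete, here is a minimal sketch in Ruby. It is a toy, not the Porter algorithm: the rule list, the minimum stem length and the method name `naive_stem` are all assumptions made up for illustration, and the real algorithm uses many more rules and conditions.

```ruby
# A toy suffix stripper, NOT the Porter algorithm. It only illustrates
# the idea of trying an ordered list of suffix rules against a word.
# SUFFIX_RULES and MIN_STEM_LENGTH are illustrative assumptions.
SUFFIX_RULES = %w[ing es ed er s].freeze
MIN_STEM_LENGTH = 3

def naive_stem(word)
  word = word.downcase
  SUFFIX_RULES.each do |suffix|
    next unless word.end_with?(suffix)

    candidate = word[0...-suffix.length]
    # Only strip when enough of the word remains to be a plausible stem.
    return candidate if candidate.length >= MIN_STEM_LENGTH
  end
  word # no rule applied, so the word is returned unchanged
end

# Group the related words from the example above by their (toy) stem.
words = %w[fish fishes fishing fisher]
groups = words.group_by { |w| naive_stem(w) }
p groups
```

All four words end up under the single key ‘fish’, which is the grouping that stemming is meant to produce. A rule set this crude will mangle plenty of other words, which is why a properly designed algorithm like Porter’s is used in practice.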

Stemming is closely related to lemmatisation, which is the process of grouping the different inflected forms of a word under a single lemma. A lemma is the base (dictionary) form of the word, and its spelling may change when the word is inflected, while a stem is the part that stays unchanged across the related forms. For example, for the inflected word ‘produced’, the lemma is ‘produce’ while the stem is ‘produc’, because there are related forms such as ‘production’. As a result, stems are not necessarily complete words.

The complete Porter stemming algorithm is described on this page, maintained by the creator of the algorithm. The Ruby implementation, by Ray Pereda, is also available as a Ruby gem. To use the Porter stemmer in Ruby, you can just install the gem:

gem install stemmer
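Once the gem is installed, using it might look like the sketch below. It assumes that requiring `stemmer` adds a `stem` method to `String`; check the gem’s README for the version you have. The `rescue` is only there so the script fails gracefully if the gem is missing.

```ruby
# Assumption: requiring the stemmer gem adds String#stem.
begin
  require 'stemmer'
rescue LoadError
  warn 'stemmer gem not found; run: gem install stemmer'
end

words = %w[fish fishes fishing produced production]

# Only call #stem if the gem actually provided it.
if String.method_defined?(:stem)
  words.each { |w| puts "#{w} -> #{w.stem}" }
end
```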

If you’re interested in the original Porter stemming algorithm paper, you can read it here.
