Anagrams

Uncategorized — Titus Barik on July 27, 2005 at 5:04 pm

Due to recent events, I’ve been thinking about anagrams. In a nutshell, two words are anagrams if they contain the same letters. Consider for example, a function like the following:

boolean is_anagram(string worda, string wordb);

The most straight-forward approach to determining whether two words are anagrams is polynomial time. We simply compare each letter in the first word with each letter in the second word to see if exists, and mark it as being found. A more elegant approach involves normalizing the word with a hash table, and using its signature, instead of the word itself. For example, the word egg and ego could be represented as a 26-digit number:

egg: 00001020000000000000000000
ego: 00001010000000100000000000

A comparison can then be made in linear time. This technique is, unsurprisingly, the letter-signature method. If we have control over the structure of the dictionary itself, we have more efficient mechanisms. Consider then, the method signature:

array anagrams(string word, array dictionary)

In this case, we can canonicalize the words in the dictionary by rearranging each word so that its letters are alphabetical and then sorting the canonicalized set of words. Determining the anagrams of a set is simply a matter of a binary-search through the canonicalized dictionary.

A sample assignment can be found here, with further explanation. This is also a good interview question.

4 Comments »

  1. You would have to make rules for one letter words, etc. (’a’ and ‘I’). I did a project like this one time, also had one where you find the closest number of letter flips (using 1 letter changes) to get from word a to word b, only flipping 1 letter at a time and the new word has to be a word in some dictionary. Ask bobafred he is the Adelson Velskii Landis mastaa… and don’t get him started on Red Black Tree’s after he has been drinking.

    Comment by Cyanbane — July 27, 2005 @ 6:20 pm
  2. I don’t believe you need any extra rules for single letter words.

    Comment by Titus Barik — July 27, 2005 @ 7:17 pm
  3. Upon further review you are right, the only thing you would have to make a rule for is that a word that is an anagram can’t double on itself (ie "a a a a", "racecar racecar". I stand corrected.

    Comment by Cyanbane — July 27, 2005 @ 10:58 pm
  4. That’s not an issue either. You’re looking at anagrams, not palindromes. An anagram is always an anagram of itself. Iterating over the letter-signature is all you need to do.

    Comment by Titus Barik — July 27, 2005 @ 11:06 pm

RSS feed for comments on this post. TrackBack URI

Leave a comment

titus@barik.net | The Weblog of Titus Barik