The English Words In The Protein Universe

(all for the sake of doing something completely biologically uninteresting)

 

Well, someone had to do it. Here, I've cataloged common English words found in many of the known protein sequences. Out of the 2942 words longer than five letters, CHAPSTICK is the longest word my tools found. How did I do it? I intersected the Swiss-Prot protein database with the complete English ispell dictionary. Below you can find all words that are six letters or longer, with links to the sequences they appear in. If you are curious, there are 101,602 sequences in my outdated version of Swiss-Prot, comprising a total of 37,416,817 characters. What have we found? Not much. But the award for the best misinterpretation of the genetic code goes to the word ALLELE, which is the 5th most common word we found. (Runner-up goes to VALINE, which occurs twice.) Questions? Mail Sean Mooney (me).

This idea came from Prof. Randy Lewis via Prof. Theodor Hanekamp, who asked what the longest word in the Swiss-Prot database was. Click here to read about me, the developer of this madness.
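For the curious, the intersection of a sequence database with a word list can be sketched in a few lines of Python. This is only a rough illustration, not the tool that produced this page: the function names `usable_words` and `count_words` are made up for the example, the sequences are assumed to be already parsed into plain strings of one-letter amino acid codes, and overlapping hits are assumed to count separately.

```python
from collections import Counter


def usable_words(words, min_len=6):
    """Upper-case the word list and keep purely alphabetic words of min_len letters or more."""
    kept = set()
    for word in words:
        word = word.strip().upper()
        if len(word) >= min_len and word.isalpha():
            kept.add(word)
    return kept


def count_words(sequences, words, min_len=6):
    """Count every occurrence of every usable word in every sequence.

    Hypothetical helper: each sequence is a string of one-letter codes.
    Overlapping occurrences are counted separately (an assumption).
    """
    vocab = usable_words(words, min_len)
    lengths = sorted({len(word) for word in vocab})
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of each word length along the sequence and
        # look the window up in the dictionary set.
        for n in lengths:
            for start in range(len(seq) - n + 1):
                chunk = seq[start:start + n]
                if chunk in vocab:
                    counts[chunk] += 1
    return counts


if __name__ == "__main__":
    # Toy data invented for illustration; not real Swiss-Prot entries.
    toy_sequences = ["MKALLELEGT", "AAVALINEKK"]
    toy_words = ["allele", "valine", "kilter", "cat"]
    for word, hits in count_words(toy_sequences, toy_words).most_common():
        print(f"{word} ({hits})")
```

Looking up fixed-length windows in a set keeps the sketch short. For a database with tens of millions of characters, a multi-pattern matcher such as Aho-Corasick would likely be the faster choice.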



 

The List


 

 

The Longest Words                 The Most Common Words longer than 5 letters
(Number of times word appears)    (Number of times word appears)

CHAPSTICK (1)                     KILTER (164)
VILLAINY (1)                      SEARED (84)
TALIESIN (1)                      DEEDED (50)
STETTING (1)                      TAILER (31)
SAVAGISM (1)                      ALLELE (30) (Hah!)
SALARIAT (1)                      GIVETH (28)
REVEALED (1)                      DIDDLE (28)
PILEATED (1)                      ALASKA (28)
PAPERING (4)                      KEELER (24)
PALESTRA (1)                      GLASSY (24)
KINDLIER (1)                      SITING (20)
KALEVALA (1)                      SEEDED (19)
GEDANKEN (1) (Hey, wait a ..)     REELER (19)
FREAKIER (1)                      LADLES (19)
FITTINGS (1)                      PLIANT (18)
CRICKETS (1)                      LESLIE (18)
AVENTAIL (1)                      KELLER (18)
ATLANTES (1)                      ENEMATA (18)
ASSESSES (1)                      LYSIAS (17)
ASSAILED (1)                      VEALER (16)
ASCIDIAN (1)                      THEIRS (16)
AIRLINES (1)                      LIGETI (16)