Everything Scheme

Sunday, April 29, 2007

Writing a Spelling Corrector in PLT Scheme

[Remember to leave a comment: Was this post silly? Enlightning? Old news?]

Introduction

Peter Norvig recently wrote a great piece on How to Write a Spelling Corrector. Since Norvig used Python, Shiro decided to write a version in Gauche Scheme. In the following I'll present a solution in PLT Scheme. But first let's look at string manipulations.

String Manipulations

Scheme offers the usual operations on strings: string concatenation (string-append), referencing a character in a string (string-ref), extracting a substring (substring) and converting characters to strings (string). Compared to other languages code for string manipulations tend to become verbose, unless one "cheats" and uses some sort of utility such as format. [The advantage of this verboseness is that Scheme compilers can generate efficient code.]

As an example consider this expression from Norvig's spelling corrector:

  [word[0:i]+c+word[i:]

Here word[0:i] is the substring from index 0 (inclusive) to index i (exclusive).
The c is a character, and word[i:] is the substring from index i to the end.
The concatenation operator + converts automatically the character c to a string before the strings are concatenated.

A literal translation to Scheme reads as follows:

 (string-append (substring word 0 i) (string c) (substring word i (string-length word)))

Ouch!

A utility is clearly needed. Below follows my shot at one such utility, namely, concat. It allows us to write the expression as:

(concat word 0 i  c  word i _)

A string followed by two numbers is a substring, so word 0 i is short for (substring word 0 i). An underscore is allowed instead of a number either to indicate the beginning or the end. Thus word i _ is short for (substring word i (string-length word)). Finally a character is automatically converted to a string, so c is short for (string c). Not used in the example, but useful in general: A string followed by a single number, like word i, is short for (string-ref word i).

The code for concat is found at the bottom of this post.

The Spelling Corrector

There are two parts of the spelling corrector. The first part reads in correctly spelled words from a corpus, and counts the number of occurences of each word. The second part uses the training data to check whether a given word is known, or if it is incorrectly spelled to find the intended word.

Reading words from a file

We can split a string in words with the help of regexp-split.

 > (regexp-split #rx"[^a-zA-Z]" "foo bar baz")
 ("foo" "bar" "baz")

Taking advantage of the fact that PLT Scheme's regular expression functions work both on strings as well as directly on input ports, we can call regexp-split directly on the port.

; words : port -> list-of-strings
(define (words text)
 (list-ec (: word (regexp-split #rx"[^a-zA-Z]" text))
              (string-downcase (bytes->string/latin-1 word))))

Counting words

Now we'll count how many times each word is seen. We'll use a hash table to hold words and counts.

; train : list-of-strings -> hash-table-from-strings-to-numbers
(define (train features)
 (let ([model (make-hash-table 'equal)])
   (do-ec (: word features)
          (hash-table-put! model word (add1 (hash-table-get model word 1))))
   model))

Let's read in the data from the small corpus, and at the same time define word-count that returns the count of a word, and known? that checks whether a word was seen in the training set.

(define NWORDS (train (words (open-input-file "small.txt"))))

(define (word-count word)
 (hash-table-get NWORDS word 0))

(define (known? word)
 (hash-table-get NWORDS word #f))

The set of strings with edit distance 1

When a word is incorrectly spelled, we want to suggest a "similar" word known to be correctly spelled.

Given a word w, the set of words with edit distance 1 consists of the words, we can generate from w by either deleting one character, or transposing to character, or altering one character, or inserting one character.

Deleting the character with index i from a word is easy with the help of concat:

   (concat word 0 i     word (+ i 1) _)

If n is the length of w, then the set of strings generated by deleting a single character is given by:

 (set-ec string-compare (: i n) (concat word 0 i     word (+ i 1) _))

Here (: i n) makes i run through the numbers 0, 1, ... n-1. For each i the string (concat word 0 i word (+ i 1) _) is calculated, and all strings are collected in a set.

All words with edit distance 1 is given by:

(define (edits1 word)
 (define alphabet (string-ec (: c #\a #\z) c))
 (define n (string-length word))
 (union*
  (set-ec string-compare (: i n)                (concat word 0 i     word (+ i 1) _))
  (set-ec string-compare (: i (- n 1))          (concat word 0 i     word (+ i 1)     word i   word (+ i 2) _))
  (set-ec string-compare (: i n) (: c alphabet) (concat word 0 i  c  word (+ i 1) _))
  (set-ec string-compare (: i n) (: c alphabet) (concat word 0 i  c  word i _))))

(define (union* . sets)
 (foldl union (empty string-compare) sets))

The set of strings with edit distance 2

Given a word w, the set of words with edit distance 2 is the set generated by 2 deletions, transponations, alterations, or insertions. Luckily we have done the hard work in edits1.

(define (edits2 word)
 (set-ec string-compare
         (: e1 (elements (edits1 word)))
         (: e2 (elements (edits1 e1)))
         e2))

There is just one catch. The set is awfully large, so instead, we'll concentrate on correctly spelled (known) words with a edit distance of two:

; known : set-of-strings -> set-of-strings
;   remove the unknown words in words
(define (known words)
 (set-ec string-compare
         (:set w words) (if (known? w)) w))

(define (known-edits2 word)
 (set-ec string-compare
         (:set e1 (edits1 word))
         (:set e2 (edits1 e1))
         (if (known? e2))
         e2))

Maximum

If we have more than one suggestion for an incorrectly spelled word, we want to find the most common of them - that is the one with the largest word count in the training set. Python's max has a nice feature, where given a list of xs, it finds the x that maximizes f(x), for a function f. Here is a Scheme version:

(define (maximum xs key)
 ; return the x with the largest key
 (define max-key -inf.0)
 (define max-x #f)
 (do-ec (: x xs)
        (:let k (key x))
        (if (> k max-key))
        (begin
          (set! max-key k)
          (set! max-x x)))
 max-x)

Correcting a word

To correct a word, we first check whether it is known. We try to find a known word with edit distance 1. If unsuccessful we try to find one with edit distance 2.

(define (correct word)
 (define (falsify s) (if (empty? s) #f s))
 (let ([candidates
        (or (falsify (known (set word)))
            (falsify (known (edits1 word)))
            (falsify (known-edits2 word))
            (falsify (list word)))])
   (maximum (elements candidates) word-count)))

Let's try it:

 > (correct "hose")
 "house"

Last Remarks

Were it not for the concat macro, the function edits1 would have been clumsy. I am not sure concat is the answer. Thanks to macros, one can at least experiment with language extensions in Scheme.

Remember to read Norvig's piece, where he explains the math behind.
[Hmm. Should I turn it into a math exercise for my class?]

Remember to leave a comment: Was this post silly? Enlightning? Old news?]

A Spelling Corrector in PLT Scheme

(require (lib "42.ss" "srfi")
         (lib "67.ss" "srfi")
         (planet "set.scm" ("soegaard" "galore.plt" 2 (= 2))))

; words : port -> list-of-strings
(define (words text)
  (list-ec (: word (regexp-split #rx"[^a-zA-Z]" text))
           (string-downcase (bytes->string/latin-1 word))))

; train : list-of-strings -> hash-table-from-strings-to-numbers
(define (train features)
  (let ([model (make-hash-table 'equal)])
    (do-ec (: word features)
           (hash-table-put! model word (add1 (hash-table-get model word 1))))
    model))

(define file "small.txt")
;(define file "big.txt")
(define NWORDS (train (words (open-input-file file))))

(define (word-count word)
  (hash-table-get NWORDS word 0))

(define (known? word)
  (hash-table-get NWORDS word #f))

(define (known words)
  (set-ec string-compare
          (:set w words) (if (known? w)) w))

(define (concat-it spec)
  (define (underscore? o) (eq? o '_))
  ; spec :: = '()  | ( string num num  . spec ) | (string num _) | |
  (define (subs spec)
    (match spec
      [() '()]
      [((? string? s) (? number? n1) (? number? n2) . spec)
       (cons (substring s n1 n2) (subs spec))]
      [((? string? s) (? number? n1) (? underscore? _) . spec)
       (cons (substring s n1 (string-length s)) (subs spec))]
      [((? string? s) (? underscore? _) (? number? n2) . spec)
       (cons (substring s 0 n2) (subs spec))]
      [((? string? s) (? number? n) . spec)
       (cons (string (string-ref s n)) (subs spec))]
      [((? symbol? s) . spec)
       (cons (symbol->string s) (subs spec))]
      [((? char? c) . spec)
       (cons (string c) (subs spec))]
      [((? string? s) . spec)
       (cons s (subs spec))]
      [else (error)]))
  (apply string-append (subs spec)))

(define-syntax (concat stx)
  (syntax-case stx (_)
    [(string-it spec ...)
     #`(concat-it
        (list #,@(map (lambda (so)
                       (syntax-case so (_)
                         [_ #''_]
                         [else so]))
                     (syntax->list #'(spec ...)))))]))

(define (union* . sets)
  (foldl union (empty string-compare) sets))

(define (edits1 word)
  (define alphabet (string-ec (: c #\a #\z) c))
  (define n (string-length word))
  (union* 
   (set-ec string-compare (: i n)                (concat word 0 i     word (+ i 1) _))
   (set-ec string-compare (: i (- n 1))          (concat word 0 i     word (+ i 1)     word i   word (+ i 2) _))
   (set-ec string-compare (: i n) (: c alphabet) (concat word 0 i  c  word (+ i 1) _))
   (set-ec string-compare (: i n) (: c alphabet) (concat word 0 i  c  word i _))))

(define (edits2 word)
  (set-ec string-compare 
          (: e1 (elements (edits1 word)))
          (: e2 (elements (edits1 e1)))
          e2))

(define (known-edits2 word)
  (set-ec string-compare
          (:set e1 (edits1 word))
          (:set e2 (edits1 e1))
          (if (known? e2))
          e2))

(define (maximum xs key)
  ; return the x with the largest key
  (define max-key -inf.0)
  (define max-x #f)
  (do-ec (: x xs)
         (:let k (key x))
         (if (> k max-key))
         (begin
           (set! max-key k)
           (set! max-x x)))
  max-x)

(define (correct word)
  (define (falsify s) (if (empty? s) #f s))
  (let ([candidates 
         (or (falsify (known (set word)))
             (falsify (known (edits1 word)))
             (falsify (known-edits2 word))
             (falsify (list->set (list word))))])
    (maximum (elements candidates) word-count)))

10 Comments:

Shiro Kawai said...: Nice article! I love how you analyze the problem and nail down where to attack (namely, concat); I've just gone blindly instead.

And set-ec is nice addition to srfi-42. I'll consider to add it to Gauche.; 02:53
Anonymous said...: Wonderfully done. I had read Norvig's piece on the probabilistic spelling checker in Python, and was going to implement it myself, but I couldn't think of a source for words; the corpus as you put it. Besides, I've got too many open projects as it is. And I certainly wouldn't have done as well at explaining it as you've done. Another great article. Your on a roll lately. Keep up the good stuff.

--kyle; 23:04
Jens Axel Søgaard said...: Hi Shiro,

Feel free to steel any code you like from Galore.

The action is in
raw-red-black-tree-set.scm
and the finishing touches is in:
red-black-tree-set.scm"

Since Gauche has a port of Wright's pattern matcher it ought to straight forward to port.

/Jens Axel; 18:43
Jens Axel Søgaard said...: Hi Kyle,

It was fun to see, whether I could make it almost as short as the Python version. The sticky point was the string manipulation.

Btw - Norvig links to two corpora.

/Jens Axel; 18:45
Phil said...: Thanks for posting this. I have three comments.

1) I normally use the Textual language. But regex-split requires the Pretty Big language. Is there a way in DrScheme to force the language version?

2) Addresable is one of the words in Norvig's test corpus. Calling (correct "addresable") causes an error: top-level broke the contract (-> set boolean) on empty?; expected < set >, given: ("addresable"). (Note: I had to add spaces around the word "set" in the error message to get around the html verifier in the comment box, which thought it was seeing an html tag named "set".) What's wrong?

3) Norvig achieves a speed of 0.1 seconds per correction with Python, but the Scheme equivalent is much slower: on my machine, (time (correct "korrecter")) takes 8578 milliseconds, a factor of about a hundred times slower. I assume the difference is that sets are built in to Python. How could we get that kind of improvement in Scheme?; 15:05
Jens Axel Søgaard said...: Hi Phil,

ad 1)

Use (require (lib "string.ss")) to load regexp-split. Here is how to figure this out: Search for regexp-split in the HelpDesk. The page for "string.ss" starts with load instructions.

ad 2)

That's a bug. The contract says that empty? expected a set, but got a list. To fix this, in the definition of "correct" change (list word) to (list->set (list word)).

ad 3)

You need to run the Python program on your own computer in order to compare the running times. The set implementation is written in Scheme, and as is very general - it supports sets of anything.

In this case we only need sets of strings, so a specialized version would be faster. It would be interesting to implement sets-of-strings on top of the builtin hash-tables. My hunch is that you'd get comparable speed.

Do write if you try it!

Thanks for the comments,
Jens Axel; 15:27
Phil said...: 1) With language set to Textual, I added (lib "string.ss") to the (require ...) statement at the beginning of the program, then clicked Run. Now I get an error message: application: bad syntax (illegal use of `.') in: ((? string? s) (? number? n1) (? number? n2) . spec). (The error is in the first non-null pattern in the match of concat-it.) That looks okay to me. Changing to language Pretty Big makes the problem go away.

2) Fixed.

3) I know I can't directly compare timings across two machines. But I don't have Python installed on my machine, and don't want to bother just for this purpose. Your answer is interesting: to fix the Scheme program, rewrite it to use the built-in hash tables instead of your sets. How are your sets implemented? Are the built-in hash tables implemented in C or Scheme? I wrote a version of the spelling corrector that uses treaps, and got timings similar to those using your sets.

4) I don't like your concat macro, because parsing the various specifications can be ambiguous, as different specifications take different arities, making life difficult for both the macro compiler and the human reader. Here's my version (sorry, it probably won't wrap very well in your comment box), with each specification, except a bare character or string, surrounded by parentheses; it also uses the symbol "last" instead of an underscore to denote the end of the string, which I think makes things look more Scheme-ly and less Perl-ly; and I changed the name to cat, because that's how unix spells it, and it's shorter, and I'm lazy:

> (define-syntax cat
(syntax-rules (last)
((_ 'aux base)
(apply string-append (reverse base)))
((_ 'aux base (str i last) clauses ...)
(cat 'aux (cons (substring str i (string-length str)) base) clauses ...))
((_ 'aux base (str i j) clauses ...)
(cat 'aux (cons (substring str i j) base) clauses ...))
((_ 'aux base (str i) clauses ...)
(cat 'aux (cons (substring str i (+ i 1)) base) clauses ...))
((_ 'aux base c-or-s clauses ...)
(cat 'aux (cons (if (char? c-or-s) (string c-or-s) c-or-s) base) clauses ...))
((_ clauses ...) (cat 'aux '() clauses ...))))
> (cat ("hello" 2 4))
"ll"
> (cat ("hello" 0 1) ("hello" 2 last))
"hllo"
> (cat ("hello" 0 1) #\e ("hello" 2 last))
"hello"
> (cat ("hello" 0 1) "e" ("hello" 2 last))
"hello"
> (cat ("hello" 0 1) ("hello" 1) ("hello" 2 last))
"hello"
> (cat ("hello" 0 0))
""; 16:28
Jens Axel Søgaard said...: Hi Phil,

ad 1)

To run it in Textual you'll also need to import the pattern matching library: (require (lib "match.ss")).

ad 3)

Here is how I see it: It is fair to compare the speed of sets implemented in Scheme with sets implemented in Python. Since the sets in Python are implemented in C, then it is fair to use hashtables (which is implemented in C).

There are two different implementations of sets in Galore. The normal implementation is via functional Red-Black-trees (see Okadaki's book). The source is here:

http://planet.plt-scheme.org/package-source/soegaard/galore.plt/2/2/private/raw-red-black-tree-set.scm

ad 4)

It's an example of the eternal battle of convenience and speed. Your version allows a more effecient expansion, where as mine reduces the number of parentheses. And I agree, your version is more "Schemey".; 17:07
Jens Axel Søgaard said...: Hi Phil,

A version where sets are implemented on top of hash-tables can be found at

http://www.scheme.dk/tmp/spelling-ht.scm

But... The implementation of :set is very naive. I'll improve it later.

/Jens Axel; 18:40
Phil said...: Here's my version of the spelling corrector using PLT hash tables. On my machine, evaluating (time (spelltest tests1)) takes 68625 milliseconds of cpu time, and (time (spelltest tests2)) takes 117375 milliseconds of cpu time, so the average is 278 milliseconds per word, about triple Norvig's goal.

(current-directory "c:\\documents and settings\\pbewig\\desktop")

(define (read-word . port)
(let ((p (if (null? port) (current-input-port) (car port))))
(let loop ((in-word? #f) (c (read-char p)) (word '()))
(cond ((eof-object? c) (if in-word? (list->string (reverse word)) c))
((char-alphabetic? c) (loop #t (read-char p) (cons (char-downcase c) word)))
(in-word? (list->string (reverse word)))
(else (loop #f (read-char p) word))))))

(define (for-each-port reader proc . port)
(let ((p (if (null? port) (current-input-port) (car port))))
(let loop ((item (reader p)))
(if (not (eof-object? item))
(begin (proc item) (loop (reader p)))))))

(define nwords (make-hash-table 'equal))

(with-input-from-file "big.txt"
(lambda ()
(for-each-port read-word
(lambda (word)
(let ((n (hash-table-get nwords word 0)))
(hash-table-put! nwords word (+ n 1)))))))

(define-syntax for-range
(syntax-rules ()
((_ x j e1 e2 ...)
(do ((x 0 (+ x 1))) ((<= j x)) e1 e2 ...))))

(define-syntax for-alpha
(syntax-rules ()
((_ c e1 e2 ...)
(do ((i 97 (+ i 1))) ((< 122 i)) (let ((c (integer->char i))) e1 e2 ...)))))

(define-syntax cat
(syntax-rules (last)
((_ 'aux base)
(apply string-append (reverse base)))
((_ 'aux base (str i last) clauses ...)
(cat 'aux (cons (substring str i (string-length str)) base) clauses ...))
((_ 'aux base (str i j) clauses ...)
(cat 'aux (cons (substring str i j) base) clauses ...))
((_ 'aux base (str i) clauses ...)
(cat 'aux (cons (substring str i (+ i 1)) base) clauses ...))
((_ 'aux base c-or-s clauses ...)
(cat 'aux (cons (if (char? c-or-s) (string c-or-s) c-or-s) base) clauses ...))
((_ clauses ...) (cat 'aux '() clauses ...))))

(define (edits1 word)
(let ((n (string-length word)) (edits (make-hash-table 'equal)))
(for-range i n (hash-table-put! edits ; deletion
(cat (word 0 i) (word (+ i 1) last)) #t))
(for-range i (- n 1) (hash-table-put! edits ; transposition
(cat (word 0 i) (word (+ i 1)) (word i) (word (+ i 2) last)) #t))
(for-range i n (for-alpha c (hash-table-put! edits ; alteration
(cat (word 0 i) c (word (+ i 1) last)) #t)))
(for-range i (+ n 1) (for-alpha c (hash-table-put! edits ; insertion
(cat (word 0 i) c (word i last)) #t)))
edits))

(define (known words)
(let ((s (make-hash-table 'equal)))
(hash-table-for-each words
(lambda (k v)
(if (hash-table-get nwords k #f)
(hash-table-put! s k #t))))
s))

(define-syntax for-set
(syntax-rules ()
((_ x s e1 e2 ...) (hash-table-for-each s (lambda (k v) (let ((x k)) e1 e2 ...))))))

(define (known-edits2 word)
(let ((s (make-hash-table 'equal)))
(for-set e1 (edits1 word)
(for-set e2 (edits1 e1)
(if (hash-table-get nwords e2 #f)
(hash-table-put! s e2 #t))))
s))

(define (max-count words)
(let ((count 0) (word ""))
(hash-table-for-each words
(lambda (k v)
(let ((n (hash-table-get nwords k 0)))
(if (< count n)
(begin (set! count n) (set! word k))))))
word))

(define (correct word)
(if (< 0 (hash-table-get nwords word 0))
word
(let ((words (known (edits1 word))))
(if (< 0 (hash-table-count words))
(max-count words)
(let ((words (known-edits2 word)))
(if (< 0 (hash-table-count words))
(max-count words)
word))))))

(define (spelltest tests)
(let loop1 ((count 0) (okay 0) (tests tests))
(if (null? tests)
(begin (display "tested: ")
(display count)
(display "; fixed: ")
(display okay)
(newline))
(let* ((test (car tests))
(good-word (car test))
(bad-words (cdr test)))
(let loop2 ((words bad-words))
(cond ((null? words) (loop1 count okay (cdr tests)))
((string=? good-word (correct (car words)))
(set! count (+ count 1)) (set! okay (+ okay 1))
(loop2 (cdr words)))
(else (set! count (+ count 1)) (loop2 (cdr words)))))))))

(define tests1
'(("access" "acess") ("accessing" "accesing") ("accommodation"
"accomodation" "acommodation" "acomodation") ("account" "acount")
("address" "adress" "adres") ("addressable" "addresable") ("arranged"
"aranged" "arrainged") ("arrangeing" "aranging") ("arrangement"
"arragment") ("articles" "articals") ("aunt" "annt" "anut" "arnt")
("auxiliary" "auxillary") ("available" "avaible") ("awful" "awfall"
"afful") ("basically" "basicaly") ("beginning" "begining") ("benefit"
"benifit") ("benefits" "benifits") ("between" "beetween") ("bicycle"
"bicycal" "bycicle" "bycycle") ("biscuits" "biscits" "biscutes" "biscuts"
"bisquits" "buiscits" "buiscuts") ("built" "biult") ("cake" "cak")
("career" "carrer") ("cemetery" "cemetary" "semetary") ("centrally"
"centraly") ("certain" "cirtain") ("challenges" "chalenges" "chalenges")
("chapter" "chaper" "chaphter" "chaptur") ("choice" "choise") ("choosing"
"chosing") ("clerical" "clearical") ("committee" "comittee") ("compare"
"compair") ("completely" "completly") ("consider" "concider")
("considerable" "conciderable") ("contented" "contenpted" "contende"
"contended" "contentid") ("curtains" "cartains" "certans" "courtens"
"cuaritains" "curtans" "curtians" "curtions") ("decide" "descide")
("decided" "descided") ("definitely" "definately" "difinately")
("definition" "defenition") ("definitions" "defenitions") ("description"
"discription") ("desiccate" "desicate" "dessicate" "dessiccate")
("diagrammatically" "diagrammaticaally") ("different" "diffrent")
("driven" "dirven") ("ecstasy" "exstacy" "ecstacy") ("embarrass" "embaras"
"embarass") ("establishing" "astablishing" "establising") ("experience"
"experance" "experiance") ("experiences" "experances") ("extended"
"extented") ("extremely" "extreamly") ("fails" "failes") ("families"
"familes") ("february" "febuary") ("further" "futher") ("gallery" "galery"
"gallary" "gallerry" "gallrey") ("hierarchal" "hierachial") ("hierarchy"
"hierchy") ("inconvenient" "inconvienient" "inconvient" "inconvinient")
("independent" "independant" "independant") ("initial" "intial")
("initials" "inetials" "inistals" "initails" "initals" "intials") ("juice"
"guic" "juce" "jucie" "juise" "juse") ("latest" "lates" "latets" "latiest"
"latist") ("laugh" "lagh" "lauf" "laught" "lugh") ("level" "leval")
("levels" "levals") ("liaison" "liaision" "liason") ("lieu" "liew")
("literature" "litriture") ("loans" "lones") ("locally" "localy")
("magnificent" "magnificnet" "magificent" "magnifcent" "magnifecent"
"magnifiscant" "magnifisent" "magnificant") ("management" "managment")
("meant" "ment") ("minuscule" "miniscule") ("minutes" "muinets")
("monitoring" "monitering") ("necessary" "neccesary" "necesary" "neccesary"
"necassary" "necassery" "neccasary") ("occurrence" "occurence" "occurence")
("often" "ofen" "offen" "offten" "ofton") ("opposite" "opisite" "oppasite"
"oppesite" "oppisit" "oppisite" "opposit" "oppossite" "oppossitte")
("parallel" "paralel" "paralell" "parrallel" "parralell" "parrallell")
("particular" "particulaur") ("perhaps" "perhapse") ("personnel"
"personnell") ("planned" "planed") ("poem" "poame") ("poems" "poims"
"pomes") ("poetry" "poartry" "poertry" "poetre" "poety" "powetry")
("position" "possition") ("possible" "possable") ("pretend" "pertend"
"protend" "prtend" "pritend") ("problem" "problam" "proble" "promblem"
"proplen") ("pronunciation" "pronounciation") ("purple" "perple" "perpul"
"poarple") ("questionnaire" "questionaire") ("really" "realy" "relley"
"relly") ("receipt" "receit" "receite" "reciet" "recipt") ("receive"
"recieve") ("refreshment" "reafreshment" "refreshmant" "refresment"
"refressmunt") ("remember" "rember" "remeber" "rememmer" "rermember")
("remind" "remine" "remined") ("scarcely" "scarcly" "scarecly" "scarely"
"scarsely") ("scissors" "scisors" "sissors") ("separate" "seperate")
("singular" "singulaur") ("someone" "somone") ("sources" "sorces")
("southern" "southen") ("special" "speaical" "specail" "specal" "speical")
("splendid" "spledid" "splended" "splened" "splended") ("standardizing"
"stanerdizing") ("stomach" "stomac" "stomache" "stomec" "stumache")
("supersede" "supercede" "superceed") ("there" "ther") ("totally" "totaly")
("transferred" "transfred") ("transportability" "transportibility")
("triangular" "triangulaur") ("understand" "undersand" "undistand")
("unexpected" "unexpcted" "unexpeted" "unexspected") ("unfortunately"
"unfortunatly") ("unique" "uneque") ("useful" "usefull") ("valuable"
"valubale" "valuble") ("variable" "varable") ("variant" "vairiant")
("various" "vairious") ("visited" "fisited" "viseted" "vistid" "vistied")
("visitors" "vistors") ("voluntary" "volantry") ("voting" "voteing")
("wanted" "wantid" "wonted") ("whether" "wether") ("wrote" "rote" "wote")))

(define tests2
'(("forbidden" "forbiden") ("decisions" "deciscions" "descisions")
("supposedly" "supposidly") ("embellishing" "embelishing") ("technique")
("tecnique") ("permanently" "perminantly") ("confirmation" "confermation")
("appointment" "appoitment") ("progression" "progresion") ("accompanying"
"acompaning") ("applicable" "aplicable") ("regained" "regined")
("guidelines" "guidlines") ("surrounding" "serounding") ("titles"
"tittles") ("unavailable" "unavailble") ("advantageous" "advantageos")
("brief" "brif") ("appeal" "apeal") ("consisting" "consisiting") ("clerk"
"cleark" "clerck") ("component" "componant") ("favourable" "faverable")
("separation" "seperation") ("search" "serch") ("receive" "recieve")
("employees" "emploies") ("prior" "piror") ("resulting" "reulting")
("suggestion" "sugestion") ("opinion" "oppinion") ("cancellation"
"cancelation") ("criticism" "citisum") ("useful" "usful") ("humour"
"humor") ("anomalies" "anomolies") ("would" "whould") ("doubt" "doupt")
("examination" "eximination") ("therefore" "therefoe") ("recommend"
"recomend") ("separated" "seperated") ("successful" "sucssuful"
"succesful") ("apparent" "apparant") ("occurred" "occureed") ("particular"
"paerticulaur") ("pivoting" "pivting") ("announcing" "anouncing")
("challenge" "chalange") ("arrangements" "araingements") ("proportions"
"proprtions") ("organized" "oranised") ("accept" "acept") ("dependence"
"dependance") ("unequalled" "unequaled") ("numbers" "numbuers") ("sense"
"sence") ("conversely" "conversly") ("provide" "provid") ("arrangement"
"arrangment") ("responsibilities" "responsiblities") ("fourth" "forth")
("ordinary" "ordenary") ("description" "desription" "descvription"
"desacription") ("inconceivable" "inconcievable") ("data" "dsata")
("register" "rgister") ("supervision" "supervison") ("encompassing"
"encompasing") ("negligible" "negligable") ("allow" "alow") ("operations"
"operatins") ("executed" "executted") ("interpretation" "interpritation")
("hierarchy" "heiarky") ("indeed" "indead") ("years" "yesars") ("through"
"throut") ("committee" "committe") ("inquiries" "equiries") ("before"
"befor") ("continued" "contuned") ("permanent" "perminant") ("choose"
"chose") ("virtually" "vertually") ("correspondence" "correspondance")
("eventually" "eventully") ("lonely" "lonley") ("profession" "preffeson")
("they" "thay") ("now" "noe") ("desperately" "despratly") ("university"
"unversity") ("adjournment" "adjurnment") ("possibilities" "possablities")
("stopped" "stoped") ("mean" "meen") ("weighted" "wagted") ("adequately"
"adequattly") ("shown" "hown") ("matrix" "matriiix") ("profit" "proffit")
("encourage" "encorage")("collate" "colate") ("disaggregate" "disaggreagte"
"disaggreaget") ("receiving" "recieving" "reciving") ("proviso" "provisoe")
("umbrella" "umberalla") ("approached" "aproached") ("pleasant" "plesent")
("difficulty" "dificulty") ("appointments" "apointments") ("base" "basse")
("conditioning" "conditining") ("earliest" "earlyest") ("beginning"
"begining") ("universally" "universaly") ("unresolved" "unresloved")
("length" "lengh") ("exponentially" "exponentualy") ("utilized" "utalised")
("set" "et") ("surveys" "servays") ("families" "familys") ("system"
"sysem") ("approximately" "aproximatly") ("their" "ther") ("scheme"
"scheem") ("speaking" "speeking") ("repetitive" "repetative")
("inefficient" "ineffiect") ("geneva" "geniva") ("exactly" "exsactly")
("immediate" "imediate") ("appreciation" "apreciation") ("luckily"
"luckeley") ("eliminated" "elimiated") ("believe" "belive") ("appreciated"
"apreciated") ("readjusted" "reajusted") ("were" "wer" "where") ("feeling"
"fealing") ("and" "anf") ("false" "faulse") ("seen" "seeen")
("interrogating" "interogationg") ("academically" "academicly")
("relatively" "relativly" "relitivly") ("traditionally" "traditionaly")
("studying" "studing") ("majority" "majorty") ("build" "biuld")
("aggravating" "agravating") ("transactions" "trasactions") ("arguing"
"aurguing") ("sheets" "sheertes") ("successive" "sucsesive" "sucessive")
("segment" "segemnt") ("especially" "especaily") ("later" "latter")
("senior" "sienior") ("dragged" "draged") ("atmosphere" "atmospher")
("drastically" "drasticaly") ("particularly" "particulary") ("visitor"
"vistor") ("session" "sesion") ("continually" "contually") ("availability"
"avaiblity") ("busy" "buisy") ("parameters" "perametres") ("surroundings"
"suroundings" "seroundings") ("employed" "emploied") ("adequate"
"adiquate") ("handle" "handel") ("means" "meens") ("familiar" "familer")
("between" "beeteen") ("overall" "overal") ("timing" "timeing")
("committees" "comittees" "commitees") ("queries" "quies") ("econometric"
"economtric") ("erroneous" "errounous") ("decides" "descides") ("reference"
"refereence" "refference") ("intelligence" "inteligence") ("edition"
"ediion" "ediition") ("are" "arte") ("apologies" "appologies")
("thermawear" "thermawere" "thermawhere") ("techniques" "tecniques")
("voluntary" "volantary") ("subsequent" "subsequant" "subsiquent")
("currently" "curruntly") ("forecast" "forcast") ("weapons" "wepons")
("routine" "rouint") ("neither" "niether") ("approach" "aproach")
("available" "availble") ("recently" "reciently") ("ability" "ablity")
("nature" "natior") ("commercial" "comersial") ("agencies" "agences")
("however" "howeverr") ("suggested" "sugested") ("career" "carear") ("many"
"mony") ("annual" "anual") ("according" "acording") ("receives" "recives"
"recieves") ("interesting" "intresting") ("expense" "expence") ("relevant"
"relavent" "relevaant") ("table" "tasble") ("throughout" "throuout")
("conference" "conferance") ("sensible" "sensable") ("described"
"discribed" "describd") ("union" "unioun") ("interest" "intrest")
("flexible" "flexable") ("refered" "reffered") ("controlled" "controled")
("sufficient" "suficient") ("dissension" "desention") ("adaptable"
"adabtable") ("representative" "representitive") ("irrelevant"
"irrelavent") ("unnecessarily" "unessasarily") ("applied" "upplied")
("apologised" "appologised") ("these" "thees" "thess") ("choices"
"choises") ("will" "wil") ("procedure" "proceduer") ("shortened"
"shortend") ("manually" "manualy") ("disappointing" "dissapoiting")
("excessively" "exessively") ("comments" "coments") ("containing"
"containg") ("develop" "develope") ("credit" "creadit") ("government"
"goverment") ("acquaintances" "aquantences") ("orientated" "orentated")
("widely" "widly") ("advise" "advice") ("difficult" "dificult")
("investigated" "investegated") ("bonus" "bonas") ("conceived" "concieved")
("nationally" "nationaly") ("compared" "comppared" "compased") ("moving"
"moveing") ("necessity" "nessesity") ("opportunity" "oppertunity"
"oppotunity" "opperttunity") ("thoughts" "thorts") ("equalled" "equaled")
("variety" "variatry") ("analysis" "analiss" "analsis" "analisis")
("patterns" "pattarns") ("qualities" "quaties") ("easily" "easyly")
("organization" "oranisation" "oragnisation") ("the" "thw" "hte" "thi")
("corporate" "corparate") ("composed" "compossed") ("enormously"
"enomosly") ("financially" "financialy") ("functionally" "functionaly")
("discipline" "disiplin") ("announcement" "anouncement") ("progresses"
"progressess") ("except" "excxept") ("recommending" "recomending")
("mathematically" "mathematicaly") ("source" "sorce") ("combine"
"comibine") ("input" "inut") ("careers" "currers" "carrers") ("resolved"
"resoved") ("demands" "diemands") ("unequivocally" "unequivocaly")
("suffering" "suufering") ("immediately" "imidatly" "imediatly")
("accepted" "acepted") ("projects" "projeccts") ("necessary" "necasery"
"nessasary" "nessisary" "neccassary") ("journalism" "journaism")
("unnecessary" "unessessay") ("night" "nite") ("output" "oputput")
("security" "seurity") ("essential" "esential") ("beneficial" "benificial"
"benficial") ("explaining" "explaning") ("supplementary" "suplementary")
("questionnaire" "questionare") ("employment" "empolyment") ("proceeding"
"proceding") ("decision" "descisions" "descision") ("per" "pere")
("discretion" "discresion") ("reaching" "reching") ("analysed" "analised")
("expansion" "expanion") ("although" "athough") ("subtract" "subtrcat")
("analysing" "aalysing") ("comparison" "comparrison") ("months" "monthes")
("hierarchal" "hierachial") ("misleading" "missleading") ("commit" "comit")
("auguments" "aurgument") ("within" "withing") ("obtaining" "optaning")
("accounts" "acounts") ("primarily" "pimarily") ("operator" "opertor")
("accumulated" "acumulated") ("extremely" "extreemly") ("there" "thear")
("summarys" "sumarys") ("analyse" "analiss") ("understandable"
"understadable") ("safeguard" "safegaurd") ("consist" "consisit")
("declarations" "declaratrions") ("minutes" "muinutes" "muiuets")
("associated" "assosiated") ("accessibility" "accessability") ("examine"
"examin") ("surveying" "servaying") ("politics" "polatics") ("annoying"
"anoying") ("again" "agiin") ("assessing" "accesing") ("ideally" "idealy")
("scrutinized" "scrutiniesed") ("simular" "similar") ("personnel"
"personel") ("whereas" "wheras") ("when" "whn") ("geographically"
"goegraphicaly") ("gaining" "ganing") ("requested" "rquested") ("separate"
"seporate") ("students" "studens") ("prepared" "prepaired") ("generated"
"generataed") ("graphically" "graphicaly") ("suited" "suted") ("variable"
"varible" "vaiable") ("building" "biulding") ("required" "reequired")
("necessitates" "nessisitates") ("together" "togehter") ("profits"
"proffits"))); 21:55

Sunday, April 29, 2007

Writing a Spelling Corrector in PLT Scheme

10 Comments:

About Me

Previous Posts

Archives