Chapter 4. Lookup Word

The Lookup bar, as shown in Figure 2-1, provides basic word lookup. If you want to specify more lookup options, use the File/Lookup menu command.

The File/Lookup command opens the dialog box shown in Figure 2-1, which lets the user specify the database (dictionary) and the search strategy (algorithm) to use in the lookup.

Figure 2-1. MrclDict Lookup Dialog box

The available databases and strategies are provided by the DICT™ server and may differ from server to server.
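Because the lists come from the server, a client can ask for them with the SHOW DB and SHOW STRAT commands of the DICT protocol (RFC 2229) and then send MATCH or DEFINE with the chosen database and strategy. The Python sketch below illustrates this; it is not MrclDict's own code. The helper names (quote, match_command, define_command, parse_listing, show) are hypothetical, and it assumes a server on the standard DICT port 2628 that replies with the RFC 2229 status codes (220 banner, 110/111 listing, 250 ok).

```python
import shlex
import socket

DICT_PORT = 2628  # standard DICT port assigned by RFC 2229


def quote(arg: str) -> str:
    """Wrap an argument in double quotes, escaping backslashes and quotes.

    Assumes the RFC 2229 rule that a backslash inside a quoted string
    quotes the next character."""
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def match_command(database: str, strategy: str, word: str) -> str:
    """Build 'MATCH database strategy word' (list matching headwords)."""
    return f"MATCH {database} {strategy} {quote(word)}\r\n"


def define_command(database: str, word: str) -> str:
    """Build 'DEFINE database word' (fetch the definitions)."""
    return f"DEFINE {database} {quote(word)}\r\n"


def parse_listing(lines) -> dict:
    """Turn the text body of a SHOW DB / SHOW STRAT reply into {name: description}.

    Each body line has the form: name "description"."""
    result = {}
    for line in lines:
        if line == ".":  # a lone dot terminates the text body
            break
        parts = shlex.split(line)
        if len(parts) >= 2:
            result[parts[0]] = parts[1]
    return result


def show(host: str, what: str = "STRAT") -> dict:
    """Ask a DICT server for its databases (what="DB") or strategies (what="STRAT")."""
    with socket.create_connection((host, DICT_PORT), timeout=10) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\r\n") as reader:
            reader.readline()  # 220 greeting banner
            sock.sendall(f"SHOW {what}\r\n".encode("utf-8"))
            status = reader.readline()
            if not status.startswith(("110", "111")):
                return {}  # e.g. 554 / 555: nothing available
            body = []
            while True:
                raw = reader.readline()
                if not raw or raw.rstrip("\r\n") == ".":
                    break
                body.append(raw.rstrip("\r\n"))
            reader.readline()  # 250 ok
            sock.sendall(b"QUIT\r\n")
            return parse_listing(body)
```

A lookup dialog could fill its database and strategy lists from show(host, "DB") and show(host, "STRAT"), which is why the choices can differ between servers.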

Databases

Below is a list of the standard DICT™ databases, as listed on dict.org.
Webster's Revised Unabridged Dictionary (1913)
The Webster's Revised Unabridged Dictionary (G & C. Merriam Co., 1913, edited by Noah Porter), is provided by Patrick Cassidy of MICRA, Inc., Plainfield, NJ, USA. The raw data is available, as well as another web interface.
The WordNet® 1.6 Database
WordNet is a lexical database for English. Software and data are available via ftp.
The Jargon File
The Jargon file is a public domain lexicon of hacker jargon, edited by Eric Raymond.
The Free On-line Dictionary of Computing
FOLDOC is a searchable dictionary of acronyms, jargon, programming languages, tools, architecture, operating systems, networking, theory, conventions, standards, mathematics, telecoms, electronics, institutions, companies, projects, products, history, in fact anything to do with computing.
The Elements Database
A freely-distributed database of elemental information, edited by Jay Kominek.
The U.S. Gazetteer (1990)
The original U.S. Gazetteer data are provided by the U.S. Census Bureau and are available via ftp.
Easton's 1897 Bible Dictionary
Easton's Bible Dictionary is based on M.G. Easton M.A., D.D.'s Illustrated Bible Dictionary, Third Edition, published by Thomas Nelson, 1897. The raw data for this database is available in the public domain.
Hitchcock's Bible Names Dictionary
Hitchcock's Bible Names Dictionary is derived from Hitchcock's New and Complete Analysis of the Holy Bible, published in the late 1800's. The raw data for this database is available in the public domain.
The 2002 CIA World Factbook
David Frey submitted patches to the dict-misc package, but they are not currently available on the ftp site.
Ambrose Bierce's Devil's Dictionary
David Frey submitted patches to the dict-misc package, but they are not currently available on the ftp site.
V.E.R.A. - A Dictionary of Computer Related Acronyms
The GNU V.E.R.A. (Virtual Entity of Relevant Acronyms) is a free list of acronyms, all of which are used in the field of computing. V.E.R.A. is primarily meant to be used as an online reference, although some efforts have been taken to make its TeX output look acceptable. It contains approximately 8100 acronyms.
Note: Taken from dict.org as indicated above

Strategies

For dictd server version 1.9.7 from dict.org, the available search algorithms are:
exact
An exact match. This algorithm uses a binary search and is one of the fastest search algorithms available.
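A minimal sketch of an exact lookup by binary search, assuming the headwords are held in a sorted, lowercased Python list (a simplification; dictd's real index file format is not described here):

```python
from bisect import bisect_left


def exact_match(index, word):
    """Return True if word is a headword of the sorted, lowercased index."""
    key = word.lower()  # assumption: the index is case-folded
    i = bisect_left(index, key)  # O(log n) comparisons instead of a full scan
    return i < len(index) and index[i] == key
```

Binary search needs only about log2(n) comparisons, which is why exact matching stays fast even on large indexes.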
lev
The Levenshtein algorithm (string edit distance of one). This algorithm searches for all words which are within an edit distance of one from the target word. An “edit” means an insertion, deletion, or transposition. This is a rapid algorithm for correcting spelling errors, since many spelling errors are within a Levenshtein distance of one from the original word.
prefix
Prefix match. This algorithm also uses a binary search and is very fast.
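Prefix matching can reuse the same binary search: all headwords sharing a prefix form one contiguous run in a sorted list, starting at the position where the prefix would be inserted. A minimal sketch, again assuming a sorted, lowercased Python list of headwords:

```python
from bisect import bisect_left


def prefix_match(index, prefix):
    """Return all headwords in the sorted index that start with prefix."""
    key = prefix.lower()  # assumption: the index is case-folded
    start = bisect_left(index, key)  # first position that could match
    matches = []
    for i in range(start, len(index)):
        if not index[i].startswith(key):
            break  # sorted order: no later headword can match either
        matches.append(index[i])
    return matches
```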
re
POSIX 1003.2 (modern) regular expression search. Modern regular expressions are the ones used by egrep(1). These regular expressions allow predefined character classes (e.g., [[:alnum:]], [[:alpha:]], [[:digit:]], and [[:xdigit:]] are useful for this application); use * to match a sequence of 0 or more occurrences of the previous atom, + to match 1 or more occurrences, and ? to match 0 or 1 occurrence; use ^ to match the beginning of a word and $ to match the end of a word; and allow nested subexpressions and alternation with () and |. For example, “(foo|bar)” matches all words that contain either “foo” or “bar”. To match these special characters literally, they must be quoted with two backslashes (due to the quoting characteristics of the server). Warning: regular expression matches can take 10 to 300 times longer than substring matches. On a busy server with many databases, this can require more than 5 minutes of waiting time, depending on the complexity of the regular expression.
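To try patterns like these locally before sending them to a server, a word list can be filtered with Python's re module. This is only an approximation: Python's re is not a POSIX 1003.2 engine and does not support bracket classes such as [[:digit:]], so the sketch uses [0-9] instead. The word list is made up for illustration.

```python
import re


def re_match(headwords, pattern):
    """Return the headwords that the regular expression matches anywhere."""
    compiled = re.compile(pattern)
    return [w for w in headwords if compiled.search(w)]


words = ["football", "barrel", "baz", "foobar", "dict", "r2d2"]
alternation = re_match(words, "(foo|bar)")  # contains "foo" or "bar"
anchored = re_match(words, "^ba")           # begins with "ba"
digits = re_match(words, "[0-9]$")          # ends with a digit
```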
regexp
Old (basic) regular expressions. These regular expressions don't support |, +, or ?. Groups use escaped parentheses. While modern regular expressions are generally easier to use, basic regular expressions have a back reference feature. This can be used to match a second occurrence of something that was already matched. For example, the following expression finds all words that begin and end with the same three letters:
     ^\\(...\\).*\\1$
Note the use of double backslashes to escape the special characters. This is required by the DICT protocol string specification: a single backslash quotes the next character, so two are needed to get a single backslash through to the regular expression engine. Warning: backtracking is even slower than the use of general regular expressions.
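The back-reference example can be checked locally, and the double-backslash form above follows mechanically from doubling every backslash in the basic expression. In this Python sketch, Python's re stands in for the server's engine; it uses the modern syntax (plain parentheses for groups), so the pattern is written differently from the basic form. The word list is made up.

```python
import re

# Basic (BRE) form, as the regular expression engine should receive it.
BRE = r"^\(...\).*\1$"
# Form sent through the DICT protocol: each backslash doubled.
WIRE = BRE.replace("\\", "\\\\")
# Equivalent pattern in Python's (modern) syntax, with a back reference.
SAME_THREE = re.compile(r"^(...).*\1$")


def same_ends(headwords):
    """Return words that begin and end with the same three letters."""
    return [w for w in headwords if SAME_THREE.search(w)]
```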
soundex
The Soundex algorithm, a classic algorithm for finding words that sound similar to each other. The algorithm encodes each word using the first letter of the word and up to three digits. Since the first letter is known, this search is relatively fast, and it is sometimes good for correcting spelling errors when the Levenshtein algorithm doesn't help.
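A sketch of the encoding, following the classic American Soundex rules (consonant groups mapped to digits 1 to 6, vowels dropped, adjacent equal codes counted once, H and W not separating equal codes, result padded to four characters). dictd's exact variant may differ in details, and the helper names are hypothetical.

```python
# Letter-to-digit table of classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Encode word as its first letter plus three digits (zero-padded)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code is not repeated
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code:
            if code != prev:  # adjacent letters with the same code count once
                digits.append(code)
            prev = code
        elif c not in "HW":
            prev = ""  # vowels (and Y) separate letters with equal codes
        # H and W are skipped without separating equal codes
    return (first + "".join(digits) + "000")[:4]


def soundex_match(headwords, word):
    """Return the headwords whose Soundex code equals that of word."""
    target = soundex(word)
    return [w for w in headwords if soundex(w) == target]
```

Because every code keeps the word's first letter, a search only has to compare headwords starting with that letter.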
substring
Match a substring anywhere in the headword. This search strategy uses a modified Boyer-Moore-Horspool algorithm. Since it must search the whole index file, it is not as fast as the exact and prefix matches.
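The textbook Boyer-Moore-Horspool algorithm compares the pattern right to left and, on a mismatch, shifts by an amount looked up from the text character under the pattern's last position. The man page says dictd uses a modified version whose changes are not described there, so this Python sketch shows only the plain algorithm, applied to a hypothetical in-memory headword list.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift for each character of pattern[:-1]: distance from its last
    # occurrence to the end of the pattern. Other characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1  # compare right to left
        if j < 0:
            return i  # full match at position i
        i += shift.get(text[i + m - 1], m)
    return -1


def substring_match(headwords, needle):
    """Return the headwords containing needle anywhere (full scan)."""
    return [w for w in headwords if horspool_find(w, needle) != -1]
```

Every headword still has to be examined, which is why this strategy is slower than the binary-search strategies.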
suffix
Suffix match. This search strategy also uses a modified Boyer-Moore-Horspool algorithm, and is as fast as the substring search. If the optional index_suffix string file is listed in the configuration file this search is much faster.
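The man page does not describe the format of the index_suffix file. One plausible way such an index can speed things up (an assumption, not dictd's documented design) is to store every headword reversed and sorted, so that a suffix search becomes a prefix search that can use binary search. A minimal sketch with hypothetical helper names:

```python
from bisect import bisect_left


def build_suffix_index(headwords):
    """Sort the reversed headwords, so a suffix becomes a prefix."""
    return sorted(w[::-1] for w in headwords)


def suffix_match(suffix_index, suffix):
    """Return headwords ending in suffix, using binary search on reversed words."""
    key = suffix[::-1]
    start = bisect_left(suffix_index, key)
    matches = []
    for i in range(start, len(suffix_index)):
        if not suffix_index[i].startswith(key):
            break  # reversed words sharing the key are contiguous
        matches.append(suffix_index[i][::-1])
    return sorted(matches)
```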
word
Match any single word, even if part of a multi-word entry. If the optional index_word string file is listed in the configuration file this search is much faster.
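A sketch of what a word index could look like: a map from each single word to the multi-word headwords that contain it, so the word strategy becomes a direct lookup instead of a scan. The structure and names are hypothetical, and words are split on whitespace only, so hyphenated forms stay whole.

```python
from collections import defaultdict


def build_word_index(headwords):
    """Map each lowercased word to the headwords that contain it."""
    index = defaultdict(list)
    for headword in headwords:
        # dict.fromkeys keeps each word once per headword, in order
        for part in dict.fromkeys(headword.lower().split()):
            index[part].append(headword)
    return dict(index)


def word_match(word_index, word):
    """Return the headwords containing word as a separate word."""
    return word_index.get(word.lower(), [])
```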
Note: Taken from dictd man page

To report a bug or make a suggestion regarding this application or this documentation, e-mail <cws@miraclenet.co.th>.
Last update : $Id: lookup.html,v 1.1 2004/02/24 00:44:31 cws Exp $