Chapter 4. Lookup Word
The Lookup bar, as shown in Figure 2-1,
provides basic word lookup. To specify more
lookup options, use the File/Lookup menu command.
The File/Lookup command uses the dialog box in Figure 2-1 to let the user specify
the database (dictionary) and search strategy (algorithm) to use in the lookup.
The databases and strategies are provided by the DICT™ server and may
differ from server to server.
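How a client asks a server for its databases and strategies, and then passes the chosen pair along with the word, can be sketched with the DICT protocol (RFC 2229) commands SHOW DB, SHOW STRAT, and MATCH. The Python sketch below is illustrative only: the helper names (quote_word, match_command, parse_listing) and the sample reply are assumptions made for this example and are not taken from this application.

```python
# Illustrative sketch only; helper names and SAMPLE_REPLY are assumptions.

def quote_word(word):
    """Wrap a word in double quotes for a DICT command line.

    In DICT strings a backslash quotes the next character, so literal
    backslashes and double quotes are escaped with a backslash.
    """
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def match_command(database, strategy, word):
    """Build a MATCH command line for the given database and strategy."""
    return f"MATCH {database} {strategy} {quote_word(word)}\r\n"


def parse_listing(response):
    """Parse the text of a SHOW DB or SHOW STRAT reply into {name: description}."""
    entries = {}
    for line in response.splitlines()[1:]:   # skip the leading status line
        if line == ".":                      # a lone dot ends the listing
            break
        name, _, description = line.partition(" ")
        entries[name] = description.strip().strip('"')
    return entries


# Hypothetical reply to "SHOW DB"; a real server returns its own list.
SAMPLE_REPLY = (
    "110 2 databases present\r\n"
    'jargon "The Jargon File"\r\n'
    'foldoc "The Free On-line Dictionary of Computing"\r\n'
    ".\r\n"
    "250 ok\r\n"
)

print(parse_listing(SAMPLE_REPLY))
print(repr(match_command("jargon", "prefix", "hack")))
```

Because the server supplies both lists, a client would normally fill its database and strategy choices from such replies rather than hard-coding them.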
Databases
Below is a list of the standard DICT™ databases, as listed at
dict.org.
- Webster's Revised Unabridged Dictionary (1913)
-
The Webster's Revised Unabridged Dictionary (G. & C. Merriam Co.,
1913, edited by Noah Porter) is provided by Patrick Cassidy of MICRA, Inc.,
Plainfield, NJ, USA. The raw data is available, as well as another
web interface.
- The WordNet® 1.6 Database
-
WordNet is a lexical
database for English. Software and data are available via
ftp.
- The Jargon File
-
The Jargon File is a public domain
lexicon of hacker jargon, edited by Eric Raymond.
- The Free On-line Dictionary of Computing
-
FOLDOC is a searchable dictionary of
acronyms, jargon, programming languages, tools, architecture, operating
systems, networking, theory, conventions, standards, mathematics, telecoms,
electronics, institutions, companies, projects, products, history, in fact
anything to do with computing.
- The Elements Database
-
A freely-distributed database of elemental information, edited by Jay Kominek.
- The U.S. Gazetteer (1990)
-
The original U.S. Gazetteer
data are provided by the U.S. Census Bureau
and are available via ftp.
- Easton's 1897 Bible Dictionary
-
Easton's Bible Dictionary
is based on M.G. Easton M.A., D.D.'s Illustrated Bible Dictionary, Third
Edition, published by Thomas Nelson, 1897. The
raw data for this database is available in the public domain.
- Hitchcock's Bible Names Dictionary
-
Hitchcock's Bible Names Dictionary
is derived from Hitchcock's New and Complete Analysis of the Holy
Bible, published in the late 1800's. The
raw data for this database is available in the public domain.
- The 2002 CIA World Factbook
-
David Frey submitted
patches to the dict-misc package, but they are not currently available on
the ftp site.
- Ambrose Bierce's Devil's Dictionary
-
David Frey submitted
patches to the dict-misc package, but they are not currently available on
the ftp site.
- V.E.R.A. - A Dictionary of Computer Related Acronyms
-
The GNU V.E.R.A.
(Virtual Entity of Relevant Acronyms) is a free list of acronyms, all of which
are used in the field of computing. V.E.R.A. is primarily meant to be used as
an online reference, although some efforts have been taken to make its TeX
output look acceptable. It contains approximately 8100 acronyms.
Note: Taken from dict.org as indicated above
Strategies
For dictd server version 1.9.7 from
dict.org,
the available search algorithms are:
- exact
-
An exact match. This algorithm uses a binary search and is one of the fastest
search algorithms available.
- lev
-
The Levenshtein algorithm (string edit distance of one). This algorithm
searches for all words which are within an edit distance of one from the target
word. An “edit” means an insertion, deletion, or transposition.
This is a rapid algorithm for correcting spelling errors, since many spelling
errors are within a Levenshtein distance of one from the original word.
- prefix
-
Prefix match. This algorithm also uses a binary search and is very fast.
- re
-
POSIX 1003.2 (modern) regular expression search. Modern regular expressions are
the ones used by egrep(1). These regular expressions allow predefined character
classes (e.g., [[:alnum:]], [[:alpha:]], [[:digit:]], and [[:xdigit:]] are
useful for this application); use * to match 0 or more occurrences of
the previous atom; use + to match 1 or more occurrences of the
previous atom; use ? to match 0 or 1 occurrences of the previous
atom; use ^ to match the beginning of a word and $ to match the end of a
word; and allow nested subexpressions and alternation with () and |. For
example, “(foo|bar)” matches all words that contain either
“foo” or “bar”. To match these special characters literally, they
must be quoted with two backslashes (due to the quoting characteristics of the
server). Warning: Regular expression matches can take 10 to 300 times longer
than substring matches. On a busy server with many databases, this can
require more than 5 minutes of waiting time, depending on the complexity of
the regular expression.
- regexp
-
Old (basic) regular expressions. These regular expressions don't support |, +,
or ?. Groups use escaped parentheses. While modern regular expressions are
generally easier to use, basic regular expressions have a back reference
feature. This can be used to match a second occurrence of something that was
already matched. For example, the following expression finds all words that
begin and end with the same three letters:
^\\(...\\).*\\1$
Note the use of the double backslashes to escape the special characters. This
is required by the DICT protocol string specification (a single backslash
quotes the next character -- we use two to get a single backslash through to
the regular expression engine). Warning: The use of backtracking is
even slower than the use of general regular expressions.
- soundex
-
The Soundex algorithm, a classic algorithm for finding words that sound similar
to each other. The algorithm encodes each word using the first letter of the
word and up to three digits. Since the first letter is known, this search is
relatively fast, and it is sometimes good for correcting spelling errors when the
Levenshtein algorithm doesn't help.
- substring
-
Match a substring anywhere in the headword. This search strategy uses a
modified Boyer-Moore-Horspool algorithm. Since it must search the whole index
file, it is not as fast as the exact and prefix matches.
- suffix
-
Suffix match. This search strategy also uses a modified Boyer-Moore-Horspool
algorithm, and is as fast as the substring search. If the optional index_suffix
string file is listed in the configuration file, this search is much faster.
- word
-
Match any single word, even if it is part of a multi-word entry. If the optional index_word
string file is listed in the configuration file, this search is much faster.
Note: Taken from dictd man page
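The behaviour of several of the strategies listed above can be approximated over a small in-memory list of headwords. The Python sketch below is a simplification and not dictd's implementation: the function names and sample headwords are hypothetical, the list is assumed to be sorted, plain scans stand in for the Boyer-Moore-Horspool search, and lev follows the definition given above (insertion, deletion, or transposition) over lowercase ASCII letters only.

```python
import bisect
import string


def exact(headwords, word):
    """Binary search for an exact match in a sorted list."""
    i = bisect.bisect_left(headwords, word)
    return [word] if i < len(headwords) and headwords[i] == word else []


def prefix(headwords, word):
    """Binary search for the first candidate, then collect the run that shares the prefix."""
    i = bisect.bisect_left(headwords, word)
    matches = []
    while i < len(headwords) and headwords[i].startswith(word):
        matches.append(headwords[i])
        i += 1
    return matches


def substring(headwords, word):
    """Scan every headword; this is why it is slower than exact and prefix."""
    return [h for h in headwords if word in h]


def suffix(headwords, word):
    """Scan every headword for a matching ending."""
    return [h for h in headwords if h.endswith(word)]


def word_match(headwords, word):
    """Match a single word, even inside a multi-word entry."""
    return [h for h in headwords if word in h.split()]


def lev(headwords, word, alphabet=string.ascii_lowercase):
    """Headwords exactly one edit away (insertion, deletion, or transposition)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    candidates = {a + b[1:] for a, b in splits if b}                          # deletions
    candidates |= {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}  # transpositions
    candidates |= {a + c + b for a, b in splits for c in alphabet}            # insertions
    candidates.discard(word)
    return [h for h in headwords if h in candidates]


# Hypothetical headword list, kept sorted so the binary searches are valid.
HEADWORDS = sorted(
    ["dict", "dictionary", "edict", "free dictionary", "predict", "tech", "the"]
)

print(prefix(HEADWORDS, "dict"))
print(suffix(HEADWORDS, "dict"))
print(lev(HEADWORDS, "teh"))
```

The sketch shows why exact and prefix are the fast cases: both jump straight to the right position in the sorted list, while substring, suffix, and word have to visit every headword unless an extra index is available.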
To report a bug or make a suggestion regarding this application or this
documentation, e-mail <cws@miraclenet.co.th>.
Last update : $Id: lookup.html,v 1.1 2004/02/24 00:44:31 cws Exp $