Word delimitation when indexing

When building the index, the indexer must decide how to split text into words. A character is treated as part of a word if it falls into one of the following categories:

Lower case characters, ‘a’ to ‘z’
Upper case characters, ‘A’ to ‘Z’
Numbers, ‘0’ to ‘9’
Accented and other non-English characters, ‘À’ to ‘ÿ’
A join character (defined by the user, e.g. dot (‘.’), dash/hyphen (‘-’), underscore, etc.) that is immediately followed by another valid word character (one of the above), e.g. “2.5” and “F.B.I.”. See "Indexing options" for more information.
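
The rules above can be sketched as a per-character test. This is a minimal Python illustration, not the indexer's actual code: the function names `is_word_char` and `is_in_word` are hypothetical, the join characters are passed in as a plain string, and the ‘À’ to ‘ÿ’ range is applied literally (so it also covers ‘×’ and ‘÷’). As stated, the join rule only looks at the character that follows, so the sketch does not check what comes before a join character.

```python
def is_word_char(c: str) -> bool:
    """True if c falls in one of the fixed word-character ranges."""
    return (
        "a" <= c <= "z"
        or "A" <= c <= "Z"
        or "0" <= c <= "9"
        or "\u00c0" <= c <= "\u00ff"  # 'À' to 'ÿ', taken literally
    )


def is_in_word(text: str, i: int, join_chars: str) -> bool:
    """True if text[i] belongs to a word under the rules above."""
    c = text[i]
    if is_word_char(c):
        return True
    # A join character counts only when the next character is a word character.
    return c in join_chars and i + 1 < len(text) and is_word_char(text[i + 1])


text = "F.B.I."
print([is_in_word(text, i, ".") for i in range(len(text))])
# [True, True, True, True, True, False]
```

Under this reading, the final dot in “F.B.I.” is not followed by a word character, so it is not part of the word.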

Any character that does not meet these rules ends the current word, and a new word starts at the next valid character. For example, with the default configuration, the sentence

“Record number 653-45+ABCD is invalid”

will be broken up into 6 words:

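“Record”, “number”, “653-45”, “ABCD”, “is”, “invalid”

The hyphen stays inside “653-45” because it is a join character followed by a digit, while ‘+’ is not a join character, so it separates “45” from “ABCD”.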
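
The splitting behaviour can be sketched as a single left-to-right scan that keeps word characters (and qualifying join characters) and closes the current word at any other character. This is an illustrative Python sketch, not the indexer's implementation; `split_words` is a hypothetical name, and `join_chars` defaults to ‘-’ here only because the example keeps “653-45” together. The product's actual default join set is configured under "Indexing options" and is not assumed here.

```python
def is_word_char(c: str) -> bool:
    # Fixed ranges: a-z, A-Z, 0-9 and 'À' (U+00C0) to 'ÿ' (U+00FF).
    return ("a" <= c <= "z" or "A" <= c <= "Z" or "0" <= c <= "9"
            or "\u00c0" <= c <= "\u00ff")


def split_words(text: str, join_chars: str = "-") -> list[str]:
    """Split text into words; any character not in a word ends the current word."""
    words: list[str] = []
    current: list[str] = []
    for i, c in enumerate(text):
        next_is_word = i + 1 < len(text) and is_word_char(text[i + 1])
        if is_word_char(c) or (c in join_chars and next_is_word):
            current.append(c)
        elif current:
            # Delimiter: close off the word built so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("Record number 653-45+ABCD is invalid"))
# ['Record', 'number', '653-45', 'ABCD', 'is', 'invalid']
```

Passing a different set, for example `split_words("2.5 F.B.I.", ".")`, keeps the dots that are followed by a word character and yields `['2.5', 'F.B.I']`.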