Package com.ctc.wstx.util
Class WordResolver
java.lang.Object
com.ctc.wstx.util.WordResolver
A specialized Map/symbol-table-like data structure that can be used both
for checking whether a word (passed in as a char array) exists
in a certain set of words AND for getting that word as a String.
It is reasonably efficient both time- and space-wise, at least for
certain use cases; specifically, if there is no existing String key to use
(only the char array), it is a more efficient way to get to a shared copy of that String.
The general usage pattern is expected
to be such that most checks are positive, i.e. that the word indeed
is contained in the structure.
Although this is an efficient data structure for a specific set of usage patterns, one restriction is that the full set of words to include has to be known before constructing the instance. Also, the size of the set is limited to a total word content of about 20k characters.
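The lookup contract described above (resolve a char-array span to a shared String, or return null) can be illustrated with a much simpler structure. The following is a hypothetical sketch only: SortedWordTable is an invented name, and it uses a sorted String[] with binary search instead of the compressed char[] layout WordResolver actually uses, which this page does not document.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: resolves a char[] span to a shared String instance.
// Uses a sorted String[] and binary search, NOT WordResolver's compressed layout.
class SortedWordTable {
    private final String[] words; // sorted; holds the one shared instance per word

    SortedWordTable(TreeSet<String> wordSet) {
        // TreeSet iterates in String.compareTo order, so the copy is already sorted
        this.words = wordSet.toArray(new String[0]);
    }

    int size() {
        return words.length;
    }

    // Returns the shared instance equal to str[start, end), or null if absent
    String find(char[] str, int start, int end) {
        int lo = 0;
        int hi = words.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareSpan(words[mid], str, start, end);
            if (cmp == 0) {
                return words[mid];
            } else if (cmp < 0) {
                lo = mid + 1; // stored word sorts before the span
            } else {
                hi = mid - 1; // stored word sorts after the span
            }
        }
        return null;
    }

    // Same ordering as String.compareTo: first differing char, then length
    private static int compareSpan(String word, char[] str, int start, int end) {
        int len = end - start;
        int n = Math.min(word.length(), len);
        for (int i = 0; i < n; i++) {
            int diff = word.charAt(i) - str[start + i];
            if (diff != 0) {
                return diff;
            }
        }
        return word.length() - len;
    }

    public static void main(String[] args) {
        SortedWordTable table = new SortedWordTable(
                new TreeSet<>(Arrays.asList("CDATA", "ID", "IDREF", "NMTOKEN")));
        char[] buf = "attr IDREF #IMPLIED".toCharArray();
        String hit = table.find(buf, 5, 10);                // span "IDREF"
        String again = table.find("IDREF".toCharArray(), 0, 5);
        System.out.println(hit);                            // IDREF
        System.out.println(hit == again);                   // true: same shared instance
        System.out.println(table.find(buf, 0, 4));          // null: "attr" is not in the set
    }
}
```

Because every hit returns an element of the same array, equal spans always resolve to the identical String instance, and no String is allocated for the probe itself.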
TODO: Should document the internal data structure...
Nested Class Summary
Nested Classes
Field Summary
Fields
Modifier and Type | Field | Description
(package private) static final char | CHAR_NULL |
static final int | MAX_WORDS | Maximum number of words (Strings) an instance can contain.
(package private) final char[] | mData | Compressed presentation of the word set.
(package private) static final int | MIN_BINARY_SEARCH | This is actually just a guess; but in general, linear search should be faster for short sequences (definitely for 4 or fewer; maybe up to 8 or fewer?).
(package private) final String[] | mWords | Array of the actual words, returned as the resolved values for matches.
(package private) static final int | NEGATIVE_OFFSET | Offset added to numbers to mark 'negative' numbers.
Constructor Summary
Constructors
WordResolver(String[] words, char[] index)
Method Summary
Modifier and Type | Method | Description
static WordResolver | constructInstance(TreeSet<String> wordSet) | Tries to construct an instance given an ordered set of words.
String | find(char[] str, int start, int end) |
private String | findFromOne(char[] str, int start, int end) |
int | size() |
String | toString() |
Field Details
MAX_WORDS
public static final int MAX_WORDS
Maximum number of words (Strings) an instance can contain.
See Also:
CHAR_NULL
static final char CHAR_NULL
See Also:
NEGATIVE_OFFSET
static final int NEGATIVE_OFFSET
Offset added to numbers to mark 'negative' numbers. It is asymmetric, since the range of negative markers needed is smaller than the range of positive numbers.
See Also:
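As a rough illustration of marking negative values with an offset inside an unsigned char array, here is a hypothetical encoding. OffsetEncoding and the value 0xFF00 are assumptions; the actual NEGATIVE_OFFSET value and the exact meaning of the markers stored in mData are not given on this page.

```java
// Hypothetical sketch: storing small negative markers in an unsigned 16-bit char
// by adding an offset. OFFSET is an assumed value, not WordResolver's NEGATIVE_OFFSET.
class OffsetEncoding {
    // [0, OFFSET) holds plain non-negative numbers (65280 values);
    // OFFSET + k marks the negative number -k, for k in 1..255.
    // The asymmetry reflects needing far fewer negative markers than positive values.
    static final int OFFSET = 0xFF00;

    static char encode(int value) {
        if (value >= 0) {
            if (value >= OFFSET) {
                throw new IllegalArgumentException("too large to encode: " + value);
            }
            return (char) value;
        }
        int marked = OFFSET - value; // OFFSET + |value|
        if (marked > Character.MAX_VALUE) {
            throw new IllegalArgumentException("too negative to encode: " + value);
        }
        return (char) marked;
    }

    static int decode(char c) {
        return (c < OFFSET) ? c : -(c - OFFSET);
    }

    public static void main(String[] args) {
        char[] data = { encode(42), encode(-3), encode(0) };
        for (char c : data) {
            System.out.println(decode(c)); // 42, then -3, then 0
        }
    }
}
```

Placing the offset near the top of the char range keeps almost the whole range for ordinary non-negative values, at the cost of allowing only a small set of negative markers.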
MIN_BINARY_SEARCH
static final int MIN_BINARY_SEARCH
This is actually just a guess; but in general, linear search should be faster for short sequences (definitely for 4 or fewer; maybe up to 8 or fewer?).
See Also:
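The trade-off MIN_BINARY_SEARCH describes can be sketched as follows: scan short sorted runs linearly and binary-search longer ones. BranchSearch and THRESHOLD = 8 are hypothetical; the threshold merely echoes the upper guess in the description, and the real constant's value is not shown here.

```java
import java.util.Arrays;

// Hypothetical sketch: find a char among sorted branch keys, using linear
// search for short runs and binary search for longer ones.
class BranchSearch {
    static final int THRESHOLD = 8; // assumed cutoff, not WordResolver's actual constant

    // Returns the index of c in keys[from, to), or -1; keys[from, to) must be sorted
    static int indexOf(char[] keys, int from, int to, char c) {
        if (to - from < THRESHOLD) {
            for (int i = from; i < to; i++) {
                if (keys[i] == c) {
                    return i;
                }
                if (keys[i] > c) {
                    break; // sorted, so c cannot appear further on
                }
            }
            return -1;
        }
        int idx = Arrays.binarySearch(keys, from, to, c);
        return (idx >= 0) ? idx : -1;
    }

    public static void main(String[] args) {
        char[] keys = "acegikmoqs".toCharArray();
        System.out.println(indexOf(keys, 0, 4, 'e'));  // 2 (linear path)
        System.out.println(indexOf(keys, 0, 10, 'k')); // 5 (binary path)
        System.out.println(indexOf(keys, 0, 10, 'd')); // -1
    }
}
```

For very short runs, the linear scan avoids the branch mispredictions and index arithmetic of binary search, which is the intuition behind the comment above.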
mData
final char[] mData
Compressed presentation of the word set.
mWords
final String[] mWords
Array of the actual words, returned as the resolved values for matches.
Constructor Details
WordResolver
WordResolver(String[] words, char[] index)
Method Details
constructInstance
static WordResolver constructInstance(TreeSet<String> wordSet)
Tries to construct an instance given an ordered set of words. Note: currently the maximum number of words that can be contained is limited to MAX_WORDS; additionally, the total length of all such words cannot exceed roughly 28000 characters.
Returns:
WordResolver constructed for the given set of words, if the word set size is not too big; null to indicate a "too big" instance.
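Callers of constructInstance must be prepared for a null result. The sketch below mimics that contract with a hypothetical builder: the class name and both limits are invented and deliberately tiny, not the real MAX_WORDS or the character bound, and falling back to another structure is an assumption about caller behavior rather than something this page specifies.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch of a "null means too big" construction contract.
// The limits are illustrative values, not Woodstox's constants.
class BoundedBuilder {
    static final int MAX_WORDS_SKETCH = 4;
    static final int MAX_CHARS_SKETCH = 20;

    // Returns the words as a sorted array, or null if the set exceeds either limit
    static String[] tryBuild(TreeSet<String> wordSet) {
        if (wordSet.size() > MAX_WORDS_SKETCH) {
            return null;
        }
        int total = 0;
        for (String w : wordSet) {
            total += w.length();
            if (total > MAX_CHARS_SKETCH) {
                return null;
            }
        }
        return wordSet.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] small = tryBuild(new TreeSet<>(Arrays.asList("ID", "IDREF", "CDATA")));
        System.out.println(small == null ? "too big" : Arrays.toString(small)); // [CDATA, ID, IDREF]
        String[] big = tryBuild(new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e")));
        // A caller must check for null and use some other lookup structure instead
        System.out.println(big == null ? "too big" : Arrays.toString(big));     // too big
    }
}
```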
size
public int size()
Returns:
Number of words contained.
find
Parameters:
str - Character array that contains the word to find
start - Index of the first character of the word
end - Index following the last character of the word, so that end - start equals the word length (similar to how String.substring() treats its end index)
Returns:
(Shared) string instance of the word, if it exists in the word set; null if not.
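The start/end arguments of find follow the half-open convention: end is exclusive, so end - start is the word length. The hypothetical helper below compares such a span against a candidate word without allocating a String; it illustrates the convention only and is not WordResolver's internal comparison.

```java
// Hypothetical helper illustrating the half-open [start, end) span convention.
class SpanConvention {
    // True if str[start, end) holds exactly the characters of word
    static boolean spanEquals(char[] str, int start, int end, String word) {
        int len = end - start; // word length, as with String.substring(start, end)
        if (len != word.length()) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (str[start + i] != word.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "<!ATTLIST img src CDATA #REQUIRED>";
        char[] buf = text.toCharArray();
        int start = text.indexOf("CDATA");
        int end = start + 5; // index just past the last character
        System.out.println(text.substring(start, end));               // CDATA
        System.out.println(spanEquals(buf, start, end, "CDATA"));     // true
        System.out.println(spanEquals(buf, start, end - 1, "CDATA")); // false: span is "CDAT"
    }
}
```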
findFromOne
private String findFromOne(char[] str, int start, int end)
find
Returns:
(Shared) string instance of the word, if it exists in the word set; null if not.
toString