Class WordResolver

java.lang.Object
com.ctc.wstx.util.WordResolver

public final class WordResolver extends Object
A specialized Map/symbol-table-like data structure that can be used both for checking whether a word (passed in as a char array) exists in a certain set of words AND for getting that word as a String. It is reasonably efficient in both time and space, at least for certain use cases; specifically, if there is no existing key to use, it is a more efficient way to get a shared copy of that String. The general usage pattern is expected to be such that most checks are positive, i.e. that the word is indeed contained in the structure.
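To illustrate the lookup contract (a char[] range in, the shared String instance or null out), here is a minimal sketch. SharedWordTable is a hypothetical name, and it uses a sorted array with binary search rather than WordResolver's own compressed index. It is a stand-in for the behavior, not the library's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for the WordResolver contract: resolves a char[]
 * range to the shared String instance stored in a fixed word set.
 * Uses a sorted array plus binary search, NOT Woodstox's compressed index.
 */
final class SharedWordTable {
    private final String[] words; // sorted ascending, no duplicates

    SharedWordTable(String... input) {
        words = Arrays.stream(input).distinct().sorted().toArray(String[]::new);
    }

    /** Returns the stored String equal to str[start..end), or null if absent. */
    String find(char[] str, int start, int end) {
        int lo = 0, hi = words.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(words[mid], str, start, end);
            if (cmp == 0) return words[mid]; // the shared instance, not a copy
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    // Compares a String with a char range using the same ordering as String.compareTo.
    private static int compare(String w, char[] str, int start, int end) {
        int len = end - start;
        int n = Math.min(w.length(), len);
        for (int i = 0; i < n; i++) {
            int d = w.charAt(i) - str[start + i];
            if (d != 0) return d;
        }
        return w.length() - len;
    }
}
```

Because a hit returns the stored element itself, repeated lookups of the same word yield the identical String object, which is the "shared copy" the description refers to. WordResolver itself keeps a compressed representation of the word set (mData) instead of a plain sorted array.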

Although this is an efficient data structure for a specific set of usage patterns, one restriction is that the full set of words to include has to be known before constructing the instance. Also, the size of the set is limited to a total word content of about 20k characters.

TODO: Should document the internal data structure...

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private static final class 
     
  • Field Summary

    Fields
    (package private) static final char CHAR_NULL
    static final int MAX_WORDS: Maximum number of words (Strings) an instance can contain.
    (package private) final char[] mData: Compressed representation of the word set.
    (package private) static final int (name not shown): This is actually just a guess; but in general, linear search should be faster for short sequences (definitely for 4 or fewer; maybe up to 8 or fewer?).
    (package private) final String[] mWords: Array of the actual words returned for resolved matches.
    (package private) static final int NEGATIVE_OFFSET: Offset added to numbers to mark 'negative' numbers.
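    One of the fields above records a cutoff below which linear search is expected to beat binary search. The sketch below shows that heuristic when searching a sorted run of characters. The constant name LINEAR_SEARCH_LIMIT and its value 8 are assumptions, since the field's name is not shown in this listing and its description only guesses at the cutoff.

```java
/**
 * Hypothetical illustration of the "linear search for short sequences"
 * heuristic: search a sorted run of chars, switching to binary search
 * only when the run is longer than an assumed threshold.
 */
final class BranchSearch {
    // Assumed value; the source only guesses "4 or less; maybe up to 8".
    static final int LINEAR_SEARCH_LIMIT = 8;

    /** Returns the index of c within keys[from..to), or -1. keys must be sorted ascending. */
    static int indexOf(char[] keys, int from, int to, char c) {
        if (to - from <= LINEAR_SEARCH_LIMIT) {
            for (int i = from; i < to; i++) {
                if (keys[i] == c) return i;
                if (keys[i] > c) return -1; // sorted, so c cannot appear later
            }
            return -1;
        }
        int lo = from, hi = to - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            char k = keys[mid];
            if (k == c) return mid;
            if (k < c) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }
}
```

For short runs the linear loop avoids the branch-heavy bisection and can stop early thanks to the sort order. Where the crossover really lies depends on the JVM and hardware, which is why the original description leaves it as a guess.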
  • Constructor Summary

    Constructors
    WordResolver(String[] words, char[] index)
  • Method Summary

    Methods
    static WordResolver constructInstance(TreeSet<String> wordSet): Tries to construct an instance given an ordered set of words.
    String find(char[] str, int start, int end)
    String find(String str)
    private String findFromOne(char[] str, int start, int end)
    int size()
    String toString()

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • MAX_WORDS

      public static final int MAX_WORDS
      Maximum number of words (Strings) an instance can contain
    • CHAR_NULL

      static final char CHAR_NULL
    • NEGATIVE_OFFSET

      static final int NEGATIVE_OFFSET
      Offset added to numbers to mark 'negative' numbers. Asymmetric, since the range of negative markers needed is smaller than that of positive numbers.
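      The description of NEGATIVE_OFFSET suggests that one 16-bit char slot can hold either an ordinary non-negative number or a "negative" marker, with the smaller marker range placed at the top of the char range. Here is a hypothetical sketch of such an encoding. The offset value 0xE000 and the class and method names are assumptions, since the real constant's value is not shown here.

```java
/**
 * Hypothetical sketch of packing two number ranges into one 16-bit char,
 * in the spirit of NEGATIVE_OFFSET. The offset value is an assumption.
 */
final class CharCellCodec {
    // Assumed split: values below 0xE000 are ordinary numbers; the top
    // 0x2000 values (the smaller range) are "negative" markers.
    static final int NEGATIVE_OFFSET = 0xE000;

    static char encodePositive(int value) {
        if (value < 0 || value >= NEGATIVE_OFFSET) {
            throw new IllegalArgumentException("positive value out of range: " + value);
        }
        return (char) value;
    }

    // The offset is added to the marker so it lands in the upper range.
    static char encodeNegative(int marker) {
        if (marker < 0 || marker >= 0x10000 - NEGATIVE_OFFSET) {
            throw new IllegalArgumentException("marker out of range: " + marker);
        }
        return (char) (NEGATIVE_OFFSET + marker);
    }

    static boolean isNegative(char cell) {
        return cell >= NEGATIVE_OFFSET;
    }

    // Strips the offset from markers; ordinary values decode unchanged.
    static int decode(char cell) {
        return isNegative(cell) ? cell - NEGATIVE_OFFSET : cell;
    }
}
```

The asymmetry is just a choice of where to put the split: if far fewer markers than ordinary numbers are needed, most of the char range can stay available for the latter.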
    • mData

      final char[] mData
      Compressed representation of the word set.
    • mWords

      final String[] mWords
      Array of the actual words returned for resolved matches.
  • Constructor Details

    • WordResolver

      WordResolver(String[] words, char[] index)
  • Method Details

    • constructInstance

      public static WordResolver constructInstance(TreeSet<String> wordSet)
      Tries to construct an instance given an ordered set of words.

      Note: currently the maximum number of words that can be contained is limited to MAX_WORDS; additionally, the total length of all such words cannot exceed roughly 28000 characters.

      Returns:
      WordResolver constructed for the given set of words, if the word set is not too big; null to indicate a "too big" instance.
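      Here is a hypothetical sketch of the constructInstance contract: build from an ordered word set, or return null when the set is too big. The class name BoundedWordSet, the method name tryConstruct and both limits are assumptions. MAX_WORDS's actual value is not shown here, and the character limit simply reuses the "roughly 28000" figure above.

```java
import java.util.TreeSet;

/**
 * Hypothetical sketch of "construct or return null if too big".
 * Both limits are assumptions, not WordResolver's actual values.
 */
final class BoundedWordSet {
    static final int ASSUMED_MAX_WORDS = 0x2000;
    static final int ASSUMED_MAX_TOTAL_CHARS = 28000;

    private final String[] words;

    private BoundedWordSet(String[] words) {
        this.words = words;
    }

    /** Returns an instance for wordSet, or null if it exceeds either assumed limit. */
    static BoundedWordSet tryConstruct(TreeSet<String> wordSet) {
        if (wordSet.size() > ASSUMED_MAX_WORDS) return null;
        long totalChars = 0;
        for (String w : wordSet) totalChars += w.length();
        if (totalChars > ASSUMED_MAX_TOTAL_CHARS) return null;
        // TreeSet iterates in ascending order, so this copy is already sorted.
        return new BoundedWordSet(wordSet.toArray(new String[0]));
    }

    int size() {
        return words.length;
    }
}
```

Taking a TreeSet means the caller hands over an already sorted, duplicate-free set, so the builder only has to check the limits and copy.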
    • size

      public int size()
      Returns:
      Number of words contained
    • find

      public String find(char[] str, int start, int end)
      Parameters:
      str - Character array that contains the word to find
      start - Index of the first character of the word
      end - Index following the last character of the word, so that end - start equals the word length (the same convention as String.substring()).
      Returns:
      (Shared) string instance of the word, if it exists in the word set; null if not.
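      The end argument is exclusive. This hypothetical sketch shows a typical call site: a scanner finds where a name ends in a larger buffer and passes that half-open range straight to a lookup, with no intermediate substring. RangeLookupDemo, scanLetters and the linear-scan find are illustrative stand-ins, not WordResolver code.

```java
/**
 * Hypothetical illustration of the find(char[] str, int start, int end)
 * calling convention: [start, end) is half-open, so end - start is the
 * word length, as with String.substring(start, end).
 */
final class RangeLookupDemo {
    /** Linear-scan stand-in for find(): returns the stored instance or null. */
    static String find(String[] words, char[] str, int start, int end) {
        int len = end - start;
        outer:
        for (String w : words) {
            if (w.length() != len) continue;
            for (int i = 0; i < len; i++) {
                if (w.charAt(i) != str[start + i]) continue outer;
            }
            return w;
        }
        return null;
    }

    /** Returns the exclusive end index of the run of letters starting at start. */
    static int scanLetters(char[] buf, int start) {
        int end = start;
        while (end < buf.length && Character.isLetter(buf[end])) end++;
        return end;
    }
}
```

Because scanLetters already returns an exclusive end index, its result can be passed to the lookup unchanged.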
    • findFromOne

      private String findFromOne(char[] str, int start, int end)
    • find

      public String find(String str)
      Returns:
      (Shared) string instance of the word, if it exists in the word set; null if not.
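      Returning a shared instance lets callers canonicalize equal strings. The hypothetical JDK-only analogue of find(String) below swaps WordResolver's compressed index for a plain HashMap from each word to itself. CanonicalWords is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical JDK-only analogue of find(String): maps each word to its own
 * canonical instance so equal-but-distinct Strings resolve to one shared copy.
 * This is a plain HashMap, not WordResolver's compressed index.
 */
final class CanonicalWords {
    private final Map<String, String> canon = new HashMap<>();

    CanonicalWords(Set<String> words) {
        for (String w : words) canon.put(w, w);
    }

    /** Returns the shared instance equal to str, or null if str is not in the set. */
    String find(String str) {
        return canon.get(str);
    }
}
```

This sketch needs a String key for every lookup. WordResolver also offers a char[] variant of find that resolves a word straight from a character buffer.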
    • toString

      public String toString()
      Overrides:
      toString in class Object