Class SymbolTable

java.lang.Object
com.ctc.wstx.util.SymbolTable

public class SymbolTable extends Object
This class is a kind of specialized type-safe Map, from char array to String value. Specialization means that in addition to type-safety and specific access patterns (key char array, Value optionally interned String; values added on access if necessary), and that instances are meant to be used concurrently, but by using well-defined mechanisms to obtain such concurrently usable instances. Main use for the class is to store symbol table information for things like compilers and parsers; especially when number of symbols (keywords) is limited.

For optimal performance, usage pattern should be one where matches should be very common (esp. after "warm-up"), and as with most hash-based maps/sets, that hash codes are uniformly distributed. Also, collisions are slightly more expensive than with HashMap or HashSet, since hash codes are not used in resolving collisions; that is, equals() comparison is done with all symbols in same bucket index.
Finally, rehashing is also more expensive, as hash codes are not stored; rehashing requires all entries' hash codes to be recalculated. Reason for not storing hash codes is reduced memory usage, hoping for better memory locality.

Usual usage pattern is to create a single "master" instance, and either use that instance in sequential fashion, or to create derived "child" instances, which after use, are asked to return possible symbol additions to master instance. In either case benefit is that symbol table gets initialized so that further uses are more efficient, as eventually all symbols needed will already be in symbol table. At that point no more Symbol String allocations are needed, nor changes to symbol table itself.

Note that while individual SymbolTable instances are NOT thread-safe (much like generic collection classes), concurrently used "child" instances can be freely used without synchronization. However, using master table concurrently with child instances can only be done if access to master instance is read-only (ie. no modifications done).

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) static final class 
    This class is a symbol table entry.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final float
     
    protected static final int
    Default initial table size; no need to make it miniscule, due to couple of things: first, overhead of array reallocation is significant, and second, overhead of rehashing is also non-negligible.
    protected static final String
     
    protected SymbolTable.Bucket[]
    Overflow buckets; if primary doesn't match, lookup is done from here.
    protected boolean
    Flag that indicates if any changes have been made to the data; used to both determine if bucket array needs to be copied when (first) change is made, and potentially if updated bucket list is to be resync'ed back to master instance.
    protected int
    Mask used to get index from hash values; equal to mBuckets.length - 1, when mBuckets.length is a power of two.
    protected boolean
    Flag that determines whether Strings to be added need to be interned before being added or not.
    protected int
    Current size (number of entries); needed to know if and when rehash.
    protected int
    Limit that indicates maximum size this instance can hold before it needs to be expanded and rehashed.
    protected String[]
    Primary matching symbols; it's expected most match occur from here.
    protected int
    Version of this table instance; used when deriving new concurrently used versions from existing 'master' instance.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
     
    Method for constructing a master symbol table instance; this one will create master instance with default size, and with interning enabled.
     
    SymbolTable(boolean internStrings)
    Method for constructing a master symbol table instance.
     
    SymbolTable(boolean internStrings, int initialSize)
    Method for constructing a master symbol table instance.
     
    SymbolTable(boolean internStrings, int initialSize, float fillFactor)
    Main method for constructing a master symbol table instance; will be called by other public constructors.
    private
    SymbolTable(boolean internStrings, String[] symbols, SymbolTable.Bucket[] buckets, int size, int sizeThreshold, int indexMask, int version)
    Internal constructor used when creating child instances.
  • Method Summary

    Modifier and Type
    Method
    Description
    double
     
    static int
    calcHash(char[] buffer, int start, int len)
    Implementation of a hashing method for variable length Strings.
    static int
     
    private void
    Method called when copy-on-write is needed; generally when first change is made to a derived symbol table.
    findSymbol(char[] buffer, int start, int len, int hash)
    Main access method; will check if actual symbol String exists; if so, returns it; if not, will create, add and return it.
    Similar to to findSymbol(char[],int,int,int); used to either do potentially cheap intern() (if table already has intern()ed version), or to pre-populate symbol table with known values.
    findSymbolIfExists(char[] buffer, int start, int len, int hash)
    Similar to {link #findSymbol}, but will not add passed in symbol if it is not in symbol table yet.
    boolean
     
    boolean
     
    "Factory" method; will create a new child instance of this symbol table.
    void
    Method that allows contents of child table to potentially be "merged in" with contents of this symbol table.
    private void
    Method called when size (number of entries) of symbol table grows so big that load factor is exceeded.
    void
    setInternStrings(boolean state)
     
    int
     
    int
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT_TABLE_SIZE

      protected static final int DEFAULT_TABLE_SIZE
      Default initial table size; no need to make it miniscule, due to couple of things: first, overhead of array reallocation is significant, and second, overhead of rehashing is also non-negligible.

      Let's use 128 as the default; it allows for up to 96 symbols, and uses about 512 bytes on 32-bit machines.

      See Also:
    • DEFAULT_FILL_FACTOR

      protected static final float DEFAULT_FILL_FACTOR
      See Also:
    • EMPTY_STRING

      protected static final String EMPTY_STRING
      See Also:
    • mInternStrings

      protected boolean mInternStrings
      Flag that determines whether Strings to be added need to be interned before being added or not. Forcing intern()ing will add some overhead when adding new Strings, but may be beneficial if such Strings are generally used by other parts of system. Note that even without interning, all returned String instances are guaranteed to be comparable with equality (==) operator; it's just that such guarantees are not made for Strings other classes return.
    • mSymbols

      protected String[] mSymbols
      Primary matching symbols; it's expected most match occur from here.
    • mBuckets

      protected SymbolTable.Bucket[] mBuckets
      Overflow buckets; if primary doesn't match, lookup is done from here.

      Note: Number of buckets is half of number of symbol entries, on assumption there's less need for buckets.

    • mSize

      protected int mSize
      Current size (number of entries); needed to know if and when rehash.
    • mSizeThreshold

      protected int mSizeThreshold
      Limit that indicates maximum size this instance can hold before it needs to be expanded and rehashed. Calculated using fill factor passed in to constructor.
    • mIndexMask

      protected int mIndexMask
      Mask used to get index from hash values; equal to mBuckets.length - 1, when mBuckets.length is a power of two.
    • mThisVersion

      protected int mThisVersion
      Version of this table instance; used when deriving new concurrently used versions from existing 'master' instance.
    • mDirty

      protected boolean mDirty
      Flag that indicates if any changes have been made to the data; used to both determine if bucket array needs to be copied when (first) change is made, and potentially if updated bucket list is to be resync'ed back to master instance.
  • Constructor Details

    • SymbolTable

      public SymbolTable()
      Method for constructing a master symbol table instance; this one will create master instance with default size, and with interning enabled.
    • SymbolTable

      public SymbolTable(boolean internStrings)
      Method for constructing a master symbol table instance.
    • SymbolTable

      public SymbolTable(boolean internStrings, int initialSize)
      Method for constructing a master symbol table instance.
    • SymbolTable

      public SymbolTable(boolean internStrings, int initialSize, float fillFactor)
      Main method for constructing a master symbol table instance; will be called by other public constructors.
      Parameters:
      internStrings - Whether Strings to add are intern()ed or not
      initialSize - Minimum initial size for bucket array; internally will always use a power of two equal to or bigger than this value.
      fillFactor - Maximum fill factor allowed for bucket table; when more entries are added, table will be expanded.
    • SymbolTable

      private SymbolTable(boolean internStrings, String[] symbols, SymbolTable.Bucket[] buckets, int size, int sizeThreshold, int indexMask, int version)
      Internal constructor used when creating child instances.
  • Method Details

    • makeChild

      public SymbolTable makeChild()
      "Factory" method; will create a new child instance of this symbol table. It will be a copy-on-write instance, ie. it will only use read-only copy of parent's data, but when changes are needed, a copy will be created.

      Note: while data access part of this method is synchronized, it is generally not safe to both use makeChild/mergeChild, AND to use instance actively. Instead, a separate 'root' instance should be used on which only makeChild/mergeChild are called, but instance itself is not used as a symbol table.

    • mergeChild

      public void mergeChild(SymbolTable child)
      Method that allows contents of child table to potentially be "merged in" with contents of this symbol table.

      Note that caller has to make sure symbol table passed in is really a child or sibling of this symbol table.

    • setInternStrings

      public void setInternStrings(boolean state)
    • size

      public int size()
    • version

      public int version()
    • isDirty

      public boolean isDirty()
    • isDirectChildOf

      public boolean isDirectChildOf(SymbolTable t)
    • findSymbol

      public String findSymbol(char[] buffer, int start, int len, int hash)
      Main access method; will check if actual symbol String exists; if so, returns it; if not, will create, add and return it.
      Returns:
      The symbol matching String in input array
    • findSymbolIfExists

      public String findSymbolIfExists(char[] buffer, int start, int len, int hash)
      Similar to {link #findSymbol}, but will not add passed in symbol if it is not in symbol table yet.
    • findSymbol

      public String findSymbol(String str)
      Similar to to findSymbol(char[],int,int,int); used to either do potentially cheap intern() (if table already has intern()ed version), or to pre-populate symbol table with known values.
    • calcHash

      public static int calcHash(char[] buffer, int start, int len)
      Implementation of a hashing method for variable length Strings. Most of the time intention is that this calculation is done by caller during parsing, not here; however, sometimes it needs to be done for parsed "String" too.
      Parameters:
      len - Length of String; has to be at least 1 (caller guarantees this pre-condition)
    • calcHash

      public static int calcHash(String key)
    • copyArrays

      private void copyArrays()
      Method called when copy-on-write is needed; generally when first change is made to a derived symbol table.
    • rehash

      private void rehash()
      Method called when size (number of entries) of symbol table grows so big that load factor is exceeded. Since size has to remain power of two, arrays will then always be doubled. Main work is really redistributing old entries into new String/Bucket entries.
    • calcAvgSeek

      public double calcAvgSeek()