Class PositiveIntSet_impl

java.lang.Object
org.apache.uima.internal.util.PositiveIntSet_impl
All Implemented Interfaces:
PositiveIntSet

public class PositiveIntSet_impl extends Object implements PositiveIntSet
An set of non-zero integers, ability to iterate over them (possibly in a sorted way), with O(1) operations for adding, removing, and testing for contains. Optimized for adding mostly increasing integers (with some space for less-than-first-one-added). - major uses: indexes of Feature Structures. Optimize for small sets Graceful degradation for completely random integer operations on large sets; keep O(1) operations, at the expense of extra space (< 3 x size). Uses several different representations depending on the range of ints stored and the total number of ints stored. Sizes: Tiny, medium, large Ranges: semi-knowable, unknowable; if semi-knowable, dense, small-range (< 65K), large-range For all sizes, if dense, use IntBitSet (with offset) else For large, (implies large-range, too) use IntHashSet For medium, if small-range < 65K, use IntHashSet with offset else use IntHashSet Arrange switching between representations to occur infrequently, especially as cost of switching (size) grows Arrange checking for switching to occur infrequently, taking into account how underlying data structures grow (which is often by doubling) Switch when adding new member(s) if alternative representation is sufficiently smaller
  • Field Details

    • HYSTERESIS

      private static final int HYSTERESIS
      See Also:
    • INT_SET_MAX_SIZE

      private static final int INT_SET_MAX_SIZE
      See Also:
    • HASH_SET_SHORT_MAX_SIZE

      private static final int HASH_SET_SHORT_MAX_SIZE
      See Also:
    • BIT_SET_OVERALLOCATE

      private static final int BIT_SET_OVERALLOCATE
      See Also:
    • HASH_SET_OVERALLOCATE_DIVIDER_SHIFT

      private static final int HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
      See Also:
    • intSet

      private PositiveIntSet intSet
    • isBitSet

      boolean isBitSet
    • isIntSet

      boolean isIntSet
    • isHashSet

      boolean isHashSet
    • secondTimeShrinkable

      boolean secondTimeShrinkable
    • useOffset

      boolean useOffset
      Set false once we find we have to reduce the bit offset
  • Constructor Details

    • PositiveIntSet_impl

      public PositiveIntSet_impl()
    • PositiveIntSet_impl

      public PositiveIntSet_impl(int initialSize, int estMin, int estMax)
      Set up a Positive Bit Set
      Parameters:
      initialSize - - if 0, don't allocate yet, wait for first add. If isBitSetDense, then this number is interpreted as the first int to be added, typically with an offset. The next two params are used only if initialSize is not 0.
      estMin - - the estimated minimum int value to be added
      estMax - - the estimated max int value to be added
  • Method Details

    • iterator

      public IntListIterator iterator()
      Specified by:
      iterator in interface PositiveIntSet
      Returns:
      an iterator (may be ordered or unordered) over the members of the set
    • getUnorderedIterator

      public IntListIterator getUnorderedIterator()
    • getOrderedIterator

      public IntListIterator getOrderedIterator()
    • clear

      public void clear()
      Description copied from interface: PositiveIntSet
      remove all members of the set
      Specified by:
      clear in interface PositiveIntSet
    • contains

      public boolean contains(int key)
      Specified by:
      contains in interface PositiveIntSet
      Parameters:
      key - -
      Returns:
      true if key is in the set
    • find

      public int find(int element)
      Specified by:
      find in interface PositiveIntSet
      Parameters:
      element - an item which may be in the set
      Returns:
      -1 if the item is not in the set, or a position value that can be used with iterators to start at that item.
    • add

      public boolean add(int key)
      Specified by:
      add in interface PositiveIntSet
      Parameters:
      key - -
      Returns:
      true if this set did not already contain the specified element
    • remove

      public boolean remove(int key)
      Specified by:
      remove in interface PositiveIntSet
      Parameters:
      key - -
      Returns:
      true if the set had this element before the remove
    • size

      public int size()
      Specified by:
      size in interface PositiveIntSet
      Returns:
      number of elements in the set
    • adjustBitSetForLowerOffset

      private void adjustBitSetForLowerOffset(int key)
      When a key is added which is lower than the offset, we need to adjust the whole int set table. Although an array copy could do this, we don't have access to the bitset impl. So we just set the offset to 0, and either convert the whole thing to a inthashset or copy all the bits to a new bit set.
      Parameters:
      key -
    • getHashSetOffset

      private static int getHashSetOffset(int estMax, int estMin)
    • toUnorderedIntArray

      public int[] toUnorderedIntArray()
    • toOrderedIntArray

      public int[] toOrderedIntArray()
    • getBitSetSpaceFromRange

      private static int getBitSetSpaceFromRange(int adjKey, int spaceUsed)
      Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class. long words required = larger of double the current size, or the number of long words required
      Parameters:
      adjKey - - the range of bits needed
      spaceUsed - - the number of bits currently allocated (if currently a bit set), else 0
      Returns:
      number of words needed to store the range,
    • getIntSetSpace

      private static int getIntSetSpace(int size)
      Parameters:
      size - including new element to be added
      Returns:
      number of words including overhead to store that many elements unless is bigger than INT_SET_MAX_SIZE, then return MAX_VALUE
    • getHashSetSpace

      private int getHashSetSpace()
      Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overhead
      Parameters:
      numberOfEntries -
      Returns:
      new hash table size ( usually double) + overhead
    • getHashSetSpace

      private static int getHashSetSpace(int numberOfElements, int estMax, int estMin)
      For the case where the intHashSet doesn't yet exist
      Parameters:
      numberOfElements - -
      estMax - - the largest int to be stored
      estMin - - the smallest int to be stored
      Returns:
      the size in words
    • getHashSetOverAllocateSize

      private static int getHashSetOverAllocateSize(int existingSize)
      When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded. Reduce overallocation for small sizes because the cost to switch is low, and the % overallocate could be a large.
      Parameters:
      existingSize -
      Returns:
      overallocate size
    • maybeSwitchRepresentation

      private void maybeSwitchRepresentation(int newKey)
      logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient. Avoid switching if the new key being added won't increase the representation size - for hash: if number of elements +1 won't cause doubling of the hash table - for bitset: if key will fit in existing "capacity" Compute space needed after adding - bitset: array doubles - hashset: array doubles Preventing jitters back and forth: - removes don't cause switching - compare against other size including its non-linear space jumps.
    • handleBitSet

      private void handleBitSet(int newKey)
      Bit Set
    • handleIntSet

      private void handleIntSet(int newKey)
      IntSet *
    • handleHashSet

      private void handleHashSet(int newKey)
      Hash Set
    • allocateIntBitSet

      private boolean allocateIntBitSet(int estMax, int estMin, int offsetSpec)
      Allocate a new IntBitSet
      Parameters:
      estMax - the largest int to store in the bit set - determines the initial size
      estMin - this is a lower bound on the value that can be in the bit set
      offsetSpec - this is the offset to use, or -1, to calculate the offset from the initial_size (assuming a small amount of adds will be for ints somewhat less than the initial_size entry (being treated as the first int to be added)
      Returns:
      true if allocated with offset, implying guarantee the estMax key fits.
    • estimatedBitSetOffset

      private static int estimatedBitSetOffset(int estMin, int offsetSpec)
    • switchFromBitSet

      private void switchFromBitSet(int size, int offset)
      switching from bit set to either IntSet or IntHashSet
      Parameters:
      size - - space needed including new item
      offset - - for IntHashSet values kept as short, the offset subtracted from values before storing them offset == Integer.MIN_VALUE means use 4 byte ints
    • switchToIntSet

      private void switchToIntSet(int size)
      switch to IntSet
      Parameters:
      size - number of elements to be able to store without expanding
    • switchToHashSet

      private void switchToHashSet(int size, int offset)
      Switch from any intSet impl to Hash Set
      Parameters:
      size - - the capacity - this many elements could be stored before swtiching
      offset - - used only when the size invalid input: '<' 65K, and is a value subtracted from elements before storing them, in an attempt to fit the results into just "short"s instead of "int"s if == MIN_VALUE, then force 4 byte ints
    • switchToBitSet

      private void switchToBitSet(int estMax, int estMin, int offset)
    • get

      public int get(int position)
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUES
      Specified by:
      get in interface PositiveIntSet
      Parameters:
      position - - get the element at this position. This is for iterator use only, and is not related to any key
      Returns:
      the element
    • moveToFirst

      public int moveToFirst()
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use
      Specified by:
      moveToFirst in interface PositiveIntSet
      Returns:
      the position of the first element, or -1;
    • moveToLast

      public int moveToLast()
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use
      Specified by:
      moveToLast in interface PositiveIntSet
      Returns:
      the position of the last element, or -1;
    • moveToNext

      public int moveToNext(int position)
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use
      Specified by:
      moveToNext in interface PositiveIntSet
      Parameters:
      position - -
      Returns:
      the position of the next element, or -1;
    • moveToPrevious

      public int moveToPrevious(int position)
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use
      Specified by:
      moveToPrevious in interface PositiveIntSet
      Parameters:
      position - -
      Returns:
      the position of the next element, or -1;
    • isValid

      public boolean isValid(int position)
      Description copied from interface: PositiveIntSet
      For FSBagIndex low level iterator use
      Specified by:
      isValid in interface PositiveIntSet
      Parameters:
      position - -
      Returns:
      true if the position is between the first and last element inclusive.
    • bulkAddTo

      public void bulkAddTo(IntVector v)
      Description copied from interface: PositiveIntSet
      add all elements in this set to the IntVector v as a bulk operation
      Specified by:
      bulkAddTo in interface PositiveIntSet
      Parameters:
      v - - to be added to
    • toIntArray

      public int[] toIntArray()
      Specified by:
      toIntArray in interface PositiveIntSet
      Returns:
      the set as an arbitrarily ordered int array
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • isOffsetBitSet

      boolean isOffsetBitSet()
    • isShortHashSet

      boolean isShortHashSet()