Class TextBuffer

java.lang.Object
com.ctc.wstx.util.TextBuffer

public final class TextBuffer extends Object
TextBuffer is a class similar to StringBuilder, with the following differences:
  • TextBuffer uses segmented character arrays, to avoid additional array copies when an array is not big enough. This means that the only reallocation necessary is done once, if and when the caller wants to access the contents in a linear array (char[], String).
  • TextBuffer is not synchronized.
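
The segmented-array idea described above can be sketched as follows. This is a minimal illustration, not the TextBuffer implementation: the class name `SegmentedChars`, the doubling growth policy, and all field names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of segmented storage; not Woodstox code. */
final class SegmentedChars {
    private final List<char[]> full = new ArrayList<>(); // completed segments
    private char[] current;   // segment currently being filled
    private int currentSize;  // chars used in 'current'
    private int fullSize;     // total chars held in 'full'

    SegmentedChars(int initialSize) {
        current = new char[Math.max(1, initialSize)];
    }

    void append(char c) {
        if (currentSize == current.length) {
            // Keep the full segment as it is and start a new, larger one;
            // nothing appended so far gets copied (doubling is an assumption).
            full.add(current);
            fullSize += current.length;
            current = new char[current.length * 2];
            currentSize = 0;
        }
        current[currentSize++] = c;
    }

    int size() {
        return fullSize + currentSize;
    }

    /** The one copy: flatten all segments into a single linear array. */
    char[] toCharArray() {
        char[] result = new char[size()];
        int offset = 0;
        for (char[] seg : full) {
            System.arraycopy(seg, 0, result, offset, seg.length);
            offset += seg.length;
        }
        System.arraycopy(current, 0, result, offset, currentSize);
        return result;
    }

    public static void main(String[] args) {
        SegmentedChars buf = new SegmentedChars(4);
        for (char c : "hello, world".toCharArray()) {
            buf.append(c);
        }
        System.out.println(new String(buf.toCharArray()));
    }
}
```

Compared with a single growing array, appends never move existing characters; the cost of copying is paid only when a linear view is requested.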

Over time more and more cruft has accumulated here, mostly to support efficient access to collected text. Since access is easiest to do efficiently using callbacks, this class now needs to know the interfaces of SAX classes and validators.

Notes about usage: for debugging purposes, use the toString() method rather than contentsAsArray() or contentsAsString(). Internally, the resulting code paths may or may not differ with respect to caching.

  • Field Details

    • DEF_INITIAL_BUFFER_SIZE

      static final int DEF_INITIAL_BUFFER_SIZE
      Size of the first text segment buffer to allocate; need not contain the biggest segment, since new ones will get allocated as needed. However, it's sensible to use something that often is big enough to contain segments.
    • MAX_SEGMENT_LENGTH

      static final int MAX_SEGMENT_LENGTH
      We also restrict the maximum length of individual segments to allocate (not including cases where we must return a single segment). The value is somewhat arbitrary; it is chosen so that the memory used is no more than half a megabyte.
    • INT_SPACE

      static final int INT_SPACE
    • mConfig

      private final ReaderConfig mConfig
    • mInputBuffer

      private char[] mInputBuffer
      Shared input buffer; stored here in case some input can be returned as is, without being copied to the collector's own buffers. Note that this is read-only for this object.
    • mInputStart

      private int mInputStart
      Character offset of first char in input buffer; -1 to indicate that input buffer currently does not contain any useful char data
    • mInputLen

      private int mInputLen
      When using shared buffer, offset after the last character in shared buffer
    • mHasSegments

      private boolean mHasSegments
    • mSegments

      private ArrayList<char[]> mSegments
      List of segments prior to currently active segment.
    • mSegmentSize

      private int mSegmentSize
      Number of characters in segments in mSegments
    • mCurrentSegment

      private char[] mCurrentSegment
    • mCurrentSize

      private int mCurrentSize
      Number of characters in currently active (last) segment
    • mResultString

      private String mResultString
      String that will be constructed when the whole contents are needed; will be temporarily stored in case asked for again.
    • mResultArray

      private char[] mResultArray
    • MAX_INDENT_SPACES

      public static final int MAX_INDENT_SPACES
    • MAX_INDENT_TABS

      public static final int MAX_INDENT_TABS
    • sIndSpaces

      private static final String sIndSpaces
    • sIndSpacesArray

      private static final char[] sIndSpacesArray
    • sIndSpacesStrings

      private static final String[] sIndSpacesStrings
    • sIndTabs

      private static final String sIndTabs
    • sIndTabsArray

      private static final char[] sIndTabsArray
    • sIndTabsStrings

      private static final String[] sIndTabsStrings
  • Constructor Details

  • Method Details

    • createRecyclableBuffer

      public static TextBuffer createRecyclableBuffer(ReaderConfig cfg)
    • createTemporaryBuffer

      public static TextBuffer createTemporaryBuffer()
    • recycle

      public void recycle(boolean force)
      Method called to indicate that the underlying buffers should now be recycled if they haven't yet been recycled. Although the caller can still use this text buffer afterwards, it is not advisable to call this method if that is likely, since the next time a buffer is needed, the buffers need to be reallocated. Note: calling this method also automatically clears the contents of the buffer.
    • resetWithEmpty

      public void resetWithEmpty()
      Method called to clear out any content the text buffer may have, and to initialize the buffer to use non-shared data.
    • resetWithEmptyString

      public void resetWithEmptyString()
      Similar to resetWithEmpty(), but actively marks current text content to be empty string (whereas former method leaves content as undefined).
    • resetWithShared

      public void resetWithShared(char[] buf, int start, int len)
      Method called to initialize the buffer with a shared copy of data; this means that buffer will just have pointers to actual data. It also means that if anything is to be appended to the buffer, it will first have to unshare it (make a local copy).
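
A minimal sketch of the share-then-copy-on-append behavior described here, assuming a simplified single-array buffer. The class `SharedOrOwnedChars`, its fields, and the initial size of 16 are hypothetical and only illustrate the idea; they are not taken from TextBuffer.

```java
import java.util.Arrays;

/** Hypothetical sketch of sharing input until the first append; not Woodstox code. */
final class SharedOrOwnedChars {
    private char[] shared;         // caller's array, treated as read-only
    private int sharedStart = -1;  // -1 means "no shared data"
    private int sharedLen;
    private char[] own;            // private storage, created on demand
    private int ownSize;

    void resetWithShared(char[] buf, int start, int len) {
        // Only remember where the data lives; nothing is copied yet.
        shared = buf;
        sharedStart = start;
        sharedLen = len;
        own = null;
        ownSize = 0;
    }

    boolean isShared() {
        return sharedStart >= 0;
    }

    void append(char c) {
        if (isShared()) {
            unshare(1);
        } else if (own == null) {
            own = new char[16];
        }
        if (ownSize == own.length) {
            own = Arrays.copyOf(own, own.length * 2);
        }
        own[ownSize++] = c;
    }

    /** Copies the shared region into private storage with room for more. */
    private void unshare(int needExtra) {
        own = new char[Math.max(16, sharedLen + needExtra)];
        System.arraycopy(shared, sharedStart, own, 0, sharedLen);
        ownSize = sharedLen;
        shared = null;
        sharedStart = -1;
        sharedLen = 0;
    }

    String contents() {
        if (isShared()) {
            return new String(shared, sharedStart, sharedLen);
        }
        return (own == null) ? "" : new String(own, 0, ownSize);
    }
}
```

The caller's array is never written to: the first append triggers a private copy, after which the buffer no longer refers to the shared input.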
    • resetWithCopy

      public void resetWithCopy(char[] buf, int start, int len)
    • resetInitialized

      public void resetInitialized()
      Method called to make sure there is a non-shared segment to use, without appending any content yet.
    • allocBuffer

      private final char[] allocBuffer(int needed)
    • clearSegments

      private final void clearSegments()
    • resetWithIndentation

      public void resetWithIndentation(int indCharCount, char indChar)
    • size

      public int size()
      Returns:
      Number of characters currently stored by this collector
    • getTextStart

      public int getTextStart()
    • getTextBuffer

      public char[] getTextBuffer()
    • decode

      public void decode(org.codehaus.stax2.typed.TypedValueDecoder tvd) throws IllegalArgumentException
      Generic pass-through method which calls the given decoder with the accumulated data
      Throws:
      IllegalArgumentException
    • decodeElements

      public int decodeElements(org.codehaus.stax2.typed.TypedArrayDecoder tad, InputProblemReporter rep) throws org.codehaus.stax2.typed.TypedXMLStreamException
      Pass-through decode method called to find the next token, decode it, and repeat the process as long as there are more tokens and the array decoder accepts more entries. All tokens processed will be "consumed", such that they will no longer be visible via the buffer.
      Returns:
      Number of tokens decoded; 0 means that no (more) tokens were found from this buffer.
      Throws:
      org.codehaus.stax2.typed.TypedXMLStreamException
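
The find-decode-repeat loop described for decodeElements can be sketched roughly as below. This does not use the Stax2 decoder types: `IntSink` is a hypothetical stand-in for an array decoder with limited capacity, and treating any character at or below the space character as a token separator is an assumption of the example.

```java
/** Hypothetical sketch of consuming whitespace-separated tokens; not Woodstox code. */
final class TokenDecodeSketch {
    /** Stand-in for an array decoder that accepts a limited number of values. */
    static final class IntSink {
        final int[] values;
        int count;

        IntSink(int capacity) {
            values = new int[capacity];
        }

        /** Stores the decoded token; returns true once the sink is full. */
        boolean accept(String token) {
            values[count++] = Integer.parseInt(token);
            return count == values.length;
        }
    }

    private final char[] buf;
    private final int end;
    private int pos; // everything before 'pos' has been consumed

    TokenDecodeSketch(String text) {
        buf = text.toCharArray();
        end = buf.length;
    }

    /** Returns the number of tokens decoded; 0 when nothing more was decoded. */
    int decodeElements(IntSink sink) {
        if (sink.count == sink.values.length) {
            return 0; // sink cannot take anything
        }
        int decoded = 0;
        while (true) {
            while (pos < end && buf[pos] <= ' ') {
                pos++; // skip separators
            }
            if (pos >= end) {
                break; // no more tokens
            }
            int start = pos;
            while (pos < end && buf[pos] > ' ') {
                pos++; // scan to end of token
            }
            decoded++;
            if (sink.accept(new String(buf, start, pos - start))) {
                break; // decoder accepts no more entries
            }
        }
        return decoded;
    }

    /** Text not yet consumed by decodeElements. */
    String remaining() {
        return new String(buf, pos, end - pos);
    }
}
```

Calling decodeElements repeatedly with fresh sinks drains the text in chunks; a return value of 0 signals that no tokens are left.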
    • initBinaryChunks

      public void initBinaryChunks(org.codehaus.stax2.typed.Base64Variant v, org.codehaus.stax2.ri.typed.CharArrayBase64Decoder dec, boolean firstChunk)
      Method that needs to be called to configure given base64 decoder with textual contents collected by this buffer.
      Parameters:
      dec - Decoder that will need data
      firstChunk - Whether this is the first segment fed or not; if it is, state needs to be fully reset; if not, only partially.
    • contentsAsString

      public String contentsAsString()
    • contentsAsStringBuilder

      public StringBuilder contentsAsStringBuilder(int extraSpace)
      Similar to contentsAsString(), but constructs a StringBuilder for further appends.
      Parameters:
      extraSpace - Number of extra characters to reserve in the StringBuilder beyond the space immediately needed to hold the contents
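
As an illustration of the extraSpace parameter, the following sketch sizes a StringBuilder up front so that later appends within the extra space need no internal reallocation. The helper `asBuilder` and its flat char[] argument are assumptions for the example, not part of TextBuffer.

```java
/** Hypothetical sketch of pre-sizing a StringBuilder; not Woodstox code. */
final class BuilderWithHeadroom {
    static StringBuilder asBuilder(char[] contents, int len, int extraSpace) {
        // Capacity covers the current contents plus the requested headroom.
        StringBuilder sb = new StringBuilder(len + extraSpace);
        sb.append(contents, 0, len);
        return sb;
    }

    public static void main(String[] args) {
        char[] text = "prefix".toCharArray();
        StringBuilder sb = asBuilder(text, text.length, 32);
        sb.append("-suffix"); // fits in the reserved space
        System.out.println(sb);
    }
}
```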
    • contentsToStringBuilder

      public void contentsToStringBuilder(StringBuilder sb)
    • contentsAsArray

      public char[] contentsAsArray()
    • contentsToArray

      public int contentsToArray(int srcStart, char[] dst, int dstStart, int len)
    • rawContentsTo

      public int rawContentsTo(Writer w) throws IOException
      Method that will stream contents of this buffer into specified Writer.
      Throws:
      IOException
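
Streaming segmented contents to a Writer does not require flattening them first. The sketch below writes each segment in order; the helper `writeSegments`, its parameters, and the choice to return the number of characters written are assumptions, since the meaning of the int returned by rawContentsTo is not stated here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of streaming segments to a Writer; not Woodstox code. */
final class SegmentStreaming {
    static int writeSegments(List<char[]> fullSegments, char[] current,
                             int currentSize, Writer w) throws IOException {
        int total = 0;
        for (char[] seg : fullSegments) {
            w.write(seg, 0, seg.length); // completed segments are entirely used
            total += seg.length;
        }
        w.write(current, 0, currentSize); // last segment is only partly filled
        return total + currentSize;
    }

    public static void main(String[] args) throws IOException {
        List<char[]> full = new ArrayList<>();
        full.add("<greeting>".toCharArray());
        char[] current = "hello???".toCharArray(); // only the first 5 chars are valid
        StringWriter out = new StringWriter();
        int written = writeSegments(full, current, 5, out);
        System.out.println(written + " chars: " + out);
    }
}
```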
    • rawContentsViaReader

      @Deprecated public Reader rawContentsViaReader() throws IOException
      Deprecated.
      Throws:
      IOException
    • isAllWhitespace

      public boolean isAllWhitespace()
    • equalsString

      public boolean equalsString(String str)
      Note: it is assumed that this method is not used often enough to be a bottleneck, or for long segments. Based on this, it is optimized for common simple cases where there is only one single character segment to use; fallback for other cases is to create such segment.
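
The single-segment fast path mentioned in the note can be sketched as a direct character comparison with no String allocation. The helper below is hypothetical and assumes the content already lives in one contiguous region of an array; multi-segment content would first be flattened into such a region.

```java
/** Hypothetical sketch of comparing one contiguous segment to a String; not Woodstox code. */
final class SegmentEquals {
    static boolean equalsString(char[] buf, int start, int len, String str) {
        if (str.length() != len) {
            return false; // cheap length check first
        }
        for (int i = 0; i < len; i++) {
            if (buf[start + i] != str.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```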
    • fireSaxCharacterEvents

      public void fireSaxCharacterEvents(ContentHandler h) throws SAXException
      Throws:
      SAXException
    • fireSaxSpaceEvents

      public void fireSaxSpaceEvents(ContentHandler h) throws SAXException
      Throws:
      SAXException
    • fireSaxCommentEvent

      public void fireSaxCommentEvent(LexicalHandler h) throws SAXException
      Throws:
      SAXException
    • fireDtdCommentEvent

      public void fireDtdCommentEvent(DTDEventListener l)
    • validateText

      public void validateText(org.codehaus.stax2.validation.XMLValidator vld, boolean lastSegment) throws XMLStreamException
      Throws:
      XMLStreamException
    • ensureNotShared

      public void ensureNotShared()
      Method called to make sure that buffer is not using shared input buffer; if it is, it will copy such contents to private buffer.
    • append

      public void append(char c)
    • append

      public void append(char[] c, int start, int len)
    • append

      public void append(String str)
    • getCurrentSegment

      public char[] getCurrentSegment()
    • getCurrentSegmentSize

      public int getCurrentSegmentSize()
    • setCurrentLength

      public void setCurrentLength(int len)
    • finishCurrentSegment

      public char[] finishCurrentSegment()
    • calcNewSize

      private int calcNewSize(int latestSize)
      Method used to determine size of the next segment to allocate to contain textual content.
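
One plausible shape for such a sizing policy is to grow each new segment relative to the previous one and cap it at a maximum. In the sketch below both the 50% growth step and the cap of 256 * 1024 chars are assumptions; the cap was picked only because, at 2 bytes per char, it corresponds to the half-megabyte bound mentioned for MAX_SEGMENT_LENGTH.

```java
/** Hypothetical sketch of a capped segment growth policy; not Woodstox code. */
final class SegmentSizing {
    // Assumed cap: 256K chars, i.e. 512 KiB at 2 bytes per char.
    static final int MAX_SEGMENT_CHARS = 256 * 1024;

    static int nextSegmentSize(int latestSize) {
        int grown = latestSize + (latestSize >> 1); // grow by 50% (assumption)
        return Math.min(grown, MAX_SEGMENT_CHARS);
    }
}
```

Capping the size keeps a long document from producing ever larger single allocations, at the cost of more segments.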
    • toString

      public String toString()
      Note: calling this method may not be as efficient as calling contentsAsString(), since it's not guaranteed that resulting String is cached.
      Overrides:
      toString in class Object
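
The caching difference noted above can be pictured with a simplified model in which contentsAsString() stores its result until the next append, while toString() rebuilds the String on every call. Which TextBuffer paths actually cache is not specified here, so the behavior of this hypothetical `CachedContents` class is an assumption made for illustration.

```java
/** Hypothetical model of cached vs. uncached String access; not Woodstox code. */
final class CachedContents {
    private final StringBuilder data = new StringBuilder();
    private String cached; // null when no valid cached result exists
    int builds;            // counts how many Strings were constructed

    void append(String s) {
        data.append(s);
        cached = null; // contents changed, cached result is stale
    }

    String contentsAsString() {
        if (cached == null) {
            cached = data.toString();
            builds++;
        }
        return cached;
    }

    @Override
    public String toString() {
        builds++; // always builds a fresh String in this model
        return data.toString();
    }
}
```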
    • unshare

      public void unshare(int needExtra)
      Method called if/when we need to append content when we have been initialized to use shared buffer.
    • expand

      private void expand(int roomNeeded)
      Method called when current segment is full, to allocate new segment.
      Parameters:
      roomNeeded - Number of characters that the resulting new buffer must have
    • buildResultArray

      private char[] buildResultArray()