Package com.ctc.wstx.io
Class StreamBootstrapper
java.lang.Object
com.ctc.wstx.io.InputBootstrapper
com.ctc.wstx.io.StreamBootstrapper
Input bootstrap class used with streams, when the encoding is not known
(when the encoding is specified by the application, a reader is constructed,
and then the reader-based bootstrapper is used).
The encoding used for an entity (including the main document entity) is determined using the algorithms suggested in Appendix F of the XML 1.0 (Third Edition) specification.
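The detection described above can be sketched as a lookup on the first four bytes of the entity. This is a minimal illustration of the Appendix F heuristics, not the code of this class: the class name `EncodingSniffer`, the method `sniff` and the returned labels are all hypothetical, and the unusual UCS-4 byte orders that Appendix F also lists are left out.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {

    /** Guesses an encoding family from the first four bytes of an entity (hypothetical helper). */
    static String sniff(byte[] b) {
        if (b.length < 4) {
            return "too short to sniff";
        }
        int quad = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);

        // Byte order marks. The 4-byte marks are tested first because
        // FF FE 00 00 (UCS-4 LE) begins with the UTF-16 LE mark FF FE.
        if (quad == 0x0000FEFF) return "UCS-4 BE (BOM)";
        if (quad == 0xFFFE0000) return "UCS-4 LE (BOM)";
        if ((quad >>> 16) == 0xFEFF) return "UTF-16 BE (BOM)";
        if ((quad >>> 16) == 0xFFFE) return "UTF-16 LE (BOM)";
        if ((quad >>> 8) == 0xEFBBBF) return "UTF-8 (BOM)";

        // No BOM: look for the start of "<?xml" in the candidate encodings.
        switch (quad) {
            case 0x0000003C: return "UCS-4 BE";
            case 0x3C000000: return "UCS-4 LE";
            case 0x003C003F: return "UTF-16 BE";
            case 0x3C003F00: return "UTF-16 LE";
            case 0x3C3F786D: return "ASCII-compatible single-byte or UTF-8";
            case 0x4C6FA794: return "EBCDIC";
            default:         return "no declaration detected; UTF-8 assumed";
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version='1.0'?><a/>";
        System.out.println(sniff(doc.getBytes(StandardCharsets.UTF_8)));
        System.out.println(sniff(doc.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

For the ASCII-compatible and EBCDIC cases the sniffed family is only a starting point: the declared encoding still has to be read from the XML declaration itself.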
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) boolean | mBigEndian |
private byte[] | mByteBuffer |
(package private) boolean | mByteSizeFound |
(package private) int | mBytesPerChar | For most encodings, number of physical characters needed for decoding xml declaration characters (which for variable length encodings like UTF-8 will be 1).
(package private) boolean | mEBCDIC | Special case for 1-byte encodings: EBCDIC is problematic as it's not 7-bit ascii compatible.
(package private) boolean | mHadBOM |
(package private) final InputStream | mIn | Underlying InputStream to use for reading content.
(package private) static final int | MIN_BUF_SIZE | Let's size buffer at least big enough to contain the longest possible prefix of a document needed to positively identify it starts with the XML declaration.
(package private) String | mInputEncoding |
private int | mInputEnd |
private int | mInputPtr |
private final boolean | mRecycleBuffer | Whether byte buffer is recyclable or not
(package private) int[] | mSingleByteTranslation | For single-byte non-ascii-compatible encodings (ok ok, really just EBCDIC), we'll have to use a lookup table.
Fields inherited from class com.ctc.wstx.io.InputBootstrapper
BYTE_CR, BYTE_LF, BYTE_NULL, CHAR_CR, CHAR_LF, CHAR_NEL, CHAR_NULL, CHAR_SPACE, ERR_XMLDECL_END_MARKER, ERR_XMLDECL_EXP_ATTRVAL, ERR_XMLDECL_EXP_EQ, ERR_XMLDECL_EXP_SPACE, ERR_XMLDECL_KW_ENCODING, ERR_XMLDECL_KW_STANDALONE, ERR_XMLDECL_KW_VERSION, mDeclaredXmlVersion, mFoundEncoding, mInputProcessed, mInputRow, mInputRowStart, mKeywordBuffer, mPublicId, mStandalone, mSystemId, mXml11Handling
-
Constructor Summary
Constructors
Modifier | Constructor | Description
private | StreamBootstrapper(String pubId, SystemId sysId, byte[] data, int start, int end) |
private | StreamBootstrapper(String pubId, SystemId sysId, InputStream in) |
-
Method Summary
Modifier and Type | Method | Description
Reader | bootstrapInput(ReaderConfig cfg, boolean mainDoc, int xmlVersion) |
protected int | checkKeyword(String exp) |
protected int | checkMbKeyword(String expected) |
protected int | checkSbKeyword(String expected) |
protected int | checkTranslatedKeyword(String expected) |
protected boolean | ensureLoaded(int minimum) |
int | getInputColumn() |
String | getInputEncoding() | Since this class only gets used when the encoding is not explicitly passed, this needs to return the encoding that was auto-detected.
int | getInputTotal() |
static StreamBootstrapper | getInstance(String pubId, SystemId sysId, byte[] data, int start, int end) | Factory method used when the underlying data provider is a pre-allocated block source, and no stream is used.
static StreamBootstrapper | getInstance(String pubId, SystemId sysId, InputStream in) | Factory method used when the underlying data provider is an actual stream.
protected Location | getLocation() |
protected int | getNext() |
protected int | getNextAfterWs(boolean reqWs) |
protected boolean | hasXmlDecl() |
protected void | loadMore() |
protected byte | nextByte() |
protected int | nextMultiByte() |
protected int | nextTranslated() |
protected void | pushback() |
protected int | readQuotedValue(char[] outputBuffer, int quoteChar) |
private void | reportWeirdUCS4(String type) |
protected void | resolveStreamEncoding() | Method called to try to figure out physical encoding the underlying input stream uses.
protected void | skipMbLF(int lf) |
protected int | skipMbWs() |
protected void | skipSbLF(byte lfByte) |
protected int | skipSbWs() |
protected void | skipTranslatedLF(int lf) |
protected int | skipTranslatedWs() |
private void | verifyEncoding(String id, int bpc) |
private void | verifyEncoding(String id, int bpc, boolean bigEndian) |
protected String | verifyXmlEncoding(String enc) |
Methods inherited from class com.ctc.wstx.io.InputBootstrapper
declaredXml11, getDeclaredEncoding, getDeclaredVersion, getInputRow, getPublicId, getStandalone, getSystemId, initFrom, readXmlDecl, reportNull, reportUnexpectedChar, reportXmlProblem
-
Field Details
-
MIN_BUF_SIZE
static final int MIN_BUF_SIZE
Let's size buffer at least big enough to contain the longest possible prefix of a document needed to positively identify that it starts with the XML declaration. That means having an (optional) BOM, and then the first 6 characters ("<?xml "), in whatever encoding. With 4-byte encodings (UCS-4), that comes to 28 bytes. And for good measure, let's pad that a bit as well.
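The 28-byte figure follows from simple arithmetic: the BOM plus six characters at the width of the encoding. A small sketch of that calculation, where the class name `MinPrefixSize` and the method `prefixBytes` are made up for illustration and are not part of this class:

```java
public class MinPrefixSize {

    /** Bytes needed for an optional BOM plus the six characters of "<?xml ". */
    static int prefixBytes(int bomBytes, int bytesPerChar) {
        int declChars = "<?xml ".length(); // 6 characters
        return bomBytes + declChars * bytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println("UCS-4:  " + prefixBytes(4, 4)); // 4 + 6 * 4 = 28
        System.out.println("UTF-16: " + prefixBytes(2, 2)); // 2 + 6 * 2 = 14
        System.out.println("UTF-8:  " + prefixBytes(3, 1)); // 3 + 6 * 1 = 9
    }
}
```

UCS-4 is the worst case, which is why it sets the lower bound for the buffer size.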
-
mIn
final InputStream mIn
Underlying InputStream to use for reading content. May be null if the actual data source is not stream-based but a block source.
-
mByteBuffer
private byte[] mByteBuffer
-
mRecycleBuffer
private final boolean mRecycleBuffer
Whether byte buffer is recyclable or not
-
mInputPtr
private int mInputPtr
-
mInputEnd
private int mInputEnd
-
mBigEndian
boolean mBigEndian
-
mHadBOM
boolean mHadBOM
-
mByteSizeFound
boolean mByteSizeFound
-
mBytesPerChar
int mBytesPerChar
For most encodings, number of physical characters (bytes) needed for decoding xml declaration characters (which for variable length encodings like UTF-8 will be 1). The exception is EBCDIC, which, while a single-byte encoding, is denoted by -1 since it needs an additional translation lookup.
-
mEBCDIC
boolean mEBCDIC
Special case for 1-byte encodings: EBCDIC is problematic as it's not 7-bit ASCII compatible. We can still deal with it, but only with a bit of extra state.
-
mInputEncoding
String mInputEncoding
-
mSingleByteTranslation
int[] mSingleByteTranslation
For single-byte non-ASCII-compatible encodings (ok ok, really just EBCDIC), we'll have to use a lookup table.
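To illustrate the lookup-table idea, here is a minimal sketch that maps a handful of EBCDIC byte values to their Unicode equivalents, just enough to decode the start of an XML declaration. The class `EbcdicLookupSketch`, its partial `TABLE` and the `translate` method are hypothetical; the table this class actually uses is not shown in this documentation.

```java
import java.util.Arrays;

public class EbcdicLookupSketch {

    // Partial table: EBCDIC byte value -> Unicode code point, -1 where unmapped in this sketch.
    static final int[] TABLE = new int[256];
    static {
        Arrays.fill(TABLE, -1);
        TABLE[0x40] = ' ';
        TABLE[0x4C] = '<';
        TABLE[0x6F] = '?';
        TABLE[0x93] = 'l';
        TABLE[0x94] = 'm';
        TABLE[0xA7] = 'x';
    }

    /** Translates EBCDIC bytes one at a time through the lookup table. */
    static String translate(byte[] ebcdic) {
        StringBuilder sb = new StringBuilder(ebcdic.length);
        for (byte b : ebcdic) {
            int c = TABLE[b & 0xFF]; // mask to get an unsigned index
            if (c < 0) {
                throw new IllegalArgumentException(
                        String.format("Unmapped EBCDIC byte 0x%02X", b & 0xFF));
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] decl = { 0x4C, 0x6F, (byte) 0xA7, (byte) 0x94, (byte) 0x93, 0x40 };
        System.out.println("[" + translate(decl) + "]"); // prints "[<?xml ]"
    }
}
```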
-
-
Constructor Details
-
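StreamBootstrapper
private StreamBootstrapper(String pubId, SystemId sysId, InputStream in)
-
StreamBootstrapper
private StreamBootstrapper(String pubId, SystemId sysId, byte[] data, int start, int end)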
- Parameters:
start
- Pointer to the first valid byte in the buffer
end
- Pointer to the offset after last valid byte in the buffer
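The `start` and `end` parameters describe a half-open range: `start` is inclusive and `end` is exclusive, so the number of valid bytes is `end - start`. A small sketch of that convention, using a hypothetical helper class `BlockSliceSketch` that is not part of this API:

```java
import java.nio.charset.StandardCharsets;

public class BlockSliceSketch {

    /** Number of valid bytes in the half-open range [start, end). */
    static int validLength(int start, int end) {
        return end - start;
    }

    /** Decodes only the valid part of a pre-allocated block (ASCII assumed for the demo). */
    static String validContent(byte[] data, int start, int end) {
        if (start < 0 || end > data.length || start > end) {
            throw new IndexOutOfBoundsException("Bad range [" + start + ", " + end + ")");
        }
        return new String(data, start, end - start, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] block = "xx<a/>yy".getBytes(StandardCharsets.US_ASCII);
        // The document occupies offsets 2..5; end points one past the last valid byte.
        System.out.println(validContent(block, 2, 6)); // prints "<a/>"
        System.out.println(validLength(2, 6));         // prints 4
    }
}
```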
-
-
Method Details
-
getInstance
public static StreamBootstrapper getInstance(String pubId, SystemId sysId, InputStream in)
Factory method used when the underlying data provider is an actual stream.
-
getInstance
public static StreamBootstrapper getInstance(String pubId, SystemId sysId, byte[] data, int start, int end)
Factory method used when the underlying data provider is a pre-allocated block source, and no stream is used. Additionally, the buffer passed is not owned by the bootstrapper or the Reader that is created, so it is not to be recycled.
-
bootstrapInput
public Reader bootstrapInput(ReaderConfig cfg, boolean mainDoc, int xmlVersion) throws IOException, XMLStreamException
- Specified by:
bootstrapInput
in class InputBootstrapper
- Parameters:
xmlVersion
- Optional xml version identifier of the main parsed document (if not bootstrapping the main document). Currently only relevant for checking that an XML 1.0 document does not include XML 1.1 external parsed entities. If null, no checks will be done; when bootstrapping parsing of the main document, null should be passed for this argument. (Since the parameter is an int, "null" here presumably means a value denoting an unknown version.)
- Throws:
IOException
XMLStreamException
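The version check described for `xmlVersion` might look roughly like the sketch below. Because an `int` cannot be null, the sketch assumes a sentinel value for "unknown"; the constants `XML_UNKNOWN`, `XML_10` and `XML_11` and the method `isCompatible` are hypothetical names chosen for illustration, not identifiers taken from this class.

```java
public class XmlVersionCheckSketch {

    // Hypothetical version identifiers; the real constants are not shown in this page.
    static final int XML_UNKNOWN = 0;
    static final int XML_10 = 0x0100;
    static final int XML_11 = 0x0110;

    /**
     * Returns false only when an XML 1.0 main document would include
     * an external parsed entity that declares itself as XML 1.1.
     */
    static boolean isCompatible(int mainDocVersion, int entityVersion) {
        if (mainDocVersion == XML_UNKNOWN) {
            return true; // bootstrapping the main document itself: nothing to check
        }
        return !(mainDocVersion == XML_10 && entityVersion == XML_11);
    }

    public static void main(String[] args) {
        System.out.println(isCompatible(XML_UNKNOWN, XML_11)); // true
        System.out.println(isCompatible(XML_10, XML_11));      // false
        System.out.println(isCompatible(XML_11, XML_10));      // true
    }
}
```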
-
getInputEncoding
public String getInputEncoding()
Since this class only gets used when the encoding is not explicitly passed, this needs to return the encoding that was auto-detected.
- Specified by:
getInputEncoding
in class InputBootstrapper
- Returns:
- Input encoding in use, if it could be determined or was passed by the calling application
-
getInputTotal
public int getInputTotal()
- Specified by:
getInputTotal
in class InputBootstrapper
- Returns:
- Total number of characters read from bootstrapped input (stream, reader)
-
getInputColumn
public int getInputColumn()
- Specified by:
getInputColumn
in class InputBootstrapper
-
resolveStreamEncoding
protected void resolveStreamEncoding() throws IOException, WstxException
Method called to try to figure out physical encoding the underlying input stream uses.
- Throws:
IOException
WstxException
-
verifyXmlEncoding
protected String verifyXmlEncoding(String enc) throws WstxException
- Returns:
- Normalized encoding name
- Throws:
WstxException
-
ensureLoaded
protected boolean ensureLoaded(int minimum) throws IOException
- Throws:
IOException
-
loadMore
protected void loadMore() throws IOException, WstxException
- Throws:
IOException
WstxException
-
pushback
protected void pushback()
- Specified by:
pushback
in class InputBootstrapper
-
getNext
protected int getNext() throws IOException, WstxException
- Specified by:
getNext
in class InputBootstrapper
- Throws:
IOException
WstxException
-
getNextAfterWs
protected int getNextAfterWs(boolean reqWs) throws IOException, WstxException
- Specified by:
getNextAfterWs
in class InputBootstrapper
- Throws:
IOException
WstxException
-
checkKeyword
protected int checkKeyword(String exp) throws IOException, WstxException
- Specified by:
checkKeyword
in class InputBootstrapper
- Returns:
- First character that does not match expected, if any; CHAR_NULL if match succeeded
- Throws:
IOException
WstxException
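The return convention for keyword checks (the first mismatching character, or CHAR_NULL on success) can be sketched as follows. This is an illustration over an in-memory string, not the stream-based implementation; the class `KeywordCheckSketch` and the extra `input` and `offset` parameters are assumptions made to keep the example self-contained.

```java
public class KeywordCheckSketch {

    static final char CHAR_NULL = '\u0000';

    /**
     * Compares input (from offset) against the expected keyword.
     * Returns the first character that does not match, or CHAR_NULL if all matched.
     */
    static int checkKeyword(String input, int offset, String expected) {
        for (int i = 0; i < expected.length(); i++) {
            int pos = offset + i;
            if (pos >= input.length()) {
                throw new IllegalStateException("Unexpected end of input in keyword");
            }
            char c = input.charAt(pos);
            if (c != expected.charAt(i)) {
                return c; // caller can report this character as unexpected
            }
        }
        return CHAR_NULL;
    }

    public static void main(String[] args) {
        System.out.println(checkKeyword("version='1.0'", 0, "version") == CHAR_NULL); // true
        System.out.println((char) checkKeyword("verzion='1.0'", 0, "version"));       // z
    }
}
```

Returning the offending character rather than a boolean lets the caller build a precise error message without re-reading the input.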
-
readQuotedValue
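protected int readQuotedValue(char[] outputBuffer, int quoteChar) throws IOException, WstxException
- Specified by:
readQuotedValue
in class InputBootstrapper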
- Throws:
IOException
WstxException
-
hasXmlDecl
protected boolean hasXmlDecl() throws IOException, WstxException
- Throws:
IOException
WstxException
-
getLocation
protected Location getLocation()
- Specified by:
getLocation
in class InputBootstrapper
-
nextByte
protected byte nextByte() throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipSbWs
protected int skipSbWs() throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipSbLF
protected void skipSbLF(byte lfByte) throws IOException, WstxException
- Throws:
IOException
WstxException
-
checkSbKeyword
protected int checkSbKeyword(String expected) throws IOException, WstxException
- Returns:
- First character that does not match expected, if any; CHAR_NULL if match succeeded
- Throws:
IOException
WstxException
-
nextMultiByte
protected int nextMultiByte() throws IOException, WstxException
- Throws:
IOException
WstxException
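For fixed-width encodings, the mBytesPerChar and mBigEndian fields suggest how a multi-byte character could be assembled from the byte buffer. The sketch below shows one way to do it; the class `FixedWidthReadSketch` and its `nextChar` method are hypothetical and take the buffer and flags as parameters rather than reading instance state.

```java
public class FixedWidthReadSketch {

    /**
     * Assembles one character from bytesPerChar bytes starting at ptr,
     * honoring the given byte order. Only widths 2 and 4 are handled here.
     */
    static int nextChar(byte[] buf, int ptr, int bytesPerChar, boolean bigEndian) {
        if (bytesPerChar != 2 && bytesPerChar != 4) {
            throw new IllegalArgumentException("Unsupported width: " + bytesPerChar);
        }
        if (ptr < 0 || ptr + bytesPerChar > buf.length) {
            throw new IndexOutOfBoundsException("Not enough bytes at offset " + ptr);
        }
        int c = 0;
        for (int i = 0; i < bytesPerChar; i++) {
            int b = buf[ptr + i] & 0xFF; // treat the byte as unsigned
            if (bigEndian) {
                c = (c << 8) | b;        // most significant byte comes first
            } else {
                c |= b << (8 * i);       // least significant byte comes first
            }
        }
        return c;
    }

    public static void main(String[] args) {
        byte[] utf16le = { 0x3C, 0x00, 0x3F, 0x00 };
        System.out.println((char) nextChar(utf16le, 0, 2, false)); // <
        System.out.println((char) nextChar(utf16le, 2, 2, false)); // ?
    }
}
```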
-
nextTranslated
protected int nextTranslated() throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipMbWs
protected int skipMbWs() throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipTranslatedWs
protected int skipTranslatedWs() throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipMbLF
protected void skipMbLF(int lf) throws IOException, WstxException
- Throws:
IOException
WstxException
-
skipTranslatedLF
protected void skipTranslatedLF(int lf) throws IOException, WstxException
- Throws:
IOException
WstxException
-
checkMbKeyword
protected int checkMbKeyword(String expected) throws IOException, WstxException
- Returns:
- First character that does not match expected, if any; CHAR_NULL if match succeeded
- Throws:
IOException
WstxException
-
checkTranslatedKeyword
protected int checkTranslatedKeyword(String expected) throws IOException, WstxException
- Throws:
IOException
WstxException
-
verifyEncoding
private void verifyEncoding(String id, int bpc) throws WstxException
- Throws:
WstxException
-
verifyEncoding
private void verifyEncoding(String id, int bpc, boolean bigEndian) throws WstxException
- Throws:
WstxException
-
reportWeirdUCS4
private void reportWeirdUCS4(String type) throws IOException
- Throws:
IOException
-