Class StreamScanner
- All Implemented Interfaces:
InputConfigFlags
,ParsingErrorMsgs
,InputProblemReporter
- Direct Known Subclasses:
BasicStreamReader
,MinimalDTDReader
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final char
Last (highest) char code of the three, LF, CR and NULLprotected static final char
Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (note: > only matters when quoting text, as part of ]]>)protected static final char
First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':').static final int
protected boolean
Flag that indicates whether all escaped chars are accepted in XML 1.0.Cache of internal character entities;protected final boolean
If true, Reader is namespace aware, and should do basic checks (usually enforcing limitations on having colons in names)protected boolean
note: left non-final on purpose: sub-class may need to modify the default value after construction.protected boolean
Flag for whether or not character references should be treated as entitiesprotected final ReaderConfig
Copy of the configuration object passed by the factory.protected int
This is the current depth of the input stack (same as what input element stack would return as its depth).protected EntityDecl
Entity reference stream currently points to.protected String
Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in element stack): target for processing instructions.protected String
Input stream encoding, if known (passed in, or determined by auto-detection); null if not.protected String
Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding.protected int
XML version as declared by the document; one of constants from XmlConsts (like XmlConsts.XML_V_10).protected int
Number of times a parsed general entity has been expanded; used for (optionally) limiting number of expansion to guard against denial-of-service attacks like "Billion Laughs".protected XMLResolver
Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander)protected WstxInputSource
Currently active input source; contains link to parent (nesting) input sources, if any.protected int
protected char[]
Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).protected boolean
Flag that indicates whether linefeeds in the input data are to be normalized or not.protected final WstxInputSource
Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent.(package private) final SymbolTable
protected int
Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)protected int
Input row on which current token starts, 1-basedprotected long
Total number of characters read before start of current token.private static final byte
private static final byte
private static final byte
private static final byte
private static final byte[]
private static final byte[]
private static final int
We will only use validity array for first 256 characters, mostly because after those characters it's easier to do fairly simple block checks.private static final int
Public identifiers only use 7-bit ascii range.Fields inherited from class com.ctc.wstx.io.WstxInputData
CHAR_NULL, CHAR_SPACE, INT_NULL, INT_SPACE, MAX_UNICODE_CHAR, mCurrInputProcessed, mCurrInputRow, mCurrInputRowStart, mInputBuffer, mInputEnd, mInputPtr, mXml11
Fields inherited from interface com.ctc.wstx.cfg.InputConfigFlags
CFG_ALLOW_XML11_ESCAPED_CHARS_IN_XML10, CFG_AUTO_CLOSE_INPUT, CFG_CACHE_DTDS, CFG_CACHE_DTDS_BY_PUBLIC_ID, CFG_COALESCE_TEXT, CFG_INTERN_NAMES, CFG_INTERN_NS_URIS, CFG_JAXP_FEATURE_SECURE_PROCESSING, CFG_LAZY_PARSING, CFG_NAMESPACE_AWARE, CFG_NORMALIZE_LFS, CFG_PRESERVE_LOCATION, CFG_REPLACE_ENTITY_REFS, CFG_REPORT_CDATA, CFG_REPORT_PROLOG_WS, CFG_SUPPORT_DTD, CFG_SUPPORT_DTDPP, CFG_SUPPORT_EXTERNAL_ENTITIES, CFG_TREAT_CHAR_REFS_AS_ENTS, CFG_VALIDATE_AGAINST_DTD, CFG_XMLID_TYPING, CFG_XMLID_UNIQ_CHECKS
Fields inherited from interface com.ctc.wstx.cfg.ParsingErrorMsgs
SUFFIX_EOF_EXP_NAME, SUFFIX_IN_ATTR_VALUE, SUFFIX_IN_CDATA, SUFFIX_IN_CLOSE_ELEMENT, SUFFIX_IN_COMMENT, SUFFIX_IN_DEF_ATTR_VALUE, SUFFIX_IN_DOC, SUFFIX_IN_DTD, SUFFIX_IN_DTD_EXTERNAL, SUFFIX_IN_DTD_INTERNAL, SUFFIX_IN_ELEMENT, SUFFIX_IN_ENTITY_REF, SUFFIX_IN_EPILOG, SUFFIX_IN_NAME, SUFFIX_IN_PROC_INSTR, SUFFIX_IN_PROLOG, SUFFIX_IN_TEXT, SUFFIX_IN_XML_DECL
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprotected
StreamScanner
(WstxInputSource input, ReaderConfig cfg, XMLResolver res) Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader. -
Method Summary
Modifier and TypeMethodDescriptionprotected void
_reportProblem
(XMLReporter rep, String probType, String msg, Location loc) protected void
_reportProblem
(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob) protected void
closeAllInput
(boolean force) protected WstxException
Construct and return an XMLStreamException to throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)protected XMLStreamException
constructLimitViolation
(String type, long limit) protected WstxException
protected WstxException
protected boolean
ensureInput
(int minAmount) Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call loadMore().protected final char[]
expandBy50Pct
(char[] buf) private void
expandEntity
(EntityDecl ed, boolean allowExt) note: defined as private for documentation, ie.protected EntityDecl
expandEntity
(String id, boolean allowExt, Object extraArg) Helper method that will try to expand a parsed entity (parameter or generic entity).private EntityDecl
note: only called from the local expandEntity() methodprotected abstract EntityDecl
findEntity
(String id, Object arg) Abstract method for sub-classes to implement, for finding a declared general or parsed entity.protected int
fullyResolveEntity
(boolean allowExt) Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers, and depending on whether result is a character entity (or one of 5 pre-defined entities), returns char in question, or null character (code 0) to indicate it had to change input source.final WstxInputSource
Returns current input source this source uses.org.codehaus.stax2.XMLStreamLocation2
protected EntityDecl
getIntEntity
(int ch, char[] originalChars) Returns an entity (possibly from cache) for the argument character using the encoded representation in mInputBuffer[entityStartPos ...protected WstxInputLocation
Method that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location.abstract Location
Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).protected final char[]
getNameBuffer
(int minSize) protected final int
getNext()
protected final int
Method that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source).protected final char
getNextChar
(String errorMsg) protected final char
getNextCharAfterWS
(String errorMsg) protected final char
getNextCharFromCurrent
(String errorMsg) Similar to getNextChar(java.lang.String), but will not read more characters from parent input source(s) if the current input source doesn't have more content.protected final char
getNextInCurrAfterWS
(String errorMsg) protected final char
getNextInCurrAfterWS
(String errorMsg, char c) protected URL
org.codehaus.stax2.XMLStreamLocation2
protected String
protected abstract void
protected abstract void
This method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for xml reader, always enabled for dtd reader).protected void
initInputSource
(WstxInputSource newInput, boolean isExt, String entityId) Method called when an entity has been expanded (new input source has been created).protected final int
protected boolean
loadMore()
Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.protected final boolean
protected boolean
protected final boolean
loadMoreFromCurrent
(String errorMsg) protected final void
markLF()
protected final void
markLF
(int inputPtr) protected final String
parseEntityName
(char c) protected String
Method called to read in full name, including unlimited number of namespace separators (':'), for the purpose of displaying name in an error message.protected String
Method that will parse 'full' name token; what full means depends on whether reader is namespace aware or not.protected String
parseFullName
(char c) protected String
parseFullName2
(int start, int hash) protected String
parseLocalName
(char c) Method that will parse name token (roughly equivalent to XML specs, although a bit more lenient for more efficient handling); either uri prefix, or local name.protected String
parseLocalName2
(int start, int hash) Second part of name token parsing; called when name can continue past input buffer end (so only part was read before calling this method to read the rest).protected final String
parsePublicId
(char quoteChar, String errorMsg) Simple parsing method that parses public ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).protected final String
parseSystemId
(char quoteChar, boolean convertLFs, String errorMsg) Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).protected final void
parseUntil
(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg) protected final int
peekNext()
Similar to getNext(), but does not advance pointer in input buffer.protected final void
pushback()
Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be successfully returned.private void
reportIllegalChar
(int value) void
reportProblem
(String probType, String format, Object arg, Object arg2) void
private void
void
void
reportValidationProblem
(String msg, int severity) void
reportValidationProblem
(String format, Object arg, Object arg2) void
reportValidationProblem
(Location loc, String msg) void
reportValidationProblem
(org.codehaus.stax2.validation.XMLValidationProblem prob) Note: this is the base implementation used for implementing ValidationContext
private int
resolveCharEnt
(StringBuffer originalCharacters) protected int
resolveCharOnlyEntity
(boolean checkStd) Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quot -- MAY be "char entities" in this sense, depending on arguments).protected EntityDecl
Reverse of resolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot).protected int
resolveSimpleEntity
(boolean checkStd) Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot).protected final boolean
skipCRLF
(char c) Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it.protected int
skipFullName
(char c) Note: does not check for number of colons, amongst other things.protected void
throwFromIOE
(IOException ioe) protected void
throwFromStrE
(XMLStreamException strex) protected void
throwInvalidSpace
(int i) protected WstxException
throwInvalidSpace
(int i, boolean deferErrors) protected void
Method called to report an error, when caller's signature only allows runtime exceptions to be thrown.private void
throwNsColonException
(String name) Method called to throw an exception indicating that a name that should not be namespace-qualified (PI target, entity/notation name) is one, and reader is namespace aware.protected void
protected void
void
throwParseError
(String msg) void
throwParseError
(String format, Object arg, Object arg2) Throws generic parse error with specified message and current parsing location.private void
throwRecursionError
(String entityName) protected void
throwUnexpectedChar
(int i, String msg) protected void
throwUnexpectedEOB
(String msg) Similar to throwUnexpectedEOF(java.lang.String), but only indicates ending of an input block.protected void
throwUnexpectedEOF
(String msg) throwWfcException
(String msg, boolean deferErrors) protected String
tokenTypeDesc
(int type) private final void
validateChar
(int value) Method that will verify that expanded Unicode codepoint is a valid XML content character.protected void
verifyLimit
(String type, long maxValue, long currentValue) Methods inherited from class com.ctc.wstx.io.WstxInputData
copyBufferStateFrom, findIllegalNameChar, findIllegalNmtokenChar, getCharDesc, isNameChar, isNameChar, isNameStartChar, isNameStartChar, isSpaceChar
-
Field Details
-
CHAR_CR_LF_OR_NULL
public static final char CHAR_CR_LF_OR_NULL
Last (highest) char code of the three, LF, CR and NULL
-
INT_CR_LF_OR_NULL
public static final int INT_CR_LF_OR_NULL
-
CHAR_FIRST_PURE_TEXT
protected static final char CHAR_FIRST_PURE_TEXT
Character that allows quick check of whether a char can potentially be some kind of markup, WRT input stream processing; has to contain linefeeds, &, < and > (note: > only matters when quoting text, as part of ]]>)
-
CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
protected static final char CHAR_LOWEST_LEGAL_LOCALNAME_CHAR
First character in Unicode (ie one with lowest id) that is legal as part of a local name (all valid name chars minus ':'). Used for doing quick check for local name end; usually name ends in a whitespace or equals sign.
-
VALID_CHAR_COUNT
private static final int VALID_CHAR_COUNT
We will only use validity array for first 256 characters, mostly because after those characters it's easier to do fairly simple block checks.
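The table-plus-block-check idea described for VALID_CHAR_COUNT can be sketched roughly as follows. This is only an illustration, not the Woodstox implementation: the class name, the flag values and the simplified rules (an ASCII-only table, and `Character.isLetter`/`isLetterOrDigit` standing in for the real Unicode block checks) are all assumptions.

```java
// Hypothetical sketch: a 256-entry validity table with a fallback for higher chars.
final class NameCharTableSketch {
    // Assumed flag values; the real constants are private to StreamScanner.
    static final byte INVALID = 0;
    static final byte ALL_VALID = 1;        // valid anywhere in a name
    static final byte VALID_NONFIRST = -1;  // valid, but not as the first char

    static final int TABLE_SIZE = 256;
    static final byte[] VALIDITY = new byte[TABLE_SIZE];

    static {
        // Simplified ASCII subset of the XML name rules, for illustration only.
        for (char c = 'a'; c <= 'z'; c++) VALIDITY[c] = ALL_VALID;
        for (char c = 'A'; c <= 'Z'; c++) VALIDITY[c] = ALL_VALID;
        VALIDITY['_'] = ALL_VALID;
        for (char c = '0'; c <= '9'; c++) VALIDITY[c] = VALID_NONFIRST;
        VALIDITY['-'] = VALID_NONFIRST;
        VALIDITY['.'] = VALID_NONFIRST;
    }

    static boolean isNameStart(char c) {
        if (c < TABLE_SIZE) {
            return VALIDITY[c] == ALL_VALID;   // one array lookup for low chars
        }
        return Character.isLetter(c);          // stand-in for range/block checks
    }

    static boolean isNameChar(char c) {
        if (c < TABLE_SIZE) {
            return VALIDITY[c] != INVALID;
        }
        return Character.isLetterOrDigit(c);   // stand-in for range/block checks
    }

    public static void main(String[] args) {
        System.out.println(isNameStart('a') + " " + isNameStart('1') + " " + isNameChar('1'));
    }
}
```

The point of the split is that the common case (ASCII markup) costs a single array read, while rarer characters take the slower path.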
-
NAME_CHAR_INVALID_B
private static final byte NAME_CHAR_INVALID_B
-
NAME_CHAR_ALL_VALID_B
private static final byte NAME_CHAR_ALL_VALID_B
-
NAME_CHAR_VALID_NONFIRST_B
private static final byte NAME_CHAR_VALID_NONFIRST_B
-
sCharValidity
private static final byte[] sCharValidity
-
VALID_PUBID_CHAR_COUNT
private static final int VALID_PUBID_CHAR_COUNT
Public identifiers only use 7-bit ascii range.
-
sPubidValidity
private static final byte[] sPubidValidity
-
PUBID_CHAR_VALID_B
private static final byte PUBID_CHAR_VALID_B
-
mConfig
Copy of the configuration object passed by the factory. Contains immutable settings for this reader (or in case of DTD parsers, reader that uses it)
-
mCfgNsEnabled
protected final boolean mCfgNsEnabled
If true, Reader is namespace aware, and should do basic checks (usually enforcing limitations on having colons in names)
-
mCfgReplaceEntities
protected boolean mCfgReplaceEntities
Note: left non-final on purpose: sub-class may need to modify the default value after construction.
-
mSymbols
-
mCurrName
Local full name for the event, if it has one (note: element events do NOT use this variable; those names are stored in element stack): target for processing instructions.
Currently used for proc. instr. target, and entity name (at least when current entity reference is null).
Note: this variable is generally not cleared, since it comes from a symbol table, ie. this won't be the only reference.
-
mInput
Currently active input source; contains link to parent (nesting) input sources, if any. -
mRootInput
Top-most input source this reader can use; due to input source chaining, this is not necessarily the root of all input; for example, external DTD subset reader's root input still has original document input as its parent. -
mEntityResolver
Custom resolver used to handle external entities that are to be expanded by this reader (external param/general entity expander) -
mCurrDepth
protected int mCurrDepth
This is the current depth of the input stack (same as what input element stack would return as its depth). It is used to enforce input scope constraints for nesting of elements (for xml reader) and dtd declaration (for dtd reader) with regards to input block (entity expansion) boundaries.
Basically this value is compared to mInputTopDepth, which indicates what was the depth at the point where the currently active input scope/block was started.
-
mInputTopDepth
protected int mInputTopDepth
-
mEntityExpansionCount
protected int mEntityExpansionCount
Number of times a parsed general entity has been expanded; used for (optionally) limiting number of expansion to guard against denial-of-service attacks like "Billion Laughs".
- Since:
- 4.3
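As a rough illustration of how an expansion counter can guard against "Billion Laughs"-style inputs, here is a minimal standalone sketch. The class name, method names and exception type are hypothetical and not part of Woodstox; the only idea taken from the description above is counting expansions and failing once a configured maximum is exceeded.

```java
// Hypothetical sketch of an entity expansion limit; not the Woodstox code.
final class ExpansionGuardSketch {
    private final long maxExpansions;
    private long count;

    ExpansionGuardSketch(long maxExpansions) {
        this.maxExpansions = maxExpansions;
    }

    /** Call once per general entity expansion; fails when the limit is exceeded. */
    void onExpansion(String entityName) {
        if (++count > maxExpansions) {
            throw new IllegalStateException("Maximum entity expansion count ("
                    + maxExpansions + ") exceeded while expanding '" + entityName + "'");
        }
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        ExpansionGuardSketch guard = new ExpansionGuardSketch(2);
        guard.onExpansion("lol1");
        guard.onExpansion("lol2");
        try {
            guard.onExpansion("lol3"); // third expansion exceeds the limit of 2
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because nested entities multiply the number of expansions rather than the size of any one entity, counting expansions catches the attack early, before the expanded text has been materialized.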
-
mNormalizeLFs
protected boolean mNormalizeLFs
Flag that indicates whether linefeeds in the input data are to be normalized or not. Xml specs mandate that the line feeds are only normalized when they are from the external entities (main doc, external general/parsed entities), so normalization has to be suppressed when expanding internal general/parsed entities.
-
mAllowXml11EscapedCharsInXml10
protected boolean mAllowXml11EscapedCharsInXml10
Flag that indicates whether all escaped chars are accepted in XML 1.0.
- Since:
- 5.2
-
mNameBuffer
protected char[] mNameBuffer
Temporary buffer used if local name can not be just directly constructed from input buffer (name is on a boundary or such).
-
mTokenInputTotal
protected long mTokenInputTotal
Total number of characters read before start of current token. Since big (gigabyte-sized) inputs are possible, this needs to be a long, unlike pointers and sizes related to in-memory buffers.
-
mTokenInputRow
protected int mTokenInputRow
Input row on which current token starts, 1-based
-
mTokenInputCol
protected int mTokenInputCol
Column on input row that current token starts; 0-based (although in the end it'll be converted to 1-based)
-
mDocInputEncoding
Input stream encoding, if known (passed in, or determined by auto-detection); null if not. -
mDocXmlEncoding
Character encoding from xml declaration, if any; null if no declaration, or it didn't specify encoding. -
mDocXmlVersion
protected int mDocXmlVersion
XML version as declared by the document; one of constants from XmlConsts (like XmlConsts.XML_V_10).
-
mCachedEntities
Cache of internal character entities.
-
mCfgTreatCharRefsAsEntities
protected boolean mCfgTreatCharRefsAsEntities
Flag for whether or not character references should be treated as entities
-
mCurrEntity
Entity reference stream currently points to.
-
-
Constructor Details
-
StreamScanner
Constructor used when creating a complete new (main-level) reader that does not share its input buffers or state with another reader.
-
-
Method Details
-
getConfig
- Since:
- 5.2
-
getLastCharLocation
Method that returns location of the last character returned by this reader; that is, location "one less" than the currently pointed to location. -
getSource
- Throws:
IOException
-
getSystemId
-
getLocation
Returns location of last properly parsed token; as per StAX specs, apparently needs to be the end of current event, which is the same as the start of the following event (or EOF if that's next).
- Specified by:
getLocation
in interface InputProblemReporter
-
getStartLocation
public org.codehaus.stax2.XMLStreamLocation2 getStartLocation() -
getCurrentLocation
public org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation() -
throwWfcException
- Throws:
WstxException
-
throwParseError
- Specified by:
throwParseError
in interface InputProblemReporter
- Throws:
XMLStreamException
-
throwParseError
Throws generic parse error with specified message and current parsing location.
Note: public access only because core code in other packages needs to access it.
- Specified by:
throwParseError
in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Throws:
XMLStreamException
-
reportProblem
public void reportProblem(Location loc, String probType, String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportProblem
in interface InputProblemReporter
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, String probType, String msg, Location loc) throws XMLStreamException
- Throws:
XMLStreamException
-
_reportProblem
protected void _reportProblem(XMLReporter rep, org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(org.codehaus.stax2.validation.XMLValidationProblem prob) throws XMLStreamException
Note: this is the base implementation used for implementing ValidationContext
- Specified by:
reportValidationProblem
in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
- Throws:
XMLStreamException
-
reportValidationProblem
- Specified by:
reportValidationProblem
in interface InputProblemReporter
- Throws:
XMLStreamException
-
reportValidationProblem
- Throws:
XMLStreamException
-
reportValidationProblem
public void reportValidationProblem(String format, Object arg, Object arg2) throws XMLStreamException
- Specified by:
reportValidationProblem
in interface InputProblemReporter
- Throws:
XMLStreamException
-
constructWfcException
-
constructFromIOE
Construct and return an XMLStreamException to throw as a result of a failed Typed Access operation (but one not caused by a Well-Formedness Constraint or Validation Constraint problem)
-
constructNullCharException
-
throwUnexpectedChar
- Throws:
WstxException
-
throwNullChar
- Throws:
WstxException
-
throwInvalidSpace
- Throws:
WstxException
-
throwInvalidSpace
- Throws:
WstxException
-
throwUnexpectedEOF
- Throws:
WstxException
-
throwUnexpectedEOB
Similar to throwUnexpectedEOF(java.lang.String), but only indicates ending of an input block. Used when reading a token that can not span input block boundaries (ie. can not continue past end of an entity expansion).
- Throws:
WstxException
-
throwFromIOE
- Throws:
WstxException
-
throwFromStrE
- Throws:
WstxException
-
throwLazyError
Method called to report an error, when caller's signature only allows runtime exceptions to be thrown. -
tokenTypeDesc
-
getCurrentInput
Returns current input source this source uses.
Note: public only because some implementations are in a different package.
-
inputInBuffer
protected final int inputInBuffer() -
getNext
- Throws:
XMLStreamException
-
peekNext
Similar to getNext(), but does not advance pointer in input buffer.
Note: this method only peeks within current input source; it does not close it and check nested input source (if any). This is necessary when checking keywords, since they can never cross input block boundary.
- Throws:
XMLStreamException
-
getNextChar
- Throws:
XMLStreamException
-
getNextCharFromCurrent
Similar to getNextChar(java.lang.String), but will not read more characters from parent input source(s) if the current input source doesn't have more content. This is often needed to prevent "runaway" content, such as comments that start in an entity but do not have matching close marker inside entity; XML specification specifically states such markup is not legal.
- Throws:
XMLStreamException
-
getNextAfterWS
Method that will skip through zero or more white space characters, and return either the character following white space, or -1 to indicate EOF (end of the outermost input source).
- Throws:
XMLStreamException
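The return contract described above (next non-whitespace character, or -1 at end of input) can be illustrated with a small standalone sketch. It works on a single in-memory buffer only, and the class and method names are hypothetical; the real method also has to deal with nested input sources and line tracking, which is left out here.

```java
// Hypothetical sketch of "skip whitespace, return next char or -1 on EOF".
final class WsSkipperSketch {
    private final char[] buf;
    private int ptr;

    WsSkipperSketch(String input) {
        this.buf = input.toCharArray();
    }

    /** Returns the first non-whitespace char as an int, or -1 at end of input. */
    int nextAfterWs() {
        while (ptr < buf.length) {
            char c = buf[ptr++];
            // XML whitespace: space, tab, LF, CR
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return c;
            }
        }
        return -1; // int return type leaves room for an out-of-band EOF marker
    }

    public static void main(String[] args) {
        WsSkipperSketch s = new WsSkipperSketch("  \n<a");
        System.out.println((char) s.nextAfterWs());
    }
}
```

Returning an int rather than a char is what makes the -1 marker possible without reserving a valid character value.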
-
getNextCharAfterWS
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
- Throws:
XMLStreamException
-
getNextInCurrAfterWS
- Throws:
XMLStreamException
-
skipCRLF
Method called when a CR has been spotted in input; checks if next char is LF, and if so, skips it. Note that next character has to come from the current input source, to qualify; it can never come from another (nested) input source.
- Returns:
- True, if passed in char is '\r' and next one is '\n'.
- Throws:
XMLStreamException
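A minimal sketch of the CR/LF check in the context of linefeed normalization, assuming the whole input is one in-memory string (so the "same input source" condition is trivially met). The class and method names are hypothetical and not part of Woodstox.

```java
// Hypothetical sketch: normalize CR LF and lone CR to a single LF.
final class LfNormalizerSketch {
    static String normalize(String in) {
        StringBuilder sb = new StringBuilder(in.length());
        final int len = in.length();
        int i = 0;
        while (i < len) {
            char c = in.charAt(i++);
            if (c == '\r') {
                // Same idea as the skipCRLF check: if an LF directly follows, consume it too
                if (i < len && in.charAt(i) == '\n') {
                    i++;
                }
                c = '\n';
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String out = normalize("a\r\nb\rc\nd");
        System.out.println(out.replace("\n", "\\n"));
    }
}
```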
-
markLF
protected final void markLF() -
markLF
protected final void markLF(int inputPtr) -
pushback
protected final void pushback()
Method to push back last character read; can only be called once, that is, no more than one char can be guaranteed to be successfully returned.
-
initInputSource
protected void initInputSource(WstxInputSource newInput, boolean isExt, String entityId) throws XMLStreamException
Method called when an entity has been expanded (new input source has been created). Needs to initialize location information and change active input source.
- Parameters:
entityId - Name of the entity being expanded
- Throws:
XMLStreamException
-
loadMore
Method that will try to read one or more characters from currently open input sources; closing input sources if necessary.
- Returns:
- true if reading succeeded (or may succeed), false if we reached EOF.
- Throws:
XMLStreamException
-
loadMore
- Throws:
XMLStreamException
-
loadMoreFromCurrent
- Throws:
XMLStreamException
-
loadMoreFromCurrent
- Throws:
XMLStreamException
-
ensureInput
Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call loadMore(). It can only be called when input comes from main-level buffer; further, call can shift content in input buffer, so caller has to flush any data still pending. In short, caller has to know exactly what it's doing. :-)
Note: method does not check for any other input sources than the current one -- if current source can not fulfill the request, a failure is indicated.
- Returns:
- true if there's now enough data; false if not (EOF)
- Throws:
XMLStreamException
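The "shift content, then read more" behaviour described for ensureInput can be sketched as below. This is an assumption-laden illustration, not the Woodstox code: the class, its fields and the use of `UncheckedIOException` are hypothetical, and it ignores location bookkeeping. It does show why the caller must flush pending data first: offsets into the buffer become stale after the shift.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of a buffer that guarantees N consecutive chars.
final class EnsureInputSketch {
    private final Reader in;
    final char[] buf;
    int ptr; // index of next unread char
    int end; // one past the last valid char

    EnsureInputSketch(Reader in, int bufSize) {
        this.in = in;
        this.buf = new char[bufSize];
    }

    /** Returns true if at least minAmount chars are now available from ptr; false on EOF. */
    boolean ensureInput(int minAmount) {
        int available = end - ptr;
        if (available >= minAmount) {
            return true;
        }
        if (minAmount > buf.length) {
            return false; // request can never fit in this buffer
        }
        // Shift unread content to the start; any offsets the caller held are now stale
        System.arraycopy(buf, ptr, buf, 0, available);
        ptr = 0;
        end = available;
        try {
            while (end < minAmount) {
                int n = in.read(buf, end, buf.length - end);
                if (n < 0) {
                    return false; // EOF before enough data arrived
                }
                end += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        EnsureInputSketch input = new EnsureInputSketch(new StringReader("abcdefgh"), 4);
        System.out.println(input.ensureInput(3));
    }
}
```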
-
closeAllInput
- Throws:
XMLStreamException
-
throwNullParent
- Parameters:
curr
- Input source currently in use
-
resolveSimpleEntity
Method that tries to resolve a character entity, or (if caller so specifies), a pre-defined internal entity (lt, gt, amp, apos, quot). It will succeed iff:
- Entity in question is a simple character entity (either one of 5 pre-defined ones, or using decimal/hex notation), AND
- Entity fits completely inside current input buffer.
Note: On entry we are guaranteed there are at least 3 more characters in this buffer; otherwise we shouldn't be called.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot); if false, will only check actual character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
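The contract above (resolve only if the reference is a simple character entity that sits fully inside the buffer, otherwise return the null char and leave the pointer alone) might be sketched like this. Everything here is a hypothetical simplification: the class and field names are invented, malformed references simply yield 0 where a real parser would report an error, and no XML character validity check is done on the result.

```java
// Hypothetical sketch of in-buffer resolution of character and pre-defined entities.
final class SimpleEntitySketch {
    final char[] buf;
    int ptr; // points at the char right after '&'

    SimpleEntitySketch(String input, int ptr) {
        this.buf = input.toCharArray();
        this.ptr = ptr;
    }

    /** Returns the resolved code point, or 0 if it cannot be resolved from this buffer. */
    int resolveSimpleEntity(boolean checkStd) {
        int semi = -1;
        for (int i = ptr; i < buf.length; i++) {
            if (buf[i] == ';') { semi = i; break; }
        }
        if (semi <= ptr) {
            return 0; // no terminating ';' in this buffer, or empty reference
        }
        int value = 0;
        if (buf[ptr] == '#') {
            int i = ptr + 1;
            int radix = 10;
            if (i < semi && buf[i] == 'x') { radix = 16; i++; }
            if (i == semi) return 0; // no digits at all
            for (; i < semi; i++) {
                int d = Character.digit(buf[i], radix);
                if (d < 0) return 0;
                value = value * radix + d;
                if (value > 0x10FFFF) return 0; // beyond Unicode range
            }
        } else if (checkStd) {
            switch (new String(buf, ptr, semi - ptr)) {
                case "lt": value = '<'; break;
                case "gt": value = '>'; break;
                case "amp": value = '&'; break;
                case "apos": value = '\''; break;
                case "quot": value = '"'; break;
                default: return 0; // general entity: not handled here
            }
        }
        if (value == 0) return 0;
        ptr = semi + 1; // consume the reference only on success
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new SimpleEntitySketch("&#x3C;", 1).resolveSimpleEntity(false));
    }
}
```

Returning 0 both for "not a character entity" and "does not fit in buffer" lets the caller fall back to a slower, more general path in either case.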
-
resolveCharOnlyEntity
Method called to resolve character entities, and only character entities (except that pre-defined char entities -- amp, apos, lt, gt, quot -- MAY be "char entities" in this sense, depending on arguments). Otherwise it is to return the null char; if so, the input pointer will point to the same point as when method entered (char after ampersand), plus the ampersand itself is guaranteed to be in the input buffer (so caller can just push it back if necessary).
Most often this method is called when reader is not to expand non-char entities automatically, but to return them as separate events.
Main complication here is that we need to do 5-char lookahead. This is problematic if chars are on input buffer boundary. This is ok for the root level input buffer, but not for some nested buffers. However, according to XML specs, such split entities are actually illegal... so we can throw an exception in those cases.
- Parameters:
checkStd - If true, will check pre-defined internal entities (gt, lt, amp, apos, quot) as character entities; if false, will only check actual 'real' character entities.
- Returns:
- (Valid) character value, if entity is a character reference, and could be resolved from current input buffer (does not span buffer boundary); null char (code 0) if not (either non-char entity, or spans input buffer boundary).
- Throws:
XMLStreamException
-
resolveNonCharEntity
Reverse of resolveCharOnlyEntity(boolean); will only resolve entity if it is NOT a character entity (or pre-defined 'generic' entity; amp, apos, lt, gt or quot). Only used in cases where entities are to be separately returned unexpanded (in non-entity-replacing mode); which means it's never called from dtd handler.
- Throws:
XMLStreamException
-
fullyResolveEntity
Method that does full resolution of an entity reference, be it character entity, internal entity or external entity, including updating of input buffers, and depending on whether result is a character entity (or one of 5 pre-defined entities), returns char in question, or null character (code 0) to indicate it had to change input source.
- Parameters:
allowExt - If true, is allowed to expand external entities (expanding text); if false, is not (expanding attribute value).
- Returns:
- Either single-character replacement (which is NOT to be reparsed), or null char (0) to indicate expansion is done via input source.
- Throws:
XMLStreamException
-
getIntEntity
Returns an entity (possibly from cache) for the argument character using the encoded representation in mInputBuffer[entityStartPos ... mInputPtr-1]. -
expandEntity
protected EntityDecl expandEntity(String id, boolean allowExt, Object extraArg) throws XMLStreamException
Helper method that will try to expand a parsed entity (parameter or generic entity).
Note: called by sub-classes (dtd parser), needs to be protected.
- Parameters:
id - Name of the entity being expanded
allowExt - Whether external entities can be expanded or not; if not, and the entity to expand would be external one, an exception will be thrown
- Throws:
XMLStreamException
-
expandEntity
note: defined as private for documentation, ie. it's just called from within this class (not sub-classes), from one specific method (see above)
- Parameters:
ed - Entity to be expanded
allowExt - Whether external entities are allowed or not.
- Throws:
XMLStreamException
-
expandUnresolvedEntity
note: only called from the local expandEntity() method
- Throws:
XMLStreamException
-
findEntity
Abstract method for sub-classes to implement, for finding a declared general or parsed entity.
- Parameters:
id - Identifier of the entity to find
arg - Optional argument passed from caller; needed by DTD reader.
- Throws:
XMLStreamException
-
handleUndeclaredEntity
This method gets called if a declaration for an entity was not found in entity expanding mode (enabled by default for xml reader, always enabled for dtd reader).
- Throws:
XMLStreamException
-
handleIncompleteEntityProblem
protected abstract void handleIncompleteEntityProblem(WstxInputSource closing) throws XMLStreamException
- Throws:
XMLStreamException
-
parseLocalName
Method that will parse a name token (roughly equivalent to the XML specs, although a bit more lenient for more efficient handling); either a URI prefix or a local name.
Much of the complexity in this method comes from trying to avoid any character copies. In the optimal case the algorithm is fairly simple. However, that only works if all data is already in the input buffer; if not, a copy has to be made halfway through parsing, and that complicates things.
One thing to note is that the String returned has been canonicalized and (if necessary) added to the symbol table. It can thus be compared against other such (usually id) Strings with the simple equality operator.
- Parameters:
c
- First character of the name; not yet checked for validity
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
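The canonicalization remark above can be illustrated with a minimal sketch. The class `NameTableSketch` and its `canonicalize` method are hypothetical stand-ins written for this illustration; they are not Woodstox's symbol table and make no claim about how it is implemented. The sketch only shows why handing out one shared instance per distinct name lets callers compare names with `==`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a symbol table; not the library's actual class. */
final class NameTableSketch {
    private final Map<String, String> symbols = new HashMap<>();

    /** Returns the single shared String instance for the given characters. */
    String canonicalize(char[] buf, int start, int len) {
        String key = new String(buf, start, len);
        // putIfAbsent returns the previously stored instance, or null if new.
        String existing = symbols.putIfAbsent(key, key);
        return (existing != null) ? existing : key;
    }

    public static void main(String[] args) {
        NameTableSketch table = new NameTableSketch();
        char[] input = "item item".toCharArray();
        String first = table.canonicalize(input, 0, 4);
        String second = table.canonicalize(input, 5, 4);
        // Both names resolve to the same instance, so identity comparison works.
        System.out.println(first == second);
    }
}
```

The identity comparison is only safe when both Strings came out of the same table; a String obtained elsewhere still needs `equals`.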
-
parseLocalName2
Second part of name token parsing; called when the name can continue past the end of the input buffer (so only part was read before calling this method to read the rest).
Note that this isn't heavily optimized, on the assumption that it's not called very often.
- Throws:
XMLStreamException
-
parseFullName
Method that will parse a 'full' name token; what full means depends on whether the reader is namespace aware or not. If it is, full name means a local name with no namespace prefix (PI target, entity/notation name); if not, the name can contain an arbitrary number of colons. Note that element and attribute names are NOT parsed here, so actual namespace prefix separation can be handled properly there.
Similar to parseLocalName(char), much of the complexity stems from trying to avoid copying name characters from the input buffer.
Note that the returned String will be canonicalized, similar to parseLocalName(char), but without separating prefix/local name.
- Returns:
- Canonicalized name String (which may have length 0, if EOF or non-name-start char encountered)
- Throws:
XMLStreamException
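The colon rule described above can be sketched as a small predicate. `FullNameSketch` and `isAcceptableFullName` are hypothetical names made up for this illustration, and the sketch assumes the name has already passed the usual name-character checks; it only models the namespace-aware versus non-namespace-aware distinction, not the actual parsing code.

```java
/** Hypothetical helper illustrating the colon rule; not part of the library. */
final class FullNameSketch {
    /**
     * Returns true if an already lexically valid name is acceptable as a
     * 'full' name: no colons in namespace-aware mode, any number otherwise.
     */
    static boolean isAcceptableFullName(String name, boolean namespaceAware) {
        if (name.isEmpty()) {
            return false; // an empty result signals EOF or a bad start char
        }
        if (namespaceAware) {
            // PI targets and entity/notation names must not be qualified.
            return name.indexOf(':') < 0;
        }
        // Without namespace awareness a colon is just another name char.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableFullName("xml-stylesheet", true));
        System.out.println(isAcceptableFullName("a:b:c", true));
        System.out.println(isAcceptableFullName("a:b:c", false));
    }
}
```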
-
parseFullName
- Throws:
XMLStreamException
-
parseFullName2
- Throws:
XMLStreamException
-
parseFNameForError
Method called to read in a full name, including an unlimited number of namespace separators (':'), for the purpose of displaying the name in an error message. Won't do any further validations, and parsing is not optimized: the main need is just to get more meaningful error messages.
- Throws:
XMLStreamException
-
parseEntityName
- Throws:
XMLStreamException
-
skipFullName
Note: does not check for number of colons, amongst other things. Main idea is to skip through what superficially seems like a valid id, nothing more. This is only done when really skipping through something we do not care about at all: not even whether names/ids would be valid (for example, when ignoring internal DTD subset).
- Returns:
- Length of skipped name.
- Throws:
XMLStreamException
-
parseSystemId
protected final String parseSystemId(char quoteChar, boolean convertLFs, String errorMsg) throws XMLStreamException
Simple parsing method that parses system ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
NOTE: returned String is not canonicalized, on assumption that external ids may be longish, and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
-
parsePublicId
Simple parsing method that parses public ids, which are generally used in entities (from DOCTYPE declaration to internal/external subsets).
As per the XML specification, the contents are normalized.
NOTE: returned String is not canonicalized, on assumption that external ids may be longish, and are not shared all that often, as they are generally just used for resolving paths, if anything.
Also note that this method is not heavily optimized, as it's not likely to be a bottleneck for parsing.
- Throws:
XMLStreamException
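The normalization mentioned above refers to the XML 1.0 rule for public identifiers: runs of white space collapse to a single space, and leading and trailing white space is removed. A minimal sketch of that rule follows; `PublicIdSketch.normalize` is a hypothetical helper written for this illustration and works on a complete String, whereas the real method reads from the input buffer.

```java
/** Hypothetical illustration of public id normalization; not the library's code. */
final class PublicIdSketch {
    /** Collapses white space runs to one space and trims both ends. */
    static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\r' || c == '\n') {
                // Remember the gap, but only once real content has been seen.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        // A trailing gap is never flushed, which trims the end.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  -//W3C//DTD \n XHTML 1.0//EN "));
    }
}
```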
-
parseUntil
protected final void parseUntil(TextBuffer tb, char endChar, boolean convertLFs, String errorMsg) throws XMLStreamException
- Throws:
XMLStreamException
-
resolveCharEnt
- Throws:
XMLStreamException
-
validateChar
Method that will verify that the expanded Unicode code point is a valid XML content character.
- Throws:
XMLStreamException
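As a rough illustration of what such a check involves, the sketch below tests a code point against the `Char` production of XML 1.0. `XmlCharSketch.isValidXml10Char` is a hypothetical helper; the actual method may apply different rules (for example for XML 1.1 documents or for relaxed handling of escaped characters), and it reports problems by throwing rather than by returning a boolean.

```java
/** Hypothetical check against the XML 1.0 Char production; not the library's code. */
final class XmlCharSketch {
    /**
     * Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
     *        | [#x10000-#x10FFFF]
     */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)       // excludes the surrogate block
            || (cp >= 0xE000 && cp <= 0xFFFD)     // excludes #xFFFE and #xFFFF
            || (cp >= 0x10000 && cp <= 0x10FFFF); // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isValidXml10Char('A'));
        System.out.println(isValidXml10Char(0x0));
        System.out.println(isValidXml10Char(0x1F600));
    }
}
```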
-
getNameBuffer
protected final char[] getNameBuffer(int minSize)
-
expandBy50Pct
protected final char[] expandBy50Pct(char[] buf)
-
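This method carries no description here, so the following is only a guess at the general technique its name suggests: growing a character buffer by half its current length while keeping the existing contents. `BufferGrowthSketch.growByHalf` is a hypothetical helper, and the minimum growth of one element is an assumption added so that an empty array still grows.

```java
import java.util.Arrays;

/** Hypothetical sketch of 50% buffer growth; the real method may differ. */
final class BufferGrowthSketch {
    /** Returns a copy of buf that is about 1.5 times as long. */
    static char[] growByHalf(char[] buf) {
        int len = buf.length;
        // Assumption: always grow by at least one element.
        int newLen = len + Math.max(1, len >> 1);
        return Arrays.copyOf(buf, newLen);
    }

    public static void main(String[] args) {
        char[] name = "abcdefgh".toCharArray();
        char[] bigger = growByHalf(name);
        System.out.println(bigger.length);
    }
}
```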
throwNsColonException
Method called to throw an exception indicating that a name that should not be namespace-qualified (PI target, entity/notation name) is one, and the reader is namespace aware.
- Throws:
XMLStreamException
-
throwRecursionError
- Throws:
XMLStreamException
-
reportUnicodeOverflow
- Throws:
XMLStreamException
-
reportIllegalChar
- Throws:
XMLStreamException
-
verifyLimit
- Throws:
XMLStreamException
-
constructLimitViolation
protected XMLStreamException constructLimitViolation(String type, long limit) throws XMLStreamException
- Throws:
XMLStreamException
-