Class FullDTDReader

All Implemented Interfaces:
InputConfigFlags, ParsingErrorMsgs, InputProblemReporter

public class FullDTDReader extends MinimalDTDReader
Reader that reads in DTD information from internal or external subset.

There are two main modes for DTDReader, depending on whether it is parsing the internal or the external subset. Parsing the internal subset is somewhat simpler, since no dependency checking is needed. For the external subset, handling of parameter entities is a bit more complicated, as care has to be taken to distinguish between PEs defined in the internal subset and ones defined in the external subset itself. This determines the cacheability of external subsets.
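
The dependency tracking described above can be pictured with a small stand-alone sketch. It is not Woodstox code: the class `ExtSubsetRefTracker` and its methods are hypothetical, and the reuse rule in `isReusableWith` is an assumption derived from the field descriptions on this page (mRefdPEs, mUsesPredefdEntities), not a confirmed description of the library's behavior.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not a Woodstox class. */
final class ExtSubsetRefTracker {
    private final Set<String> predefinedPEs;          // names declared in the internal subset
    private Set<String> referencedPEs = new HashSet<>();
    private boolean usesPredefined = false;

    ExtSubsetRefTracker(Set<String> predefinedPEs) {
        this.predefinedPEs = predefinedPEs;
    }

    /** Called for each %name; reference found while parsing the external subset. */
    void reference(String name) {
        if (predefinedPEs.contains(name)) {
            usesPredefined = true;
            referencedPEs = null;                     // no longer maintained once tainted
        } else if (!usesPredefined) {
            referencedPEs.add(name);
        }
    }

    /** An external subset that depends on internal-subset PEs cannot be cached. */
    boolean isCacheable() {
        return !usesPredefined;
    }

    /** Assumption: reusable only if the other internal subset defines nothing we referenced. */
    boolean isReusableWith(Set<String> otherInternalPEs) {
        return isCacheable() && Collections.disjoint(referencedPEs, otherInternalPEs);
    }
}
```

In this sketch, an external subset that only references its own PEs stays cacheable, and a single reference to a PE from the internal subset permanently marks it as non-cacheable.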

Reader also implements simple stand-alone functionality for flattening DTD files (expanding all references to their eventual textual form); this is sometimes useful when optimizing modularized DTDs (which are more maintainable) into single monolithic DTDs (which in general can be more performant).
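
As a rough illustration of what "flattening" means, the sketch below expands parameter entity references in a piece of DTD text into their final textual form. It is a simplified stand-in, not the reader's implementation: `PeFlattenSketch` is a hypothetical class, the name pattern is a simplified ASCII approximation of XML names, and the sketch ignores the XML rule that replacement text included in the DTD is padded with a leading and trailing space.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of PE flattening; not a Woodstox class. */
final class PeFlattenSketch {
    // Simplified (ASCII-only) approximation of a parameter entity reference: %name;
    private static final Pattern PE_REF = Pattern.compile("%([A-Za-z_:][A-Za-z0-9_.:-]*);");

    static String flatten(String text, Map<String, String> paramEntities) {
        return flatten(text, paramEntities, new HashSet<>());
    }

    private static String flatten(String text, Map<String, String> pes, Set<String> active) {
        Matcher m = PE_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = pes.get(name);
            String replacement;
            if (value == null) {
                replacement = m.group();              // unknown entity: leave reference as is
            } else if (!active.add(name)) {
                throw new IllegalStateException("Recursive parameter entity: " + name);
            } else {
                replacement = flatten(value, pes, active);  // expand nested references first
                active.remove(name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pes = new HashMap<>();
        pes.put("inline", "b | i");
        pes.put("content", "(#PCDATA | %inline;)*");
        // prints: <!ELEMENT p (#PCDATA | b | i)*>
        System.out.println(flatten("<!ELEMENT p %content;>", pes));
    }
}
```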

  • Field Details

    • INTERN_SHARED_NAMES

      static final boolean INTERN_SHARED_NAMES
      Flag that can be changed to enable or disable interning of shared names; shared names are used for enumerated values to reduce memory usage.
    • ENTITY_EXP_GE

      static final Boolean ENTITY_EXP_GE
    • ENTITY_EXP_PE

      static final Boolean ENTITY_EXP_PE
    • mConfigFlags

      final int mConfigFlags
    • mCfgSupportDTDPP

      final boolean mCfgSupportDTDPP
    • mCfgFullyValidating

      final boolean mCfgFullyValidating
      This flag indicates whether we should build a 'real' validator (true, the usual case), or a simpler pseudo-validator that can do all non-validation tasks that are based on DTD info (entity expansion, notation references, default attribute values). The latter is used in non-validating mode.

    • mParamEntities

      HashMap<String,EntityDecl> mParamEntities
      Set of parameter entities defined so far in the currently parsed subset. Note: the first definition sticks, entities can not be redefined.

      Keys are entity name Strings; values are instances of EntityDecl
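
      The "first definition sticks" rule can be sketched with a plain map, as below. This is only an illustration: `FirstDefinitionWins` is a hypothetical class, and plain Strings stand in for EntityDecl values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; Strings stand in for EntityDecl instances. */
final class FirstDefinitionWins {
    private final Map<String, String> entities = new HashMap<>();

    /** Returns true if the declaration was stored, false if an earlier one already exists. */
    boolean declare(String name, String replacementText) {
        // putIfAbsent leaves an existing mapping untouched, so redeclarations are ignored
        return entities.putIfAbsent(name, replacementText) == null;
    }

    String lookup(String name) {
        return entities.get(name);
    }

    public static void main(String[] args) {
        FirstDefinitionWins pes = new FirstDefinitionWins();
        pes.declare("content", "(a | b)*");
        pes.declare("content", "(c)*");               // ignored: first definition sticks
        System.out.println(pes.lookup("content"));    // prints: (a | b)*
    }
}
```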

    • mPredefdPEs

      final HashMap<String,EntityDecl> mPredefdPEs
      Set of parameter entities already defined for the subset being parsed; namely, PEs defined in the internal subset passed when parsing matching external subset. Null when parsing internal subset.
    • mRefdPEs

      Set<String> mRefdPEs
      Set of parameter entities (ids) that have been referenced by this DTD; only maintained for external subsets, and only as long as no pre-defined PE has been referenced.
    • mGeneralEntities

      HashMap<String,EntityDecl> mGeneralEntities
      Set of generic entities defined so far in this subset. As with parameter entities, the first definition sticks.

      Keys are entity name Strings; values are instances of EntityDecl

      Note: this Map only contains entities declared and defined in the subset being parsed; no previously defined values are passed.

    • mPredefdGEs

      final HashMap<String,EntityDecl> mPredefdGEs
      Set of general entities already defined for the subset being parsed; namely, GEs defined in the internal subset passed when parsing matching external subset. Null when parsing internal subset. Such entities are only needed directly for one purpose: to be expanded when reading attribute default value definitions.
    • mRefdGEs

      Set<String> mRefdGEs
      Set of general entities (ids) that have been referenced by this DTD; only maintained for external subsets, and only as long as no pre-defined GEs have been referenced.
    • mUsesPredefdEntities

      boolean mUsesPredefdEntities
      Flag used to keep track of whether current (external) subset has referenced at least one PE that was pre-defined.
    • mNotations

      Set of notations defined so far. Since it's illegal to (try to) redefine notations, there's no specific precedence.

      Keys are notation name Strings; values are instances of NotationDecl

    • mPredefdNotations

      final HashMap<String,NotationDeclaration> mPredefdNotations
      Notations already parsed before current subset; that is, notations from the internal subset if we are currently parsing matching external subset.
    • mUsesPredefdNotations

      boolean mUsesPredefdNotations
      Flag used to keep track of whether current (external) subset has referenced at least one notation that was defined in internal subset. If so, the external subset can not be cached.
    • mNotationForwardRefs

      HashMap<String,Location> mNotationForwardRefs
      Finally, we need to keep track of Notation references that were made prior to declaration. This is needed to ensure that all references can be properly resolved.
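
      One way to picture this bookkeeping is the sketch below: references to not-yet-declared notations are remembered, cleared when the declaration arrives, and whatever remains at the end is unresolved. `NotationRefTracker` is a hypothetical class, and a String stands in for the Location value used by the real field.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of forward-reference tracking; not a Woodstox class. */
final class NotationRefTracker {
    private final Set<String> declared = new HashSet<>();
    // notation name -> where it was first referenced (a String stands in for Location)
    private final Map<String, String> forwardRefs = new LinkedHashMap<>();

    /** Called when a NOTATION attribute or unparsed entity refers to a notation. */
    void reference(String name, String location) {
        if (!declared.contains(name)) {
            forwardRefs.putIfAbsent(name, location);  // keep the first location only
        }
    }

    /** Called when the notation declaration itself has been parsed. */
    void declare(String name) {
        declared.add(name);
        forwardRefs.remove(name);                     // the forward reference is now resolved
    }

    /** References still unresolved; to be reported once the whole subset has been read. */
    Map<String, String> unresolved() {
        return forwardRefs;
    }
}
```
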
    • mSharedNames

      Map used to share PrefixedName instances, to reduce memory usage of (qualified) element and attribute names
    • mElements

      Contains definition of elements and matching content specifications. Also contains temporary placeholders for elements that are indirectly "created" by ATTLIST declarations that precede actual declaration for the ELEMENT referred to.
    • mSharedEnumValues

      HashMap<String,String> mSharedEnumValues
      Map used for sharing legal enumeration values; used since oftentimes same enumeration values are used with multiple attributes
    • mCurrAttrDefault

      DefaultAttrValue mCurrAttrDefault
      This is the attribute default value that is currently being parsed. Needs to be a global member due to the way entity expansion failures are reported: problems need to be attached to this object, even though the default value itself will not be passed through.
    • mExpandingPE

      boolean mExpandingPE
      Flag that indicates if the currently expanding (or last expanded) entity is a Parameter Entity or General Entity.
    • mValueBuffer

      TextBuffer mValueBuffer
      Text buffer used for constructing expansion value of the internal entities, and for default attribute values. Lazily constructed when needed, reused.
    • mIncludeCount

      int mIncludeCount
      Nesting count for conditionally included sections; 0 means that we are not inside such a section. Note that conditionally ignored sections are handled separately.
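
      The counter semantics can be sketched as follows. This is a deliberately simplified illustration, not the reader's parsing logic: `IncludeNesting` is a hypothetical class, and it only recognizes the literal text `<![INCLUDE[` (no optional white space, no parameter entity references, no IGNORE sections).

```java
/** Hypothetical illustration of a nesting counter for INCLUDE sections. */
final class IncludeNesting {
    private static final String OPEN = "<![INCLUDE[";
    private static final String CLOSE = "]]>";

    /** Returns the maximum nesting depth; throws if the sections are unbalanced. */
    static int maxDepth(String dtd) {
        int depth = 0;                                // 0 means: not inside an INCLUDE section
        int max = 0;
        int i = 0;
        while (i < dtd.length()) {
            if (dtd.startsWith(OPEN, i)) {
                depth++;
                max = Math.max(max, depth);
                i += OPEN.length();
            } else if (dtd.startsWith(CLOSE, i)) {
                if (depth == 0) {
                    throw new IllegalArgumentException("Unbalanced ']]>' at offset " + i);
                }
                depth--;
                i += CLOSE.length();
            } else {
                i++;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed INCLUDE section");
        }
        return max;
    }
}
```
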
    • mCheckForbiddenPEs

      boolean mCheckForbiddenPEs
      This flag is used to catch uses of PEs in the internal subset within declarations (full declarations are ok, but not other types)
    • mCurrDeclaration

      String mCurrDeclaration
      Keyword of the declaration being currently parsed (if any). Can be used for error reporting purposes.
    • mAnyDTDppFeatures

      boolean mAnyDTDppFeatures
      Flag that indicates if any DTD++ features have been encountered (in DTD++-supporting mode).
    • mDefaultNsURI

      String mDefaultNsURI
      Currently active default namespace URI.
    • mNamespaces

      HashMap<String,String> mNamespaces
      Prefix-to-NsURI mappings for this DTD, if any: lazily constructed when needed
    • mFlattenWriter

      DTDWriter mFlattenWriter
    • mEventListener

      final DTDEventListener mEventListener
    • mTextBuffer

      transient TextBuffer mTextBuffer
    • mAccessKey

      final PrefixedName mAccessKey
  • Constructor Details

    • FullDTDReader

      private FullDTDReader(WstxInputSource input, ReaderConfig cfg, boolean constructFully, int xmlVersion)
      Constructor used for reading/skipping internal subset.
    • FullDTDReader

      private FullDTDReader(WstxInputSource input, ReaderConfig cfg, DTDSubset intSubset, boolean constructFully, int xmlVersion)
      Constructor used for reading external subset.
    • FullDTDReader

      private FullDTDReader(WstxInputSource input, ReaderConfig cfg, boolean isExt, DTDSubset intSubset, boolean constructFully, int xmlVersion)
      Common initialization part of int/ext subset constructors.
  • Method Details

    • readInternalSubset

      public static DTDSubset readInternalSubset(WstxInputData srcData, WstxInputSource input, ReaderConfig cfg, boolean constructFully, int xmlVersion) throws XMLStreamException
      Method called to read in the internal subset definition.
      Throws:
      XMLStreamException
    • readExternalSubset

      public static DTDSubset readExternalSubset(WstxInputSource src, ReaderConfig cfg, DTDSubset intSubset, boolean constructFully, int xmlVersion) throws XMLStreamException
      Method called to read in the external subset definition.
      Throws:
      XMLStreamException
    • flattenExternalSubset

      public static DTDSubset flattenExternalSubset(WstxInputSource src, Writer flattenWriter, boolean inclComments, boolean inclConditionals, boolean inclPEs) throws IOException, XMLStreamException
      Method that will parse, process and output contents of an external DTD subset. It will do processing similar to readExternalSubset(WstxInputSource, ReaderConfig, DTDSubset, boolean, int), but additionally will copy its processed ("flattened") input to the specified writer.
      Parameters:
      src - Input source used to read the main external subset
      flattenWriter - Writer to output processed DTD content to
      inclComments - If true, will pass comments to the writer; if false, will strip comments out
      inclConditionals - If true, will include conditional block markers, as well as intervening content; if false, will strip out both markers and ignorable sections.
      inclPEs - If true, will output parameter entity declarations; if false will parse and use them, but not output.
      Throws:
      IOException
      XMLStreamException
    • getTextBuffer

      private TextBuffer getTextBuffer()
    • setFlattenWriter

      public void setFlattenWriter(Writer w, boolean inclComments, boolean inclConditionals, boolean inclPEs)
      Method that will set specified Writer as the 'flattening writer'; writer used to output flattened version of DTD read in. This is similar to running a C-preprocessor on C-sources, except that defining writer will not prevent normal parsing of DTD itself.
    • flushFlattenWriter

      private void flushFlattenWriter() throws XMLStreamException
      Throws:
      XMLStreamException
    • findEntity

      public EntityDecl findEntity(String entName)
      Method that may need to be called by attribute default value validation code, during parsing....

      Note: see base class for some additional remarks about this method.

      Overrides:
      findEntity in class MinimalDTDReader
    • parseDTD

      protected DTDSubset parseDTD() throws XMLStreamException
      Throws:
      XMLStreamException
    • parseDirective

      protected void parseDirective() throws XMLStreamException
      Throws:
      XMLStreamException
    • parseDirectiveFlattened

      protected void parseDirectiveFlattened() throws XMLStreamException
      Method similar to parseDirective(), but one that takes care to properly output dtd contents using com.ctc.wstx.dtd.DTDWriter as necessary. Separated to simplify both methods; otherwise would end up with 'if (... flatten...) ... else ...' spaghetti code.
      Throws:
      XMLStreamException
    • initInputSource

      protected void initInputSource(WstxInputSource newInput, boolean isExt, String entityId) throws XMLStreamException
      Description copied from class: StreamScanner
      Method called when an entity has been expanded (new input source has been created). Needs to initialize location information and change active input source.
      Overrides:
      initInputSource in class StreamScanner
      Parameters:
      entityId - Name of the entity being expanded
      Throws:
      XMLStreamException
    • loadMore

      protected boolean loadMore() throws XMLStreamException
      Need to override this method, to check a couple of things: first, that nested input sources are balanced, when expanding parameter entities inside entity value definitions (as per XML specs), and secondly, to handle (optional) flattening output.
      Overrides:
      loadMore in class StreamScanner
      Returns:
      true if reading succeeded (or may succeed), false if we reached EOF.
      Throws:
      XMLStreamException
    • loadMoreFromCurrent

      protected boolean loadMoreFromCurrent() throws XMLStreamException
      Overrides:
      loadMoreFromCurrent in class StreamScanner
      Throws:
      XMLStreamException
    • ensureInput

      protected boolean ensureInput(int minAmount) throws XMLStreamException
      Description copied from class: StreamScanner
      Method called to make sure current main-level input buffer has at least specified number of characters available consecutively, without having to call StreamScanner.loadMore(). It can only be called when input comes from main-level buffer; further, call can shift content in input buffer, so caller has to flush any data still pending. In short, caller has to know exactly what it's doing. :-)

      Note: method does not check for any other input sources than the current one -- if current source can not fulfill the request, a failure is indicated.

      Overrides:
      ensureInput in class StreamScanner
      Returns:
      true if there's now enough data; false if not (EOF)
      Throws:
      XMLStreamException
    • loadMoreScoped

      private void loadMoreScoped(WstxInputSource currScope, String entityName, Location loc) throws XMLStreamException
      Throws:
      XMLStreamException
    • dtdNextIfAvailable

      private char dtdNextIfAvailable() throws XMLStreamException
      Returns:
      Next character from the current input block, if any left; NULL if end of block (entity expansion)
      Throws:
      XMLStreamException
    • getNextExpanded

      private char getNextExpanded() throws XMLStreamException
      Method that will get next character, and either return it as is (for normal chars), or expand parameter entity that starts with next character (which has to be '%').
      Throws:
      XMLStreamException
    • skipDtdWs

      private char skipDtdWs(boolean handlePEs) throws XMLStreamException
      Throws:
      XMLStreamException
    • skipObligatoryDtdWs

      private char skipObligatoryDtdWs() throws XMLStreamException
      Note: Apparently a parameter entity expansion does also count as white space (that is, PEs outside of quoted text are considered to be separated by white spaces on both sides). Fortunately this can be handled by 2 little hacks: both a start of a PE, and an end of input block (== end of PE expansion) count as successful spaces.
      Returns:
      Character following the obligatory boundary (white space or PE start/end)
      Throws:
      XMLStreamException
    • expandPE

      private void expandPE() throws XMLStreamException
      Method called to handle expansion of parameter entities. When called, '%' character has been encountered as a reference indicator, and now we should get parameter entity name.
      Throws:
      XMLStreamException
    • checkDTDKeyword

      protected String checkDTDKeyword(String exp) throws XMLStreamException
      Method called to verify whether input has specified keyword; if it has, returns null and points to char after the keyword; if not, returns whatever constitutes a keyword matched, for error reporting purposes.
      Throws:
      XMLStreamException
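
      The return contract (null on a match, otherwise the keyword actually found) can be sketched over a simple character buffer. This is an illustration only: `KeywordCheck` is a hypothetical class, and its `isNameChar` test is a rough approximation of XML name characters rather than the check the reader uses.

```java
/** Hypothetical illustration of the "null means match" keyword contract. */
final class KeywordCheck {
    private final char[] buf;
    private int pos;

    KeywordCheck(String input) {
        this.buf = input.toCharArray();
    }

    /** Returns null on match (position is then just past the keyword), else the keyword found. */
    String checkKeyword(String expected) {
        int start = pos;
        int i = 0;
        while (i < expected.length() && pos < buf.length && buf[pos] == expected.charAt(i)) {
            pos++;
            i++;
        }
        // A match requires the whole keyword, and no further name characters after it
        boolean matched = (i == expected.length())
                && (pos >= buf.length || !isNameChar(buf[pos]));
        if (matched) {
            return null;
        }
        // Mismatch: read the rest of whatever keyword is there, for error reporting
        while (pos < buf.length && isNameChar(buf[pos])) {
            pos++;
        }
        return new String(buf, start, pos - start);
    }

    int position() {
        return pos;
    }

    // Rough approximation of XML name characters (assumption for this sketch)
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }
}
```
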
    • readDTDKeyword

      protected String readDTDKeyword(String prefix) throws XMLStreamException
      Method called usually to indicate an error condition; will read rest of specified keyword (including characters that can be part of XML identifiers), append that to passed prefix (which is optional), and return resulting String.
      Parameters:
      prefix - Part of keyword already read in.
      Throws:
      XMLStreamException
    • checkPublicSystemKeyword

      private boolean checkPublicSystemKeyword(char c) throws XMLStreamException
      Returns:
      True, if input contains 'PUBLIC' keyword; false if it contains 'SYSTEM'; otherwise throws an exception.
      Throws:
      XMLStreamException
    • readDTDName

      private String readDTDName(char c) throws XMLStreamException
      Throws:
      XMLStreamException
    • readDTDLocalName

      private String readDTDLocalName(char c, boolean checkChar) throws XMLStreamException
      Throws:
      XMLStreamException
    • readDTDNmtoken

      private String readDTDNmtoken(char c) throws XMLStreamException
      Similar to readDTDName(char), except that the rules are a bit looser, i.e. there are no additional restrictions for the first char
      Throws:
      XMLStreamException
    • readDTDQName

      private PrefixedName readDTDQName(char firstChar) throws XMLStreamException
      Method that will read an element or attribute name from DTD; depending on namespace mode, it can have prefix as well.

      Note: returned PrefixedName instances are canonicalized across all names read during parsing of a single DTD subset, so that identity comparison can be used instead of calling equals() method (but only within a single subset!). This also reduces memory usage to some extent.

      Throws:
      XMLStreamException
    • readArity

      private char readArity() throws XMLStreamException
      Throws:
      XMLStreamException
    • parseEntityValue

      private char[] parseEntityValue(String id, Location loc, char quoteChar) throws XMLStreamException
      Method that reads and pre-processes replacement text for an internal entity (parameter or generic).
      Throws:
      XMLStreamException
    • parseAttrDefaultValue

      private void parseAttrDefaultValue(DefaultAttrValue defVal, char quoteChar, PrefixedName attrName, Location loc, boolean gotFixed) throws XMLStreamException
      This method is similar to parseEntityValue(java.lang.String, javax.xml.stream.Location, char) in some ways, but has some notable differences, due to the way XML specs define differences. Main differences are that parameter entities are not allowed (or rather, recognized as entities), and that general entities need to be verified, but NOT expanded right away. Whether forward references are allowed or not is an open question right now.
      Throws:
      XMLStreamException
    • readPI

      protected void readPI() throws XMLStreamException
      Method similar to MinimalDTDReader.skipPI(), but one that does basic well-formedness checks.
      Throws:
      XMLStreamException
    • readComment

      protected void readComment(DTDEventListener l) throws XMLStreamException
      Method similar to MinimalDTDReader.skipComment(), but that has to collect contents, to be reported for a SAX handler.
      Throws:
      XMLStreamException
    • checkInclusion

      private void checkInclusion() throws XMLStreamException
      Throws:
      XMLStreamException
    • handleIncluded

      private void handleIncluded() throws XMLStreamException
      Throws:
      XMLStreamException
    • handleIgnored

      private void handleIgnored() throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportUndefinedNotationRefs

      private void _reportUndefinedNotationRefs() throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportBadDirective

      private void _reportBadDirective(String dir) throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportVCViolation

      private void _reportVCViolation(String msg) throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportWFCViolation

      private void _reportWFCViolation(String msg) throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportWFCViolation

      private void _reportWFCViolation(String format, Object arg) throws XMLStreamException
      Throws:
      XMLStreamException
    • throwDTDElemError

      private void throwDTDElemError(String msg, Object elem) throws XMLStreamException
      Throws:
      XMLStreamException
    • throwDTDAttrError

      private void throwDTDAttrError(String msg, DTDElement elem, PrefixedName attrName) throws XMLStreamException
      Throws:
      XMLStreamException
    • throwDTDUnexpectedChar

      private void throwDTDUnexpectedChar(int i, String extraMsg) throws XMLStreamException
      Throws:
      XMLStreamException
    • throwForbiddenPE

      private void throwForbiddenPE() throws XMLStreamException
      Throws:
      XMLStreamException
    • elemDesc

      private String elemDesc(Object elem)
    • attrDesc

      private String attrDesc(Object elem, PrefixedName attrName)
    • entityDesc

      private String entityDesc(WstxInputSource input)
    • handleDeclaration

      private void handleDeclaration(char c) throws XMLStreamException

      Note: c is known to be a letter (from 'A' to 'Z') at this point.

      Throws:
      XMLStreamException
    • handleSuppressedDeclaration

      private void handleSuppressedDeclaration() throws XMLStreamException
      Specialized method that handles potentially suppressable entity declaration. Specifically: at this point it is known that first letter is 'E', that we are outputting flattened DTD info, and that parameter entity declarations are to be suppressed. Furthermore, flatten output is still being disabled, and needs to be enabled by the method at some point.
      Throws:
      XMLStreamException
    • handleAttlistDecl

      private void handleAttlistDecl() throws XMLStreamException
      Note: when this method is called, the keyword itself has been successfully parsed.
      Throws:
      XMLStreamException
    • handleElementDecl

      private void handleElementDecl() throws XMLStreamException
      Throws:
      XMLStreamException
    • handleEntityDecl

      private void handleEntityDecl(boolean suppressPEDecl) throws XMLStreamException
      This method is tricky to implement, since it can contain parameter entities in multiple combinations... and yet declare one as well.
      Parameters:
      suppressPEDecl - If true, will need to take care of enabling/disabling of flattened output.
      Throws:
      XMLStreamException
    • handleNotationDecl

      private void handleNotationDecl() throws XMLStreamException
      Method called to handle <!NOTATION ... > declaration.
      Throws:
      XMLStreamException
    • handleTargetNsDecl

      private void handleTargetNsDecl() throws XMLStreamException
      Method called to handle <!TARGETNS ... > declaration (the only new declaration type for DTD++)

      Note: only valid for DTD++, in 'plain DTD' mode shouldn't get called.

      Throws:
      XMLStreamException
    • handleAttrDecl

      private void handleAttrDecl(DTDElement elem, char c, int index, Location loc) throws XMLStreamException
      Parameters:
      elem - Element that contains this attribute
      c - First character of what should be the attribute name
      index - Sequential index number of this attribute as a child of the element; used for creating bit masks later on.
      loc - Location of the element name in attribute list declaration
      Throws:
      XMLStreamException
    • parseEnumerated

      private WordResolver parseEnumerated(DTDElement elem, PrefixedName attrName, boolean isNotation) throws XMLStreamException
      Parsing method that reads a list of one or more space-separated tokens (nmtoken or name, depending on 'isNotation' argument)
      Throws:
      XMLStreamException
    • readNotationEntry

      private String readNotationEntry(char c, PrefixedName attrName, Location refLoc) throws XMLStreamException
      Method called to read a notation reference entry; done both for attributes of type NOTATION, and for external unparsed entities that refer to a notation. In both cases, notation referenced needs to have been defined earlier; but only if we are building a fully validating DTD subset object (there is the alternative of a minimal DTD in DTD-aware mode, which does no validation but allows attribute defaulting and normalization, as well as access to entity and notation declarations).
      Parameters:
      attrName - Name of attribute in declaration that refers to this entity
      refLoc - Starting location of the DTD component that contains the reference
      Throws:
      XMLStreamException
    • readEnumEntry

      private String readEnumEntry(char c, HashMap<String,String> sharedEnums) throws XMLStreamException
      Throws:
      XMLStreamException
    • readMixedSpec

      private StructValidator readMixedSpec(PrefixedName elemName, boolean construct) throws XMLStreamException
      Method called to parse what seems like a mixed content specification.
      Parameters:
      construct - If true, will build full object for validating content within mixed content model; if false, will just parse and discard information (done in non-validating DTD-supporting mode)
      Throws:
      XMLStreamException
    • readContentSpec

      private ContentSpec readContentSpec(PrefixedName elemName, boolean mainLevel, boolean construct) throws XMLStreamException
      Parameters:
      mainLevel - Whether this is the main-level content specification or nested
      Throws:
      XMLStreamException
    • combineArities

      private static char combineArities(char arity1, char arity2)
    • handleExternalEntityDecl

      private EntityDecl handleExternalEntityDecl(WstxInputSource inputSource, boolean isParam, String id, char c, Location evtLoc) throws XMLStreamException
      Method that handles rest of external entity declaration, after it's been figured out entity is not internal (does not continue with a quote).
      Parameters:
      inputSource - Input source for the start of the declaration. Needed for resolving relative system references, if any.
      isParam - True if this is a parameter entity declaration; false if general entity declaration
      evtLoc - Location where entity declaration directive started; needed when constructing event Objects for declarations.
      Throws:
      XMLStreamException
    • getElementMap

      private LinkedHashMap<PrefixedName,DTDElement> getElementMap()
    • findSharedName

      private PrefixedName findSharedName(String prefix, String localName)
      Method used to 'intern()' qualified names; main benefit is reduced memory usage as the name objects are shared. May also slightly speed up Map access, as more often identity comparisons catch matches.

      Note: it is assumed at this point that access is only from a single thread, and non-recursive -- generally valid assumption as readers are not shared. Restriction is needed since the method is not re-entrant: it uses mAccessKey during the method call.
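
      The interning scheme with a reusable lookup key can be sketched as below. This is an illustration under stated assumptions: `SharedNames` and its nested `Name` class are hypothetical stand-ins (the real code uses PrefixedName and mAccessKey), and the sketch shares the same limitation as the original, namely that it is neither thread-safe nor re-entrant.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration of name interning with a reusable probe key. */
final class SharedNames {
    /** Minimal stand-in for PrefixedName; mutable so that one instance can serve as probe. */
    static final class Name {
        private String prefix;
        private String localName;

        Name(String prefix, String localName) {
            reset(prefix, localName);
        }

        Name reset(String prefix, String localName) {
            this.prefix = prefix;
            this.localName = localName;
            return this;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Name)) {
                return false;
            }
            Name other = (Name) o;
            return Objects.equals(prefix, other.prefix) && localName.equals(other.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(prefix) * 31 + localName.hashCode();
        }
    }

    private final Map<Name, Name> shared = new HashMap<>();
    // Reused for every lookup, so no garbage is created for names that are already known
    private final Name accessKey = new Name(null, "");

    Name findSharedName(String prefix, String localName) {
        Name canonical = shared.get(accessKey.reset(prefix, localName));
        if (canonical == null) {
            canonical = new Name(prefix, localName);  // stored instances are never reset
            shared.put(canonical, canonical);
        }
        return canonical;
    }
}
```

      Because every lookup for the same prefix and local name returns the same instance, callers in this sketch can compare names with `==` within one `SharedNames` instance.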

    • findEntity

      protected EntityDecl findEntity(String id, Object arg)
      Description copied from class: StreamScanner
      Abstract method for sub-classes to implement, for finding a declared general or parsed entity.
      Overrides:
      findEntity in class MinimalDTDReader
      Parameters:
      id - Identifier of the entity to find
      arg - If Boolean.TRUE, we are expanding a general entity
    • handleUndeclaredEntity

      protected void handleUndeclaredEntity(String id) throws XMLStreamException
      Undeclared parameter entity is a VC, not WFC...
      Overrides:
      handleUndeclaredEntity in class MinimalDTDReader
      Throws:
      XMLStreamException
    • handleIncompleteEntityProblem

      protected void handleIncompleteEntityProblem(WstxInputSource closing) throws XMLStreamException
      Handling of PE matching problems is actually intricate; one type will be a WFC ("PE Between Declarations", which refers to PEs that start from outside declarations), and another just a VC ("Proper Declaration/PE Nesting", when PE is contained within declaration)
      Overrides:
      handleIncompleteEntityProblem in class MinimalDTDReader
      Throws:
      XMLStreamException
    • handleGreedyEntityProblem

      protected void handleGreedyEntityProblem(WstxInputSource input) throws XMLStreamException
      Throws:
      XMLStreamException
    • checkXmlSpaceAttr

      protected void checkXmlSpaceAttr(int type, WordResolver enumValues) throws XMLStreamException
      Throws:
      XMLStreamException
    • checkXmlIdAttr

      protected void checkXmlIdAttr(int type) throws XMLStreamException
      Throws:
      XMLStreamException
    • _reportWarning

      private void _reportWarning(XMLReporter rep, String probType, String msg, Location loc) throws XMLStreamException
      Throws:
      XMLStreamException