Class XmlDetagger

All Implemented Interfaces:
AnalysisComponent

public class XmlDetagger extends CasAnnotator_ImplBase
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
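As a rough illustration of the detagging step described above, the following sketch strips markup from an XML string using the JVM's default SAX parser. It uses only JDK classes and is not the annotator's actual implementation. The class name `DetagSketch` and method `detag` are hypothetical, and the sofa handling (reading "xmlDocument", writing "plainTextDocument") is omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of UIMA: shows SAX-based detagging only.
public class DetagSketch {

  /** Returns the character data of the XML string with all markup removed. */
  public static String detag(String xml) throws Exception {
    // newInstance() picks the JVM's default SAX parser implementation.
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    StringBuilder text = new StringBuilder();
    parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        // Called for text between tags; entity references arrive already resolved.
        text.append(ch, start, length);
      }
    });
    return text.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(detag("<doc><a>Hi</a> <b>there &amp; back</b></doc>"));
  }
}
```

In the real annotator the resulting string would be written to the "plainTextDocument" sofa rather than returned.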
  • Field Details

    • PARAM_TEXT_TAG

      public static final String PARAM_TEXT_TAG
      Name of the optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag will be considered part of the "document" that is added to the CAS by this annotator. If not specified, the entire file will be considered the document.
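The effect of restricting the document to one tag can be sketched with a SAX handler that only collects character data while inside that tag. This is a minimal JDK-only sketch under assumptions: the class `TextTagSketch`, its `detag` method, and the sample tag names are hypothetical, and the annotator's real handler may differ (for example in how it treats nested or repeated tags).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of UIMA: keeps only text inside one tag.
public class TextTagSketch {

  /** Detags xml; if textTag is non-null, keeps only text inside that tag. */
  public static String detag(String xml, String textTag) throws Exception {
    StringBuilder text = new StringBuilder();
    DefaultHandler handler = new DefaultHandler() {
      private int depth = 0; // how many enclosing textTag elements are open

      @Override
      public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals(textTag)) depth++;
      }

      @Override
      public void endElement(String uri, String local, String qName) {
        if (qName.equals(textTag)) depth--;
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        // With no tag configured, the entire file is the document.
        if (textTag == null || depth > 0) text.append(ch, start, length);
      }
    };
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return text.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<doc><HEAD>skip</HEAD><TEXT>keep <b>this</b></TEXT></doc>";
    System.out.println(detag(xml, "TEXT"));
    System.out.println(detag(xml, null));
  }
}
```

Markup nested inside the selected tag is still removed; only its text is kept.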
    • parserFactory

      private SAXParserFactory parserFactory
    • sourceDocInfoType

      private Type sourceDocInfoType
    • mXmlTagContainingText

      private String mXmlTagContainingText
  • Constructor Details

    • XmlDetagger

      public XmlDetagger()
  • Method Details

    • initialize

      public void initialize(UimaContext aContext) throws ResourceInitializationException
      Description copied from interface: AnalysisComponent
      Performs any startup tasks required by this component. The framework calls this method only once, just after the AnalysisComponent has been instantiated.

      The framework supplies this AnalysisComponent with a reference to the UimaContext that it will use, for example to access configuration settings or resources. This AnalysisComponent should store a reference to its UimaContext for later use.

      Specified by:
      initialize in interface AnalysisComponent
      Overrides:
      initialize in class AnalysisComponent_ImplBase
      Parameters:
      aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
      Throws:
      ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
    • typeSystemInit

      public void typeSystemInit(TypeSystem aTypeSystem) throws AnalysisEngineProcessException
      Description copied from class: CasAnnotator_ImplBase
      Informs this annotator that the CAS TypeSystem has changed. The call originates in PrimitiveAnalysisEngine_impl, which calls CasAnnotator_ImplBase.process, which in turn calls checkTypeSystemChange.

      In this method, the Annotator should use the TypeSystem to resolve the names of Type and Features to the actual Type and Feature objects, which can then be used during processing.

      Overrides:
      typeSystemInit in class CasAnnotator_ImplBase
      Parameters:
      aTypeSystem - the new type system to use as input to your initialization
      Throws:
      AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
    • process

      public void process(CAS aCAS) throws AnalysisEngineProcessException
      Description copied from class: CasAnnotator_ImplBase
      Inputs a CAS to the AnalysisComponent. This method should be overridden by subclasses to perform analysis of the CAS.
      Specified by:
      process in class CasAnnotator_ImplBase
      Parameters:
      aCAS - A CAS that this AnalysisComponent should process.
      Throws:
      AnalysisEngineProcessException - if a problem occurs during processing
    • getDescription

      public static AnalysisEngineDescription getDescription() throws InvalidXMLException
      Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
      Returns:
      an object containing all of the information parsed from the descriptor.
      Throws:
      InvalidXMLException - if the descriptor is invalid or missing
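Locating a file "using the ClassLoader" generally means a classpath resource lookup. The sketch below shows that general technique with JDK classes only. The class `DescriptorLocatorSketch`, its `locate` method, and the resource names are hypothetical; the descriptor's real resource name inside uima-core.jar is not given on this page.

```java
import java.net.URL;

// Hypothetical helper, not part of UIMA: classpath resource lookup only.
public class DescriptorLocatorSketch {

  /**
   * Resolves resourceName relative to the package of the anchor class.
   * Returns null when no such resource is visible to the class loader.
   */
  public static URL locate(Class<?> anchor, String resourceName) {
    return anchor.getResource(resourceName);
  }

  public static void main(String[] args) {
    // A resource that is known to exist in every JDK.
    System.out.println(locate(Object.class, "Object.class") != null);
    // A made-up name: the lookup yields null instead of throwing.
    System.out.println(locate(Object.class, "no-such-descriptor.xml") == null);
  }
}
```

Because a failed lookup yields null rather than an exception, a caller that goes on to parse the descriptor has to report the missing case itself, which fits the documented InvalidXMLException for an invalid or missing descriptor.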
    • getDescriptorURL

      public static URL getDescriptorURL()