Interface OOXMLExtractor

    • Method Summary

      Modifier and Type Method Description
      org.apache.poi.ooxml.POIXMLDocument getDocument()
      Returns the opened document.
      MetadataExtractor getMetadataExtractor()
      Returns a MetadataExtractor for the document. POI does not yet support POIXMLTextExtractor.getMetadataTextExtractor() for OOXML.
      void getXHTML(ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
      Parses the document into a sequence of XHTML SAX events sent to the given content handler.
    • Method Detail

      • getDocument

        org.apache.poi.ooxml.POIXMLDocument getDocument()
        Returns the opened document.
        See Also:
        POIXMLTextExtractor.getDocument()
      • getMetadataExtractor

        MetadataExtractor getMetadataExtractor()
        Returns a MetadataExtractor for the document. POI does not yet support POIXMLTextExtractor.getMetadataTextExtractor() for OOXML.
      • getXHTML

        void getXHTML(ContentHandler handler,
                      org.apache.tika.metadata.Metadata metadata,
                      org.apache.tika.parser.ParseContext context)
               throws SAXException,
                      org.apache.xmlbeans.XmlException,
                      IOException,
                      org.apache.tika.exception.TikaException
        Parses the document into a sequence of XHTML SAX events sent to the given content handler.
        Throws:
        SAXException
        org.apache.xmlbeans.XmlException
        IOException
        org.apache.tika.exception.TikaException
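
        Example (illustrative sketch, not part of the Tika API):
        The description of getXHTML says the document is delivered as XHTML SAX events to a content handler. The sketch below shows what the receiving side might look like, using only the JDK's org.xml.sax classes. The class XhtmlTextCollector and its methods emitSample and sampleText are hypothetical names introduced for illustration; emitSample merely stands in for an extractor by emitting a few hand-written events, and it does not reproduce the exact event sequence a real OOXMLExtractor produces.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that collects plain text from XHTML SAX events. */
public class XhtmlTextCollector extends DefaultHandler {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Accumulate character data exactly as it arrives.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Treat the end of each paragraph as a line break.
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    /** Assumption: a stand-in for an extractor, emitting two XHTML paragraphs. */
    static void emitSample(ContentHandler handler) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        for (String paragraph : new String[] {"Hello", "World"}) {
            handler.startElement(XHTML, "p", "p", none);
            char[] chars = paragraph.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML, "p", "p");
        }
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    /** Runs the sample events through a collector and returns the gathered text. */
    static String sampleText() {
        XhtmlTextCollector collector = new XhtmlTextCollector();
        try {
            emitSample(collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.getText();
    }

    public static void main(String[] args) {
        System.out.print(sampleText());
    }
}
```

        With Tika on the classpath, a handler like this would presumably be passed as the first argument of getXHTML, alongside a Metadata and a ParseContext instance, in place of the emitSample call.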