Class XMLProfiler

  • All Implemented Interfaces:
    Serializable, org.apache.tika.parser.Parser

    public class XMLProfiler
    extends Object
    implements org.apache.tika.parser.Parser

    This parser profiles XML. It captures the root entity, along with entity URIs (namespaces) and entity local names, in parallel arrays.
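    To make the "parallel arrays" idea concrete, the following is a minimal sketch that uses only the JDK's SAX API. It is not the XMLProfiler implementation: the class name XmlProfileSketch, the Profile holder, the "{uri}localName" format for the root entity, and the choice to record one entry per element start are all assumptions made for illustration. The real parser may, for example, deduplicate or format its values differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is NOT the XMLProfiler source.
class XmlProfileSketch {

    /** Result holder: uris.get(i) pairs with localNames.get(i). */
    static final class Profile {
        // "{uri}localName" of the first element seen (this format is an assumption).
        String rootEntity;
        final List<String> uris = new ArrayList<>();
        final List<String> localNames = new ArrayList<>();
    }

    static Profile profile(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is required for uri/localName to be populated.
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations so external entities cannot be expanded.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        Profile result = new Profile();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (result.rootEntity == null) {
                    result.rootEntity = "{" + uri + "}" + localName;
                }
                // Append to both lists together so the indexes stay aligned.
                result.uris.add(uri);
                result.localNames.add(localName);
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        Profile p = profile(
                "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"><item/></x:xmpmeta>");
        System.out.println("root: " + p.rootEntity);
        for (int i = 0; i < p.uris.size(); i++) {
            System.out.println(p.uris.get(i) + " -> " + p.localNames.get(i));
        }
    }
}
```

    In this sketch an element without a namespace is recorded with an empty URI, so the two lists always have the same length and can be read index by index.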

    This parser is not part of the default set of parsers and must be enabled explicitly in a Tika config file:

    <properties>
      <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.xml.XMLProfiler"/>
      </parsers>
    </properties>

    This parser was initially designed to profile XMP and XFA in PDFs. Extracting other types of XML, or XMP embedded in other file formats, would require further work; please open a ticket to request it.

    See Also:
    Serialized Form
    • Field Detail

      • ROOT_ENTITY

        public static org.apache.tika.metadata.Property ROOT_ENTITY
      • ENTITY_URIS

        public static org.apache.tika.metadata.Property ENTITY_URIS
      • ENTITY_LOCAL_NAMES

        public static org.apache.tika.metadata.Property ENTITY_LOCAL_NAMES
    • Constructor Detail

      • XMLProfiler

        public XMLProfiler()
    • Method Detail

      • getSupportedTypes

        public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
        Specified by:
        getSupportedTypes in interface org.apache.tika.parser.Parser
      • parse

        public void parse(InputStream stream,
                          ContentHandler handler,
                          org.apache.tika.metadata.Metadata metadata,
                          org.apache.tika.parser.ParseContext context)
                   throws IOException,
                          SAXException,
                          org.apache.tika.exception.TikaException
        Specified by:
        parse in interface org.apache.tika.parser.Parser
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException