Class EMFParser

  • All Implemented Interfaces:
    Serializable, org.apache.tika.parser.Parser

    public class EMFParser
    extends Object
    implements org.apache.tika.parser.Parser
    Extracts files embedded in EMF (Enhanced Metafile) images and offers a very rough capability to extract any text stored in the EMF.
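    EMF stores drawn text as individual positioned strings, for example in EMR_EXTTEXTOUTW records. The following is a minimal, hypothetical sketch in plain Java (not Tika's implementation) that walks the records of a raw EMF byte array and collects the UTF-16LE strings from EMR_EXTTEXTOUTW records (record type 84). It assumes the record layout of the MS-EMF specification: every record begins with a 4-byte type and a 4-byte size, and the EmrText Chars and offString fields sit at byte offsets 44 and 48 of the record. It ignores the ANSI variant EMR_EXTTEXTOUTA, fonts, and positions. The class name EmfTextScan is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: scans raw EMF bytes record by record and returns the
 * strings carried by EMR_EXTTEXTOUTW records. Not Tika's implementation.
 */
public class EmfTextScan {
    // Record type constants from the MS-EMF specification.
    static final int EMR_EOF = 14;
    static final int EMR_EXTTEXTOUTW = 84;

    public static List<String> extractText(byte[] emf) {
        // EMF is little-endian throughout.
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        List<String> runs = new ArrayList<>();
        int pos = 0;
        // Every record starts with a 4-byte type followed by a 4-byte total size.
        while (pos + 8 <= emf.length) {
            int type = buf.getInt(pos);
            int size = buf.getInt(pos + 4);
            if (size < 8 || size % 4 != 0 || size > emf.length - pos) {
                break; // malformed or truncated record: stop scanning
            }
            if (type == EMR_EXTTEXTOUTW && size >= 76) {
                // EmrText starts at offset 36; Chars is at 44 and offString at 48.
                // offString is measured from the start of this record.
                int chars = buf.getInt(pos + 44);
                int offString = buf.getInt(pos + 48);
                if (chars > 0 && offString >= 8 && offString + 2L * chars <= size) {
                    runs.add(new String(emf, pos + offString, 2 * chars,
                            StandardCharsets.UTF_16LE));
                }
            }
            if (type == EMR_EOF) {
                break; // end of metafile
            }
            pos += size;
        }
        return runs;
    }
}
```

Reading the string through offString, rather than assuming a fixed position, avoids hard-coding where the string begins inside the record.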

    Improving text extraction would require implementing considerably more at the Apache POI level: we would need to track font changes and use that information to identify character sets and to insert spaces and new lines.
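    As an illustration of the kind of logic the paragraph above describes, here is a hypothetical sketch that joins positioned text runs using the height of the font active for each run. A baseline shift of more than half the font height starts a new line, and a horizontal gap of more than a quarter of the font height becomes a space. The Run class, its fields (which a real implementation would have to derive from the EMF records and the currently selected font), and both thresholds are assumptions chosen for this example. Character-set detection is not covered.

```java
import java.util.List;

/**
 * Hypothetical sketch: joins positioned text runs into plain text, using the
 * font height of each run to decide where spaces and line breaks go.
 */
public class RunJoiner {

    /** A drawn string with its horizontal extent, baseline, and font height, all in the same units. */
    public static final class Run {
        final int left;
        final int right;
        final int baseline;
        final int fontHeight;
        final String text;

        public Run(int left, int right, int baseline, int fontHeight, String text) {
            this.left = left;
            this.right = right;
            this.baseline = baseline;
            this.fontHeight = fontHeight;
            this.text = text;
        }
    }

    public static String join(List<Run> runs) {
        StringBuilder out = new StringBuilder();
        Run prev = null;
        for (Run run : runs) {
            if (prev != null) {
                // Assumed threshold: a baseline shift above half the font height means a new line.
                boolean newLine = Math.abs(run.baseline - prev.baseline) > prev.fontHeight / 2;
                // Assumed threshold: a gap wider than a quarter of the font height is a word break.
                boolean wordGap = run.left - prev.right > prev.fontHeight / 4;
                if (newLine) {
                    out.append('\n');
                } else if (wordGap) {
                    out.append(' ');
                }
            }
            out.append(run.text);
            prev = run;
        }
        return out.toString();
    }
}
```

Because both thresholds scale with the font height, a change of font size between runs changes where breaks are inserted, which is why the font has to be tracked.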

    See Also:
    Serialized Form
    • Field Detail

      • EMF_ICON_ONLY

        public static org.apache.tika.metadata.Property EMF_ICON_ONLY
      • EMF_ICON_STRING

        public static org.apache.tika.metadata.Property EMF_ICON_STRING
    • Constructor Detail

      • EMFParser

        public EMFParser()
    • Method Detail

      • getSupportedTypes

        public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
        Specified by:
        getSupportedTypes in interface org.apache.tika.parser.Parser
      • parse

        public void parse(InputStream stream,
                          ContentHandler handler,
                          org.apache.tika.metadata.Metadata metadata,
                          org.apache.tika.parser.ParseContext context)
                   throws IOException,
                          SAXException,
                          org.apache.tika.exception.TikaException
        Specified by:
        parse in interface org.apache.tika.parser.Parser
        Throws:
        IOException
        SAXException
        org.apache.tika.exception.TikaException