Tika Parsing Document to XHTML

Tika uses the ToXMLContentHandler class to produce output in XHTML format. It returns the XHTML content of the whole document as a string. This class contains the following constructors and methods.

Tika ToXMLContentHandler Constructors

Following are the constructors of the ToXMLContentHandler class.

Tika ToXMLContentHandler Methods

Following are the methods of the ToXMLContentHandler class.

Tika Parsing Document to XHTML Example

This example produces output in XHTML format from an input file in plain text format.

Following is the content of the hello.txt file.

Hello Welcome to Javatpoint

Output:

After extraction, Tika produces the output in XHTML format, as shown below.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.txt.TXTParser" />
<meta name="Content-Encoding" content="ISO-8859-1" />
<meta name="Content-Type" content="text/plain; charset=ISO-8859-1" />
<title></title>
</head>
<body><p>Hello Welcome to Javatpoint</p>
</body></html>
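
A program along the following lines could produce XHTML output like that shown above. This is a minimal sketch, not necessarily the original listing: it assumes the tika-core and tika-parsers libraries are on the classpath, and the class name XhtmlExtractor and the method name parseToXhtml are hypothetical names chosen for illustration. The exact meta entries in the output may differ between Tika versions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.ToXMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is an assumption for this sketch.
class XhtmlExtractor {

    // Parses the given file and returns the XHTML collected by the handler.
    static String parseToXhtml(Path file)
            throws IOException, SAXException, TikaException {
        // The handler receives SAX events from the parser and serializes them as XHTML.
        ContentHandler handler = new ToXMLContentHandler();
        // AutoDetectParser detects the document type and delegates to a matching parser.
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();

        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        // toString() returns everything the handler has written so far.
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to hello.txt in the working directory when no argument is given.
        Path file = Paths.get(args.length > 0 ? args[0] : "hello.txt");
        System.out.println(parseToXhtml(file));
    }
}
```

The no-argument ToXMLContentHandler constructor keeps the result in memory, which is why the XHTML can be read back with toString() once parsing finishes.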