Tika Parsing Document to Plain Text

Tika allows us to extract content in various formats, such as plain text, HTML, or XHTML. The ContentHandler interface receives the content that the parser extracts. To get only the content of the document's body as plain text, we can use BodyContentHandler. Let's see an example in which we get plain text output from an HTML file.

Tika Parsing to Plain Text Example

The following is our HTML file.

// index.html
<html>
<head>
<title>Index Page</title>
</head>
<body>
<h2>Hello, Welcome to Javatpoint.</h2>
</body>
</html>

Output:

After extraction, Tika produces the output as plain text.

Hello, Welcome to Javatpoint.
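The extraction described above can be sketched as follows. This is a minimal illustration, not the page's original listing: the class name TikaPlainTextExample, the helper extractPlainText, and the SAMPLE_HTML constant are hypothetical names chosen for this sketch, and it assumes that tika-core and Tika's HTML parser module are on the classpath. The HTML is parsed from an in-memory string so the sketch does not depend on an index.html file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class; the name is an assumption made for this sketch.
class TikaPlainTextExample {

    // Same markup as the index.html file shown above, kept in memory.
    static final String SAMPLE_HTML =
            "<html><head><title>Index Page</title></head>"
          + "<body><h2>Hello, Welcome to Javatpoint.</h2></body></html>";

    // Parses the stream and returns the text of the document body.
    static String extractPlainText(InputStream input)
            throws IOException, SAXException, TikaException {
        // BodyContentHandler collects the character content of the body element.
        BodyContentHandler handler = new BodyContentHandler();
        // AutoDetectParser detects the document type and delegates to a matching parser.
        AutoDetectParser parser = new AutoDetectParser();
        parser.parse(input, handler, new Metadata(), new ParseContext());
        // The handler's toString() returns the collected text; trim surrounding whitespace.
        return handler.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream input = new ByteArrayInputStream(
                SAMPLE_HTML.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(extractPlainText(input));
        }
    }
}
```

To read from a file instead, a FileInputStream for index.html could be passed to extractPlainText in place of the in-memory stream. Because BodyContentHandler only collects the body, the page title is expected to end up in the Metadata object rather than in the returned text.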