Tika Parsing Document to Plain Text
Tika can return extracted content in several formats, such as plain text, HTML, or XHTML. The ContentHandler implementation passed to the parser determines how the content is returned. To get the content of the document's body as plain text, we can use BodyContentHandler.
Let's see an example in which we get plain text output from an HTML file.
Tika Parsing to Plain Text Example
Following is our HTML file.
// index.html
<html>
<head>
<title>Index Page</title>
</head>
<body>
<h2>Hello, Welcome to Javatpoint.</h2>
</body>
</html>
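The extraction itself could look something like the sketch below. It is a minimal illustration and not necessarily the code used in the original tutorial. It assumes the Tika core and parser libraries are on the classpath. The class name HtmlToPlainText and the method name extract are hypothetical. To keep the example self-contained, the HTML is embedded as a string, but a stream opened on index.html could be passed to extract in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical class name, used for illustration only.
public class HtmlToPlainText {

    // Parses the given stream and returns the text of the document body.
    static String extract(InputStream input) throws Exception {
        // BodyContentHandler collects the body content as plain text.
        BodyContentHandler handler = new BodyContentHandler();
        // AutoDetectParser detects the document type and delegates to a matching parser.
        AutoDetectParser parser = new AutoDetectParser();
        parser.parse(input, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same markup as index.html, embedded here so the sketch runs without a file.
        String html = "<html><head><title>Index Page</title></head>"
                + "<body><h2>Hello, Welcome to Javatpoint.</h2></body></html>";
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            // trim() removes the surrounding whitespace the handler may emit.
            System.out.println(extract(in).trim());
        }
    }
}
```

Because extract takes an InputStream, the same method can be reused for files, network streams, or in-memory content.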
After extraction, Tika produces the following plain text output.
Hello, Welcome to Javatpoint.