Javatpoint Logo

Tika Parsing Document to Plain Text

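Tika lets us get extracted content in various formats, such as plain text, HTML, or XHTML. Tika's parsers deliver the extracted content to a ContentHandler (the SAX interface org.xml.sax.ContentHandler), and the handler we pass in determines the form in which the content is returned. If we only want the content of the document's body as plain text, we can use BodyContentHandler.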
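Tika's parsers report a document's content as a stream of XHTML SAX events, and the ContentHandler you pass in decides what to keep. As a rough illustration of that idea, here is a minimal JDK-only sketch (no Tika involved). The class BodyTextHandler is hypothetical and only mimics what BodyContentHandler does: it ignores everything outside `<body>` and collects the remaining character data. It assumes the input is well-formed XHTML, because the JDK's SAX parser cannot read arbitrary HTML; Tika's own HTML parser is more lenient.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika) that imitates the idea behind
// BodyContentHandler: keep only character data that appears inside <body>.
class BodyTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // > 0 while the parser is inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start counting at <body>, then track nesting of child elements.
        if (bodyDepth > 0 || "body".equals(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than overwrite.
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public String toString() {
        return text.toString().trim();
    }
}

public class ContentHandlerSketch {
    // Parses well-formed XHTML and returns only the body text.
    public static String bodyText(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BodyTextHandler handler = new BodyTextHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Index Page</title></head>"
                + "<body><h2>Hello, Welcome to Javatpoint. </h2></body></html>";
        System.out.println(bodyText(xhtml)); // prints: Hello, Welcome to Javatpoint.
    }
}
```

The title "Index Page" is dropped because it sits in `<head>`, which is the same reason a body-only handler returns just the visible page text.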

Let's see an example in which we get plain-text output from an HTML file.

Tika Parsing to Plain Text Example
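A minimal sketch of such an example follows, assuming tika-core plus the standard parser bundle (for example tika-parsers-standard-package in Tika 2.x) is on the classpath. It uses AutoDetectParser to detect the file type and BodyContentHandler to collect the body text. The class name TikaPlainTextExample, the method extractPlainText, and the default file name index.html are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaPlainTextExample {

    // Parses any stream Tika can detect and returns the body as plain text.
    public static String extractPlainText(InputStream stream)
            throws IOException, SAXException, TikaException {
        // -1 disables BodyContentHandler's default limit of 100,000 characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        // AutoDetectParser detects the media type (text/html here) and delegates
        // to a matching parser found on the classpath.
        new AutoDetectParser().parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: index.html in the working directory.
        String path = args.length > 0 ? args[0] : "index.html";
        try (InputStream stream = Files.newInputStream(Paths.get(path))) {
            System.out.println(extractPlainText(stream).trim());
        }
    }
}
```

Run it with the path of the HTML file as its argument. AutoDetectParser is used here instead of a specific HTML parser class so that the same code also works for other formats Tika can detect.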

Output:

The following is our HTML file:

// index.html

<html>
<head>
<title>Index Page</title>
</head>
<body>
<h2>Hello, Welcome to Javatpoint. </h2>
</body>
</html>

After extraction, Tika returns the content as plain text. Only the text inside <body> appears; the title "Index Page" from <head> is not included.

Hello, Welcome to Javatpoint.




