
Tika Parsing Document to Plain Text

Tika can return extracted content in several formats, such as plain text, HTML, or XHTML. The ContentHandler implementation passed to the parser determines the form in which the content is returned. To get only the content of the document's body as plain text, we can use BodyContentHandler.
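To make the role of a content handler more concrete: Tika parsers report a document to the handler as a stream of XHTML SAX events, and a body-only handler keeps just the character data found inside the body element. The sketch below illustrates that idea using only the JDK's SAX parser. It is not Tika's BodyContentHandler implementation; the class names BodyTextSketch and BodyTextHandler and the method bodyText are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: a hand-written SAX handler, not Tika's own class.
class BodyTextSketch {

    // Collects character data only while the parser is inside <body>.
    static class BodyTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int bodyDepth = 0; // greater than 0 while inside <body>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Start counting at <body>, then track nesting below it.
            if (bodyDepth > 0 || "body".equalsIgnoreCase(qName)) {
                bodyDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (bodyDepth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Parses well-formed XHTML and returns the text inside <body>.
    static String bodyText(String xhtml) throws Exception {
        BodyTextHandler handler = new BodyTextHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Index Page</title></head>"
                + "<body><h2>Hello, Welcome to Javatpoint. </h2></body></html>";
        // The title sits in <head>, so it is not part of the result.
        System.out.println(bodyText(xhtml).trim());
    }
}
```

The handler ignores everything outside the body element, which is why the page title does not appear in the result.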

Let's see an example in which we get plain text output from an HTML file.

Tika Parsing to Plain Text Example
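A minimal sketch of such a program is given here, assuming the Tika core and parser libraries are on the classpath. The class name TikaPlainTextExample, the helper method extractBodyText, and the default file name index.html are assumptions made for this illustration; the Tika classes and the parse call are part of the public Tika API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class; requires Apache Tika on the classpath.
class TikaPlainTextExample {

    // Parses the stream and returns the text of the document body.
    static String extractBodyText(InputStream input)
            throws IOException, SAXException, TikaException {
        // Passing -1 disables the handler's default write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        // AutoDetectParser detects the document type and delegates
        // to a matching parser.
        AutoDetectParser parser = new AutoDetectParser();
        parser.parse(input, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed file name; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "index.html";
        try (InputStream input = Files.newInputStream(Paths.get(path))) {
            System.out.println(extractBodyText(input).trim());
        }
    }
}
```

BodyContentHandler collects the body text in an internal buffer, so calling toString() on it after parsing returns the extracted plain text.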

Output:

The following is our HTML file.

// index.html

<html>
<head>
<title>Index Page</title>
</head>
<body>
<h2>Hello, Welcome to Javatpoint. </h2>
</body>
</html>

After extracting, it produces the output in plain text.

Hello, Welcome to Javatpoint.




