Apache Tika provides numerous features; some of them are described below.
Support for a Large Number of Document Types
Apache Tika can identify over a thousand document types and can extract both the content and the metadata of a document.
Non-Java Program Accessibility
Tika offers two major tools, a RESTful server and a CLI tool, that allow non-Java programs to access Apache Tika functionality.
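As a rough illustration of the RESTful route, the sketch below builds the HTTP requests a client might send to a running Tika server. It assumes the server listens on its default port, 9998, and uses the `/tika` resource for text extraction and `/detect/stream` for media type detection. The class and method names are made up for this example. Java is used only to keep the examples in one language; the same PUT requests can be issued from any language with an HTTP client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TikaServerClientSketch {
    // Assumption: a Tika server is running locally on its default port.
    static final URI DEFAULT_BASE = URI.create("http://localhost:9998");

    /** Builds a PUT request asking the /tika resource for plain-text output. */
    static HttpRequest buildExtractRequest(URI base, byte[] document) {
        return HttpRequest.newBuilder(base.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /** Builds a PUT request asking /detect/stream for the document's media type. */
    static HttpRequest buildDetectRequest(URI base, byte[] document) {
        return HttpRequest.newBuilder(base.resolve("/detect/stream"))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /** Sends a request and returns the response body; needs a running server. */
    static String send(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        byte[] sample = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = buildExtractRequest(DEFAULT_BASE, sample);
        // Only prints the request line; call send(request) once a server is up.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Building the request is kept separate from sending it so that the example can be inspected without a server; `send` is the only method that touches the network.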
Tika encapsulates all of its third-party parser libraries behind a single parser interface. This relieves the user of having to select a parser library for each format.
Tika is lightweight: it needs little memory and few other resources. It is easy to embed in Java programs and can also run on mobile devices.
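The idea of one interface in front of many format-specific parsers can be sketched in plain Java as follows. This is a toy illustration of the pattern, not Tika's actual API: `DocumentParser`, `DispatchingParser`, and the two lambda parsers are hypothetical names invented here, and in a real setup each implementation would wrap a third-party library.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SingleParserInterfaceSketch {
    /** Hypothetical stand-in for a unified parser interface. */
    interface DocumentParser {
        String extractText(byte[] content);
    }

    /** Routes each document to the parser registered for its media type. */
    static class DispatchingParser {
        private final Map<String, DocumentParser> byType = new LinkedHashMap<>();

        void register(String mediaType, DocumentParser parser) {
            byType.put(mediaType, parser);
        }

        String extractText(String mediaType, byte[] content) {
            DocumentParser parser = byType.get(mediaType);
            if (parser == null) {
                throw new IllegalArgumentException("No parser for " + mediaType);
            }
            return parser.extractText(content);
        }
    }

    /** Wires up two toy parsers; callers never pick one themselves. */
    static DispatchingParser defaultParser() {
        DispatchingParser dispatcher = new DispatchingParser();
        // Plain text: decode the bytes as UTF-8.
        dispatcher.register("text/plain",
                content -> new String(content, StandardCharsets.UTF_8));
        // CSV (toy version): replace field separators with spaces.
        dispatcher.register("text/csv",
                content -> new String(content, StandardCharsets.UTF_8).replace(',', ' '));
        return dispatcher;
    }

    public static void main(String[] args) {
        DispatchingParser parser = defaultParser();
        byte[] csv = "name,age".getBytes(StandardCharsets.UTF_8);
        System.out.println(parser.extractText("text/csv", csv));
    }
}
```

The caller only ever talks to the dispatcher, which is the property the paragraph above describes: adding support for a new format means registering another implementation, not changing calling code.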
MIME & Language Detection
Tika can detect all the media types listed in the MIME standards. It can also identify the language of a text, so it can be used with multilingual documents.
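One common technique behind media type detection is matching the leading "magic" bytes of a file against known signatures. The sketch below shows that idea for a handful of formats in plain Java. It is a simplified illustration with made-up class and method names, not Tika's implementation, which covers far more types and can combine several hints, such as the file name. With Tika itself, the `org.apache.tika.Tika` facade provides `detect` methods for this purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MagicByteDetectorSketch {
    // Well-known file signatures (leading bytes) for a few formats.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    /** Returns a media type guessed from the first bytes of the data. */
    static String detect(byte[] data) {
        if (startsWith(data, PDF)) return "application/pdf";
        if (startsWith(data, PNG)) return "image/png";
        if (startsWith(data, GIF)) return "image/gif";
        if (startsWith(data, ZIP)) return "application/zip";
        // Fallback for anything unrecognized.
        return "application/octet-stream";
    }

    /** True if data begins with the given prefix (range Arrays.equals needs Java 9+). */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```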