In Tika, a document can be parsed either through the Tika facade or through the AutoDetectParser class. Both approaches parse a document without requiring you to choose a format-specific parser.
Apache Tika provides a facade class, org.apache.tika.Tika, for accessing Tika functionality. This class provides simple methods for media type detection, parsing (text extraction), and translation.
The following are the constructors of the Tika facade class.

| Constructor | Description |
|---|---|
| Tika() | Creates a Tika facade using the default configuration. |
| Tika(Detector detector) | Creates a Tika facade using the given detector instance. |
| Tika(Detector detector, Parser parser) | Creates a Tika facade using the given detector and parser instances. |
| Tika(Detector detector, Parser parser, Translator translator) | Creates a Tika facade using the given detector, parser, and translator instances. |
| Tika(TikaConfig config) | Creates a Tika facade using the given configuration. |
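As a minimal sketch of how these constructors might be used, the class below builds a facade in three of the ways listed above. The class name `TikaFacadeConstruction` and its helper methods are hypothetical, chosen only for illustration; the example assumes the Tika core library is on the classpath.

```java
import org.apache.tika.Tika;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.parser.AutoDetectParser;

// Hypothetical illustration class; not part of Tika itself.
public class TikaFacadeConstruction {

    // Facade with the default configuration.
    static Tika defaultFacade() {
        return new Tika();
    }

    // Facade built from an explicit configuration object
    // (here simply the default configuration).
    static Tika fromConfig() {
        return new Tika(TikaConfig.getDefaultConfig());
    }

    // Facade built from explicitly supplied detector and parser instances.
    static Tika fromDetectorAndParser() {
        return new Tika(new DefaultDetector(), new AutoDetectParser());
    }

    public static void main(String[] args) {
        System.out.println(defaultFacade().getDetector().getClass().getName());
        System.out.println(fromConfig().getParser().getClass().getName());
        System.out.println(fromDetectorAndParser().getParser().getClass().getName());
    }
}
```

In most applications the no-argument constructor is enough; the other forms are useful when you want to supply a custom detector, parser, or configuration.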
The following are the methods of the Tika facade class.

| Method | Description |
|---|---|
| public String detect(byte[] prefix) | Detects the media type of a document from the given prefix (its leading bytes). |
| public String detect(Path path) throws IOException | Detects the media type of the file at the given path. |
| public String detect(File file) throws IOException | Detects the media type of the given file. |
| public String detect(URL url) throws IOException | Detects the media type of the resource at the given URL. |
| public String detect(String name) | Detects the media type of a document with the given file name. |
| public String translate(String text, String sourceLanguage, String targetLanguage) | Translates the given text from the source language to the target language. |
| public String translate(String text, String targetLanguage) | Translates the given text to the target language, attempting to detect the source language automatically. |
| public Reader parse(InputStream stream, Metadata metadata) throws IOException | Parses the given document and returns a Reader over the extracted text content; document metadata is stored in the given Metadata object. |
| public Reader parse(InputStream stream) throws IOException | Parses the given document and returns a Reader over the extracted text content. |
| public Reader parse(Path path, Metadata metadata) throws IOException | Parses the file at the given path and returns a Reader over the extracted text content. |
| public String parseToString(InputStream stream, Metadata metadata) throws IOException, TikaException | Parses the given document and returns the extracted text content as a String. |
| public int getMaxStringLength() | Returns the maximum length of strings returned by the parseToString methods. |
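To illustrate a few of these methods, here is a small sketch that detects a media type from a file name and extracts text from an in-memory document. The class name `TikaFacadeMethods` and its helper methods are hypothetical. The example assumes that the Tika parser modules (not only the core library) are on the classpath; without them, text extraction may return an empty string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// Hypothetical illustration class; not part of Tika itself.
public class TikaFacadeMethods {

    // Detects the media type from the file name alone (no content is read).
    static String detectByName(String fileName) {
        return new Tika().detect(fileName);
    }

    // Parses an in-memory document and returns its text content as a String.
    static String extractText(byte[] content) throws IOException, TikaException {
        Tika tika = new Tika();
        Metadata metadata = new Metadata();
        try (InputStream stream = new ByteArrayInputStream(content)) {
            return tika.parseToString(stream, metadata);
        }
    }

    public static void main(String[] args) throws IOException, TikaException {
        System.out.println(detectByName("report.pdf"));

        byte[] document = "Hello from Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(document));

        // Upper bound on the length of strings returned by parseToString.
        System.out.println(new Tika().getMaxStringLength());
    }
}
```

The parseToString methods are convenient for small documents. For large inputs, the parse methods that return a Reader let you consume the extracted text incrementally instead of holding it all in memory.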
In the following example, we extract content from a text file using the Tika facade.