Python Newspaper Module
Most of us are often not interested in reading the complete newspaper or even a complete article. In such cases, we only want to know about keywords, titles, or many such smaller things of the article so that we don't have to spend so much reading a complete article. This also becomes useful when we only want to read selected articles, but we don't know how we can select useful articles for us. We all must be aware of what web scraping is and how it works. We also know that how much web scrapping is important and how it is helpful for us to extract all the useful information from a source website. We can also perform this operation on the newspaper websites too from where we can take a link of an article and scrap and curate useful information from that article. It is possible for us to perform all this by using a Python program, and for this task, Python provides us with a very useful module, i.e., Newspaper Module. We are going to learn about this Newspaper Module of Python in this tutorial, and we will learn how we can use this module to perform newspaper scrapping and curation using a Python program.
Newspaper Module in Python
Newspaper in a Python Module, which is basically designed to curate useful information from a newspaper article. Therefore, we can use Python's Newspaper Module to perform scrapping and curation from an article by problem its web link in the Python program. We can retrieve all the useful information from an article, such as title, keywords, etc., by using the functions of the Newspaper Module. The newspaper Module of Python uses advanced algorithms with web scrapping features so that all the useful text can be extracted from a newspaper website. The Newspaper Module works very amazingly on the online newspaper websites we generally use in our daily life.
Note: We should note that the Newspaper Module performs a web scrapping process on the online newspaper websites. That's why, if we make multiple requests simultaneously from a website, it may lead to blocking from that website. Therefore, we have to use this module accordingly whenever we actually need to use it.
Newspaper Module: Installation
Newspaper Module is not a built-in module in Python, and therefore, we have to install this module first in our system, and only after that can we curate useful information from an article's web link. We can install this Newspaper Module from multiple sources and by using multiple methods, but the method we suggest is using the pip installer. Through pip installer, we can install the Newspaper Module very easily by using the following command in the command prompt terminal:
Once we write the command given above in the terminal shell of our device, we should press enter key to start the installation process, and then we have to wait for a while for the completion of the installation process. Once the installation process of this Newspaper Module is completed, it will show us the following success installation message window in the terminal shell:
We can see, Newspaper Module of Python is successfully installed in our system, and now, we can use functions from Newspaper Module by importing it into a Python program to perform newspaper scrapping.
Newspaper Module: Languages Supported
The Newspaper Module of Python supports it, and that's why it becomes even more popular because one can scrap news articles from their choice of language. Following languages are supported in the Newspaper Module, and their input codes are mentioned with them:
We have to provide the input code of a language while we create an instance for an article's web link in the program. The language code we provide in the program will help the newspaper module to execute and use its specific set of algorithms for a particular language to scrape and curate from the article.
Newspaper Module: Implementation
We have installed the newspaper module in our system, and now we all want to perform its implementation so that we can understand how this module works. Implementation of the newspaper module will also help us to learn how to curate multiple keywords and useful information from the article. But, before we use this newspaper module will in a Python program, we should note that we have to create an instance first for the article link. The instance for the article we create will fetch all the information from the article by using functions of the newspaper module. Therefore, first, we should learn about the syntax of the instance of the article and what parameters are used in it.
Syntax for Creating an Instance:
Following syntax, we have to use in the program to create an instance for the article:
As we can see in the syntax written above, we have used the following two parameters:
We have now learned the syntax for creating an article's instance, and now we can move forward with the implementation part of the newspaper module. We will use the following example program to understand the implementation of this Newspaper Module.
NLTK Module: The nltk module is also very important while performing the newspaper scrapping, and we have to use this module with the newspaper module in order to perform the newspaper scrapping on an article successfully. The nltk module is used to perform NLP on the article's link, and without performing NLP, we cannot curate useful information from the article. Therefore, while using the newspaper module in the example, we also have to use the nltk module with it and download the nltk data using the download('punkt') function in the program. We should also make sure that the ntlk module is installed in our system, and if it is not installed in our system, we can use the following command to install it:
After completing the installation process of the ntlk module, we can move ahead with the implementation part of the Newspaper Module and use it as an example program to perform newspaper scrapping.
Example 1: Look at the following Python program where we have used an article of TOI to perform newspaper scrapping with Newspaper Module:
We have successfully performed scrapping from the piece of article's link given in the code!
We have first imported the article from the newspaper module and nltk module in the program so that we can use the functions of both these modules to perform newspaper scrapping. After that, we gave an article's link (a TOI's latest article) inside the URL variable. Then, we initialized the instance of the article by using the article() function and gave initialized URL as an argument in it. Also, we have provided the language input code with the URL variable in the article() function. After that, we used the download() function to download the article in the program, and then we parsed the article using the parse() function. After that, we performed natural language processing (NLP) on the parsed article piece with the help of the nlp() function. Now, after performing NLP on the parsed article, we were able to print useful information from the article. Therefore, first, we used the ".title" function to print the article's title, and then, we printed text from the article using the ".text" function. Next, we printed the summary of the article using the ".summary" function, and after that, we printed important keywords of the article using the ".keywords" function.
Note: We should note that TOI removes some of their articles from time to time, and therefore, this article link of TOI given in the example may not work in the future. So, while using this example, we have to use a new link of another article.
Newspaper Module: Some Useful Functions
When we performed the scrapping from the article's link using the newspaper module, we used some important functions to complete that task successfully. These functions are very useful to perform newspaper scrapping & curation and to print important information in the output. Here, in this section, we will learn about such important functions of the Newspaper Module, which are given below:
These are all important functions of the Newspaper Module, and we can use them according to our choice of what type of information (like keywords, title, etc.) we want from the article.
It is not possible for all of us to read the complete newspaper and therefore, we want only useful information from the article. Newspaper Module provides us the option from where we can only get useful information of an article by performing newspaper scrapping on the article. We can perform newspaper scrapping using the functions of the Newspaper Module in a Python program and print all the useful information from an article's link in the output.