Python Site Connectivity Checker Project

In this tutorial, we will build a site connectivity checker in Python. It is an interesting project in its own right and a good way to level up your Python skills. We will learn how to handle HTTP requests, create a command-line interface (CLI), and organize our application's code using common Python project layout practices. We will also discuss the asynchronous features that help handle multiple HTTP requests efficiently.

Project Overview

A website connectivity checker is a tool that determines whether a website is reachable. It is useful for identifying when a website is down or unavailable. The user provides the URLs of the websites they wish to check, and the application verifies the connectivity status of each one and displays the results. Our application will take a few options through a minimal command-line interface (CLI). Below is a summary of these options -

-u or --urls - Takes one or more URLs to check.
-f or --file - Takes a text file containing the URLs to check.
-a or --asynchronous - Runs the connectivity checks asynchronously.
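As a rough preview, the three options used throughout this tutorial (-u/--urls, -f/--file and -a/--asynchronous) could be declared with argparse along the following lines. This is only a sketch: the function name build_parser, the help texts, the metavar values and the program name are assumptions, and the tutorial's own CLI code may differ.

```python
import argparse


def build_parser():
    """Build a parser for the site checker's three options (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="site_checker",  # assumed from the `python -m site_checker` command
        description="Check whether websites are online",  # assumed wording
    )
    # The tutorial describes --urls and --file as mutually exclusive.
    source = parser.add_mutually_exclusive_group()
    source.add_argument(
        "-u", "--urls", nargs="+", default=[], metavar="URL",
        help="one or more URLs to check",
    )
    source.add_argument(
        "-f", "--file", metavar="PATH",
        help="text file containing the URLs to check",
    )
    parser.add_argument(
        "-a", "--asynchronous", action="store_true",
        help="run the connectivity checks asynchronously",
    )
    return parser
```

Calling build_parser().parse_args(["-u", "python.org", "pypi.org"]) would yield a namespace whose urls attribute holds the two host names and whose asynchronous attribute is False.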
By default, our application will run synchronously, which means it will check the sites one by one. To run the connectivity checks concurrently, we can use the -a or --asynchronous option, which relies on Python's asynchronous features and the aiohttp library. Asynchronous checks can make the website connectivity checker faster and more efficient when there are many sites to check.

Prerequisites

Before moving further, you should have a basic understanding of Python. Additionally, familiarity with the topics this project covers is helpful: handling HTTP requests, building command-line interfaces, and Python's asynchronous features.
Knowing the aiohttp library is beneficial but optional for this project. If you are new to the library, don't be discouraged: attempting the project is a good way to learn it, and you can always refer to its documentation if needed.

Before diving into the coding part of the website connectivity checker project, it is important to set up a proper working environment and organize the project files. With the project overview and prerequisites in mind, we can begin preparing our workspace and establishing a project layout that works best for you. This will make coding easier and help the project run smoothly.

Setup Site Connectivity Checker in Python

This section sets up the structure of the site connectivity checker app. First, we will create a Python virtual environment for the project to isolate its dependencies from other projects. Next, we will set up the project's layout by creating the required files and the directory structure.

Set Up Development Environment

First, create a virtual environment and activate it, for example by running python -m venv venv in the project's root directory and then running the activation script for your platform (venv\Scripts\activate on Windows, or source venv/bin/activate on Linux and macOS). Next, install aiohttp into the virtual environment with pip, the standard package manager, by running python -m pip install aiohttp. The aiohttp library will be used together with Python's async functionality to handle asynchronous HTTP requests in our site connectivity checker app.

Organize Site Connectivity Checker Project

Python offers flexibility when organizing applications, so you may come across various structures in different projects. However, a common structure for small installable Python projects is to have a single package, which is usually named after the project itself. In this project, the root directory my_project/ contains a single package named site_checker/. Inside the site_checker/ directory, you'll have the following files:

__init__.py - It enables site_checker/ as a Python package.
__main__.py - It works as an entry-point script for the app.
checker.py - It provides the application's core functionality.
cli.py - It contains the command-line interface for the application.

Check Website's Connectivity in Python

Before going further, we add the application's version to the __init__.py file. The version is stored in a module-level variable, conventionally named __version__, which holds the current version number of the project. As this is a new application, the initial version is set to 0.1.0. With this basic setup in place, we can begin working on the application's core functionality, which is checking a website's connectivity.

Implement a Connectivity Checker Function

There are multiple options in Python for checking a website's availability, including third-party libraries such as requests. The requests library is popular because it provides a user-friendly API for making HTTP requests, which makes it easy to check the status of a website and determine whether it is online. In this tutorial, however, we will use the standard library's urllib package, which provides several tools for handling HTTP requests. To check whether a website is online, we can use the urlopen() function from the urllib.request module.
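A minimal sketch of this approach is shown here. The helper name fetch_page and the 10-second default timeout are assumptions made for illustration; only urlopen() and the read() method come from the standard library.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download the full body of ``url`` and return it as bytes."""
    # For HTTP(S) URLs, urlopen() returns an http.client.HTTPResponse.
    # Using it as a context manager closes the connection afterwards.
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, print(fetch_page("https://www.python.org")) would print the raw HTML of the python.org home page as a bytes object.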
Calling urlopen() on python.org and reading the response produces output like the following:

Output:
b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr"> <!--<![endif]-->\n\n<head>\n <!-- Google tag (gtag.js) -->\n <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>\n <script>\n window.dataLayer = window.dataLayer || [];\n function gtag(){dataLayer.push(arguments);}\n gtag(\'js\', new Date());\n gtag(\'config\', \'G-TF35YF9CVH\');\n </script>\n\n <meta charset="utf-8">\n <meta http-equiv="X-UA-Compatible" content="IE=edge">\n\n <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">\n <link rel="prefetch" href="//ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js">\n\n <meta name="application-name" content="Python.org">\n <meta name="msapplication-tooltip" content="The official home of the Python Programming Language">\n <meta name="apple-mobile-web-app-title" content="Python.org">\n .............

Explanation - Here, the urlopen() function takes a URL as an argument and returns a response object, and the response.read() method returns the body of the page as bytes. However, we only want to check whether the website is online, so downloading the entire page is wasteful. To make the check more efficient, we can use the HTTPConnection class from the http.client module, which lets us make HTTP requests with different HTTP methods. In particular, we can use the HEAD method to ask for a response containing only the headers of the target website. Let's understand the following code.
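A minimal sketch of such a HEAD request, assuming plain HTTP on port 80 and a 10-second timeout as in the explanation that follows. The helper name head_headers is hypothetical and is used here only for illustration.

```python
from http.client import HTTPConnection


def head_headers(host, port=80, timeout=10):
    """Send a HEAD request for "/" and return (status, headers)."""
    # Creating the object does not connect yet; request() opens the socket.
    connection = HTTPConnection(host, port=port, timeout=timeout)
    try:
        connection.request("HEAD", "/")
        response = connection.getresponse()
        # getheaders() returns a list of (name, value) tuples.
        return response.status, response.getheaders()
    finally:
        connection.close()
```

Calling head_headers("pypi.org") and printing the second element of the result shows the response headers as a list of (name, value) tuples.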
Sending a HEAD request to pypi.org and printing the response headers produces output like the following:

Output:
[('Connection', 'close'), ('Content-Length', '0'), ('Server', 'Varnish'), ('Retry-After', '0'), ('Location', 'https://pypi.org/'), ('Accept-Ranges', 'bytes'), ('Date', 'Sun, 29 Jan 2023 11:51:48 GMT'), ('X-Served-By', 'cache-maa10221-MAA'), ('X-Cache', 'HIT'), ('X-Cache-Hits', '0'), ('X-Timer', 'S1674993109.864158,VS0,VE0'), ('X-Frame-Options', 'deny'), ('X-XSS-Protection', '1; mode=block'), ('X-Content-Type-Options', 'nosniff'), ('X-Permitted-Cross-Domain-Policies', 'none')]

Explanation - This example uses the HTTPConnection class from the http.client module to check the availability of the website at pypi.org. The HTTPConnection class creates a connection to an HTTP server, in this case pypi.org, on port 80 with a timeout of 10 seconds. Then, a HEAD request is sent to the server using the request() method, passing "HEAD" as the request method and "/" as the resource path. Finally, the response to the HEAD request is retrieved using the getresponse() method, and the response headers are printed using the getheaders() method. The headers contain information about the response, such as the server, the content length, and, for a redirect like this one, the new location. The status code is not part of the headers; it is available separately on the response object.

Now we will implement the checker in the checker.py file.

Explanation - The code defines a function named check_site_is_online() that takes a URL and a timeout as input and returns True if the website is online, or raises an exception if the website is offline or an unknown error occurs. The function uses the urlparse() function from the urllib.parse module to parse the URL and extract the host. If the host is not found in the network location (parser.netloc), it is taken from the first segment of the resource path (parser.path), which is where urlparse() places the host when the URL has no scheme. The function then tries to connect to the host on ports 80 and 443, with a timeout of 2 seconds by default (which can be changed by passing a different value to the timeout parameter), using the HTTPConnection class from the http.client module.
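Read together with the rest of this explanation, the function might look roughly like the sketch below. It is reconstructed from the prose rather than copied from the original listing, so details such as the "unknown error" fallback message and the local variable names are assumptions.

```python
from http.client import HTTPConnection
from urllib.parse import urlparse


def check_site_is_online(url, timeout=2):
    """Return True if the site answers on port 80 or 443, else raise."""
    error = Exception("unknown error")  # assumed fallback message
    parser = urlparse(url)
    # "python.org" has no scheme, so urlparse() puts it in .path, not .netloc.
    host = parser.netloc or parser.path.split("/")[0]
    for port in (80, 443):
        connection = HTTPConnection(host=host, port=port, timeout=timeout)
        try:
            connection.request("HEAD", "/")
            return True
        except Exception as exc:
            error = exc  # remember the failure and try the next port
        finally:
            connection.close()
    # Neither port accepted the request, so report the last failure.
    raise error
```

Note that this sketch treats a site as online as soon as the HEAD request can be sent on either port; it does not inspect the status code of the response.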
A HEAD request is sent to the server using the request() method, and if the request succeeds, the function returns True. If an exception is raised, it is captured and stored in the error variable. In either case the connection is closed, and if no request succeeds on either port, the stored error is raised. This allows the function to detect whether the website is offline or another error occurred, and to raise the appropriate exception.

Run First Connectivity Checker

We will now call our check_site_is_online() function to check whether it works. Calling it for two websites that are online produces the following output:

Output:
True
True

Create Our Website Connectivity Checker's CLI

So far, we have implemented the site checker, which verifies whether a website is online by performing an HTTP request using the http.client module from the standard library. In this section, we will add a minimal CLI that allows us to run our website connectivity checker app from the command line. The CLI will accept URLs directly on the command line and will also be able to load a list of URLs from a text file. The application will report the result for each URL with a user-friendly message. The code for this resides in the __main__.py file; it takes the arguments from the command line and prints the output.

Output:
(venv) C:\Users\User\Desktop\my_project>python -m site_checker -u python.org pypi.org javatpoint.com
python.org is online.
pypi.org is online.
javatpoint.com is online.

(venv) C:\Users\User\Desktop\my_project>python -m site_checker -u python.org pypi.org non-existing-site.org
python.org is online.
pypi.org is online.
non-existing-site.org is offline.

(venv) C:\Users\User\Desktop\my_project>vim sample.txt
python.org
javatpoint.com
google.com
geeksforgeek.com
pypi.org
docs.python.org
peps.python.org

(venv) C:\Users\User\Desktop\my_project>python -m site_checker -f sample.txt
python.org is online.
javatpoint.com is online.
google.com is online.
geeksforgeek.com is online.
pypi.org is online.
docs.python.org is online.
peps.python.org is online.

Explanation - This code uses the argparse module to create a command-line interface for the application. The parser object is created with a description of the application, and two mutually exclusive arguments are added: --urls and --file. The --urls argument lets the user provide a list of URLs as command-line arguments, while the --file argument lets the user provide a file containing a list of URLs. If the --urls argument is provided, the list of URLs is stored in the urls variable. If the --file argument is provided, the file is read and the URLs it contains are stored in the urls variable. If neither argument is provided, the parser.print_help() method is called to display the help information, and the function returns. Finally, the code loops through the urls list and calls the check_site_is_online() function for each URL, printing the status of each website to the console.

Our website connectivity checker now works as intended. Running the site_checker app with the -h or --help option displays a usage message that outlines how to use the application. The application lets you check the connectivity of multiple websites by either providing the URLs at the command line or loading them from a text file. If an error occurs during the connectivity check, a descriptive message is displayed on the screen indicating the cause of the error.

Check Multiple Websites Asynchronously

Here we will write the code that checks websites asynchronously, so that multiple checks run concurrently instead of one after another.

Output:
(venv) C:\Users\User\Desktop\my_project>python -m site_checker -u python.org pypi.org javatpoint.com
pypi.org is online.
javatpoint.com is online.
python.org is online.

Note that the results can appear in a different order from the one in which the URLs were given, since concurrent checks may finish in any order.

Conclusion

In this tutorial, we have built a functional site connectivity checker application in Python. We have learned the basics of handling HTTP requests to a given website.
We have also implemented the command-line interface with argparse and discussed how to check whether a website is online using Python's http.client module. Finally, we have checked multiple websites both synchronously and asynchronously.