Python Tutorial

Many of us have wondered why Python is growing so rapidly compared with other programming languages. Python has indeed become popular in a very short span of time, and today we can see it applied in nearly every field where technology is used. Its use is not limited to programming or software development: Python is also used in fields such as medicine, business, defence and e-commerce. The main reasons behind its growth and reach are its simplicity and the numerous libraries available for it, and it has contributed to the progress of many of these fields.

Among the fields mentioned above is the medical field, and many of us may wonder how Python can help there. Its role is not limited to the medical equipment used in hospitals or clinics; Python is also used in other areas related to medicine. One important such area is bioinformatics, which should not be confused with genetics or biotechnology.

Note: Bioinformatics is an interdisciplinary field that includes studies from biology and various other fields such as computer science, mathematics, physics, etc.

Python has a well-known package for bioinformatics, Biopython, and its use is growing as more scientists adopt it for their research. In this tutorial, we are going to study the Biopython module, learn how to install it, and see through an example how it can be used for research work in bioinformatics.

Biopython Module

Biopython is one of the largest and most popular bioinformatics packages, both in Python and among comparable libraries for other programming languages. It contains many sub-packages for performing common bioinformatics tasks. Biopython is written mainly in Python, but it also contains some C code, which is used to speed up the computationally intensive parts of the package. Its early developers included Chang and Chapman. Biopython runs on multiple operating systems, such as Windows, Linux, UNIX and macOS.

Before we start learning the Biopython module, we should have a basic idea of bioinformatics terms such as DNA, RNA, protein sequences and genome sequences; otherwise, it will be difficult to understand how the module and its functions work. We should also make sure that a recent version of Python is installed on our system and that we are familiar with the pip installer.

Biopython Module: Introduction

The Biopython module is a collection of Python modules that provide functions for working with biological sequences such as DNA, RNA and proteins. Typical sequence operations include finding motifs in a protein sequence and reverse-complementing a DNA sequence. Biopython provides many parsers, and with their help we can read the major biological data formats, such as Swiss-Prot, GenBank and FASTA. It also provides wrappers/interfaces for running or accessing other popular bioinformatics tools and services, such as NCBI BLAST and Entrez. With the Biopython module, we can do all of this inside a Python program.
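The following is a minimal sketch of the sequence operations mentioned above. It assumes that Biopython is installed, and the DNA string is an arbitrary example rather than a real gene. It shows how the `Seq` class can reverse-complement, transcribe and translate a sequence, and how it can search that sequence for a short motif:

```python
from Bio.Seq import Seq

# An arbitrary example DNA sequence (24 bases, i.e. 8 codons)
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

# Reverse complement of the DNA strand
rev_comp = dna.reverse_complement()

# Transcription: DNA -> mRNA (T is replaced by U)
mrna = dna.transcribe()

# Translation with the standard codon table, stopping at the first stop codon
protein = dna.translate(to_stop=True)

# Simple motif search: index of the first "GGCC" (-1 if it is absent)
motif_pos = dna.find("GGCC")

print("Reverse complement:", rev_comp)
print("mRNA:", mrna)
print("Protein:", protein)
print("First GGCC at index:", motif_pos)
```

Here `translate(to_stop=True)` drops the final stop codon (TGA) instead of returning a `*` character.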

Biopython Module: Features

By now, we have an idea of how important the Biopython module is and how helpful it is for anyone working in bioinformatics. Let us now discuss the features that the Biopython module offers and for which it is known. The following are its salient features:

Biopython Module has an easy-to-learn, portable and clear syntax, since it is written in Python.
Biopython Module lets us work with many sequence formats, including protein sequence formats.
Biopython Module provides various tools for managing different types of protein structures.
Biopython Module is object-oriented, interpreted and interactive, just like Python.
Biopython Module provides access to locally installed bioinformatics tools such as ClustalW, BLAST and EMBOSS.
Biopython Module supports various bioinformatics file formats, such as SCOP, FASTA, Medline/PubMed, PDB, ExPASy-related formats and GenBank.
With Biopython Module, we can access various online services and databases, such as NCBI services (including PubMed, BLAST and Entrez) and ExPASy services (such as Prosite and Swiss-Prot).
Biopython Module also supports BioSQL, a standard set of SQL tables that we can use to store sequences along with their features and annotations.
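To illustrate the file-format support listed above, the following sketch converts between two formats. It assumes that Biopython is installed, and the two records are hypothetical and read from an in-memory string. The sketch reads FASTA text with `Bio.SeqIO` and writes the same records in SeqIO's simple tab-separated `"tab"` format (one `id<TAB>sequence` line per record):

```python
from io import StringIO
from Bio import SeqIO

# A small hypothetical FASTA text used only for illustration
fasta_text = """>seq1 first example
ACGTACGT
>seq2 second example
MKLV
"""

# Parse the FASTA records from an in-memory handle
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Write the same records in the "tab" format; SeqIO.write returns the record count
out_handle = StringIO()
count = SeqIO.write(records, out_handle, "tab")
tab_text = out_handle.getvalue()

print(count, "records converted")
print(tab_text)
```

The same `SeqIO.parse`/`SeqIO.write` pair works with the other format names that SeqIO supports. The "tab" format keeps only the ID and the sequence, so the descriptions are dropped.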

We have now seen the salient features of the Biopython module and can understand how useful it is for work in the bioinformatics field.

Biopython Module: Goals

Biopython is one of the most widely used Python packages for fieldwork and research in bioinformatics, and it was built with some clear goals in mind. In general, Biopython was built to provide standard, simple and extensive access to the data and tools required for bioinformatics work through the Python language. Beyond this general aim, there were several more specific goals, which we list in this section.

Following is the list of all the major or specific goals for building Biopython Module:

Biopython Module was built for helping in performing genomic data analysis.
Biopython Module was built with the goal of providing high-quality & reusable scripts and modules.
One of the goals for building the Biopython Module was to provide standardized access to all bioinformatics resources.
Biopython Module was also built to provide fast array manipulation, which can be used in PDB, Markov model, Naive Bayes and clustering code.

So, these are all the specific and major goals for which Biopython Module was built and introduced in Python as a package for bioinformatics.

Biopython Module: Advantages

We have now seen the features of the Biopython module and how helpful it is for everyone working in bioinformatics. Some of its advantages follow directly from those features and goals, but others are not obvious from them. Therefore, in this section, we will look at the advantages of the Biopython module in more detail.

Following are some of the advantages of using the Biopython Module for all the studies and work related to Bioinformatics:

Biopython Module supports the microarray data types used in clustering.
Biopython Module supports journal data, such as Medline/PubMed records, used in medical research.
Biopython Module comes with clear, cookbook-style documentation.
Biopython Module can read and write files in the format used by the Tree-view program.
Biopython Module supports parser development by providing sub-modules that parse a bioinformatics file either into a generic sequence-plus-features class or into a format-specific record object.
Biopython Module supports macromolecular structure data, including PDB representation, analysis and parsing.
Biopython Module supports bioinformatics databases such as BioSQL (a database schema shared as a standard among the Bio* projects, such as BioPerl, BioJava and Biopython).
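The "generic class of sequence plus features" mentioned above is `SeqRecord`, which wraps a `Seq` object. The following sketch builds one by hand (the ID, description and annotation value are hypothetical) to show the kind of object that SeqIO's parsers return. It then renders the record as FASTA text:

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A hypothetical record built by hand; SeqIO parsers return the same kind of object
record = SeqRecord(
    Seq("MKLV"),
    id="sample1",
    description="hypothetical protein",
)

# Attach a free-form annotation (annotations is a plain dictionary)
record.annotations["source"] = "illustration only"

# Render the record as FASTA text (header line, then the sequence)
fasta_text = record.format("fasta")
print(fasta_text)
```

Note that the FASTA output keeps only the ID, description and sequence. The annotation is stored on the object but is not written to this format.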

These are the main advantages of using the Biopython module, and they show how helpful and useful it is for everyone working in bioinformatics.

Biopython Module: Installation

Now, we will learn how to use the Biopython module in a Python program. First, we have to install it on our system; only then can we import it and use its functions. In this section, we will go through the installation process, and before that we will check the version of Python installed on our device. Very old Biopython releases supported Python 2.5 and later, but current releases support only Python 3 (Biopython 1.76 was the last release to support Python 2). Therefore, we should first make sure that a recent Python 3 version is installed on our system.

If we don't know which version of Python is installed on our system, we can check it with the following command in the command prompt/terminal (on some systems, the command is python3 instead of python):

python --version

When we press the Enter key, the version of Python installed on our system is displayed.

If the displayed version is not a Python 3 release supported by current Biopython, we should first update Python and only then proceed with the installation.

Note: There are other ways to check the installed Python version, but we prefer this one because it is the simplest.
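One alternative is to check the version from inside Python with the standard `sys` module. The following is a minimal sketch; it checks only the major version, not the exact minimum minor version that a particular Biopython release needs:

```python
import sys

# Full version string of the running interpreter
print(sys.version)

# Current Biopython releases need Python 3; this only checks the major version
is_python3 = sys.version_info >= (3,)
print("Running Python 3 or later:", is_python3)
```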

After checking the Python version, we can install the Biopython module using the pip installer. We use the following command in the command prompt/terminal:

pip install biopython

When we press the Enter key after writing the command, pip downloads and installs the Biopython module on our system.

Biopython Module is now successfully installed in our system, and now we can import it into a Python program to use its functions and learn its implementation.
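A quick way to confirm that the installation worked is to import the package, print its version string and build a tiny sequence. The following is a minimal smoke test, and the test sequence is arbitrary:

```python
import Bio
from Bio.Seq import Seq

# Print the installed Biopython version string
print("Biopython version:", Bio.__version__)

# A tiny smoke test: build a sequence and take its length
check = Seq("ACGT")
print("Length of test sequence:", len(check))
```

If the import raises `ModuleNotFoundError`, pip most likely installed Biopython for a different Python interpreter than the one running the script.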

Biopython Module: Implementation

To learn how the Biopython module parses bioinformatics files, we first create a sample FASTA file. (FASTA is a text-based sequence format that originated with the FASTA software package.) A FASTA file stores sequences one after another. Each record starts with a header line beginning with '>', which holds the sequence ID and description. The header is followed by one or more lines of the actual sequence data.
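The following sketch shows how Biopython maps a FASTA header onto record attributes. The single record is hypothetical and is read from an in-memory string instead of a file. The FASTA parser takes the first whitespace-separated word after '>' as both `id` and `name`, keeps the whole header line as `description`, and joins the sequence lines into one sequence:

```python
from io import StringIO
from Bio import SeqIO

# One hypothetical FASTA record: a '>' header line followed by two sequence lines
fasta_text = """>demo|X0001|DEMO_1 an example description
MKLVQ
TIGAD
"""

# SeqIO.read expects exactly one record in the handle
record = SeqIO.read(StringIO(fasta_text), "fasta")

print("id:", record.id)
print("name:", record.name)
print("description:", record.description)
print("sequence:", record.seq)
```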

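We first have to open Notepad on our system and write the following content in it:

>sampleFile|P2426|FMS1_ECOLI CS1 is a fimbrial subunit of the precursor (Have CS1 pilin)
MKLKKTIGADALATLFATMGASAVEKTISVTASVDMTVDLLQSDGSALPNSVALTYSPAVNNFEAHTINTVVQTNDSDKGVVVKLSAMPVLSNVLNPTLQIPVSVNFAGKPLSTTGITIDSNDLNFASSGVNKVSMTQKLSIHADATRVTGGALTAGQYQGLVSIILTKSTTTTTTTKGT
>sampleFile|P2631|FMS3_ECOLI CS3 is a fimbrial subunit of the precursor (Have CS3 pilin)
MLKIKYLLIGLSKSAMSSYSLAAAGPTLTKELALTVLSPAALDATWAPQDNLTLSNTGVSNTLVGVLTLSNTSIDTVSIANTNVSDTSKNGTVTFAHETNNSASFATTISTDNANITLDKNAGNTIVKTTNGSPLPTNLPLKFITTEGNEHLVSGNYRANITITSTIKGGGTKKGTTDKK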

Now, we save this file with the name 'SampleFile1.fasta' in the directory from which we will run our Python program (the current working directory), so that we don't have to write the full path when opening the file. When saving from Notepad, we choose "All Files" as the file type so that '.txt' is not appended to the name. It's now time to use the Biopython module in a Python program and learn how it works by parsing the sample FASTA file we created.

Look at the following Python program where we have parsed the sample fasta file using functions of Biopython Module:

# Importing required functions and classes from the Biopython module
from Bio.SeqIO import parse
from Bio.SeqRecord import SeqRecord  # class of each parsed record
from Bio.Seq import Seq  # class of each record's sequence (record.seq)

# Open the sample FASTA file we have created; "with" closes it automatically
with open("SampleFile1.fasta") as sampleFile:
    # Parsing the file in the Python program (yields one SeqRecord per sequence)
    parseRecords = parse(sampleFile, "fasta")

    # Using a for loop to print the attributes of each record
    for record in parseRecords:
        # Printing multiple attributes of the record
        print("Id of FASTA File: %s" % record.id)
        print("Name of FASTA File: %s" % record.name)
        print("Description of FASTA File: %s" % record.description)
        print("Annotations in FASTA File: %s" % record.annotations)
        print("Sequence Data in FASTA File: %s" % record.seq)
        print()  # blank line between records

Output:

Id of FASTA File: sampleFile|P2426|FMS1_ECOLI
Name of FASTA File: sampleFile|P2426|FMS1_ECOLI
Description of FASTA File: sampleFile|P2426|FMS1_ECOLI CS1 is a fimbrial subunit of the precursor (Have CS1 pilin)
Annotations in FASTA File: {}
Sequence Data in FASTA File: MKLKKTIGADALATLFATMGASAVEKTISVTASVDMTVDLLQSDGSALPNSVALTYSPAVNNFEAHTINTVVQTNDSDKGVVVKLSAMPVLSNVLNPTLQIPVSVNFAGKPLSTTGITIDSNDLNFASSGVNKVSMTQKLSIHADATRVTGGALTAGQYQGLVSIILTKSTTTTTTTKGT

Id of FASTA File: sampleFile|P2631|FMS3_ECOLI
Name of FASTA File: sampleFile|P2631|FMS3_ECOLI
Description of FASTA File: sampleFile|P2631|FMS3_ECOLI CS3 is a fimbrial subunit of the precursor (Have CS3 pilin)
Annotations in FASTA File: {}
Sequence Data in FASTA File: MLKIKYLLIGLSKSAMSSYSLAAAGPTLTKELALTVLSPAALDATWAPQDNLTLSNTGVSNTLVGVLTLSNTSIDTVSIANTNVSDTSKNGTVTFAHETNNSASFATTISTDNANITLDKNAGNTIVKTTNGSPLPTNLPLKFITTEGNEHLVSGNYRANITITSTIKGGGTKKGTTDKK

Explanation:

First, we imported the Biopython tools used in the program, namely parse, SeqRecord and Seq, using the 'from' keyword. Then, we opened the sample FASTA file in the program using the open() function. After that, we called the parse() function on the file object stored in the sampleFile variable. Finally, we looped over parseRecords (an iterator over the parsed records) to print the different properties and attributes of each record in the file.

We displayed the following attributes of each record (a SeqRecord object) returned by the Biopython module:

We printed the ID of each sequence using record.id,
We printed the name of the sequence using record.name,
We used record.description to print the description of the sequence,
We used record.annotations to print the annotations of the sequence (an empty dictionary for these FASTA records),
and last, we used record.seq to print the sequence data.

As we can see in the output, all these attributes were printed successfully, first for the first sequence in the file and then for the second.

This sample example shows how we can use the Biopython module in bioinformatics work and how it helps parse bioinformatics files in a Python program.
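When a file contains many records, it is often convenient to look them up by ID instead of looping over them in order. The following sketch uses `SeqIO.to_dict`, which builds a dictionary keyed by `record.id`. It assumes a small hypothetical two-record FASTA text held in memory. With the sample file, the `StringIO(...)` handle would be replaced by `open("SampleFile1.fasta")`:

```python
from io import StringIO
from Bio import SeqIO

# Hypothetical two-record FASTA text (a file handle from open() works the same way)
fasta_text = """>rec1 first record
MKLKKTIGAD
>rec2 second record
MLKIKYLL
"""

# Build a dictionary keyed by record.id (IDs must be unique)
records = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Report each record's ID and sequence length
for record_id, record in records.items():
    print(record_id, len(record.seq))

# Look up one record directly by its ID
second = records["rec2"]
print(second.description)
```

`SeqIO.to_dict` loads every record into memory. For very large files, `SeqIO.index` is the lower-memory alternative.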
