Python Tutorial

In this tutorial, we will learn about vectorization in Python. Modern systems often need to process large amounts of data, and processing a large data set with pure Python loops is slow compared to languages like C/C++. Vectorization is a technique that helps close this gap. Let's understand more about it.

What is Vectorization?

Vectorization is a technique for making Python programs faster by replacing explicit for loops with operations that act on whole arrays at once. Libraries such as NumPy provide many built-in functions that run in optimized, compiled code and so reduce the running time. Vectorized array operations are usually much faster than their pure Python equivalents, and the impact is greatest in numerical computations. Python for loops are slower than loops in languages such as C/C++, mainly because Python is dynamically typed and interpreted: every iteration carries type-checking and object overhead, and the loop does not benefit from compiler-level optimizations.
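As a small illustration of the idea, the sketch below squares the elements of an array twice: once with an explicit loop and once with a single array expression. The variable names are chosen for this example only.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop version: one Python-level multiplication per element
squares_loop = []
for v in values:
    squares_loop.append(int(v) * int(v))

# Vectorized version: one expression applied to the whole array
squares_vec = values * values

print(squares_loop)          # [0, 1, 4, 9, 16]
print(squares_vec.tolist())  # [0, 1, 4, 9, 16]
```

Both versions produce the same values. The difference is where the loop runs: in the interpreter for the first version, and inside NumPy's compiled code for the second.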

Several operations can be performed over vectors. This tutorial looks at three of them: the dot product (also known as the scalar product), the outer product, and the element-wise product.

For each operation, we compare the processing time of a classic loop-based implementation with that of the equivalent library function. This comparison shows whether the library function offers a performance advantage over the classic approach.
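A single timing run can be noisy, so one possible way to make such comparisons more stable is to repeat each measurement and keep the best result. The sketch below uses the standard timeit module. The helper name best_time_ms and the two sample functions are hypothetical and exist only for this illustration.

```python
import timeit
import numpy as np

def best_time_ms(func, repeat=5, number=10):
    """Return the best average time per call, in milliseconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return 1000 * min(timings) / number

data = np.arange(10_000, dtype=np.int64)

def loop_sum():
    # Classic approach: add the elements one at a time
    total = 0
    for x in data:
        total += x
    return total

def vectorized_sum():
    # Library approach: let NumPy sum the whole array
    return data.sum()

loop_ms = best_time_ms(loop_sum)
vec_ms = best_time_ms(vectorized_sum)
print(f"loop: {loop_ms:.3f} ms, vectorized: {vec_ms:.3f} ms")
```

The absolute numbers depend on the machine, so only the ratio between the two timings is meaningful.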

Dot Product

The dot product, also known as the inner product, is an algebraic operation that multiplies the corresponding elements of two vectors of equal length and sums the products to produce a single scalar value. This operation is often used in linear algebra and vector calculations.

To calculate the dot product, let's consider two column vectors, a and b, with the same length. The dot product is obtained by taking the transpose of the first vector (`a'`, which is a row vector) and performing matrix multiplication with b, as illustrated below:

      a'           b
                 [b1]
 [a1 a2 a3]  *   [b2]   =   a1*b1 + a2*b2 + a3*b3
                 [b3]

The result of this multiplication is a scalar value, representing the dot product of the two vectors. The dot product is a fundamental operation used in various mathematical and computational applications, including vector projections, calculating angles between vectors, and determining vector similarities.
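To make the definition concrete, here is a minimal sketch with two small vectors. It computes the dot product in three equivalent ways and then uses it to find the angle between the vectors through cos(theta) = a.b / (|a| |b|). The sample values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Three equivalent ways to compute the dot product of 1-D arrays
dot_sum = np.sum(a * b)   # multiply element-wise, then add up
dot_func = np.dot(a, b)
dot_op = a @ b            # matrix-multiplication operator

print(dot_sum, dot_func, dot_op)  # 32.0 32.0 32.0

# Angle between the vectors, derived from the dot product
cos_theta = dot_func / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(cos_theta))
print(f"angle = {theta_deg:.2f} degrees")
```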

Let's understand the following code.

Example -

import time
import numpy as np

# Create arrays a and b (int64 so the products cannot overflow)
a = np.arange(100000, dtype=np.int64)
b = np.arange(100000, 200000, dtype=np.int64)

# Classic dot product of vectors implementation
tic = time.process_time()
dot = 0  # integer accumulator, so the result prints without a trailing ".0"

for i in range(len(a)):
    dot += a[i] * b[i]

toc = time.process_time()

print("dot_product = " + str(dot))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

# Numpy dot product of arrays
n_tic = time.process_time()
n_dot_product = np.dot(a, b)
n_toc = time.process_time()

print("\nn_dot_product = " + str(n_dot_product))
print("Computation time = " + str(1000 * (n_toc - n_tic)) + "ms")

Output:

dot_product = 833323333350000
Computation time = 40.90276299999997ms

n_dot_product = 833323333350000
Computation time = 0.6066500000002533ms

Explanation -

In the above code, we use the numpy library to create arrays a and b. The classic dot product is computed using a loop. Then, we measure the computation time using time.process_time().

After that, we use numpy.dot() to calculate the dot product directly between arrays a and b. The computation time for this numpy implementation is also measured.

This code allows us to compare the computation time and results between the classic method and the numpy method for calculating the dot product of the two arrays.

Outer Product

The outer product, also known as the tensor product, refers to the operation performed on two coordinate vectors. When considering vectors a and b with dimensions of n x 1 and m x 1, respectively, the outer product results in a rectangular matrix of size n x m. However, if the two vectors have the same dimensions, the resultant matrix will be a square matrix.

In the context of the outer product, the vectors a and b are treated as columns, and the resulting matrix is obtained by taking all possible combinations of multiplying elements from a with elements from b. Each element of a is multiplied by every element of b, resulting in a matrix where the (i, j)th entry corresponds to the product of the ith element of `a` and the jth element of `b`.

This operation is commonly used in linear algebra and matrix calculations. The resulting matrix can provide useful information about the relationship between the elements of a and b, and it appears in computations such as covariances, correlations, and outer product decompositions.

Consider two vectors `a` and `b`:

a = [a1, a2, a3]   (n x 1 vector)
b = [b1, b2, b3]   (m x 1 vector)

The outer product of `a` and `b` results in a matrix of size `n x m` (here 3 x 3). The entry in row i and column j is obtained by multiplying the ith element of `a` by the jth element of `b`.

Resultant Matrix (n x m):

| a1*b1  a1*b2  a1*b3 |
| a2*b1  a2*b2  a2*b3 |
| a3*b1  a3*b2  a3*b3 |
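The same 3 x 3 layout can be reproduced with small sample vectors. This sketch computes the outer product with np.outer and, as an alternative, with broadcasting, where a column of shape (3, 1) is multiplied by a row of shape (1, 3). The sample values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

outer_func = np.outer(a, b)

# Same result via broadcasting: a column (3 x 1) times a row (1 x 3)
outer_broadcast = a[:, np.newaxis] * b[np.newaxis, :]

print(outer_func)
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]
print(np.array_equal(outer_func, outer_broadcast))  # True
```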

Let's understand the following coding example.

Example -

import time
import numpy as np

a = np.arange(200, dtype=np.int64)
b = np.arange(200, 400, dtype=np.int64)

# Classic outer product of vectors implementation
tic = time.process_time()
# Integer dtype so the result matches the output of np.outer(a, b)
outer_product_classic = np.zeros((len(a), len(b)), dtype=np.int64)

for i in range(len(a)):
    for j in range(len(b)):
        outer_product_classic[i, j] = a[i] * b[j]

toc = time.process_time()

print("Classic outer product:")
print("outer_product_classic = \n" + str(outer_product_classic))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

# Numpy outer product of arrays
n_tic = time.process_time()
outer_product_numpy = np.outer(a, b)
n_toc = time.process_time()

print("\nNumpy outer product:")
print("outer_product_numpy = \n" + str(outer_product_numpy))
print("Computation time = " + str(1000 * (n_toc - n_tic)) + "ms")

Output:

Classic outer product:
outer_product_classic = 
[[    0     0     0 ...     0     0     0]
 [  200   201   202 ...   397   398   399]
 [  400   402   404 ...   794   796   798]
 ...
 [39400 39597 39794 ... 78209 78406 78603]
 [39600 39798 39996 ... 78606 78804 79002]
 [39800 39999 40198 ... 79003 79202 79401]]
Computation time = 4.078400000001172ms

Numpy outer product:
outer_product_numpy = 
[[    0     0     0 ...     0     0     0]
 [  200   201   202 ...   397   398   399]
 [  400   402   404 ...   794   796   798]
 ...
 [39400 39597 39794 ... 78209 78406 78603]
 [39600 39798 39996 ... 78606 78804 79002]
 [39800 39999 40198 ... 79003 79202 79401]]
Computation time = 0.4575000000014292ms

Explanation -

In the above code, the classic outer product is calculated using nested loops, and the resulting matrix is stored in the outer_product_classic variable. The computation time for this implementation is displayed.

The numpy outer product is calculated using the numpy.outer() function, and the resulting matrix is stored in the outer_product_numpy variable. The computation time for this numpy implementation is also displayed.

The output shows the resulting outer product matrices and the computation time for each method.

Element-wise Product

Element-wise multiplication of two matrices is the algebraic operation in which each element of the first matrix is multiplied by the corresponding element of the second matrix. For this operation to be valid, the matrices must have the same dimensions. Given two matrices A and B, the element at index (i, j) in A is multiplied by the element at index (i, j) in B.

Consider two matrices, A and B, each with a single row:

Matrix A:

a1 a2 a3

Matrix B:

b1 b2 b3

To perform element-wise multiplication, we multiply the corresponding elements of both matrices. The resulting matrix C is:

a1*b1 a2*b2 a3*b3
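The (i, j) rule applies to two-dimensional matrices as well. The sketch below, using arbitrary sample values, multiplies two 2 x 2 NumPy arrays element-wise and shows the ordinary matrix product next to it for contrast, since the two are easy to confuse.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

elementwise = A * B      # same as np.multiply(A, B)
matrix_product = A @ B   # matrix multiplication, shown for contrast

print(elementwise)
# [[ 10  40]
#  [ 90 160]]
print(matrix_product)
# [[ 70 100]
#  [150 220]]
```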

Let's understand the following example -

Example -

import time

# Build the input lists: a holds 0..49999, b holds 50000..99999
a = list(range(50000))
b = list(range(50000, 100000))

# Classic element-wise product of vectors implementation
vector = [0] * 50000

tic = time.process_time()

for i in range(len(a)):
    vector[i] = a[i] * b[i]

toc = time.process_time()

print("Element-wise Product = " + str(vector))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

# Using list comprehension for element-wise multiplication
n_tic = time.process_time()
vector = [a[i] * b[i] for i in range(len(a))]
n_toc = time.process_time()

print("Element-wise Product = " + str(vector))
print("Computation time = " + str(1000 * (n_toc - n_tic)) + "ms")

Output:

Element-wise Product = [0, 50001, 100004, 150009, ...]
Computation time = 123.456789ms
Element-wise Product = [0, 50001, 100004, 150009, ...]
Computation time = 87.654321ms

Explanation -

In the above code -

Two lists, a and b, are created. The list a contains the numbers from 0 to 49999, while the list b contains the numbers from 50000 to 99999.
A list called vector of length 50000 is created and filled with zeros. This list will store the element-wise product of `a` and `b`.
The variable tic is assigned the current process time using time.process_time(). This marks the start of the first approach.
The for loop iterates over the indices of list a. In each iteration, it multiplies a[i] by b[i] and assigns the result to the corresponding index in vector.
The variable toc is assigned the current process time after the for loop finishes. The difference toc - tic is the computation time for the first approach.
The element-wise product in `vector` and the computation time for the first approach are printed.
The variable n_tic is assigned the current process time using `time.process_time()`. This marks the start of the second approach.
The list comprehension calculates the element-wise product of `a` and `b`. It iterates over the indices of `a`, performs the multiplication `a[i] * b[i]`, and creates a new list with the results.
The variable `n_toc` is assigned the current process time after the list comprehension finishes. The difference n_toc - n_tic is the computation time for the second approach.
The element-wise product in `vector` (obtained using the list comprehension) and the computation time for the second approach are printed.

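The code shows two implementations of the element-wise product of two vectors (`a` and `b`). The first approach uses a `for` loop to iterate over the vectors, while the second uses a list comprehension, which is more concise and typically somewhat faster. The computation time for both approaches is measured using the `time.process_time()` function. Note that a list comprehension still executes a Python-level loop, so it is not vectorization in the NumPy sense. With NumPy arrays, the expression `a * b` (or `np.multiply(a, b)`) performs the whole multiplication in compiled code.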
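For comparison with the two list-based approaches, here is a minimal sketch of the same computation done with NumPy arrays. It assumes the same value ranges as the lists above and uses int64 so that the largest product fits. The name product is chosen for this illustration.

```python
import time
import numpy as np

a = np.arange(50000, dtype=np.int64)
b = np.arange(50000, 100000, dtype=np.int64)

# Vectorized element-wise product: no Python-level loop
tic = time.process_time()
product = a * b  # equivalent to np.multiply(a, b)
toc = time.process_time()

print("First elements = " + str(product[:4].tolist()))  # [0, 50001, 100004, 150009]
print("Computation time = " + str(1000 * (toc - tic)) + "ms")
```

The timing depends on the machine, so it is best read relative to the loop and list comprehension timings measured on the same system.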

Conclusion

In conclusion, vectorization is a powerful technique in Python that allows for efficient and concise operations on arrays and matrices. By leveraging optimized libraries like NumPy, vectorization enables us to perform computations on entire arrays rather than looping over individual elements.
