Vectorization in Python
In this tutorial, we will learn about vectorization in Python. Modern systems need to deal with large amounts of data, and processing a large data set in pure Python is slow compared to languages like C/C++. Vectorization is a technique that addresses this. Let's understand more about vectorization.
What is Vectorization?
Vectorization is a technique for making Python programs faster by replacing explicit for loops with operations on whole arrays. Many built-in functions minimize the running time of the code, and vectorized array operations are faster than their pure Python equivalents, with the biggest impact in numerical computations. Python for loops are slower than those of other programming languages such as C/C++. The main reasons for this slowness are Python's dynamic typing and interpreted execution, which add per-element memory and dispatch overhead that compiled languages optimize away.
Several operations are performed over vectors such as the dot product of vectors which is also known as the scalar product.
We can compare the processing time of the classic, loop-based methods with that of the corresponding built-in functions to evaluate whether the vectorized functions offer a performance advantage.
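As a quick illustration of the idea (this example is not from the original text; the array contents are chosen arbitrarily), compare summing a million numbers with a pure Python loop against NumPy's vectorized np.sum:

```python
import time
import numpy as np

data = np.arange(1_000_000)

# Pure Python loop: the interpreter processes one element at a time
start = time.process_time()
total = 0
for x in data:
    total += x
loop_ms = (time.process_time() - start) * 1000

# Vectorized: a single call, the loop runs in compiled C code
start = time.process_time()
n_total = np.sum(data)
numpy_ms = (time.process_time() - start) * 1000

print("loop sum =", total, "in", loop_ms, "ms")
print("numpy sum =", n_total, "in", numpy_ms, "ms")
```

Both compute the same value; on a typical machine the vectorized version is dramatically faster because the per-element work happens inside NumPy's compiled code rather than in the interpreter.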
Dot Product
The dot product, also known as the inner product, is an algebraic operation that multiplies two vectors of equal length to produce a single scalar value. This operation is often used in linear algebra and vector calculations.
To calculate the dot product, consider two vectors, a and b, of the same length. The dot product is obtained by taking the transpose of the first vector (`a'`, the transpose of `a`) and performing matrix multiplication with `b`.
The result of this multiplication is a scalar value, representing the dot product of the two vectors. The dot product is a fundamental operation used in various mathematical and computational applications, including vector projections, calculating angles between vectors, and determining vector similarities.
Let's understand the following code.
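The code itself is not shown above; here is a minimal sketch consistent with the printed output, assuming the inputs `a = np.arange(100000)` and `b = np.arange(100000, 200000)` (these values reproduce the `dot_product` result shown below):

```python
import time
import numpy as np

# Two vectors of equal length (assumed values for illustration)
a = np.arange(100000)
b = np.arange(100000, 200000)

# Classic approach: explicit Python loop over the elements
start = time.process_time()
dot_product = 0
for i in range(len(a)):
    dot_product += a[i] * b[i]
elapsed = (time.process_time() - start) * 1000
print("dot_product =", dot_product)
print("Computation time =", elapsed, "ms")

# Vectorized approach: numpy.dot() performs the same sum of
# element-wise products in compiled code
start = time.process_time()
n_dot_product = np.dot(a, b)
elapsed = (time.process_time() - start) * 1000
print("n_dot_product =", n_dot_product)
print("Computation time =", elapsed, "ms")
```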
dot_product = 833323333350000
Computation time = 40.90276299999997ms
n_dot_product = 833323333350000
Computation time = 0.6066500000002533ms
In the above code, we use the numpy library to create arrays a and b. The classic dot product is computed using a loop. Then, we measure the computation time using time.process_time().
After that, we use numpy.dot() to calculate the dot product directly between arrays a and b. The computation time for this numpy implementation is also measured.
This code allows us to compare the computation time and results between the classic method and the numpy method for calculating the dot product of the two arrays.
Outer Product
The outer product, also known as the tensor product, is an operation performed on two coordinate vectors. For vectors a and b with dimensions n x 1 and m x 1, respectively, the outer product is a rectangular matrix of size n x m. If the two vectors have the same dimension, the resulting matrix is square.
In the context of the outer product, the vectors a and b are treated as columns, and the resulting matrix is obtained by taking all possible combinations of multiplying elements from a with elements from b. Each element of a is multiplied by every element of b, resulting in a matrix where the (i, j)th entry corresponds to the product of the ith element of `a` and the jth element of `b`.
This operation is commonly used in linear algebra and matrix calculations. The resulting matrix can provide useful information about the relationship between the elements of a and b, such as covariances, correlations, or outer product decompositions.
Consider two vectors `a = [a1, a2, ..., an]` and `b = [b1, b2, ..., bm]`. Their outer product is a rectangular matrix of size `n x m`, whose `(i, j)`th element is the product `ai * bj`:
Resultant Matrix (n x m):
a1*b1 a1*b2 ... a1*bm
a2*b1 a2*b2 ... a2*bm
...
an*b1 an*b2 ... an*bm
Let's understand the following coding example.
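The code for this comparison is likewise not shown; a sketch consistent with the output below, assuming the inputs `a = np.arange(200)` and `b = np.arange(200, 400)` (these reproduce the leading rows of the printed matrices):

```python
import time
import numpy as np

# Two vectors (assumed values for illustration)
a = np.arange(200)
b = np.arange(200, 400)

# Classic approach: nested loops filling an n x m matrix
start = time.process_time()
outer_product_classic = np.zeros((len(a), len(b)), dtype=int)
for i in range(len(a)):
    for j in range(len(b)):
        outer_product_classic[i][j] = a[i] * b[j]
elapsed = (time.process_time() - start) * 1000
print("outer_product_classic =", outer_product_classic)
print("Computation time =", elapsed, "ms")

# Vectorized approach: numpy.outer() builds the same matrix in one call
start = time.process_time()
outer_product_numpy = np.outer(a, b)
elapsed = (time.process_time() - start) * 1000
print("outer_product_numpy =", outer_product_numpy)
print("Computation time =", elapsed, "ms")
```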
Classic outer product:
outer_product_classic =
[[    0     0     0 ...     0     0     0]
 [  200   201   202 ...   397   398   399]
 [  400   402   404 ...   794   796   798]
 ...
 [39400 39601 39802 ... 78797 78998 79299]
 [39600 39802 40004 ... 79397 79599 79801]
 [39800 40002 40204 ... 79997 80200 80403]]
Computation time = 4.078400000001172ms

Numpy outer product:
outer_product_numpy =
[[    0     0     0 ...     0     0     0]
 [  200   201   202 ...   397   398   399]
 [  400   402   404 ...   794   796   798]
 ...
 [39400 39601 39802 ... 78797 78998 79299]
 [39600 39802 40004 ... 79397 79599 79801]
 [39800 40002 40204 ... 79997 80200 80403]]
Computation time = 0.4575000000014292ms
In the above code, the classic outer product is calculated using nested loops, and the resulting matrix is stored in the outer_product_classic variable. The computation time for this implementation is displayed.
The numpy outer product is calculated using the numpy.outer() function, and the resulting matrix is stored in the outer_product_numpy variable. The computation time for this numpy implementation is also displayed.
The output shows the resulting outer product matrices and the computation time for each method.
Element-wise Product
Element-wise multiplication of two matrices refers to the algebraic operation where each element of the first matrix is multiplied by its corresponding element in the second matrix. For this operation to be valid, the matrices must have the same dimensions. If we consider two matrices, denoted as a and b, and we take an element at index (i, j) in matrix a, it is multiplied by the element at index (i, j) in matrix b.
Consider two matrices, A and B:
A = [a1 a2 a3]
B = [b1 b2 b3]
To perform element-wise multiplication, we multiply the corresponding elements of both matrices. The element-wise multiplication of A and B gives the resulting matrix C:
C = [a1*b1 a2*b2 a3*b3]
Each element in matrix C is obtained by multiplying the corresponding elements in matrices A and B.
Let's understand the following example -
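The example's code is not included; a sketch matching the description that follows (a for loop versus a list comprehension), with the assumed inputs `a = list(range(100000))` and `b = [50001] * 100000`, which reproduce the leading values of the output shown:

```python
import time

# Two vectors of equal length (assumed values for illustration)
n = 100000
a = list(range(n))
b = [50001] * n

# Approach 1: explicit for loop
start = time.process_time()
product_loop = []
for i in range(n):
    product_loop.append(a[i] * b[i])
elapsed = (time.process_time() - start) * 1000
print("Element-wise Product =", product_loop[:4], "...")
print("Computation time =", elapsed, "ms")

# Approach 2: list comprehension
start = time.process_time()
product_comp = [a[i] * b[i] for i in range(n)]
elapsed = (time.process_time() - start) * 1000
print("Element-wise Product =", product_comp[:4], "...")
print("Computation time =", elapsed, "ms")
```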
Element-wise Product = [0, 50001, 100002, 150003, ...]
Computation time = 123.456789ms
Element-wise Product = [0, 50001, 100002, 150003, ...]
Computation time = 87.654321ms
In the above code, two implementations of the element-wise product of two vectors (`a` and `b`) are compared. The first approach uses a `for` loop to iterate over the vectors, while the second uses a list comprehension for a more concise implementation. The computation time for both approaches is measured using the `time.process_time()` function.
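For comparison, the fully vectorized form of the same operation uses NumPy's `*` operator, which multiplies arrays element-wise in a single call (a small sketch with illustrative values, not part of the original example):

```python
import numpy as np

a = np.arange(5)          # [0 1 2 3 4]
b = np.arange(5, 10)      # [5 6 7 8 9]

# NumPy's * operator performs element-wise multiplication
c = a * b
print(c)  # [ 0  6 14 24 36]
```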
In conclusion, vectorization is a powerful technique in Python that allows for efficient and concise operations on arrays and matrices. By leveraging optimized libraries like NumPy, vectorization enables us to perform computations on entire arrays rather than looping over individual elements.