Partitioning in Informatica
The PowerCenter Integration Services creates a default partition type at each partition point. If we have the Partitioning option, we can change the partition type. The partition type controls how the PowerCenter Integration Service distributes data among partitions at partition points.
When we configure the partitioning information for a pipeline, then we must define a partition type at each partition point in the pipeline. The partition type determines how the PowerCenter Integration Service redistributes data across partition points.
Here are the following partition types in the Workflow Manager, such as:
- Database partitioning: The PowerCenter Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space. Use database partitioning with DB2 targets.
- Hash partitioning: Use hash partitioning when we want the PowerCenter Integration Service to distribute rows to the partitions by the group. For example, we need to sort items by item ID, but we do not know how many items have a particular ID number.
Here are the two types of hash partitioning, such as:
- Hash auto-keys: The PowerCenter Integration Service uses all grouped or sorted ports as a compound partition key. Then we need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
- Hash user keys: The PowerCenter Integration Service uses a hash function to group rows of data among partitions. And define the number of ports to generate the partition key.
- Key range: It specifies one or more ports to form a compound partition key. The PowerCenter Integration Service passes data to each partition depending on the ranges we define for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
- Pass-through: The PowerCenter Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where we want to create a new pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
- Round-robin: The PowerCenter Integration Service distributes blocks of data to one or more partitions. Use round-robin partitioning so that each partition process rows based on the number and size of the blocks.
Key Points of Informatica Partitions
Below are some essential points while we use the partitions in Informatica, such as:
- We cannot create a partition key for round-robin, hash auto-keys, and pass-through partition.
- If we have a bitmap index upon the target and using the pass-through partition, then we need to update the target table. In this process, the session might be failing because the bitmap index creates the locking problem.
- Partition increases the total DTM buffer memory requirement. To ensure enough free memory to avoid memory allocation failures.
- When we use a pass-through partition, then Informatica tries to make multiple connection requests to the database server. To ensure that the database is configured to accept the more connections requests.
- We can use the native database options as partition alternatives to increase the degree of parallelism of query processing.
For example, in the Oracle database, we can specify a PARALLEL hint or alter the DOP of the table.
- We can also use both Informatica and native database level parallel as per the requirements.
For example, create 2 pass-through pipelines and each sending the query to the Oracle database with the PARALLEL hint.