Research from both Gartner and Forrester has rated Informatica products among the best in the data integration space.
The company, founded by Gaurav Dhillon and Diaz Nesamoney in 1993, pioneered data integration with Informatica PowerCenter and remains the market leader in that space. Whatever you need to do with data, Informatica has a product that can do it.
The easy-to-use Informatica tool has a simple, visual interface with drag-and-drop functionality for creating a process to extract, transform and load data. This process flow, known as a mapping, can be reused whenever required and moves very large volumes of data efficiently and swiftly, converting huge chunks of raw data into useful, applicable data.
When managing such enormous amounts of data, situations can arise that choke performance by creating bottlenecks. Performance tuning is the process of identifying these logjams and clearing them to achieve a more acceptable ETL load time.
Identifying bottlenecks and enhancing performance is an intricate, iterative process: isolate one blockage at a time, eliminate it, and move on to the next, re-evaluating performance after each change until the desired level is achieved.
The obvious first step is to identify the bottlenecks, which may occur in the source, target, mapping, session or system. Blockages may occur in any one, or several, of these areas simultaneously. Let’s look at the bottlenecks in each of these areas.
Tips for Optimizing Performance
To enhance performance, first clear the bottlenecks, then tune each aspect of the product for optimal performance.
- Optimizing Buffer Memory: When a session is initialized, memory blocks are allocated to hold data for the source and target. Insufficient buffer memory for the DTM process can cause performance fluctuations. There are two options for improving buffer memory:
- Increasing or decreasing the Buffer Block size, depending on the size of the source and target data.
- Increasing the DTM Buffer memory size triggers the Integration Service to create additional buffer blocks.
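The interplay between the two settings above can be sketched as a sizing check. This is illustrative only: the 0.9 usable-pool factor and the "two blocks per source/target" minimum are assumptions to verify against your PowerCenter version's documentation, not guaranteed behavior.

```python
# Rough sizing check for DTM buffer settings (a sketch; the 0.9 factor
# and the two-blocks-per-connection minimum are assumptions).

def usable_buffer_blocks(dtm_buffer_bytes, block_bytes):
    """Blocks the DTM can actually use (~90% of the pool, assumed)."""
    return int(0.9 * dtm_buffer_bytes / block_bytes)

def min_blocks_required(num_sources, num_targets):
    """Assume each source and target needs at least two buffer blocks."""
    return (num_sources + num_targets) * 2

def dtm_buffer_sufficient(dtm_buffer_bytes, block_bytes,
                          num_sources, num_targets):
    return (usable_buffer_blocks(dtm_buffer_bytes, block_bytes)
            >= min_blocks_required(num_sources, num_targets))

# Example: 12 MB DTM buffer, 64 KB blocks, 2 sources, 3 targets
print(dtm_buffer_sufficient(12 * 1024**2, 64 * 1024, 2, 3))  # → True
```

Under this model, enlarging the block size without also enlarging the DTM buffer shrinks the number of available blocks, which is why the two settings are tuned together.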
- Optimizing Cache Memory: The Rank, Aggregator and Lookup transformations use cache memory to store transformed data. If the cache memory is insufficient, the Integration Service stores the data in temporary cache files, and reading from these files slows session performance. There are two solutions to this situation:
- Increase the cache size so there is no need for the Integration Service to access temporary cache files.
- Limit the number of connected input/output ports on transformations that use a data cache, which automatically reduces the amount of data stored in the cache.
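A toy model makes the cost of an undersized cache concrete: rows that fit in memory are served from a dict, while overflow is written to a temp file that must be scanned on lookup. This is purely illustrative and not the Integration Service's actual cache format.

```python
# Toy cache with a row cap: in-memory hits are fast, spilled rows force
# a file scan -- mimicking why undersized caches slow a session.
import json
import tempfile

class SpillingCache:
    def __init__(self, max_rows_in_memory):
        self.max_rows = max_rows_in_memory
        self.mem = {}
        self.spill = tempfile.TemporaryFile(mode="w+")

    def put(self, key, row):
        if len(self.mem) < self.max_rows:
            self.mem[key] = row                               # fast path
        else:
            self.spill.write(json.dumps([key, row]) + "\n")   # slow path

    def get(self, key):
        if key in self.mem:
            return self.mem[key]
        self.spill.seek(0)            # full file scan on every cache miss
        for line in self.spill:
            k, row = json.loads(line)
            if k == key:
                return row
        return None

cache = SpillingCache(max_rows_in_memory=2)
for i in range(4):
    cache.put(i, {"value": i * 10})
print(cache.get(3))  # → {'value': 30}, but only after scanning the file
```

Raising `max_rows_in_memory` (analogous to increasing the cache size) eliminates the file scans entirely, which is the point of the first solution above.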
- Target Optimization: Bottlenecks caused by heavy loading operations or a small database network packet size are common, and can be identified through the session log files. There are six ways to optimize the target:
- You can use bulk loading, so that the Integration Service circumvents the database log and speeds up the performance. The hitch is that if the database log is not written, rollbacks and recovery will not be possible.
- External loaders can be configured in PowerCenter for DB2, Oracle, Teradata and Sybase databases, which also increases session performance.
- Dropping key constraints and indexes before a heavy load improves session performance; they can be added back after the session completes.
- Reducing deadlocks to speed up performance requires each target session to have a different connection group that uses a different connection name.
- Checkpoint intervals can be increased to improve performance.
- Increase the network packet size when writing to Sybase, Oracle, Microsoft SQL or ASE server targets.
- Source Optimization: Bottlenecks at the source are caused mainly by a small database network packet size and poorly written queries. Increasing the database packet size and writing more efficient queries will take care of this problem.
- Optimizing Mapping: This takes some time to implement, but it has a substantial effect on session performance, and should always be done after optimizing the source and target. Ideally, the mapping should be configured so that the maximum work is done with the minimum number of expressions and transformations. There are three steps to optimizing a mapping:
- Eliminate unnecessary data-type conversions from the mapping.
- Tune the transformations by reducing their number and deleting unnecessary links between them.
- Identify complex, slow expressions and simplify them to enhance session performance. Some ways to do this:
- Use operators instead of functions, as the Integration Service evaluates operators faster.
- Prefer numeric operations over string operations, which the Integration Service processes more slowly.
- Use local variables to hold the result of a repeated sub-expression so it is computed only once.
- If the same task is carried out in multiple places, factor the common logic into one place to reduce the workload.
- Reduce the number of aggregate function calls, since each call consumes Integration Service and system resources.
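Two of the expression tips above can be sketched with plain Python lists standing in for an Aggregator's input ports. This is illustrative only; PowerCenter expressions use their own syntax in the expression editor, and the `derive` helper below is a made-up example.

```python
# Tip 1: collapse two aggregate calls into one over a combined expression.
a = [10, 20, 30]
b = [1, 2, 3]

two_calls = sum(a) + sum(b)                      # two aggregation passes
one_call = sum(x + y for x, y in zip(a, b))      # one pass, same result
assert two_calls == one_call == 66

# Tip 2: factor a repeated sub-expression into a local variable so it is
# computed once instead of in every output expression.
def derive(price, qty, tax_rate):
    gross = price * qty                          # computed once, reused
    return gross, gross * tax_rate, gross * (1 + tax_rate)

print(derive(100, 2, 0.5))  # → (200, 100.0, 300.0)
```

The same factoring applies in a PowerCenter expression: compute the shared value in a variable port, then reference it from each output port.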
- Partitioning Sessions: Larger data sets can be split into subsets using the Informatica PowerCenter Partitioning option, then processed in parallel to improve session performance.
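The idea behind partitioning can be sketched as splitting a row set into N chunks and transforming them in parallel. The thread pool and the round-robin split are assumptions for illustration; in PowerCenter the partition type (round-robin, hash, key range, pass-through) is configured on the session, not coded.

```python
# Minimal sketch of partitioned processing: split rows round-robin into
# N partitions and run the transformation on each partition in parallel.
from concurrent.futures import ThreadPoolExecutor

def transform(rows):
    # stand-in for one partition's transformation pipeline
    return [r * 2 for r in rows]

def run_partitioned(rows, partitions):
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = pool.map(transform, chunks)
    return [row for chunk in results for row in chunk]

out = run_partitioned(list(range(8)), partitions=4)
print(sorted(out))  # same rows as a single-partition run, order aside
```

The key property, mirrored in the assertion-friendly output, is that partitioning changes row order but never the result set.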
- Dynamic Session Partitioning: Informatica’s dynamic session partitioning delivers data faster through parallel processing; the Integration Service decides the partitioning of sessions at run time, based on certain defined parameters.
- Pushdown Optimization: Transformation logic is converted into SQL statements that execute directly in the source or target database, reducing the data moved through the Integration Service.
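Pushdown in miniature: instead of pulling every row into the ETL tier and aggregating there, the filter-and-aggregate logic is expressed as one SQL statement executed inside the database. Here `sqlite3` stands in for the source database, and the table and column names are made up for the example.

```python
# Contrast row-by-row ETL-tier aggregation with the same logic pushed
# down to the database as a single SQL statement.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 100.0), ("west", 50.0), ("east", 25.0)])

# Without pushdown: every row crosses the wire to the ETL tier.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
etl_total = sum(amount for region, amount in rows if region == "east")

# With pushdown: one SQL statement, one result row returned.
(pushed_total,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'east'"
).fetchone()

assert etl_total == pushed_total == 125.0
```

The results match, but the pushed-down version moves one row instead of the whole table, which is where the performance gain comes from.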
- Balancing Workflow Load: The Informatica Load Balancer manages the distribution of workload across the grid to a single node or many nodes, by matching the requirements of the task to the resources available.
- Grid Deployment: Sessions and workflows can be configured to run on a grid, enabling the Integration Service to distribute service processes and session threads across nodes to enhance scalability and performance.
- Concurrent Workflows: A workflow can be configured to run concurrently as multiple instances, either as instances with unique names or as multiple instances with the same name.