11 Informatica Performance Improvement/Tuning Tips – Read On!

Sachin Satish | Data Management


Research from Gartner and Forrester has rated Informatica products among the best in the following areas:

  • Data Quality
  • Data Profiling
  • Data Masking
  • Data Integration
  • Data Replication
  • Data Virtualization
  • Master Data Management
  • Cloud-based Data Integration
  • Information Lifecycle Management

The company, founded by Gaurav Dhillon and Diaz Nesamoney in 1993, pioneered data integration with Informatica PowerCenter and remains the market leader in that space. If there is ever anything you want done with data, Informatica has a product that can do it!

The easy-to-use Informatica tool has a simple, visual interface with drag-and-drop functionality for building a process that extracts, transforms and loads data. This process flow, known as a mapping, can be reused whenever required and moves very large volumes of data efficiently and swiftly. With Informatica, these huge chunks of raw data can be converted into useful, applicable information.

Bottlenecks

When managing such humongous amounts of data, situations may arise that choke performance by creating bottlenecks. To manage these situations there is a performance tuning process: identify the logjams and clear them to reach a more acceptable ETL load time.

Identifying bottlenecks and enhancing performance is an intricate, iterative process: determine one blockage at a time, eliminate it and move on to the next, evaluating at each step until the desired performance level is achieved.

The obvious first step is to identify the bottlenecks, which may occur in the source, target, mapping, session or system, and in any or several of these areas simultaneously. Let’s look at the bottleneck for each of these areas (a session-log scanning sketch follows the list).

  • Source: When the Integration Service reads data slowly from a source, the DTM buffer fills slowly, which in turn delays the transformation thread and the writer thread from receiving data. This chain of delays slows down the session.
  • Target: This bottleneck occurs when data waiting to be written to the target backs up, leaving no free blocks for the transformation and reader threads to work with. This slows the session.
  • Mapping: Bottlenecks can be caused by mapping logic that is poorly written or highly complex, which slows the transformation thread and leaves the reader and writer threads waiting for free blocks.
  • Session: This invariably occurs when the session memory is configured incorrectly, leaving the cache small and the buffer memory low.
  • System: If all of the above check out fine, you need to tune the system, as the Integration Service makes active use of system resources to run sessions, read and write data, and manage transformations. The system memory is also used for cache files.
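
A practical way to tell which area is the culprit is the thread statistics PowerCenter writes at the end of a session log: the thread with the highest busy percentage marks the bottleneck. Below is a minimal Python sketch of that check; it assumes log lines of the form `Thread [READER_1_1_1] ... Busy Percentage = [98.87]`, which can vary between PowerCenter versions, so adjust the pattern to your own logs.

```python
import re

# Matches thread-statistics lines such as:
#   Thread [READER_1_1_1] ... Busy Percentage = [98.874570]
# (exact wording varies by PowerCenter version -- an assumption here)
THREAD_STATS = re.compile(
    r"Thread \[(?P<thread>[^\]]+)\].*?"
    r"Busy Percentage = \[(?P<busy>[\d.]+)\]"
)

def busiest_threads(log_path):
    """Return (thread_name, busy_pct) pairs sorted busiest-first."""
    stats = {}
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = THREAD_STATS.search(line)
            if m:
                stats[m.group("thread")] = float(m.group("busy"))
    return sorted(stats.items(), key=lambda kv: kv[1], reverse=True)

# A reader thread near 100% busy suggests a source bottleneck; a writer
# thread near 100% points at the target; a transformation thread points
# at the mapping logic.
for name, busy in busiest_threads("session.log"):
    print(f"{name}: {busy:.1f}% busy")
```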

Tips for Optimizing Performance

To enhance performance, first clear the bottlenecks, then tune every aspect of the product for optimal performance.

  1. Optimizing Buffer Memory: When a session is initialized, memory blocks are allocated to hold data from the source and target. Insufficient buffer memory for the DTM process can cause performance fluctuations. There are two ways to improve buffer memory (a sizing sketch follows this list):
    1. Increase or decrease the buffer block size, depending on the size of the source and target data.
    2. Increase the DTM buffer memory size, which triggers the Integration Service to create additional buffer blocks.
  2. Optimizing Cache Memory: The Rank, Aggregator and Lookup transformations use cache memory to store transformed data. If the cache memory is insufficient, the Integration Service stores the data in temporary cache files, and reading from these files slows the session. There are two solutions (a rough cache estimate follows this list):
    1. Increase the cache size so the Integration Service never needs to access temporary cache files.
    2. Limit the number of connected input/output ports feeding a transformation’s data cache, which reduces the data stored in the cache.
  3. Target Optimization: Bottlenecks caused by heavy loading operations or a small database network packet size are common, and can be identified through the session log files. There are six ways to optimize the target (an example of pre-/post-session SQL follows this list):
    1. Use bulk loading so the Integration Service bypasses the database log and speeds up performance. The hitch is that without the database log, rollback and recovery are not possible.
    2. External loaders can be configured in PowerCenter for DB2, Oracle, Teradata and Sybase databases; this also increases session performance.
    3. Dropping key constraints and indexes improves session performance; they can be added back after the session completes.
    4. To reduce deadlocks and speed up performance, give each target session a different connection group with a different connection name.
    5. Checkpoint intervals can be increased to improve performance.
    6. Increase the network packet size when writing to Oracle, Sybase ASE or Microsoft SQL Server targets.
  4. Source Optimization: Bottlenecks at the source are caused mainly by a small database network packet size and poorly written queries. Increasing the database packet size and writing more efficient queries will take care of this problem (a query example follows this list).
  5. Optimizing Mapping: This takes some time to implement, but it substantially improves session performance, and it should always be done after optimizing the source and target. Ideally, the mapping should be configured so that maximum work is done with the minimum of expressions and transformations. There are three steps to optimizing mapping (before/after expression examples follow this list):
    1. Eliminate unnecessary data-type conversions from the mapping.
    2. Tune the transformations by reducing their number and deleting unnecessary links between them.
    3. Identify complex, slow expressions and simplify them to enhance session performance. Some ways to do this:
      • Use operators instead of functions, as the Integration Service processes operators faster.
      • Favor numeric operations over string operations, which the Integration Service processes more slowly.
      • Use local variables to hold a common expression instead of recomputing it in several places.
      • If the same task is carried out in multiple places, consolidate the common work to reduce the workload.
      • Reduce the number of aggregate function calls, since locating and collating each call unnecessarily consumes Integration Service and system resources.
  6. Partitioning Sessions: With the Informatica PowerCenter Partitioning option, large data sets can be split into subsets and processed in parallel to improve session performance (a parallel-processing analog follows this list).
  7. Dynamic Session Partitioning: Informatica’s session partitioning delivers data faster through parallel processing; the Integration Service can decide the partitioning and parallel-process the data dynamically, based on defined parameters.
  8. Pushdown Optimization: Transformation logic is converted into SQL statements that execute directly on the database, pushing the work down to where the data lives (an equivalent-SQL example follows this list).
  9. Balancing Workflow Load: The Informatica Load Balancer distributes the workload across the grid to a single node or many nodes, matching the requirements of each task to the available resources.
  10. Grid Deployment: Sessions and workflows can be configured to run on a grid, enabling the Integration Service to distribute service processes and session threads across nodes for better scalability and performance.
  11. Concurrent Workflows: A workflow can be configured to run concurrently as multiple instances, either as unique instances with distinct names or as multiple instances sharing the same name.
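
For tip 1, here is a back-of-envelope buffer sizing calculator. It is a sketch only, based on commonly cited PowerCenter tuning guidance (size a buffer block to hold on the order of 100 rows of the widest row, give the session at least (sources + targets) × 2 blocks, and assume roughly 90% of the DTM buffer is usable for blocks); the function and numbers are illustrative, and the real values belong in the session properties.

```python
def suggest_buffer_settings(max_row_bytes, n_sources, n_targets,
                            rows_per_block=100):
    """Back-of-envelope DTM buffer sizing (a starting point, not a rule).

    Assumptions: a buffer block holds ~100 rows of the widest
    source/target row; a session needs at least
    (sources + targets) * 2 blocks; ~90% of the DTM buffer is
    available for buffer blocks.
    """
    block_size = max_row_bytes * rows_per_block
    min_blocks = (n_sources + n_targets) * 2
    dtm_buffer = int(block_size * min_blocks / 0.9)
    return block_size, dtm_buffer

block, dtm = suggest_buffer_settings(max_row_bytes=2048,
                                     n_sources=2, n_targets=1)
print(f"Buffer block size ~{block:,} bytes; DTM buffer size ~{dtm:,} bytes")
```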
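For tip 2, a rough estimate of how large the Aggregator or Lookup caches need to be before the Integration Service starts paging to temporary cache files. The 25% overhead factor and the function itself are assumptions for illustration; PowerCenter’s built-in cache calculator gives the authoritative numbers.

```python
def estimate_caches(n_groups, key_bytes, row_bytes, overhead=1.25):
    """Rough Aggregator/Lookup cache estimate.

    The index cache holds one entry per distinct group (or lookup
    key); the data cache holds the cached row data. The 25% overhead
    factor is an assumption, not an Informatica figure.
    """
    index_cache = int(n_groups * key_bytes * overhead)
    data_cache = int(n_groups * row_bytes * overhead)
    return index_cache, data_cache

idx, data = estimate_caches(n_groups=500_000, key_bytes=24, row_bytes=180)
print(f"Index cache ~{idx / 2**20:.0f} MB, data cache ~{data / 2**20:.0f} MB")
```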
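For tip 3, dropping an index before a heavy load and rebuilding it afterwards is typically wired into the session’s pre- and post-session SQL properties. The statements below are a sketch with hypothetical Oracle-style table and index names; substitute your own schema.

```python
# Hypothetical names: orders_fact, pk_orders_fact and ix_orders_date
# stand in for your own target table, constraint and index.
PRE_SESSION_SQL = """
ALTER TABLE orders_fact DISABLE CONSTRAINT pk_orders_fact;
DROP INDEX ix_orders_date;
"""

POST_SESSION_SQL = """
CREATE INDEX ix_orders_date ON orders_fact (order_date);
ALTER TABLE orders_fact ENABLE CONSTRAINT pk_orders_fact;
"""

# These strings go into the session's Pre SQL / Post SQL properties.
print(PRE_SESSION_SQL, POST_SESSION_SQL)
```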
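For tip 4, an illustration of the query-efficiency point: the second (hypothetical) source-qualifier override reads only the columns the mapping needs and filters rows at the database, so far less data crosses the network.

```python
# Hypothetical source-qualifier SQL override for an "orders" source.
SLOW_SQL = "SELECT * FROM orders"  # every column, every row

FAST_SQL = """
SELECT order_id, customer_id, order_date, amount
FROM   orders
WHERE  order_date >= DATE '2024-01-01'  -- filter at the source, not in the mapping
"""
print(FAST_SQL)
```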
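For tip 5, a few before/after pairs in the PowerCenter transformation language showing the expression guidelines in action. Port names are hypothetical; the || and SUM rewrites follow the standard guidance that operators beat the equivalent functions and that fewer aggregate calls are cheaper.

```python
# Before -> after Informatica expression snippets (port names hypothetical).
TUNED_EXPRESSIONS = [
    # Operators are processed faster than the equivalent functions.
    ("CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)",
     "FIRST_NAME || ' ' || LAST_NAME"),
    # One aggregate function call instead of two.
    ("SUM(QTY_A) + SUM(QTY_B)",
     "SUM(QTY_A + QTY_B)"),
    # A repeated calculation moved into a variable port, which the
    # output ports then reference instead of recomputing it.
    ("PRICE * TAX_RATE repeated in several output ports",
     "variable port v_TAX = PRICE * TAX_RATE; outputs reference v_TAX"),
]
for before, after in TUNED_EXPRESSIONS:
    print(f"{before}\n  -> {after}\n")
```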
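For tips 6 and 7, a conceptual analog in Python of what session partitioning does: split the source data into subsets and transform them in parallel. PowerCenter handles this inside the DTM; the sketch only illustrates why partitioning shortens wall-clock time.

```python
from concurrent.futures import ProcessPoolExecutor

def transform(rows):
    """Stand-in for the work one session partition performs."""
    return [r * 2 for r in rows]

def run_partitioned(data, n_partitions=4):
    """Split the data into subsets and process them in parallel --
    a conceptual analog of PowerCenter session partitioning."""
    chunks = [data[i::n_partitions] for i in range(n_partitions)]
    with ProcessPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(transform, chunks)
    return [row for chunk in results for row in chunk]

if __name__ == "__main__":
    print(len(run_partitioned(list(range(1_000_000)))))
```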
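For tip 8, a picture of what pushdown optimization amounts to: a filter plus an aggregation that would otherwise run row by row in the DTM executes as one set-based SQL statement on the database. The table and column names are hypothetical; PowerCenter generates its own SQL when pushdown is enabled.

```python
# Conceptual equivalent of a Filter + Aggregator mapping once the logic
# is pushed down to the target database (names hypothetical).
PUSHED_DOWN_SQL = """
INSERT INTO daily_sales_agg (order_date, total_amount)
SELECT order_date, SUM(amount)
FROM   orders
WHERE  status = 'SHIPPED'
GROUP  BY order_date
"""
print(PUSHED_DOWN_SQL)
```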


About the Author

Sachin Satish has over nine years of experience in the IT industry. He started his career in data warehousing and has extensive experience with Informatica and various relational database management systems. His expertise lies mainly in service delivery, data analysis and IT consulting. He is a certified Big Data professional and is currently training in data analytics. He has worked abroad for many years, in London, U.K., and in a few cities across the US.
On the personal front, he is passionate about music and exploring places around the world; in his free time he watches movies and likes to spend time with family.
