The Java transformation in Informatica PowerCenter is very powerful, yet there are not many examples of it on the web. That is because most mappings don't need a Java transformation. In some cases, however, a Java transformation makes your mappings simpler and easier to maintain. In this post I will walk through a full example of using a Java transformation to pivot data.
Suppose you have source data like this:
Source:
STUDENTID | NAME | SUBJECT | SCORE |
---|---|---|---|
1001 | Alice | Math | 100 |
1001 | Alice | English | 85 |
1002 | Bob | Math | 98 |
1002 | Bob | English | 95 |
1002 | Bob | Science | 90 |
1003 | Chris | Math | 95 |
1003 | Chris | Science | 90 |
1003 | Chris | English | 80 |
You want your target like this:
STUDENTID | NAME | MATH | ENGLISH | SCIENCE |
---|---|---|---|---|
1001 | Alice | 100 | 85 | 0 |
1002 | Bob | 98 | 95 | 90 |
1003 | Chris | 95 | 80 | 90 |
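Before building this in PowerCenter, the core pivot idea can be sketched in plain Java. This is only an illustration outside the tool (the class and method names here are made up for the sketch, not part of any PowerCenter API); it groups rows by student id into a nested map, then emits one row per student with a default of "0" for missing subjects:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {
    // Pivot (id, name, subject, score) rows into one row per student:
    // {id, name, math, english, science}, with "0" for a missing subject.
    static List<String[]> pivot(String[][] rows) {
        // outer key: student id -> inner map {Name, Math, English, Science}
        Map<String, Map<String, String>> all = new LinkedHashMap<>();
        for (String[] r : rows) {
            Map<String, String> rec = all.get(r[0]);
            if (rec == null) {                 // first row for this student
                rec = new LinkedHashMap<>();
                rec.put("Name", r[1]);
                all.put(r[0], rec);
            }
            rec.put(r[2], r[3]);               // subject -> score
        }
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : all.entrySet()) {
            Map<String, String> rec = e.getValue();
            out.add(new String[] {
                e.getKey(), rec.get("Name"),
                rec.getOrDefault("Math", "0"),
                rec.getOrDefault("English", "0"),
                rec.getOrDefault("Science", "0")
            });
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] src = {
            {"1001", "Alice", "Math", "100"}, {"1001", "Alice", "English", "85"},
            {"1002", "Bob", "Math", "98"},    {"1002", "Bob", "English", "95"},
            {"1002", "Bob", "Science", "90"},
        };
        for (String[] row : pivot(src)) System.out.println(String.join("|", row));
        // prints:
        // 1001|Alice|100|85|0
        // 1002|Bob|98|95|90
    }
}
```

The Java transformation below implements exactly this grouping, with PowerCenter supplying the input rows and `generateRow()` emitting the output rows.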
Normally you would have to chain Expression, Sorter, and Aggregator transformations to achieve this, and you can find plenty of discussions on the web that take that route. Though it works, it is tedious and hard to maintain, especially when the target has many columns and many different aggregations.
Here are steps to use Java transformation to pivot the source data:
Step 1 – Create a mapping with source "PIVOT_TEST_SRC" and target "PIVOT_TEST_TGT":
Step 2 – In the Create Transformation window, select Java from the list and enter a name for the transformation.
Then choose the Active type. With an active transformation, the number of output rows can differ from the number of input rows.
Step 3 – Drag and drop all the Source Qualifier columns to the Java transformation:
Step 4 – Double-click Java_pivot_src and choose the Ports tab; add out_studentId, out_studentName, out_mathScore, out_englishScore, and out_scienceScore as output ports:
Save your work. Your mapping now looks like this:
Step 5 – Double-click Java_pivot_src and choose the "Java Code" tab at the top. By default, you land in the "On Input Row" tab at the bottom.
There are other tabs, such as "Import Packages", "Helper Code", and "On End of Data", which are self-explanatory.
For this example, I am going to use a Java HashMap to hold each student's name and subject scores, where the keys are Name, Math, English, and Science (the target column names) and the values are the student name and the respective subject scores:

Name -> student_name
Math -> math_score
English -> english_score
Science -> science_score
This HashMap will be nested inside another HashMap (allRecords) whose key is the student id, so the final structure looks like:

```
              |--> Name    -> student_name
student_id -> |--> Math    -> math_score
              |--> English -> english_score
              |--> Science -> science_score
```
Based on this idea, put the following two lines in the "Import Packages" tab:

```java
import java.util.HashMap;
import java.util.Map;
```
Select "Helper Code" and declare the variables the Java code needs:

```java
private HashMap<String, HashMap<String, Object>> allRecords = new HashMap<String, HashMap<String, Object>>();
static int inputRows = 0;
static int outputRows = 0;
```
Select the "On Input Row" tab and write the following code:

```java
HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else { // build a new key-value pair
        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}
```
In the "On End of Data" tab, put the following code to output the results:

```java
HashMap<String, Object> subjectScoreMap;
for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
    subjectScoreMap = entry.getValue();
    out_studentId = entry.getKey();
    out_studentName = (String) subjectScoreMap.get("Name");
    if (subjectScoreMap.get("Math") == null) {
        out_mathScore = "";
    } else {
        out_mathScore = (String) subjectScoreMap.get("Math");
    }
    if (subjectScoreMap.get("English") == null) {
        out_englishScore = "";
    } else {
        out_englishScore = (String) subjectScoreMap.get("English");
    }
    if (subjectScoreMap.get("Science") == null) {
        out_scienceScore = "";
    } else {
        out_scienceScore = (String) subjectScoreMap.get("Science");
    }
    generateRow();
    outputRows += 1;
} // end for loop: output records
allRecords.clear(); // release memory
logInfo("The total number of records generated is: " + outputRows);
inputRows = 0;
outputRows = 0;
```
generateRow() is a PowerCenter Java API call that emits a row using the current values of the defined output ports.
If there are too many input records to hold in the hashmap, you can call generateRow() in the "On Input Row" tab to emit rows early and release memory. If you do this, you need a Sorter transformation to sort the input records by STUDENTID before the Java transformation; otherwise you may get duplicate student ids in the output records.
```java
HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else { // build a new key-value pair
        if (allRecords.size() >= 1000) { // flush buffered students to release memory
            HashMap<String, Object> subjectScoreMap;
            for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
                subjectScoreMap = entry.getValue();
                out_studentId = entry.getKey();
                out_studentName = (String) subjectScoreMap.get("Name");
                if (subjectScoreMap.get("Math") == null) {
                    out_mathScore = "";
                } else {
                    out_mathScore = (String) subjectScoreMap.get("Math");
                }
                if (subjectScoreMap.get("English") == null) {
                    out_englishScore = "";
                } else {
                    out_englishScore = (String) subjectScoreMap.get("English");
                }
                if (subjectScoreMap.get("Science") == null) {
                    out_scienceScore = "";
                } else {
                    out_scienceScore = (String) subjectScoreMap.get("Science");
                }
                generateRow();
                outputRows += 1;
            } // end for loop: output records
            allRecords.clear(); // release memory
        } // end flush
        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}
```
Inside the Java code, you can use System.out.println() or the PowerCenter Java API logInfo() to log debug information, which shows up in the session log.
I added an Expression transformation after the Java transformation to convert the string scores to integers to match the data types in the target:
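For reference, the conversions in that Expression transformation can be simple TO_INTEGER calls, one per score port. The output port names below (MATH, ENGLISH, SCIENCE) are just illustrative; the IIF handles the empty strings the Java code emits for missing subjects, assuming they should become 0 as in the target table above:

```
MATH    = IIF(out_mathScore = '', 0, TO_INTEGER(out_mathScore))
ENGLISH = IIF(out_englishScore = '', 0, TO_INTEGER(out_englishScore))
SCIENCE = IIF(out_scienceScore = '', 0, TO_INTEGER(out_scienceScore))
```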
That’s it.
If your Java code runs into exceptions, open the session log and find the error messages. For example, here is an error I got:

```
JAVA PLUGIN_1762 [ERROR] java.lang.NullPointerException
JAVA PLUGIN_1762 [ERROR] at com.informatica.powercenter.server.jtx.JTXPartitionDriverImplGen.execute(JTXPartitionDriverImplGen.java:382)
```
The "JTXPartitionDriverImplGen.java:382" part tells you exactly which line of the generated code has the error. Open the Java transformation and click the "Full Code" link; a window pops up with the complete generated Java code. Copy the code into your favorite editor, go to the error line (382 in this example), and analyze why it threw that error.
As the example above shows, pivoting data with a Java transformation is easy. Java transformations can also be used to compress data, encrypt/decrypt data, concatenate fields, aggregate special fields, and much more. With minor changes to the code in this example, you can easily build mappings to move data from a NoSQL database to a SQL database and vice versa.