Gary

Pivot Data Using Informatica Java Transformation

The Java transformation is very powerful in Informatica, but there are not many examples on the web showing how to use it. That is because most mappings don't need a Java transformation. In some cases, though, a Java transformation makes your mappings simpler and easier to maintain. In this blog I will walk through a full example of using the Java transformation to pivot data.

Suppose you have source data like this:
Source:

STUDENTID  NAME   SUBJECT  SCORE
1001       Alice  Math     100
1001       Alice  English  85
1002       Bob    Math     98
1002       Bob    English  95
1002       Bob    Science  90
1003       Chris  Math     95
1003       Chris  Science  90
1003       Chris  English  80

You want your target like this:

STUDENTID  NAME   MATH  ENGLISH  SCIENCE
1001       Alice  100   85       0
1002       Bob    98    95       90
1003       Chris  95    80       90

Normally you would combine Expression, Sorter, and Aggregator transformations to achieve this, and you can find plenty of discussions on the web describing that approach. Though it works, it is tedious and hard to maintain, especially when the target has many columns and many different aggregations.
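Before building anything in PowerCenter, the pivot logic itself can be sketched in plain Java. The class and method names below are illustrative, not part of the transformation; missing subjects default to 0, matching the target table:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {

    // Pivot rows of {studentId, name, subject, score} into one line per student.
    public static List<String> pivot(String[][] rows) {
        // studentId -> {Name/Math/English/Science -> value}; LinkedHashMap keeps input order
        Map<String, Map<String, String>> pivoted = new LinkedHashMap<String, Map<String, String>>();
        for (String[] r : rows) {
            Map<String, String> rec = pivoted.get(r[0]);
            if (rec == null) { // first row for this student
                rec = new LinkedHashMap<String, String>();
                rec.put("Name", r[1]);
                pivoted.put(r[0], rec);
            }
            rec.put(r[2], r[3]); // subject -> score
        }

        // One output line per student; a subject the student never took becomes 0
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Map<String, String>> e : pivoted.entrySet()) {
            Map<String, String> rec = e.getValue();
            out.add(e.getKey() + " " + rec.get("Name")
                    + " " + rec.getOrDefault("Math", "0")
                    + " " + rec.getOrDefault("English", "0")
                    + " " + rec.getOrDefault("Science", "0"));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1001", "Alice", "Math", "100"}, {"1001", "Alice", "English", "85"},
            {"1002", "Bob", "Math", "98"},   {"1002", "Bob", "English", "95"},
            {"1002", "Bob", "Science", "90"}, {"1003", "Chris", "Math", "95"},
            {"1003", "Chris", "Science", "90"}, {"1003", "Chris", "English", "80"},
        };
        for (String line : pivot(rows)) {
            System.out.println(line);
        }
    }
}
```

The Java transformation version of this logic is what the rest of the post builds, with Informatica ports taking the place of the method arguments and return value.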

Here are steps to use Java transformation to pivot the source data:

Step 1 – Create a mapping with source "PIVOT_TEST_SRC" and target "PIVOT_TEST_TGT":


Step 2 – In the Create Transformation window, select Java from the list and enter a name for the transformation:

Then choose the Active type. With an Active transformation, the number of output rows can differ from the number of input rows.

Step 3 – Drag and drop all the Source qualifier columns to the Java transformation:


Step 4 – Double-click the Java_pivot_src transformation and choose the Ports tab; add out_studentId, out_studentName, out_mathScore, out_englishScore, and out_scienceScore as output ports:

Save your work. Your mapping now looks like this:


Step 5 – Double-click the Java_pivot_src transformation and choose the "Java Code" tab at the top. By default, you are in the "On Input Row" tab at the bottom.


There are other tabs, such as "Import Packages", "Helper Code", and "On End of Data", which are self-explanatory.

For this example, I am going to use a Java HashMap to hold each student's name and subject scores. The keys are the target column names (Name, Math, English, Science) and the values are the student's name and the respective subject scores:

                             Name    -> student_name
                             Math    -> math_score
                             English -> english_score
                             Science -> science_score

This hashmap is nested inside another hashmap (allRecords) whose key is the student id. So the final hashmap structure looks like:

                            |--> Name    -> student_name
              student_id -> |--> Math    -> math_score
                            |--> English -> english_score
                            |--> Science -> science_score
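This two-level structure can be exercised in a few lines of plain Java. The class and method names below are made up for the sketch; the inner keys match the diagram:

```java
import java.util.HashMap;

public class NestedMapDemo {

    // Build the two-level structure from the diagram for one student.
    public static HashMap<String, HashMap<String, Object>> build() {
        // outer map: student id -> inner map of target column name -> value
        HashMap<String, HashMap<String, Object>> allRecords =
                new HashMap<String, HashMap<String, Object>>();

        HashMap<String, Object> alice = new HashMap<String, Object>();
        alice.put("Name", "Alice");
        alice.put("Math", "100");
        alice.put("English", "85");
        alice.put("Science", null); // Alice has no Science row in the source
        allRecords.put("1001", alice);
        return allRecords;
    }

    public static void main(String[] args) {
        // look up one student's English score by id
        System.out.println(build().get("1001").get("English"));
    }
}
```

A null value for a subject marks "no score seen yet", which is exactly how the transformation code below initializes each new student.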

Based on this idea, put the following two lines in “Import Packages” tab:

import java.util.HashMap;
import java.util.Map;

Select the "Helper Code" tab and declare the variables the Java code needs:

private HashMap<String, HashMap<String, Object>> allRecords = new HashMap<String, HashMap<String, Object>>();
static int inputRows = 0;
static int outputRows = 0;

Select the "On Input Row" tab and write the following code:

HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        // existing student: just record this subject's score
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else { // new student: build a new key/value pair
        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}

Select the "On End of Data" tab and put in the following code to output the results:

HashMap<String, Object> outSubjectScoreMap;
for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
    outSubjectScoreMap = entry.getValue();
    out_studentId = entry.getKey();
    out_studentName = (String) outSubjectScoreMap.get("Name");

    if (outSubjectScoreMap.get("Math") == null) {
        out_mathScore = "";
    } else {
        out_mathScore = (String) outSubjectScoreMap.get("Math");
    }

    if (outSubjectScoreMap.get("English") == null) {
        out_englishScore = "";
    } else {
        out_englishScore = (String) outSubjectScoreMap.get("English");
    }

    if (outSubjectScoreMap.get("Science") == null) {
        out_scienceScore = "";
    } else {
        out_scienceScore = (String) outSubjectScoreMap.get("Science");
    }

    generateRow();
    outputRows += 1;
} // end for loop: output records
allRecords.clear(); // release memory
logInfo("The total number of records generated is: " + outputRows);

inputRows = 0;
outputRows = 0;

generateRow() is a PowerCenter Java API call that generates an output row from the current values of the defined output ports.

If there are too many input records to hold in the hashmap at once, you can also call generateRow() in the "On Input Row" tab to emit rows periodically and release memory. If you do this, you need a Sorter transformation to sort the input records by student id before the Java transformation; otherwise you may get duplicate student ids in the output records.
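Outside PowerCenter, this batching idea can be sketched in plain Java, with a list standing in for generateRow() and a tiny flush threshold for illustration (all names here are made up for the sketch):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchedPivot {
    static final int FLUSH_THRESHOLD = 3; // tiny value for illustration; the mapping uses 1000

    final List<String> emitted = new ArrayList<String>(); // stands in for generateRow()
    final Map<String, Map<String, String>> buffer =
            new LinkedHashMap<String, Map<String, String>>();

    // Rows must arrive sorted by student id; otherwise an id that was already
    // flushed could reappear and be emitted twice.
    void onInputRow(String id, String name, String subject, String score) {
        Map<String, String> rec = buffer.get(id);
        if (rec == null) { // new student
            if (buffer.size() >= FLUSH_THRESHOLD) {
                flush(); // emit buffered students and release memory first
            }
            rec = new LinkedHashMap<String, String>();
            rec.put("Name", name);
            buffer.put(id, rec);
        }
        rec.put(subject, score);
    }

    // Emit one line per buffered student, blank where a subject is missing.
    void flush() {
        for (Map.Entry<String, Map<String, String>> e : buffer.entrySet()) {
            Map<String, String> rec = e.getValue();
            emitted.add(e.getKey() + " " + rec.get("Name")
                    + " " + rec.getOrDefault("Math", "")
                    + " " + rec.getOrDefault("English", "")
                    + " " + rec.getOrDefault("Science", ""));
        }
        buffer.clear();
    }
}
```

Note that the flush is triggered by the size of the buffered map, not by a raw input-row counter; a counter that is never reset would flush on every new student once it passed the threshold.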

HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else { // build new key/value pair
        // flush buffered students once the hashmap grows large, to release memory;
        // this relies on the input being sorted by student id
        if (allRecords.size() >= 1000) {
            HashMap<String, Object> outSubjectScoreMap;
            for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
                outSubjectScoreMap = entry.getValue();
                out_studentId = entry.getKey();
                out_studentName = (String) outSubjectScoreMap.get("Name");

                if (outSubjectScoreMap.get("Math") == null) {
                    out_mathScore = "";
                } else {
                    out_mathScore = (String) outSubjectScoreMap.get("Math");
                }

                if (outSubjectScoreMap.get("English") == null) {
                    out_englishScore = "";
                } else {
                    out_englishScore = (String) outSubjectScoreMap.get("English");
                }

                if (outSubjectScoreMap.get("Science") == null) {
                    out_scienceScore = "";
                } else {
                    out_scienceScore = (String) outSubjectScoreMap.get("Science");
                }

                generateRow();
                outputRows += 1;
            } // end for loop: output records
            allRecords.clear(); // release memory
        } // end if allRecords.size() >= 1000

        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}

Inside the Java code, you can use Java's System.out.println() or the PowerCenter Java API logInfo() to log debug information, which appears in the session log.

I added an Expression transformation to convert the score strings to integers to match the data types in the target:
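In plain Java, the null/empty-safe string-to-integer conversion that the Expression transformation performs might look like this (the class and method names are illustrative):

```java
public class ScoreConverter {

    // Empty or null score strings (missing subjects) become 0 in the target,
    // which is why the sample target table shows 0 for Alice's Science score.
    public static int toScore(String s) {
        if (s == null || s.trim().isEmpty()) {
            return 0;
        }
        return Integer.parseInt(s.trim());
    }

    public static void main(String[] args) {
        System.out.println(toScore("85")); // prints 85
        System.out.println(toScore(""));   // prints 0
    }
}
```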

That's it.

If your Java code runs into exceptions, open the session log and find the error messages. For example, here is an error I got:

JAVA PLUGIN_1762    [ERROR] java.lang.NullPointerException
JAVA PLUGIN_1762    [ERROR]     at com.informatica.powercenter.server.jtx.JTXPartitionDriverImplGen.execute(JTXPartitionDriverImplGen.java:382)

The "JTXPartitionDriverImplGen.java:382" part tells you exactly which line of the generated Java code caused the error. Open the Java transformation and click the "Full Code" link; a window pops up with the full generated Java code. Copy that code into your favorite editor, go to the error line (382 in this example), and analyze why it threw that error.

As you can see from the example above, it is easy to pivot data using the Java transformation. The Java transformation can also be used to compress data, encrypt/decrypt data, concatenate fields, aggregate special fields, and much more. With minor changes to the code in this example, you can easily build mappings to ETL data from a non-SQL database to an SQL database and vice versa.
