DEV Community

fabiocaimi

Convert XML to JSON with Talend

The other day a client gave me a REST API that served data I had to import into my database.
However, we found that the API didn't work well when there was too much data. The client then sent me an XML file containing the response of their service: a big file (50 MB) with multiple loop elements!

The XML file contained thousands of elements, each representing a different product, and I needed to create a JSON file for each product.

Let’s see how to achieve this result with Talend Open Studio for Data Integration, without using external components from the marketplace.

1. Java routine
First of all I create a new routine and add json-org.jar as an external library:

[Screenshot: Talend create new routine and add external library]

Java code:

package routines;

import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class XmlToJSON {

    /**
     * Converts an XML string to its JSON representation.
     * Returns an empty string if the XML cannot be parsed.
     */
    public static String xmlToJson(String xml) {
        String json = "";
        try {
            JSONObject xmlJSONObj = XML.toJSONObject(xml);
            json = xmlJSONObj.toString();
        } catch (JSONException je) {
            // Log the parsing error and fall through to return the empty string.
            System.out.println(je.toString());
        }
        return json;
    }
}

2. Talend designer
In the Talend designer I add three simple components:

[Screenshot: convert XML to JSON with Talend]

2a. tFileInputDelimited
I start my subjob with a tFileInputDelimited that reads the XML file.

2b. tNormalize
I use tNormalize to split my XML on <PRODUCT>, so that each output row holds a single product.

[Screenshot: tNormalize]

[Screenshot: schema of tNormalize]
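To make the effect of the split concrete, here is a minimal plain-Java sketch of what splitting on <PRODUCT> produces. It is only an approximation of the idea, not Talend's generated code. The class name `ProductSplitSketch`, the root element `<PRODUCTS>` and the `ID`/`NAME` fields are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ProductSplitSketch {

    // Splits the document on the literal separator and drops blank pieces.
    static List<String> splitOnTag(String xml, String separator) {
        List<String> pieces = new ArrayList<>();
        // Pattern.quote makes sure the separator is matched literally, not as a regex.
        for (String piece : xml.split(Pattern.quote(separator))) {
            if (!piece.trim().isEmpty()) {
                pieces.add(piece.trim());
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Hypothetical sample document: the real file is about 50 MB.
        String xml = "<PRODUCTS>"
                + "<PRODUCT><ID>1</ID><NAME>Chair</NAME></PRODUCT>"
                + "<PRODUCT><ID>2</ID><NAME>Table</NAME></PRODUCT>"
                + "</PRODUCTS>";
        for (String piece : splitOnTag(xml, "<PRODUCT>")) {
            System.out.println(piece);
        }
    }
}
```

With this sample, the first piece is only the root opening tag, each following piece has lost its opening <PRODUCT> tag (the separator is consumed by the split), and the last piece still carries the root closing tag. That is the kind of leftover a later cleanup step has to deal with.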

2c. tJavaRow

In tJavaRow I clean up a couple of tags (optional) and call the Java routine from step 1.

[Screenshot: tJavaRow]

output_row.json = XmlToJSON.xmlToJson(xmlString);

Schema of tJavaRow:

[Screenshot: schema of tJavaRow]
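The tag cleanup itself is only visible in the screenshot, so here is a hedged sketch of what such a step could look like. It is an assumption about the kind of cleanup needed, not the code from the job. The helper `toProductElement`, the class `FragmentCleanupSketch` and the root element `<PRODUCTS>` are hypothetical names.

```java
class FragmentCleanupSketch {

    // Rebuilds a well-formed <PRODUCT> element from one split fragment.
    // Returns null for fragments that hold no product (e.g. the file header).
    static String toProductElement(String fragment) {
        String closing = "</PRODUCT>";
        int end = fragment.lastIndexOf(closing);
        if (end < 0) {
            return null; // no product in this fragment
        }
        // Drop anything after the product's closing tag, such as the root's closing tag.
        String body = fragment.substring(0, end + closing.length()).trim();
        // Put back the opening tag if the split consumed it.
        return body.startsWith("<PRODUCT>") ? body : "<PRODUCT>" + body;
    }

    public static void main(String[] args) {
        // A fragment as it might arrive from the split, with a trailing root tag.
        System.out.println(toProductElement("<ID>2</ID><NAME>Table</NAME></PRODUCT></PRODUCTS>"));
        // A header-only fragment yields null and could be skipped.
        System.out.println(toProductElement("<PRODUCTS>"));
    }
}
```

In the tJavaRow, the cleaned string would play the role of `xmlString` in the call to `XmlToJSON.xmlToJson`. Rows for which the helper returns null would need to be filtered out before that call.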

That's all!

PS. I couldn't use the Talend wizard "Metadata->XML File" because it crashes when it has to retrieve the schema of a big XML file. But as you can see, we managed without it ;)
