The other day a client gave me a REST API that supplies data I had to import into my database.
However, we found that the API didn't work well when there was too much data. The client then sent me an XML file containing the response of their service: a big XML file (50 MB) with multiple loop elements!
The XML file contained thousands of elements, each representing a different product, and I needed to create a JSON file for each product.
Let’s see how to achieve this result with Talend Open Studio for Data Integration, without using external components from the marketplace.
1. Java routine
First of all, I create a new routine and add json-org.jar as an external library:
Java code:
package routines;

import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class XmlToJSON {

    // Converts an XML string to a JSON string using org.json.
    // Returns an empty string if the conversion fails.
    public static String xmlToJson(String xml) {
        String json = "";
        try {
            JSONObject xmlJSONObj = XML.toJSONObject(xml);
            json = xmlJSONObj.toString();
        } catch (JSONException je) {
            // The error is only printed, so the job keeps running.
            System.out.println(je.toString());
        }
        return json;
    }
}
2. Talend designer
In the Talend designer I add 3 simple components:
2a. tFileInputDelimited
I start my subjob with a tFileInputDelimited:
2b. tNormalize
I use tNormalize to split the XML on the <PRODUCT> tag, so that each output row holds one product.
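To make the effect of the split easier to picture, here is a minimal sketch in plain Java of what splitting on a separator produces. It is not Talend code: the class ProductSplitSketch, the method splitByTag and the sample XML with a PRODUCTS root element are hypothetical names chosen for illustration, and tNormalize's own options (trimming, discarding empty values) may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: mimics splitting one XML string on a tag.
class ProductSplitSketch {

    static List<String> splitByTag(String xml, String tag) {
        List<String> rows = new ArrayList<>();
        // Pattern.quote makes the tag a literal separator instead of a regex.
        for (String part : xml.split(Pattern.quote(tag))) {
            String trimmed = part.trim();
            // Skip empty pieces, e.g. when the input starts with the separator.
            if (!trimmed.isEmpty()) {
                rows.add(trimmed);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        String xml = "<PRODUCTS><PRODUCT><ID>1</ID></PRODUCT>"
                + "<PRODUCT><ID>2</ID></PRODUCT></PRODUCTS>";
        for (String row : splitByTag(xml, "<PRODUCT>")) {
            System.out.println(row);
        }
    }
}
```

Note that the separator itself is consumed by the split: the fragments no longer start with <PRODUCT>, the first piece is only the root opening tag, and the last piece still carries the root closing tag. That is why some tag cleanup can be useful before converting a fragment to JSON.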
2c. tJavaRow
In tJavaRow I clean up a couple of tags (optional) and call the Java routine from step 1:
output_row.json = XmlToJSON.xmlToJson(xmlString);
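The cleanup itself depends on the file, so the following is only a rough sketch of what such a step might look like in plain Java. It assumes that each row is a fragment left over after splitting on <PRODUCT>, and that the root element is called PRODUCTS. Both tag names, the class FragmentCleanupSketch and the method cleanFragment are assumptions for illustration, not taken from the job.

```java
// Hypothetical cleanup step; the tag names PRODUCT and PRODUCTS are assumptions.
class FragmentCleanupSketch {

    static String cleanFragment(String fragment) {
        // Drop a root closing tag that may be left on the last fragment.
        String cleaned = fragment.replace("</PRODUCTS>", "").trim();
        // Restore the opening tag consumed by the split,
        // so the fragment is well-formed XML again.
        if (!cleaned.startsWith("<PRODUCT>")) {
            cleaned = "<PRODUCT>" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Sample fragment invented for this sketch.
        System.out.println(cleanFragment("<ID>2</ID></PRODUCT></PRODUCTS>"));
    }
}
```

A string cleaned this way has matching opening and closing tags, which is what an XML-to-JSON conversion needs to work on.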
That's all!
PS. I couldn't use the Talend wizard "Metadata->XML File" because it crashes when it tries to retrieve the schema of a big XML file. But as you can see, we managed without that wizard ;)