<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: fabiocaimi</title>
    <description>The latest articles on DEV Community by fabiocaimi (@fabiocaimi).</description>
    <link>https://dev.to/fabiocaimi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F369317%2F332b80a7-ebac-4c34-90a1-6e7b93a0ad12.png</url>
      <title>DEV Community: fabiocaimi</title>
      <link>https://dev.to/fabiocaimi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fabiocaimi"/>
    <language>en</language>
    <item>
      <title>Convert XML to JSON with Talend</title>
      <dc:creator>fabiocaimi</dc:creator>
      <pubDate>Fri, 14 Apr 2023 15:59:16 +0000</pubDate>
      <link>https://dev.to/fabiocaimi/convert-xml-to-json-with-talend-54cf</link>
      <guid>https://dev.to/fabiocaimi/convert-xml-to-json-with-talend-54cf</guid>
      <description>&lt;p&gt;The other day a client provided me with a Rest API to pass me data that I had to import into my database. &lt;br&gt;
However, we found that the API doesn’t work well when there’s too much data. The client then provided me with an XML file with the response of their service: a big XML file (50MB) with multiple loop elements!&lt;/p&gt;

&lt;p&gt;The XML file contained thousands of elements, each representing a different product, and I needed to create a JSON file for each product.&lt;/p&gt;

&lt;p&gt;Let’s see how to achieve this result with Talend Open Studio for Data Integration, without using external components from the marketplace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Java routine&lt;/strong&gt;&lt;br&gt;
First of all, I create a new routine and add &lt;a href="https://mavenlibs.com/jar/file/org.json/json"&gt;json-org.jar&lt;/a&gt; as an external library:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CBbi32vw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pxrht7ojrzd65z94srd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CBbi32vw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pxrht7ojrzd65z94srd2.png" alt="Talend create new routine and add external library" width="339" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Java code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package routines;

import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class XmlToJSON {

    // Converts an XML string to a JSON string.
    // Returns an empty string if the XML cannot be converted.
    public static String xmlToJson(String xml) {
        String json = "";
        try {
            JSONObject xmlJSONObj = XML.toJSONObject(xml);
            json = xmlJSONObj.toString();
        } catch (JSONException je) {
            // Log the problem and fall through with the empty default.
            System.out.println(je.toString());
        }
        return json;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. Talend designer&lt;/strong&gt;&lt;br&gt;
In the Talend designer I add 3 simple components:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0rxu7JBs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/igm8x5l7n78v7cxgd6qc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0rxu7JBs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/igm8x5l7n78v7cxgd6qc.png" alt="convert XML to JSON with Talend" width="508" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2a. tFileInputDelimited&lt;/strong&gt;&lt;br&gt;
I start my subjob with a tFileInputDelimited:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ldEuzoNh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xfzhczrqcvhdydsmu28a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ldEuzoNh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xfzhczrqcvhdydsmu28a.png" alt="tFileInputDelimited settings" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2b. tNormalize&lt;/strong&gt;&lt;br&gt;
I use tNormalize to split the XML on &lt;code&gt;&amp;lt;PRODUCT&amp;gt;&lt;/code&gt;, so that each output row holds a single product:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J4CKNG7a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7e39mbp68ncgphmnuoqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J4CKNG7a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7e39mbp68ncgphmnuoqf.png" alt="tNormalize" width="522" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9sNpo9Ug--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6541j1b5wd6sjrmdldma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9sNpo9Ug--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6541j1b5wd6sjrmdldma.png" alt="schema of tNormalize" width="800" height="171"&gt;&lt;/a&gt;&lt;/p&gt;
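
&lt;p&gt;To make the split concrete, here is a rough plain-Java analogue of what a normalize step does with an item separator. It is only a sketch, not Talend's generated code: &lt;code&gt;NormalizeSketch&lt;/code&gt; and &lt;code&gt;splitOnSeparator&lt;/code&gt; are hypothetical names, and treating the separator as a literal string, trimming each piece and dropping empty pieces are assumptions of this example.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (requires Java 11 or later).
class NormalizeSketch {

    // Splits `text` on a literal separator, trims each piece and drops
    // empty ones, so every remaining piece corresponds to one record.
    public static String[] splitOnSeparator(String text, String separator) {
        return Arrays.stream(text.split(Pattern.quote(separator)))
                .map(String::trim)
                .filter(Predicate.not(String::isEmpty))
                .toArray(String[]::new);
    }
}
```

&lt;p&gt;Called with &lt;code&gt;&amp;lt;PRODUCT&amp;gt;&lt;/code&gt; as the separator, the first piece is whatever precedes the first product (if anything), and every later piece still ends with its closing &lt;code&gt;&amp;lt;/PRODUCT&amp;gt;&lt;/code&gt; tag, which is the kind of leftover a cleanup step can remove before conversion.&lt;/p&gt;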

&lt;p&gt;&lt;strong&gt;2c. tJavaRow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In tJavaRow I clean up a couple of tags (optional) and call the Java routine created earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jECTc2zb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4hpmetf1skw4czun3dfz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jECTc2zb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4hpmetf1skw4czun3dfz.png" alt="tJavaRow" width="662" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;output_row.json = XmlToJSON.xmlToJson(xmlString);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Schema of tJavaRow:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H8F-kz2Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/78apoak5vbh1h4rnikzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H8F-kz2Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/78apoak5vbh1h4rnikzw.png" alt="schema of tJavaRow" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;
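
&lt;p&gt;The job above ends with one JSON string per row in the &lt;code&gt;json&lt;/code&gt; column. As a rough sketch of the remaining "one file per product" step in plain Java (not part of the job shown here; the class name, the method name and the &lt;code&gt;product_N.json&lt;/code&gt; naming scheme are all hypothetical), something like this could work:&lt;/p&gt;

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class JsonFileWriterSketch {

    // Writes each JSON string to its own UTF-8 file inside `outputDir`,
    // named product_1.json, product_2.json, and so on.
    // Null or empty strings are skipped, because the conversion routine
    // returns "" when it fails. Returns the number of files written.
    public static int writeOnePerProduct(String[] jsonDocs, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int written = 0;
        for (String json : jsonDocs) {
            if (json == null || json.isEmpty()) {
                continue;
            }
            written++;
            Path target = outputDir.resolve("product_" + written + ".json");
            Files.write(target, json.getBytes(StandardCharsets.UTF_8));
        }
        return written;
    }
}
```

&lt;p&gt;Skipping empty strings matters here because the routine swallows the exception and returns an empty string, so a failed conversion would otherwise produce an empty file.&lt;/p&gt;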

&lt;p&gt;&lt;strong&gt;That's all!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PS. I couldn't use the Talend wizard "Metadata-&amp;gt;XML File" because it crashes when it has to process a big XML file to retrieve its schema. But as you can see, we managed without it ;)&lt;/p&gt;

</description>
      <category>talend</category>
      <category>xml</category>
      <category>json</category>
      <category>dataintegration</category>
    </item>
  </channel>
</rss>
