<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Plugaru Tudor</title>
    <description>The latest articles on DEV Community by Plugaru Tudor (@plugarut).</description>
    <link>https://dev.to/plugarut</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F30340%2F2595da7d-5ff3-4b50-871c-66ad84ab08a8.jpg</url>
      <title>DEV Community: Plugaru Tudor</title>
      <link>https://dev.to/plugarut</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/plugarut"/>
    <language>en</language>
    <item>
      <title>Schema changes and data migration in BigQuery</title>
      <dc:creator>Plugaru Tudor</dc:creator>
      <pubDate>Wed, 06 Oct 2021 15:58:51 +0000</pubDate>
      <link>https://dev.to/plugarut/schema-changes-and-data-migration-in-bigquery-24gh</link>
      <guid>https://dev.to/plugarut/schema-changes-and-data-migration-in-bigquery-24gh</guid>
      <description>&lt;p&gt;A few months ago, I had to perform data migration in BigQuery. I hope that you already know what is BigQuery, if not, google it. It’s a nice tool provided by Google if you want to store huge amounts of data at a low cost. &lt;/p&gt;

&lt;p&gt;At the first sight, it does not sound too difficult and I immediately started to write a SQL query that would just read from one table and store the data in another table. Hey, what can be difficult in this 🤷 The trick was that the new table, had a slightly different schema, and the original data needed to be changed a bit to “fit” in the new table. &lt;/p&gt;

&lt;p&gt;Inside BigQuery tables, you can store nested data, it supports having columns of type &lt;code&gt;RECORD&lt;/code&gt;, it’s something like a table inside another table. And they went the extra mile and made BigQuery support arrays of &lt;code&gt;RECORD&lt;/code&gt;s, pretty neat if you ask me.&lt;/p&gt;

&lt;p&gt;So, back to the problem. In my case, in the old table, one of the columns from a &lt;code&gt;RECORD&lt;/code&gt; was of type &lt;code&gt;STRING&lt;/code&gt; but in the new one, it was of type &lt;code&gt;INT64&lt;/code&gt;. So, I had to cast that column first, before being able to store it nicely in the new table. &lt;br&gt;
Let’s say you have a table named order with the following schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INTEGER"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"store"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"STRING"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lines"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repeated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INTEGER"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"item"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"STRING"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"STRING"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the column &lt;code&gt;lines&lt;/code&gt; which are of type &lt;code&gt;RECORD&lt;/code&gt; and it’s also &lt;code&gt;REPEATED&lt;/code&gt;, meaning it’s a list of records.&lt;br&gt;
So, in my case, the column &lt;code&gt;qty&lt;/code&gt; from &lt;code&gt;lines&lt;/code&gt; record, in the new table was of type &lt;code&gt;INT64&lt;/code&gt;, but you can see in the old one is of type &lt;code&gt;STRING&lt;/code&gt;. How do we reach that column with SQL and cast that column?&lt;br&gt;
We need to use &lt;code&gt;REPLACE&lt;/code&gt; to first unnest the &lt;code&gt;RECORD&lt;/code&gt;, then convert it to an &lt;code&gt;ARRAY&lt;/code&gt; and perform the casting. The SQL query looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;STRUCT&lt;/span&gt; &lt;span class="n"&gt;l_items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;EXCEPT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;UNNEST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;l_items&lt;/span&gt; 
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project_id.dataset_id.order`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the query above, I used the &lt;code&gt;EXEPT&lt;/code&gt; operation to not include that column at all in the result, so that the new table to have that column empty. This worked fine in my case since because of the wrong type, there was no data, and was no point in casting the column at all. But, if you need cast is, you just need to replace the EXCEPT part, with the proper SQL operation.&lt;/p&gt;

&lt;p&gt;So, what if you have to cast a column from a list of records inside another list of records and so on? You just use the same technique and apply recursion to do this. And of course, I had this situation and I had to write this recursive SQL to exclude a column, but hey, nothing it’s easy in this life... It took me around 3 hours to write to SQL, cause' I'm not the most skilled in SQL, but, I got the job done 💪.&lt;/p&gt;

&lt;p&gt;That’s it, folks!&lt;/p&gt;

</description>
      <category>googlecloud</category>
    </item>
  </channel>
</rss>
