<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oluwabukunmi Ige</title>
    <description>The latest articles on DEV Community by Oluwabukunmi Ige (@oluwabukunmige).</description>
    <link>https://dev.to/oluwabukunmige</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F631791%2F0b4f8dfc-742c-482f-b493-41705011f50b.jpeg</url>
      <title>DEV Community: Oluwabukunmi Ige</title>
      <link>https://dev.to/oluwabukunmige</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oluwabukunmige"/>
    <language>en</language>
    <item>
      <title>How To Capture More than Group When Using BigQuery Regex</title>
      <dc:creator>Oluwabukunmi Ige</dc:creator>
      <pubDate>Sat, 15 May 2021 16:21:07 +0000</pubDate>
      <link>https://dev.to/oluwabukunmige/how-to-capture-more-than-group-when-using-bigquery-regex-4jm3</link>
      <guid>https://dev.to/oluwabukunmige/how-to-capture-more-than-group-when-using-bigquery-regex-4jm3</guid>
      <description>&lt;p&gt;I have been working on a particular task that required me to extract dates from a pool of text that was stored Big query. Trust me when I say that the data was quite messy and every date format you can imagine was in attendance.&lt;/p&gt;

&lt;p&gt;Hence I got to work. Went through the big query docs and the function that was more suitable for my task was  &lt;code&gt;REGEXP_EXTRACT&lt;/code&gt;. How it was supposed to work was that I would specify the date format I wanted to extract in the regex function and it would extract it for me.&lt;br&gt;
Something like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT REGEXP_EXTRACT('210118164541-1/18/2021',r"(0[1-9]|1[0-2])(/)(0[1-9]|[12]\d|30|31)(/)(\d{4})")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In this case, the date was &lt;code&gt;1/18/2021&lt;/code&gt; and the job of the regex function was to extract the date.&lt;/p&gt;

&lt;p&gt;The issue I encountered was that the &lt;code&gt;regex_extract&lt;/code&gt; function doesn't work with more than one capturing group. What I mean by capturing group is mainly due to the fact that I could not predict the date format in the data and each part of the date needed a different capturing group. Something like &lt;br&gt;
&lt;code&gt;DD/MM/YY or MM/DD/YY&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, the regex extract function only accepted one capturing group. &lt;/p&gt;

&lt;p&gt;Citing that &lt;code&gt;Regular expressions passed into extraction functions must not have more than 1 capturing group&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Did so some google and Stackoverflow searches before was able to figure out a solution. &lt;/p&gt;

&lt;p&gt;The solution was that since &lt;code&gt;REGEXP_EXTRACT&lt;/code&gt; only takes one capturing group you had to make the other groups non-capturing using &lt;code&gt;?:&lt;/code&gt;. This allows the function to capture more than one group.&lt;/p&gt;

&lt;p&gt;The code below shows this in practice.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT REGEXP_EXTRACT('210118164541-1/18/2021',r"(?:0[1-9]|[12]\d|30|31)(?:.)(?:0[1-9]|1[0-2])(?:/)(?:\d{4}|\d{2}")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The code above would perfectly extract the date &lt;code&gt;1/18/2021&lt;/code&gt; as needed.&lt;/p&gt;

&lt;p&gt;In conclusion, if you ever encounter a task that requires you to extract more than one capturing group using regex on BigQuery remember that you have to make the other groups non-capturing using the &lt;code&gt;?:&lt;/code&gt; notation and you are good to go.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>bigquery</category>
    </item>
  </channel>
</rss>
