<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Antra</title>
    <description>The latest articles on DEV Community by Alex Antra (@alexantra).</description>
    <link>https://dev.to/alexantra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F141003%2F2e555ee4-0ec3-4fc3-abf7-bcd45e2e069c.jpeg</url>
      <title>DEV Community: Alex Antra</title>
      <link>https://dev.to/alexantra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alexantra"/>
    <language>en</language>
    <item>
      <title>Data abstinence in the name of being an ethical data company is not a sound operating model</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 28 Sep 2020 09:01:52 +0000</pubDate>
      <link>https://dev.to/alexantra/data-abstinence-in-the-name-of-being-an-ethical-data-company-is-not-a-sound-operating-model-2661</link>
      <guid>https://dev.to/alexantra/data-abstinence-in-the-name-of-being-an-ethical-data-company-is-not-a-sound-operating-model-2661</guid>
      <description>&lt;p&gt;We currently exist in a dichotomy when it comes to how companies use our data.&lt;/p&gt;

&lt;p&gt;At one end of the spectrum we have companies like Facebook and Google who have, in the lovely words of &lt;a href="https://haveibeenpwned.com/"&gt;HIBP&lt;/a&gt;, leaked, sold, redistributed and abused [our data] to our detriment and beyond our control. I call them unethical data companies.&lt;br&gt;
At the other end of the spectrum we have companies like DuckDuckGo and Signal, who strive to be seen as the ethical alternative.&lt;/p&gt;

&lt;p&gt;However, the thing that makes them ethical is the fact that they either don’t store your information or encrypt it to the point that not even they can look at it. Meaning that even if a government agency had a legally enforceable warrant, or a rogue staff member wanted to go looking, it would not be possible.&lt;/p&gt;

&lt;p&gt;What makes them ethical is not a comprehensive and transparent framework that explains exactly how the data is used, but rather that they have just removed all possible avenues for themselves to breach your trust. It’s like preventing getting cancer in your arm by amputating the arm.&lt;/p&gt;

&lt;p&gt;Simply put this is the nuclear option, and like any nuclear option there will always be fallout.&lt;/p&gt;

&lt;p&gt;Great data products require access to lots of good quality data, the more the merrier.&lt;/p&gt;

&lt;p&gt;Facebook and Google have so much of your data they can swing a dead cat 180 degrees and accidentally release three great data products.&lt;/p&gt;

&lt;p&gt;For example, Google’s predictive typing feature is built off a decade of reading your emails.&lt;/p&gt;

&lt;p&gt;Companies that don’t collect enough data, or that obscure it through encryption as a blanket rule, are potentially pushing themselves out of the market. (Obfuscation where appropriate is obviously recommended.)&lt;/p&gt;

&lt;p&gt;Without the data to make market-competitive products, their ability to build said products at the same pace as the unethical companies is effectively the same as trying to win poker with a bad hand: you’ll only succeed if the other person makes a mistake.&lt;/p&gt;

&lt;p&gt;This potentially means that unethical data companies may be able to sink or even just wait out their more ethical competitors.&lt;/p&gt;



&lt;p&gt;DuckDuckGo is celebrating good growth at the moment, but we’ve all seen market factors sink more mature companies. All it takes is for the next new thing to require the very data DuckDuckGo has hidden from themselves, and they won’t be able to compete.&lt;/p&gt;

&lt;p&gt;What benefits the user, the industry, and these ethical data companies is to stop using abstinence as a data strategy and to move towards the center of the current spectrum.&lt;/p&gt;

&lt;p&gt;We need our ethical alternative product companies to have access to the data they need to stay in the market, especially while juggernauts like Facebook still exist.&lt;/p&gt;

&lt;p&gt;However to be able to access that data to stay competitive these companies must establish a data strategy that puts building customer trust front and center.&lt;/p&gt;

&lt;p&gt;A data strategy that achieves this by learning how to thrive in the bounds of these three principles: transparency, controls, and education.&lt;/p&gt;

&lt;p&gt;To be transparent you need to move beyond these vague and all-encompassing privacy policies that are really used as a ‘get out of jail free’ card. You need to explain to the user what kind of data you store, how and why you use it internally, how and why you use it externally (if you have those needs), and what sort of things you would be open to doing in the future that you don’t quite do just now. And you need to do this in a way that the average user can understand.&lt;/p&gt;

&lt;p&gt;You also need to pair that with some form of framework so your customers know the rules you’re committed to playing within. Something like ‘we will never send your personal data to a third party, but we may share aggregated non-identifying usage figures under the following examples.’&lt;/p&gt;

&lt;p&gt;Say what you will do, why you’re doing it and what you won’t do. Sentences like ‘we may use your data to improve our products’ or ‘we may share data with third parties’ are just not good enough any more.&lt;/p&gt;

&lt;p&gt;Then once you’ve written a proper privacy policy you need to provide your end user with granular control.&lt;/p&gt;

&lt;p&gt;We need to move away from this gun-to-the-user’s-head approach of denying access to the product if they don’t agree to every single demand of yours. It’s blackmail and doesn’t build trust; it just tests it until a competitor comes along.&lt;/p&gt;

&lt;p&gt;Giving the end user an appropriate level of control is paramount to building that trusting relationship. If they can go into your product and choose to prevent their data from being used for X but still allow it to be used for Y you are not only building trust through control, but leading them down a path of making informed decisions.&lt;/p&gt;

&lt;p&gt;You want to provide so much freedom and control in your own product that the user can’t help but notice how everyone else doesn’t live up to the same standards.&lt;/p&gt;

&lt;p&gt;Then once you have those processes in place you need to keep communicating with your users and give them the space to engage back.&lt;/p&gt;

&lt;p&gt;If you’re changing how you’re using the data, if you’re collecting new data, or if you’re about to do something controversial, tell your users early, give them the ability to opt out beforehand, and reward them for exercising their choice.&lt;/p&gt;

&lt;p&gt;If you do this, you may find that people who opt out as a first response may opt-in at a later date. Either because they believe in your product enough to change their stance or because they spoke with others and realized that it may not have been as bad as they feared.&lt;/p&gt;

&lt;p&gt;Also communicate about things happening around you and your customer. Actions done by other companies may have negative impacts on you and saying nothing will often perpetuate them.&lt;/p&gt;

&lt;p&gt;If a rival company is using ethical-data lingo for their own benefit, like Facebook does with the end-to-end encryption of WhatsApp messages, you need to face into that and explain why you are still the ethical choice, especially if what the company is doing is lying to their users.&lt;/p&gt;

&lt;p&gt;If the only thing keeping users on your product is the ethical sales pitch, then the second Google and Facebook appear to ‘fix their ways’ your user base will go straight back to them.&lt;/p&gt;

&lt;p&gt;I love these new privacy-conscious companies. I’ve been using DuckDuckGo for over two years now and have found my ads less creepy and my data involved in fewer breaches.&lt;/p&gt;

&lt;p&gt;However I fear the thing that makes them different might be the thing that puts them out of business. Building an entire strategy around doing the opposite of the bad guys is arguably the easy way around the problem and can potentially prevent them from being competitive or surviving.&lt;/p&gt;

&lt;p&gt;It’s about time companies serious about being trusted by their users actually took the time to tackle that problem rather than chopping off every single possible avenue that could lose them user trust.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;I currently work for a great company called Xero doing all sorts of fun data things. When you're reading my articles I need you to understand that my words are my own; I'm not speaking on behalf of my employer, and if I'm talking about something negative in the field it may not be indicative of Xero. I've worked many interesting roles and I read a lot about my field.&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LTwjgGRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EbrOQOzUMAAK9XQ.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Hi, I'm Ron!&lt;br&gt;🛰️ I ❤️ space!&lt;br&gt;☕ Coffee Addict&lt;br&gt;📊 Data Analysis Team Lead for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt;&lt;br&gt;⚠️ My words are my own.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/LMuC1tMLom"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/OvW7Vsvccs"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;|🇳🇿 |🇬🇧 |🏳️‍🌈 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      11:11 AM - 29 Jun 2020
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>data</category>
    </item>
    <item>
      <title>Google plays the victim in an open letter to Australians. A wolf in sheep’s clothing.</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 17 Aug 2020 11:01:46 +0000</pubDate>
      <link>https://dev.to/alexantra/google-plays-the-victim-in-an-open-letter-to-australians-a-wolf-in-sheep-s-clothing-be6</link>
      <guid>https://dev.to/alexantra/google-plays-the-victim-in-an-open-letter-to-australians-a-wolf-in-sheep-s-clothing-be6</guid>
      <description>

&lt;p&gt;&lt;em&gt;I currently work for a great company called Xero doing all sorts of fun data things. When you're reading my articles I need you to understand that my words are my own; I'm not speaking on behalf of my employer, and if I'm talking about something negative in the field it may not be indicative of Xero. I've worked many interesting roles and I read a lot about my field.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Open letters seem to be the new way tech throw their weight around, sometimes it’s for good (&lt;a href="https://www.cnet.com/news/google-apple-amazon-and-others-sign-open-letter-opposing-anti-lgbtq-legislation/?utm_source=reddit.com#ftag=CADf328eec"&gt;example&lt;/a&gt;) and sometimes it’s for evil (&lt;a href="https://www.digitaltrends.com/social-media/apple-ios-14-facebook-ad-business/"&gt;example&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Today we have an open letter from Google. And it’s bad. &lt;a href="https://about.google/intl/ALL_au/google-in-australia/an-open-letter/"&gt;Link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The letter, pushed on Google’s Search Engine and YouTube pages, outlined how the upcoming &lt;a href="https://www.accc.gov.au/focus-areas/digital-platforms/draft-news-media-bargaining-code"&gt;Australian News Media Bargaining Code&lt;/a&gt; would not only ‘hurt’ their services, but more importantly ‘hurt’ you, the loyal customer they care so deeply about.&lt;/p&gt;

&lt;p&gt;The code, currently aimed only at Facebook and Google (Facebook’s open letter probably pending), is designed to address the inequalities that arise when these major digital companies behave like monopolies. &lt;/p&gt;

&lt;p&gt;The kind of behavior that means Australian news companies may lose out on ‘airtime’ to multi-billion-dollar media companies willing to outbid them, or where Australian news companies have to accept less favorable agreements with Facebook and Google because other parties can pay more money, or where Australian news stories show up on Facebook feeds or Google Search results without receiving any of the ad revenue each company makes from those interactions.&lt;/p&gt;

&lt;p&gt;In a letter straight out of the PR handbook, and later signed by Australia’s Managing Director to give it that personal touch, Google claims that having to give space to local Australian news media is very very very bad for Australians.&lt;/p&gt;

&lt;p&gt;How bad?&lt;/p&gt;

&lt;p&gt;Well this is where things go a bit sideways…&lt;/p&gt;

&lt;h1&gt;
  
  
  Apparently it puts your Google search data at risk.
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9VKRHvEb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mxw5djbzzs4axwacgm38.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9VKRHvEb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mxw5djbzzs4axwacgm38.PNG" alt="Tim Tams" width="785" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What does that have to do with the news, I hear you say?&lt;br&gt;
Well, the code has some key stipulations that the big G and F must comply with to ensure that the code is being fairly applied.&lt;br&gt;
The first is that they must proactively give notice of any changes to the algorithm. Secondly, they must recognize the news as original content, and thirdly they must provide usage data on users interacting with the news content.&lt;br&gt;
It’s that last one Google is crying foul over. Effectively the code mandates that Google must provide user data to prove that Google isn’t fudging the numbers.&lt;/p&gt;

&lt;p&gt;So is your personal data at risk?&lt;/p&gt;

&lt;p&gt;No more than usual.&lt;/p&gt;

&lt;p&gt;If you read their privacy policies (&lt;a href="https://www.facebook.com/about/privacy"&gt;Facebook&lt;/a&gt;, &lt;a href="https://policies.google.com/privacy?hl=en-US#infosharing"&gt;Google&lt;/a&gt;), it’s absolutely standard practice to share non-personally identifiable data with third parties. Google explicitly says:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;We may share non-personally identifiable information publicly and with our partners — like publishers, advertisers, developers, or rights holders. For example, we share information publicly to show trends about the general use of our services. We also allow specific partners to collect information from your browser or device for advertising and measurement purposes using their own cookies or similar technologies.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nowhere does the code say personally identifiable data; it merely wants to know what demographics are clicking their links so that outlets can tailor their content, because even news outlets still have to do good marketing. Performance and engagement data is something Google and Facebook have always provided their partners, and this isn’t any different.&lt;br&gt;
Which means, no, Google won’t be sharing your search history with Australian news sites.&lt;br&gt;
Google’s entire profit model is built off of your user data, helping them sell trends and usage metrics. They have offered every single person and company who uses their services the ability to see who is engaging with their content. So to pivot and complain that they will be forced to provide usage data is a deliberate scare tactic to confuse those who don’t understand just how much of your data Google uses and abuses.&lt;/p&gt;
&lt;h1&gt;
  
  
  They’ve threatened to make Australians pay for Google.
&lt;/h1&gt;

&lt;p&gt;Google claims this code ‘hurts the free service you use.’&lt;/p&gt;

&lt;p&gt;Read this, and read it again until you realize the thinly veiled threat that it is.&lt;/p&gt;

&lt;p&gt;It’s currently free to use Google Search, Google News, Google Discover, and YouTube. What’s the only thing that could risk a free service?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Having to pay for it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Does the new code enforce that Google must charge for their products? Nope.&lt;/p&gt;

&lt;p&gt;This means Google is threatening to charge Australians to use their free services, if they don’t get their way.&lt;/p&gt;

&lt;p&gt;Wow, this open letter went from nice to ‘nice search engine, shame if something happened to it’ real quick.&lt;/p&gt;

&lt;p&gt;If nothing else tells you Google is in the wrong here, it should be that the first thing you heard from Google about this code was a bald-faced threat.&lt;/p&gt;

&lt;p&gt;Another fun claim in the open letter is that apparently the code also gives Big Media Companies Special Treatment, and unfairly benefits large media companies.&lt;/p&gt;

&lt;p&gt;So let’s dive in and take a closer look at which Big Media Companies the code says are eligible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Companies whose annual revenue exceeds $150k on average. While this excludes small news companies, $150k revenue makes for a ‘medium’ sized company, not a big one.&lt;/li&gt;
&lt;li&gt;Outlets that predominantly produce ‘core news’ and publish online. Which means places that cover politics, courts, and crime journalism. Not gossip mags.&lt;/li&gt;
&lt;li&gt;Outlets that adhere to professional editorial standards&lt;/li&gt;
&lt;li&gt;They maintain editorial independence from the subject of their news coverage: which means controlled media, fake news, magazines, and politically owned or influenced media companies are not eligible.&lt;/li&gt;
&lt;li&gt;They operate primarily in Australia for the purpose of serving Australians.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not exactly the scary Big Media Company version of Big Brother.&lt;/p&gt;

&lt;p&gt;The code won't actually cover a lot of Australian media companies, primarily just national and state news companies. This is a major lifeline to local newspapers, which are vital for a functioning democracy and do so much to push transparency at the local level. It also takes a stand against fake news and outside influences in the media Australians are exposed to. This allows a fighting chance in case Russia decides to try to turn the next election using the same news-influencing tactics it used in 2016. (&lt;a href="https://en.wikipedia.org/wiki/Russian_interference_in_the_2016_United_States_elections"&gt;link&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;So those are the rules the code outlines to play by.&lt;/p&gt;

&lt;p&gt;But what are Google’s own rules on ensuring their services don’t unfairly benefit big media companies?&lt;/p&gt;

&lt;p&gt;Interestingly, the open letter doesn’t cover that, though they did say ‘You’ll hear more from us in the coming days — stay tuned’, so maybe it’s coming. But the open letter does say one thing that’s important to note:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;we already pay them millions of dollars and send them billions of free clicks every year.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The reason why I draw your attention to this is that it’s not relevant. The code isn’t arguing that the news media companies don’t already earn money and get free clicks; it seeks to address the fundamental bargaining power imbalance caused by Google allowing other media companies to outbid them, and by Google earning money off news coverage without passing on a cut of the revenue.&lt;/p&gt;

&lt;p&gt;Google is claiming that what is currently in place is fair and that Google is being unfairly hard done by and unrewarded for their charity.&lt;/p&gt;
&lt;h1&gt;
  
  
  It’s an attempt to protect the major beneficiaries of the status quo, the global media conglomerates, NOT the consumer.
&lt;/h1&gt;

&lt;p&gt;I always get my hackles up when large companies take to writing ‘open letters’. &lt;/p&gt;

&lt;p&gt;Traditionally an open letter is something the downtrodden use to get their voice heard; however, it’s recently become a tactic large companies use to attack something they can’t sue.&lt;/p&gt;

&lt;p&gt;It’s a marketing scheme used to turn Australians against their government in favor of the company that is earning billions and billions of dollars in profit off of everything you search, every gmail you send, and every YouTube video you watch. &lt;/p&gt;

&lt;p&gt;An organization who time and again has demonstrated they have no loyalty to you, the average consumer.&lt;/p&gt;

&lt;p&gt;Australians are being used as pawns to protect Google’s right to absolute control over how they conduct their business. There is no clear risk here; Google just doesn’t want to be told what to do. No search history will be shared, no content creator will get fewer views, and no media company will be given unfair special treatment.&lt;/p&gt;

&lt;p&gt;This has all happened before, by the way. This is no different from what happened to news via the airwaves, when TV channels had to meet certain standards.&lt;/p&gt;

&lt;p&gt;Now Google will have to be transparent about usage and fairly negotiate with Australian News Companies on advertising, accurately identify original content, and pay them a cut of the revenue they earn. The only thing being lost here is Google’s chance to profit from anyone with deep enough pockets to shove their content to the front of the queue.&lt;/p&gt;
&lt;h1&gt;
  
  
  Have your say!
&lt;/h1&gt;

&lt;p&gt;The ACCC is accepting written submissions from anyone in the world to voice their concerns on the code.&lt;/p&gt;

&lt;p&gt;Written submissions are due by 5 pm on 28 August 2020 and can be sent to &lt;a href="mailto:bargainingcode@accc.gov.au"&gt;bargainingcode@accc.gov.au&lt;/a&gt;.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LTwjgGRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EbrOQOzUMAAK9XQ.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Hi, I'm Ron!&lt;br&gt;🛰️ I ❤️ space!&lt;br&gt;☕ Coffee Addict&lt;br&gt;📊 Data Analysis Team Lead for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt;&lt;br&gt;⚠️ My words are my own.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/LMuC1tMLom"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/OvW7Vsvccs"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;|🇳🇿 |🇬🇧 |🏳️‍🌈 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      11:11 AM - 29 Jun 2020
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Doing the Impossible, using ASSISTANT to make a SQL Linter (and how you can make it lint whatever you want)</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Thu, 13 Aug 2020 12:51:44 +0000</pubDate>
      <link>https://dev.to/alexantra/doing-the-impossible-using-assistant-to-make-a-sql-linter-and-how-you-can-make-it-lint-whatever-you-want-2ke2</link>
      <guid>https://dev.to/alexantra/doing-the-impossible-using-assistant-to-make-a-sql-linter-and-how-you-can-make-it-lint-whatever-you-want-2ke2</guid>
      <description>

&lt;p&gt;&lt;em&gt;I currently work for a great company called Xero doing all sorts of fun data things. When you're reading my articles I need you to understand that my words are my own; I'm not speaking on behalf of my employer, and if I'm talking about something negative in the field it may not be indicative of Xero. I've worked many interesting roles and I read a lot about my field.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  First of all what is a linter?
&lt;/h2&gt;

&lt;p&gt;For those who don't know, a linter is a tool that analyses code to flag errors, bugs, and stylistic issues.&lt;/p&gt;

&lt;p&gt;It's a great way to ensure that people are coding in a similar and consistent manner and not like a psychopath......&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/KBg4LUuxOzGNi/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/KBg4LUuxOzGNi/giphy.gif" alt="Psycho" width="245" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These sadly tend not to exist for SQL. &lt;/p&gt;

&lt;p&gt;The highly contextual and wildly variable nature of data stores means that while we all use the same keywords, the devil is in the details. &lt;/p&gt;

&lt;p&gt;For example it's quite common to champion the KISS principle in SQL code, just because you can join 15 tables together in one super query doesn't mean you should, it makes it difficult to test and maintain, and a clear sign you're a psychopath......&lt;/p&gt;

&lt;p&gt;And while we could potentially build a coding framework or a linter to enforce that, there are just times where you have no choice but to join 15 tables together, as it's all down to how the data works together. Can you see the dilemma? You really can't tell people off if they have no choice.&lt;/p&gt;

&lt;p&gt;In practice this generally means a data team may agree on a set of coding standards, which will differ in their level of comprehensiveness, and are left enforcing it based on how vigilant the peer reviews are.&lt;/p&gt;




&lt;h2&gt;
  
  
  They don't exist, but you have one now?
&lt;/h2&gt;

&lt;p&gt;Enter stage left!&lt;/p&gt;

&lt;p&gt;Assistant by Tomasz Smykowski&lt;/p&gt;


&lt;div class="ltag__user ltag__user__id__250707"&gt;
    &lt;a href="/tomaszs2" class="ltag__user__link profile-image-link"&gt;
      &lt;div class="ltag__user__pic"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4DsPsQp9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://res.cloudinary.com/practicaldev/image/fetch/s--h3MTOOqL--/c_fill%2Cf_auto%2Cfl_progressive%2Ch_150%2Cq_auto%2Cw_150/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/250707/5b3d6462-5d2f-4a9a-af0c-9a66eb558143.png" alt="tomaszs2 image"&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;div class="ltag__user__content"&gt;
    &lt;h2&gt;
&lt;a class="ltag__user__link" href="/tomaszs2"&gt;Tom Smykowski&lt;/a&gt;Follow
&lt;/h2&gt;
    &lt;div class="ltag__user__summary"&gt;
      &lt;a class="ltag__user__link" href="/tomaszs2"&gt;Follow me, if you want to become 10x Developer. I am sharing tips based on 20 years of coding, that gave me a satisfactory career and balanced life. &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;A generic linter extension for VSCODE that allows you to program in your own custom rules using regex.  &lt;/p&gt;

&lt;p&gt;I was genuinely very excited when I first saw this. I was in a bar at 2am, had seen the article, and sent it to my work email right there and then. My friends were quite rightly judging my commitment to sparkle motion.&lt;/p&gt;

&lt;p&gt;But I digress!&lt;/p&gt;

&lt;p&gt;How did it go? &lt;/p&gt;

&lt;p&gt;Well at first, not well! &lt;/p&gt;

&lt;p&gt;I gave it a whirl and found that it wasn't working for the sorts of things I wanted it to catch. I quickly discovered that it didn't evaluate rules across multiple lines, which is important in SQL because we are forever writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Select&lt;/span&gt;    &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 

&lt;span class="k"&gt;From&lt;/span&gt;      &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And then running it, and then it errors, and then we change it, and then we do it again. &lt;/p&gt;

&lt;p&gt;So I emailed Tomasz just to confirm whether that was intended functionality. That was on the 10th of July.&lt;/p&gt;

&lt;p&gt;On the 13th of July Tomasz responded confirming that the linter does not work across multiple lines and that he had created an issue to get that changed as a feature request. &lt;/p&gt;

&lt;p&gt;I thought, that's nice of him, but I was realistic. We all groom backlogs for a living and we all know that some backlogs last multiple presidencies. &lt;/p&gt;

&lt;p&gt;Then Friday July 17th, 5:45pm, I'm drunk, work drinks, great time, email from Tomasz.&lt;/p&gt;

&lt;p&gt;He did it! &lt;/p&gt;

&lt;p&gt;Houston, we have multi-line support. &lt;/p&gt;

&lt;p&gt;And yes I couldn't wait until Monday to try it out.&lt;/p&gt;

&lt;p&gt;So how did it go? (The sequel)&lt;/p&gt;

&lt;p&gt;Amazing....&lt;/p&gt;

&lt;p&gt;We've loaded in fifteen rules in the past month and they're working well. &lt;/p&gt;
&lt;h2&gt;
  
  
  How to add rules
&lt;/h2&gt;

&lt;p&gt;Well first you need to download the &lt;a href="https://marketplace.visualstudio.com/items?itemName=tomasz-smykowski.assistant"&gt;extension in VSCODE&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Then you need to go to your settings.json file, either in your global or workspace area, depending on how you work. &lt;/p&gt;

&lt;p&gt;Add the following code to start loading in rules in regex:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;load&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;rules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;here&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;:)&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will need to double-escape regex syntax, i.e. \b should be &lt;code&gt;\\b&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;You may need to change the modifiers: i = case insensitive, s = multiline, and g = global. I would only remove them if you 100% know what you're doing.&lt;/li&gt;
&lt;/ul&gt;
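&lt;p&gt;To see how the double escaping plays out end to end, here's a quick Python sketch of my own (not part of the extension, and Python's IGNORECASE and DOTALL are only rough stand-ins for the i and s modifiers), using the 'selct' typo rule as the example:&lt;/p&gt;

```python
import json
import re

# A rule as it would appear in settings.json: the regex \b must be written
# as \\b so it survives JSON string parsing.
rule_json = '{"regex": "\\\\bselct\\\\b", "message": "Did you mean select?", "modifiers": "sig"}'

rule = json.loads(rule_json)
# After JSON parsing, the pattern is plain regex again: \bselct\b
assert rule["regex"] == r"\bselct\b"

# i and s roughly map to Python's IGNORECASE and DOTALL; g (global) is
# implicit when you use findall/finditer instead of search.
pattern = re.compile(rule["regex"], re.IGNORECASE | re.DOTALL)
assert pattern.search("Selct * from users")       # typo caught
assert not pattern.search("Select * from users")  # correct spelling passes
```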


&lt;h2&gt;
  
  
  Some Examples
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Comma on the last column
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, this is the number one mistake we commonly make. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A5MF5taA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mz0bx5vphls82pmh273x.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A5MF5taA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mz0bx5vphls82pmh273x.JPG" alt="COMMA" width="689" height="208"&gt;&lt;/a&gt;&lt;br&gt;
To add this rule to your linter:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(,)(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;W*)(from)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reminder: you don't need a comma on the last column"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
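&lt;p&gt;If you want to sanity-check that rule outside VSCODE, the same pattern can be exercised with Python's re module (an approximation of my own: IGNORECASE and DOTALL stand in for the i and s modifiers, and it's the dot-matches-newline behaviour that lets the rule span lines):&lt;/p&gt;

```python
import re

# The comma-before-FROM rule, with Python stand-ins for the "sig" modifiers.
pattern = re.compile(r"(,)(\s*\W*)(from)", re.IGNORECASE | re.DOTALL)

good_sql = "select col_a,\n       col_b\nfrom table_name;"
bad_sql = "select col_a,\n       col_b,\nfrom table_name;"

assert pattern.search(bad_sql)       # trailing comma before FROM is flagged
assert not pattern.search(good_sql)  # no warning once the comma is removed
```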

&lt;h3&gt;
  
  
  Don't forget the ON after a join
&lt;/h3&gt;

&lt;p&gt;Sometimes I go straight into adding 'and' conditions after my join and forget the ON. (I've also been guilty of having more than one ON clause, but that's not covered here.)&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4v-YHv1O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/odm35faitzmkjvjjj9u3.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4v-YHv1O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/odm35faitzmkjvjjj9u3.JPG" alt="JOIN" width="685" height="256"&gt;&lt;/a&gt;&lt;br&gt;
To add this rule to your linter:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(join)(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;w*)(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;w*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*)(and)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"An ON keyword must follow a join condition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Ban people from using CTEs
&lt;/h3&gt;

&lt;p&gt;In our environments CTEs aren't great for performance. However, I still stand by never using them in production code even if they perform well. Something I'm not alone on:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="/seattledataguy" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--67hnf1V4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://res.cloudinary.com/practicaldev/image/fetch/s--gLF4qTmT--/c_fill%2Cf_auto%2Cfl_progressive%2Ch_150%2Cq_auto%2Cw_150/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/177966/0f8eb580-5b46-4ba7-8390-18d85b04c7be.jpg" alt="seattledataguy"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/seattledataguy/how-to-write-better-sql-advanced-sql-episode-2-please-stop-using-so-many-ctes-4p5g" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;How To Write Better SQL: Advanced SQL Episode 2 - Please Stop Using So Many CTEs&lt;/h2&gt;
      &lt;h3&gt;SeattleDataGuy ・ Jul 6 '20 ・ 6 min read&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#sql&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#datascience&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C0y3eHyc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4trx9oggour2j5uwbfbi.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C0y3eHyc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/4trx9oggour2j5uwbfbi.JPG" alt="CTE" width="662" height="102"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(with&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s)(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;w*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*)(as)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Please refrain from using CTE's in production code. They are bad for redshift."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Don't use USING in joins
&lt;/h3&gt;

&lt;p&gt;Beyond trying to avoid an accidental bad join, we ban the use of USING because ON is simply what we decided is best practice.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t7Ae01_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/201972levpbo0ignmbg2.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t7Ae01_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/201972levpbo0ignmbg2.JPG" alt="USING" width="510" height="232"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(join&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s)(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;w*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;w*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*)(using)"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Do not use USING when joining tables, use ON."&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Don't get your keywords out of order
&lt;/h3&gt;

&lt;p&gt;How many times have you gotten your ORDER and LIMIT the wrong way around? Or GROUP / HAVING / ORDER?&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0LmgkMrk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ld0g67opkbyreb2wsl18.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0LmgkMrk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ld0g67opkbyreb2wsl18.JPG" alt="LIMIT" width="482" height="346"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(limit&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d*&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*)(order|group|where|having)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Limit should be the last keyword in the query."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  To reassure commitment...
&lt;/h3&gt;

&lt;p&gt;In modern environments you don't need to explicitly commit your queries any more.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y6fCFleK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nepvgma7psq30gwnv0dy.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y6fCFleK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nepvgma7psq30gwnv0dy.JPG" alt="Commit" width="762" height="327"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"commit;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I can ensure you that Redshift does not have commitment issues, you do not need to write commit."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Prevent UNION
&lt;/h3&gt;

&lt;p&gt;You should only ever use UNION ALL. A UNION scans for duplicates, which BURNS compute, and it will throw out duplicate rows. My advice: if you want to exclude duplicates, there are better ways.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Kgn0Y7Cs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sxy1al6luntxofij9plb.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Kgn0Y7Cs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/sxy1al6luntxofij9plb.JPG" alt="union" width="620" height="388"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(union&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;s*)(select)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I can't think a valid reason to use union. You should always be using union all. "&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
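&lt;p&gt;One nice property of this pattern, easy to confirm with a quick Python check of my own: the whitespace-only gap between the two keywords means UNION ALL never matches, so only the bare UNION gets told off:&lt;/p&gt;

```python
import re

# The union rule, with Python stand-ins for the "sig" modifiers.
pattern = re.compile(r"(union\s*)(select)", re.IGNORECASE | re.DOTALL)

bare_union = "select a from t1\nunion\nselect a from t2"
union_all = "select a from t1\nunion all\nselect a from t2"

assert pattern.search(bare_union)     # plain UNION is flagged
assert not pattern.search(union_all)  # UNION ALL sails through
```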

&lt;h3&gt;
  
  
  Curb some bad compression
&lt;/h3&gt;

&lt;p&gt;Only compression nerds like me know that a varchar(1) in Redshift takes up over 7 bytes of space, when you could just use BOOL for a small 1 byte. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wyALinAk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jjjb0ch2eqt8hs2fwlib.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wyALinAk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/jjjb0ch2eqt8hs2fwlib.JPG" alt="bool" width="546" height="185"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"char&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;W1&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;W"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You should be using a boolean if you are building a char with 1 space."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sig"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
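&lt;p&gt;Because the pattern is "char" plus a non-word character either side of the 1, it catches both char(1) and varchar(1) while leaving wider columns alone. A quick Python check of my own to illustrate:&lt;/p&gt;

```python
import re

# The char(1) rule, with Python stand-ins for the "sig" modifiers.
# "char" matches inside "varchar" too, so both declarations are caught.
pattern = re.compile(r"char\W1\W", re.IGNORECASE | re.DOTALL)

assert pattern.search("flag_col varchar(1) encode zstd")       # flagged
assert not pattern.search("code_col varchar(16) encode zstd")  # left alone
assert not pattern.search("flag_col bool")                     # left alone
```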





&lt;h2&gt;
  
  
  And More
&lt;/h2&gt;

&lt;p&gt;We've made rules to catch out common spelling mistakes like 'selct' or to enforce certain practices like using '!=' over '&amp;lt;&amp;gt;'!&lt;/p&gt;




&lt;h1&gt;
  
  
  So how do I make my own?
&lt;/h1&gt;

&lt;p&gt;So, as mentioned earlier, you need to download the extension in VSCODE and then set up your settings.json.&lt;/p&gt;

&lt;p&gt;Now you need to set up a rule, use this template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Don't forget to add a , after each rule, i.e.:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"regex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"modifiers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then figure out what you want to catch out. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Beware:&lt;/strong&gt; You are limited to what you can detect using a REGEX pattern, so you'll need to think about how you will express each rule and just accept that some you won't be able to catch. &lt;/p&gt;

&lt;p&gt;For example, unclosed brackets ( ): it's hard to write a REGEX rule to catch those, as what you really want is an exception rule. &lt;/p&gt;
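&lt;p&gt;For the curious, that's because matching brackets requires counting, which plain regex can't do. A tiny standalone check like this sketch of mine (nothing to do with the extension) covers it instead:&lt;/p&gt;

```python
def brackets_balanced(sql: str) -> bool:
    """Return True when every '(' has a matching ')'.

    Regex can't count nesting, but a one-pass counter can.
    """
    depth = 0
    for ch in sql:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' turned up before its '('
    return depth == 0


assert brackets_balanced("select nvl(col_a, 0) from t")
assert not brackets_balanced("select nvl(col_a, 0 from t")
assert not brackets_balanced("select col_a) from t")
```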

&lt;p&gt;To test whether I can do what I want to do I use something like &lt;a href="https://rubular.com/"&gt;rubular&lt;/a&gt; to test my regex.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Mx8Jr-vL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gdt6zv1dtro55ysfqqaj.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Mx8Jr-vL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/gdt6zv1dtro55ysfqqaj.JPG" alt="Rubular" width="880" height="439"&gt;&lt;/a&gt;&lt;br&gt;
If I get it working, I load it into the settings file and away we go!&lt;/p&gt;
&lt;h2&gt;
  
  
  Things to Note:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It's got its limitations; it won't work for every rule, primarily due to the limitations of REGEX, not the extension. &lt;/li&gt;
&lt;li&gt;It works all over VSCODE (even in the settings file), so if that annoys you, make sure to use the workspace feature. There is also a feature request to lock rules to specific file types, which will be handy for those of us who switch between coding languages.&lt;/li&gt;
&lt;li&gt;If you make your rule too generic you risk crashing VSCODE, as the rule will scan as much text as it is allowed to. &lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Share your findings!
&lt;/h1&gt;

&lt;p&gt;Tomasz has a GitHub repo for this extension where he asks that you submit your own rule files to help others: CSS, HTML, Ruby, Python!&lt;br&gt;
I've already added my SQL rules as a pull request :) &lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--566lAguM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/tomaszs"&gt;
        tomaszs
      &lt;/a&gt; / &lt;a href="https://github.com/tomaszs/Hintly"&gt;
        Hintly
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Boost your development by providing custom tips displayed in the code in Visual Studio Code
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;h2&gt;
Demo&lt;/h2&gt;
&lt;p&gt;Hinty (formerly: Assistant) is language and framework agnostic.&lt;/p&gt;
&lt;p&gt;Example workspace configuration for Angular/TypeScript. It informs about a bad boolean Input declaration in Angular component. Normally it does not trigger build or linter errors and is a hard to track problem:&lt;/p&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;{
 ...
 "settings": {
  "assistant": {
   "rules": [
    {
     "regex": "@Input\\\\(\\\\) .*: false;",
     "message": "Define property value with =, not with:"
    }
   ]
  }
 },
 ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Result:&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/tomaszs/Hintlyimages/demo.gif"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C0ppCYPy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://github.com/tomaszs/Hintlyimages/demo.gif" alt=""&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
Hinty - Dynamic Hints For Your Code&lt;/h1&gt;
&lt;p&gt;Are you annoyed that your notes on hard to fix issues are not available when you need them the most - while coding? Is setting standards for the team broken, even with a centralized place for rules, because it is hard to keep tabs on them all the time?&lt;/p&gt;
&lt;p&gt;Never make the same mistakes again!&lt;/p&gt;
&lt;p&gt;At last there is a solution to these problems. Let me present you a groundbreaking Visual…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/tomaszs/Hintly"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LTwjgGRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EbrOQOzUMAAK9XQ.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ron-Resist Racism-Soak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Hi, I'm Ron!&lt;br&gt;🛰️ I ❤️ space!&lt;br&gt;☕ Coffee Addict&lt;br&gt;📊 Data Analysis Team Lead for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt;&lt;br&gt;⚠️ My words are my own.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/LMuC1tMLom"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/OvW7Vsvccs"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;|🇳🇿 |🇬🇧 |🏳️‍🌈 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      11:11 AM - 29 Jun 2020
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1277560230725906432" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;




</description>
      <category>sql</category>
      <category>vscode</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What they don’t tell you about being an analyst</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 29 Jun 2020 12:01:36 +0000</pubDate>
      <link>https://dev.to/alexantra/what-they-don-t-tell-you-about-being-an-analyst-4f90</link>
      <guid>https://dev.to/alexantra/what-they-don-t-tell-you-about-being-an-analyst-4f90</guid>
      <description>

&lt;p&gt;&lt;em&gt;I currently work for a great company called Xero doing all sorts of fun data things. When you're reading my articles, please understand that my words are my own; I'm not speaking on behalf of my employer, and if I'm talking about something negative in the field it may not be indicative of Xero. I've worked many interesting roles and I read a lot about my field.&lt;/em&gt; &lt;/p&gt;




&lt;p&gt;There are a boatload of articles across the blog-o-sphere outlining all the amazing reasons you should switch to a career as an analyst. However, as someone who has been an applications analyst, an information analyst, an operations analyst, a data analyst, and now a Data Analyst Team Lead, I feel obliged to outline the stuff those articles seem to consistently gloss over.&lt;/p&gt;

&lt;p&gt;Now the goal isn’t to scare you away. Due to the lack of formal education around data, a vast majority of those working in data have come from another profession which makes it a very welcoming field for those making the switch, as we’ve all been there. But it’s important you make the switch fully informed. &lt;/p&gt;

&lt;h2&gt;
  
  
  It's an old but immature field.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/Rk927btUSH5eW0Hlbs/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Rk927btUSH5eW0Hlbs/giphy.gif" alt="Old" width="320" height="166"&gt;&lt;/a&gt;&lt;br&gt;
Data and databases in their current form have been around for nearly half a century; the Structured Query Language (SQL) first appeared in 1974! However, this hasn’t brought with it 50 years of advancement per se. The field has moved along, and some key people have worked very hard to get it here, but compared to other fields in tech we appear quite immature. &lt;/p&gt;

&lt;p&gt;Compare us to software development, which in less time has built entire coding frameworks to enforce standard ways of coding, and even end-to-end CI/CD flows with automated testing. &lt;/p&gt;

&lt;p&gt;Data on the other hand.... &lt;/p&gt;

&lt;p&gt;The actual SQL language differs depending on the database you have, and you need to remember those differences yourself and adapt; no one has made a universal SQL interpreter yet. &lt;/p&gt;

&lt;p&gt;Then, depending on the version of the database, some functionality may be missing, and not all databases get an upgrade. I remember one role where I was responsible for two TSQL 2003 DBs and one 2009... in 2017. Sometimes I would come across the perfect solution on Stack Overflow only to find that the code didn’t work on those older versions. &lt;/p&gt;
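&lt;p&gt;As a toy illustration of those dialect differences (SQLite here purely because it ships with Python; the table and data are invented), the same ‘first five rows’ request has to be phrased differently per engine, and each engine rejects the other's form:&lt;/p&gt;

```python
import sqlite3

# The same "first five rows" request, phrased for two different engines.
# SQL Server wants TOP; SQLite/Postgres/MySQL want LIMIT. Neither accepts
# the other's syntax, so the analyst has to remember both.
sql_server_style = "SELECT TOP 5 name FROM customers ORDER BY name"
limit_style = "SELECT name FROM customers ORDER BY name LIMIT 5"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("customer_%02d" % i,) for i in range(10)],
)

rows = conn.execute(limit_style).fetchall()
print(len(rows))  # 5

try:
    conn.execute(sql_server_style)  # TOP is not SQLite syntax
except sqlite3.OperationalError:
    print("TOP rejected")
```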

&lt;p&gt;Best practices also differ depending on the underlying technology. A common example is the shortcut known as a CTE (Common Table Expression). In databases where the memory is centralised, CTEs can be very quick to execute, and while not considered best practice in production code they are very commonly used in day-to-day ad hoc queries. For databases that run on distributed processing with multiple memory banks (i.e. the work is shared), CTEs run horribly and should be avoided at all costs. &lt;/p&gt;
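&lt;p&gt;A minimal sketch of the two shapes, again using SQLite with made-up data. The point is only that the CTE and the materialised temp table return the same result, while their performance characteristics differ by engine:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 20), ('south', 5);
""")

# CTE form: convenient for ad hoc work, and fast on single-node engines.
cte_sql = """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT region, total FROM region_totals ORDER BY total DESC
"""

# Materialised form: on distributed engines it is often far cheaper to
# write the intermediate result to a temp table first, then query that.
conn.executescript("""
    CREATE TEMP TABLE region_totals_t AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")
mat_sql = "SELECT region, total FROM region_totals_t ORDER BY total DESC"

# Both forms answer the same question with the same rows.
assert conn.execute(cte_sql).fetchall() == conn.execute(mat_sql).fetchall()
print(conn.execute(cte_sql).fetchall())  # [('north', 30), ('south', 5)]
```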

&lt;p&gt;The lack of coding frameworks isn’t for a lack of trying. The key difference between the data field and software development is that the data is completely different in every single company, which makes coding highly context dependent. That's why we don’t have as much opportunity to standardise the code going in or to automate what's coming out. &lt;/p&gt;

&lt;p&gt;This problem of highly contextual data goes beyond just the SQL language itself; some analysis is done in SAP, Python, SPSS, and R, and the problem remains the same in each of those tools. &lt;/p&gt;

&lt;p&gt;We’re also a volatile field. No one has really settled on a best practice for very long, and you’ll see a lot of migrations in your time as companies ditch now-obsolete hardware or methodologies for the new shiny, only to do the same again within 2-5 years. I’ve witnessed six migrations in seven years. &lt;/p&gt;

&lt;p&gt;We’ll get there eventually, but until we do, data analysis will continue to be very manual, inconsistent, and messy, which just slows everything down. Sharing code is hard to do because everyone codes differently, the databases are different, and the underlying data is contextual. Code that works on one platform won’t work on another, and testing someone's code often comes down to ‘best efforts’ at the time. It's not an exact science, and as other fields get better at what they do, it leaves us looking a bit immature by comparison. &lt;/p&gt;
&lt;h2&gt;
  
  
  You’ll be limited by both technology and customer.
&lt;/h2&gt;

&lt;p&gt;In the same vein as the above point, your technical ability will be limited by what systems you use and what's being asked of you. For many, a career as an analyst is a responsive thing: you’ll provide what you're being tasked to do. &lt;/p&gt;

&lt;p&gt;99% of what my customers need is in our SQL database, which means my skills in other languages are not being challenged. I know Python, have dabbled in Google Analytics, and I'm familiar with doing analysis on JSON, but I'm forever losing my edge as all I need for my current role is SQL.&lt;/p&gt;

&lt;p&gt;In some roles this can limit your ability to grow out of your role and into the next, and we don’t all have the luxury of dedicating time outside of work. You can potentially find yourself stunted in a role, unable to apply for other roles because you don’t meet their requirements, while your current role can’t grow you in that direction. &lt;/p&gt;

&lt;p&gt;Your customer has an influence on this too. If all that's requested of you is ‘simple’ stuff, you again won’t have an opportunity to grow. A rising tide lifts all boats, and it may be necessary to educate your customer base (or wherever your workload comes from) in order to be stretched mentally in the role. &lt;/p&gt;

&lt;p&gt;Some people may end up in what we call ‘report farm’ roles, where you just pump out reports for a demanding user base. These can be some of the most unrewarding roles and sometimes the least technical (i.e. just spreadsheets). I would of course caution you to try and avoid these roles, but they are hard to distinguish up front. &lt;/p&gt;
&lt;h2&gt;
  
  
  In the end, it’s a customer service job
&lt;/h2&gt;

&lt;p&gt;This isn't a job where you hide in a back room alone for eight hours a day chugging coffee. Nine times out of ten your job is to give other people reports, and you will need to talk to those people before, during, and after you’ve delivered a report.&lt;/p&gt;

&lt;p&gt;Unless your customer has been an analyst in a previous life, they will have no context around how long something should take or whether it's feasible. They will almost always assume it's easy and will want it by tomorrow. They will also not be very data literate, and I've found their expectations will most likely be the polar opposite of where you need them to be. &lt;/p&gt;

&lt;p&gt;They’ll be overly concerned about accuracy when they don’t need to be and will want you to ‘wing it’ and ‘just find a pattern’ in the most nebulous of data sets. They will never really 100% understand what they are asking for and you’ll quickly learn the difference between deliberate scope creep and scope creep as a result of the customer starting to understand what they really need at the later stages of the analysis. &lt;/p&gt;

&lt;p&gt;They will be rude and demanding at times, uninterested in answering your questions - “just do it,” they’ll say - and sometimes they just won’t understand the problem. Your job, every step of the way, is to be nice and hold their hand. You’ll quickly discover that the only way to navigate those scenarios is to proactively manage your customer. It’s very hard work, but believe me, leaving your customer in the dark and not managing them will cause more work in the long run and a deteriorating stakeholder relationship. &lt;/p&gt;

&lt;p&gt;I nearly always hire staff with good stakeholder management skills over those with technical ability. I can teach you SQL more easily than I can teach you to be a customer service superstar. &lt;/p&gt;
&lt;h2&gt;
  
  
  It’s actually very difficult to train for this job outside of the field.
&lt;/h2&gt;

&lt;p&gt;This job is actually nigh impossible to prepare for if you're not already in the field. All the code-academy courses and Medium articles will have you believe that learning SQL and how to make a graph is 90% of the job. They'll have you do some left joins and a group by and you're ready for your first day. &lt;/p&gt;

&lt;p&gt;Sadly that's not the case. &lt;/p&gt;

&lt;p&gt;Knowing SQL is 20% of the job. It's like how knowing how to drive is only a small part of being a taxi or Uber driver. The SQL lets you navigate the company's data; learning the data, much like a taxi driver learning the city, is most of the job. When someone says they want sales figures for a certain region, but only for customers that have done X activity, you need to know where all that data is housed, how it functions, and how to join it together - and if you don’t know, you're going to have to figure it out. &lt;/p&gt;

&lt;p&gt;The job is also largely complex problem solving. Figuring out how to smush together, and in what order, multiple data sets to get your customer's result turns you into a mini inventor. There is a high likelihood that sometimes you will need to match up a combination of data that no one in history has ever done before. There won’t be an answers page for you to turn to at the end, and again, no online course can truly teach you that, as the complexity of the problem solving needed will be dictated by your company's unique data. &lt;/p&gt;

&lt;p&gt;You also can’t be prepared for big data. Nothing online can prepare you for joining a billion rows of data with another hundred million. Much of being an analyst in a big data space (not all roles are big data) is doing some light data engineering, and knowing how to move such large data sets around in the safest, quickest, most efficient manner is nothing I’ve seen an online data course teach. &lt;/p&gt;
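&lt;p&gt;One of the few habits that does transfer is shrinking data before you join it: filter and pre-aggregate each side, then join the much smaller intermediate results. A toy sketch of that pattern, in SQLite with invented tables (on real engines the intermediate would be a temp or staging table of millions of rows, not three):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (customer_id INTEGER, event_type TEXT);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'nz'), (2, 'uk');
    INSERT INTO events VALUES (1, 'login'), (1, 'purchase'), (2, 'login');
""")

# Instead of joining the full events table and aggregating afterwards,
# shrink it first (filter + pre-aggregate), then join the small result.
conn.executescript("""
    CREATE TEMP TABLE purchase_counts AS
        SELECT customer_id, COUNT(*) AS purchases
        FROM events
        WHERE event_type = 'purchase'
        GROUP BY customer_id;
""")

rows = conn.execute("""
    SELECT c.region, p.purchases
    FROM purchase_counts p
    JOIN customers c ON c.customer_id = p.customer_id
""").fetchall()
print(rows)  # [('nz', 1)]
```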
&lt;h2&gt;
  
  
  The quality of the data will plague you.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/OY9XK7PbFqkNO/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/OY9XK7PbFqkNO/giphy.gif" alt="haunt you" width="250" height="148"&gt;&lt;/a&gt;&lt;br&gt;
In 2020 it’s actually uncommon for companies to be sitting on high quality data. Undocumented and inconsistent data is commonplace across the field, made infinitely worse by ‘big data’. By the time most companies realized they needed to understand the fidelity of their own data, they were already years of data down the line, and going back to retrospectively fix that data would take a lot of work. &lt;/p&gt;

&lt;p&gt;You’ve also got the reality that these data sets weren’t built to be joined or analysed; they were built for an entirely different purpose, and your data warehouse has just made a copy. There are scenarios where joining two datasets together is impossible, or requires an enormous amount of manual work to get done - something your customer won’t think about when they make the request. &lt;/p&gt;

&lt;p&gt;You'll forever find issues with data, and the bigger it is, the more issues are hiding from you. You’ll find legacy data that's no longer recorded that way, and entire days or months of missing data because there was a known or unknown issue at the time. Some data will only be current state and is missing history, while some historic data will be difficult to accurately parse. Columns won’t always be named in a descriptive or consistent manner, and tables will be named according to the passing fancy of the day. And don’t get me started on dates. I'm lucky enough to live in a data environment mostly in UTC, but I've worked in environments where data was recorded in multiple timezones, and that's not easy to work with. &lt;/p&gt;

&lt;p&gt;If you're lucky, those datasets will be documented right down to the column. However, most companies are deathly allergic to documentation, and what you’ll most likely find is a bit of documentation (probably created by a frustrated analyst) while the rest of the information you need is stored in various people's heads. In one scenario I had to do reporting on an abandoned feature that wasn’t documented, and the only person left in the company who knew how it worked was one of the founders, so I had to wrestle my way into their diary and ask them to cast their memory back nine years.&lt;/p&gt;
&lt;h2&gt;
  
  
  You’ll struggle with primary and secondary data.
&lt;/h2&gt;

&lt;p&gt;As data and companies grow, so will the spread of data. In my early days it was common to run reports directly off the production environments; now you risk slowing down the product and causing customer complaints if you do so. &lt;br&gt;
So in most enterprise scenarios that data is streamed or replicated elsewhere into a data warehouse. You now have a primary (source) and secondary (replicated) data set scenario, and this can cause issues. &lt;/p&gt;

&lt;p&gt;If the source data is changed after it is copied, some environments don’t capture that, and with every day that passes your two out-of-sync data sets drift further apart and your secondary gets more inaccurate. This can be an issue if you're trying to do accurate financial reporting or trying to gauge the impact on customers after an issue. &lt;/p&gt;
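&lt;p&gt;A common defensive habit is a reconciliation query that lists rows where the replica disagrees with the source. A toy sketch in SQLite with invented tables - real reconciliations compare counts or checksums per day or per batch rather than row by row:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER, amount INTEGER);
    CREATE TABLE warehouse_orders (id INTEGER, amount INTEGER);
    INSERT INTO source_orders VALUES (1, 100), (2, 250), (3, 80);
    -- The replica missed an update to order 2 and never saw order 3.
    INSERT INTO warehouse_orders VALUES (1, 100), (2, 200);
""")

# Rows where the secondary is missing or disagrees with the primary.
drift = conn.execute("""
    SELECT s.id, s.amount AS source_amount, w.amount AS replica_amount
    FROM source_orders s
    LEFT JOIN warehouse_orders w ON w.id = s.id
    WHERE w.id IS NULL OR w.amount != s.amount
    ORDER BY s.id
""").fetchall()
print(drift)  # [(2, 250, 200), (3, 80, None)]
```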

&lt;p&gt;Some primary datasets don’t retain history, often overwriting themselves in the interest of speed and saving space. This again can cause issues, as the two sets will no longer be in line with each other. &lt;/p&gt;

&lt;p&gt;Then within your own data warehouse you may have a different primary and secondary relationship. &lt;/p&gt;

&lt;p&gt;Models, which are best described as shortcuts, are a good example of this. &lt;/p&gt;

&lt;p&gt;Say the most commonly requested customer information is actually separated across 12 different tables. Rather than join those 12 tables together over and over again, one might create a model: a single table holding those values in one convenient place. This is a very simplified example of a model, but now within your own data warehouse you have primary and secondary data sets of the same data. In simple models this shouldn't cause issues. However, if data was transformed for better reading or to align with business logic, you may find scenarios where you need to decide whether the model, and how it was built, suits your needs, and you may have to parse modelled and un-modelled data together. Again, neither one was built to work with the other, no one could have predicted your exact scenario, and an easy solution isn’t promised to you. &lt;/p&gt;
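&lt;p&gt;A toy version of such a model, in SQLite with invented tables (a real model would pre-join far more than two tables, and often apply transformations too):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (customer_id INTEGER, plan TEXT);
    CREATE TABLE contacts (customer_id INTEGER, email TEXT);
    INSERT INTO accounts VALUES (1, 'premium'), (2, 'starter');
    INSERT INTO contacts VALUES (1, 'a@example.com'), (2, 'b@example.com');

    -- The "model": pre-join commonly requested columns into one table
    -- so analysts stop repeating the same multi-table join.
    CREATE TABLE customer_model AS
        SELECT a.customer_id, a.plan, c.email
        FROM accounts a
        JOIN contacts c ON c.customer_id = a.customer_id;
""")

# Analysts now query the model instead of re-deriving the join,
# which also means the warehouse holds the same data twice.
row = conn.execute(
    "SELECT plan, email FROM customer_model WHERE customer_id = 1"
).fetchone()
print(row)  # ('premium', 'a@example.com')
```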

&lt;p&gt;You are always going to be tackling the issues that come from having the same data split out into different buckets, and it's on you to get familiar with their strengths and pitfalls so you understand which ones to use. &lt;/p&gt;
&lt;h2&gt;
  
  
  You’ll constantly struggle with metadata, business logic, operational logic, and best practice.
&lt;/h2&gt;

&lt;p&gt;Completely stepping around the fact that most data isn’t documented properly or of good enough quality, what you’ll struggle with next is what exact documentation is needed, and how many different types you'll need to juggle and discover. &lt;/p&gt;

&lt;p&gt;Metadata is the descriptive information about the smallest data point: what data goes into this column and, if it's secondary data, what transformations have been applied to it. In other words, why does this column exist and what goes into it. &lt;/p&gt;

&lt;p&gt;Business logic is the application of the company's rules over the top of the data. Say you're a company with a couple of million customers, but some of those customers are test accounts, free accounts, or press accounts; internally the company will have an understanding of which types of customers should always be excluded when referring to ‘customers’. This should be held consistently across every department so that different departments can be on the same page, and it will of course result in some common SQL code back at your end when your customer says ‘only valid customers please’. While business logic should be company wide, it more often than not ends up department specific, so marketing will have their own logic you'll have to remember, and so will finance. The bigger the company, the harder it is to get unified business logic; sometimes the right business logic can’t be done in the data, or takes nine times as long - something else you’ll bear the responsibility to solve. &lt;/p&gt;
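&lt;p&gt;In practice that ‘only valid customers please’ request condenses into a stock exclusion clause you end up typing (or pasting) constantly. A toy SQLite sketch with made-up account types:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, account_type TEXT);
    INSERT INTO customers VALUES
        (1, 'paying'), (2, 'test'), (3, 'free'), (4, 'press'), (5, 'paying');
""")

# The excluded account types ARE the business logic; every department's
# definition of "customer" turns into a WHERE clause like this one.
valid = conn.execute("""
    SELECT COUNT(*) FROM customers
    WHERE account_type NOT IN ('test', 'free', 'press')
""").fetchone()[0]
print(valid)  # 2
```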

&lt;p&gt;Operational logic is more about how the data changes when it's interacted with. Sometimes a single action in your product will change 9 different things in 7 tables; you won’t catch or understand that with metadata or business logic alone, and it's often hard-baked into the code of your application. Say a customer misses a payment and your terms and conditions say that after 12 days of non-payment the account is deleted. That entire process will change data in all sorts of places at different times over those 12 days, and those interactions are hard to figure out by watching the data, as the movements will be hidden within the noise. Most of the time you will need to be told by whoever created the operational logic what is happening at what times. &lt;/p&gt;

&lt;p&gt;Best practice is similar to business logic, but really it's more your own internal logic that you and other analysts discover over time, often to make up for poor data quality - like excluding data from a certain ID because that's a testing ID, or excluding data past a certain threshold because it's less reliable there. Over time you’ll build up a lot of best practice. &lt;/p&gt;

&lt;p&gt;At all times of doing your job you need to rely on the above, which may be missing in most cases, so over time you’ll build up your own understanding - but you’ll still be missing most of it.&lt;/p&gt;

&lt;p&gt;Most of an analysts job is navigating through the uncertainty of the data caused by the fact that you lack all of the above. &lt;/p&gt;
&lt;h2&gt;
  
  
  It can be unforgiving.
&lt;/h2&gt;

&lt;p&gt;There's a lot that can go wrong and it can from time to time go wrong all at once. There will always be ‘that’ report an analyst can remember. The one that kept going wrong. &lt;/p&gt;

&lt;p&gt;This can be extremely frustrating and I've seen it push some people out of the field. It can feel like the cards are stacked against you: if the data's bad, it’s your job to work around it; if the data’s missing, it’s your job to explore every avenue; if no one knows what to do or where to look, it’s your job to figure it out. If the customer isn't data literate, it’s your job to up-skill them; if your customer comes to you with a short deadline, it's your job to sprint to the finish line; if the database is slow, it’s your job to make sure that doesn’t impact the customer’s delivery date; if they think of something last minute, it’s your job to just add that ‘one more thing’. If there is a problem or issue, nine times out of ten you will be the one bearing the brunt of it. It is sadly a role where the power dynamic is uneven by default. You can’t account for everything, because the spectrum of issues is very, very broad and you're too busy doing reports to fix them all. &lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion.
&lt;/h2&gt;

&lt;p&gt;I’m still in data after all these years and it’s a career I like, regardless of the above; in fact, what I like is tackling the above issues head on, but that's not everyone's cup of tea. People enjoy different things, and I hope that if you do make the switch you quickly find the thing in data that makes you happy. But if you are considering a career in data, keep the above in mind: it’s not all rainbows and high-paying data science roles like the Medium articles will have you believe! &lt;/p&gt;



&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;






</description>
      <category>data</category>
      <category>analysis</category>
      <category>sql</category>
      <category>database</category>
    </item>
    <item>
      <title>I built my own SQL tester in Python, then rebuilt it again from scratch, here's what I learned.</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 23 Mar 2020 18:55:19 +0000</pubDate>
      <link>https://dev.to/alexantra/i-built-my-own-sql-tester-in-python-then-rebuilt-it-again-from-scratch-here-s-what-i-learned-c9l</link>
      <guid>https://dev.to/alexantra/i-built-my-own-sql-tester-in-python-then-rebuilt-it-again-from-scratch-here-s-what-i-learned-c9l</guid>
      <description>&lt;h1&gt;
  
  
  Context
&lt;/h1&gt;

&lt;p&gt;As previously mentioned in another &lt;a href="https://dev.to/ronsoak/i-built-my-own-vs-code-syntax-highlighter-from-scratch-and-here-s-what-i-learned-1h98"&gt;Redshift Flavored Automation Article&lt;/a&gt; AWS Redshift isn't that common and so tools on the internet are hard to come by. &lt;/p&gt;

&lt;p&gt;Testing tables is important in the data world: did the criteria you specified work? Are there anomalies in your data? &lt;/p&gt;

&lt;p&gt;But tables can be big, and made up of multiple different data points and logic. This can mean that to do due diligence when testing a table you might need to run five, ten, fifteen, even twenty separate checks on it after it's been built. Human nature means this doesn't always happen, and certainly not consistently across every table and analyst.&lt;/p&gt;

&lt;p&gt;One of my team raised this issue, and we got talking about a checklist we could generate once we've built a table to know what to test; this would also go a long way towards helping the tester see what's been mitigated. I played around with the idea of creating a VSCode plugin that reads your code and generates a checklist, but the more I looked into it, the more a different solution made sense - and felt easier to do. &lt;/p&gt;

&lt;p&gt;Why build a checklist of things to test, which would then require the analyst to write the test queries, when I could jump straight to building a tool that generates the test queries? Sure, this won't cover all scenarios, but nothing ever would; the reason you built the table may never be apparent in the code. &lt;/p&gt;

&lt;p&gt;So I knew what I wanted to build. However, I only knew HTML/CSS, SQL/PLSQL, Batch, and a splash of PowerShell. While I could have done this in PLSQL or PowerShell, I saw this as a great opportunity to learn Python.&lt;/p&gt;

&lt;h1&gt;
  
  
  Version 1
&lt;/h1&gt;

&lt;p&gt;I had briefly been exposed to Pandas, so I was familiar with data frames - a familiar concept to someone who works with data every day. A few Pandas tutorials later and I had a general direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;Where I landed in terms of a mechanic was to use the table's Data Definition Language, or DDL, which is the blueprint for how the table was created. See below for an example.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fronsoak%2FSQL_Test_Script_Gen%2Fmaster%2Fassets%2Ffull_table_build.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fronsoak%2FSQL_Test_Script_Gen%2Fmaster%2Fassets%2Ffull_table_build.png" alt="SQL DDL"&gt;&lt;/a&gt;&lt;br&gt;
My script would read this DDL and, based on each column's data type (string, integer, date), spin off a set of pre-fabricated tests. So if a column was a date, it would spit out a query that tested the min and max values of that column, something we do to check that records fall in the correct date range. &lt;/p&gt;
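&lt;p&gt;The per-data-type templating idea boils down to something very small; here's a minimal sketch (the column and table names are made up for illustration, and the template is simplified from what the tool actually emits):&lt;/p&gt;

```python
# Minimal sketch of the per-data-type templating idea: given a column name
# and a table name, return a pre-fabricated test query for a date column.
# The names "order_date" and "sales.orders" are hypothetical examples.
def date_test(col, table):
    # min/max confirms the records fall in the expected date range
    return f"select min({col}), max({col}) from {table};"

query = date_test("order_date", "sales.orders")
print(query)
# select min(order_date), max(order_date) from sales.orders;
```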

&lt;p&gt;To get the DDL into the script, I had the user copy it to their clipboard and then ingested it using Pandas' read_clipboard function.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;sql_load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_clipboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sep&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;squeeze&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;I then checked that the content being loaded was valid: the first word needed to be 'create', so I enforced that and raised an error if it wasn't found. If the content did pass the test, the script read the table name, as this would be used later. &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#GET FIRST ROW OF DDL
&lt;/span&gt;&lt;span class="n"&gt;sql_header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;                                               

&lt;span class="c1"&gt;#VALIDATE THAT THIS IS A VALID DDL
&lt;/span&gt;&lt;span class="n"&gt;sql_validate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^\w*\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sql_header&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;U&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;               
&lt;span class="n"&gt;sql_validate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_validate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sql_validate&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CREATE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                                                
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;script_terminated_with_error.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;error_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If you are reading this message, the python script has terminated. &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   
    &lt;span class="n"&gt;error_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reason? The first word on the clipboard wasn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t CREATE.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This means you have not copied a valid Redshift SQL Table Create Statement to your clipboard. &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For more help refer to: https://github.com/ronsoak/SQL_Test_Script_Gen.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;fr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not a valid DDL, must start with CREATE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sql_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\w+\.\w+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sql_header&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;U&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;         


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;I decided that the output of this script would be written to a text file, so I had to get Python to create a text file, name it after the table, and then write content to it. But tables are often called schema.tablename, and we can't have a dot in the middle of a filename, so I had to rearrange how all that looked.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;sql_table_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[table_testing][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;sql_table_file&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;].txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;sql_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;So now I've got the whole DDL loaded in, but the only things I need are the column names and their data types; I don't need the compression encodings, nor the table's diststyle, distkey, or sortkey. Since I was using data frames, I went through the process of whittling the entire DDL down to just the rows that contained a data type, matching against all of the possible data types, which I had loaded into a variable called red_types.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#LIST OF DATA TYPES
&lt;/span&gt;&lt;span class="n"&gt;red_types&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SMALLINT|INT2|INTEGER|INT|INT4|BIGINT|INT8|DECIMAL|NUMERIC|REAL|FLOAT4|DOUBLE|DOUBLE PRECISION|FLOAT8|FLOAT|BOOL|BOOLEAN|DATE|TIMESTAMP|TIMESTAMPTZ|CHAR|CHARACTER|NCHAR|BPCHAR|VARCHAR|CHARACTER VARYING|NVARCHAR|TEXT|GEOMETRY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#READ THE COL NAMES AND DATA TYPES
&lt;/span&gt;&lt;span class="n"&gt;sql_load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;sql_load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  
&lt;span class="n"&gt;sql_load&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\(\S*\)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;sql_reduce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;sql_load&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;fr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\b(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;red_types&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)\b)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; 
&lt;span class="n"&gt;sql_reduce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_reduce&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;sql_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_reduce&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
&lt;span class="n"&gt;sql_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;COL_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATA_TYPE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; 
&lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;COL_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;COL_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATA_TYPE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATA_TYPE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Then, once I had whittled the data frame down to the exact rows I wanted, I created a function that wrote the tests to the text file depending on each row's data type, and looped through the data frame.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#DEFINE FUNCTION FOR PRINTING
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;col_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;red_nums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select min(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), avg(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), max(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select median(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;red_dates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select min(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), max(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;red_string&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; group by 1 order by 2 desc limit 50; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(distinct(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)), count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; limit 50; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;red_bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; group by 1 order by 2 desc limit 10; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;red_geo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Geospatial Data not currently supported. &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Column:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is not a know Datatype. Datatype passed was:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sql_cols&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; 
    &lt;span class="nf"&gt;col_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;COL_NAME&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATA_TYPE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sql_table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Finally, once the script was done, I wrapped a batch file around it so a user could just double-click it; they would be prompted to copy the DDL to their clipboard, and could then tell the batch file to kick off the Python script. &lt;/p&gt;
&lt;h2&gt;
  
  
  Outcomes
&lt;/h2&gt;

&lt;p&gt;Once I had it up and running, I gave it to my team to use and test. We encountered a few issues right off the bat: the regex used to detect table names needed a few iterations, and data types with parameters after them, like varchar(300), continued to cause issues. It would also hard fail if the wrong thing was copied to the clipboard (like a file).&lt;/p&gt;
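&lt;p&gt;The varchar(300)-style problem comes down to normalising parameterised types before matching against the list of bare type names. A hedged sketch of one way to do it (this regex is my illustration, not necessarily the fix the tool ended up with):&lt;/p&gt;

```python
import re

# Strip a trailing parameter list like (300) or (12,2) so that
# VARCHAR(300) and DECIMAL(12,2) match the bare type names.
# Illustrative sketch only, not the script's exact fix.
def normalise_type(data_type):
    return re.sub(r"\(\s*\d+(\s*,\s*\d+)?\s*\)", "", data_type).strip().upper()

print(normalise_type("varchar(300)"))   # VARCHAR
print(normalise_type("decimal(12,2)"))  # DECIMAL
```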

&lt;p&gt;After a month of testing I showed it off to some other analysts and got them using it. It would throw an error every now and then, but that was not the end of the world for my first Python automation. &lt;/p&gt;
&lt;h2&gt;
  
  
  Learnings
&lt;/h2&gt;

&lt;p&gt;I knew when I made the script that it was over-engineered and relied too heavily on external libraries and hacky workarounds. The header of my script looked like this.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#IMPORTS
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;  
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;            
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;warnings&lt;/span&gt;      
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;           
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;            

&lt;span class="c1"&gt;#SUPRESS WARNINGS
&lt;/span&gt;&lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filterwarnings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;This pattern has match groups&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;I posted the script to a Reddit thread looking for feedback and was pretty succinctly told that I shouldn't be using Pandas for this at all, and that if I was going to use Pandas, I shouldn't be using iterrows. &lt;/p&gt;
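&lt;p&gt;The usual reasoning behind that advice is that iterrows builds a full Series object for every row, while itertuples yields lightweight namedtuples (and vectorised string operations avoid the loop entirely). A minimal sketch of the swap, on a made-up two-column frame shaped like my sql_cols:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical frame shaped like the script's sql_cols (COL_NAME, DATA_TYPE)
sql_cols = pd.DataFrame({
    "COL_NAME": ["customer_id", "order_date"],
    "DATA_TYPE": ["INTEGER", "DATE"],
})

# itertuples yields namedtuples instead of constructing a Series
# per row, which is the overhead that makes iterrows slow.
queries = [
    f"select count(distinct {row.COL_NAME}) from my.table;"
    for row in sql_cols.itertuples(index=False)
]
print(queries[0])  # select count(distinct customer_id) from my.table;
```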

&lt;p&gt;I knew I could do better.&lt;/p&gt;
&lt;h1&gt;
  
  
  Version 2
&lt;/h1&gt;

&lt;p&gt;So I set out to rebuild this script without Pandas; in fact, using as few external libraries as possible was my aim. So I brushed up on my lists, tuples, and dicts and got coding.&lt;/p&gt;
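&lt;p&gt;The lists/tuples/dicts refresher pays off because the whole type-to-test mapping collapses into a plain dict of templates, no data frame required. A minimal Pandas-free sketch (the templates and names here are illustrative, not the exact ones in the tool):&lt;/p&gt;

```python
# Pandas-free sketch: map each data type to a query template with a dict,
# then format the column and table names in. Templates are illustrative.
TESTS = {
    "DATE": "select min({col}), max({col}) from {table};",
    "INTEGER": "select min({col}), avg({col}), max({col}) from {table};",
    "VARCHAR": "select {col}, count(*) from {table} group by 1 order by 2 desc limit 50;",
}

def build_test(col, data_type, table):
    template = TESTS.get(data_type.upper())
    if template is None:
        # Unknown type: emit a SQL comment instead of a query
        return f"-- {col}: no test for data type {data_type}"
    return template.format(col=col, table=table)

print(build_test("order_date", "date", "sales.orders"))
# select min(order_date), max(order_date) from sales.orders;
```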
&lt;h2&gt;
  
  
  Methodology
&lt;/h2&gt;

&lt;p&gt;Not using Pandas meant I couldn't read the DDL off the clipboard; native Python can't do this. Other libraries can, but I decided the DDL could live in a text file and be loaded that way. So the first thing the batch file now does is open the text file and get the user to save the DDL into it. &lt;/p&gt;

&lt;p&gt;Now that I've locked the DDL load to a text file, the next few steps are the same: read the table name and prepare the output text file and its file name.&lt;/p&gt;

&lt;p&gt;You will note that I've removed the validation checking that the first word is 'create'; the analyst will figure it out soon enough.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#LOAD THE DDL
&lt;/span&gt;&lt;span class="n"&gt;load_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_script_input.sql&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;file_contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;#GET THE TABLE NAME
&lt;/span&gt;&lt;span class="n"&gt;table_head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_contents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\w+\.\w+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_head&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;U&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#FILE OUTPUT
&lt;/span&gt;&lt;span class="n"&gt;sql_table_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[table_testing][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;sql_table_file&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;].txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
&lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;sql_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
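&lt;p&gt;To see how that table-name extraction and file naming behave end to end, here's a minimal standalone sketch. The DDL line is made up for illustration; the regex and the dot-to-bracket replacement are the same as above.&lt;/p&gt;

```python
import re

# A made-up first line of a Redshift DDL, standing in for file_contents[0:1].
table_head = ["create table analytics.orders ("]

# schema.table is matched by \w+\.\w+ against the stringified list.
match = re.search(r'\w+\.\w+', str(table_head), flags=re.U)
table_name = match.group(0)

# Dots become "][" so the output file reads [table_testing][schema][table].txt
filename = ("[table_testing][" + table_name.replace(".", "][") + "].txt").lower()

print(table_name)  # analytics.orders
print(filename)    # [table_testing][analytics][orders].txt
```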
&lt;p&gt;So no data frames this time, which means the DDL has been loaded into a list. How did I refine that list down to just the right columns? Last time it took 15 lines of code; this time it's all done in a one-line lambda.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#RESTRICT TO APPLICABLE COLUMNS
&lt;/span&gt;&lt;span class="n"&gt;valid_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(SMALLINT|INT2|INTEGER|INT|INT4|BIGINT|INT8|DECIMAL|NUMERIC|REAL|FLOAT4|DOUBLE|DOUBLE PRECISION|FLOAT8|FLOAT|BOOL|BOOLEAN|DATE|TIMESTAMP|TIMESTAMPTZ|CHAR|CHARACTER|NCHAR|BPCHAR|VARCHAR|CHARACTER VARYING|NVARCHAR|TEXT|GEOMETRY)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="n"&gt;file_contents&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;The methodology is the same, looking for entries in the list that contain a Redshift data type, but this was vastly more elegant; I don't even need the data types loaded into a separate list. It can also handle full data type definitions: where the previous script first had to reduce decimal(50,5) down to just decimal before it could detect it, this one detects the data type without needing to strip the brackets first. &lt;/p&gt;
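&lt;p&gt;Here's a small sketch of that filter in action on some made-up DDL lines (the type list is trimmed for brevity); note that decimal(50,5) is matched with its brackets still attached:&lt;/p&gt;

```python
import re

# Made-up DDL lines standing in for file_contents.
file_contents = [
    "create table analytics.orders (\n",
    "    order_id BIGINT not null,\n",
    "    order_total decimal(50,5),\n",
    "    placed_at timestamp,\n",
    ") diststyle even;\n",
]

# Only lines containing a Redshift data type survive the filter.
pattern = r'\b(SMALLINT|INT|BIGINT|DECIMAL|NUMERIC|FLOAT|BOOL|DATE|TIMESTAMP|CHAR|VARCHAR|TEXT)'
valid_rows = list(filter(lambda x: re.search(pattern, x, flags=re.IGNORECASE), file_contents))

print(len(valid_rows))  # 3: just the three column definitions
```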

&lt;p&gt;The next steps are very similar. Now that I've got my valid rows, I needed to convert some of the values into strings, and rather than pass the actual data types into the function I passed the high-level type. So where a column was an int, bigint, or decimal, I passed it as 'NUMBER'. &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="c1"&gt;#CREATE TEST SCRIPT
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;table_line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;valid_rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;col_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;col_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dat_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table_line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;dat_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(CHAR|CHARACTER|NCHAR|BPCHAR|VARCHAR|CHARACTER VARYING|NVARCHAR|TEXT)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="nf"&gt;elif &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(SMALLINT|INT2|INTEGER|INT|INT4|BIGINT|INT8|DECIMAL|NUMERIC|REAL|FLOAT4|DOUBLE|DOUBLE PRECISION|FLOAT8|FLOAT)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NUMBER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="nf"&gt;elif &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(BOOL|BOOLEAN)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BOOL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="nf"&gt;elif &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(DATE|TIMESTAMP|TIMESTAMPTZ)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATES&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="nf"&gt;elif &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b(GEOMETRY)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;dat_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GEO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;col_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BAD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;col_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;script_gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;col_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
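&lt;p&gt;To make the cleanup and bucketing concrete, here's one made-up DDL line walked through the same steps: first word becomes the column name, second word is the data type, and the type collapses to a high-level bucket (type lists trimmed for brevity):&lt;/p&gt;

```python
import re

# A made-up row as it would appear in valid_rows.
table_line = "    order_total decimal(50,5) not null,"

# First word, via the same stringify-a-slice trick, stripped of quotes.
col_name = str(table_line.split()[0:1])[1:-1].replace("'", "")
# Second word is the data type, brackets and all.
dat_type = str(table_line.split()[1:2])

# Collapse the concrete type to a high-level bucket.
if re.search(r'\b(CHAR|VARCHAR|TEXT)', dat_type, flags=re.IGNORECASE):
    col_type = 'STRING'
elif re.search(r'\b(SMALLINT|INT|BIGINT|DECIMAL|NUMERIC|FLOAT)', dat_type, flags=re.IGNORECASE):
    col_type = 'NUMBER'
elif re.search(r'\b(DATE|TIMESTAMP)', dat_type, flags=re.IGNORECASE):
    col_type = 'DATES'
else:
    col_type = 'BAD'

print(col_name, col_type)  # order_total NUMBER
```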
&lt;p&gt;In the previous version I generated one or two test queries per column type. This time I wanted to go further, so I thought about more scenarios we would want to check, like computing the 25th, 50th, and 75th percentiles for numeric columns. I even added headings to give each area structure, where before it was just a text file with a pile of queries. Needless to say, my function got a lot bigger.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;


&lt;span class="c1"&gt;#SCRIPT_GEN FUNCTION
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;script_gen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt;      &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NUMBER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing for Column: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Column Type: Number &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing counts of column &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) as row_count, count(distinct(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as distinct_values from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking for nulls, are you expecting nulls? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; is null; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing mins, avgs and max values &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select min(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), avg(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), max(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing median, redshift doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t like doing medians with other calcs &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select median(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing 25% quartile, can be slow &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select percentile_cont(.25) within group (order by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as low_quartile from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing 50% quartile, can be slow&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select percentile_cont(.50) within group (order by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as mid_quartile from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing 75% quartile, can be slow&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select percentile_cont(.75) within group (order by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) as high_quartile from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATES&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing for Column: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Column Type: Date &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing counts of column &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) as row_count, count(distinct(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as distinct_values from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking for nulls, are you expecting nulls? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; is null; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking for highs and lows, are they as you expected? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select min(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;), max(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking how many dates are in the future. &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &amp;gt;sysdate; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking how many dates have a timestamp. &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where substring(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,12,8)&amp;lt;&amp;gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;00:00:00&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing for Column: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Column Type: String &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing counts of column &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) as row_count, count(distinct(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as distinct_values from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking for nulls, are you expecting nulls? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; is null; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Top 10 values &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; group by 1 order by 2 desc limit 10; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Check string lengths &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select min(len(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as min_length,max(len(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as max_length  from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BOOL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing for Column: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Column Type: BOOL &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Testing counts of column &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) as row_count, count(distinct(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)) as distinct_values from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Checking for nulls, are you expecting nulls? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; where &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; is null; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Breakdown of boolean &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, count(*) from &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; group by 1 order by 2 desc limit 10; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt;    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GEO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Geospatial data is not currently supported. Suggest something? &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sql_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-- Column: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; is not a known datatype. Datatype passed was: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;And voilà, we get this:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fronsoak%2FSQL_Test_Script_Gen%2Fmaster%2Fassets%2Fresults_example.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fronsoak%2FSQL_Test_Script_Gen%2Fmaster%2Fassets%2Fresults_example.png" alt="Results"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Outcomes
&lt;/h2&gt;

&lt;p&gt;This script is a lot more 'pure' in my eyes: it doesn't import an excessive number of libraries, and there are definitely no suppressed warnings. It also handles nuances in the DDL a lot better than the previous version did. It doesn't matter whether a column is varchar(15) or varchar (400000); it handles both correctly. &lt;/p&gt;
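&lt;p&gt;For anyone curious how a parser can shrug off those spacing and length differences, here's a minimal, hypothetical sketch of the idea (this is not the script's actual code; the function name and regex are mine):&lt;/p&gt;

```python
import re

def parse_column(ddl_line):
    """Extract a (name, base_type) pair from one DDL line.

    A hypothetical sketch: the column name and bare type are captured,
    while an optional length argument, with any amount of spacing,
    simply falls away -- so varchar(15) and varchar (400000) both
    normalise to VARCHAR.
    """
    match = re.match(r"\s*(\w+)\s+(\w+)\s*(?:\(\s*[\d,\s]*\s*\))?", ddl_line)
    if not match:
        return None
    name, base_type = match.groups()
    return name, base_type.upper()

print(parse_column("  user_id    varchar(15),"))
print(parse_column("  user_name  varchar (400000),"))
```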

&lt;p&gt;Sure, I've made a few compromises: the input isn't validated, the user can no longer just copy the DDL to the clipboard, and it doesn't even open the test script automatically for them any more. But those are all very minor things that hardly impact the analyst.&lt;/p&gt;
&lt;h2&gt;
  
  
  Things to do in V3
&lt;/h2&gt;

&lt;p&gt;Originally I wanted a GUI to load the DDL, where an input box would open up and the user would paste the DDL there. I couldn't quite spare the mental energy to learn Tkinter this time around, but it's definitely something to explore.&lt;/p&gt;
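&lt;p&gt;If I ever get there, the idea could look something like this minimal sketch using standard-library Tkinter. Everything here (function name, widget layout, button label) is made up for illustration, and nothing opens until you actually call the function:&lt;/p&gt;

```python
import tkinter as tk

def ask_for_ddl():
    """Open a window with a text box and return the pasted DDL on close.

    A hypothetical sketch of the V3 GUI idea, not the shipped tool.
    """
    root = tk.Tk()
    root.title("Paste your table DDL")
    box = tk.Text(root, width=80, height=25)
    box.pack(padx=8, pady=8)
    result = {"ddl": ""}

    def done():
        # Grab everything in the text box, then close the window.
        result["ddl"] = box.get("1.0", tk.END)
        root.destroy()

    tk.Button(root, text="Generate test script", command=done).pack(pady=4)
    root.mainloop()
    return result["ddl"]
```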

&lt;p&gt;Another thing I envisioned when I set out to rebuild this was that it would actually send the queries to our Redshift environment, run them, and return the results. Again, that was biting off more than I could chew on this pass, primarily because I don't have permission to install the command-line Postgres tool needed to send commands to Redshift at work, so that's something I'll look into another day. &lt;/p&gt;
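&lt;p&gt;As a rough sketch of that idea: Redshift speaks the Postgres wire protocol, so a driver like psycopg2 could run the generated statements directly. Everything below is an assumption on my part (psycopg2 being installed, the connection parameters, and the helper names), not something the current tool does:&lt;/p&gt;

```python
def split_statements(script):
    """Drop comment-only lines and split the script into statements."""
    lines = [l for l in script.splitlines() if not l.strip().startswith("--")]
    return [s.strip() for s in "\n".join(lines).split(";") if s.strip()]

def run_test_script(script, **conn_kwargs):
    """Run every statement in the generated script and print the results.

    Assumes psycopg2 is installed (pip install psycopg2-binary) and that
    conn_kwargs carries your real host/dbname/user/password; the import
    is deferred so split_statements works without the driver present.
    """
    import psycopg2
    with psycopg2.connect(**conn_kwargs) as conn:
        with conn.cursor() as cur:
            for stmt in split_statements(script):
                cur.execute(stmt)
                print(stmt, "->", cur.fetchall())
```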
&lt;h1&gt;
  
  
  Git Repo
&lt;/h1&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ronsoak" rel="noopener noreferrer"&gt;
        ronsoak
      &lt;/a&gt; / &lt;a href="https://github.com/ronsoak/SQL_Test_Script_Gen" rel="noopener noreferrer"&gt;
        SQL_Test_Script_Gen
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A python application for generating SQL test scripts based off of a table DDL.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/ronsoak/SQL_Test_Script_Gen./assets/git_header.PNG"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fronsoak%2FSQL_Test_Script_Gen.%2Fassets%2Fgit_header.PNG" alt="git_header"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;SQL Test Script Generator&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;For Redshift Table Builds&lt;/em&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What is this?&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;This is a python script that reads a table build written in Redshift syntax and then outputs a text file with some basic test scripts for every column.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why?&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Part of thorough testing in SQL is ensuring that each column you have pulled through is operating as expected. This script quickly creates the basic tests you should run, leaving you to write more personalised tests.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Pre-requisites&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Latest version of Python 3.0+&lt;/li&gt;
&lt;li&gt;Tables built using Redshift; it will not (currently) work on tables built in Oracle, T-SQL, PostgreSQL, MySQL, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Installation / Configuration&lt;/h2&gt;

&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Download the repo as a zip; you only really need 'Launch_Me.bat' and 'Gen_Test_Script.py'&lt;/li&gt;
&lt;li&gt;Place these two files in the folder of your choosing&lt;/li&gt;
&lt;li&gt;Edit 'Launch_Me.bat' and go to line 11&lt;/li&gt;
&lt;li&gt;Edit 'C:/path to your python install/python.exe' to be where your python.exe is installed&lt;/li&gt;
&lt;li&gt;Edit 'c:/path to this script/Gen_Test_Script.py' to be…&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ronsoak/SQL_Test_Script_Gen" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1183218570164965376-807" src="https://platform.twitter.com/embed/Tweet.html?id=1183218570164965376"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;




</description>
      <category>python</category>
      <category>sql</category>
      <category>database</category>
      <category>testing</category>
    </item>
    <item>
      <title>Whelp, they got all our data, now what? - A guide, well a lecture first, then a guide.</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 09 Mar 2020 11:27:03 +0000</pubDate>
      <link>https://dev.to/alexantra/whelp-they-got-all-our-data-now-what-a-guide-well-a-lecture-first-then-a-guide-1ce9</link>
      <guid>https://dev.to/alexantra/whelp-they-got-all-our-data-now-what-a-guide-well-a-lecture-first-then-a-guide-1ce9</guid>
      <description>&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Remember how we laughed at previous generations for thinking smoking was healthy, that sunburn was harmless, and that knowing more than one language made you dumb?&lt;/p&gt;

&lt;p&gt;Well, chuckle a bit more quietly, as it won't be long before people are laughing at us. "You did WHAT with your data!!!!!!!!?" &lt;br&gt;
"You DIDN'T read the terms of service???"&lt;br&gt;
"You LET them track you???"&lt;/p&gt;

&lt;p&gt;For we are, and I won't mince words here, the data stupid generation. &lt;/p&gt;

&lt;p&gt;We have given away so much of our data to companies who made billions off of it, with absolutely nothing in return. We have sacrificed privacy, safety, and in some circumstances our own genetic blueprints, all because we were so oblivious as to what was happening.&lt;/p&gt;

&lt;p&gt;And while I will concede that we didn't know what was happening until it was too late, much like Edna with her 1963 Lincoln Continental and six-pack-a-day habit, it's something we should be trying to rectify as soon as possible.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/Zx12n2W9mDpNS/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Zx12n2W9mDpNS/giphy.gif" alt="Edna"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  So how bad is it?
&lt;/h1&gt;

&lt;p&gt;To quote &lt;a href="https://haveibeenpwned.com/Privacy" rel="noopener noreferrer"&gt;HIBP&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our data is leaked, sold, redistributed and abused to our detriment and beyond our control&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Let's first talk about the data we knowingly gave away.
&lt;/h2&gt;

&lt;p&gt;So think about every social media account, every newsletter, every loyalty program, and every other account you've set up in the past ten years.&lt;/p&gt;

&lt;p&gt;Every tweet, Facebook post, email, picture, video, and GIF you put online, you willingly made the property of the company hosting it, to do whatever they wanted with. You agreed to it in the really, really, really long Terms of Service you skipped to the bottom of and accepted.&lt;/p&gt;

&lt;p&gt;Now, we can discuss how intentionally problematic these sorts of legal documents are another day, but as a surprise to no one, &lt;a href="https://www.theguardian.com/commentisfree/2014/apr/24/terms-and-conditions-online-small-print-information" rel="noopener noreferrer"&gt;a clear majority of people&lt;/a&gt; do not read them, and of those who do, fewer still actually understand them.&lt;/p&gt;

&lt;p&gt;But the devil is in the details.&lt;/p&gt;

&lt;p&gt;It's in &lt;a href="https://www.theguardian.com/technology/2014/apr/15/gmail-scans-all-emails-new-google-terms-clarify" rel="noopener noreferrer"&gt;Google's Terms and Conditions&lt;/a&gt; that they read all of your emails, not just to cater adverts back to you, but to aid internal development (ML, anyone?) and other things (profiting from information sharing).&lt;/p&gt;

&lt;p&gt;It's &lt;a href="https://www.theverge.com/2019/7/26/8932064/apple-siri-private-conversation-recording-explanation-alexa-google-assistant" rel="noopener noreferrer"&gt;Apple's Privacy Policy&lt;/a&gt; that tells you that a human may listen to what you say to Siri to "better improve her recognition of pronunciation."&lt;/p&gt;

&lt;p&gt;Heck, if you read all the way to the bottom of the iTunes terms and conditions, you'll find that &lt;a href="https://www.businessinsider.com.au/apple-no-itunes-for-nuclear-weapon-2013-10?op=1&amp;amp;r=US&amp;amp;IR=T" rel="noopener noreferrer"&gt;you've agreed to NOT use iTunes in the creation of Nuclear Weapons&lt;/a&gt;. Something I think most of us will be fine with. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/cEYFeDOOQ0cHqIgIEOA/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/cEYFeDOOQ0cHqIgIEOA/giphy.gif" alt="Nuclear"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And to be fair, we all knew, to some extent, that the stuff we put up on Facebook was used by them for their own purposes. What I think many people didn't realize was how long we had been doing it for. We were all so accustomed to websites coming and going that it didn't seem to need close scrutiny, but next thing you knew, we had willingly fed Twitter five years' worth of our likes, dislikes, political opinions, passing fancies, and humor. &lt;/p&gt;

&lt;p&gt;We also didn't realize that while you may only put certain information online, the decade you spent on Facebook allowed them to piece together the missing info like a jigsaw puzzle.&lt;/p&gt;

&lt;p&gt;What pains me now is that we are filling these websites with information about our kids, who don't get a chance to opt out. By the time they understand the problem, you may have thrown away any chance of them ever having privacy. &lt;/p&gt;

&lt;p&gt;The things we willingly do have massive ripple effects. Signing up to a loyalty scheme at a shoe store to receive $10 off your next order makes that shoe company a hell of a lot more than $10; people don't give money away for free. &lt;/p&gt;

&lt;p&gt;Your photos are scanned, your videos deconstructed, your emails read, and if you really need a wake-up call, these companies are looking at your nudes, Barry. Which actually happened at &lt;a href="https://www.engadget.com/2019/05/23/snapchat-employees-spied-snaplion-tool/" rel="noopener noreferrer"&gt;Snapchat!!!!!!!&lt;/a&gt;.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/5jXpKwV4b1pde/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/5jXpKwV4b1pde/giphy.gif" alt="Small"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Let's talk about the stuff we unknowingly gave away
&lt;/h2&gt;

&lt;p&gt;I talked earlier about Facebook filling in the puzzle of the stuff you didn't post online. &lt;/p&gt;

&lt;p&gt;They all do a lot of that, all the time. You see, we leave a lot of footprints all over the world wide web. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/EDMgatJnj6Z4Q/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/EDMgatJnj6Z4Q/giphy.gif" alt="Footprints"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;So you log in at home, right? You do that every day for a month, and through your ISP they know what country, city, and region you are in. Good thing ISPs don't tell them your exact location, right? Well, your ISP won't, but your GPS-enabled phone has just told them your exact longitude and latitude. Your phone has also told them the SSID of your wifi, so when the next person logs in on that wifi network (a flatmate, say), Facebook now knows what house they are in. One person can betray the information of everyone in the house by being the final piece of the puzzle. &lt;/p&gt;

&lt;p&gt;Same with at work, Facebook notices that there's this second location you regularly log into five days a week and your Benedict Arnold of a phone has also told them the exact coordinates of your workplace as well.&lt;/p&gt;

&lt;p&gt;And it's not just your home location &lt;a href="https://www.wired.com/story/google-location-tracking-turn-off/" rel="noopener noreferrer"&gt;they are tracking&lt;/a&gt;; or, more accurately, your phone is giving them a map of everywhere you go: what supermarket you go to, what shops you visit, what shops you don't visit, your bar, your gym. &lt;/p&gt;

&lt;p&gt;They then use this, not just in advertising, but to enhance their profile of you and to learn more about how the people in your city operate en masse. &lt;/p&gt;

&lt;p&gt;They know when certain streets are busiest, what a store's opening hours are and when it's busiest, and what sort of people are in a city, based purely on mass data collection. And while some of that stuff is useful, they never asked; they tricked us, and now we are suffering the cost of that (more on exactly how we are suffering later). &lt;/p&gt;

&lt;p&gt;You don't even have to be logged in; Google can sometimes tell by your search habits that it's you, just give them enough time. In fact, going incognito doesn't hide you from Google. At first it will just look like a new person in your house is looking for Busty Mature Women on the ole Porn Hub, but give them enough time (especially since the activity is happening on the same device every night at 10:55pm....BARRY) and Google knows what ticks your box, even though you used 'Private Browsing.' &lt;/p&gt;
&lt;h2&gt;
  
  
  Let's talk about the stuff we didn't know they did
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Facial Recognition Training
&lt;/h3&gt;

&lt;p&gt;Before I purged my Facebook I downloaded a copy of everything on there. I had uploaded over 2,000 photos to the blue box in the sky. &lt;/p&gt;

&lt;p&gt;While a good chunk would have been memes, enough of them contained pictures of people; let's say 300 have humans in them. Then consider that Facebook has over &lt;a href="https://zephoria.com/top-15-valuable-facebook-statistics/" rel="noopener noreferrer"&gt;2.5 billion&lt;/a&gt; active users, and we all now know how Facebook trained its facial recognition algorithm.&lt;/p&gt;

&lt;p&gt;And while advancing the field of data science is a cool thing, Facebook has &lt;a href="https://thehill.com/policy/technology/358102-franken-blasts-facebook-for-accepting-rubles-for-us-election-ads" rel="noopener noreferrer"&gt;proven&lt;/a&gt; themselves to willingly sell their services to whomever can pay, regardless of the ethical implications. &lt;/p&gt;

&lt;p&gt;Enter stage left the Chinese government, who use facial recognition to track every citizen's movements and actions as part of a social credit score, and we know exactly who is most likely to purchase that data. It's all fun and games until they use your data against you for unethical reasons. We fucked up. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/80TEu4wOBdPLG/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/80TEu4wOBdPLG/giphy.gif" alt="oops"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Cookie Stealing
&lt;/h3&gt;

&lt;p&gt;Did you know that &lt;a href="https://www.cnet.com/news/firefox-privacy-extension-keeps-facebook-from-tracking-you-on-the-web/" rel="noopener noreferrer"&gt;Facebook is tracking the other stuff&lt;/a&gt; you do on other websites? &lt;br&gt;
Without Facebook even being open they can track you, through all the embedded 'like' or 'share to Facebook' buttons found on other websites. If you can see a link to Facebook on a website, old Zuckerberg is watching you. &lt;br&gt;
&lt;a href="https://i.giphy.com/media/1zKdb4WSHgY4QKAsjo/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/1zKdb4WSHgY4QKAsjo/giphy.gif" alt="Zuck"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Stealing your data via the companies they own
&lt;/h3&gt;

&lt;p&gt;These big companies also steal your data through the simple fact that they own other companies.&lt;/p&gt;

&lt;p&gt;Those loyalty schemes are often owned by bigger companies, which is why 5 minutes after you sign up for your $5 voucher, your inbox is being flooded.&lt;/p&gt;

&lt;p&gt;Back onto hating on the big wigs.&lt;/p&gt;

&lt;p&gt;Did you know Facebook owns Instagram, WhatsApp, Oculus, and &lt;a href="https://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_Facebook" rel="noopener noreferrer"&gt;82 other companies&lt;/a&gt;? &lt;/p&gt;

&lt;p&gt;Google, on the other hand, is actually owned by a company called Alphabet, which owns everything Google, as well as Nest, Sidewalk Labs, and nearly &lt;a href="https://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_Alphabet" rel="noopener noreferrer"&gt;300 other companies&lt;/a&gt;, and is about to purchase Fitbit (yes, all your health data will go to them). At one stage Google bought more than one company a week. &lt;/p&gt;
&lt;h3&gt;
  
  
  Your data is bought and sold over and over again
&lt;/h3&gt;

&lt;p&gt;Unbeknownst to most of us, the selling and buying of our data is what has made all of these companies filthy rich.&lt;/p&gt;

&lt;p&gt;There are data enrichment companies who sit in the middle: they will buy your data off Google, smash it together with what they have from Twitter, and sell it to Facebook. Last year one of those companies left a server unprotected and the personal information of &lt;a href="https://haveibeenpwned.com/PwnedWebsites#PDL" rel="noopener noreferrer"&gt;1.2 billion people&lt;/a&gt; was leaked, which means that company was buying and selling data on over a billion people. Do you know how unlikely it is that your data wasn't in that leak?&lt;br&gt;
&lt;a href="https://i.giphy.com/media/2QoC6AM8YOQx2/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/2QoC6AM8YOQx2/giphy.gif" alt="odds"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Live Manipulation of pricing / products
&lt;/h3&gt;

&lt;p&gt;We are all aware of surge pricing right? Ubers are more expensive based on demand.&lt;/p&gt;

&lt;p&gt;But what if the Uber was more expensive because they knew you could afford it?&lt;/p&gt;

&lt;p&gt;That's exactly what online retailers like &lt;a href="https://www.businessinsider.com/amazon-price-changes-2018-8/?r=AU&amp;amp;IR=T" rel="noopener noreferrer"&gt;Amazon&lt;/a&gt; do. Because they have all of our demographics, interests, habits, and buying history, Amazon can show a higher price to someone they think can afford it, while only offering a sale price to someone they need to work hard to convert. &lt;/p&gt;
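&lt;p&gt;To make the mechanics concrete, here's a toy sketch in Python; the profile fields and multipliers are invented for illustration and are not any retailer's actual logic:&lt;/p&gt;

```python
# Hypothetical segment-based pricing: mark the price up when the
# profile suggests the shopper can afford it, discount it when the
# shopper looks hard to convert.
BASE_PRICE = 100.0

def personalised_price(profile: dict) -> float:
    price = BASE_PRICE
    if profile.get("disposable_income") == "high":
        price *= 1.15   # they can probably afford a markup
    if profile.get("abandoned_carts", 0) > 2:
        price *= 0.90   # they need a nudge to convert
    return round(price, 2)
```

&lt;p&gt;Two shoppers looking at the same product at the same moment see different prices, and neither ever knows.&lt;/p&gt;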

&lt;p&gt;Same with banks: the information they have harvested on you, as well as your financial history, will be used to make judgements on what to lend you. A practice some &lt;a href="https://www.cnbc.com/2018/10/05/new-kind-of-auto-insurance-can-be-cheaper-but-tracks-your-every-move.html" rel="noopener noreferrer"&gt;insurance companies&lt;/a&gt; have started to deploy.&lt;/p&gt;

&lt;p&gt;Just imagine if your health insurance went up because they saw that you had eaten at McDonald's???? Is that a world you want to live in? &lt;/p&gt;
&lt;h3&gt;
  
  
  Using your data to influence your actions
&lt;/h3&gt;

&lt;p&gt;And of course the one that has been all over the news. If Facebook can use what they know about you to dynamically change what sort of content you can and can't see, how do you know that they aren't hiding stuff from you in order to influence your decision making?&lt;/p&gt;

&lt;p&gt;We already suspect (know) that &lt;a href="https://dev.towiki/Facebook"&gt;Russia influenced the previous American election&lt;/a&gt; via social media. What if your local government wants to roll out legislation to curb the amount of data Facebook can steal off you? Can we trust Facebook not to tweak the algorithm in their favor, showing you more content that argues against the legislation? All signs point to no.&lt;/p&gt;
&lt;h2&gt;
  
  
  Let's talk about the breaches
&lt;/h2&gt;

&lt;p&gt;If it's not bad enough that all these websites are harvesting your data, they keep fucking losing it. &lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://haveibeenpwned.com/" rel="noopener noreferrer"&gt;Have I Been Pwned&lt;/a&gt;, the Internet's leading website on compromised user data, over 400 websites have lost user data through breaches, affecting a whopping NINE POINT FIVE BILLION USER ACCOUNTS. &lt;/p&gt;

&lt;p&gt;I will repeat what I said earlier: the odds of your personal data not having been breached are infinitesimal right now. Your data is out there in an uncontrolled environment. Exactly what data is still unknown; it could just be your NeoPets account (&lt;a href="https://www.vice.com/en_us/article/ezpvw7/neopets-hack-another-day-another-hack-tens-of-millions-of-neopets-accounts" rel="noopener noreferrer"&gt;yes&lt;/a&gt;, they got hacked too), but hackers don't need a lot of information to get more from you. Hackers steal and buy your data to enable something called &lt;a href="https://www.csoonline.com/article/2124681/what-is-social-engineering.html" rel="noopener noreferrer"&gt;Social Engineering&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Given enough data they can either engineer what your password might be, use an old password to steal some accounts off you (thus learning more about you), or even ring up technical support call centers and pretend to be you to get access to your account. &lt;/p&gt;
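&lt;p&gt;One practical defence: Have I Been Pwned's Pwned Passwords service lets you check whether a password has appeared in a known breach without ever sending the password itself. A sketch in Python, assuming the k-anonymity range endpoint as documented at the time of writing:&lt;/p&gt;

```python
import hashlib
import urllib.request

def sha1_split(password: str) -> tuple[str, str]:
    """SHA-1 the password and split the hash for a k-anonymity lookup."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def pwned_count(password: str) -> int:
    """Return how many known breaches the password appears in (0 = none)."""
    prefix, suffix = sha1_split(password)
    # Only the 5-character hash prefix ever leaves your machine; the
    # service replies with every matching suffix and its breach count.
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        for line in resp.read().decode().splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
    return 0
```

&lt;p&gt;If the count comes back non-zero, retire that password everywhere you've used it.&lt;/p&gt;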

&lt;p&gt;&lt;a href="https://i.giphy.com/media/l3q2MDnkLri1t7i5a/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/l3q2MDnkLri1t7i5a/giphy.gif" alt="leak"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Why do they do it?
&lt;/h1&gt;

&lt;p&gt;Primarily it was for Marketing and Feature Usage tracking. Over time it evolved into what it is now.&lt;/p&gt;
&lt;h2&gt;
  
  
  Marketing
&lt;/h2&gt;

&lt;p&gt;With over a billion users on the internet, it can be hard for someone wanting to advertise a product to reach the right audience. A maker of boutique headphones aimed at audiophiles could show an ad to a million people and not get a single sale. To the untrained, it's hard to work out who should see your advert and who shouldn't. This is where Facebook / Twitter / Google excel. &lt;/p&gt;

&lt;p&gt;Because they have information on you, they break you down into what are known as segments: groupings that they offer to the people wanting to advertise.&lt;/p&gt;

&lt;p&gt;As a 30 year old white male, living in an English speaking western developed world who has an interest in technology, my segments probably look something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Male&lt;/li&gt;
&lt;li&gt;Disposable Income&lt;/li&gt;
&lt;li&gt;Likely to buy tech&lt;/li&gt;
&lt;li&gt;English speaking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In fact my Facebook download had me as 'Starting Adult Life' which goes to show what years I actively used Facebook. &lt;/p&gt;

&lt;p&gt;The platform holder then charges the Headphone maker to show their adverts only to the segments that apply to the Headphone makers most likely customer base.&lt;/p&gt;

&lt;p&gt;No one claims that this is a 100% hit rate, which makes this the perfect scam. Facebook might only increase the success rate of the headphone maker's adverts by 5%, especially when the advert might just be a bad advert or the product poorly priced, but to the headphone maker it was better than blind chance, and so they pay for it.&lt;/p&gt;
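&lt;p&gt;In code, the whole business model fits in a few lines. This Python sketch uses invented segment rules and thresholds purely to illustrate the idea:&lt;/p&gt;

```python
# Toy segmentation: derive ad segments from a harvested profile, then
# match an advertiser's target segments against them.
def derive_segments(profile: dict) -> set[str]:
    segments = set()
    if profile.get("age", 0) >= 18:
        segments.add("adult")
    if profile.get("income", 0) > 60_000:
        segments.add("disposable-income")
    if "headphones" in profile.get("recent_searches", []):
        segments.add("likely-to-buy-tech")
    return segments

def should_show_ad(user_segments: set[str], target: set[str]) -> bool:
    """The advertiser pays to reach users matching every target segment."""
    return target <= user_segments
```

&lt;p&gt;The headphone maker never sees your profile; they just pay for the segments, and the platform does the matching.&lt;/p&gt;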
&lt;h2&gt;
  
  
  Feature Tracking
&lt;/h2&gt;

&lt;p&gt;A core part of development in tech is knowing what to develop next, and the collection of this data has been, for the longest time, the backbone of development. If a development team wants to know what bug they should fix first, they turn to feature usage data. If they want to know what kind of person is using product X so they can figure out how to entice person Y to use it: feature usage data. If they want to know whether a new feature is being used: again, feature usage data. It's how these companies decide what to develop. Now, this isn't a get-out-of-jail-free card. They really only need anonymous data, and a lot more could be done in this space to make sure devs are only seeing the data needed to do their job. &lt;/p&gt;
&lt;h1&gt;
  
  
  What can we do?
&lt;/h1&gt;

&lt;p&gt;We need to be preemptive to protect our future data as well as clean up the accounts we have created over the years; it's no good setting up good passwords from here on out if your data is going to get leaked by an account you no longer use. This is as much an exercise in security as it is best data protection practice. &lt;/p&gt;
&lt;h2&gt;
  
  
  Check how badly you've been hacked.
&lt;/h2&gt;

&lt;p&gt;Go to &lt;a href="https://haveibeenpwned.com/" rel="noopener noreferrer"&gt;have i been pwned?&lt;/a&gt; and enter every email you can remember ever using (even work emails). This will give you an indication of how badly compromised you are.&lt;/p&gt;
&lt;h2&gt;
  
  
  Close old accounts
&lt;/h2&gt;

&lt;p&gt;Piggybacking off the above, close as many old accounts as you can remember and don't use any more.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pretend you're covered by GDPR
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/General_Data_Protection_Regulation" rel="noopener noreferrer"&gt;GDPR&lt;/a&gt; is a regulation passed by the EU that gives Europeans, among other things, the right to be forgotten. Before the GDPR, if you closed your account with Facebook, Facebook didn't delete your data; they kept it and kept using it. Now, with the GDPR in place, if requested by a member of the EU, Facebook legally has to scramble the user's data so they can't be identified any more, or risk a fine of up to 4% of their global annual revenue. &lt;/p&gt;

&lt;p&gt;Fun fact: most tech companies haven't even implemented the GDPR properly, as their products were never built to do this sort of thing. Not to mention the GDPR covers EU citizens wherever they live, and because no one has any way of proving whether someone who lives in Australia is or isn't an EU citizen, they won't challenge you. So once you've closed your account, ask them to remove your data under the GDPR. &lt;/p&gt;
&lt;h2&gt;
  
  
  Set up a spam email address
&lt;/h2&gt;

&lt;p&gt;For any account you want to keep open, or anything in the future you want to sign up to, have a personal email address for you and a spam email account for everything else (I have what's left of my Facebook tied to a spam email). Give it a generic name like &lt;a href="mailto:732643_8324732_824623@gmail.com"&gt;732643_8324732_824623@gmail.com&lt;/a&gt; and give it a good password (nothing similar to anything else you use, please).&lt;/p&gt;
&lt;h2&gt;
  
  
  Mask your email
&lt;/h2&gt;

&lt;p&gt;Some email services will now offer you a masked email address. &lt;a href="https://support.apple.com/en-nz/HT210425" rel="noopener noreferrer"&gt;Apple now does this&lt;/a&gt;: you can be &lt;a href="mailto:debbie_taylor@icloud.com"&gt;debbie_taylor@icloud.com&lt;/a&gt;, but Apple can offer you &lt;a href="mailto:876545678765_dfdsfu@icloud.com"&gt;876545678765_dfdsfu@icloud.com&lt;/a&gt; to use with everyone else on the internet. You still get the emails, but it means people can't extract personal information from your email address, and it prevents hackers from matching your email addresses across the internet. &lt;/p&gt;
&lt;h2&gt;
  
  
  Opt out
&lt;/h2&gt;

&lt;p&gt;Get into the habit of scrolling to the bottom of every spam email and hitting 'unsubscribe' to get yourself removed from those email lists. Not only are they sending you advertising, they also include trackers in the rich content of the advert to see how you interact with it. &lt;/p&gt;
&lt;h2&gt;
  
  
  Reset your advertising ID
&lt;/h2&gt;

&lt;p&gt;Did you know that every software platform you use, tracks you with an &lt;a href="https://en.wikipedia.org/wiki/Advertising_ID" rel="noopener noreferrer"&gt;advertising ID&lt;/a&gt;, which helps the marketing people find you faster? I bet most people don't know that.&lt;/p&gt;

&lt;p&gt;It can be found on your iPhone, your Android, your Mac, and your Windows PC, and you can reset it as often as you want; you can even change some settings to make it harder for them to track you. I have BOTH changed my settings so it's harder for advertisers to track me AND set a reminder in my phone to reset my advertising ID every month. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.groovypost.com/howto/five-helpful-keyboard-tips-for-typing-on-your-iphone-or-ipad/" rel="noopener noreferrer"&gt;Guide for iOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wikihow.com/Reset-Your-Advertising-ID-on-PC-or-Mac" rel="noopener noreferrer"&gt;Guide for Mac&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.groovypost.com/howto/five-helpful-keyboard-tips-for-typing-on-your-iphone-or-ipad/" rel="noopener noreferrer"&gt;Guide for Android&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wikihow.com/Reset-Your-Advertising-ID-on-PC-or-Mac" rel="noopener noreferrer"&gt;Guide for Windows&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi-cdn.phonearena.com%2Fimages%2Farticles%2F227279-thumb%2Fiphone-ios-settings-privacy-advertising-limit-ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi-cdn.phonearena.com%2Fimages%2Farticles%2F227279-thumb%2Fiphone-ios-settings-privacy-advertising-limit-ad.png" alt="Reset Ad ID"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Read what apps can access
&lt;/h2&gt;

&lt;p&gt;Many apps on the iOS and Android app stores access more than you would think, and while both Apple and Google are getting better at showing you what those apps access and giving you the ability to dictate what they can and can't do, it's still worth checking from time to time. The key things I look for are: which apps want to know my location, which apps have access to my contacts, and which apps can use my camera.&lt;/p&gt;

&lt;p&gt;A good starter guide is &lt;a href="https://www.wired.com/story/how-to-check-app-permissions-ios-android-macos-windows/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prevent apps from tracking you and choose what data they can send back to the mother ship
&lt;/h2&gt;

&lt;p&gt;In the same vein as the above, start by making sure only apps that need to use your location are using it, and turn it off for everything else. For example, my voice memo app had location turned on. &lt;/p&gt;

&lt;p&gt;You can also allow or deny 'usage analytics' from being sent back to the device owner. This is standard across many devices and software, and it is pitched as being 'for your benefit' as they can detect bugs and improve the software to your liking. However, in this day and age my preference is not to believe any 'it's for your benefit' rhetoric when I know for a fact that data will either be sold or leaked.&lt;/p&gt;

&lt;p&gt;A handy guide for iOS is &lt;a href="https://support.apple.com/en-us/HT202100" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This advice also applies to the software you use: a lot of apps, web browsers, computer programs, and other internet connected devices in your home will also most likely have an 'opt out of sharing usage analytics' option.&lt;/p&gt;
&lt;h2&gt;
  
  
  Use privacy-focused search engines
&lt;/h2&gt;

&lt;p&gt;Everything you search in Google can be used against you, not just by Google, but also by the law.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="https://duckduckgo.com/" rel="noopener noreferrer"&gt;Duck Duck Go&lt;/a&gt;, the privacy-focused search engine. It doesn't track you and encrypts your search activity so even they can't see what you are doing. They have a door mat outside their office that says &lt;a href="https://onezero.medium.com/nothing-can-stop-google-duckduckgo-is-trying-anyway-718eb7391423" rel="noopener noreferrer"&gt;COME BACK WITH A WARRANT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F5520%2F1%2AIL9RMMxagMHyyhQt8VI8nA.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F5520%2F1%2AIL9RMMxagMHyyhQt8VI8nA.jpeg" alt="warrant"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've been using Duck Duck Go for over a year now and it's been a great decision, as my ads get less and less targeted; it also helps with my attempt to divorce Google.&lt;/p&gt;

&lt;p&gt;Though a word of warning: your searching will get a tad harder. Because Google is always tracking you, they can make searching the internet easy; while they can't predict what you will search, they can make finding the right result easier because they know what to exclude, especially with vague search terms. &lt;/p&gt;

&lt;p&gt;When I started using Duck Duck Go I did notice that stuff I used to google using vague terms didn't get the results I wanted, but over time both I and Duck Duck Go have improved. For example, I live in Wellington, New Zealand; when I google a restaurant, Google knows to show me the one in my city, but because Duck Duck Go doesn't know where I am, I will often get restaurants of the same name in other countries. &lt;/p&gt;

&lt;p&gt;Trust me it's worth it. &lt;/p&gt;
&lt;h2&gt;
  
  
  Use privacy-focused web browsers / stop using Chrome
&lt;/h2&gt;

&lt;p&gt;Stop using Google Chrome is the number one answer here. Google have monopolized their way into the number one position, and with that monopoly they are scraping monumental amounts of data. What makes it worse is that Chromium, the code base underneath Chrome, is also the base for many other web browsers like Microsoft Edge and Opera, and (via Electron) even underlies non-browser apps like Slack and VS Code. What this means is that Chrome-based tracking can follow you even when you aren't using Chrome; it's up to the developers to turn it off.&lt;/p&gt;

&lt;p&gt;There are two options here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://brave.com/" rel="noopener noreferrer"&gt;Brave&lt;/a&gt;: Brave browser is a chrome based browser, however it actively blocks everything and has been built with that purpose in mind. Its turned off all of Googles back doors and even provides you with a granular, per website, tool called Shield that allows you to block what you want. What shocks me is that YouTube stops working when you block all their trackers. Bit naughty. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mozilla.org/en-US/firefox/" rel="noopener noreferrer"&gt;Firefox&lt;/a&gt;: Firefox is the only non-chrome base browser left (other than Safari). Not only is it stopping Chrome from being the true monopoly it also has a privacy focus similar to Brave. It actively stops trackers and gives you control over what you allow certain websites to do. Firefox where the first people to be able to provide a tool that stopped Facebook from looking at the other tabs you've got open. &lt;/p&gt;
&lt;h2&gt;
  
  
  Use end-end encryption services where possible
&lt;/h2&gt;

&lt;p&gt;When a company tells you your data is encrypted, they often mean it's encrypted from everyone else. This is alright at protecting your data from being intercepted, but should a hacker get access to the back end, like the NINE POINT FIVE BILLION TIMES IT'S HAPPENED BEFORE, the encryption means nothing. It also means a legal agency like the government can get access if they have a warrant. Cough America Cough. &lt;/p&gt;

&lt;p&gt;The phrase you want to be on the lookout for is 'end-to-end encryption'. Messaging provider Telegram offers this in its secret chats function, meaning that if you used it, not even Telegram, nor the government with a warrant, can have a look. &lt;/p&gt;

&lt;p&gt;However, do your research: WhatsApp is encrypted end-to-end, but its parent company Facebook &lt;a href="https://medium.com/@gzanon/no-end-to-end-encryption-does-not-prevent-facebook-from-accessing-whatsapp-chats-d7c6508731b2" rel="noopener noreferrer"&gt;still has other ways of seeing that data&lt;/a&gt;, primarily by seeing it before it gets encrypted. So be warned. &lt;/p&gt;
&lt;h2&gt;
  
  
  Change your password / use password managers / don't be dumb
&lt;/h2&gt;

&lt;p&gt;Come on, you know the drill.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't use the same password for everything&lt;/li&gt;
&lt;li&gt;Don't cycle passwords&lt;/li&gt;
&lt;li&gt;Don't make them easy, or write them down.&lt;/li&gt;
&lt;li&gt;Turn on two factor authentication&lt;/li&gt;
&lt;li&gt;Use a password manager like LastPass&lt;/li&gt;
&lt;li&gt;Change them often&lt;/li&gt;
&lt;li&gt;Keep an eye on Have I Been Pwned; every time you get pwned, change all your passwords.&lt;/li&gt;
&lt;/ul&gt;
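&lt;p&gt;If you want to see how little code a strong, unique password takes, here's a sketch using Python's standard secrets module (a password manager will do all of this for you, so treat this as illustration):&lt;/p&gt;

```python
import secrets
import string

def generate_password(length: int = 20) -> str:
    """Random password drawn from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def generate_passphrase(words: list[str], count: int = 5) -> str:
    """Easier-to-remember passphrase built from a supplied word list."""
    return "-".join(secrets.choice(words) for _ in range(count))
```

&lt;p&gt;Generate one per site, store it in the manager, and never reuse it.&lt;/p&gt;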
&lt;h2&gt;
  
  
  Don't use genetic testing companies
&lt;/h2&gt;

&lt;p&gt;In one of the most chilling revelations of a breach, &lt;a href="https://www.bionews.org.uk/page_136360" rel="noopener noreferrer"&gt;MyHeritage&lt;/a&gt; revealed that they had been hacked. For the customers of services like these, it's not just passwords that could end up in the wild, but genetic information. A future where we could use a genetic imprint in lieu of a password has potentially already been ruined by this sort of thing. I would urge you to resist using any of these services until security gets a bit better.&lt;/p&gt;
&lt;h2&gt;
  
  
  Read the ts &amp;amp; cs
&lt;/h2&gt;

&lt;p&gt;Easier said than done, right? You can use the website &lt;a href="https://tosdr.org/" rel="noopener noreferrer"&gt;Terms of Service; Didn't Read&lt;/a&gt; to get terms and conditions boiled down for easier consumption. &lt;/p&gt;
&lt;h2&gt;
  
  
  Download your data.
&lt;/h2&gt;

&lt;p&gt;In recent years most websites have started giving you the ability to download all of the data they hold on you. I recommend you do this for any major platform you use. When I downloaded all my Facebook data I could even see what guesses they had made about who I was for marketing purposes. &lt;/p&gt;
&lt;h2&gt;
  
  
  Actively delete old data
&lt;/h2&gt;

&lt;p&gt;In the past year I deleted everything off of Facebook (after downloading it) and deleted all my old tweets. This doesn't remove them from the platform holder's data, but it prevents other sites, like data enrichment companies, from scraping them and selling them on.&lt;/p&gt;

&lt;p&gt;For Twitter, there are some third parties who will bulk delete tweets for you.&lt;br&gt;
For Facebook, I used a Chrome extension that deleted posts by simulating mouse clicks over and over again. That took a few days. &lt;/p&gt;



&lt;p&gt;&lt;em&gt;Reminder: All views expressed here are my own and do not represent my employer, Xero.&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1183218570164965376-162" src="https://platform.twitter.com/embed/Tweet.html?id=1183218570164965376"&gt;
&lt;/iframe&gt;




&lt;/p&gt;



</description>
      <category>data</category>
      <category>security</category>
      <category>guide</category>
    </item>
    <item>
      <title>Data Analyst? TIME TO LEARN PYTHON!!!🐍🐼</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Wed, 29 Jan 2020 11:06:07 +0000</pubDate>
      <link>https://dev.to/alexantra/data-analyst-time-to-learn-python-if0</link>
      <guid>https://dev.to/alexantra/data-analyst-time-to-learn-python-if0</guid>
      <description>&lt;p&gt;&lt;a href="https://i.giphy.com/media/tKwAHuwjcHkV0eLzcO/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/tKwAHuwjcHkV0eLzcO/giphy.gif" alt="Question" width="480" height="270"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Do you work with data?&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;If so...&lt;/em&gt;&lt;br&gt;
You need to learn the language of the snake...&lt;br&gt;
&lt;a href="https://i.giphy.com/media/ZyA2bpUQ4gVqg/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/ZyA2bpUQ4gVqg/giphy.gif" alt="Snake" width="500" height="221"&gt;&lt;/a&gt;&lt;br&gt;
No, not that snake language!... &lt;br&gt;
&lt;strong&gt;Python&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;After all...&lt;/em&gt;&lt;br&gt;
It is year of the Snake....&lt;br&gt;
&lt;a href="https://i.giphy.com/media/3oriOb1uHOt0MDrz1u/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3oriOb1uHOt0MDrz1u/giphy.gif" alt="Wrong" width="500" height="250"&gt;&lt;/a&gt;&lt;br&gt;
oh...&lt;br&gt;
I'm being informed that it is in fact NOT year of the snake...&lt;/p&gt;

&lt;p&gt;Oh well, the point still stands... you should learn Python.&lt;/p&gt;

&lt;p&gt;That means....Yes this is yet another learn python article. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Un48AUki--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgflip.com/3n8927.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Un48AUki--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://i.imgflip.com/3n8927.jpg" alt="Spongebob" width="596" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But if it wasn't good advice people wouldn't keep saying it...&lt;br&gt;
&lt;a href="https://i.giphy.com/media/d8OJ0GIgmFa7ZSTuSe/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/d8OJ0GIgmFa7ZSTuSe/giphy.gif" alt="Truth" width="245" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The truth is that it's fast becoming a staple skill here in the United Counties of Actionable Insights.&lt;/p&gt;

&lt;p&gt;See, it's not just for engineers and scientists, and this isn't another 'SQL is dead, learn this instead' piece; it's more that data analysis is becoming more than just where clauses and group bys. &lt;/p&gt;

&lt;p&gt;No longer are the oppressed forced to load data into tables and have to set the right timezone, no longer do the downtrodden need to rely on excel or tableau to visualize data NO LONGER DO THE WEAK NEED TO REMEMBER TO ADD A SEMICOLON AFTER EVERY STATEMENT AND A COMMA BETWEEN COLUMNS.....&lt;br&gt;
&lt;a href="https://i.giphy.com/media/l2YWAN8HNBlfyNd5u/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/l2YWAN8HNBlfyNd5u/giphy.gif" alt="speech" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Okay... here's why:&lt;/p&gt;
&lt;h1&gt;
  
  
  BI is evolving
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/u1k1kpDZSw5sA/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/u1k1kpDZSw5sA/giphy.gif" alt="evolving" width="320" height="240"&gt;&lt;/a&gt;&lt;br&gt;
BI is rapidly moving away from just using data that fits neatly into relational databases. &lt;/p&gt;

&lt;p&gt;Unstructured data is getting more common, and while some teams do a good job of smushing it into a format that is consumable by a SQL database, that's not always possible or necessary. Other solutions are required if we are to provide results back to the business quickly with minimal overhead. &lt;/p&gt;

&lt;p&gt;BI is also moving away from being just a suite of reports towards a suite of products: products that are part of a CI/CD pipeline and developed in languages other than SQL. While some of them will be built in Java or C, you can build a lot of things with Python.&lt;/p&gt;

&lt;p&gt;And while it may seem like Data Science is sprinting in the opposite direction from BI, it really won't be long until Data Science is seen as a part of BI. Expect every self-respecting Data Platform to have at least one AI/ML/NN model sitting alongside the other models in the platform. At the moment AI/ML is primarily written in Python, and there's no guarantee you'll be able to consume a Data Science model using SQL. &lt;/p&gt;

&lt;p&gt;And finally, BI has mostly been the process of looking at present and historic data and using those findings to guess at what you can do in the future. Data analysis is trending more and more towards making statistically sound predictive analysis, and SQL currently isn't good at that sort of thing; it reads tables, not algorithms. There's no guarantee SQL will be able to leverage future predictive analysis, let alone be the best tool for it. &lt;/p&gt;
&lt;h1&gt;
  
  
  Big Data is getting Bigger
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/l3V0ysgwJunVci98s/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/l3V0ysgwJunVci98s/giphy.gif" alt="Bigger" width="550" height="303"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Question&lt;/strong&gt;. What do people do now when they need to analyse data outside of a data warehouse? They use spreadsheets.&lt;/p&gt;

&lt;p&gt;However, the day of the spreadsheet is nearly over. Excel caps out at just over a million rows; in the world of big data, a million records could be thirty minutes of events. Excel is not the ad-hoc analysis tool of the future, Python is. Crunching a couple of million rows of data in Python using Pandas is stupid easy: you can load in as much data as your RAM can take without any overheads, and if you're crunching too much data Python allows you to batch process it or randomly sample it, all with a few lines of code.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/dQkcf8GANR0ps57oBH/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/dQkcf8GANR0ps57oBH/giphy.gif" alt="Simple" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  It can also help behind the scenes
&lt;/h1&gt;

&lt;p&gt;BI is as much about the back end as it is the front end. You can use Python as part of your ETL process, automate tasks, monitor platforms, or even build better capabilities.&lt;/p&gt;

&lt;p&gt;For example, Airflow is a data pipeline tool that is configured in Python; you can move data between systems using Airflow. &lt;/p&gt;

&lt;p&gt;In my team we've used Python to read our SQL code and produce test scripts (article incoming.)&lt;/p&gt;

&lt;p&gt;One of our scientists needed a data dump off of one of our internal systems, and our platform team didn't have spare resource to get that data through our traditional channels, so they used Python to ping the API and import the data directly (don't worry, it was above board). &lt;/p&gt;
&lt;h1&gt;
  
  
  It really is the tool of the future
&lt;/h1&gt;

&lt;p&gt;Python has been described as 'the second best coding language for everything' and it really does do so many things effortlessly. Setting up a local web server to host a web app takes only a handful of lines of code using Flask; we really are in the future.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/ZZkCo8zKWtt2ZgozfX/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/ZZkCo8zKWtt2ZgozfX/giphy.gif" alt="Future" width="480" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's because of the above reasons that Python should be the next thing you learn in your data career. It's going to offer you a more flexible and feature-rich way to analyse data and improve the way you work than any other tool in your existing arsenal. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cWQnWl6A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/sASfOiZ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cWQnWl6A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://imgur.com/sASfOiZ.png" alt="Example" width="880" height="880"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  So how / what do I learn
&lt;/h1&gt;

&lt;p&gt;Well, Python can feel overwhelming to learn because it can do anything; however, we'll just focus on analyzing data with Python.&lt;/p&gt;

&lt;p&gt;You do this using Pandas and Jupyter.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pandas
&lt;/h2&gt;

&lt;p&gt;Pandas is a library you import into Python, and it brings with it the functionality to hold data in virtual tables and analyse it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html"&gt;Pandas Home Page&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notable things you can do in Pandas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import data out of a CSV / API / Parquet / or the clipboard (love that one)&lt;/li&gt;
&lt;li&gt;Select, transform, join, group, aggregate just like SQL&lt;/li&gt;
&lt;li&gt;Describe - tell Pandas to look at a data set and explain it to you, and it will run away and tell you all sorts of useful information about your data set: mins, maxes, upper quartiles etc, the works!&lt;/li&gt;
&lt;li&gt;Pivot data (management love pivots)&lt;/li&gt;
&lt;li&gt;Graph your data (using Matplotlib)&lt;/li&gt;
&lt;/ul&gt;
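A few of those bullets in action, as a minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "region":  ["north", "north", "south", "south"],
    "product": ["apples", "pears", "apples", "pears"],
    "sales":   [100, 150, 80, 120],
})

# Select / group / aggregate, just like SQL
by_region = df.groupby("region")["sales"].sum()
print(by_region["north"])  # 250

# describe() dumps count, mean, mins, maxes, quartiles in one call
summary = df["sales"].describe()

# Pivot the data (management love pivots)
pivot = df.pivot_table(index="region", columns="product", values="sales")
print(pivot.loc["south", "pears"])  # 120.0
```
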
&lt;h2&gt;
  
  
  Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://jupyter.org"&gt;Jupyter&lt;/a&gt; is the software you should use. It takes the form of a living document and allows you to present text and code in a chronological format. &lt;/p&gt;

&lt;p&gt;Why is this important? Unlike SQL, Python won't show its results unless you ask it to, and traditional code environments will output Python results in a terminal. Jupyter is the best tool to learn on, as you can write code and execute it in blocks, and then as you learn you can grow your code block by block while still being able to see the earlier ones. &lt;/p&gt;
&lt;h2&gt;
  
  
  Tutorials
&lt;/h2&gt;

&lt;p&gt;So of course there are a million YouTube videos and interactive code camps out there for you to pick from.&lt;/p&gt;

&lt;p&gt;The video that best helped me was by this guy, Keith Galli; maybe that's because he seems genuinely interested in showing you Pandas and not in growing his brand....&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/vmEHCJofslg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;




&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AwvwyjJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EGujA1IUwAA9BiF.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ronsoak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ronsoak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Who Am I❓&lt;br&gt; |🇳🇿  |🇬🇧  |🏳️‍🌈  &lt;br&gt;📊 Senior Data Analyst for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt; &lt;br&gt;⚠️ My words do not represent my company.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/kJy8nsYdeq"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/zKVBza2SVQ"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;🎨 I draw (&lt;a href="https://t.co/JJ7lqS1iiy"&gt;instagram.com/ronsoak_art/&lt;/a&gt;)&lt;br&gt;🛰️ I ❤️ space! 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      03:11 AM - 13 Oct 2019
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>python</category>
      <category>sql</category>
      <category>database</category>
      <category>analyst</category>
    </item>
    <item>
      <title>The Good, The Bad, The Ugly: Microsoft's three significant impacts on the world of Data.</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Mon, 25 Nov 2019 11:19:42 +0000</pubDate>
      <link>https://dev.to/alexantra/the-good-the-bad-the-ugly-microsoft-s-three-significant-impacts-on-the-world-of-data-3dn5</link>
      <guid>https://dev.to/alexantra/the-good-the-bad-the-ugly-microsoft-s-three-significant-impacts-on-the-world-of-data-3dn5</guid>
      <description>&lt;p&gt;As someone who works in data, my relationship with Microsoft is mixed. &lt;/p&gt;

&lt;p&gt;It's like having a grumpy, mean father who has never missed a ballet recital and was there for you that one time you did the thing with the golf ball that your mother can never hear about. &lt;/p&gt;

&lt;p&gt;The tools Microsoft has provided to the field have been plentiful, and some have even gone on to become staples of the data community; however, some of these have been for better and some for worse. &lt;/p&gt;

&lt;h1&gt;
  
  
  The Good👍: Microsoft SQL Server + Supporting Cast.
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/MVyN3ZKHhFverXA8hv/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/MVyN3ZKHhFverXA8hv/giphy.gif" alt="Awards"&gt;&lt;/a&gt;&lt;br&gt;
SQL Server, SSMS, T-SQL, Analysis Services, Agent, are all industry standard tools for the data world.&lt;/p&gt;

&lt;p&gt;It's been in the &lt;a href="https://db-engines.com/en/ranking_trend" rel="noopener noreferrer"&gt;top three in terms of DB-ranking&lt;/a&gt; since 2013, and the truth is, if you have a production transactional database built anytime in the past 10 years, it's probably T-SQL. &lt;/p&gt;

&lt;p&gt;And while it looks to be slowly losing ground to other products and new ways of crunching data, it's definitely not going anywhere; new versions and more modern variants are available, and those in the Azure cloud will find several reasons to use the latest SQL Server. &lt;/p&gt;
&lt;h1&gt;
  
  
  The Bad👎: Microsoft Excel
&lt;/h1&gt;

&lt;p&gt;For many of us, Microsoft Excel is our first exposure to analysis. It taught us some fundamental stuff like tables, vlookups, pivot tables, graphs, and maybe a touch of VBA.&lt;/p&gt;

&lt;p&gt;However, Excel is the awesome party guest who has overstayed their welcome. Cool at the beginning, but now they really need to leave.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/8ip0NDz8dtdzG/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/8ip0NDz8dtdzG/giphy.gif" alt="Bad party"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't me saying Excel has no place in 2019, not at all. &lt;/p&gt;

&lt;p&gt;My problem with Excel is that many people and companies never move on to other tools. They double down on making Excel perform tasks it was never built to do, choosing to learn and build suites of VBA plugins and hacky workarounds to do things other analytical tools can natively do in seconds.&lt;/p&gt;

&lt;p&gt;It's a case of the training wheels that never came off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/VnaMMuJt5jbmE/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/VnaMMuJt5jbmE/giphy.gif" alt="Training Wheels"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Too many analysts are being robbed of the opportunity to take those training wheels off, and the blame lies in two distinct camps.&lt;/p&gt;

&lt;p&gt;Camp number one is the customer: customers who are themselves only familiar with Excel, because it's come pre-installed on just about every computer they have ever touched. They also don't trust insights whose raw data they can't personally inspect. In this age of big data, we are still hearing 'Can I see it in Excel?' at an alarming frequency. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgflip.com%2F3hdvce.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgflip.com%2F3hdvce.jpg" alt="Excel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Camp number two is that Excel is a desert island. Analysts are stranded on this island, unsure which island they should travel to next. The islands of Python, R, and SQL are too far off in the distance, but closer are all the 'Excel Plugin' islands promising fresh water and tasty mangos. It's very easy to see how some very good analysts never make it to Python island and instead spend the rest of their lives searching Stack Overflow for answers to VBA questions. &lt;/p&gt;
&lt;h1&gt;
  
  
  The Ugly🤮: Microsoft Access
&lt;/h1&gt;

&lt;p&gt;Microsoft Access is the child who tried to help but instead ruined everything. It's not the child's fault, their heart was in the right place; however, the results are disastrous. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/5yaCPstUOV9Kw/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/5yaCPstUOV9Kw/giphy.gif" alt="Silly child"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that is what makes Microsoft Access the Ugly. It always ends in disaster. &lt;/p&gt;

&lt;p&gt;Billed as "Baby's first Database", it's easy to pick up and administer. &lt;/p&gt;

&lt;p&gt;However, nine times out of ten, what ends up happening is that the Access database quickly becomes a business-critical piece of software being run off a random employee's local machine. &lt;/p&gt;

&lt;p&gt;This quickly becomes a major security and data risk, often requiring the Data Warehouse team to hastily consume the Access database to negate the risks it poses and to keep at bay the fancy executives who have begun to rely on it. &lt;/p&gt;

&lt;p&gt;It's often like fighting a hydra: for every one Access database that is dealt with, two more pop up randomly across the business. It is for that exact reason that many companies running the Microsoft suite do not allow Access to be installed at all. &lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1183218570164965376-883" src="https://platform.twitter.com/embed/Tweet.html?id=1183218570164965376"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>sql</category>
      <category>data</category>
      <category>microsoft</category>
      <category>excel</category>
    </item>
    <item>
      <title>The R.A.G (Redshift Analyst Guide): Troubleshooting process</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Wed, 20 Nov 2019 11:17:45 +0000</pubDate>
      <link>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-troubleshooting-process-339i</link>
      <guid>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-troubleshooting-process-339i</guid>
      <description>&lt;p&gt;&lt;em&gt;Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analyst's out there in the world who use it.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Previously on the R.A.G....
&lt;/h3&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;&lt;a href="https://i.giphy.com/media/uKcWTUA3EHGLK/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/uKcWTUA3EHGLK/giphy.gif" alt="Stress" width="480" height="209"&gt;&lt;/a&gt;&lt;br&gt;
Are results slow?&lt;/p&gt;

&lt;p&gt;Is WLM killing your query?&lt;/p&gt;

&lt;p&gt;Do tables seem to not play ball? &lt;/p&gt;

&lt;p&gt;There is a LOT to consider when trying to solve the above. &lt;/p&gt;

&lt;p&gt;Here's the process for you to work through:&lt;/p&gt;
&lt;h2&gt;
  
  
  ✅ Check the Explain Plan.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/jJA5fhjTuIOqc/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/jJA5fhjTuIOqc/giphy.gif" alt="Plan" width="480" height="270"&gt;&lt;/a&gt;&lt;br&gt;
Do this for a single query at a time, not your whole script. Start by getting Redshift to tell you how it's going to execute your query, then look for actions with high costs, sequential scans, or nested loops. If you can avoid them, or break your query into smaller tasks, it will help you a lot.&lt;/p&gt;
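Getting the plan is just a one-keyword prefix; as a sketch (the table and filter below are placeholders):

```sql
-- Prefix the query with EXPLAIN; Redshift returns the plan without running it.
explain
select   customer_id, count(*)
from     my_schema.orders            -- placeholder table
where    order_date >= '2019-01-01'
group by customer_id;

-- In the output, watch for 'Seq Scan' over very large tables, 'Nested Loop'
-- joins, and redistribution steps like DS_BCAST_INNER or DS_DIST_BOTH,
-- which usually flag the expensive parts of the plan.
```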


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  ✅ Understand the Distribution and Sorting of the tables you are dealing with
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/2kNyLY42v4rxjBoDJW/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/2kNyLY42v4rxjBoDJW/giphy.gif" alt="Understand" width="480" height="267"&gt;&lt;/a&gt;&lt;br&gt;
Whether the tables you are dealing with are built by you or someone else, their configuration could be working against you. &lt;/p&gt;

&lt;p&gt;Run the below query and make note of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distyle_and_key: How is the table distributed across nodes?&lt;/li&gt;
&lt;li&gt;row_skew_ratio: This is the effectiveness of the dist key, the closer to 1 the better.&lt;/li&gt;
&lt;li&gt;first_sortkey: How is it sorted on the node?&lt;/li&gt;
&lt;li&gt;no_sort_keys: How many sort keys?&lt;/li&gt;
&lt;li&gt;sortkey_skew_ratio: This is the effectiveness of the sort key, the closer to 1 the better.&lt;/li&gt;
&lt;li&gt;percent_unsorted: How much of the table has gone unsorted since the last vacuum&lt;/li&gt;
&lt;li&gt;stats_needed: If yes, the table needs ANALYZE run against it before the leader node knows how to plan for it properly.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;  &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;schema&lt;/span&gt;        &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;schema_location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table&lt;/span&gt;         &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoded&lt;/span&gt;       &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;are_columns_encoded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;diststyle&lt;/span&gt;     &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;distyle_and_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sortkey1&lt;/span&gt;      &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;first_sortkey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sortkey1_enc&lt;/span&gt;  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sortkey_compression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sortkey_num&lt;/span&gt;   &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;no_sort_keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skew_sortkey1&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sortkey_skew_ratio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;size&lt;/span&gt;          &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;size_in_blocks_mb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tbl_rows&lt;/span&gt;      &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skew_rows&lt;/span&gt;     &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;row_skew_ratio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pct_used&lt;/span&gt;      &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;percent_space_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unsorted&lt;/span&gt;      &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;percent_unsorted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stats_off&lt;/span&gt;     &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stats_needed&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;    &lt;span class="n"&gt;svv_table_info&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;   &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'table_name'&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt;   &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  ✅ Run your query - read the error logs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/mq5y2jHRCAqMo/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/mq5y2jHRCAqMo/giphy.gif" alt="Error" width="480" height="480"&gt;&lt;/a&gt;&lt;br&gt;
If you can run your query and it's not being killed by the WLM or crashing, then check the Redshift alert log for hints on how to make it run faster.&lt;/p&gt;

&lt;p&gt;The log contains an EVENT and a SOLUTION column, which may provide some key information on how to make your query run faster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;      &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;solution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;querytxt&lt;/span&gt;

&lt;span class="k"&gt;from&lt;/span&gt;        &lt;span class="n"&gt;stl_alert_event_log&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;

&lt;span class="k"&gt;join&lt;/span&gt;        &lt;span class="n"&gt;stl_query&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt;          &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;

&lt;span class="k"&gt;where&lt;/span&gt;       &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;usesysid&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_user&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;usename&lt;/span&gt; &lt;span class="k"&gt;ilike&lt;/span&gt; &lt;span class="s1"&gt;'%name%'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;--change to your name&lt;/span&gt;

&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;    &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;

&lt;span class="k"&gt;limit&lt;/span&gt;       &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  ✅ Are you fighting for resources?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/Ov5NiLVXT8JEc/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Ov5NiLVXT8JEc/giphy.gif" alt="Fighting" width="500" height="281"&gt;&lt;/a&gt;&lt;br&gt;
In some scenarios your query may be slow because of a lack of resources, or because it had to wait until a slot opened up. Run the below query to see how your queries are being handled.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;      &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;process_queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slot_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;datediff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue_start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue_end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;q_wait_time_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exec_start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exec_end_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;datediff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exec_start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exec_end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exec_time_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;est_peak_mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;querytxt&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;        &lt;span class="n"&gt;stl_wlm_query&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt;        &lt;span class="n"&gt;stv_wlm_service_class_config&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt;          &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service_class&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt;        &lt;span class="n"&gt;stl_query&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt;          &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;       &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userid&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;usesysid&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_user&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;usename&lt;/span&gt; &lt;span class="k"&gt;ilike&lt;/span&gt; &lt;span class="s1"&gt;'%user%'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;--change to your name&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt;  &lt;span class="k"&gt;by&lt;/span&gt;   &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xid&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  ✅ Understand best practice
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.giphy.com/media/mD5B5h1bPp7sy26YW8/giphy.gif"&gt;Best practice&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;






&lt;p&gt;&lt;em&gt;header image drawn by me&lt;/em&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AwvwyjJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EGujA1IUwAA9BiF.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ronsoak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ronsoak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Who Am I❓&lt;br&gt; |🇳🇿  |🇬🇧  |🏳️‍🌈  &lt;br&gt;📊 Senior Data Analyst for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt; &lt;br&gt;⚠️ My words do not represent my company.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/kJy8nsYdeq"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/zKVBza2SVQ"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;🎨 I draw (&lt;a href="https://t.co/JJ7lqS1iiy"&gt;instagram.com/ronsoak_art/&lt;/a&gt;)&lt;br&gt;🛰️ I ❤️ space! 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      03:11 AM - 13 Oct 2019
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;h4&gt;
  
  
  You should read....
&lt;/h4&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>data</category>
      <category>analytics</category>
      <category>sql</category>
      <category>aws</category>
    </item>
    <item>
      <title>The R.A.G (Redshift Analyst Guide): Things to avoid / Best Practice</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Wed, 20 Nov 2019 11:16:38 +0000</pubDate>
      <link>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-things-to-avoid-best-practice-1jbl</link>
      <guid>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-things-to-avoid-best-practice-1jbl</guid>
      <description>&lt;p&gt;&lt;em&gt;Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analysts out there in the world who use it.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Previously on the R.A.G....
&lt;/h3&gt;


&lt;div class="ltag__link"&gt;
  &lt;div class="ltag__link__content"&gt;
    &lt;div class="missing"&gt;
      &lt;h2&gt;Article No Longer Available&lt;/h2&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;Below is a collection of do's, don'ts, and 'be wary of's that we have come across in our travels. &lt;/p&gt;

&lt;p&gt;None of these rules are hard and fast; sometimes you have no choice but to do all of the don'ts. &lt;/p&gt;




&lt;h1&gt;
  
  
  Things to Avoid
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/IdmfEtnMWPzOg/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/IdmfEtnMWPzOg/giphy.gif" alt="Avoid" width="311" height="177"&gt;&lt;/a&gt;&lt;br&gt;
A series of things NOT to do, mainly sourced from official documentation.&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't Compress your first sort key
&lt;/h2&gt;

&lt;p&gt;Compressing your first compound sort key runs the risk of actually making your query run slower in some scenarios. &lt;/p&gt;

&lt;p&gt;From AWS: &lt;em&gt;"If sort key columns are compressed much more highly than other columns in the same query, range-restricted scans might perform poorly."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As a general rule of thumb, leave your first sort key uncompressed (encode it RAW). Feel free to compress the other sort keys.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;     &lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;  &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;createddate&lt;/span&gt;     &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt; &lt;span class="n"&gt;RAW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_type&lt;/span&gt;   &lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;  &lt;span class="n"&gt;ZSTD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount_due&lt;/span&gt;      &lt;span class="nb"&gt;decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt; &lt;span class="n"&gt;AZ64&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;diststyle&lt;/span&gt;   &lt;span class="k"&gt;key&lt;/span&gt;
&lt;span class="n"&gt;distkey&lt;/span&gt;     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sortkey&lt;/span&gt;     &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;createddate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html"&gt;AWS Doco&lt;/a&gt;&lt;br&gt;
Source:&lt;a href="https://github.com/awslabs/amazon-redshift-utils/blob/master/src/Investigations/EarlyMaterialization.md"&gt;Sort Key Investigation&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't select * unless it's a small query
&lt;/h2&gt;

&lt;p&gt;Redshift has a dedicated resource stream for handling small queries, so this rule doesn't apply if you just want to do a quick select * from table limit 50, as your query will be given its own resources.&lt;/p&gt;

&lt;p&gt;HOWEVER, for everything else you should never use select * unless you absolutely NEED every column. Redshift is columnar, so the fewer columns you pull in, the faster it works.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't assign a column with NULL values as a Dist key
&lt;/h2&gt;

&lt;p&gt;If the column you set as your dist key has a lot of NULL values, then all the NULLs will end up on one slice, potentially causing bad skew.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html?tag=duckduckgo-d-20"&gt;AWS Doco&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't join the same table multiple times in the same query
&lt;/h2&gt;

&lt;p&gt;Referencing the same table multiple times in a query can come at a high performance cost. Explore other options, like breaking the query down into smaller datasets or using a CASE expression.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use sub-queries for large complex operations
&lt;/h2&gt;

&lt;p&gt;Avoid using sub-queries on data sets that are large and have multiple conditions. Sub-queries outperform joins only when they reduce to a simple IN clause.&lt;/p&gt;

&lt;p&gt;A sub-query works well when one data set is huge and the other is small: joining them would force data to be redistributed across nodes, while a simple IN sub-query avoids that shuffle.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use highly unique columns as a Dist Key
&lt;/h2&gt;

&lt;p&gt;For example, using a timestamp for a dist key would be bad.&lt;br&gt;
If we used a timestamp as our dist key, that could lead to 86,400 unique dist key values PER day (one per second). This would vastly reduce the benefits of having a dist key.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html?tag=duckduckgo-d-20"&gt;AWS Doco&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use UNION, instead use UNION ALL
&lt;/h2&gt;

&lt;p&gt;When you use UNION, Redshift tries to remove any duplicate rows, so depending on the size of your data the performance overhead could be huge. Use UNION ALL instead, and if you need to remove duplicate rows, look at other methods to do so, like a row_number and delete statement.&lt;/p&gt;

&lt;p&gt;UNION is believed to perform ~150% worse than UNION ALL.&lt;br&gt;
Source:&lt;a href="https://gist.github.com/slpsys/5e43d8237fd8aa924015"&gt;Investigation on Github&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use an Interleaved Sort key unless you 100% know what you're doing
&lt;/h2&gt;

&lt;p&gt;Interleaved sort keys are complicated; only use them if you know what you're doing. By default, use compound sort keys.&lt;/p&gt;

&lt;p&gt;Implementing an interleaved sort key incorrectly can result in very poor result return times and long write / update / vacuum times.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html"&gt;AWS Doco&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use Dist Style All on very large or small tables.
&lt;/h2&gt;

&lt;p&gt;About 500k-1m rows is the sweet spot for Dist Style ALL; remember the table gets copied to every node.&lt;/p&gt;
&lt;h2&gt;
  
  
  ❌ Don't use LZO, when you can use ZSTD or AZ64
&lt;/h2&gt;

&lt;p&gt;LZO's best-of-all-worlds compression has been superseded by ZSTD and AZ64, which do a better job. Use AZ64 on your numeric columns and ZSTD on the rest.&lt;/p&gt;


&lt;h1&gt;
  
  
  Things to Be Wary Of
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/d3mlE7uhX8KFgEmY/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/d3mlE7uhX8KFgEmY/giphy.gif" alt="Thinking" width="480" height="264"&gt;&lt;/a&gt;&lt;br&gt;
A series of things to be aware of, most of them have come from my own experience.&lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 Redshift Inserts columns in the order you specify - even if they don't line up
&lt;/h2&gt;

&lt;p&gt;If you build a table with the columns in this order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer_id&lt;/li&gt;
&lt;li&gt;Createddate
&lt;/li&gt;
&lt;li&gt;Customer_type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But your insert statement is in this order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer_id
&lt;/li&gt;
&lt;li&gt;Customer_type
&lt;/li&gt;
&lt;li&gt;Createddate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redshift will attempt to insert the data in that order, so it will try to put customer_type data into the createddate column. If the data types don't match, Redshift will throw an error; if they do match, you won't notice.&lt;/p&gt;
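&lt;p&gt;Naming the columns in the insert statement protects you from this (staging_test here is a made-up source table):&lt;/p&gt;

```sql
-- An explicit column list makes the mapping unambiguous, so a
-- re-ordered select list can't silently land in the wrong column.
insert into test (customer_id, customer_type, createddate)
select customer_id,
       customer_type,
       createddate
from   staging_test;
```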
&lt;h2&gt;
  
  
  🤔 Comparison Operators &amp;gt; LIKE &amp;gt; Similar to &amp;gt; REGEX/POSIX
&lt;/h2&gt;

&lt;p&gt;Applying Logic to your dataset comes at a cost in terms of performance.&lt;/p&gt;

&lt;p&gt;Comparison operators such as &amp;lt; &amp;gt; = perform better than LIKE, which in turn performs better than SIMILAR TO.&lt;/p&gt;

&lt;p&gt;And anything and everything performs better than REGEX / POSIX &lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 Conditional Logic on Joins performs the worst
&lt;/h2&gt;

&lt;p&gt;When we join tables on conditional logic, e.g. join table on blah &amp;gt; blah, or join table on blah between bleh and bloh, Redshift has no choice but to do a nested loop: every SINGLE row in table a has to be checked against every row in table b, which can carry a massive amount of overhead.&lt;/p&gt;
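&lt;p&gt;A sketch of the pattern and one way to soften it, with made-up table names. Adding an equality (here a hypothetical pre-computed day_key) lets Redshift hash-join first and apply the range afterwards:&lt;/p&gt;

```sql
-- A range condition in the ON clause forces a nested loop:
select a.id
from   table_a a
join   table_b b
on     a.event_time between b.start_time and b.end_time;

-- Where possible, join on an equality and keep the range
-- logic in the WHERE clause instead:
select a.id
from   table_a a
join   table_b b
on     a.day_key = b.day_key
where  a.event_time between b.start_time and b.end_time;
```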
&lt;h2&gt;
  
  
  🤔 All Functions will come at a cost
&lt;/h2&gt;

&lt;p&gt;Using functions can slow down performance. For example, where invoicedate &amp;gt;= date_trunc('month', sysdate) has a higher performance cost than where invoicedate &amp;gt;= '2019-10-01'.&lt;/p&gt;

&lt;p&gt;This will forever be a balancing act, of course. Over time you will learn which functions cost more.&lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 Some data types have a base size you can't reduce.
&lt;/h2&gt;

&lt;p&gt;For example, varchar(4) takes up 8 bytes of space even though you specified 4.&lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 Queries against system tables can be slow
&lt;/h2&gt;

&lt;p&gt;Querying the system tables for logs or usage history can be slow because that work is handled by the leader node alone; sadly, those queries can't be distributed out to the compute nodes.&lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 CASE When statements take the first match
&lt;/h2&gt;

&lt;p&gt;Case when statements will take the first 'when' statement that is true, even if there are multiple that are true.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;case_when_experiment&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;fruit&lt;/span&gt;   &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;case_when_experiment&lt;/span&gt;
&lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'apple'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;fruit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;case&lt;/span&gt; 
       &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;fruit&lt;/span&gt; &lt;span class="k"&gt;ilike&lt;/span&gt; &lt;span class="s1"&gt;'a%'&lt;/span&gt;  &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'apple first match'&lt;/span&gt;
       &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;fruit&lt;/span&gt; &lt;span class="k"&gt;ilike&lt;/span&gt; &lt;span class="s1"&gt;'%e'&lt;/span&gt;  &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'apple second match'&lt;/span&gt;
       &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;fruit&lt;/span&gt; &lt;span class="k"&gt;ilike&lt;/span&gt; &lt;span class="s1"&gt;'_p%'&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'apple third match'&lt;/span&gt;
       &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s1"&gt;'no match'&lt;/span&gt;
       &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;case_when_experiment&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The result of the select statement is:&lt;br&gt;
Fruit: apple&lt;br&gt;
Case: apple first match&lt;br&gt;
...even though all three of the 'when' conditions are true. &lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 Avoid Delete if you can drop/truncate. Take note of what UPDATE does.
&lt;/h2&gt;

&lt;p&gt;Delete doesn't delete the row. It hides it and marks it to be deleted when you next vacuum.&lt;/p&gt;

&lt;p&gt;Likewise, UPDATE just hides the old row and replaces it with a new one; the next vacuum will remove the hidden row.&lt;/p&gt;

&lt;p&gt;If you can, drop table or truncate. A table that is constantly deleted from without a vacuum will increase in size. &lt;/p&gt;
&lt;h2&gt;
  
  
  🤔 MD5 works faster than Func_sha1
&lt;/h2&gt;

&lt;p&gt;MD5 hashing has faster reads and writes due to producing a 128-bit digest; however, it's not as secure as FUNC_SHA1. Use MD5 when you want to obscure something without a security implication, and FUNC_SHA1 when security matters.&lt;/p&gt;
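&lt;p&gt;A quick sketch of both, obscuring a hypothetical email column:&lt;/p&gt;

```sql
-- md5() for cheap obscuring, func_sha1() when security matters.
select md5(email)       as obscured_email,
       func_sha1(email) as secure_email_hash
from   customers;
```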
&lt;h2&gt;
  
  
  🤔 Be aware of the SQL execution order.
&lt;/h2&gt;

&lt;p&gt;SQL is executed in the following order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FROM and JOIN are loaded into memory&lt;/li&gt;
&lt;li&gt;Filters in the WHERE &amp;amp; JOIN's are applied&lt;/li&gt;
&lt;li&gt;GROUP BY aggregates the data&lt;/li&gt;
&lt;li&gt;Having applies logic to the aggregates&lt;/li&gt;
&lt;li&gt;Select brings in the columns needed&lt;/li&gt;
&lt;li&gt;Order by sorts the final product&lt;/li&gt;
&lt;li&gt;Limit then limits the final data set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So LIMIT 10 will not just crunch 10 rows; it will crunch them all and show you 10.&lt;/p&gt;
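&lt;p&gt;The order above, annotated on a simple query (table and column names are made up):&lt;/p&gt;

```sql
select customer_type,              -- 5. select pulls the needed columns
       count(1) as customers
from   customers                   -- 1. from / join load the data
where  signup_date = '2019-10-01'  -- 2. where filters rows
group  by customer_type            -- 3. group by aggregates
having count(1) = 1                -- 4. having filters the aggregates
order  by customer_type            -- 6. order by sorts the final product
limit  10;                         -- 7. limit trims the final data set
```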


&lt;h1&gt;
  
  
  Things to Do
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/Ph05xuYgrX5te/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/Ph05xuYgrX5te/giphy.gif" alt="Do this" width="500" height="200"&gt;&lt;/a&gt;&lt;br&gt;
The opposite of the things to avoid list :) &lt;/p&gt;
&lt;h2&gt;
  
  
  ✔️ Use CHAR over VARCHAR if you have an exact length field.
&lt;/h2&gt;

&lt;p&gt;For example, if we have a column for ORGIDs which are ALWAYS 36 characters long, Redshift will perform better if you set it to CHAR(36) over VARCHAR(36).&lt;br&gt;
Source:&lt;a href="http://dwgeek.com/amazon-redshift-data-types-best-practices.html/"&gt;Best Practices by DWGeeks&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ✔️ Use numeric / boolean logic where possible
&lt;/h2&gt;

&lt;p&gt;The cost to apply logic to numbers and booleans is much lower than for strings. &lt;/p&gt;

&lt;p&gt;This is why OLTP systems use numbers to represent things, and why we use keys in datamarts (dimmarketkey): where dimmarketkey = 4 performs better than where dimmarketkey = 'Australia'.&lt;/p&gt;
&lt;h2&gt;
  
  
  ✔️ Repeating the same filters on multiple tables still helps.
&lt;/h2&gt;

&lt;p&gt;If two tables you are joining have the same filters, specify them on both, even if it feels redundant. See the example below from Amazon.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- BAD EXAMPLE&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sellerid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qtysold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;salesid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listid&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listtime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2008-12-01'&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- GOOD EXAMPLE&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sellerid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qtysold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;salesid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listid&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listtime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2008-12-01'&lt;/span&gt;
&lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saletime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2008-12-01'&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ✔️ Match Sort Keys to Group By clauses - but don't skip them.
&lt;/h2&gt;

&lt;p&gt;If you build a table to later group it by cust_type, cust_city, and cust_plan, consider sorting the base table by those same columns, in that order.&lt;/p&gt;

&lt;p&gt;However, if you instead grouped by cust_type and cust_plan (the first and third sort keys), skipping the second, you would lose the benefit of the sort key.&lt;br&gt;
Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  ✔️ Match Order by and Group by clauses if possible.
&lt;/h2&gt;

&lt;p&gt;Do: &lt;/p&gt;

&lt;p&gt;group by a, b, c&lt;/p&gt;

&lt;p&gt;order by a, b, c&lt;/p&gt;

&lt;p&gt;Don't:&lt;/p&gt;

&lt;p&gt;group by b, c, a&lt;/p&gt;

&lt;p&gt;order by a, b, c&lt;/p&gt;

&lt;p&gt;Source:&lt;a href="https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html"&gt;AWS Best Practice&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;&lt;em&gt;header image drawn by me&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AwvwyjJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EGujA1IUwAA9BiF.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--nC8rVxhG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1120951641866326017/tzzd0z0s_normal.jpg" alt="Ronsoak 🏳️‍🌈🇳🇿 profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Ronsoak 🏳️‍🌈🇳🇿
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @ronsoak
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Who Am I❓&lt;br&gt; |🇳🇿  |🇬🇧  |🏳️‍🌈  &lt;br&gt;📊 Senior Data Analyst for &lt;a href="https://twitter.com/Xero"&gt;@Xero&lt;/a&gt; &lt;br&gt;⚠️ My words do not represent my company.&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/ThePracticalDev"&gt;@ThePracticalDev&lt;/a&gt; (&lt;a href="https://t.co/kJy8nsYdeq"&gt;dev.to/ronsoak&lt;/a&gt;)&lt;br&gt;✍️ Writer on &lt;a href="https://twitter.com/Medium"&gt;@Medium&lt;/a&gt; (&lt;a href="https://t.co/zKVBza2SVQ"&gt;medium.com/@ronsoak&lt;/a&gt;)&lt;br&gt;🎨 I draw (&lt;a href="https://t.co/JJ7lqS1iiy"&gt;instagram.com/ronsoak_art/&lt;/a&gt;)&lt;br&gt;🛰️ I ❤️ space! 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      03:11 AM - 13 Oct 2019
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1183218570164965376" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;




</description>
      <category>data</category>
      <category>analytics</category>
      <category>sql</category>
      <category>aws</category>
    </item>
    <item>
      <title>The R.A.G (Redshift Analyst Guide): Understanding the Query Plan (Explain)</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Wed, 20 Nov 2019 11:15:18 +0000</pubDate>
      <link>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-understanding-the-query-plan-explain-360d</link>
      <guid>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-understanding-the-query-plan-explain-360d</guid>
      <description>&lt;p&gt;&lt;em&gt;Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analysts out there in the world who use it.&lt;/em&gt;&lt;/p&gt;









&lt;p&gt;Redshift can explain how it's going to interpret the query you are about to run: it estimates how expensive the query will be, how much data it will crunch, and how much data it will have to move around to get the job done.&lt;/p&gt;

&lt;p&gt;This explanation (the Query Plan) helps you understand the cost your query will place on Redshift, which in turn can point you towards ways to improve it.&lt;/p&gt;
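Generating a plan is as simple as prefixing your query with EXPLAIN. A hedged sketch, using tables from AWS's sample TICKIT schema (the join below is deliberately simplified and does not reproduce the full plan shown further down):

```sql
-- Prefix a query with EXPLAIN to get its plan without executing it.
-- Tables are from AWS's sample TICKIT schema; this query is a
-- simplified illustration, not the exact query behind the plan below.
EXPLAIN
SELECT   catgroup,
         SUM(qtysold) AS total_sold
FROM     sales
JOIN     event    ON sales.eventid = event.eventid
JOIN     category ON event.catid   = category.catid
GROUP BY catgroup;
```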

&lt;h1&gt;
  
  
  Reading the Query Plan
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/NFA61GS9qKZ68/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/NFA61GS9qKZ68/giphy.gif" alt="Reading"&gt;&lt;/a&gt;&lt;br&gt;
Example Query plan from AWS&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QUERY PLAN
XN Merge  (cost=1015345167117.54..1015345167544.46 rows=1000 width=103)
  Merge Key: category.catname, sum(sales.pricepaid)
  -&amp;gt;  XN Network  (cost=1015345167117.54..1015345167544.46 rows=170771 width=103)
        Send to leader
        -&amp;gt;  XN Sort  (cost=1015345167117.54..1015345167544.46 rows=170771 width=103)
              Sort Key: category.catname, sum(sales.pricepaid)
              -&amp;gt;  XN HashAggregate  (cost=15345150568.37..15345152276.08 rows=170771 width=103)
                    Filter: (sum(pricepaid) &amp;gt; 9999.00)
                        -&amp;gt;  XN Hash Join DS_BCAST_INNER  (cost=742.08..15345146299.10 rows=170771 width=103)
                              Hash Cond: ("outer".catid = "inner".catid)
                              -&amp;gt;  XN Hash Join DS_BCAST_INNER  (cost=741.94..15342942456.61 rows=170771 width=97)
                                    Hash Cond: ("outer".dateid = "inner".dateid)
                                    -&amp;gt;  XN Hash Join DS_BCAST_INNER  (cost=737.38..15269938609.81 rows=170766 width=90)
                                          Hash Cond: ("outer".buyerid = "inner".userid)
                                          -&amp;gt;  XN Hash Join DS_BCAST_INNER  (cost=112.50..3272334142.59 rows=170771 width=84)
                                                Hash Cond: ("outer".venueid = "inner".venueid)
                                                -&amp;gt;  XN Hash Join DS_BCAST_INNER  (cost=109.98..3167290276.71 rows=172456 width=47)
                                                      Hash Cond: ("outer".eventid = "inner".eventid)
                                                      -&amp;gt;  XN Merge Join DS_DIST_NONE  (cost=0.00..6286.47 rows=172456 width=30)
                                                            Merge Cond: ("outer".listid = "inner".listid)
                                                            -&amp;gt;  XN Seq Scan on listing  (cost=0.00..1924.97 rows=192497 width=14)
                                                            -&amp;gt;  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=24)
                                                      -&amp;gt;  XN Hash  (cost=87.98..87.98 rows=8798 width=25)
                                                            -&amp;gt;  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=25)
                                                -&amp;gt;  XN Hash  (cost=2.02..2.02 rows=202 width=41)
                                                      -&amp;gt;  XN Seq Scan on venue  (cost=0.00..2.02 rows=202 width=41)
                                          -&amp;gt;  XN Hash  (cost=499.90..499.90 rows=49990 width=14)
                                                -&amp;gt;  XN Seq Scan on users  (cost=0.00..499.90 rows=49990 width=14)
                                    -&amp;gt;  XN Hash  (cost=3.65..3.65 rows=365 width=11)
                                          -&amp;gt;  XN Seq Scan on date  (cost=0.00..3.65 rows=365 width=11)
                              -&amp;gt;  XN Hash  (cost=0.11..0.11 rows=11 width=10)
                                    -&amp;gt;  XN Seq Scan on category  (cost=0.00..0.11 rows=11 width=10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/jt38YxwGTevEkFWWoY/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/jt38YxwGTevEkFWWoY/giphy.gif" alt="Cost"&gt;&lt;/a&gt;&lt;br&gt;
As you can see, the word cost appears everywhere in the explain plan. Cost isn't an exact measure of how long the query will run end to end; it's a relative measure of the work required to execute each part of the query. &lt;/p&gt;

&lt;p&gt;So what does '1015345167117.54..1015345167544.46 rows=1000 width=103' actually tell you?&lt;/p&gt;

&lt;p&gt;Well let's start with the REALLY long number, which is actually two numbers separated by two dots. So in this example it's 1015345167117.54 &amp;amp; 1015345167544.46. The first number is the cost to return just the FIRST row of the operation, and the second is the cost to complete the whole operation. The costs are cumulative as you read from bottom to top, so the top value is the total cost including everything below it.&lt;/p&gt;

&lt;p&gt;Let's break down some theoretical cost scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First cost is low, second is high:&lt;/strong&gt; It's easy to get to the first row but hard to complete the whole thing. This may just indicate a huge data set it needs to slog through, which isn't necessarily a bad thing; think of applying some basic logic to a very large table. It could also mean some of your logic has an unreasonably high cost for what you want it to do; maybe you've saved a date field in a VARCHAR and Redshift is doing unnecessary conversion in the background.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First and second cost are low:&lt;/strong&gt; This is a good thing, don't change a thing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First and second cost are high (but different):&lt;/strong&gt; This can indicate a bad join, where two large tables are joined to each other and/or heavy logic is applied to both, or that you are packing far too many actions into a single query. Remember to keep code simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First cost is high, second is about equal:&lt;/strong&gt; This can indicate an overly complex query where it takes a lot of processing just to get the first row, but once it has that it doesn't take much longer to complete the task. Think of searching a very large table for exactly three things without adding any other criteria.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Second is the rows part of the cost. This is the estimated number of rows the operation will return, and it relies heavily on table metadata: tables that haven't been analysed in a while will feed stale statistics into the query plan, which can lead to a bad plan being executed. Row count is the closest thing you have to a lever for improving the query, as you can often add extra logic to bring that row count down.&lt;/p&gt;

&lt;p&gt;Last is the width. This is the size, in bytes, of the columns being returned, which is why it's recommended that you only bring back the columns you need. The only way to reduce it is to select fewer columns. &lt;/p&gt;
&lt;h2&gt;
  
  
  Sequential Scan
&lt;/h2&gt;

&lt;p&gt;Where you see this, it means Redshift will scan the entire object (table, CTE, sub-query): every row and every column, checking for the criteria you have specified. This is why it's important to keep the tables you work with as small as possible, in both rows and columns, to speed up query time. You will always need to hit a source table with a sequential scan at least once, and that is exactly where you should take advantage of the table's dist key and sort key, as those are the only ways to hit the table as fast as possible.&lt;/p&gt;
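For example, if a table is sorted on a date column, a range predicate on that column lets the sequential scan skip whole blocks via their zone maps. A sketch, assuming a hypothetical events_log table with SORTKEY (created_at):

```sql
-- events_log is hypothetical, defined with SORTKEY (created_at).
-- The range predicate lets the sequential scan skip any block whose
-- zone map (min/max of created_at) falls entirely outside the range.
SELECT event_id, created_at
FROM   events_log
WHERE  created_at BETWEEN '2019-01-01' AND '2019-01-31';
```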
&lt;h2&gt;
  
  
  Inner and Outer
&lt;/h2&gt;

&lt;p&gt;The EXPLAIN output also references inner and outer tables. The inner table is scanned first, and appears nearer the bottom of the query plan. The inner table is the table that is probed for matches. It is usually held in memory, is usually the source table for hashing, and if possible, is the smaller table of the two being joined. The outer table is the source of rows to match against the inner table. It is usually read from disk. The query optimizer chooses the inner and outer table based on database statistics from the latest run of the ANALYZE command. The order of tables in the FROM clause of a query doesn't determine which table is inner and which is outer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Join Types
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/l41YouCUUcreUabHW/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/l41YouCUUcreUabHW/giphy.gif" alt="Join"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Merge Join
&lt;/h3&gt;

&lt;p&gt;In a merge join both tables are perfect for each other: the join columns on each side are both the dist key and the sort key, meaning both tables line up perfectly without any meddling needed.&lt;/p&gt;

&lt;p&gt;This is the best join. Though rare to pull off.&lt;/p&gt;
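To see what that alignment looks like in practice, here is a hedged sketch with hypothetical tables, where both sides are distributed and sorted on the join column:

```sql
-- Hypothetical tables: both distributed AND sorted on customer_id,
-- so a join on customer_id can be planned as a merge join
-- (shown as XN Merge Join DS_DIST_NONE in the explain output).
CREATE TABLE customers (customer_id INT, name VARCHAR(100))
DISTKEY (customer_id) SORTKEY (customer_id);

CREATE TABLE orders (order_id INT, customer_id INT, total DECIMAL(10,2))
DISTKEY (customer_id) SORTKEY (customer_id);
```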
&lt;h3&gt;
  
  
  Hash Join
&lt;/h3&gt;

&lt;p&gt;In a hash join, the join conditions aren't perfect for each other, but Redshift can manage with a bit of work. Redshift looks at both tables and between them creates a hash table, which is like a lookup table that sits in the middle.&lt;/p&gt;

&lt;p&gt;Once Redshift has created the hash table it can then do its job and match the two.&lt;/p&gt;

&lt;p&gt;Obviously a Merge Join is better, but a Hash Join is fine if you can't swing a Merge, and is far preferable to a Nested Loop.&lt;/p&gt;
&lt;h3&gt;
  
  
  Nested Loop Join
&lt;/h3&gt;

&lt;p&gt;This is the bad one.&lt;/p&gt;

&lt;p&gt;A nested loop occurs when a hash table can't be created between the two tables. This happens when your join uses conditional criteria, like BETWEEN or greater than, rather than an equality.&lt;/p&gt;

&lt;p&gt;This will require the database to check every value in the left table against every value in the right table. The complexity of a Nested Loop Join is “quadratic”: you need to do about N*N (or N²) operations to process the join. Not great! Nested Loop Joins don’t hold up when you’re joining million-row tables together; your database might end up needing to complete trillions of operations to execute that join.&lt;/p&gt;
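The difference is visible in the join condition itself. A sketch with hypothetical tables a and b:

```sql
-- Equality join: Redshift can build a hash table, avoiding a nested loop.
SELECT a.id, b.label
FROM   a
JOIN   b ON a.id = b.id;

-- Range join: no hash table is possible, so the plan falls back to
-- "XN Nested Loop", comparing every row of a against every row of b.
SELECT a.id, b.label
FROM   a
JOIN   b ON a.val BETWEEN b.low_val AND b.high_val;
```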
&lt;h2&gt;
  
  
  Broadcast or Redistribution
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/hTWp5BTK4czW8/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/hTWp5BTK4czW8/giphy.gif" alt="moving"&gt;&lt;/a&gt;&lt;br&gt;
When Redshift has to do a join, it may have to move data around its nodes to complete that join. Moving that data around takes time, so if you can avoid it, you can speed up your query.&lt;/p&gt;
&lt;h3&gt;
  
  
  Broadcast
&lt;/h3&gt;

&lt;p&gt;In a broadcast, the data values from one side of a join are copied from each compute node to every other compute node, so that every compute node ends up with a complete copy of the data. &lt;/p&gt;

&lt;p&gt;If this is occurring you will see the phrase &lt;strong&gt;DS_BCAST_INNER&lt;/strong&gt; in the explain plan.&lt;/p&gt;
&lt;h3&gt;
  
  
  Redistribution
&lt;/h3&gt;

&lt;p&gt;In a redistribution, participating data values are sent from their current slice to a new slice (possibly on a different node). Data is typically redistributed to match the distribution key of the other table participating in the join if that distribution key is one of the joining columns.&lt;/p&gt;

&lt;p&gt;If redistribution is occurring, you will see the following phrases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_ALL_NONE:&lt;/strong&gt; No redistribution is required, because the inner table has already been distributed to every node using DISTSTYLE ALL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_NONE:&lt;/strong&gt; No tables are redistributed. Collocated joins are possible because corresponding slices are joined without moving data between nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_INNER:&lt;/strong&gt; The inner table is redistributed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_OUTER:&lt;/strong&gt; The outer table is redistributed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_ALL_INNER:&lt;/strong&gt; The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DS_DIST_BOTH:&lt;/strong&gt; Both tables are redistributed.&lt;/li&gt;
&lt;/ul&gt;
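One common way to keep a small lookup table out of the broadcast business is DISTSTYLE ALL, which pre-copies the table to every node. A hedged sketch (table and columns are hypothetical):

```sql
-- DISTSTYLE ALL stores a full copy of the table on every node, so
-- joins against it appear as DS_DIST_ALL_NONE in the query plan
-- rather than paying a DS_BCAST_INNER broadcast at query time.
CREATE TABLE category_dim (catid INT, catname VARCHAR(50))
DISTSTYLE ALL;
```

This trades storage for join speed, so it only makes sense for small, rarely updated tables.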



&lt;p&gt;&lt;em&gt;header image drawn by me&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1183218570164965376-963" src="https://platform.twitter.com/embed/Tweet.html?id=1183218570164965376"&gt;
&lt;/iframe&gt;




&lt;/p&gt;



</description>
      <category>data</category>
      <category>analytics</category>
      <category>sql</category>
      <category>aws</category>
    </item>
    <item>
      <title>The R.A.G (Redshift Analyst Guide): Data Types and Compression</title>
      <dc:creator>Alex Antra</dc:creator>
      <pubDate>Wed, 20 Nov 2019 09:53:18 +0000</pubDate>
      <link>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-data-types-and-compression-4a4e</link>
      <guid>https://dev.to/alexantra/the-r-a-g-redshift-analyst-guide-data-types-and-compression-4a4e</guid>
      <description>&lt;p&gt;&lt;em&gt;Welcome to the R.A.G, a guide about Amazon's Redshift Database written for the Analysts out there in the world who use it.&lt;/em&gt;&lt;/p&gt;









&lt;p&gt;When building tables you determine what 'data type' each column will be. It may seem silly to specify exactly what is going to go into each column, especially when Redshift can guess for you, but choosing types deliberately can be a big factor in speeding up performance and decreasing table size.&lt;/p&gt;

&lt;p&gt;You can also choose to compress your columns. &lt;br&gt;
Compression will allow for more data to fit inside a block, again decreasing table size. It will also improve the efficiencies of the Zone Maps, another thing that can speed up performance. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3og0Ivispl1Xs5NCVi/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3og0Ivispl1Xs5NCVi/giphy.gif" alt="smaller"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Data Types
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Keywords&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SMALLINT ,INT2&lt;/td&gt;
&lt;td&gt;This is for WHOLE numbers that only take up 2 bytes of data, range: -32768 to +32767&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INTEGER, INT, INT4&lt;/td&gt;
&lt;td&gt;Also for whole numbers that only take up 4 bytes of data, range: -2147483648 to +2147483647&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BIGINT, INT8&lt;/td&gt;
&lt;td&gt;Also for whole numbers that only take up 8 bytes of data, range:  -9223372036854775808 to 9223372036854775807&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DECIMAL, NUMERIC&lt;/td&gt;
&lt;td&gt;For numbers with decimal points, up to 38 digits total. When you classify a column as decimal you must declare both the TOTAL length and then how many decimals. For example decimal(10,2) means ten numbers max with two decimal places, this equates to 8 digits on the left of the decimal, and 2 on the right.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REAL, FLOAT4&lt;/td&gt;
&lt;td&gt;For storing smaller, rounded down, floating point numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DOUBLE PRECISION, FLOAT , FLOAT8&lt;/td&gt;
&lt;td&gt;For storing larger, non rounded, floating point numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BOOLEAN, BOOL&lt;/td&gt;
&lt;td&gt;Boolean is a single byte flag which is either 1 or 0, true or false. Though it can hold a null value.  It can also get represented as a checkbox.  You can specify that the default value is true or false, if you don't specify a default value then the default value will be null&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CHAR, CHARACTER, NCHAR , BPCHAR&lt;/td&gt;
&lt;td&gt;CHAR is a fixed length text string, ignore references to NCHAR and BPCHAR, those are old functionality merged all into one. CHAR always takes up all of the space you specify, so if you specify char(100) but only put 'Hi' into the column, the remaining 98 characters of space will be filled with spaces. Which can cause issues with EXACT object matching. Use when your column is always going to be a fixed length. CHAR will always use up at least 4 bytes of data, even if you specify CHAR(2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VARCHAR, CHARACTER VARYING, NVARCHAR, TEXT&lt;/td&gt;
&lt;td&gt;VARCHAR allows for varying character length which is good for free text fields. Unlike CHAR it will only use however much space has been entered. So in a VARCHAR(100) the word 'Christmas' will only use 9 of that 100, saving space. VARCHAR will always use up at least 6 bytes of data, even if you specify VARCHAR(2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DATE&lt;/td&gt;
&lt;td&gt;Use this for dates with no time component; the smallest unit it handles is a day, i.e. this will not use / show / handle time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TIMESTAMP, TIMESTAMP WITHOUT TIME ZONE&lt;/td&gt;
&lt;td&gt;Use this for dealing with time AND where your entire data warehouse is in the same timezone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TIMESTAMPTZ, TIMESTAMP WITH TIME ZONE&lt;/td&gt;
&lt;td&gt;Use this for dealing with time across various time zones&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h1&gt;
  
  
  Data Type Usage Examples
&lt;/h1&gt;

&lt;p&gt;Below are some examples  of how to use the above data types in your code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;TABLE_NAME&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;SMALLINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;INT4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;INT8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;REAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;FLOAT4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="nb"&gt;PRECISION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;FLOAT8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;BOOL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;CHARACTER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;NCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;BPCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;CHARACTER&lt;/span&gt; &lt;span class="nb"&gt;VARYING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;WITHOUT&lt;/span&gt; &lt;span class="n"&gt;TIMEZONE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="n"&gt;TIMESTAMPZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COLUMN_NAME&lt;/span&gt;     &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TIMEZONE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Which Data Type to use?
&lt;/h2&gt;

&lt;p&gt;So there are a lot of data types to pick from, and plenty of overlap, so why not just use VARCHAR(999) for everything and go about your day? Reasons, that's why! The below will help.&lt;/p&gt;
&lt;h3&gt;
  
  
  Dealing with numbers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Firstly, numbers are WAY more performant than text, so never use CHAR or VARCHAR when you could be using INT, DECIMAL, or DATE.&lt;/li&gt;
&lt;li&gt;INTEGERS don't have decimal places, so don't use them when your customer needs to go down to the decimal level, i.e. currency.&lt;/li&gt;
&lt;li&gt;INT2 is more performant than INT4/INT8 but caps out at 32767, so only use it for small numbers.&lt;/li&gt;
&lt;li&gt;When dealing with decimals, be smart about how many decimal places you declare; it helps performance. If you only need two decimal places then just specify 2.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Dealing with Flags
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;BOOL is what you should use for flags.&lt;/li&gt;
&lt;li&gt;Don't use CHAR(1) or VARCHAR(1) when you could use BOOL: CHAR(1) uses up 4 bytes of data and VARCHAR(1) uses 6 bytes, while BOOL uses 1 byte.&lt;/li&gt;
&lt;li&gt;Try and set a default value, as it will save you in the long run.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Dealing with Dates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;When you are just dealing with the date, i.e. the day, month, or year, then use DATE&lt;/li&gt;
&lt;li&gt;For time, just use TIMESTAMP&lt;/li&gt;
&lt;li&gt;Only use TIMESTAMPTZ when dealing with multiple time zones; good data warehouses use the same timezone across all data&lt;/li&gt;
&lt;li&gt;Never put a date into a VARCHAR&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Dealing with Text
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Don't use CHAR if you don't know how long the text is going to be: either you will hit size errors (the text is too big for the field), or you will be forced to set the limit too high and waste space (CHAR pads out all unused space with blanks).&lt;/li&gt;
&lt;li&gt;Conversely, don't use VARCHAR if you know the length of all your values. For example a GUID, which is always 36 characters long, should be CHAR(36) not VARCHAR(36), as a VARCHAR(36) is actually 40 bytes long. Redshift will perform better on CHAR in these scenarios.&lt;/li&gt;
&lt;li&gt;Don't use VARCHAR for anything less than 6 bytes; you won't gain any space with VARCHAR(2).&lt;/li&gt;
&lt;li&gt;Don't use CHAR or VARCHAR for a flag, as BOOL will be quicker and smaller.&lt;/li&gt;
&lt;/ul&gt;
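Putting the text advice together in one hypothetical table definition:

```sql
-- Hypothetical table illustrating the advice above.
CREATE TABLE example_text
(
    guid        CHAR(36),      -- GUIDs are always 36 chars: fixed CHAR
    comment     VARCHAR(500),  -- free text of unknown length: VARCHAR
    status_flag BOOL           -- flags: BOOL, never CHAR(1)/VARCHAR(1)
);
```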
&lt;h1&gt;
  
  
  Data Compression
&lt;/h1&gt;

&lt;p&gt;Compression, also known as encoding, makes the column smaller. You can choose different types of compression for different scenarios, and some compression types can only be used on certain data types. In theory, compressing data too much can make it slower to read, however that's not often the case, as Amazon makes sure its compression methods balance storage against read speed. In some scenarios, though, the wrong compression can actually use up more space.&lt;/p&gt;
&lt;h2&gt;
  
  
  Compression Types
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/1hMgCfglrcw6HahbXp/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/1hMgCfglrcw6HahbXp/giphy.gif" alt="Smush"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Keyword&lt;/th&gt;
&lt;th&gt;Applicable Data Types&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw (no compression)&lt;/td&gt;
&lt;td&gt;RAW&lt;/td&gt;
&lt;td&gt;All Types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AZ64&lt;/td&gt;
&lt;td&gt;AZ64&lt;/td&gt;
&lt;td&gt;INT2/INT4/INT8/DECIMAL/DATE/TIMESTAMP/TIMESTAMPTZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Byte Dictionary&lt;/td&gt;
&lt;td&gt;BYTEDICT&lt;/td&gt;
&lt;td&gt;All but BOOL &amp;amp; TEXT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta&lt;/td&gt;
&lt;td&gt;DELTA&lt;/td&gt;
&lt;td&gt;INT2/INT4/INT8/DECIMAL/DATE/TIMESTAMP/TIMESTAMPTZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta 32k&lt;/td&gt;
&lt;td&gt;DELTA32K&lt;/td&gt;
&lt;td&gt;INT4/INT8/DECIMAL/DATE/TIMESTAMP/TIMESTAMPTZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LZO&lt;/td&gt;
&lt;td&gt;LZO&lt;/td&gt;
&lt;td&gt;All but BOOL and FLOAT/FLOAT8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mostly8&lt;/td&gt;
&lt;td&gt;MOSTLY8&lt;/td&gt;
&lt;td&gt;INT2/INT4/INT8/DECIMAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mostly16&lt;/td&gt;
&lt;td&gt;MOSTLY16&lt;/td&gt;
&lt;td&gt;INT4/INT8/DECIMAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mostly32&lt;/td&gt;
&lt;td&gt;MOSTLY32&lt;/td&gt;
&lt;td&gt;INT8/DECIMAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run-length&lt;/td&gt;
&lt;td&gt;RUNLENGTH&lt;/td&gt;
&lt;td&gt;All but TEXT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;TEXT255, TEXT32K&lt;/td&gt;
&lt;td&gt;VARCHAR Only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zstandard&lt;/td&gt;
&lt;td&gt;ZSTD&lt;/td&gt;
&lt;td&gt;All but TEXT&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  What one do I use?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/3o7btPCcdNniyf0ArS/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3o7btPCcdNniyf0ArS/giphy.gif" alt="Confusion"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Short Answer
&lt;/h3&gt;

&lt;p&gt;AZ64, unless it doesn't apply, then ZSTD for everything else. Most analysis won't require you to nail compression. &lt;/p&gt;
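As a hedged sketch of that short answer (table and columns are hypothetical):

```sql
-- Hypothetical table applying the short answer: AZ64 where it applies,
-- ZSTD elsewhere, and RAW on the leading sort key column.
CREATE TABLE fact_sales
(
    sale_date  DATE          ENCODE RAW,   -- first sort key: leave RAW
    sale_id    BIGINT        ENCODE AZ64,
    amount     DECIMAL(10,2) ENCODE AZ64,
    notes      VARCHAR(500)  ENCODE ZSTD
)
SORTKEY (sale_date);
```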
&lt;h3&gt;
  
  
  Long Answer
&lt;/h3&gt;

&lt;p&gt;Each compression type has its own specific use cases. Some are general-purpose, while some work in a very specific manner, which leads to some very specific downsides if not used properly.&lt;/p&gt;

&lt;p&gt;Some TLDRs...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAW:&lt;/strong&gt; No compression. You're not supposed to compress your first sort key, so use RAW there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AZ64:&lt;/strong&gt; An aggressive compression algorithm with good savings and performance. A general all rounder for Integers and Dates. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BYTEDICT:&lt;/strong&gt; Creates a dictionary of values, so repeated values take up almost no space. Only use it on columns whose values repeat a lot. Its dictionary is limited to 256 values; anything beyond that gets stored as RAW.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DELTA / DELTA32K:&lt;/strong&gt; Compresses data by only recording the difference between consecutive values. Only good if the delta (difference) is small and incremental. Large differences can actually cause DELTA to use more space than leaving the data uncompressed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LZO:&lt;/strong&gt; An aggressive compression algorithm with good savings and performance. It used to be the go-to, but you should now use ZSTD or AZ64 instead, as they are newer and perform better.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MOSTLY8/16/32:&lt;/strong&gt; Use these when &lt;em&gt;most&lt;/em&gt; of the values in a column fit in 8/16/32 bits, but there are some outlying larger values. By specifying MOSTLY8, you are saying the majority of values can be compressed down to 8 bits (one byte), with the outliers left as RAW. Only use it when you know &lt;em&gt;most&lt;/em&gt; of the values in the column can be compressed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RUNLENGTH:&lt;/strong&gt; Very similar to BYTEDICT in that repeated values are replaced with a single token. Unlike BYTEDICT it isn't limited to a set number of values; however, you shouldn't use it on any sort key column.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TEXT255/TEXT32K:&lt;/strong&gt; For text values only; works similarly to BYTEDICT in that it builds a dictionary of values. TEXT255 stores values with 1-byte indexes and is limited to the first 245 unique words in a column; new values after that are stored uncompressed. TEXT32K stores values with 2-byte indexes and keeps storing new words until the combined dictionary hits a hard limit of 32K bytes. Only columns with few unique text values compress well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZSTD:&lt;/strong&gt; An aggressive compression algorithm with good savings and performance. Unlike some other compression methods, it will seldom use more space than it saves. Use it where AZ64 does not apply.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro-Tip&lt;/strong&gt;: If sort key columns are compressed more aggressively than other columns in the same query, Redshift may perform poorly. &lt;/p&gt;
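&lt;p&gt;Putting those TLDRs together, a table definition might look something like the sketch below. The table and column names are hypothetical, purely for illustration; note the first sort key is left as RAW per the pro-tip above.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;create table page_views (
    view_date    date          encode raw,      -- first sort key: leave uncompressed
    user_id      int8          encode az64,     -- integers and dates: AZ64 all-rounder
    page_url     varchar(500)  encode zstd,     -- free text: ZSTD
    country_code char(2)       encode bytedict  -- few, highly repeated values: BYTEDICT
)
sortkey (view_date);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;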
&lt;h3&gt;
  
  
  Redshift can tell you what it recommends.
&lt;/h3&gt;

&lt;p&gt;If you build a table and run the command below, Redshift will recommend a compression encoding per column, and will even include its estimate of how much the new encoding will save.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;analyse&lt;/span&gt; &lt;span class="n"&gt;compression&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
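&lt;p&gt;If you want to act on a recommendation without rebuilding the whole table, newer Redshift releases also let you change a column's encoding in place. A sketch, again using hypothetical table and column names:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- switch one column to the recommended encoding
alter table page_views alter column page_url encode zstd;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;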




&lt;p&gt;&lt;em&gt;header image drawn by me&lt;/em&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Who am I?
&lt;/h4&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1183218570164965376-966" src="https://platform.twitter.com/embed/Tweet.html?id=1183218570164965376"&gt;
&lt;/iframe&gt;




&lt;/p&gt;



</description>
      <category>data</category>
      <category>analytics</category>
      <category>sql</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
