geraldew

Posted on Jun 3, 2024

Changing SQL Dialect From Teradata To SQLite

#sql #database #data

Source of This Annotation

I recently had the occasion to convert an interesting SQL script of mine from being in Teradata SQL to running in SQLite. As I did so, I made notes of the various specific changes I had to make. From those, I have plucked out the concepts and some examples and then added annotations.It is not a master guide for the topic, rather is just some sharing of a single experience.

Environment

Some of the issues encountered are probably not about the difference in dialect, but are instead just that my place in the two environments are quite different. For Teradata the environment is mostly not in my control. For SQLite, while it is completely in my control, I've not added any elements that might make it more equivalent - largely because the intention is to easy to replicate.

The most immediate aspect of this is that of namespaces, where for Teradata I must use multiple namespaces as forced by system-wide settings. For SQLite everything happens inside a single namespace. These issues will be explicitly apparent in the text below, but it will not call out which are due to the dialect difference and which to the environment.

Details

Enough preamble - into the details we go. These are not in any specific order, as this has merely been adapted from run-and-hit-error approach to making all the required changes.

Table names

While in some senses obvious - that in changing from one platform to another, that you might not have exactly the same named tables present - another aspect is that the whole structure of name referencing can be different.
From
DatabaseName.TableName
To
TableName
However, that kind of change was going to cause problems for situations where I had actually made use of the different databases as name spaces.

For example, it happens that where I work, I don't have system permissions to create views and macros in the same databases where I can make tables. And while that is a hassle and requires me to instead make any Views and Macros in my own "user" database (because in Teradata, a user account IS a user's own database) that is also handy.

For example, I used my own space as a place to create a view
MyUser.NameOfPurpose
and then use it to make a table of the same name but different database location as in:

CREATE TABLE
    DatabaseName.NameOfPurpose
AS
SELECT
    *
FROM
    MyUser.NameOfPurpose

Which is nice and neat, but then poses a problem if we're translating everything to a single database/namespace.

I'm not saying the following is a great strategy, but it was my quick during the process choice to make use of prefixes in lieu of namespaces and it got me through this time.

So I would change from

MyUser.Something To
VEW_Something and from WorkingDatabase.MyUser_Something (because a polite convention where I work is to put our usernames as a prefix*) To TBL_Something

If I was going to do more of this, then I might come up with something more sophisticated.

(* by the way, the politeness is to not make other analysts look up the data dictionary to get the CREATORNAME and also has a nice effect that our general object names can't interfere with each other)

ANZSIC Resource changes

This section is unique to the specific purpose I had for this conversion, which was that is was using the Australian and New Zealand Standard Industrial Classification (ANZSIC) codes. Follow that link if you want to know more about these. My understanding is that while an independent standard used across Australia and New Zealand, there are similar things in other parts of the world.

And indeed I've been using these kinds of codes in my data work for over 20 years.

As the whole purpose of the SQL script I was converting is to do something interesting with data classified using these codes, necessarily I also had to ensure the handling of these codes was correctly adapted from one environment to the other.

I will leave this section here, it may still be of interest as the kind of changes that occur in this type of exercise.

Things I had to deal with included:

Reduce from 5 digit coding to just 4
Specific table and column names
Change some specific ANZSIC codes

Reduce from 5 digit coding to just 4

As it happens, in my workplace, we enhance some ANZSIC codes by adding an extra digit. In picking up both the ANZSIC definitions and some example data from public "open data" resources, these use only the four digit codes, some various changes had to be made to suit that.

Specific table and column names

Similarly, at my workplace, the reference tables for the ANZSIC codes were made for me by other people (and had the added 5th digit) so in downloading them independently from the ABS (Australian Bureau of Statistics) I don't have the same table structures on tap.

As it happened, this could have given me the chance to do something more appropriate in terms of the structures I would make but didn't want to be rewriting a lot of things while I was mainly just trying to adapt some 2,000 lines of SQL. So I chose a compromise, loading the lookup tables in a way that was convenient to construct from the downloads but then using a custom view to emulate the structure assumed by the script I was converting.

Change some specific ANZSIC codes

As the script that I was converting has a special feature of being able to isolate a single (or small set of) ANZSIC codes for treatment, I didn't have any wish to expose which specific ones my workplace had a special interest in. So I chose another one that seemed to have a similar distribution property (in totally different data, mind).

So that meant changing each place where the script had (the secret digit sequence):

'NNNN' and replacing it with
'8512' Note that those are both strings/chars because while many of the ANZSIC codes are digits, not all are.

Making Views Syntax Change

On Teradata you can make a View using either the keyword CREATE or the keyword REPLACE - with the latter working even when a view of that name already exists. As a consequence there is no reason not to always use REPLACE. But SQLite doesn't have this feature.

FWIW by contrast, Teradata SQL still doesn't have a form of DROP that is safe to use regardless of whether an item does or does not already exist. Yes, there are some work-arounds but that's a whole other topic.

Therefore, a first step is to search/replace throughout the script to find every

REPLACE VIEW and change it to
CREATE VIEW

Actually, that's not quite enough, because in part, one writes scripts so they can be run and re-run. For SQLite this is a simple matter of putting a bulletproof DROP before each CREATE

DROP VIEW NameOfView IF EXISTS ;
CREATE VIEW NameOfView AS

Making Macros Significant Change

While noting that, like with views, for its macro syntax Teradata allows a REPLACE MACRO as well as a CREATE MACRO - but that's not really the problem.

Instead, the situation is that SQLite doesn't appear to have any "macro" feature. Luckily, that's not quite true.

Because what SQLite does have, is triggers. And while a trigger has to be tied to some other database action, it provides a way of defining a statement that will execute.

Of course, a Teradata macro has quite a few other aspects, such as parameter passing, but in this case I didn't need any of that.

So for converting a Teradata Macro into SQL, what we do is create a Trigger for a view that does nothing. To execute the Trigger, we do a DELETE FROM for the view name. In SQLite the delete would fail on the view but still triggers the .. trigger to execute. Ergo, we have a kind of macro.

So, if we had a macro named MacroName, then in Teradata SQL we would do:

REPLACE MACRO MacroName AS
(
-- sql statements
;
)
;

And to get a similar effect in SQLite, we first create a dummy view:

CREATE VIEW 
  View4Trigger_MacroName
AS
SELECT
  CURRENT_TIME 
;

then we define a trigger that will do our bidding

CREATE TRIGGER
  Trigger4_MacroName
INSTEAD OF DELETE ON
  View4Trigger_MacroName
BEGIN
-- sql
END
;

and then we trigger the trigger, with a statement that does:

DELETE FROM 
  View4Trigger_MacroName
;

Whither Stored Procedures

p.s. Who said "Stored Procedure" ? Actually, you won't find "macro" mentioned anywhere - and that's because macros are a Teradata oddity. There's a lot I could say about this, but it's just Teradata history and not particularly relevant here. Teradata did eventually add Stored Procedures to its feature set - quite a long time ago now - but macros are still there and have some permission conveniences.

Remove Temporary Scaffolding Code

This point merely says something about how I write Teradata SQL, which is that:

I can be lazy about writing CREATE statements and so will use a CREATE .. AS construct to make a table and then use SHOW TABLE to get a CREATE statement ready to adapt.
but I also don't leave a script with a CREATE AS in it, and replace it with a CREATE for making an empty table and then do an INSERT to populate it.

As a consequence, I will end up with a script which included a whole bunch of CREATE AS WITH NO DATA and SHOW TABLE then DROP TABLE statement combinations lying around.

In the SQLite environment - and with the script feature complete - these were just extraneous and could be deleted.

Remove Character Set Declarations

One of the consequences of my CREATE AS and SHOW TABLE method is that the table creations I thereby get will have all the specific defaults applied and made explicit. In the Teradata environment that's quite a plus.

But they immediately caused me problems in SQLite and had to go.

So any column definitions that had the following text

CHARACTER SET LATIN NOT CASESPECIFIC had to have them removed.

Remove Set Table Declarations

Along the same line as the above, the Teradata (good) default is to have "set" tables (as opposed to "multiset") but this is not a feature of SQLite and so had to go.
Thus

CREATE SET TABLE becomes simply
CREATE TABLE

Remove More Table Declarations

And to cut a long story short, for my script the following stock clause also had to be removed from CREATE TABLE statements:

    FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1

UPI recoding

Where UPI = UNIQUE PRIMARY INDEX

In Teradata, the equivalent to a "primary key" is a "unique primary index". But as well as the keyword change, the SQLite syntax is different. In Teradata the UPI setting comes as a clause after the list of column definitions has been closed - with a ) character. In SQLite, the primary key setting is done inside the column list.

Actually in SQLite it has two forms of that but we'll only use one here.

From

  LastColumnName DataType )
UNIQUE PRIMARY INDEX ( 
  PrimaryIndexColumn )
;

  LastColumnName DataType , 
  PRIMARY KEY ( 
    PrimaryIndexColumn )
  )
;

SAMPLE and TOP

This is yet another one of those things that simply varies among SQL dialects - and frankly I have no which one, if either, is in the SQL standard.

Teradata has a SAMPLE clause, but as I was only using it for some minor data testing of view constructions, could be easily be replaced.

Ditto for the Teradata syntax of using SELECT TOP x at the beginning of a SELECT statement.

Both of those could be replaced by using the SQLite syntax of add LIMIT x at the end of a SELECT statement.

While not important anywhere, this was annoying to enact as it couldn't be done by simple search/replace actions in the editor.

No System Calendar Pseudo Table

This is another small thing that I've become used to doing on Teradata, because it supplies a built-in pseudo table that generates a full calendar.

While obviously useful for calendar related things, the fact that it has a day_of_calendar column makes it easy to construct scratch tables as if out of thin air.

In the script at hand I was using it as the base for some cross joins to generate a full set of digit combinations. As this was only a matter of ten rows - for the digits "0" to "9" - I chose to replace it with a real table and simply wrote ten insert to literally populate it with the desired digits.

DROP TABLE IF EXISTS
  TBL_1_Digits
;
CREATE TABLE 
  TBL_1_Digits
(
  DigitChar CHAR(1) ,
  PRIMARY KEY ( 
    DigitChar )
  )
;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '0' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '1' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '2' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '3' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '4' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '5' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '6' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '7' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '8' ) ;
INSERT INTO TBL_1_Digits ( DigitChar ) VALUES ( '9' ) ;

In retrospect that was probably a better solution all round - I'm used to using the Teradata sys_calendar.CALENDAR for all kinds of things in many kinds of scales, so my general reasons for doing so remain valid.

Replace FULL OUTER JOIN

As SQLite does not support FULL OUTER JOIN then something will have to be done instead.

As it happens, this is a well covered topic, with some opinions being that FULL OUTER JOIN is something that should be avoided even where it is supported. The stock advice seems to be to replace it with a combination of UNION and GROUP BY structures.

Here is what I had as Teradata SQL

-- Attempt a coalesce of the two lookup methods
SELECT
  COALESCE( AZ_ML.At_Lvl_Cd, DC_AU.DigitsN ) AS At_Lvl_Cd ,
  COALESCE( AZ_ML.Lvl_At , CHARACTER_LENGTH( DC_AU.DigitsN) ) AS Lvl_At ,
  COALESCE( AZ_ML.Up_Lvl_Cd, DC_AU.DigitsM ) AS Up_Lvl_Cd
FROM
  DbName.MyUser_ANZSIC_DigitCharsAndUps AS DC_AU
  FULL OUTER JOIN
  DbName.MyUser_ANZSIC_Multi_Level_Lookup_Base AS AZ_ML ON
    AZ_ML.At_Lvl_Cd = DC_AU.DigitsN
;

And here is the SQLite replacement

-- Attempt a coalesce of the two lookup methods
SELECT
  U.At_Lvl_Cd ,
  MAX( U.Lvl_At ) AS Lvl_At ,
  MAX( U.Up_Lvl_Cd ) AS Up_Lvl_Cd
FROM
  ( -- U
    SELECT
      AZ_ML.At_Lvl_Cd AS At_Lvl_Cd ,
      AZ_ML.Lvl_At AS Lvl_At ,
      AZ_ML.Up_Lvl_Cd AS Up_Lvl_Cd
    FROM
      TBL_ANZSIC_Multi_Level_Lookup_Base AS AZ_ML
    UNION
    SELECT
      DC_AU.DigitsN AS At_Lvl_Cd ,
      LENGTH( DC_AU.DigitsN) AS Lvl_At ,
      DC_AU.DigitsM AS Up_Lvl_Cd
    FROM
      TBL_ANZSIC_DigitCharsAndUps AS DC_AU
    ) AS U
GROUP BY
  U.At_Lvl_Cd
;

To be frank, there is much more that can be said about FULL OUTER JOIN versus UNION .. GROUP BY but this was all I needed for the few places the issue was present in this script.

There is plenty to read elsewhere on this topic.

CHARACTER_LENGTH

While simple, this one caught me by surprise because I thought this function was part of the SQL "standard" - not that that ever means very much in practice. Anyway, the solution was a simple change of function name.

I replaced
CHARACTER_LENGTH
with
LENGTH

QUALIFY

Teradata is one of the dialects that has QUALIFY and uses it to enable filtering in the same query layer as a "window" function is declared. In short, SQLite does not have this.

In my case, this is not new, as I've previously had to re-code from Teradata to HiveQL, which is similar is having window functions but not a "QUALIFY" clause.

What it requires is adding an extra layer of table abstraction - as a derived table or a CTE (common table expression) and then use a WHERE clause to do what the QUALIFY did.

For example, here is how it might look in Teradata SQL

SELECT
    T.*
FROM
    TheTableName AS T
QUALIFY
    ROW_NUMBER() OVER ( 
        PARTITION BY 
            GroupCol 
        ORDER By 
            Sortcol 
        ) = 1
;

And a first step to conversion is to make the window function a named element in the SELECT clause. Note that Teradata is happy to apply the QUALIFY to that named value.

SELECT
    T.* ,
    ROW_NUMBER() OVER ( 
        PARTITION BY 
            GroupCol 
        ORDER By 
            Sortcol 
        ) AS Rw_Num
FROM
    TheTableName AS T
QUALIFY
  Rw_Num = 1
;

Now we can abstract all of the above and remove the QUALIFY and instead do the same filtering as a WHERE clause.

SELECT
    D_T.*
FROM
    ( -- D_T
        SELECT
            T.* ,
            ROW_NUMBER() OVER ( 
                PARTITION BY 
                    GroupCol 
                ORDER By 
                    Sortcol 
                ) AS Rw_Num
        FROM
            TheTableName AS T
        ) AS D_T
WHERE
  D_T.Rw_Num = 1
;

With the QUALIFY gone, the syntax is now ready to work in SQLite.

Do note that I'm not saying this will be internally planned and executed the exact same way - or that it won't, that being a complex per-platform topic.

Change from PRIMARY INDEX

Now, you might be excused for thinking this was already covered in the text above. But no, this is another twist, because it is UNIQUE PRIMARY INDEX which is the PRIMARY KEY equivalent.

Instead, in Teradata, a non-unique "PRIMARY INDEX" merely assists with data spreading at execution. While this technically means that the clause can just be dropped without having any effect, there's a good chance that the reason it was there will imply some thinking about what should be done instead.

In the one case of this for the conversion I was attempting I had used "PRIMARY INDEX" simply because I was too lazy to work out what column combination would be a UPI (i.e. primary key). Having re-assessed that, I made a useful selection and set a new PRIMARY KEY clause.

Change Syntax for Defining a Recursive View

While this was confusing to sort out, from comparative reading of documentation and examples, the change is simple enough - albeit with a couple of twists. Indeed, there were three issues to be dealt with here. Do note: I'm not going to try to explain recursive queries in this context - if you need to gain comfort with those then you will need to seek elsewhere.

The three issues are:

Syntax
Final Select
Naming

Syntax

On the face of it, it mainly looks like a change from the Teradata SQL

CREATE RECURSIVE VIEW
  VEW_ANZSIC_Multi_Level_Recursive_Lookup
(
  At_Lvl_Cd ,
  Lvl_At ,
  Up_Lvl_Cd ,
  Up_Lvl_At ,
  DegreeOfSep )
AS
(

To the SQLite form of

CREATE VIEW 
  VEW_ANZSIC_Multi_Level_Recursive_Lookup 
AS
WITH RECURSIVE 
  Recursive_Lookup
(
  At_Lvl_Cd ,
  Lvl_At ,
  Up_Lvl_Cd ,
  Up_Lvl_At ,
  DegreeOfSep )
AS

But as we will see, there is more to it than this.

Final Select

CREATE RECURSIVE VIEW NameOfView ( ListOfColumns) AS ( 
-- Seed Select Statement
UNION ALL
-- Recursive Select Statement that uses NameOfView
);

Note how this compares to a non-view use of recursion in Teradata SQL, where a recursive with clause is followed by a final SELECT that uses it.

WITH RECURSIVE NameOfWith ( ListOfColumns) AS ( 
-- Seed Select Statement
UNION ALL
-- Recursive Select Statement that uses NameOfWith
)
SELECT
    Something
FROM
  NameOfWith
;

Now look back to the RECURSIVE VIEW syntactic structure and see that it does not have the final SELECT. In effect, that happens when you use the view in a later select. Clearly the Teradata idea of a recursive view is only a way of saving the WITH RECURSIVE clause as a named item, available outside its own definition.

By comparison, the SQLite idea seems to be that you're merely defining a view - hence CREATE VIEW rather than CREATE RECURSIVE VIEW but then allows you to put a recursive WITH inside the definition.

Does this matter? Well it might.

As it happened, the Teradata recursive view that I had written realy required being used with a WHERE clause each time - filtering WHERE Up_Lvl_Cd IS NOT NULL
In adding that final SELECT inside the view to make it work in SQLite, that WHERE clause was "burnt into" the view. But by the nature of that specific clause, having it also used in later uses of the view would be quite harmless.

I suspect that converting in the other direction - from SQLite to Teradata - it would be prudent to add another view just to add the effect of that final select.

Naming

Another quirk of the object naming difference between the two environments, is that the Teradata syntax, has the strange thing by which the view may have had to be declared as being
in a named database but then its non-database name must be used on the inside.

To adapt the syntax example given above, we add DatabaseName. to the create clause, but note that it cannot be used inside the definition.

CREATE RECURSIVE VIEW DatabaseName.NameOfView ( ListOfColumns) AS ( 
-- Seed Select Statement
UNION ALL
-- Recursive Select Statement that uses NameOfView
);

That has led to my name changing scheme from earlier steps to have confused things - as only one of the two names was replaced - which led to some fun until I twigged to what was going on.

As already noted, that problem is not present in the SQLite syntax for which the named recursive element is an alias fully inside the view definition, rather than part of its outside.

Summary

To be honest, the process of conversion proved to be both as difficult as I expected (but with more details) all while clearly always going to be possible (thankfully because my knowledge proved to be sufficient to be confident that all the issues were covered). So it was both annoying and yet satisfying to achieve.

If you've read this far I hope you found the annotations either helpful or interesting as an exercise. I wrote it because I have often been grateful for when other people openly documented their experiences.

Make it make sense

Only the context you need to fix your broken code with Sentry.

Start debugging →

DEV Community