DEV Community

Zohar Peled
Zohar Peled

Posted on • Updated on

Random strings. Generate on the fly, or select from a pre-populated table?

Update:

I've written an improved version of the random string generator and have blogged about it here:

Disclaimer:

1: This article's code samples are all T-SQL. For other SQL dialects, some changes need to be done in the code, but the principles still hold.

2: An article called How to pre-populate a random strings pool was just published on my blog.

This post is a little bit of self-promotion, and a little bit of knowledge sharing. With that out of the way, let's begin.

Why would I want a pool of random strings?

Random strings can be very useful in many different use cases – from shortened URLs to encryption keys and salts, we use them everywhere. But why would we want to pre-populate a pool of them instead of generating them on the fly when you need them? Well, if you just want your strings to be random, then by all means, generate them on the fly. but if you want them to be both random and unique – then you need a pool of them. Why? because of what’s known as the birthday problem.

In a nutshell, the birthday problem concerns the probability that two people in a group of x people will share their birthday.
Surprisingly, that probability is a lot higher than most people would assume.
In a group of just 23 people, the chance of two people to share a birthday is 50% - and it climbs to 99.9% when the group has just 70 people.

How to generate random strings?

This question has an answer by SQL Server expert Martin Smith.

I've edited his code a bit to also allow numbers in the random strings, and to pack it into a user defined inline table function. To do that, I had to first create a view, because you can't use the newid operator in a user defined function. The view is very simple:

CREATE VIEW GuidGenerator
AS
    SELECT Newid() As NewGuid
Enter fullscreen mode Exit fullscreen mode

and the function based on Martin's code that use it is this:

CREATE FUNCTION dbo.RandomStringGenerator
(
    @Length int,
    @Count int -- Note: up to 1,000,000 rows
)
RETURNS TABLE
AS 
RETURN 

WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     E4(N) AS (SELECT 1 FROM E3 a, E2 b), --1,000,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E4)

SELECT TOP(@Count) (
    SELECT  TOP (@Length) CHAR(
            CASE  Abs(Checksum(NewGuid)) % 3
                WHEN 0 THEN 65 + Abs(Checksum(NewGuid)) % 26 -- Random upper case letter
                WHEN 1 THEN 97 + Abs(Checksum(NewGuid)) % 26 -- Random lower case letter
                ELSE 48 + Abs(Checksum(NewGuid)) % 10 -- Random digit
            END
            )
    FROM Tally As t0  
    CROSS JOIN GuidGenerator 
    WHERE  t0.n <> -t1.n -- force SQL Server to re-evalulate the subquery for each row in the main query.
    FOR XML PATH('')
    ) As RandomString
FROM Tally As t1

Enter fullscreen mode Exit fullscreen mode

Basically, the idea is that you create an inline tally table using stacked common table expressions. You use this tally table as a row multiplier - in the inner query for the number of chars in the generated string (the @Length argument), and in the outer query for the number of strings to return (the @Count argument).

Inside the inner query, you use Abs(Checksum(NewGuid)) as a random number, and you base your return value on the ascii codes of chars - numbers from 65 to 90 are ascii codes for upper case letters, from 97 to 122 are lower case, and from 48 to 57 are for digits. You also use the same technique to determine whether to return a digit, an upper case letter or a lower case letter.

You use this function to generate random strings like this:

SELECT RandomString 
FROM dbo.RandomStringGenerator(20, 100); 
Enter fullscreen mode Exit fullscreen mode

This will return a list of 100 random strings of 20 chars each.

If you want to guarantee your random strings will also be unique, you have to pre-populate a table with them, and have a unique constraint/index on the column holding the strings.

You can read more about it in How to pre-populate a random strings pool over on my blog. There's also a link there to an online demo on rextester.

Oldest comments (0)