pazvanti

Posted on Mar 3, 2021 • Edited on Dec 22, 2022 • Originally published at petrepopescu.tech

Exposing sequential IDs is bad! Here is how to avoid it.

#java #security #database

Article originally posted on my personal blog: How not to expose your primary keys

When working on LOGaritmical, I initially defined my primary keys as UUIDs. I took this approach for two reasons: security, and avoiding collisions even when there are many rows. My reasoning was that I would probably need to store each log line as a separate entry and, considering that one log can have a few thousand lines, there was a small risk of overflowing the Integer. Was my reasoning correct? Probably not.

Furthermore, I stumbled upon an interesting article about using UUIDs as primary keys. It made some really great points about performance and query optimization. To keep it short: it is best to use numerical values as the primary key. So, I started to change my implementation. But then another problem came to mind: security.

Why is exposing the PK bad?

By their nature, auto-incremented primary keys grow by one for each new entry. This is great for avoiding collisions, but it means that they are easily guessable. If you somehow find out that a user with the ID 27 exists, it most probably means that there are at least 26 other users with IDs 1 to 26. An attacker can try to exploit this by sending requests with different IDs. Furthermore, since the administrator is usually the first user registered, an attacker can easily guess that account's ID and try to get access.

There have also been many reports, even on high-profile sites, of a full or partial dump of the contents being made by simply incrementing the ID. This is how Parler data was exposed, for example. Other such attacks are rather easy to find, so exposing the internal ID is bad practice. You can still use an auto-incremented primary key internally and use it for all foreign keys as well, but whenever an identifier is sent externally, an alternative must be found.
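To make the problem concrete, here is a minimal sketch of what such an enumeration looks like. It uses a hypothetical in-memory "table" instead of a real application; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class EnumerationDemo {

    // Hypothetical users table keyed by an auto-incremented ID.
    static Map<Integer, String> usersBySequentialId() {
        Map<Integer, String> users = new HashMap<>();
        users.put(1, "admin");
        users.put(2, "alice");
        users.put(3, "bob");
        return users;
    }

    // Walk the IDs 1..maxId, the same way an attacker would request
    // /users/1, /users/2, ... against an endpoint that exposes the PK.
    static List<String> enumerate(Map<Integer, String> users, int maxId) {
        List<String> found = new ArrayList<>();
        for (int id = 1; id <= maxId; id++) {
            String name = users.get(id);
            if (name != null) {
                found.add(name);
            }
        }
        return found;
    }

    // The same kind of guessing against random UUID keys: a hit would
    // require guessing one specific value out of 2^122 possibilities.
    static int countRandomGuessHits(Map<UUID, String> users, int attempts) {
        int hits = 0;
        for (int i = 0; i < attempts; i++) {
            if (users.containsKey(UUID.randomUUID())) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Ten guesses are enough to dump every user.
        System.out.println(enumerate(usersBySequentialId(), 10)); // prints [admin, alice, bob]
    }
}
```

With sequential keys, the number of guesses needed is roughly the number of rows. With random UUIDs, blind guessing is not a practical strategy.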

The Play Session Cookie

You may think that as long as your URLs do not contain the primary key, everything is ok. However, here is another scenario: cookies. Whenever a user is logged in, a session must be saved so that the user can remain logged in and can properly navigate the restricted pages. Play Framework does this by storing a Play Session Cookie that looks something like this:

This can easily be found using the browser’s inspector. At first sight it may seem to be a random string of characters, but it is much more than that. It is actually a Base64-encoded JSON string with some additional, non-human-readable data at the end. Decoding the string is easy and, even though changing it in the cookie may not be possible, it can still provide valuable information to an attacker if the PK is stored in it.
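As a rough illustration of how little effort the decoding takes, here is a sketch that assumes a JWT-style cookie: three dot-separated parts, with the middle one being Base64URL-encoded JSON. The exact cookie layout depends on the Play version, and the cookie value built here is made up for the example; it is not a real Play session.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SessionCookieDecoder {

    // Decode the middle (payload) part of a "header.payload.signature" value.
    // No secret key is needed: the signature protects against tampering,
    // not against reading.
    static String decodePayload(String cookieValue) {
        String[] parts = cookieValue.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected a dot-separated cookie value");
        }
        byte[] json = Base64.getUrlDecoder().decode(parts[1]);
        return new String(json, StandardCharsets.UTF_8);
    }

    // Build a fake cookie (placeholder header and signature) so the
    // example is self-contained.
    static String fakeCookie(String payloadJson) {
        String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return "header." + encoded + ".signature";
    }

    public static void main(String[] args) {
        String cookie = fakeCookie("{\"data\":{\"userId\":\"27\"}}");
        // Anyone who can see the cookie can read the user ID stored in it.
        System.out.println(decodePayload(cookie));
    }
}
```

If the value stored under the hypothetical `userId` key is the auto-incremented PK, the attacker has just learned it without breaking anything.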

Solution: How to avoid exposing the PK

You use a UUID! But wait, didn’t I just say that using UUIDs is bad? You still use a numerical, auto-incremented value for the PK and FK, but you attach a UUID to it in a separate column. Whenever you are dealing with the data internally, within the same application/system, the numerical primary key is used.

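import java.util.UUID;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Type;

@Entity
@Table(name = "users")
public class UserDO {
    // Internal, auto-incremented primary key; used for joins and foreign keys only.
    @Id
    @Column(name = "id", nullable = false)
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    // Public identifier; this is the value that is sent outside the application.
    @Column(name = "uuid", columnDefinition = "VARCHAR(36)")
    @Type(type = "uuid-char")
    private UUID uuid;

    @Column
    private String username;

    @Column
    private String email;

    @Column
    private String passwordHash;

    @Column
    private String passwordSalt;
}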

However, when the data needs to be sent externally, you provide the UUID. Now the attacker can’t guess the identifier (or at least will have a really hard time doing so) and the internal workings are hidden. You store the user UUID in the session and use that one when performing security checks. You put the UUID of the article/user/whatever in the URL and use that one when searching for the right item. Once you find it, you can start referencing it internally by the PK; just be careful not to expose the PK outside.
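The flow described above could look roughly like the sketch below. It replaces the database with in-memory maps so that it runs on its own; `UserRepository`, `UserView` and the other names are hypothetical and are not taken from LOGaritmical or from any framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

public class UserLookupSketch {

    // Internal representation: carries both the numeric PK and the public UUID.
    static final class User {
        final int id;
        final UUID uuid;
        final String username;

        User(int id, UUID uuid, String username) {
            this.id = id;
            this.uuid = uuid;
            this.username = username;
        }
    }

    // What leaves the application: the UUID as "id", never the numeric PK.
    static final class UserView {
        final String id;
        final String username;

        UserView(String id, String username) {
            this.id = id;
            this.username = username;
        }
    }

    static final class UserRepository {
        private final Map<Integer, User> byId = new HashMap<>();
        // Stands in for an index on the uuid column.
        private final Map<UUID, Integer> idByUuid = new HashMap<>();
        private int nextId = 1;

        User create(String username) {
            User user = new User(nextId++, UUID.randomUUID(), username);
            byId.put(user.id, user);
            idByUuid.put(user.uuid, user.id);
            return user;
        }

        // Entry point for external requests: accepts only the UUID string.
        Optional<User> findByUuid(String rawUuid) {
            UUID uuid;
            try {
                uuid = UUID.fromString(rawUuid);
            } catch (IllegalArgumentException e) {
                return Optional.empty(); // not a UUID at all, e.g. a guessed "27"
            }
            return Optional.ofNullable(idByUuid.get(uuid)).map(byId::get);
        }
    }

    // Map the internal object to its external form.
    static UserView toView(User user) {
        return new UserView(user.uuid.toString(), user.username);
    }
}
```

Keeping a separate outbound type such as `UserView` is one way to make it harder to leak the numeric PK by accident, since the field simply does not exist on the object that gets serialized.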

Any additional data that is referenced using an FK can still be easily retrieved by the DB and Hibernate, since the numerical value is being used. In short, you get the best of both worlds by using a numerical PK internally and a UUID externally. The initial search for an item may be a bit slower, but I believe the difference is negligible, especially if the UUID column is indexed.

Top comments (6)

Pervez Choudhury • Mar 29 '21

You can use the uuid as the primary key and instead address the security and performance issues directly.

You can start to address the performance concerns by not storing the uuid as a string, and using the built-in database mechanism to store them efficiently.

In some databases like SQL Server you can also avoid making the primary key the clustered index, so that inserts don't happen randomly throughout the data when you add new rows.

To avoid people guessing the different primary keys, assume an attacker already knows the id and instead ensure you have good authorisation checks in place in your application to prevent returning data that the user is not allowed to see.

For example include a property that tells you the owner of the data and check this value matches the current logged in user account for every read.

Victorio Berra • Mar 29 '21

In some databases like SQL Server you can also avoid making the primary key the clustered index, so that inserts don't happen randomly throughout the data when you add new rows.

Can you elaborate on this with SQL Server please?

Vincent Milum Jr • Mar 28 '21

I'd highly suggest reading this article, along with several of the linked items within it. Essentially, security through obscurity isn't really security at all. en.wikipedia.org/wiki/Security_thr...

pazvanti • Mar 29 '21

Yes, security just by making things harder to guess is not truly security. Still, time and time again we have data leaks simply because the IDs are easily guessable, either due to a lack of security in the app or to user error (making things public instead of private; see the Parler data dump). I am not saying that it is enough to have the PK hidden, I am just saying that it definitely helps.

Shalvah • Mar 29 '21

I think a hashid might be a better idea for that second use case?

Koas • Mar 29 '21

Totally agree, that’s what I use when returning ids.