Database seeding/access for acceptance testing

#100daysofcode #testing

If you're writing an application with a database of some kind - and most of them do these days - and you're writing acceptance tests for the application, at some point you are going to come up against the conundrum of test data.

There are two different ways that this problem comes up, and both are important.

Firstly, there is the case of populating the database so that your tests work. Assuming you are following proper test principlesÂ¹, you have problems when it comes to testing any read-only actions. You need to be able to have some data in the system to actually retrieve - or else be able to guarantee the absence of this data.

The way I see it, there are 4 ways that you can solve this, but with different levels of viability:

Use the existing APIs for populating the data. For example, if you are testing retrieving a user by ID, then use the APIs for Delete User and Create User to ensure that the database is in the correct state first. This does assume that such APIs exist, and it does muddy the testing a bit, since you are using something that might have failed testing to set up this test.
Create a new API purely for seeding the database. You can have an endpoint that you pass an XML or JSON document to describing the database state, and have the backend set up the data correctly. This means more work to create this endpoint, but it is a lot more flexible than the above in the long run. It does open a potential security risk if you're not careful though
Access the database directly from the tests. This is one that I see fairly often, and it works well enough, but it does mean duplication of knowledge between the code and the tests, and it does mean that the tests can only ever run from an environment that can access the database.
Pre-seed the database with all of the required test data. This is the easiest option in some ways, because you have a single setup script and you're done. However, maintenance is a real pain.

My personal preferences are #2 or #3, depending on the size of the application under test.

The other problem that you have is the opposite - reading from the database to ensure that tests have run correctly. Again, there are 4 options that I can think of here, with varying levels of viability.

Use the existing APIs. If you've just created a user, use the Retrieve User API to check the data. This will only ever show data that you can retrieve from the APIs, so will not be perfect.
Create a new API purely for retrieving the database. Essentially a no-op wrapper around database access. This ends up being a lot of duplication around your real endpoints though
Access the database directly from the tests.
Don't bother about it.

Technically, #4 is the more correct answer for Acceptance Testing. After all, as long as the system does the correct thing from the outside it shouldn't matter how the insides work. That's what Unit and Integration tests are for. However, #4 does sometimes fall down when there are changes that don't have a visible outside effect but still need testing. Those cases are sufficiently rare that that's when I would be tempted to fall back on one of the earlier options.

1) By "Proper Test Principles", I mean that each test should perform exactly one action, and that the order of tests should be completely independent. Performing multiple actions in a test means you lose clarity of what is and isn't working, and tests being inter-dependent makes them brittle, and again means that one failing test can cause a cascade onto other tests.

Top comments (3)

Nilgun • Jun 19 '17

Can you please elaborate the part "You can have an endpoint that you pass an XML or JSON document to describing the database state, and have the backend set up the data correctly" on item #2 ? Especially the difference between #1 and #2. What can be the content of "XML or JSON document describing the database state"?

Graham Cox • Jun 19 '17

Imagine you are writing a Blog. You're standard API will give you:

Create Post
Edit Post
Delete Post
Comment on Post
Edit Comment
Delete Comment

When you are testing that retrieving a Blog Front Page - which gives a list of all posts, and includes the number of comments on each post - works correctly, you can seed the data by:

Finding all posts and deleting them
Creating the new posts
Creating the new comments on the posts

If we are testing a front page that has 5 posts, with 5 comments on each, this will be 30 requests + whatever is needed to delete the existing data. And if any of those are broken then this test will fail.

Alternatively you could have a single endpoint that is only accessible by the tests - it absolutely must be locked down so that it can never be accessed by a real client - that you send a model of the data to and it makes the database have the correct data.

For the same test as above, you would make one request to this endpoint with one document, and this document would contain all of the posts and comments. The backend would then process this and make the database contain only the data in this document.

The end results are the same, but the means to get there are very different. One puts the burden on the client - and assumes that APIs exist to make the required changes. The other puts the burden on the server.

Nilgun • Jun 19 '17

Thanks for the clarification :)