C# Tip: 2 ways to generate realistic data using Bogus

#csharp #dotnet #bogus #testing

In a previous article, we delved into the creation of realistic data using Bogus, an open-source library that allows you to generate data with plausible values.

Bogus contains several properties and methods that generate realistic data, such as names, addresses, birthdays, and so on.

In this article, we will learn two ways to generate data with Bogus. Both produce the same result; the main difference lies in reusability and modularity. In my opinion, it's mostly a matter of preference: neither approach is strictly better than the other, and which one fits best depends on the context.

For the sake of this article, we are going to use Bogus to generate instances of the Book class, defined like this:

public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}

Expose a Faker inline or with a method

You can create a dedicated object that, following a Builder-like approach, generates one or more items of a specified type.

It all starts with the Faker<T> generic type, where T is the type you want to generate.

Once you create it, you can define the rules to be used when initializing the properties of a Book by using methods such as RuleFor and RuleForType.

using Bogus;

public static class BogusBookGenerator
{
    // Builds a new Faker<Book> with all the rules needed to populate a Book.
    public static Faker<Book> CreateFaker()
    {
        Faker<Book> bookFaker = new Faker<Book>()
            // Rules bound to a property through a lambda expression
            .RuleFor(b => b.Id, f => f.Random.Guid())
            .RuleFor(b => b.Title, f => f.Lorem.Text())
            .RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>())
            .RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName)
            .RuleFor(b => b.AuthorLastName, f => f.Person.LastName)
            // Rule bound to a property by its name
            .RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800))
            // Rule applied to every property of type DateOnly
            .RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());

        return bookFaker;
    }
}

This way, the static method gives you a new instance of Faker<Book> that you can ask to generate one or more books:

Faker<Book> generator = BogusBookGenerator.CreateFaker();
var books = generator.Generate(10);
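As a minimal sketch of what the method approach allows, the snippet below generates a single book, a list of books, and then layers one extra rule on a separate Faker<Book> instance. The "short books" tweak and the names single, many, and shortBookGenerator are illustrative assumptions, not part of the original example. It also assumes that defining a rule again for the same property replaces the earlier one, which is how Bogus behaved in my experience.

```csharp
using System;
using System.Collections.Generic;
using Bogus;

// Every call to CreateFaker() returns a new, independent Faker<Book>.
Faker<Book> generator = BogusBookGenerator.CreateFaker();

Book single = generator.Generate();       // a single Book
List<Book> many = generator.Generate(3);  // a List<Book> with three items

// Hypothetical test-specific tweak: only short books.
// The extra rule is added to this instance only; "generator" is unaffected.
Faker<Book> shortBookGenerator = BogusBookGenerator.CreateFaker()
    .RuleFor(b => b.PagesCount, f => f.Random.Number(10, 50));
List<Book> shortBooks = shortBookGenerator.Generate(5);

Console.WriteLine($"Generated {many.Count} books and {shortBooks.Count} short books");

// Types repeated from the article so the snippet compiles on its own.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}

public static class BogusBookGenerator
{
    public static Faker<Book> CreateFaker()
    {
        return new Faker<Book>()
            .RuleFor(b => b.Id, f => f.Random.Guid())
            .RuleFor(b => b.Title, f => f.Lorem.Text())
            .RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>())
            .RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName)
            .RuleFor(b => b.AuthorLastName, f => f.Person.LastName)
            .RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800))
            .RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());
    }
}
```

Since the tweak lives on its own instance, other tests that call CreateFaker() keep getting books with the default page range.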

Clearly, it's not necessary for the class to be marked as static: it all depends on what you need to achieve!

Expose a subtype of Faker, specific to the data type to be generated

If you don't want to use a method (static or not static, it doesn't matter), you can define a subtype of Faker<Book> whose customization rules are all defined in the constructor.

using Bogus;

public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        // Same rules as before, registered on the instance being constructed.
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.Title, f => f.Lorem.Text());
        RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>());
        RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName);
        RuleFor(b => b.AuthorLastName, f => f.Person.LastName);
        RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800));
        RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());
    }
}

With this approach, you can simply create a new instance of BookGenerator and, again, call the Generate method to create new book instances.

var generator = new BookGenerator();
var books = generator.Generate(10);
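Because BookGenerator inherits from Faker<Book>, it can be handed to any code that expects a Faker<Book>. The sketch below illustrates this with a hypothetical CatalogSeeder helper (the class and its CreateCatalog method are made up for the example and are not part of Bogus).

```csharp
using System;
using System.Collections.Generic;
using Bogus;

// A BookGenerator is a Faker<Book>, so the assignment needs no conversion.
Faker<Book> generator = new BookGenerator();
List<Book> catalog = CatalogSeeder.CreateCatalog(generator, 5);

Console.WriteLine($"Catalog size: {catalog.Count}");

// Hypothetical helper: it only knows about Faker<Book>,
// not about how the rules were defined.
public static class CatalogSeeder
{
    public static List<Book> CreateCatalog(Faker<Book> faker, int size)
    {
        return faker.Generate(size);
    }
}

// Types repeated from the article so the snippet compiles on its own.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}

public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.Title, f => f.Lorem.Text());
        RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>());
        RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName);
        RuleFor(b => b.AuthorLastName, f => f.Person.LastName);
        RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800));
        RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());
    }
}
```

The same helper would accept the Faker<Book> returned by a factory method, so consumers do not need to care which of the two approaches produced it.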

Method vs Subclass: When should we use which?

As we saw, both approaches produce the same result, and their usage is almost identical.

So, which one should you use?

Use the method approach (the first one) when you need:

Simplicity: If you need to generate fake data quickly and your rules are straightforward, using a method is the easiest approach.
Ad-hoc Data Generation: Ideal for one-off or simple scenarios where you don't need to reuse the same rules across your application.

Or use the subclass (the second approach) when you need:

Reusability: If you need to generate the same type of fake data in multiple places, defining a subclass allows you to encapsulate the rules and reuse them easily.
Complex scenarios and extensibility: Better suited for more complex data generation scenarios where you might have many rules or need to extend the functionality.
Maintainability: Easier to maintain and update the rules in one place.
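To make the extensibility point more concrete, here is a minimal sketch of a generator that inherits from BookGenerator and changes a single rule. FantasyBookGenerator is a hypothetical class introduced only for this example, and the sketch assumes that a rule defined later for the same property replaces the one registered by the base constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Bogus;

List<Book> fantasyBooks = new FantasyBookGenerator().Generate(5);

bool onlyFantasy = fantasyBooks.All(b => b.Genres.SequenceEqual(new[] { Genre.Fantasy }));
Console.WriteLine($"Only fantasy books: {onlyFantasy}");

// Hypothetical specialization: all the base rules, plus one override.
public class FantasyBookGenerator : BookGenerator
{
    public FantasyBookGenerator()
    {
        // The base constructor has already registered the default rules;
        // this call redefines the rule for Genres only.
        RuleFor(b => b.Genres, f => new[] { Genre.Fantasy });
    }
}

// Types repeated from the article so the snippet compiles on its own.
public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.Title, f => f.Lorem.Text());
        RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>());
        RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName);
        RuleFor(b => b.AuthorLastName, f => f.Person.LastName);
        RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800));
        RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());
    }
}

public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}
```

The shared rules stay in BookGenerator, so a change to, say, the page range is made once and picked up by every derived generator.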

Wrapping up

This article complements the previous article about Bogus.

I think Bogus is one of the best libraries in the .NET universe, because realistic data makes your test cases easier to read and understand. Bogus can also be a great tool when you want to showcase demo values without accessing real data.