Davide Bellone

Originally published at code4it.dev

C# Tip: 2 ways to generate realistic data using Bogus

In a previous article, we delved into the creation of realistic data using Bogus, an open-source library that allows you to generate data with plausible values.

Bogus contains several properties and methods that generate realistic data, such as names, addresses, birthdays, and so on.

In this article, we will learn two ways to generate data with Bogus. Both produce the same result; the main difference lies in reusability and modularity. In my opinion, it's mostly a matter of preference: neither approach is strictly better than the other, and which one fits best depends on the context.

For the sake of this article, we are going to use Bogus to generate instances of the Book class, defined like this:

public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}

Expose a Faker inline or with a method

You can create a dedicated object that, following a Builder-like approach, generates one or more items of a given type.

It all starts with the Faker<T> generic type, where T is the type you want to generate.

Once you create it, you can define the rules to be used when initializing the properties of a Book by using methods such as RuleFor and RuleForType.

using Bogus;

public static class BogusBookGenerator
{
    public static Faker<Book> CreateFaker()
    {
        Faker<Book> bookFaker = new Faker<Book>()
         // Rules bound to a property through a lambda expression
         .RuleFor(b => b.Id, f => f.Random.Guid())
         .RuleFor(b => b.Title, f => f.Lorem.Text())
         .RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>())
         .RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName)
         .RuleFor(b => b.AuthorLastName, f => f.Person.LastName)
         // Rule bound to a property through its name
         .RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800))
         // Rule applied to every DateOnly property of Book
         .RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());

        return bookFaker;
    }
}

In this way, thanks to the static method, you can simply create a new instance of Faker<Book>, ask it to generate one or more books, and enjoy the result:

Faker<Book> generator = BogusBookGenerator.CreateFaker();
var books = generator.Generate(10);

Clearly, it's not necessary for the class to be marked as static: it all depends on what you need to achieve!
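
As a rough illustration of a non-static variant, the sketch below wraps the same rules in a regular class whose instances carry some configuration. The ConfigurableBookGenerator name and its minPages/maxPages parameters are hypothetical and exist only for this example; it assumes .NET 6 or later (for DateOnly and top-level statements) and the Bogus NuGet package.

```csharp
using System;
using System.Collections.Generic;
using Bogus;

// Each instance carries its own configuration (here, the allowed page range).
var shortBooks = new ConfigurableBookGenerator(minPages: 10, maxPages: 50);
List<Book> books = shortBooks.CreateFaker().Generate(5);

foreach (Book book in books)
{
    Console.WriteLine($"{book.AuthorLastName}: {book.PagesCount} pages");
}

// Hypothetical non-static counterpart of BogusBookGenerator.
public class ConfigurableBookGenerator
{
    private readonly int _minPages;
    private readonly int _maxPages;

    public ConfigurableBookGenerator(int minPages, int maxPages)
    {
        _minPages = minPages;
        _maxPages = maxPages;
    }

    public Faker<Book> CreateFaker()
    {
        return new Faker<Book>()
            .RuleFor(b => b.Id, f => f.Random.Guid())
            .RuleFor(b => b.Title, f => f.Lorem.Text())
            .RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>())
            .RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName)
            .RuleFor(b => b.AuthorLastName, f => f.Person.LastName)
            // Number(min, max) includes both bounds.
            .RuleFor(b => b.PagesCount, f => f.Random.Number(_minPages, _maxPages))
            .RuleFor(b => b.PublicationDate, f => f.Date.PastDateOnly());
    }
}

// Same Book and Genre types used throughout the article.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}
```

A design like this can make sense when different callers need slightly different rules; if the rules never change, the static method shown earlier is probably enough.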

Expose a subtype of Faker, specific to the data type to be generated

If you don't want to use a method (static or not static, it doesn't matter), you can define a subtype of Faker<Book> whose customization rules are all defined in the constructor.

public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.Title, f => f.Lorem.Text());
        RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>());
        RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName);
        RuleFor(b => b.AuthorLastName, f => f.Person.LastName);
        RuleFor(nameof(Book.PagesCount), f => f.Random.Number(100, 800));
        RuleForType(typeof(DateOnly), f => f.Date.PastDateOnly());
    }
}

With this approach, you can simply create a new instance of BookGenerator and, again, call the Generate method to create new book instances.

var generator = new BookGenerator();
var books = generator.Generate(10);
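
Since BookGenerator is itself a Faker<Book>, it exposes the same Generate overloads. The sketch below, which assumes .NET 6 or later and the Bogus NuGet package, shows the two overloads you will probably use most: one returning a single item and one returning a list. For brevity, only a subset of the rules is repeated here.

```csharp
using System;
using System.Collections.Generic;
using Bogus;

var generator = new BookGenerator();

// Without arguments, Generate returns a single Book.
Book single = generator.Generate();

// With a count, Generate returns a List<Book> containing that many items.
List<Book> many = generator.Generate(3);

Console.WriteLine($"{single.AuthorFirstName} {single.AuthorLastName}, {single.PagesCount} pages");
Console.WriteLine($"Generated {many.Count} more books");

// Trimmed-down version of the article's BookGenerator (fewer rules, same idea).
public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.AuthorFirstName, f => f.Person.FirstName);
        RuleFor(b => b.AuthorLastName, f => f.Person.LastName);
        RuleFor(b => b.PagesCount, f => f.Random.Number(100, 800));
    }
}

// Same Book and Genre types used throughout the article.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}
```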

Method vs Subclass: When should we use which?

As we saw, both approaches produce the same result, and their usage is almost identical.

So, which one should you use?

Use the method approach (the first one) when you need:

  • Simplicity: If you need to generate fake data quickly and your rules are straightforward, using a method is the easiest approach.
  • Ad-hoc Data Generation: Ideal for one-off or simple scenarios where you don't need to reuse the same rules across your application.

Or use the subclass (the second approach) when you need:

  • Reusability: If you need to generate the same type of fake data in multiple places, defining a subclass allows you to encapsulate the rules and reuse them easily.
  • Complex scenarios and extensibility: Better suited for more complex data generation scenarios where you might have many rules or need to extend the functionality.
  • Maintainability: Easier to maintain and update the rules in one place.
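
To give an idea of what extensibility can look like with the subclass approach, here is a minimal sketch in which a derived generator changes a single rule. FantasyBookGenerator is a hypothetical name invented for this example, and the sketch relies on the assumption that registering a new rule for a property replaces the one registered earlier by the base constructor. It targets .NET 6 or later with the Bogus NuGet package, and repeats only a subset of the original rules.

```csharp
using System;
using System.Collections.Generic;
using Bogus;

List<Book> fantasyBooks = new FantasyBookGenerator().Generate(5);

foreach (Book book in fantasyBooks)
{
    Console.WriteLine($"{book.PagesCount} pages, genres: {string.Join(", ", book.Genres)}");
}

// Trimmed-down version of the article's BookGenerator.
public class BookGenerator : Faker<Book>
{
    public BookGenerator()
    {
        RuleFor(b => b.Id, f => f.Random.Guid());
        RuleFor(b => b.Genres, f => f.Random.EnumValues<Genre>());
        RuleFor(b => b.PagesCount, f => f.Random.Number(100, 800));
    }
}

// Hypothetical specialization: inherits every base rule and changes only Genres.
public class FantasyBookGenerator : BookGenerator
{
    public FantasyBookGenerator()
    {
        // The base constructor has already run at this point; this rule is
        // assumed to replace the Genres rule it registered.
        RuleFor(b => b.Genres, f => new[] { Genre.Fantasy });
    }
}

// Same Book and Genre types used throughout the article.
public class Book
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int PagesCount { get; set; }
    public Genre[] Genres { get; set; }
    public DateOnly PublicationDate { get; set; }
    public string AuthorFirstName { get; set; }
    public string AuthorLastName { get; set; }
}

public enum Genre
{
    Thriller, Fantasy, Romance, Biography
}
```

The shared rules stay in one place (the base class), while each specialization states only what differs.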

Further readings

If you want to learn a bit more about Bogus and use it to populate data used by Entity Framework, I recently published an article about this topic:

🔗Seeding in-memory Entity Framework with realistic data with Bogus | Code4IT

This article first appeared on Code4IT 🐧

However, the best place to learn about Bogus is the official documentation on GitHub.

🔗 Bogus repository | GitHub

Wrapping up

This article sort of complements the previous article about Bogus.

I think Bogus is one of the best libraries in the .NET universe, as having realistic data can help you improve the intelligibility of the test cases you generate. Also, Bogus can be a great tool when you want to showcase demo values without accessing real data.
