Subhodeep Sarkar

Generating Fake bulk data using Factory-Boy in Python

You often need fake data to test against. You can, of course, use some sample data in your tests. But what if you need hundreds of records, or even thousands of records, of test data? Then it can get tedious to create and maintain. Sometimes you just need a fake version of a class in your program but you want to have it be realistic data that behaves like the real class. Factories and fakes make this all possible.

In this article we will use Factory-boy, a Python package for generating fake data.

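Please make sure that you install the version of factory-boy used in this article with the command pip install factory-boy==2.12.0. The first few examples call generate() directly on a Faker object, which relies on the interface of this release and may not work the same way in other releases.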

Let's try to generate some fake data using factory-boy:

import factory  # import factory-boy
name = factory.Faker('name')  # create a Faker object with the 'name' provider
for _ in range(5):
    print(name.generate())  # generate and print 5 random names (not guaranteed to be unique)


factory.Faker() accepts an argument called the provider, which determines the type of data to generate, e.g. the 'name' provider generates names, the 'email' provider generates email addresses, etc.
Let's try using some more providers:

import factory #import factory-boy
name = factory.Faker('name') #create Faker object with 'name' provider
country = factory.Faker('country') #create Faker object with 'country' provider
email = factory.Faker('email') #create Faker object with 'email' provider

for _ in range(3):
    print(f'''
My name is {name.generate()}
I live in {country.generate()}
Reach out to me at {email.generate()}''')


Along with the Faker class, factory-boy has another module called fuzzy. Let's take a look at how it works:

import factory #import factory-boy module
import factory.fuzzy #import fuzzy module
name = factory.Faker('name') #create Faker object with 'name' provider
gender = factory.fuzzy.FuzzyChoice(choices=['girl','boy']) #create a FuzzyChoice object which selects randomly from the given options
grade = factory.fuzzy.FuzzyFloat(30,80) #create a FuzzyFloat object which generates a random float between the lower and upper limit
age = factory.fuzzy.FuzzyInteger(12,18) #create a FuzzyInteger object which generates a random integer between the lower and upper limit (inclusive)

for _ in range(3):
    print(f'''My name is {name.generate()}, I am a {gender.fuzz()} 
I got a grade of {grade.fuzz():.2f}% and my age is {age.fuzz()}
''')


To learn more about the different classes in the fuzzy module and the providers available for Faker, visit: https://factoryboy.readthedocs.io/en/stable

Now that we know how Factory-boy can mimic data, let's try mimicking data models, which are generally used for creating database tables in applications built with frameworks such as Flask and Django.
For this project, create a requirements.txt file, paste in the contents below, and then install all the required packages using pip3 install -r requirements.txt

# Pin dependencies that might cause breakage
Werkzeug==2.1.2
SQLAlchemy==1.4.46

# Dependencies for this project
Flask==2.1.2
Flask-SQLAlchemy==2.5.1

# Testing dependencies
nose==1.3.7
pinocchio==0.4.3
coverage==6.3.2
factory-boy==2.12.0
pylint==2.14.0

Since we will run the unit tests with nosetests, you may want to review my previous article, "Test-Driven Development in Python using Unittest and Nose" (https://dev.to/h4ck3rd33p/test-driven-development-in-python-using-unittest-and-nosetest-24ck)

Create a setup.cfg file for the nosetests configuration and paste in the contents below:

[nosetests]
verbosity=2
with-spec=1
spec-color=1
with-coverage=1
cover-erase=1
cover-package=models

[coverage:report]
show_missing = True

Now create two folders. The models folder contains the data model in account.py and the basic setup in __init__.py. The tests folder contains factories.py, which holds the factory that mimics the actual model, and test_account.py, which holds the associated unit tests for the application.
Your folder structure should end up looking like this:

.
├── models
│   ├── account.py
│   └── __init__.py
├── requirements.txt
├── setup.cfg
└── tests
    ├── factories.py
    └── test_account.py

2 directories, 6 files


Let's say we need to test a data model that handles customer accounts. We'll start by creating this data model. We will use a popular object-relational mapper called SQLAlchemy, so we create a db instance of the SQLAlchemy class. Now we build our model. We create a class called Account that inherits from the base model of SQLAlchemy. Now we can add our columns, which are represented as class variables. We add an id. It serves as a non-information-bearing key, so we mark it as the primary key. We add a name as a string and an email field as a string. We also add a phone number as a string. We make the phone number optional, so we set nullable to True. Let's add a Boolean field to indicate whether the account is disabled, and let's make the default False. Finally, we add a date_joined column as a Date. It is not nullable, and its server default is func.now(), so the database fills it in when no value is supplied.

models > __init__.py

"""
Data Models
"""
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///test.db'
db = SQLAlchemy(app)

models > account.py

"""
Account class
"""
import logging
from sqlalchemy.sql import func
from models import db

logger = logging.getLogger()


class DataValidationError(Exception):
    """Used for data validation errors when deserializing"""


class Account(db.Model):
    """ Class that represents an Account """

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(64))
    email = db.Column(db.String(64))
    phone_number = db.Column(db.String(32), nullable=True)
    disabled = db.Column(db.Boolean(), nullable=False, default=False)
    date_joined = db.Column(db.Date, nullable=False, server_default=func.now())

    def __repr__(self):
        return '<Account %r>' % self.name

    def to_dict(self) -> dict:
        """Serializes the class as a dictionary"""
        return {c.name: getattr(self, c.name) for c in self.__table__.columns}

    def from_dict(self, data: dict) -> None:
        """Sets attributes from a dictionary"""
        for key, value in data.items():
            setattr(self, key, value)

    def create(self):
        """Creates an Account in the database"""
        logger.info("Creating %s", self.name)
        db.session.add(self)
        db.session.commit()

    def update(self):
        """Updates an Account in the database"""
        logger.info("Saving %s", self.name)
        if not self.id:
            raise DataValidationError("Update called with empty ID field")
        db.session.commit()

    def delete(self):
        """Removes an Account from the data store"""
        logger.info("Deleting %s", self.name)
        db.session.delete(self)
        db.session.commit()

    ##################################################
    # CLASS METHODS
    ##################################################

    @classmethod
    def all(cls) -> list:
        """Returns all of the Accounts in the database"""
        logger.info("Processing all Accounts")
        return cls.query.all()

    @classmethod
    def find(cls, account_id: int):
        """Finds an Account by its ID
        :param account_id: the id of the Account to find
        :type account_id: int
        :return: an instance with the account_id, or None if not found
        :rtype: Account
        """
        logger.info("Processing lookup for id %s ...", account_id)
        return cls.query.get(account_id)

Now let's create the factory class that mimics the original Account class, and name it AccountFactory. Inside it, add an inner class Meta with a model attribute set to Account, which tells Factory-Boy which class it has to instantiate. Calling AccountFactory() then returns an Account instance filled with fake data, so the resulting object has all the methods that the Account class has.

tests > factories.py

"""
AccountFactory class using FactoryBoy
"""
import factory
from datetime import date
from factory.fuzzy import FuzzyChoice, FuzzyDate
from models.account import Account

class AccountFactory(factory.Factory):
    """ Creates fake Accounts """

    class Meta:
        model = Account

    id = factory.Sequence(lambda n: n)
    name = factory.Faker("name")
    email = factory.Faker("email")
    phone_number = factory.Faker("phone_number")
    disabled = FuzzyChoice(choices=[True, False])
    date_joined = FuzzyDate(date(2008, 1, 1))

id = factory.Sequence(lambda n: n) generates a consecutive sequence of numbers: 0, 1, 2, ...

Now let's write the unit tests for testing our AccountFactory

Test fixtures: These are methods that run before and after test cases to set up and clean up the state of the system.

  • setUpClass(cls): This is used to set the system state before running any test cases in the class. In our example, it is creating the DB and the tables.

  • tearDownClass(cls): This is used to clean up the system after running all the test cases in the current class. In our example, it is deleting any remaining test data and disconnecting from the database.

  • setUp(self): This is used to set up the system before running an individual test case. In our example, it deletes all rows from the Account table and commits the change.

  • tearDown(self): This is used to reset the system state after running an individual test case. In our example, it is closing the current database session.

  • setUpModule(): This is used to set up the system state before any test class in the module runs. We haven't used this in our example

  • tearDownModule(): This is used to reset the system state after all the test classes in the module have finished. We haven't used this in our example

tests > test_account.py

"""
Test Cases TestAccountModel
"""

from random import randrange
from unittest import TestCase
from models import db
from models.account import Account, DataValidationError
from factories import AccountFactory


class TestAccountModel(TestCase):
    """Test Account Model"""

    @classmethod
    def setUpClass(cls): #Runs before running any unit test
        """ Create table """
        db.create_all()  # make our sqlalchemy tables

    @classmethod
    def tearDownClass(cls): #Runs after running all the tests
        """Delete test data and disconnect from database"""
        db.session.query(Account).delete()
        db.session.commit()  # persist the delete before closing the session
        db.session.close()

    def setUp(self): #Runs before running every individual test
        """Delete all rows from the Account table"""
        db.session.query(Account).delete()
        db.session.commit()

    def tearDown(self): #Runs after running every individual test
        """Remove the session"""
        db.session.remove()

    ######################################################################
    #  T E S T   C A S E S
    ######################################################################

    def test_create_all_accounts(self):
        """ Test creating multiple Accounts """
        for _ in range(10):
            account = AccountFactory()
            account.create()
        self.assertEqual(len(Account.all()), 10)

    def test_create_an_account(self):
        """ Test Account creation using known data """
        account = AccountFactory()
        account.create()
        self.assertEqual(len(Account.all()), 1)

    def test_repr(self):
        """Test the representation of an account"""
        account = Account()
        account.name = "Foo"
        self.assertEqual(str(account), "<Account 'Foo'>")

    def test_to_dict(self):
        """ Test account to dict """
        account = AccountFactory()
        result = account.to_dict()
        self.assertEqual(account.name, result["name"])
        self.assertEqual(account.email, result["email"])
        self.assertEqual(account.phone_number, result["phone_number"])
        self.assertEqual(account.disabled, result["disabled"])
        self.assertEqual(account.date_joined, result["date_joined"])

    def test_from_dict(self):
        """ Test account from dict """
        data = AccountFactory().to_dict()
        account = Account()
        account.from_dict(data)
        self.assertEqual(account.name, data["name"])
        self.assertEqual(account.email, data["email"])
        self.assertEqual(account.phone_number, data["phone_number"])
        self.assertEqual(account.disabled, data["disabled"])

    def test_update_an_account(self):
        """ Test Account update using known data """
        account = AccountFactory()
        account.create()
        self.assertIsNotNone(account.id)
        account.name = "Rumpelstiltskin"
        account.update()
        found = Account.find(account.id)
        self.assertEqual(found.name, account.name)

    def test_invalid_id_on_update(self):
        """ Test invalid ID update """
        account = AccountFactory()
        account.id = None
        self.assertRaises(DataValidationError, account.update)

    def test_delete_an_account(self):
        """ Test Account deletion """
        account = AccountFactory()
        account.create()
        self.assertEqual(len(Account.all()), 1)
        account.delete()
        self.assertEqual(len(Account.all()), 0)

Read the comments and docstrings in the code to understand more about the test cases. Now let's run nosetests:

nosetests

Congratulations! All the test cases pass, so we can conclude that the objects created by AccountFactory behave like instances of the Account class, and that we have tested the application with fake data generated by Factory-Boy!
