I was writing a program to process some CSV data, exported from a vendor. And it has a date-time column in some creative, funky format.
To help with ingesting this into a Pandas dataframe, I wrote a converter function: it takes that string, and returns a Python datetime object. Simple, right?
But you know, this is a PERFECT place to write a unit test. And I decided to use Pytest, starting with this simple test:
from datetime import datetime
from myprog import parse_vendor_datetime
def test_parse_vendor_datetime():
actual = parse_vendor_datetime('12/25/38 1:14am EST')
expected = datetime(2038, 12, 25, 1, 14)
assert expected == actual
I run the test, and it fails, like it should. Then I write parse_vendor_datetime(), and the test passes. Great!
Now, what no one ever talks about is how TDD works with version control. The secret is that there's actually a really productive way to do it.
Because after I check this test-passing code into version control, in my private branch... the next thing I do is run my program on some real data. And I find it doesn't handle time zones quite right.
Okay... so here's what I do. I get an example of the problematic data, and add to my test function:
def test_parse_vendor_datetime():
actual = parse_vendor_datetime('12/25/38 1:14am EST')
expected = datetime(2038, 12, 25, 1, 14)
assert expected == actual
actual = parse_vendor_datetime('08/17/38 11:05pm EDT')
expected = datetime(2038, 8, 17, 11, 5)
assert expected == actual
I run it, and verify it fails like it should. And I check THAT into version control - again, in my private branch. Because when I have a correctly failing test for a bug, that's progress. Even if that test isn't passing yet.
But it bugs me that this code is repetitive. Doesn't it bug you? Ugh. Makes my skin crawl, just looking at it.
So I refactor it, to use something Pytest calls "parametrized tests":
@pytest.mark.parametrize(
'dt_str,expected', [
('12/25/38 1:14am EST', datetime(2038, 12, 25, 1, 14)),
('08/17/38 11:05pm EDT', datetime(2038, 8, 17, 11, 5)),
])
def test_parse_vendor_datetime(dt_str, expected):
actual = parse_vendor_datetime(dt_str)
assert expected == actual
See how the test code is not repetitive anymore? Even if you don't now what this "parametrize()" thing is, because I haven't explained it, you still understand this is an improvement. If I need to change something, or add another test case, I only need to change the code in one place.
So now, I check THIS into version control. The test is still failing. But changing the code to be less repetitive - more maintainable - is still forward progress, and worthy of checkpointing.
And: Because I had checked in the code before it too, if I mess up the test somehow, I can always roll back to the point it was correct.
What's fascinating:
When you build on this process, and add a few more ingredients... there's a COGNITIVE benefit.
It's potent. You can actually get into a state of flow very easily. In a way that's easy to maintain, and to keep your focus, and become extremely productive.
Having a blast, as you continue cranking out code. Riding that wave for as long as you want. It comes out of how you use version control along with writing automated tests, and doing test-driven development.
If you liked this, you will enjoy the Powerful Python Newsletter.
Top comments (0)