
Garrett Gregor


Fakeish

If you have used Faker before, you have probably seen it cut your development time dramatically, especially when seeding data or practicing test-driven development. Not having to think about what to name something or which zip code to give it can make you more efficient.

Or can it?

While building an application that generated zip codes in its seed data, I found that the difference between fake data and fakeish data is real. Case in point: using that generated data to fetch weather information for a particular zip code from a third-party API. When generated data looks and feels real, the resulting failures can be hard to debug:

90210 - sometimes generated by Faker; Beverly Hills, CA
90200 - sometimes generated by Faker; nowhere in CA

I think this is a small logical gap in the way Faker produces data: you pass in a real state abbreviation, but you don't necessarily get back a real zip code, just one that could be real. That's why I wanted to contribute to the Faker project and help devs who might be introducing similar bugs without being aware of it.
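To make the gap concrete, here is a minimal sketch of the difference between a zip code that has the right shape and one that is known to exist. The `REAL_ZIPS` set and both helper names are hypothetical, and the set holds only a tiny hand-picked sample rather than a full list.

```ruby
require 'set'

# Hypothetical sample of real zip codes; a full list would hold tens of thousands.
REAL_ZIPS = Set['90210', '90211', '94105'].freeze

# True when the string merely has the shape of a five-digit US zip code.
def zip_shaped?(zip)
  zip.match?(/\A\d{5}\z/)
end

# True only when the zip appears in the known-real sample above.
def real_zip?(zip)
  REAL_ZIPS.include?(zip)
end

%w[90210 90200].each do |zip|
  puts "#{zip}: shaped=#{zip_shaped?(zip)} real=#{real_zip?(zip)}"
end
```

Both values pass the shape check, which is why the bad one is easy to miss until a third-party API rejects it.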

I had the privilege of working closely with Andy Andrea, and together we came up with a plan to add a zip_code_real_us method to Faker::Address. Alongside the existing, only-sometimes-real zip generator, this method would return a random real zip code for a provided state abbreviation, or simply a random real zip code when no state is given. Faker users would still be able to generate truly random data when they want it.

The first thing I did was build a hash of real zip codes from a CSV available at SimpleMaps.

[Image: Ruby method to generate hash map of zips]
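Since the original code survives only as an image, here is a rough sketch of how such a hash could be built with Ruby's standard `csv` library. It is not the method from the image. The column names `zip` and `state_id`, the inline sample data, and the `zips_by_state` name are all assumptions made for illustration.

```ruby
require 'csv'

# Stand-in for the downloaded file. The column names are assumptions.
SAMPLE_CSV = <<~CSV
  zip,city,state_id
  90210,Beverly Hills,CA
  94105,San Francisco,CA
  10001,New York,NY
CSV

# Groups zip codes by state abbreviation, e.g. "CA" => ["90210", "94105"].
# Zips stay strings so that leading zeros are never lost.
def zips_by_state(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, zips|
    (zips[row['state_id']] ||= []) << row['zip']
  end
end

ZIPS = zips_by_state(SAMPLE_CSV)
p ZIPS
```

For a file on disk, `File.read(path)` could be passed to the same helper.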

Then I wrote a rudimentary implementation that drew from a helper/fixture file:

[Image: rudimentary implementation to generate "real" random zip]
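A first pass along these lines might look like the sketch below. The `REAL_US_ZIPS` fixture is a hypothetical three-entry stand-in for the full hash, and the method body is an illustration of the idea rather than the code from the image.

```ruby
# Hypothetical fixture; in practice this would be the full hash built from the CSV.
REAL_US_ZIPS = {
  'CA' => %w[90210 94105],
  'NY' => %w[10001]
}.freeze

# Returns a random real zip for the given state abbreviation,
# or a random real zip from any state when no abbreviation is given.
def zip_code_real_us(state_abbreviation = nil)
  return REAL_US_ZIPS.values.flatten.sample if state_abbreviation.nil?

  zips = REAL_US_ZIPS[state_abbreviation.to_s.upcase]
  raise ArgumentError, "unknown state abbreviation: #{state_abbreviation}" if zips.nil?

  zips.sample
end

puts zip_code_real_us('CA')
puts zip_code_real_us
```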

Andy and I agreed, though, that to be truly useful to the Faker team, the data should come from a YAML file, since that is how Faker was already fetching its zips. With that in mind, I worked on converting the hash to YAML. Fortunately, I learned that Ruby has a built-in method called to_yaml, because it would have been incredibly tedious to translate this by hand:

[Image: screenshot of hash map zips]

to this:

[Image: screenshot of YAML zips]
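The conversion itself is short. This sketch uses a hypothetical three-entry hash in place of the real data and calls `to_yaml`, which becomes available on hashes once the standard `yaml` library is required.

```ruby
require 'yaml'

# Hypothetical sample; the real hash would come from the CSV step.
zips = { 'CA' => %w[90210 94105], 'NY' => %w[10001] }

# Serialize the hash to YAML text.
yaml_text = zips.to_yaml
puts yaml_text

# Loading the text back should yield the original hash, with zips still strings.
round_trip = YAML.safe_load(yaml_text)
puts round_trip == zips
```

Keeping the zip codes as strings matters here, since a zip parsed as an integer would lose any leading zero.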

(Unless I knew Vim and/or was more versed in writing quick macros, which, fair point, I should probably learn at some point.) The final method, which I called zip_code_real_us, looked something like this:

[Image: Faker implementation of real zips]
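The Faker version is only preserved as an image, so the following is a standalone sketch of the same idea and does not use Faker's internals. The `RealZips` module, the inline YAML, and the error handling are hypothetical. The `state_abbreviation:` keyword simply mirrors the argument described earlier in this post.

```ruby
require 'yaml'

# Hypothetical stand-in for a YAML data file of real zips keyed by state.
REAL_ZIPS_YAML = <<~YAML
  CA:
    - '90210'
    - '94105'
  NY:
    - '10001'
YAML

module RealZips
  DATA = YAML.safe_load(REAL_ZIPS_YAML).freeze

  # Returns a random real zip for the state, or for any state when none is given.
  def self.zip_code_real_us(state_abbreviation: nil)
    return DATA.values.flatten.sample if state_abbreviation.nil?

    zips = DATA[state_abbreviation.to_s.upcase]
    raise ArgumentError, "unknown state abbreviation: #{state_abbreviation}" if zips.nil?

    zips.sample
  end
end

puts RealZips.zip_code_real_us(state_abbreviation: 'CA')
puts RealZips.zip_code_real_us
```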

Conclusion

TL;DR: If you need real zip codes for any sort of data, generating them from Faker can be hit or miss.

I think the main thing I learned from all of this is just how far people are willing to go out of their way to help, simply for the sake of helping and seeing others succeed. I want to give my sincerest thanks to Andy for giving me so much of his time, offering advice and insight, and teaching me a little bit about just how powerful git is.

While my implementation of zip_code_real_us didn't make it into Faker, I'm inspired to give back to this community by making a small gem that helps Rubyists generate real zip codes. That's not ready yet, but hopefully in the meantime this gives you some solace in knowing that it's really not you, it's them.

[GIF: it's not you, it's me]
