
Ruby CSV Library

Anders Hornor ・5 min read

I recently worked through a take-home code challenge and had some fun converting a CSV file into JSON, but after the interviewers reviewed my code I realized I had made my life much harder than it needed to be. I did the challenge in Ruby because I am most familiar with it, but I didn’t do enough homework before starting. Maybe it was my nerves, or maybe it was my drive to do everything from scratch, but I didn’t research any Ruby libraries that might have helped me with my task.
I should say here that on my journey into programming I assumed I was expected to do it all myself. I assumed I was supposed to know every little detail of a language and use its rawest form for everything I needed. It is good to know how anything you use works “under the hood,” and working through how a language is built helps you learn it, but no one expects you to reinvent the wheel (unless otherwise specified), especially in a coding challenge.

Take a step back and think about it: anything anyone does in any field or science today (anything, really) builds upon all the work that others have passed on in or around that field. I mean, is that not what these higher-level languages are? The culmination of millions (billions?) of hours of work to refine something raw into a powerful tool to make work a little bit easier, breathe life a little bit harder, and push boundaries a little bit farther?

So after that tangential life lesson, back to the matter at hand. After working through this code challenge I became aware of one of the many libraries that this car of a wheel called Ruby offers: a tool built into the language I felt confident I knew, and one I probably should have assumed existed given the nature of the language. I became aware of the CSV library and class built into Ruby. Check out the documentation for more on the subject later, but read on here to get a taste of this amazing tool.

Basic Access Methods

Like all libraries built into Ruby, to access the class methods provided you need a reasonably current version of Ruby (it looks like anything v2.6.1 or newer will do) and the require statement, require 'csv', at the top of your file.

With access to the CSV class methods you can now use two powerful methods for cracking into the files or values you want access to:

CSV.read("input_filename_or_path_here.csv")
CSV.parse("1,Comma,Separated\n2,Values,Are\n3,The,Best")

For example's sake, let's say that the input_filename_or_path_here.csv file has a header line and three rows in it that look like

id,first_word,second_word
1,Comma,Separated
2,Values,Are
3,The,Best

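The output of CSV.read on that file is an array of arrays (CSV.parse gives you the same shape for whatever string you pass it), and every value is a string, like so:

[
  ["id", "first_word", "second_word"],
  ["1", "Comma", "Separated"],
  ["2", "Values", "Are"],
  ["3", "The", "Best"]
]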
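
To see that shape for yourself, here is a minimal sketch that parses the sample data from a string, so no file is needed. The variable names data and rows are mine, just for illustration.

```ruby
require 'csv'

# The sample data from above, inlined as a string (illustrative only).
data = "id,first_word,second_word\n1,Comma,Separated\n2,Values,Are\n3,The,Best\n"

rows = CSV.parse(data)

rows.class   # Array (plain nested arrays when no options are given)
rows.first   # the header line, treated like any other row
rows[1][1]   # "Comma"
rows[1][0]   # "1" (a String, not an Integer)
```

Without any options the header line is just the first row, and nothing gets converted from a string.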

You can work with this output like any other array if you only need a value or two, or you can use the options built into the class methods to shape the output as it's created (with headers: true, for instance, you get a CSV::Table object instead of a plain array).

The options below display the default values that are assumed when a CSV method is used without any given options.

  • headers: false
  • col_sep: ","
  • quote_char: '"'
  • field_size_limit: nil
  • unconverted_fields: nil
  • return_headers: false
  • row_sep: :auto
  • skip_blanks: false
  • force_quotes: false
  • skip_lines: nil
  • liberal_parsing: false
  • quote_empty: true
  • header_converters: nil
  • converters: nil
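
As a rough illustration of overriding a couple of these defaults, the sketch below parses a made-up semicolon-separated string with a blank line in it. The input is hypothetical. col_sep and skip_blanks are the options from the list above.

```ruby
require 'csv'

# Hypothetical input: semicolons instead of commas, plus an empty line.
data = "1;Comma;Separated\n\n2;Values;Are\n"

# col_sep changes the field delimiter; skip_blanks drops the empty line.
rows = CSV.parse(data, col_sep: ";", skip_blanks: true)

rows  # [["1", "Comma", "Separated"], ["2", "Values", "Are"]]
```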

Of note are the headers and converters options which I'll dive into a bit here.

Headers

Passing the headers option, as in CSV.read("filepath.csv", headers: true), tells the read method to treat the first line as headers, and the method then returns a CSV::Table instead of a plain array. In the test case above, setting headers to true organizes the table so that each column is accessible by its header and each row's values can be looked up by header name. For example,

table = CSV.read("input_filename_or_path_here.csv", headers: true)
table["id"] #=> ["1", "2", "3"]
table.first #=> CSV::Row "id":"1", "first_word":"Comma", "second_word":"Separated"
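
This is also roughly what my original CSV-to-JSON challenge boils down to once headers are in play. The sketch below is one possible approach, not the only one: it assumes the sample data from earlier (inlined as a string) and uses Ruby's standard json library. The names data, records, and json are mine.

```ruby
require 'csv'
require 'json'

# Sample data from earlier, inlined so the example runs without a file.
data = "id,first_word,second_word\n1,Comma,Separated\n2,Values,Are\n3,The,Best\n"

table = CSV.parse(data, headers: true)

# Each CSV::Row can be turned into a Hash keyed by header name.
records = table.map(&:to_h)
records.first  # {"id"=>"1", "first_word"=>"Comma", "second_word"=>"Separated"}

# Serialize the array of hashes as a JSON string.
json = JSON.generate(records)
```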

Converters

Converters are useful for changing the type of the values you add to your table. What I mean is that the read method sans converters adds each value to the output table as a string, so the ids above are all saved as strings. To make our tables more representative, using a converter when we first read the file will add flavor, texture, weight, feel, nuance, luminosity, quality, originality, shape, capacity, and inevitably power to the data.
The "built-in" converters are :integer, :float, :numeric, :date, :date_time, and :all. :numeric is a combination of :integer and :float (it tries each in turn), and :all is a combination of :date_time and :numeric.
Converters are passed as just another option, e.g.

table = CSV.read("filepath.csv", headers: true, converters: :numeric)
table.first[0].class #=> Integer
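
To make the :numeric behavior a little more concrete, here is a small sketch with a made-up price column (the data is hypothetical, not from the challenge). Values that look like integers become Integers, values that look like floats become Floats, and everything else stays a String.

```ruby
require 'csv'

# Hypothetical data with an integer column, a float column, and a text column.
data = "id,price,label\n1,2.50,Comma\n2,3,Values\n"

table = CSV.parse(data, headers: true, converters: :numeric)
row = table.first

row["id"]     # 1       (Integer)
row["price"]  # 2.5     (Float)
row["label"]  # "Comma" (left alone, since it is not numeric)
```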

You can also create your own converter, which adds even more capacity to the CSV methods. It looks like so:

# Map "true"/"false" (any case) to booleans; leave every other value untouched.
CSV::Converters[:boolean] = ->(value) { { "true" => true, "false" => false }.fetch(value.to_s.downcase, value) }
table = CSV.parse("1,true\n2,false\n3,false", converters: :boolean)
table[0][1] #=> true
table[0][1].class #=> TrueClass
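
If you only need a converter once, you don't have to register it by name. As far as I can tell, the converters option also accepts a lambda directly, and you can chain it with the built-in ones. The strip lambda below is a hypothetical one-off, just to sketch the idea.

```ruby
require 'csv'

# Hypothetical one-off converter: trims surrounding whitespace from each field.
strip = ->(value) { value.strip }

# Converters run in order: strip first, then the built-in :integer converter.
rows = CSV.parse("1, Comma ,Separated\n", converters: [strip, :integer])

rows.first  # [1, "Comma", "Separated"]
```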

Bonus: The classic Write-a-File

So now you can read some files, but what about writing them? There's a lib for that ;D.
Like reading a CSV file, to create one you use the built-in method open, obviously... A weird Ruby thing (or maybe it's programming languages in general, I'm new to this ok), of which there are very few, is built-in class methods that have a million uses and obscure, general names, like CSV.open. There are also CSV.generate and CSV.new methods, but why use those when there's CSV.open, the one-stop shop for all your CSV needs? Wait, no, that's CVS. Or is it!? Well, yes.
To make a new CSV file quickly and efficiently, run something like:

CSV.open('filename.csv', "wb") { |csv| csv << [1,"yes","yes"]; csv << [2,"no","no"] }

The CSV.open method takes a few arguments, namely the filename and then a mode that says whether to write the file, read it, or otherwise. It also has access to the options covered above, which add quality to the values you're compiling.
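
Here is a slightly fuller sketch of writing, assuming you want a header line in the output. The file name words.csv and the temporary directory are stand-ins of mine. write_headers and headers are CSV options, and CSV.generate is the string-building cousin of CSV.open mentioned above.

```ruby
require 'csv'
require 'tmpdir'

# Write into a throwaway directory so the example doesn't clutter anything.
path = File.join(Dir.mktmpdir, "words.csv")  # hypothetical file name

# write_headers: true emits the headers array as the first line of the file.
CSV.open(path, "w", write_headers: true, headers: ["id", "first_word"]) do |csv|
  csv << [1, "Comma"]
  csv << [2, "Values"]
end

written = File.read(path)  # "id,first_word\n1,Comma\n2,Values\n"

# CSV.generate builds the same kind of output as a String instead of a file.
generated = CSV.generate do |csv|
  csv << [1, "yes"]
  csv << [2, "no"]
end
```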

Conclusion

Some thoughts to sit with.

*Do not reinvent the wheel... It's generally unnecessary, time-consuming, and under-appreciated because of the first two (sometimes it's a good learning exercise, but also give credit to those who did it without knowing if it was possible).
*Ruby has tons of great built-in libraries and class methods, like the CSV library!
*Life's hard so Git-gud.

