DEV Community

jorin

Posted on

CSV Challenge

You got your hands on some data that was leaked from a social network and you want to help the affected people.

Luckily you know a government service to automatically block a list of credit cards.

The service is a little old school though and you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.

The CSV file should have two columns, Name and Credit Card. It must also be named after the following pattern:

YYYYMMDD.csv.

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.

The data was published here:

data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user and write this as a one-liner? How would you solve this in your favorite programming language?

Show your solution in the comments below!
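For reference, the steps above (filter out users without credit card details, write two columns, name the file after today's date) can be sketched in Python. The sample data here is made up to mirror the shape of data.json; the real feed would come from the published gist:

```python
import csv
from datetime import date

def write_affected(users, filename):
    """Write Name/Credit Card rows for users whose card details leaked."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Credit Card"])
        for u in users:
            if u.get("creditcard"):  # skip users without leaked card data
                writer.writerow([u["name"], u["creditcard"]])

# Hypothetical sample in place of the real data.json:
sample = [
    {"name": "Ada", "creditcard": "1234-5678-9012-3456"},
    {"name": "Bob", "creditcard": None},
]
write_affected(sample, date.today().strftime("%Y%m%d") + ".csv")
```

This is only one possible shape; the comments below show many others.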

Oldest comments (33)

Thomas Rayner

PowerShell to the rescue!

$json = invoke-webrequest 'gist.githubusercontent.com/jorinvo...' | convertfrom-json

$json | select name,creditcard | export-csv "$(get-date -format yyyyMMdd).csv" -NoTypeInformation

Daniel Coturel

Excellent, man

Tobias Salzmann • Edited

ramda-cli:

curl -s https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| ramda 'filter where name: (complement isNil), creditcard: (complement isNil)' 'map (x) -> x.name + ", " + x.creditcard' -o raw > `date +%Y%m%d.csv`

scala:

import java.io.{BufferedWriter, FileOutputStream, OutputStreamWriter}
import java.text.SimpleDateFormat
import java.util.Date

import io.circe.generic.auto._
import io.circe.parser._

object Data extends App {
  case class CCInfo(name: Option[String], creditcard: Option[String])

  val url = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
  val json = scala.io.Source.fromURL(url).mkString

  val infos = decode[List[CCInfo]](json).toOption.get

  val lines = infos.collect{case CCInfo(Some(name), Some(creditcard)) => s"$name, $creditcard"}

  Helper.writeToFile(lines, s"${Helper.formatDate("yyyyMMdd")}.csv")
}

object Helper {
  def writeToFile(lines: TraversableOnce[String], fileName: String): Unit = {
    val writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileName)))
    for (x <- lines) {
      writer.write(x + "\n")
    }
    writer.close()
  }

  def formatDate(format: String, date: Date = new Date()) =
    new SimpleDateFormat(format).format(date)
}
Florian Rohrer

Nice post!

import json
from csv import DictWriter

with open("data.json", "r") as f:
    users = json.load(f)

cols = ["name", "creditcard"]
with open("20150425.csv", "w", newline='') as f:
    dw = DictWriter(f, cols)
    dw.writeheader()
    for u in users:
        if u["creditcard"]:
            dw.writerow({k: u[k] for k in cols})

All users share the same date. So I didn't bother and didn't write into separate files.
Another thing: I was going to write "Hey, that's not valid JSON you are giving us", because I saw the objects are in a list and that list is not wrapped in an outer object. But my Python parser did not complain, so it turns out to be valid. You learn something new every day.

Tobias Salzmann • Edited

Seems like JSON can have an array at the root, even according to the first standard: tools.ietf.org/html/rfc4627, section 2.

jorin • Edited

Having an array at the top level of a JSON document is indeed valid, although it is definitely an anti-pattern. By doing so you block yourself from adding any meta information in the future.
If you build an API, you always want to wrap an array in an object. Then you can add additional fields like possible errors or pagination later on.
e.g.

{
  "data": [],
  "status": "not ok",
  "error": { "code": 123, "message": "..." },
  "page": 42
}
Tobias Salzmann

Personally, I'd prefer the array in most cases. If I call an endpoint called customers, I would expect it to return an array of customers, not something that contains such an array, might or might not have an error and so on.
If I want to stream the response, I'd also be better off with an array, because whatever streaming library I use probably supports it.

Ayman Nedjmeddine • Edited

A one-liner if you're a Linux user 😉

curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq -r '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
> `date +%Y%m%d`.csv

However, there is something you have not mentioned in your post: should the CSV file have a header line?

If yes, then use this:

echo 'name,creditcard' > `date +%Y%m%d`.csv && \
curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq -r '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
>> `date +%Y%m%d`.csv
Richard Metzler

Doesn't the second solution need a >> in the last line, so the output is appended?

Ayman Nedjmeddine

Yes, it does. (Didn't copy the correct version)

Thanks ☺

Devin Weaver

This adds quotes.

"Dax Brekke II,1234-2121-1221-1211"
"Brando Stanton Jr.,1228-1221-1221-1431"
"Lacey McDermott PhD,"
"Elza Bauch,"

Maybe add this sed command to strip them:

curl -sSLo- https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json \
| jq '.[] | {name: .name, creditcard: .creditcard} | join(",")' \
| sed -e 's/^"//' -e 's/"$//' -e 's/\\"/"/g' \
> "$(date +%Y%m%d).csv"
Thorsten Hirsch • Edited

Well, at work I would use a tool called "IBM Transformation Extender", which specialises in data transformation. It breaks the job down into 3 tasks:

  1. create the csv output format (there's a gui for that)
  2. import some example json data in order to create the input format
  3. develop the "map" by configuring 1 as output, 2 as input, and the following "mapping rule" for the transformation:
=f_record(EXTRACT(Record:json, PRESENT(creditcard:.:json)))

...and in f_record() one would simply drag'n'drop the name and the credit card fields from the input to the output.

Not the cheapest solution, obviously, but its maintainability is great if you have hundreds of these mappings.

Jakub Karczewski • Edited

Since I started learning Ruby this week, here's my solution written in it :D

require 'open-uri'
require 'json'
require 'date'

url = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
data = JSON.parse(open(url).read)
i = 0

File.open(DateTime.now.strftime("%Y%m%d") + ".csv", "w") do |f|
    f.write("Name,\"Credit Card\"")
    data.each do |record|
        if record["creditcard"]
            i += 1
            name = record["name"].match(/\s/) ? "\"" + record["name"] + "\"" : record["name"]
            f.write("\n" + name + "," + record["creditcard"])
        end
    end
end

printf("Created CSV file, %d affected accounts detected", i)

Thanks for another great challenge Jorin :)

Jonathan Stowe

Perl 6? :

use JSON::Fast;

my $json = 'data.json'.IO.slurp;

my $d = Date.today;
my $out-filename = sprintf "%04i%02i%02i.csv", $d.year, $d.month, $d.day;

my $out = $out-filename.IO.open(:w);

for from-json($json).list -> %row {
    if %row<creditcard> {
        $out.say: %row<name>, ',', %row<creditcard>;
    }
}
$out.close;

Of course in reality you'd probably want to use Text::CSV to properly format the CSV output in order to handle quoting and escaping properly.

Šimon Let • Edited

Oneliner:

curl "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json" 2>/dev/null | \
 jq '.[] | .name +","+ .creditcard' --raw-output > `date +"%Y%m%d.csv"`
reed1

Almost all submissions (except 2 at this time) write the CSV by hand instead of using a library. The output will not be valid if a value contains , or "
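For example, Python's csv module handles that escaping automatically (the name here is made up to include both troublesome characters):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# A name containing both a comma and a double quote:
writer.writerow(['Smith, John "JJ"', "1234-5678-9012-3456"])
print(buf.getvalue())
# The field is quoted and the inner quotes are doubled:
# "Smith, John ""JJ""",1234-5678-9012-3456
```

Rolling the same rules by hand is where the hand-written solutions go wrong.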

Michael

True, I had not thought of that.
I open the CSV in LibreOffice to make sure it comes out fine, but with really big files that might not be possible.

Doshirae
require "json"
require "open-uri"
require "date"

url = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
data = JSON.parse(open(url).read)

filtered_data = data.select { |line| not line["creditcard"].nil? }
file = File.open(DateTime.now.strftime("%Y%m%d") + ".csv", 'w')
file.write "Name,Creditcard\n"
filtered_data.each do |line|
    file.write [line["name"], line["creditcard"]].join(',')
    file.write("\n")
end
file.close

Or if you guys like nasty one-liners (require statements don't count):

require "json"
require "open-uri"
require "date"
File.open(DateTime.now.strftime("%Y%m%d") + ".csv", 'w') { |file| file.write "Name,Creditcard\n"; JSON.parse(open("https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json").read).select { |line| not line["creditcard"].nil? }.each { |line| file.write "#{line['name']},#{line['creditcard']}\n" } }

I'm trying to do it in Elixir now :D