DEV Community

Cover image for CSV Challenge
jorin
jorin

Posted on

CSV Challenge

You got your hands on some data that was leaked from a social network and you want to help the poor people.

Luckily you know a government service to automatically block a list of credit cards.

The service is a little old school though and you have to upload a CSV file in the exact format. The upload fails if the CSV file contains invalid data.

The CSV files should have two columns, Name and Credit Card. Also, it must be named after the following pattern:

YYYYMMDD.csv.

The leaked data doesn't have credit card details for every user and you need to pick only the affected users.

The data was published here:

data.json

You don't have much time to act.

What tools would you use to get the data, format it correctly and save it in the CSV file?


Do you have a crazy vim configuration that allows you to do all of this inside your editor? Are you a shell power user and write this as a one-liner? How would you solve this in your favorite programming language?

Show your solution in the comments below!

Latest comments (33)

Collapse
 
davebucklin profile image
Dave Bucklin

awk:

awk '
    BEGIN {FS=",";OFS=","}
    /creditcard/ {
        split($1,namearr,":")
        name = namearr[2]; gsub(/"/,"",name)
        split($5,dtarr,":")
        dt = dtarr[2]; gsub(/"| .+|-/,"",dt)
        split($6,ccarr,":")
        cc = ccarr[2]; gsub(/"|}/,"",cc)
        fname=dt ".csv"
        print name, cc >> fname
    }
' data.json
Collapse
 
jorinvo profile image
jorin

Just leaving a note here for everyone that would like see more tools and solutions. Checkout the original CSV Challenge.

Collapse
 
rpalo profile image
Ryan Palo

Using the CSV module to avoid any quoting pitfalls. :)

require 'CSV'
require 'date'
require 'JSON'

data = JSON.parse(`curl #{ARGV[0]}`)
filename = Date.today.strftime('%Y%m%d') + '.csv'

CSV.open("#{filename}.csv", 'w') do |csv|
  data
    .select { |item| item['name'] && item['creditcard'] }
    .map { |item| [item['name'], item['creditcard']] }
    .sort
    .each { |item| csv << item }
end
Collapse
 
jorinvo profile image
jorin • Edited

Ruby is still one of the most pretty languages!
Maybe you can use the open(url).read from require 'open-uri' instead of curl to allow it to run on other systems 🙂

Alernatively could look like this:

CSV.open "#{Date.today.strftime '%Y%m%d'}.csv", 'w' do |csv|
  JSON.parse(open(ARGV[0]).read).each { |x| csv << x if x['creditcard'] }
end
Collapse
 
rpalo profile image
Ryan Palo

Oh, I like that!

  1. I didn't know about those extra options for CSV. Awesome.
  2. I didn't know about the open-uri built-in. Also awesome.
  3. I love the short and sweet each block! It even feels a little Pythonic, which is nice. Also also awesome!
Collapse
 
alexdante profile image
alex䷰dante • Edited
import csv
import requests

LEAK_URL = 'https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json'

with open('20171112.csv', 'w') as csv_file:
    writer = csv.DictWriter(csv_file, ['Name', 'Credit Card'])
    writer.writeheader()
    writer.writerows(
        {'Name':x['name'],'Credit Card':x['creditcard']} for x in requests.get(LEAK_URL).json() if x['creditcard']
    )
Collapse
 
alad profile image
Al • Edited

R

library("jsonlite")

frames <- fromJSON("https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json")
frames <- frames[!is.na(frames$creditcard),]
frames <- frames[,c("name","creditcard")]

write.csv(frames, file="20171112.csv", row.names=FALSE)
Collapse
 
sukima profile image
Devin Weaver

Since the input JSON could be really large, here is a Node.JS steaming version (using stream-json package):

#!/usr/bin/env node
let fs = require('fs');
let { Transform } = require('stream');
let StreamArray = require("stream-json/utils/StreamArray");
let stream = StreamArray.make();

function escapeCSV(str) {
  if (str == null) { return ''; }
  return /[",]/.test(str) ? `"${str.replace(/"/g, '\\"')}"` : str;
}

class CsvStream extends Transform {
  constructor() {
    super({objectMode: true});
  }
  _transform(chunk, enc, cb) {
    let { name, creditcard } = chunk.value;
    let line = [name, creditcard].map(escapeCSV).join(',');
    this.push(`${line}\n`);
    cb();
  }
}

process.stdin
  .pipe(stream.input);

stream.output
  .pipe(new CsvStream())
  .pipe(process.stdout);
Collapse
 
jorinvo profile image
jorin

Nice! There is also csv-write-stream then you can save some code :)

Collapse
 
mindflavor profile image
Francesco Cogno

Aaaand Rust :)

Really an overkill for this task but fun nevertheless!

extern crate chrono;
extern crate csv;
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate tokio_core;

use futures::prelude::*;
use futures::future::ok;
use tokio_core::reactor::Core;
use hyper::client::Client;
use hyper_tls::HttpsConnector;
use chrono::{DateTime, FixedOffset};
use std::collections::HashMap;
use csv::Writer;
use std::fs::File;


#[derive(Debug, Deserialize, Clone)]
struct Record {
    name: String,
    email: Option<String>,
    city: Option<String>,
    mac: String,
    timestamp: String,
    creditcard: Option<String>,
}

#[derive(Debug, Clone)]
struct RecordParsed {
    record: Record,
    ts: DateTime<FixedOffset>,
}

const FORMAT: &'static str = "%Y%m%d";

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::configure()
        .connector(HttpsConnector::new(4, &core.handle()).unwrap())
        .build(&core.handle());

    let uri = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
        .parse()
        .unwrap();

    let fut = client.get(uri).and_then(move |resp| {
        resp.body().concat2().and_then(move |body| {
            let array: Vec<Record> = serde_json::from_slice(&body as &[u8]).unwrap();
            let mut a_parsed: HashMap<String, Vec<RecordParsed>> = HashMap::new();

            array
                .into_iter()
                .filter(|item| item.creditcard.is_some())
                .map(|item| {
                    let dt =
                        DateTime::parse_from_str(&item.timestamp, "%Y-%m-%d %H:%M:%S %z").unwrap();

                    let rp = RecordParsed {
                        record: item,
                        ts: dt,
                    };

                    let date_only = format!("{}.csv", rp.ts.format(FORMAT).to_string());

                    let ret = match a_parsed.get_mut(&date_only) {
                        Some(ar) => {
                            ar.push(rp);
                            None
                        }
                        None => {
                            let mut ar: Vec<RecordParsed> = Vec::new();
                            ar.push(rp);
                            Some(ar)
                        }
                    };

                    if let Some(ar) = ret {
                        a_parsed.insert(date_only, ar);
                    }
                })
                .collect::<()>();

            a_parsed
                .iter()
                .map(|(key, array)| {
                    println!("generating file == {:?}", key);
                    let file = File::create(key).unwrap();
                    let mut wr = Writer::from_writer(file);

                    array
                        .iter()
                        .map(|record| {
                            let creditcard = match record.record.creditcard {
                                Some(ref c) => c,
                                None => panic!("should have filtered those!"),
                            };
                            wr.write_record(&[&record.record.name, creditcard]).unwrap();
                        })
                        .collect::<()>();
                })
                .collect::<()>();

            ok(())
        })
    });

    core.run(fut).unwrap();
}
Collapse
 
sukima profile image
Devin Weaver

A vanilla Node.JS version:

#!/usr/bin/env node

function escapeCSV(str) {
  if (str == null) { return ''; }
  return /[",]/.test(str) ? `"${str.replace(/"/g, '\\"')}"` : str;
}

let data = require('./sample.json');
process.stdout.write('Name,Credit Card\n');
for (let { name, creditcard } of data) {
  let line = [name, creditcard].map(escapeCSV).join(',');
  process.stdout.write(`${line}\n`);
}
Collapse
 
joshcheek profile image
Josh Cheek

solution

A few things to note: cache is a program I wrote that caches command-line invocations, it's to make it cheap to iterate (e.g. so you don't have to hit the network each time) github.com/JoshCheek/dotfiles/blob...

My shell is fish (fishshell.com) which allows multi-line editing, and the parentheses in fish are like backticks in bash, so the > (...) is redirecting the output into a file whose name is the result of the ...

Collapse
 
niemandag profile image
Michael

I set myself a time limit of 15 minutes, with no google. I did not know how to download using python, so i used wget or powershell. The rest is straight forward.

#!/usr/bin/env python3
import json
from datetime import datetime
import os
from sys import platform

URL = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"

if platform == "linux" or platform == "linux2" or platform == "darwin":
    os.system("wget -O data.json %s" % URL)
elif platform == "win32" or platform == "win64":
    os.system("powershell Invoke-WebRequest -Uri %s -OutFile data.json" % URL)

with open('data.json', 'r') as input_file:
    input_data = json.load(input_file)

with open('%s.csv' % datetime.today().strftime('%Y%m%d'), 'w') as output_file:
    for victim in input_data:
        if victim['creditcard']:
            output_file.write("%s,%s\n" % (victim['name'], victim['creditcard']))
Collapse
 
tzurbaev profile image
Timur Zurbaev

PHP:

<?php

$json = json_decode(file_get_contents('https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json'), true);

$users = array_filter($json, function (array $item) {
    return !empty($item['name']) && !empty($item['creditcard']);
});

$file = fopen(date('Ymd').'.csv', 'w+');

foreach ($users as $user) {
    fputcsv($file, [$user['name'], $user['creditcard']]);
}

fclose($file);
Collapse
 
simplymichael profile image
Michael Orji • Edited

You beat me to the PHP implementation. And your solution is so elegant.

Collapse
 
doshirae profile image
Doshirae
require "json"
require "open-uri"
url = "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json"
data = JSON.parse(open(url).read)

filtered_data = data.select { |line| not line["creditcard"].nil? }
file = File.open(DateTime.now.strftime("%Y%m%d") + ".csv", 'w')
file.write "Name,Creditcart\n"
filtered_data.each do |line|
    file.write [line["name"], line["creditcard"]].join(',')
    file.write("\n")
end

Or if you guys line nasty oneliners (requre statements don't count)

require "json"
require "open-uri"
File.open(DateTime.now.strftime("%Y%m%d") + ".csv", 'w') { |file| file.write "Name,Creditcard\n"; JSON.parse(open("https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json").read).select { |line| not line["creditcard"].nil? }.each { |line| file.write "#{line['name']},#{line['creditcard']}\n" } }

I'm trying to do it in Elixir now :D

Collapse
 
reed1 profile image
reed1

Almost all (except 2 at this time) submission writes csv by hand, not using library. The output will not be valid if a value contains , or "

Collapse
 
niemandag profile image
Michael

True. I have not thought of that.
I open the csv in LibreOffice, to make sure it comes out fine, but with really big files, it might not be possible.

Collapse
 
curusarn profile image
Šimon Let • Edited

Oneliner:

curl "https://gist.githubusercontent.com/jorinvo/7f19ce95a9a842956358/raw/e319340c2f6691f9cc8d8cc57ed532b5093e3619/data.json" 2>/dev/null | \
 jq '.[] | .name +","+ .creditcard' --raw-output > `date +"%Y%m%d.csv"`
Collapse
 
jonathanstowe profile image
Jonathan Stowe

Perl 6? :

use JSON::Fast;

my $json = 'data.json'.IO.slurp;

my $d = Date.today;
my $out-filename = sprintf "%04i%02i%02i.csv", $d.year, $d.month, $d.day;

my $out = $out-filename.IO.open(:w);

for from-json($json).list -> %row {
    if %row<creditcard> {
        $out.say: %row<name>, ',', %row<creditcard>;
    }
}
$out.close;

Of course in reality you'd probably want to use Text::CSV to properly format the CSV output in order to handle quoting and escaping properly.