
Feedback on Small Java Package

Andrew (he/him) ・1 min read

I just finished a small package built to intelligently infer schemata ("schemas") of CSV files:

awwsmm / scheme

A minimal package for intelligently inferring schemata of CSV files




  • Self-contained -- no external dependencies
  • Compatible -- runs on any Java version >= 8
  • Easy -- works immediately with no configuration required

Built to more intelligently infer schemata for creating Parquet files from CSV.


Download the repository (and unzip if you downloaded the ZIP file):

[Screenshot: GitHub 'download' button and menu]

Navigate to the target directory:

in Windows Explorer

[Screenshot: locating the JAR file in Windows Explorer]

in Windows cmd prompt

C:\>cd C:\Users\myusername\Downloads\scheme-master\target
C:\Users\myusername\Downloads\scheme-master\target>dir
 Volume in drive C is Windows
 Volume Serial Number is 14EE-41C8
 Directory of C:\Users\myusername\Downloads\scheme-master\target
18 Sep 2019  17:30    <DIR>          .
18 Sep 2019  17:30    <DIR>          ..
18 Sep 2019  17:30               931 coverage.svg
18 Sep 2019  17:30            17,449 scheme-1.0.jar
               2 File(s)         18,380 bytes
               2 Dir(s)  2,749,439,602,688 bytes free

in a bash (or similar) shell on a UNIX-like OS

$ git clone https://github.com/awwsmm/scheme.git
Cloning into 'scheme'
remote: Enumerating objects:

I'm looking for feedback on it! Let me know if the layout is unclear or if the documentation could use some work, etc. I tested it on a few different versions of Java and I haven't had any problems.

Javadoc available here.

Anything you like about what I did? Anything you hate? Anything you'd change?

Let me know in the comments! And thanks for your help!



I've only read a bit of CSV.java and didn't like the huge schema method.
Also didn't like where it's called:
return schema(file, -1, -1, 35, false, false, false, true);
That line isn't exactly easy to read.
Why is every method static?
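
For what it's worth, one common way to make a call with many positional arguments easier to read is to collect them into a small options object with named setters. The sketch below is only an illustration of that idea: the class name SchemaOptions and every option name in it (maxRows, maxFieldLength, hasHeader) are invented here, and they are not the real parameters of the schema method in CSV.java, whose meanings I haven't checked.

```java
// Hypothetical sketch: the option names below are invented for illustration
// and are NOT the real parameters of scheme's schema(...) method.
final class SchemaOptions {
    final int maxRows;         // -1 means "no limit" in this sketch
    final int maxFieldLength;  // made-up option with a made-up default
    final boolean hasHeader;   // made-up option

    private SchemaOptions(Builder b) {
        this.maxRows = b.maxRows;
        this.maxFieldLength = b.maxFieldLength;
        this.hasHeader = b.hasHeader;
    }

    // Start from the defaults, then override only what differs.
    static Builder defaults() {
        return new Builder();
    }

    static final class Builder {
        private int maxRows = -1;
        private int maxFieldLength = 35;
        private boolean hasHeader = true;

        Builder maxRows(int n) { this.maxRows = n; return this; }
        Builder maxFieldLength(int n) { this.maxFieldLength = n; return this; }
        Builder hasHeader(boolean b) { this.hasHeader = b; return this; }

        SchemaOptions build() { return new SchemaOptions(this); }
    }
}
```

A call site would then read something like SchemaOptions.defaults().maxRows(100).hasHeader(false).build(), so each value is labelled where it is used and the defaults never have to be spelled out.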

I appreciate the effort in documenting everything.
The readme page is great too.

(I don't know if you are looking for feedback on using the library or the library code)


Yeah I'm sort of fighting against OOP here. I don't want to have to create a CSV object and then run the algorithm, etc. etc. I just wanted the user to be able to say "okay, give me the schema for this file", with no other input required.

Maybe that's not the best way to go about it, though.
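
One possible middle ground is a static one-liner that quietly delegates to a configurable instance, so the "just give me the schema" call stays, but the settings live in an object rather than in a long argument list. This is a rough sketch under heavy assumptions: SchemaInferrer, inferSchema, and the three-type enum are made up for this example, the type inference is deliberately naive (no quoted fields, no header row, no nulls), and none of it is how scheme actually works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not scheme's actual code: a static convenience
// method that delegates to a configurable instance.
final class SchemaInferrer {
    // Declaration order doubles as widening order: INTEGER < DOUBLE < STRING.
    enum Type { INTEGER, DOUBLE, STRING }

    private final String delimiter; // instance state replaces extra arguments

    SchemaInferrer(String delimiter) {
        this.delimiter = delimiter;
    }

    // One-call entry point: the caller never sees an object being created.
    static List<Type> inferSchema(List<String> rows) {
        return new SchemaInferrer(",").infer(rows);
    }

    // Infers one type per column from data rows (naive split, no quoting).
    List<Type> infer(List<String> rows) {
        List<Type> types = new ArrayList<>();
        for (String row : rows) {
            String[] cells = row.split(Pattern.quote(delimiter), -1);
            for (int i = 0; i < cells.length; i++) {
                Type t = classify(cells[i].trim());
                if (i >= types.size()) {
                    types.add(t);
                } else {
                    types.set(i, widen(types.get(i), t));
                }
            }
        }
        return types;
    }

    // Picks the narrowest type that can parse the cell.
    private static Type classify(String cell) {
        try {
            Long.parseLong(cell);
            return Type.INTEGER;
        } catch (NumberFormatException ignored) {
            // not an integer, try the next type
        }
        try {
            Double.parseDouble(cell);
            return Type.DOUBLE;
        } catch (NumberFormatException ignored) {
            // not a number at all
        }
        return Type.STRING;
    }

    // Returns the wider of the two types.
    private static Type widen(Type a, Type b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}
```

With this shape, SchemaInferrer.inferSchema(rows) covers the no-input case, while someone with a semicolon-separated file can still write new SchemaInferrer(";").infer(rows) without the static method growing another parameter.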


Not what you were asking but codereview.stackexchange.com can be awesome for details...


I don't have any CSV files right now, but I will check it out!