I just finished a small package built to intelligently infer schemata ("schemas") of CSV files:
scheme
A minimal package for intelligently inferring schemata of CSV files.
- Self-contained -- no external dependencies
- Compatible -- runs on any Java version >= 8
- Easy -- works immediately with no configuration required
Built to more intelligently infer schemata for creating Parquet files from CSV.
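The post doesn't describe scheme's actual inference algorithm. One common approach is to pick, for each column, the narrowest type that every non-empty value parses as. Here is a minimal sketch of that idea. The class name `ColumnTypeInference`, the `ColumnType` enum, and the BOOLEAN → INT → LONG → DOUBLE → STRING ordering are all hypothetical illustrations, not scheme's real API or type system (and not Parquet's exact primitive types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: infer the narrowest type that fits every non-empty value
 * in one CSV column. Not the scheme package's actual API or algorithm.
 */
public class ColumnTypeInference {

    /** Candidate types, tried in order from narrowest to widest; STRING accepts anything. */
    public enum ColumnType { BOOLEAN, INT, LONG, DOUBLE, STRING }

    /** Returns true if a single (already trimmed) value parses as the given type. */
    static boolean fits(String value, ColumnType type) {
        switch (type) {
            case BOOLEAN:
                return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
            case INT:
                try { Integer.parseInt(value); return true; } catch (NumberFormatException e) { return false; }
            case LONG:
                try { Long.parseLong(value); return true; } catch (NumberFormatException e) { return false; }
            case DOUBLE:
                try { Double.parseDouble(value); return true; } catch (NumberFormatException e) { return false; }
            default:
                return true; // STRING fits every value
        }
    }

    /** Returns the first type (narrowest first) that every non-empty value fits. */
    public static ColumnType inferType(List<String> column) {
        List<String> values = new ArrayList<>();
        for (String raw : column) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) values.add(trimmed); // empty cells are treated as nulls
        }
        if (values.isEmpty()) return ColumnType.STRING; // no evidence: fall back to the widest type

        for (ColumnType type : ColumnType.values()) {
            boolean allFit = true;
            for (String value : values) {
                if (!fits(value, type)) { allFit = false; break; }
            }
            if (allFit) return type;
        }
        return ColumnType.STRING; // not reached in practice, since STRING accepts everything
    }

    public static void main(String[] args) {
        System.out.println(inferType(Arrays.asList("1", "2", "", "3")));  // INT
        System.out.println(inferType(Arrays.asList("1", "3000000000")));  // LONG
        System.out.println(inferType(Arrays.asList("1.5", "2")));         // DOUBLE
        System.out.println(inferType(Arrays.asList("true", "FALSE")));    // BOOLEAN
        System.out.println(inferType(Arrays.asList("abc", "1")));         // STRING
    }
}
```

Trying types in a fixed widening order keeps the result as specific as the data allows: a single value like "3000000000" overflows `int` and pushes the whole column to LONG. Note that `Double.parseDouble` also accepts strings like "NaN" or "1f", which a real implementation might want to reject.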
Download
Download the repository (and unzip if you downloaded the ZIP file):
Navigate to the target directory:
in Windows Explorer
in Windows cmd prompt
C:\>cd C:\Users\myusername\Downloads\scheme-master\target
C:\Users\myusername\Downloads\scheme-master\target>dir
Volume in drive C is Windows
Volume Serial Number is 14EE-41C8
Directory of C:\Users\myusername\Downloads\scheme-master\target
18 Sep 2019 17:30 <DIR> .
18 Sep 2019 17:30 <DIR> ..
18 Sep 2019 17:30 931 coverage.svg
18 Sep 2019 17:30 17,449 scheme-1.0.jar
2 File(s) 18,380 bytes
2 Dir(s) 2,749,439,602,688 bytes free
in a bash (or similar) shell on a UNIX-like OS
$ git clone https://github.com/awwsmm/scheme.git
Cloning into 'scheme'
remote: Enumerating objects:
…
I'm looking for feedback on it! Let me know if the layout is unclear or if the documentation could use some work, etc. I've tested it on a few different versions of Java and haven't had any problems.
Anything you like about what I did? Anything you hate? Anything you'd change?
Let me know in the comments! And thanks for your help!
Top comments (6)
Yeah, I'm sort of fighting against OOP here. I don't want the user to have to create a CSV object and then run the algorithm, etc. I just want them to be able to say "okay, give me the schema for this file", with no other input required. Maybe that's not the best way to go about it, though.
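For what it's worth, the "just give me the schema for this file" style maps naturally onto a static facade in Java: a final class with a private constructor and one static entry point, so the caller never builds an intermediate object. A minimal sketch, assuming a header row and naive comma splitting (no quoted fields). `SchemaFacade` and `schemaOf` are hypothetical names, not scheme's real API, and the type inference is deliberately simplified:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical static-facade sketch: one call, no objects for the caller to create.
 * Assumes the first line is a header and fields contain no quoted commas.
 */
public final class SchemaFacade {

    private SchemaFacade() {} // static utility class: no instances

    /** "Give me the schema for this file": column name -> inferred type name. */
    public static Map<String, String> schemaOf(Path csvFile) throws IOException {
        return schemaOf(Files.readAllLines(csvFile, StandardCharsets.UTF_8));
    }

    /** Same as above, for lines already in memory (handy for testing). */
    public static Map<String, String> schemaOf(List<String> lines) {
        Map<String, String> schema = new LinkedHashMap<>(); // keeps columns in file order
        if (lines.isEmpty()) return schema;

        String[] header = lines.get(0).split(",", -1); // -1 keeps trailing empty fields
        List<String[]> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            rows.add(line.split(",", -1));
        }

        for (int col = 0; col < header.length; col++) {
            List<String> values = new ArrayList<>();
            for (String[] cells : rows) {
                if (col < cells.length) values.add(cells[col].trim());
            }
            schema.put(header[col].trim(), inferType(values));
        }
        return schema;
    }

    /** Simplified inference: INT if all values parse as int, else DOUBLE, else STRING. */
    private static String inferType(List<String> values) {
        boolean allInt = true, allDouble = true, sawValue = false;
        for (String v : values) {
            if (v.isEmpty()) continue; // empty cells are treated as nulls
            sawValue = true;
            try { Integer.parseInt(v); } catch (NumberFormatException e) { allInt = false; }
            try { Double.parseDouble(v); } catch (NumberFormatException e) { allDouble = false; }
        }
        if (!sawValue) return "STRING";
        if (allInt) return "INT";
        if (allDouble) return "DOUBLE";
        return "STRING";
    }
}
```

With this shape, `SchemaFacade.schemaOf(Arrays.asList("id,price,name", "1,9.99,apple", "2,0.5,pear"))` returns `{id=INT, price=DOUBLE, name=STRING}`. The trade-off is that options (delimiter, header handling, etc.) have to be added as overloads rather than as fields on a configurable object.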
I don't have any CSV files right now, but I will check it out!
There are example files in the package, at src/main/resources/!
Thanks, will check it out!
Not what you were asking, but codereview.stackexchange.com can be awesome for detailed feedback...
Great tip!