The analysis of Big Data has been a hot topic for at least a decade. These analyses often consist of a sequence of steps that mold and transform the data into features with a meaningful interpretation, for example features that you want to extract for subsequent machine learning analyses.
Would it not be cool to _graphically build analysis pipelines_? This is not a new idea, of course. You can find node editors in graphics, genetic analysis, financial tech, and many other places. Yet, if you have written your own data analysis toolbox, wouldn’t it be great if you could give your users access to your tools and example pipelines in a visual manner?
Introducing GiraffeTools
GiraffeTools is a generic Graphical Interface for Reproducible Analysis For workFlow Experiments! I originally developed GiraffeTools for ‘neuroimaging’. On a daily basis, I work with people who analyse all sorts of brain data (MRI, EEG, MEG, etc.), and I quickly noticed that the important toolboxes all have completely different APIs and are often written in different programming languages.
So I developed a GitHub-integrated web editor that automatically creates analysis code from a visual workflow…
- To which you can add your own toolbox
- To which you can add your custom modules to any supported toolbox
- To which you can add a custom ‘grammar’ that transforms the graph representation into code for your toolbox.
- From which you can save any pipeline to a GitHub repository and easily build examples for your users
- In which you can inspect your visual workflow at any previous commit, because your workflow is fully GitHub version controlled!
Your project is accessible at https://giraffe.tools/workflow/$username/$repository/$branch_or_commit, for example: https://giraffe.tools/workflow/TimVanMourik/SomeGiraffeExample
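To make the URL scheme concrete, here is a minimal sketch of a helper that assembles such a link. The function name `workflowUrl` is hypothetical and not part of GiraffeTools. Because the example link above has no branch, the sketch assumes the last segment is optional.

```javascript
// Hypothetical helper (not part of GiraffeTools): builds an editor link
// following the pattern /workflow/$username/$repository/$branch_or_commit.
const GIRAFFE_BASE_URL = "https://giraffe.tools/workflow";

function workflowUrl(username, repository, branchOrCommit) {
  const parts = [GIRAFFE_BASE_URL, username, repository];
  // Assumption: the branch or commit segment may be left out,
  // as in the example link in this article.
  if (branchOrCommit) {
    parts.push(branchOrCommit);
  }
  return parts.join("/");
}

console.log(workflowUrl("TimVanMourik", "SomeGiraffeExample"));
console.log(workflowUrl("TimVanMourik", "SomeGiraffeExample", "master"));
```

Passing a commit hash instead of a branch name would work the same way, since both occupy the same position in the path.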
Add your own content
Ok, so how does that work? Well, any GitHub project can be a GiraffeTools project by simply putting a GIRAFFE.yml configuration file in the root of your project. There, you can specify:
- A files list that points to a file to which you want to write your visual pipeline,
- A nodes list that points to toolbox modules that you might want to use in your pipeline,
- A grammars list that specifies the interpreter that transforms the graph into code.
# Content of a GIRAFFE.yml file in the root of your repository.
tools:
  workflow:
    # A file in your repository to which the UI state is saved
    # Currently only a single file is supported
    files:
      - GIRAFFE/keras.json
    # A list of nodes to load into the editor
    # The path is either relative to the root or a full URL
    nodes:
      - https://raw.githubusercontent.com/TimVanMourik/keras/giraffe-tools/keras_nodes.json
    # You can load your own JavaScript code generator. Documentation will follow soon
    # The path is either relative to the root or a full URL
    grammars:
      - language: 'Keras'
        script: GIRAFFE/test.js
        format: python
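The comments in the configuration say that a path is "either relative to the root or a full URL". The sketch below shows one way such a rule could be applied. It is an illustration only: `resolveConfigPath` is a hypothetical name, the repository details are placeholders, the nesting `tools` > `workflow` > `files`/`nodes`/`grammars` is assumed, and resolving relative paths through raw.githubusercontent.com is an assumption rather than a description of what the editor does internally.

```javascript
// Hypothetical helper: turns a config entry into a URL that can be fetched.
function resolveConfigPath(entry, { username, repository, branch }) {
  // Full URLs are used as they are.
  if (/^https?:\/\//.test(entry)) {
    return entry;
  }
  // Assumption: relative paths start at the repository root.
  const relative = entry.replace(/^\.?\/+/, "");
  return `https://raw.githubusercontent.com/${username}/${repository}/${branch}/${relative}`;
}

// The GIRAFFE.yml content from above, written as an already-parsed object.
const config = {
  tools: {
    workflow: {
      files: ["GIRAFFE/keras.json"],
      nodes: [
        "https://raw.githubusercontent.com/TimVanMourik/keras/giraffe-tools/keras_nodes.json",
      ],
      grammars: [{ language: "Keras", script: "GIRAFFE/test.js", format: "python" }],
    },
  },
};

// Placeholder repository details, for illustration only.
const repo = { username: "someuser", repository: "somerepo", branch: "master" };

const workflow = config.tools.workflow;
const nodeUrls = workflow.nodes.map((entry) => resolveConfigPath(entry, repo));
const grammarUrls = workflow.grammars.map((g) => resolveConfigPath(g.script, repo));

console.log(nodeUrls);
console.log(grammarUrls);
```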
Now the good part is that once you have built a nodes library and a grammars code interpreter, anyone can build pipelines with your software! Anyone can create a new repository and build a custom workflow with your software that is insightful and immediately shareable (by a simple GitHub ‘fork’).
Getting started
So what do these files look like? Examples of how to programmatically create a nodes file (Python and MATLAB code) are included in https://github.com/GiraffeTools/Libraries. A nodes file is a JSON specification of your toolbox: the categories into which you want to subdivide your functions, a list of nodes (functions) that you want users to be able to drag and drop into the editor, and a list of their input and output ports:
{
  "toolboxes": [
    {
      //specify the name of the toolbox, e.g. Keras
      "name": "Keras",
      //specify a list of categories within the toolbox
      "categories": [
        {
          //specify the name of the category
          "name": "core",
          //specify the nodes within that category
          "nodes": [
            {
              //specify the name and attributes of a single module
              "name": "Dense",
              "category": "core",
              "toolbox": "Keras",
              "web_url": "https://keras.io/layers/core/Dense",
              //add any set of parameters that are required to generate
              //your specific code.
              "code": [
                {
                  "language": "Keras",
                  "argument": {
                    "name": "Dense",
                    "import": "from keras.layers import Dense"
                  }
                }
              ],
              //add input and output ports.
              "ports": [
                {
                  "name": "units",
                  "input": false,
                  "output": false,
                  "visible": true,
                  "editable": true,
                  "code": [
                    {
                      "language": "Keras",
                      "argument": {
                        "kwarg": false,
                        "arg": 0
                      }
                    }
                  ]
                },
                ...
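A nodes file like the one above can also be generated from code. The linked Libraries repository does this in Python and MATLAB. As a rough JavaScript sketch of the same idea, the helpers below build the `Dense` entry shown above. `makePort` and `makeNode` are hypothetical names, and only fields that appear in the example are used.

```javascript
// Hypothetical helpers; the field names follow the JSON example above.
function makePort(language, name, argument, options = {}) {
  return {
    name,
    input: options.input ?? false,
    output: options.output ?? false,
    visible: options.visible ?? true,
    editable: options.editable ?? true,
    code: [{ language, argument }],
  };
}

function makeNode({ toolbox, category, name, webUrl, importLine, ports }) {
  return {
    name,
    category,
    toolbox,
    web_url: webUrl,
    // Parameters that the code generator needs for this module.
    code: [{ language: toolbox, argument: { name, import: importLine } }],
    ports,
  };
}

const dense = makeNode({
  toolbox: "Keras",
  category: "core",
  name: "Dense",
  webUrl: "https://keras.io/layers/core/Dense",
  importLine: "from keras.layers import Dense",
  // "units" is the first positional argument, as in the example above.
  ports: [makePort("Keras", "units", { kwarg: false, arg: 0 })],
});

const library = {
  toolboxes: [{ name: "Keras", categories: [{ name: "core", nodes: [dense] }] }],
};

// Serialise to the JSON that would go into a nodes file.
const nodesFileContent = JSON.stringify(library, null, 2);
console.log(nodesFileContent);
```

Generating the file this way keeps every node consistent, which matters once a toolbox has more than a handful of modules.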
These nodes are loaded into the editor for drag and drop, regardless of whether a grammars code interpreter is specified. Separately, you can work on this interpreter, which is a JavaScript file. In its simplest form it contains a writeCode and a writeFiles function. The former translates the pipeline graph into code; the latter specifies the file in your repository to which that code is written when a user uses the ‘Save to GitHub’ functionality.
module.exports = () => {
  async function writeCode(nodes, links) {
    return "I am creating code here!";
  }
  async function writeFiles(nodes, links) {
    const myFilename = 'GIRAFFE/my_code.py';
    return {
      [myFilename]: await writeCode(nodes, links)
    }
  }
  return {
    writeCode,
    writeFiles,
  }
}
This is plain JavaScript that receives the (nodes, links) props, based on which you can write more elaborate code for your specific purpose. Examples are included in https://github.com/GiraffeTools/CodeGenerators.
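As an illustration of what a more elaborate generator might do, here is a minimal sketch that emits one import line per module and one constructor call per node. It rests on assumptions that are not confirmed by this article: that each node passed to the generator carries the `code` and `ports` entries from the nodes file, and that each port has a `value` property holding what the user typed. The `links` argument is ignored here, so nodes are emitted in the order they arrive.

```javascript
const LANGUAGE = "Keras";

// Pick the code entry for our language from a node or a port.
function codeFor(item) {
  const entry = (item.code || []).find((c) => c.language === LANGUAGE);
  return entry ? entry.argument : null;
}

// Assumption: each port has a `value` set by the user in the editor.
function formatArguments(ports) {
  const positional = [];
  const keyword = [];
  for (const port of ports || []) {
    const argument = codeFor(port);
    if (!argument || port.value === undefined) continue;
    if (argument.kwarg) {
      keyword.push(`${port.name}=${port.value}`);
    } else {
      positional[argument.arg] = String(port.value);
    }
  }
  return [...positional.filter((v) => v !== undefined), ...keyword].join(", ");
}

// Synchronous core, so it is easy to test in isolation.
function generateCode(nodes, links) {
  const imports = new Set();
  const lines = [];
  for (const node of nodes) {
    const argument = codeFor(node);
    if (!argument) continue;
    imports.add(argument.import);
    lines.push(`${argument.name}(${formatArguments(node.ports)})`);
  }
  return [...imports, "", ...lines].join("\n") + "\n";
}

async function writeCode(nodes, links) {
  return generateCode(nodes, links);
}

// Hypothetical input, shaped like the nodes file plus a `value` on the port.
const exampleNodes = [
  {
    name: "Dense",
    code: [{ language: "Keras", argument: { name: "Dense", import: "from keras.layers import Dense" } }],
    ports: [{ name: "units", value: 32, code: [{ language: "Keras", argument: { kwarg: false, arg: 0 } }] }],
  },
];

console.log(generateCode(exampleNodes, []));
```

A real generator would also use `links` to decide the order of the nodes and to connect their outputs to inputs. That part is left out to keep the sketch short.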
Project page
As a bonus, there is the project page, on which all commits to a specific project are listed. This way, you can open the workflow at any previous point in time. You can also take a look at code from different branches of your repository.
Open source
The platform is built to create your own open source analysis pipelines. The GiraffeTools code itself is also entirely open source: https://github.com/GiraffeTools/GiraffeTools! Stars, forks, bug reports, feature requests, or contributions are much appreciated!
Documentation
This article is just an introduction to GiraffeTools. More documentation and Medium posts will follow! This is all work in progress and in active development! Any feedback is welcome!