DEV Community

Code-free machine learning with Ludwig

Chris Hunt on March 09, 2019

Intro and Ludwig At the start of February 2019, Uber made their code-free machine learning toolbox, Ludwig, open-source. Website - http...
TimusLetap

Can you please elaborate on how to set up the YAML file for the model definition?

Chris Hunt • Edited

Hi - yeah sure. The modeldef file is a YAML file which defines, at its most basic, the input and output features. In this most basic of examples, we're telling Ludwig that each input feature is either numerical or category. Ludwig then uses those types to preprocess the data before training the model.

More complex models may use some NLP to break down sentences, or may process images. That would all be specified on the input feature.

Equally, we may want to define how the output features are generated. Again, that's specified here.

In my final refinement above, I've also specified the training parameters.

Documentation and plenty of examples can be found here - uber.github.io/ludwig/examples/

The final, full modeldef.yaml from the above example looks like gist.github.com/c-m-hunt/3271efb2a...
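As a rough illustration of the three parts described above (input features, output features, training parameters), here is the same structure written as a Python dictionary, plus a small sanity check. The feature names and the epochs value are placeholders chosen for illustration, not the contents of the gist; numerical and category are the type names mentioned above.

```python
# Sketch of a model definition's structure as a plain Python dict.
# Feature names and the epochs value are illustrative placeholders.
model_definition = {
    "input_features": [
        {"name": "Cylinders", "type": "category"},
        {"name": "Horsepower", "type": "numerical"},
        {"name": "Weight", "type": "numerical"},
    ],
    "output_features": [
        {"name": "MPG", "type": "numerical"},
    ],
    "training": {"epochs": 100},
}


def check_features(definition):
    """Return all feature names, raising if a feature lacks a name or type."""
    names = []
    for section in ("input_features", "output_features"):
        for feature in definition[section]:
            if "name" not in feature or "type" not in feature:
                raise ValueError(
                    f"{section} entry needs 'name' and 'type': {feature}"
                )
            names.append(feature["name"])
    return names


print(check_features(model_definition))
```

The list of names returned by the helper is handy for comparing against the column headers of the training CSV.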

TimusLetap

So, a scenario-specific question: I'm running Ludwig in Colab, and the YAML file format described in the docs doesn't work the same way there; it produces errors. Have you had an opportunity to explore this?

Chris Hunt

Took a bit of fiddling to get running in Colab but here it is...
colab.research.google.com/drive/1Z...

I think the issue explained at the top of the notebook may be the one you were having. I had been running from the terminal so hadn't come across it. See the GitHub issue for details.

The first bit is getting the data and just dropping NaNs. I also changed one column name, as I couldn't work out how to get it working with column names containing spaces.
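The clean-up step described above could look roughly like the sketch below. It is not the notebook's code: it uses only the standard csv module, and the sample rows and column names are made up for illustration. Rows with a missing or NaN value are dropped, and spaces are removed from column names.

```python
import csv
import io

# Tiny in-memory stand-in for the real data; values and column
# names are invented for illustration.
raw = io.StringIO(
    "MPG,Cylinders,Model Year\n"
    "18.0,8,70\n"
    ",8,70\n"        # missing MPG, so this row is dropped
    "15.0,8,NaN\n"   # literal NaN, so this row is dropped
    "24.0,4,71\n"
)


def clean_rows(handle):
    """Yield complete rows, with spaces stripped out of the column names."""
    for row in csv.DictReader(handle):
        values = [(value or "").strip() for value in row.values()]
        if any(v == "" or v.lower() == "nan" for v in values):
            continue  # skip rows with any missing value
        yield {
            name.replace(" ", ""): value
            for name, value in zip(row.keys(), values)
        }


cleaned = list(clean_rows(raw))
print(cleaned)
```

pandas would do the same job in fewer lines; the standard library is used here only to keep the sketch dependency-free.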

The rest is the training. The key is that the training is code free.

TimusLetap

In both instances, however, you use a pre-built model_definition.yaml file. Any chance you would know how to create a simple model_definition.yaml file from scratch, in-line? I wanted to understand how that would work, as it is not described very well in the docs.

Chris Hunt

It's not "pre-built" - you have to write the model definition yourself.

The docs at uber.github.io/ludwig/user_guide/#... explain the basics of the model definition file. It's just a YAML file defining the input and output features, along with any additional parameters you want to set to override the Ludwig defaults.
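For anyone wanting to create the file in-line from a notebook cell, one possible approach is simply to write the YAML text to disk from Python. This is a minimal sketch, not the file from the article: the feature names and the epochs value are hypothetical, and the names need to match the column headers of your own CSV.

```python
from pathlib import Path

# Hypothetical feature names; they need to match the CSV column headers.
# The training section is optional and only overrides a default.
MODEL_DEFINITION = """\
input_features:
  - name: Cylinders
    type: category
  - name: Horsepower
    type: numerical
output_features:
  - name: MPG
    type: numerical
training:
  epochs: 100
"""

# Write the definition next to the notebook so it can be passed to Ludwig.
path = Path("model_definition.yaml")
path.write_text(MODEL_DEFINITION)
print(path.read_text())
```

Indentation matters in YAML, so keeping the text in a single triple-quoted string avoids accidental changes to the spacing.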

TimusLetap

Thank you. I figured it out. I just didn't realize how to start it off.

Hans Fugers • Edited

Chris,
I'm trying to redo your example, but the CSV seems to have constraints which I cannot find.
First, I saw that the output feature should be the last field (makes some sense).
I reordered the data to match the YAML definition but keep running into errors.

=====
    indexer = self.columns.get_loc(key)
  File "c:\users\uan401\appdata\local\continuum\anaconda3\envs\ludwig\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Cylinders'

I removed some rows with missing values but got the same result.

What structure are you using? (Could you share a 4-line example?)

Hans Fugers • Edited

My example data:
8\307.0\130.0\3504.\12.0\70\1\18.0
8\350.0\165.0\3693.\11.5\70\1\15.0
8\318.0\150.0\3436.\11.0\70\1\18.0
8\304.0\150.0\3433.\12.0\70\1\16.0

NB: df = pandas.read_csv("autos3.csv", header=None, sep='\\')
works fine with my data (the separator is a backslash, written as a double \ in the code).
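One plausible reading of the KeyError is that the file has no header row, so there is no column called Cylinders when features are looked up by name. Assuming that is the cause, the sketch below rewrites the sample rows as a comma-separated file with a header. Only the name Cylinders comes from the error message; the other column names are guesses and would need to match the names in the model definition.

```python
import csv
import io

# The four sample rows from the comment (backslash-separated, no header).
raw = io.StringIO(
    "8\\307.0\\130.0\\3504.\\12.0\\70\\1\\18.0\n"
    "8\\350.0\\165.0\\3693.\\11.5\\70\\1\\15.0\n"
    "8\\318.0\\150.0\\3436.\\11.0\\70\\1\\18.0\n"
    "8\\304.0\\150.0\\3433.\\12.0\\70\\1\\16.0\n"
)

# Assumed column names: only 'Cylinders' is taken from the error message.
columns = [
    "Cylinders", "Displacement", "Horsepower", "Weight",
    "Acceleration", "ModelYear", "Origin", "MPG",
]

reader = csv.reader(raw, delimiter="\\")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(columns)   # header row the features can be matched against
writer.writerows(reader)   # data rows, now comma-separated

print(out.getvalue())
```

In practice the input and output would be real files rather than in-memory buffers; the buffers just keep the sketch self-contained.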

Chris Hunt

Hi. Have you had a look at the Colab notebook in the previous comment? It runs end to end, including the little bit of fiddling with the data.

Mariyamimtiaz • Edited

Hello!
Could you please elaborate on the visualization commands? I am getting some warnings related to TensorFlow.
Thanks in anticipation.

Chris Hunt

Hi

I think the latest version of TensorFlow is warning about features which are going to be deprecated in v2 of TensorFlow, which is currently in alpha. It's nothing to be concerned about at the moment. I'm going to try and find out if Ludwig will be moved to support TF2. I'm pretty confident it will be.

Thanks

Chris