Hello and welcome to the third blog for Outreachy. We are almost halfway through the internship already! Time flies, doesn’t it? After being with moja global for over a month, I am in a good position to summarise what moja global is as a community and to talk about my project. Let’s get started!
1. What does moja global do?
As a community, moja global’s mission is to develop open-source monitoring, reporting and verification (MRV) software for the agriculture, forestry and other land use (AFOLU) sector. We aim to provide accurate and affordable estimates of greenhouse gas emissions and removals from the AFOLU sector.
The next question is: why do we need to do so? You see, the AFOLU sector accounts for about 25% of global greenhouse gas emissions. When we talk about tackling climate change and achieving net-zero emissions, it is important to reduce emissions and improve sinks in the AFOLU sector.
To put it in brief:
"Moja global came into being, by building a community of experts, scientists and developers, to create the first open-source software that is affordable, accurate and fully customizable to a user’s needs and available data, whether the user is a country, a region, a local government, a company or a project planner."
2. What is my project?
To understand my project, let me first briefly explain the software developed by moja global.
FLINT (Full Lands Integration Tool) is moja global’s flagship software. It is an open-source tool that gives countries the ability to build and run advanced MRV systems quickly and efficiently. GCBM (Generic Carbon Budget Model) is an open-source model that operates on the FLINT platform. GCBM assesses and reports the cumulative effects of anthropogenic and natural disturbances on our forests.
Now for my project. Simply put, the objective of the project is to bring the reproducibility and resilience of MLOps to GCBM through DVC and CML. A lot of new words at once, eh? Yeah, I was in a similar position at the beginning of the internship. While we will talk about these terms in detail in the following blogs, let me briefly define a few here.
A. MLOps (Machine Learning Operations)
The formal definition of MLOps describes it as the extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the DevOps ecology. In other words, MLOps provides us with the capabilities to deploy, monitor, manage and govern ML models in a production environment.
B. DVC (Data Version Control)
DVC is a tool developed by iterative.ai that helps manage data and ML models, making them shareable and reproducible. DVC lets us use remote storage for data and models, so we do not need to keep large amounts of data and gigantic models in our git repositories. When we wish to work with or modify these files, we can pull them from the storage through DVC. Moreover, DVC can manage pipelines for ML projects so that the experiments are reproducible for, say, every member of a team. These lightweight pipelines list the names of the stages, their dependencies, their outputs, etc. Together, the stages form a dependency graph that determines the order in which they run.
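To make the idea of a pipeline as a dependency graph a bit more concrete, here is a minimal sketch in plain Python. It is not DVC's own code or API, and the stage and file names (`tile_inputs`, `run_simulation` and so on) are made up for illustration; they are not the actual GCBM stages. It only shows how listing each stage's dependencies and outputs is enough to work out a valid run order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages, each with the things it needs and the things it produces.
# These names are assumptions for illustration, not the real GCBM pipeline.
stages = {
    "make_plots": {"deps": ["results_db"], "outs": ["figures"]},
    "tile_inputs": {"deps": ["raw_layers"], "outs": ["tiled_layers"]},
    "compile_results": {"deps": ["raw_results"], "outs": ["results_db"]},
    "run_simulation": {"deps": ["tiled_layers", "config"], "outs": ["raw_results"]},
}


def execution_order(stages):
    """Return stage names in an order that respects their dependencies."""
    # Map every output to the stage that produces it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    # A stage must run after every stage that produces one of its dependencies.
    # Dependencies nobody produces (raw data, config) are plain inputs.
    graph = {
        name: {producer[dep] for dep in s["deps"] if dep in producer}
        for name, s in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(stages)
print(order)
```

Even though the stages are listed in a jumbled order, the graph sorts them so that each one runs only after the stages it depends on.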
Now you have a general idea of what the project is about, including a high-level view of how DVC can be used for ML projects. This is exactly what I have done as the first task for my project! I used DVC to keep the large output files in remote storage, which makes them easier to share, and I created a pipeline for the numerous stages of a GCBM run to make the experiments reproducible.
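If you are wondering how large files can live outside git and still be versioned, the rough idea is that git only tracks a tiny pointer holding a hash of the file, while the file itself sits in separate storage. The sketch below is a simplified illustration of that idea in plain Python, not how DVC is actually implemented. The function names `track` and `restore`, the `storage` folder and the file name `results.db` are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def track(file_path: Path, storage_dir: Path) -> dict:
    """Copy a file into storage under its content hash and return a small pointer."""
    digest = hashlib.md5(file_path.read_bytes()).hexdigest()
    storage_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(file_path, storage_dir / digest)
    # Only this small dictionary would need to be committed to git.
    return {"path": file_path.name, "md5": digest, "size": file_path.stat().st_size}


def restore(pointer: dict, storage_dir: Path, target_dir: Path) -> Path:
    """Bring the file back from storage using nothing but the pointer."""
    target = target_dir / pointer["path"]
    shutil.copy2(storage_dir / pointer["md5"], target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    storage = workspace / "storage"  # stands in for remote storage

    output_file = workspace / "results.db"  # hypothetical large output
    output_file.write_bytes(b"pretend this is a large GCBM output")

    pointer = track(output_file, storage)
    output_file.unlink()  # the workspace no longer holds the big file

    restored = restore(pointer, storage, workspace)
    restored_bytes = restored.read_bytes()
    print(pointer)
```

In a real project, DVC commands such as `dvc add`, `dvc push` and `dvc pull` take care of this for you.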
3. What makes me most excited to work on this project?
One of the major reasons I was excited to work with moja global is their objective. I had been looking for opportunities to work on a project, or with an organisation, that primarily works on tackling climate change, and moja global gives me exactly that. On top of that, the project itself helps me learn how to generate business value out of ML models. So, it is an exciting opportunity to gain a better understanding of a career in this field!