
Ananyashree

Mid-internship project report

Hello and welcome to my fourth blog post for Outreachy. This post is a mid-point report on the project. It sat in my drafts for too long because I felt it was not good enough. I guess it is finally time to talk about the project without worrying about how well I convey it.

1. What was my original project timeline?

The original timeline at the start of the internship was to create DVC pipelines and implement a CML action for GCBM Carpathians. The time was divided into a period for learning about MLOps, followed by a couple of weeks each for the DVC pipelines and the CML workflow implementation.

2. What have I accomplished in the first half of the internship?

In the first half, I focused on creating a DVC pipeline for GCBM Carpathians. It currently consists of six stages. Each stage in the dvc.yaml file consists of the command to be executed for that stage, the working directory in which the command is executed, the dependencies of the stage, and its outputs. The outputs of each stage are tracked by DVC and are therefore not tracked by git (unless they are added to git with --force).
To execute the stages in a particular order, one can provide the output of one stage as a dependency of the next stage in the pipeline. I used the log file of each stage as its output. Hence, with the command dvc repro, the user can now reproduce the pipeline and execute all the stages of a standalone GCBM model. With the help of DVC, one can monitor how the stages are affected when their dependencies change. One can also track datasets and results with DVC. This keeps bulky datasets and results out of the git repository, while only lightweight metadata is kept in git. Besides tracking the outputs of a stage locally, one can also upload them to a remote storage through DVC. The six stages of the DVC pipeline are shown below:

[Image: DVC pipeline stages]
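
To make the idea of chaining stages through their outputs more concrete, here is a minimal Python sketch of how an execution order can be derived from the dependencies and outputs of each stage. This is not the actual GCBM Carpathians dvc.yaml and not how DVC is implemented internally. The stage names, commands, and file paths are hypothetical and only mirror the fields described above (command, working directory, dependencies, outputs). It assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical stages, deliberately listed out of order. The names, commands
# and paths are assumptions for illustration, not the real project files.
stages = {
    "summarize": {
        "cmd": "python summarize.py",
        "wdir": ".",
        "deps": ["logs/simulate.log"],
        "outs": ["logs/summarize.log"],
    },
    "prepare": {
        "cmd": "python prepare.py",
        "wdir": ".",
        "deps": ["config.json"],
        "outs": ["logs/prepare.log"],
    },
    "simulate": {
        "cmd": "python simulate.py",
        "wdir": ".",
        "deps": ["logs/prepare.log"],
        "outs": ["logs/simulate.log"],
    },
}


def execution_order(stages):
    """Return stage names ordered so that producers run before consumers."""
    # Map each output path to the stage that produces it.
    producer = {
        out: name for name, stage in stages.items() for out in stage["outs"]
    }
    # A stage must wait for every stage that produces one of its dependencies.
    # Dependencies that no stage produces (plain input files) add no edge.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(stages))
```

Because each log file is the output of one stage and a dependency of the next, the order is fully determined by the files themselves, regardless of the order in which the stages are written down.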

3. What project goal took longer than expected?

It took me a few days to get all the dependencies right so that the GCBM Carpathians simulation ran alongside the latest DVC. There were many conflicts to resolve while working with Python 3.8+. Eventually, after some testing, I settled on installing the dependencies from the wheel files originally released with GCBM Carpathians.

4. What would you do differently if you were starting the project over?

I would probably not do anything differently. I had very limited knowledge when I started the project. It took time and effort to experiment and learn the skills the project required. If I were to start again with the same limited knowledge, I would probably go down the same path. The only difference would be to communicate more with my mentors, as I was highly unsure of my capabilities in the beginning.

5. Plan for the second half of the internship?

The plan for the second half remains the same. I am currently working on a CML report for GCBM Carpathians, which will plot figures and provide metrics for different configurations. To create reports through a CML action, GCBM Carpathians requires a postprocessing step that summarizes the results. I have added this postprocessing analysis to GCBM Carpathians.
