DEV Community

Thu Kha Kyawe


Day 12: Configure a DVC Remote Storage

Lab Information

The xFusionCorp Industries ML team uses SeaweedFS as the shared S3-compatible object store for DVC-tracked data. A .dvc/config already declares a remote called s3 for the fraud-detection project, but dvc push currently fails. Correct the configuration and push the tracked data into the SeaweedFS bucket.

A project exists at /root/code/fraud-detection/ with DVC initialised and data/raw/transactions.csv already tracked.

SeaweedFS is already running on the controlplane:
    S3 endpoint: http://localhost:8333
    Filer UI: click the SeaweedFS Filer button at the top of the lab (forwarded port 8888) – buckets are visible under /buckets/.
    Credentials: weedadmin / weedadmin123 (already set in .dvc/config)
    Bucket name: dvc-storage (already created and visible in the Filer UI under /buckets/dvc-storage)

Review the existing .dvc/config and correct everything that prevents dvc push from succeeding. The remote called s3 must:
    point at the dvc-storage bucket using s3://;
    use the correct SeaweedFS S3 endpoint URL;
    be marked as the default remote.

Push the tracked data. After the push, the dvc-storage bucket in the SeaweedFS Filer UI must contain at least one object under the files/md5/... prefix.

Lab Solutions

✅ Part 1: Lab Step-by-Step Guidelines

Step 1: Move into the repository

cd /root/code/fraud-detection

Step 2: Inspect the current DVC configuration

View the config:

cat .dvc/config

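Or list the configured remotes with their URLs:

dvc remote list

Compare what you find with the requirements in Step 3. Because dvc push currently fails, expect at least one setting to be wrong.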

Step 3: Verify the remote configuration

The lab requires:

Setting          Required value
Remote name      s3
Bucket           s3://dvc-storage
Endpoint URL     http://localhost:8333
Default remote   s3

A correct configuration looks similar to:

['remote "s3"']
    url = s3://dvc-storage
    endpointurl = http://localhost:8333
    access_key_id = weedadmin
    secret_access_key = weedadmin123

[core]
    remote = s3
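To compare a config file against the required values listed above without eyeballing it, here is a minimal sketch. The helper names _dvc_cfg_get and check_dvc_remote_config are hypothetical and not part of DVC. The helpers parse the INI text with awk and never call dvc. They assume section headers appear exactly as DVC writes them: [core], and ['remote "s3"'] for a remote named s3.

```shell
# _dvc_cfg_get FILE SECTION KEY  (hypothetical helper)
# Print the value of KEY inside SECTION of an INI-style DVC config file.
# SECTION must match the header text exactly, e.g. [core].
_dvc_cfg_get() {
  awk -v want_section="$2" -v want_key="$3" '
    /^[ \t]*[#;]/ { next }                  # skip comment lines
    /^[ \t]*\[/ {                           # a new section header
      section = $0
      sub(/^[ \t]+/, "", section); sub(/[ \t]+$/, "", section)
      next
    }
    section == want_section && index($0, "=") > 0 {
      key = substr($0, 1, index($0, "=") - 1)
      val = substr($0, index($0, "=") + 1)
      gsub(/[ \t]/, "", key)                # "    url " becomes "url"
      sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
      if (key == want_key) print val
    }' "$1"
}

# check_dvc_remote_config [CONFIG] [REMOTE] [URL] [ENDPOINT]  (hypothetical)
# Print every lab requirement the config misses; return non-zero if any.
# Defaults are the values this lab asks for.
check_dvc_remote_config() {
  local config="${1:-.dvc/config}"
  local remote="${2:-s3}"
  local want_url="${3:-s3://dvc-storage}"
  local want_endpoint="${4:-http://localhost:8333}"
  local section="['remote \"$remote\"']"   # header DVC writes for a remote
  local rc=0 got

  got="$(_dvc_cfg_get "$config" "$section" url)"
  if [ "$got" != "$want_url" ]; then
    echo "url: got '$got', want '$want_url'"; rc=1
  fi
  got="$(_dvc_cfg_get "$config" "$section" endpointurl)"
  if [ "$got" != "$want_endpoint" ]; then
    echo "endpointurl: got '$got', want '$want_endpoint'"; rc=1
  fi
  got="$(_dvc_cfg_get "$config" "[core]" remote)"
  if [ "$got" != "$remote" ]; then
    echo "core.remote: got '$got', want '$remote'"; rc=1
  fi
  [ "$rc" -eq 0 ] && echo "config OK"
  return "$rc"
}
```

Run it from the project directory (cd /root/code/fraud-detection && check_dvc_remote_config). Each line of output names one setting that still needs fixing.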

Step 4: Fix the configuration
Set the remote URL

dvc remote modify s3 url s3://dvc-storage

Set the SeaweedFS endpoint

dvc remote modify s3 endpointurl http://localhost:8333

Set the remote as default

dvc remote default s3
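The three Step 4 commands can also be wrapped in one function, so they stop at the first failure. The sketch below is a hypothetical helper, fix_dvc_remote, and is not a DVC command. Its arguments default to the lab's values. The DVC_BIN variable is an assumption added for this example: set it to echo to preview the commands without running dvc.

```shell
# fix_dvc_remote [REMOTE] [URL] [ENDPOINT]  (hypothetical helper)
# Apply the Step 4 fixes in order; "&&" stops at the first failing command.
fix_dvc_remote() {
  local remote="${1:-s3}"
  local url="${2:-s3://dvc-storage}"
  local endpoint="${3:-http://localhost:8333}"
  local dvc="${DVC_BIN:-dvc}"     # DVC_BIN=echo gives a dry run

  "$dvc" remote modify "$remote" url "$url" &&
    "$dvc" remote modify "$remote" endpointurl "$endpoint" &&
    "$dvc" remote default "$remote"
}
```

Preview with DVC_BIN=echo fix_dvc_remote, then run fix_dvc_remote inside /root/code/fraud-detection to apply the changes.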

Step 5: Verify the final configuration

cat .dvc/config

Confirm:

[core]
    remote = s3
['remote "s3"']
    url = s3://dvc-storage
    endpointurl = http://localhost:8333
    access_key_id = weedadmin
    secret_access_key = weedadmin123

Step 6: Check DVC status

dvc status

This compares the workspace with the local DVC cache only; it does not contact the remote yet. Expected:

Data and pipelines are up to date.

Step 7: Push the data

Run:

dvc push

Expected output:

root@controlplane fraud-detection on main [!] ➜  dvc push
Collecting                                           |1.00 [00:00,  658entry/s]
Pushing
1 file pushed

Step 8: Verify remote storage

You can verify from DVC:

dvc status -c

Expected:

root@controlplane fraud-detection on main [!] ➜  dvc status -c
Cache and remote 's3' are in sync.
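dvc status -c reports DVC's own view of the remote. To list the uploaded objects straight from the S3 API instead, you can use a sketch like the one below. Several things here are assumptions not stated in the lab: that the AWS CLI is installed on the controlplane, that the lab's weedadmin/weedadmin123 credentials can be passed through the standard AWS environment variables, and that us-east-1 is acceptable as a placeholder region for SeaweedFS. The helper name list_dvc_objects and the AWS_BIN override are hypothetical.

```shell
# list_dvc_objects [BUCKET] [ENDPOINT]  (hypothetical helper)
# List every key under files/md5/ in the bucket through the S3 API.
# Assumes the AWS CLI is installed; AWS_BIN=echo previews the command.
list_dvc_objects() {
  local bucket="${1:-dvc-storage}"
  local endpoint="${2:-http://localhost:8333}"
  local aws="${AWS_BIN:-aws}"

  # Lab credentials; the region is a placeholder the CLI may ask for.
  AWS_ACCESS_KEY_ID=weedadmin \
  AWS_SECRET_ACCESS_KEY=weedadmin123 \
  AWS_DEFAULT_REGION=us-east-1 \
    "$aws" s3 ls "s3://$bucket/files/md5/" --recursive --endpoint-url "$endpoint"
}
```

After a successful push, expect at least one line whose key starts with files/md5/.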

Step 9: Verify in SeaweedFS

Open the SeaweedFS Filer UI:

SeaweedFS Filer
→ /buckets/
→ dvc-storage/

You should see a path similar to:

files/
└── md5/
    └── xx/
        └── xxxxxxxxxxxxxxxxxxxx

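The exact names will vary: xx is the first two characters of the file's MD5 hash, and the file name is the remaining characters.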

This confirms the dataset was uploaded successfully.
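You can predict which object should appear in the Filer UI before you look. The sketch below assumes the file is tracked with DVC 3.x's default md5 hash, which is the plain MD5 of the file's bytes. It also assumes GNU md5sum is available, as on the lab's Linux host. The helper name dvc_object_key is hypothetical. The md5 value recorded in data/raw/transactions.csv.dvc remains the authoritative hash to compare against.

```shell
# dvc_object_key FILE  (hypothetical helper)
# Print the key DVC 3.x is expected to use for FILE under the default md5 hash.
dvc_object_key() {
  local hash
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  hash="$(md5sum "$1" | cut -d' ' -f1)"
  # First two hex characters become the directory, the rest the object name.
  printf 'files/md5/%s/%s\n' "$(echo "$hash" | cut -c1-2)" "$(echo "$hash" | cut -c3-)"
}
```

For example, dvc_object_key data/raw/transactions.csv should print the same path you then find under /buckets/dvc-storage/ in the Filer UI.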


🧠 Part 2: Simple Step-by-Step Explanation (Beginner Friendly)

  • What is happening in this lab?

Previously, DVC was tracking the dataset locally.

Right now, the project contains:

transactions.csv (the data itself)
transactions.csv.dvc (a small pointer file that records the data's hash)

But the actual file content exists only on your machine.

The team wants all datasets backed up to a shared storage system.

  • What is SeaweedFS?

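SeaweedFS acts like a private Amazon S3 service: it exposes an S3-compatible API, so DVC can use its normal S3 remote, just pointed at a different endpoint.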

Think of it as:

Local Machine → DVC → SeaweedFS Bucket

Instead of storing datasets in Git, DVC stores them in the bucket.

  • Why does dvc push fail?

Usually because one of these is wrong:

Wrong bucket name
Wrong endpoint URL
Remote not set as default

The lab specifically tells us the correct values:

Bucket: dvc-storage
Endpoint: http://localhost:8333
Remote: s3

  • What does dvc remote default s3 do?

It tells DVC:

Whenever I push or pull,
use the remote named "s3".

Without this setting, a plain dvc push (without -r s3) fails because DVC does not know which remote to use.

  • What does dvc push do?

When you run:

dvc push

DVC uploads the tracked dataset into SeaweedFS.

Conceptually:

transactions.csv → DVC Cache → SeaweedFS Bucket
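The middle stage of that flow lives on disk. The sketch below lists the cached objects relative to the cache root. It assumes DVC 3.x's default cache location, .dvc/cache inside the project. The helper name list_local_cache is hypothetical. With these assumptions, the printed paths have the same files/md5/... shape as the keys dvc push creates in the bucket.

```shell
# list_local_cache [REPO]  (hypothetical helper)
# Print cached objects relative to REPO/.dvc/cache, one per line, sorted.
list_local_cache() {
  local repo="${1:-.}"
  # Subshell so the caller's working directory is left unchanged.
  ( cd "$repo/.dvc/cache" && find files/md5 -type f | sort )
}
```

Running list_local_cache /root/code/fraud-detection before and after dvc push should show the same entries. Push copies data from the cache to the bucket; it does not remove anything locally.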


Resources & Next Steps
📦 Full Code Repository: KodeKloud Learning Labs
💬 Join Discussion: DEV Community - Share your thoughts and questions
💼 Let's Connect: LinkedIn - I'd love to connect with you

Credits
• All labs are from: KodeKloud
• Many thanks to KodeKloud for providing these valuable resources.
