Kayode

Posted on • Edited on • Originally published at blog.zt4ff.dev

I tried to find MongoDB connection strings over 1000 public GitHub repositories

I tried to see if I could get other people's MongoDB connection strings just by searching for them on GitHub. Yes, I found a few.

I tried connecting to a few and yes, it worked!

Before you call the cops on me, listen to my backstory. 🤗

I was working on a NodeJS/Express application for practice when I remembered that I had pushed the .env file to my remote repository. While fixing this mistake, I thought about how many people must have made the same one, and how the secrets stay somewhere in the commit history even if the file is later removed from the repository.
So I took the bait and ran this GitHub search. While most of the results are not actual connection strings, a good number of them are still alive and functional.

[DISCLAIMER: NO HARM INTENDED, THIS IS JUST TO RAISE PUBLIC AWARENESS]

How I scanned through the 1000 repositories

The GitHub Search API returns at most 1,000 results per search. Using the script below, I generated a list of repositories whose code contains mongodb+srv:

// index.ts
import dotenv from "dotenv";
dotenv.config();

import axios from "axios";
import fs from "fs/promises";
import cliProgress from "cli-progress";

const jsonpath = "list_of_repo.json";

// the part of the Search API response body that this script uses
type SearchResponse = {
  items: { html_url: string; repository: { html_url: string } }[];
};

const makeSearch = async (page: number) => {
  const config = {
    headers: {
      Authorization: `Token ${process.env.GITHUB_API_TOKEN}`,
    },
  };

  const url = `https://api.github.com/search/code?q=mongodb%2Bsrv+in:file&page=${page}&per_page=100`;
  // axios puts the parsed response body on `result.data`
  const result = await axios.get<SearchResponse>(url, config);

  // map each repository URL to the URL of a matching file
  const obj: Record<string, string> = {};
  result.data.items.forEach((item) => {
    obj[item.repository.html_url] = item.html_url;
  });

  await addToJson(jsonpath, obj);
};

async function addToJson(jsonpath: string, data: Record<string, string>) {
  // treat a missing or unreadable file as empty on the first run
  const oldJson = await fs.readFile(jsonpath, "utf8").catch(() => "");
  const merged = oldJson.trim() ? { ...JSON.parse(oldJson), ...data } : data;

  await fs.writeFile(jsonpath, JSON.stringify(merged, null, 2));
}

async function main() {
  // I included a CLI progress loader because, who doesn’t like a loader.
  const bar1 = new cliProgress.SingleBar(
    {},
    cliProgress.Presets.shades_classic
  );
  // 10 pages of 100 results each covers the 1,000-result limit
  bar1.start(10, 0);
  for (let i = 1; i <= 10; i++) {
    await makeSearch(i);
    // update() sets the absolute value, so pass the current page number
    bar1.update(i);
  }
  bar1.stop();
}

main().catch(console.error);

The results do not mean that an actual MongoDB connection string exists; they only imply that the repositories in the results contain a file that matches mongodb+srv:

I could go further and write a script that fetches each code URL and runs a regex to find actual connection strings, but that won’t be necessary, as my purpose is to raise public awareness and show how we can protect ourselves.
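For readers who want to check their own files before pushing, here is a minimal sketch of what such a regex check could look like. The function name findMongoSrvUris and the pattern itself are assumptions made for illustration and are not part of the original script; the pattern only covers the common user:password@host form and deliberately skips placeholders such as <username>:<password>. The password is masked in the output so the check does not leak the secret again.

```typescript
// Hypothetical helper (not from the original script): finds mongodb+srv URIs
// that embed credentials and returns them with the password masked.
function findMongoSrvUris(text: string): string[] {
  // user and password must not contain whitespace, "@", "/", "<" or ">",
  // which filters out template placeholders like <username>:<password>
  const pattern =
    /mongodb\+srv:\/\/([^\s:@\/<>]+):([^\s@\/<>]+)@([A-Za-z0-9.-]+)/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // match[1] = user, match[2] = password (masked), match[3] = host
    hits.push(`mongodb+srv://${match[1]}:****@${match[3]}`);
  }
  return hits;
}

// Example usage with a made-up .env line
const sample =
  "MONGO_URI=mongodb+srv://alice:s3cret@cluster0.example.mongodb.net/app";
console.log(findMongoSrvUris(sample));
```

A simple pattern like this will miss unusual formats and can still flag dummy values, so it is a quick sanity check rather than a replacement for a dedicated secret scanner.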

What I discovered and how we can protect ourselves

Some of my discoveries include:

  • Some of the results come from old commits in the commit history: Just like the mistake that led to this article, we sometimes forget to create a .gitignore file at the beginning of a project and end up with secrets committed somewhere in the commit history.

    We can use tools like GitGuardian to continually scan our repos for secrets in our source code.

  • Some results included messages from different log files and environment files: This probably happened because no .gitignore was included.

    GitHub provides a repo with numerous .gitignore templates for different languages, frameworks, tools, IDEs, etc.

    I also created a simple interactive CLI to generate .gitignore templates based on the GitHub lists.

You can find the Interactive CLI tool to generate your .gitignore templates here: https://www.npmjs.com/package/gittyignore
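Whichever template you start from, it is worth confirming that .env is actually listed in it. Below is a rough sketch of such a check. The function name ignoresEnvFile is hypothetical, and the check only looks for a few literal entries (.env, /.env, .env*); it does not implement full .gitignore pattern semantics such as negation or nested directories.

```typescript
// Hypothetical helper: returns true if the given .gitignore content
// contains one of a few literal entries that cover a top-level .env file.
function ignoresEnvFile(gitignoreContent: string): boolean {
  return gitignoreContent
    .split(/\r?\n/)
    .map((line) => line.trim())
    // drop blank lines and comments
    .filter((line) => line !== "" && line.charAt(0) !== "#")
    .some((line) => line === ".env" || line === "/.env" || line === ".env*");
}

// Example usage with inline content instead of reading a real file
console.log(ignoresEnvFile("node_modules\n.env\n"));
console.log(ignoresEnvFile("# .env\nnode_modules\n"));
```

Keep in mind that adding .env to .gitignore only prevents future commits; a file that was already committed stays in the history until the history is rewritten and the secret is rotated.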

Thanks for reading through! 🤗

If you enjoyed reading this article, you can consider buying me a coffee.

Top comments (16)

Vaibhav Agarwal

While this is an excellent example of something wrong that happens with every developer, I have found that sometimes I need to rewrite the git history. I have found an excellent answer to this here: stackoverflow.com/questions/437623...

Konadu Akwasi Akuoko

You're a savior 😘
Thanks man, I guess I need to do the same 🤣🤣

Dillon Greek

I have spent the past week pondering this after a near-fatal mistake. Glad I'm not the only one who has forgotten .gitignore or wondered about searching git. I worry more about keys and secrets since I work a lot with algo trading bots. This could wipe out a trading bot.

Kayode

exactly 😊

Chris Ochsenreither

I was using MongoDB for personal projects, using .env to store the MongoDB URL. Now I have started a job where they use SAM, and one of the nice things is that the template doesn't have any endpoints, secrets, etc., so they're never in your code base (nothing to .gitignore).

Drew Ronsman

Is it bad to have a private repository with all the API keys shown in that repository?

Rolf Streefkerk

It is bad practice; secrets should be stored in secure (encrypted) storage that can be read from the operating environment.

Kayode

A private repository cannot be queried via the GitHub Search API.

But then, it is more secure not to include your secrets in the repository at all.

Drew Ronsman

Yeah, so no one will be able to look at the repository

Luis Juarez

I've definitely done this (accidentally) and only realized it when my API key got a rate limit response. Great reminder.

Kayode • Edited

Wow!
Hope the API rates are not billed :)

Rehman Arshad

This is a good public awareness campaign 👏

Mykezero

Great article, definitely a reminder not to store the credentials along with the application and to use a process that keeps them safe from exposure. The loader was a nice touch!

Kayode

Thanks,

Imagine downloading without a loader or any progress indication 😊

Rehman Arshad

This happened to me a fair number of times, to the point that whenever I start any project where I plan on using API keys, I instinctively add .env to the .gitignore file before anything else.

And I recall my professor telling me about actual bots sifting through GitHub looking for API keys accidentally committed in git histories to exploit.
