IPFS stands for InterPlanetary File System. It is similar in spirit to torrents, but better: a peer-to-peer hypermedia protocol designed to make the web faster, safer, and more open. I won't nerd out about IPFS any further here; just read the IPFS whitepaper.
I stumbled upon IPFS a couple of years ago and found it interesting. Back then, the only way to access IPFS was to spin up your own node (or so I believed; maybe I just didn't research enough). Today we have multiple free IPFS endpoints, and we can use them to interact with the IPFS network.
This article is about storing text data on the IPFS network. It's something I've been working on for the past few days: using the IPFS network to store data for free.
Tools used
- FastAPI
- MongoDB
- Svelte
- Infura IPFS endpoint
Why use a backend?
It is easy to make GET/POST requests to the IPFS endpoint using JavaScript's fetch API. The catch is that IPFS identifies each file by a hash like this:
QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX
But hashes like that are not easy to remember, so we store an alias to each hash in a database.
FastAPI will regulate the whole program flow. We'll build APIs for communication between services.
Building the application
Setup env variables
# .env
MONGO_CON_STRING=mongodb://localhost:27017/
Setup mongodb
Let's use Docker to spin up MongoDB. Docker removes the overhead for a local installation and other basic setups.
# pull MongoDB
docker pull mongo
# Start mongo container
docker run -it -v mongodata:/data/db -p 27017:27017 --name ipfs-store -d mongo
The -v mongodata:/data/db flag specifies the volume. Mapping MongoDB's storage directory to a volume is important so the data persists even after the container is stopped: we map /data/db inside the container to the mongodata volume. Note that a bare name like mongodata creates a Docker-managed named volume, which Docker creates automatically; if you instead want a folder in your project directory, use an absolute path such as $(pwd)/mongodata:/data/db and make sure that folder exists.
# requirements.txt
aiofiles==0.5.0
fastapi==0.61.1
ipfs-api==0.2.3
pymongo==3.11.0
sqlitedict==1.7.0
uvicorn==0.12.2
Code the database
We'll use pymongo to communicate with our database.
# database/database.py
from os import getenv

from pymongo import MongoClient

class DataBase:
    def __init__(self) -> None:
        self.client = MongoClient(getenv("MONGO_CON_STRING"))
        self.db = self.client.pasteit
        self.col = self.db.links

    def set(self, short: str, hash: str) -> str:
        # if this hash is already stored, reuse its existing short
        short_exists = self.col.find_one({"hash": hash})
        if short_exists is not None:
            return short_exists.get("short")
        data = {"short": short, "hash": hash}
        self.col.insert_one(data)
        return short

    def get(self, short: str) -> str:
        data = self.col.find_one({"short": short})
        if data is not None:
            return data.get("hash")
        return None

    def close(self) -> None:
        self.client.close()
Creating abstractions like this keeps the code easy to read. I defined the set and get methods as a series of pymongo operations that get the job done.
Every database insertion will be a document of this format,
{
    "short": "<short id>",
    "hash": "<ipfs hash>"
}
You could also use Redis here, since all insertions are key: value based; I used MongoDB because this application is deployed on Vercel with MongoDB Atlas.
The code above is fairly simple. The get method fetches a hash based on the short provided, and the set method stores a short: hash pair, but first makes sure the hash isn't already in the database.
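To make the dedup-on-hash behaviour concrete without needing a running MongoDB instance, here is a hypothetical in-memory stand-in (a plain Python class I'm adding for illustration, not the pymongo code above) that mirrors the same set/get semantics:

```python
# In-memory stand-in mirroring the set/get semantics (illustrative only)
class MemoryStore:
    def __init__(self) -> None:
        self.docs = []  # each doc: {"short": ..., "hash": ...}

    def set(self, short: str, hash: str) -> str:
        # if the hash is already stored, return its existing short
        for doc in self.docs:
            if doc["hash"] == hash:
                return doc["short"]
        self.docs.append({"short": short, "hash": hash})
        return short

    def get(self, short: str):
        for doc in self.docs:
            if doc["short"] == short:
                return doc["hash"]
        return None

store = MemoryStore()
assert store.set("abc123", "QmHash1") == "abc123"
# storing the same hash again reuses the original short
assert store.set("zzz999", "QmHash1") == "abc123"
assert store.get("abc123") == "QmHash1"
assert store.get("nope") is None
```

The upshot: uploading the same text twice yields the same IPFS hash, so it also yields the same short.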
Make the IPFS connection
# ipfs/ipfs.py
from os import remove
from uuid import uuid4

import ipfsApi

class IPFS:
    def __init__(self) -> None:
        self.ipfs = ipfsApi.Client("https://ipfs.infura.io", 5001)

    def add(self, text: str) -> str:
        # write the text to a temp file, upload it, then clean up
        filename = f"/tmp/{str(uuid4())}"
        with open(filename, "w") as f:
            f.write(text)
        res = self.ipfs.add(filename)
        remove(filename)
        return res[0].get("Hash")

    def cat(self, hash: str) -> str:
        data = self.ipfs.cat(hash)
        return data
Communications with the IPFS endpoint are simple GET/POST requests with a payload, but you need to take care of the encoding. I used a library that has already done the basics for us.
We define an add method, which writes the input string to a file and then uploads it to IPFS. The cat method reads the data back using the hash.
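Under the hood, the IPFS HTTP API's add endpoint responds with JSON metadata per uploaded file, and res[0].get("Hash") pulls the content hash out of it. A sketch of what that parsing looks like, using a sample response shape (the values here are illustrative, not real output):

```python
# Sample shape of an `add` response from the IPFS HTTP API (values illustrative)
sample_response = [
    {
        "Name": "some-temp-filename",
        "Hash": "QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX",
        "Size": "62",
    }
]

# Mirrors the lookup done at the end of IPFS.add
hash = sample_response[0].get("Hash")
print(hash)  # QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX
```

That Hash field is exactly what we persist in MongoDB next to the short.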
Code the server
The server has two endpoints: /api/v1/ to post the text to be uploaded, and /{short} to fetch data using short URLs.
# main.py
from uuid import uuid4

from fastapi import Depends, FastAPI
from pydantic import BaseModel

from database.database import DataBase
from ipfs.ipfs import IPFS

app = FastAPI()

class Data(BaseModel):
    text: str

async def connection() -> dict:
    return {"db": DataBase(), "ipfs": IPFS()}

@app.post("/api/v1/")
async def pasteit(data: Data, con: dict = Depends(connection)) -> dict:
    hash = con["ipfs"].add(data.text)
    short = str(uuid4())[:6]
    short = con["db"].set(short, hash)
    con["db"].close()
    return {"message": short}

@app.get("/{short}")
async def get_paste(short: str, con: dict = Depends(connection)) -> dict:
    hash = con["db"].get(short)
    con["db"].close()  # close in both branches, not just the failure path
    if hash is not None:
        data = con["ipfs"].cat(hash)
        return {"message": data}
    return {"message": "invalid short"}
Here we assume that all data is successfully uploaded. We then create a custom identifier for each hash using the first six characters of uuid.uuid4(). This method of short generation needs a collision test.
# collision_test.py
from uuid import uuid4

def get_id() -> str:
    return str(uuid4())[:6]

def test_n(n: int) -> None:
    outputs = [get_id() for _ in range(n)]
    unique_outputs = set(outputs)
    fraction = 1 - (len(unique_outputs) / len(outputs))
    print(f"Test for {n} shorts, collision: {fraction*100:.2f}")

if __name__ == "__main__":
    test_n(100)
    test_n(1000)
    test_n(10000)
    test_n(100000)
    test_n(1000000)
-> python collision_test.py
Test for 100 shorts, collision: 0.00
Test for 1000 shorts, collision: 0.00
Test for 10000 shorts, collision: 0.05
Test for 100000 shorts, collision: 0.26
Test for 1000000 shorts, collision: 2.93
-> python collision_test.py
Test for 100 shorts, collision: 0.00
Test for 1000 shorts, collision: 0.00
Test for 10000 shorts, collision: 0.01
Test for 100000 shorts, collision: 0.27
Test for 1000000 shorts, collision: 2.92
I'd say the test passed: collisions are negligible up to n=100,000, and even n=1,000,000 only reaches about 2.9% (roughly 29,000 collisions). It's safe to assume we're not going to get that many requests in a short span of time.
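These measurements line up with theory. The first six characters of a uuid4 string are hex digits, so there are 16^6 ≈ 16.8 million possible shorts, and the birthday-problem approximation predicts the expected collision fraction for n draws. A back-of-envelope check I'm adding here (not part of the project code):

```python
import math

N = 16 ** 6  # six hex characters -> ~16.8 million possible shorts

results = {}
for n in (100, 1000, 10000, 100000, 1000000):
    # classic birthday approximation: expected distinct values is N * (1 - e^(-n/N))
    expected_unique = N * (1 - math.exp(-n / N))
    results[n] = 1 - expected_unique / n
    print(f"n={n}: expected collision fraction = {results[n] * 100:.2f}%")
```

For n=1,000,000 this predicts about 2.9%, matching the measured 2.92-2.93% above almost exactly, so the test behaves as the math says it should.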
The frontend
# src/App.svelte
<script>
  let data = "";
  let hash = "";
  const upload = () => {
    fetch("http://localhost:8000/api/v1/", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: data }),
    })
      .then((res) => res.json())
      .then((data) => (hash = data.message));
  };
</script>

<textarea id="data" bind:value={data} />
<button id="upload" on:click={upload}>Upload</button>
<p>{hash}</p>
This code should give you a fair idea of the frontend build. The current text limit is set to 200 characters.
What's next for pasteit?
I'm planning to convert this into a file-sharing service on IPFS. Maybe throw in a little encryption to make people interested!