
Deciding on a database architecture for a social networking use case?

Priyansh Jain (presto412) ・1 min read

So I'm in the process of writing a Node-backed server, and one major part of the application is the Social Network Platform. I don't plan on using MySQL, even though it's reliable.

After some research, I stumbled upon multiple articles stating why you should use one over the other or with the other. It does make sense that using a graph database is the best option, better than a simple RDBMS/conventional NoSQL.

As of now, these are the options I'm considering:

  • MongoDB for document storage, with mongo-connector syncing to Neo4j for storing relationships, likes, and so on. This looks good to me: part of the application has already been developed with Mongo, so I wouldn't have to rewrite everything.
  • OrientDB as a whole package from the ground up
  • Gremlin
  • Apache Cassandra (no idea how this works, any help could save me some time :D)

I'll have you know that the social network relationship is but a part of other functions the application is supposed to execute, the other one being a chat interface, along with real-time mapping.

Any advice would be much appreciated!


Discussion

 

You are on the right track; you will see the improvements over a MySQL-like database after you write a few dozen queries. Do not jump on Cassandra unless you have a DevOps team and terabytes of data; it is a big gun for big problems.

A few examples:
Dgraph - written in Go and fast
Neo4j - big community
OrientDB - SQL-like queries.

Here is a comparison between the 3 of them db-engines.com/en/system/Dgraph%3B...
and a list of popularity db-engines.com/en/ranking/graph+dbms

I would prefer a cloud-based managed DB in the first iterations because I don't want to handle the DevOps, but nowadays all of them have Docker images, which makes them very easy to install.

 

Hmm, Dgraph does seem promising. I also looked at ArangoDB, and it seems damn nice for a direct transition from MongoDB, since I'm more used to JSON storage. Neo4j is commercial for enterprise use and requires learning Cypher, and much the same goes for ArangoDB, which has its own query language (AQL).
Neo4j does seem well supported and popular, so any issues I have should get resolved.

OrientDB, on the other hand, is free to use but does seem buggy, and some online reports suggest it may fail in production.

So finally, I think I will have to decide between using OrientDB and ArangoDB. What are your opinions on this?

 

You should probably go through a second phase of research and test them before you commit to one of them.

Managed options could be AWS Neptune and Azure Cosmos DB.

For the open-source options, also go through their GitHub repositories and see how active they are and what kind of issue tickets they have open.

Hmm, the app follows more of a waterfall process: deployment will only be handled at the end, and the MVP is the point at which the entire product has been developed.
I'm not sure I should commit to a cloud-managed database while in development, because if the requirements change later it might add to the work.

Will definitely have a good look at the open-source options before proceeding, thanks!

 

Just came across this post right now. What did you end up using finally?

I used Arango at my previous workplace to build a "Suggested Friends" feature. Graph queries are a pleasure to write using Arango's Query Language. The database was very performant for our user base of 1 million with around 400 edges per user. We were able to run it on a single instance with 8 GB RAM. It outperformed our previous Postgres-based solution by almost 15 times and reduced a huge Redis instance that we had needed for caching results.
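
To make the "Suggested Friends" idea concrete, here is a minimal sketch of one common approach: rank friends-of-friends by how many mutual friends they share with the user. This is not the commenter's implementation and not AQL; it is plain TypeScript over a hypothetical in-memory graph, and the names `FriendGraph`, `addFriendship` and `suggestFriends` are made up for illustration.

```typescript
// Hypothetical in-memory friendship graph: user id -> set of friend ids.
type FriendGraph = Map<string, Set<string>>;

// Friendship is treated as symmetric, so both directions are recorded.
function addFriendship(graph: FriendGraph, a: string, b: string): void {
  if (!graph.has(a)) graph.set(a, new Set<string>());
  if (!graph.has(b)) graph.set(b, new Set<string>());
  graph.get(a)!.add(b);
  graph.get(b)!.add(a);
}

// Rank users two hops away by the number of mutual friends with `user`.
function suggestFriends(graph: FriendGraph, user: string, limit: number): string[] {
  const direct = graph.get(user) ?? new Set<string>();
  const mutualCounts = new Map<string, number>();
  direct.forEach((friend) => {
    (graph.get(friend) ?? new Set<string>()).forEach((candidate) => {
      // Skip the user themselves and people they already know.
      if (candidate === user || direct.has(candidate)) return;
      mutualCounts.set(candidate, (mutualCounts.get(candidate) ?? 0) + 1);
    });
  });
  return Array.from(mutualCounts.entries())
    // Most mutual friends first; ties broken by id for a stable order.
    .sort((x, y) => y[1] - x[1] || (x[0] < y[0] ? -1 : 1))
    .slice(0, limit)
    .map((entry) => entry[0]);
}

const socialGraph: FriendGraph = new Map();
addFriendship(socialGraph, "ana", "bo");
addFriendship(socialGraph, "ana", "cy");
addFriendship(socialGraph, "bo", "dee");
addFriendship(socialGraph, "cy", "dee");
addFriendship(socialGraph, "cy", "eli");

const suggestions = suggestFriends(socialGraph, "ana", 5);
console.log(suggestions); // "dee" (2 mutual friends) ranks ahead of "eli" (1)
```

A graph database would express the same two-hop traversal as a query and run it against indexed edges rather than in application memory.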

Woah. Nice. I ended up using Postgres and GraphQL, although I'm not working on the project anymore. The consultant was hell-bent on using a cloud-managed database such as RDS, so we had to use Postgres.

That is a shame, the devs will have a hard time scaling and querying 4+ deep relationships.
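
To illustrate what a "4+ deep" relationship query involves, here is a rough sketch of a depth-limited breadth-first traversal. In a relational schema with a single friendships table (an assumption here), each extra hop usually means another self-join or a recursive query, whereas in a traversal the depth is just a parameter. The `Adjacency` type, the `withinHops` function and the sample data are hypothetical.

```typescript
// Hypothetical adjacency list: user id -> ids of directly connected users.
type Adjacency = Map<string, string[]>;

// Returns every user reachable from `start` in at most `maxDepth` hops,
// mapped to the number of hops needed to reach them.
function withinHops(adjacency: Adjacency, start: string, maxDepth: number): Map<string, number> {
  const distance = new Map<string, number>();
  distance.set(start, 0);
  let frontier: string[] = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    frontier.forEach((user) => {
      (adjacency.get(user) ?? []).forEach((neighbor) => {
        // The first time a user is seen is also the shortest path to them.
        if (!distance.has(neighbor)) {
          distance.set(neighbor, depth);
          next.push(neighbor);
        }
      });
    });
    frontier = next;
  }
  distance.delete(start); // the starting user is not part of the result
  return distance;
}

// A simple chain: a - b - c - d - e - f
const chain: Adjacency = new Map<string, string[]>([
  ["a", ["b"]],
  ["b", ["a", "c"]],
  ["c", ["b", "d"]],
  ["d", ["c", "e"]],
  ["e", ["d", "f"]],
  ["f", ["e"]],
]);

const reachable = withinHops(chain, "a", 4);
console.log(Array.from(reachable.entries())); // b..e are within 4 hops; f is 5 hops away
```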

I was in the opposite situation a few months back: as a consultant, I recommended a graph database for a similar project, and they ended up using a SQL database.

 

Mongo won't grow all the way with you, but if it's what you know and your data model isn't too complex, it can at least get you to the point where you can regroup and figure out a better long-term solution. A database in the hand is worth two in the bush and all.

Cassandra is useless for what you're trying to do. When you need someplace to collect high-volume analytics or something take another look at it.

 

Thanks for ruling out Cassandra. Any idea about the other multi-model/graph databases mentioned above?

 

I've never used any but from what I know about them, a social network seems like a good fit for graph dbs.

Btw, great work with massive.js, if I ever think of using RDBMS for anything in my project, I will keep it in mind!

This is my email: vishalsha95570@gmail.com. Could you please ping me there? I need to discuss this.

 

Another more low-level option could be a key-value store like LevelDB.

You would have to figure out how to represent the relationships between users yourself, but there's no query language to learn and lookups are super fast.

Here's a cool LevelDB wrapper for Node that I've been meaning to try out:

github.com/level/level
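
As a rough idea of what "representing the relationships yourself" could look like, the sketch below stores each user's friend list as a JSON string under a `friends:<userId>` key. The `KeyValueStore` class is an in-memory stand-in with a simple get/put shape, not the API of LevelDB or the `level` package, and the key scheme is just one assumed convention.

```typescript
// In-memory stand-in for a key-value store with string keys and values.
// This is NOT the LevelDB API; it only mimics a basic get/put shape.
class KeyValueStore {
  private data = new Map<string, string>();

  get(key: string): string | undefined {
    return this.data.get(key);
  }

  put(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Assumed key convention: one key per user holding that user's friend ids.
const friendsKey = (userId: string): string => `friends:${userId}`;

function readFriends(store: KeyValueStore, userId: string): string[] {
  const raw = store.get(friendsKey(userId));
  return raw === undefined ? [] : (JSON.parse(raw) as string[]);
}

// Friendship is symmetric, so the edge is written in both directions.
function connect(store: KeyValueStore, a: string, b: string): void {
  const directions: Array<[string, string]> = [[a, b], [b, a]];
  directions.forEach(([from, to]) => {
    const friends = readFriends(store, from);
    if (friends.indexOf(to) === -1) {
      friends.push(to);
      store.put(friendsKey(from), JSON.stringify(friends));
    }
  });
}

const kvStore = new KeyValueStore();
connect(kvStore, "u1", "u2");
connect(kvStore, "u1", "u3");
connect(kvStore, "u2", "u1"); // already connected, so nothing changes

console.log(readFriends(kvStore, "u1"));
console.log(readFriends(kvStore, "u2"));
```

The trade-off is that consistency between the two directions of an edge becomes the application's job, which a graph database would otherwise handle.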

 

A key-value store wouldn't make sense for adding the "friends of friends" functionality, since it would effectively mean parsing through the entire list. I could still try for an optimised method by using hashing to locate items, I guess. Thanks for the suggestion.
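
For what it's worth, if friend lists are keyed by user id, a friends-of-friends lookup may not need to touch the whole store: it takes one read for the user plus one read per direct friend. The sketch below counts the reads to show this. The data, the `getFriends` helper and the `lookups` counter are hypothetical, and a plain Map stands in for the key-value store.

```typescript
// Hypothetical data: key = user id, value = that user's friend ids.
const friendLists = new Map<string, string[]>([
  ["u1", ["u2", "u3"]],
  ["u2", ["u1", "u4"]],
  ["u3", ["u1", "u4", "u5"]],
  ["u4", ["u2", "u3"]],
  ["u5", ["u3"]],
  ["u6", ["u7"]],
  ["u7", ["u6"]],
]);

// Counts how many keys are actually read from the store.
let lookups = 0;

function getFriends(userId: string): string[] {
  lookups += 1;
  return friendLists.get(userId) ?? [];
}

// Users exactly two hops away: friends of friends who are not already friends.
function friendsOfFriends(userId: string): string[] {
  const direct = getFriends(userId);
  const exclude = new Set<string>([userId, ...direct]);
  const result = new Set<string>();
  direct.forEach((friend) => {
    getFriends(friend).forEach((candidate) => {
      if (!exclude.has(candidate)) result.add(candidate);
    });
  });
  return Array.from(result).sort();
}

const secondDegree = friendsOfFriends("u1");
console.log(secondDegree, lookups); // 3 reads for 7 stored users; u6 and u7 are never read
```

The cost grows with the number of direct friends rather than with the total number of users, though deeper traversals multiply the reads quickly.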

 

I'd really suggest a graph or a triple store database for such a use case. That's basically where they shine: lots of possible and unknown relationships, and the ability to find and query them.