In this post:
- Designing Relational and NoSQL Schemas
- Lessons from the Blog Project
- What's exciting about this data
Designing Relational and NoSQL Schemas
We're in the midst (day three) of SDC, the backend project in which we'll learn and apply concepts like:
- Database selection and schema design
- Horizontal and vertical scaling
- Deployment and server administration
I'm excited to learn these things.
Our current challenge is to design schemas for a relational and a non-relational database. Our group is going with two common choices: PostgreSQL and MongoDB, respectively.
The relational schema design was fairly straightforward. Focusing on normalization and primary-foreign key relationships, I was able to conceptualize the schema without any real struggle.
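To make the primary-foreign key idea concrete, here's a minimal sketch. The table and column names are my own placeholders, not our actual schema, and I'm using Python's built-in SQLite as a stand-in for PostgreSQL so it runs without a server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Hypothetical tables: each answer row points at exactly one question row.
conn.executescript("""
CREATE TABLE questions (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    body       TEXT    NOT NULL
);
CREATE TABLE answers (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    body        TEXT    NOT NULL
);
""")

conn.execute("INSERT INTO questions (id, product_id, body) VALUES (1, 42, 'Does it run small?')")
conn.execute("INSERT INTO answers (id, question_id, body) VALUES (10, 1, 'Yes, size up.')")
conn.execute("INSERT INTO answers (id, question_id, body) VALUES (11, 1, 'Fit was fine for me.')")

# A join walks the foreign key from answers back to the product's questions.
rows = conn.execute(
    "SELECT a.id, a.body FROM answers a "
    "JOIN questions q ON q.id = a.question_id "
    "WHERE q.product_id = ? ORDER BY a.id",
    (42,),
).fetchall()

# An answer that points at a missing question is rejected by the constraint.
try:
    conn.execute("INSERT INTO answers (id, question_id, body) VALUES (12, 999, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Because each fact lives in one place, the answers table never repeats question text, and the database itself refuses an answer without a parent question.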
Designing for MongoDB is more of a challenge and a bit more interesting for all of the choices the designer must consider. Choices like:
- Should parent-child relationships be embedded or referenced? (What's the cardinality of the 1-to-N relationships?)
- Is the read-to-write ratio high?
- How do you intend to access the data?
Lessons from the Blog Project
Already my blog project has paid dividends because I've gotten some hands-on experience with the tradeoffs that these choices imply.
I decided to go with what I felt was the simplest schema for my early purposes - a blog post model that embedded author info within the post. Great for creating a feed of blog posts (the author is right there in the post data), but not great for accessing author information (creating a list of authors). The project is a great playground for trying new things - even weak ideas, to learn what makes them weak.
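Here's a rough sketch of that tradeoff using plain Python dicts in place of MongoDB documents. The field names and sample data are made up for illustration and aren't my real blog model.

```python
# Hypothetical post documents with the author embedded in each one.
posts = [
    {"title": "Day one", "author": {"name": "Ada", "email": "ada@example.com"}},
    {"title": "Day two", "author": {"name": "Ada", "email": "ada@example.com"}},
    {"title": "Hello", "author": {"name": "Lin", "email": "lin@example.com"}},
]


def build_feed(posts):
    """Cheap: the author travels with each post, so no extra lookup is needed."""
    return [f"{p['title']} by {p['author']['name']}" for p in posts]


def list_authors(posts):
    """Awkward: with no authors collection, scan every post and de-duplicate."""
    seen = {}
    for p in posts:
        seen.setdefault(p["author"]["email"], p["author"]["name"])
    return sorted(seen.values())


feed = build_feed(posts)
authors = list_authors(posts)
```

The feed is a single pass over the posts, while the author list has to touch every post just to discover who exists, and the same author's details are duplicated across their posts.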
What's exciting about this data
The size of the dataset. This is my first time working with data that comes in the hundreds of thousands of rows. We're talking CSV files that are hundreds of MBs in size!
This is the kind of data I want to be working on -- as a student but also professionally -- so I'm excited to be at this stage. This is when things like big-O notation come back around: when designing projects with scale in mind. And I suspect that bad decisions here can really punish us with poor performance and, in turn, a poor user experience.
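One place scale shows up right away is in simply reading the files. Below is a small sketch, under my own assumptions, of processing a CSV in fixed-size batches so memory use depends on the batch size rather than the file size. The column names and the tiny in-memory sample are placeholders for a real file on disk.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of up to batch_size rows without loading the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A tiny in-memory stand-in for a CSV that would be hundreds of MBs on disk.
sample = io.StringIO("id,question_id,body\n1,1,a\n2,1,b\n3,2,c\n")
reader = csv.DictReader(sample)

# Each batch could be handed to a bulk insert; here we only record the sizes.
batch_sizes = [len(batch) for batch in iter_batches(reader, 2)]
```

With a real file, `sample` would be replaced by an open file handle, and each batch would be passed to whatever bulk-insert call the chosen database offers.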
Some Modeling Considerations:
- Answers have to be accessible on their own (queried directly, not only through their question).
- 1-to-N Question-Answer (every answer belongs to exactly one question).
- There are 'many' (versus 'few' or 'squillions') questions per product and answers per question.
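The considerations above can be sketched with plain dicts standing in for two MongoDB collections. This is an illustration under my own assumptions: the field names (`answer_ids`, `question_id`) and the integer ids are hypothetical, and a real setup would use ObjectIDs.

```python
# Hypothetical "questions" collection: each question holds references to its answers.
questions = {
    1: {"_id": 1, "product_id": 42, "body": "Does it run small?", "answer_ids": [10, 11]},
}

# Hypothetical "answers" collection: each answer is its own document,
# and also points back at the one question it belongs to.
answers = {
    10: {"_id": 10, "question_id": 1, "body": "Yes, size up."},
    11: {"_id": 11, "question_id": 1, "body": "Fit was fine for me."},
}


def get_answer(answer_id):
    """Fetch one answer directly, without loading its question."""
    return answers[answer_id]


def answers_for_question(question_id):
    """Follow the question's reference array to collect its answers."""
    return [answers[a_id] for a_id in questions[question_id]["answer_ids"]]


single = get_answer(11)
thread = answers_for_question(1)
```

Keeping answers as separate documents is what lets them be fetched on their own, and the `question_id` on each answer records that it always has exactly one parent.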
Questions:
- Post/insert speed: important or not important?
A rule of thumb from MongoDB's schema design guidance: if there are more than a couple of hundred documents on the “many” side, don’t embed them; if there are more than a few thousand documents on the “many” side, don’t use an array of ObjectID references.
I'd like to assume that there are at most a few thousand questions per product and thus use an array of ObjectID references to questions.
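As a way of checking my own reasoning, here's the rule of thumb written as a tiny function. The numeric cutoffs are my guesses at what "a couple of hundred" and "a few thousand" mean, and the third branch (storing the parent's id on each child) is my understanding of the usual fallback rather than something the rule itself spells out.

```python
EMBED_LIMIT = 200        # assumed reading of "a couple of hundred"
REFERENCE_LIMIT = 3000   # assumed reading of "a few thousand"


def relationship_strategy(expected_children):
    """Suggest how to model a 1-to-N relationship from the expected N."""
    if expected_children <= EMBED_LIMIT:
        return "embed"
    if expected_children <= REFERENCE_LIMIT:
        return "array of references"
    # Past this point the array itself gets too large for the parent document.
    return "parent reference on child"


few = relationship_strategy(50)
many = relationship_strategy(1500)
squillions = relationship_strategy(1_000_000)
```

Under these assumed cutoffs, an estimate of 1,500 questions per product lands on "array of references", while anything well beyond a few thousand would push the reference onto the child instead.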
This is just the start and I'm excited to go deeper!