Back-of-envelope calculation for the storage required (based on inputs from product requirements). Aurora storage scales automatically, so this is less of an issue.
Read scaling - back-of-envelope calculation for the number of reads per sec. Read load can be handled by adding read replicas.
Write scaling - in single-primary mode there is a single writer DB. This can be the main factor for choosing the DB instance.
Estimate the number of write connections needed per second and pick the DB instance based on the connection limits (keeping a factor for connection pooling).
Also, a guesstimate of the working set of data that needs to be pulled into memory for things like sorting/filtering will be a factor. The limits are mentioned in the above page.
This can only be a starting point. You should prioritise perf testing once your app is relatively stable, to benchmark a semi-real-world load. More often than not, perf is the constraint rather than storage.
(Also, benchmark reports from AWS are highly optimistic; you mostly will not be able to achieve these results in your environment.)
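A minimal sketch of the instance-picking step described above, in Python. The instance names and their connection/memory limits are made-up placeholders (not real AWS figures), and `pooling_factor` is a hypothetical knob for the fraction of client connections that actually reach the DB after pooling. Take the real limits from the AWS limits page.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceClass:
    name: str
    max_connections: int
    memory_gib: float


# Placeholder values for illustration only; NOT real AWS limits.
CANDIDATES = [
    InstanceClass("small-example", 500, 8),
    InstanceClass("medium-example", 2000, 32),
    InstanceClass("large-example", 4500, 64),
]


def pick_instance(
    peak_write_connections: int,
    working_set_gib: float,
    pooling_factor: float = 0.5,
    candidates: List[InstanceClass] = CANDIDATES,
) -> Optional[InstanceClass]:
    """Return the smallest candidate that fits both limits, or None."""
    # Connections that actually reach the writer after pooling (assumed factor).
    needed = math.ceil(peak_write_connections * pooling_factor)
    for inst in sorted(candidates, key=lambda c: (c.max_connections, c.memory_gib)):
        if inst.max_connections >= needed and inst.memory_gib >= working_set_gib:
            return inst
    return None


choice = pick_instance(peak_write_connections=3000, working_set_gib=20)
print(choice.name if choice else "no single instance fits")
```

If nothing fits, that is a hint to revisit the pooling setup or the write pattern rather than just the instance size. Either way, treat the result as a first guess to be confirmed by perf testing.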
Assuming we will have a total of 10,00,000 (1 million) users.
Each user record in the DB will have at most 10 fields.
Each field = max 1 KB (1000 chars) => each user record = 10 KB
So total storage needed = 10,00,000 x 10 KB = 1,00,00,000 KB = 10 GB
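The storage arithmetic above can be written as a small Python helper so the inputs are easy to change. The function name and the `overhead_factor` parameter (headroom for indexes and the like) are assumptions added for illustration; with the default of 1.0 it reproduces the figures above.

```python
def estimate_storage_gb(
    users: int,
    fields_per_record: int,
    max_field_kb: float,
    overhead_factor: float = 1.0,
) -> float:
    """Worst-case storage in GB, using decimal units (1 GB = 1,000,000 KB)."""
    record_kb = fields_per_record * max_field_kb  # 10 fields x 1 KB = 10 KB
    total_kb = users * record_kb                  # users x record size
    return total_kb * overhead_factor / 1_000_000


# 10,00,000 (1 million) users, 10 fields, 1 KB per field
print(estimate_storage_gb(1_000_000, 10, 1))  # 10.0
```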
The DB will be a read-heavy database. A customer will change their password maybe once a month but log in multiple times.
DocumentDB tops out at a maximum database size of 64 TB and 4,500 concurrent connections per instance. DocumentDB supports a single primary node for writes and up to 15 replicas within a single AWS Region. That means with 15 replicas DocumentDB can handle 15 * 4,500 (max) = 67,500 concurrent read connections. Beyond that, some connections will be waiting in the connection pool. However, these limits may be adjustable - we need to check with AWS.
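A quick Python sketch of the connection arithmetic above. The constants are the limits quoted in these notes (they may differ by instance class or change over time), and the function names are hypothetical.

```python
import math
from typing import Optional

# Limits as quoted in the notes above; verify against current AWS docs.
MAX_CONNECTIONS_PER_INSTANCE = 4500
MAX_REPLICAS = 15


def replica_connection_capacity(
    replicas: int = MAX_REPLICAS,
    per_instance: int = MAX_CONNECTIONS_PER_INSTANCE,
) -> int:
    """Total concurrent connections the read replicas can hold."""
    return replicas * per_instance


def replicas_needed(
    concurrent_read_connections: int,
    per_instance: int = MAX_CONNECTIONS_PER_INSTANCE,
) -> Optional[int]:
    """Replicas required for the given load, or None if over the replica cap."""
    n = math.ceil(concurrent_read_connections / per_instance)
    return n if n <= MAX_REPLICAS else None


print(replica_connection_capacity())  # 67500
print(replicas_needed(20_000))        # 5
```

A `None` result means the load exceeds what 15 replicas can hold at once, so the excess connections would have to wait in the pool.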