Khan Academy scaled to 2.5x its web traffic within a week by using a serverless architecture and CDN caching for all static data. It also extensively cached common queries, user preferences, and session data.
You can also take a bottleneck-centric approach to scaling: use your resource monitoring to identify where the bottleneck is. It is usually the database first, but it can also be memory, CPU, network I/O, or disk I/O.
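As a rough illustration of the bottleneck-centric approach, the sketch below ranks resources by utilization from a single monitoring snapshot. The `ResourceSample` type, the 80% threshold, and the sample numbers are all hypothetical; a real setup would read these values from whatever monitoring tool you already use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceSample:
    """One reading from a monitoring snapshot (hypothetical shape)."""
    name: str
    used: float
    capacity: float

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def rank_bottlenecks(samples: List[ResourceSample],
                     threshold: float = 0.8) -> List[ResourceSample]:
    """Return resources at or above the threshold, most saturated first."""
    hot = [s for s in samples if s.utilization >= threshold]
    return sorted(hot, key=lambda s: s.utilization, reverse=True)


# Made-up numbers, for illustration only.
samples = [
    ResourceSample("cpu", used=45, capacity=100),
    ResourceSample("memory", used=13.6, capacity=16),
    ResourceSample("disk_io", used=950, capacity=1000),
    ResourceSample("network_io", used=200, capacity=1000),
]

for sample in rank_bottlenecks(samples):
    print(f"{sample.name}: {sample.utilization:.0%}")
```

With these made-up readings, disk I/O and memory are the candidates to investigate first; CPU and network I/O have headroom.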
As a principle, make the web stack do less work for the most common requests:
Cache database queries
Index the database
Move session storage to an in-memory caching tool
Cache HTML fragments
Use queues and more workers
Use HTTP caching headers
Add a Content Delivery Network in front of a static file host
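Several of the items above come down to the same idea: answer a common request from a cache instead of recomputing it. Here is a minimal sketch of query caching with a time-to-live, assuming a hypothetical `fetch_top_courses` function that stands in for a slow database query. The `TTLCache` class is illustrative and in-process only; in production this role is usually played by a shared cache such as Redis or Memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for an expensive database query.
calls = {"db": 0}


def fetch_top_courses() -> list:
    calls["db"] += 1
    return ["algebra", "biology"]


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("top_courses", fetch_top_courses)
second = cache.get_or_load("top_courses", fetch_top_courses)
print(calls["db"])  # the second call is served from the cache
```

The TTL is the trade-off knob: a longer TTL saves more database work but serves staler data, so it suits queries whose results change slowly.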
Here's a guide for scaling up to 11M users, stage by stage:
Early on: Use vertical scaling, but note that it provides no failover or redundancy.
At >1,000: Add availability zones, load balancers, and a secondary (replica) database in RDS.
At 10K-100Ks: Scale instances horizontally. Move static content to S3, and even some dynamic content to the CloudFront CDN. Add more read replicas of the database in RDS. Shift session state off your web tier and store it in ElastiCache or DynamoDB.
At >500K: Add automation tools and decouple infrastructure. Add monitoring, metrics and logging.
At >10M: Use federation and sharding, and explore other types of databases.
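To make the sharding step in the last stage concrete, here is a minimal sketch of hash-based shard routing. The shard names and the `shard_for` helper are hypothetical, and this is only one of several possible schemes (range-based and directory-based sharding are common alternatives).

```python
import hashlib
from typing import List

# Hypothetical shard names; each would be a separate database.
SHARDS: List[str] = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id: str, shards: List[str] = SHARDS) -> str:
    """Map a user ID to a shard using a stable hash of the ID.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not route consistently.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same user always lands on the same shard.
print(shard_for("user-42") == shard_for("user-42"))  # True
```

One caveat with plain modulo routing: changing the number of shards remaps most keys, which is why schemes such as consistent hashing are often considered when shards need to be added later.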