Scalability
Ability of a system to cope with increased load.
Load can be described using load parameters (e.g., requests per second, ratio of reads to writes, number of simultaneously active users).
Twitter - November 2012 case study
- 2 Operations
- Post Tweet - 12k requests/sec
- Home Timeline - 300k requests/sec
- 2 Approaches
- Every time a user requests their home timeline, look up everyone they follow and fetch those users' tweets (fan-out on read).
- Every time a user posts a tweet, insert it into the home timeline cache of every follower (fan-out on write).
- Twitter moved to the second approach for most users, since reads vastly outnumber writes; the first approach is kept for celebrities, whose tweets are fetched at read time and merged in, because fanning out to millions of followers is too expensive. See the sketch below.
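A minimal, runnable sketch of the two approaches (all names and data structures here are illustrative, not Twitter's actual implementation):

```python
import itertools
from collections import defaultdict, deque

_seq = itertools.count()         # global counter standing in for timestamps
follows = defaultdict(set)       # user -> users they follow
followers = defaultdict(set)     # user -> users who follow them
tweets = defaultdict(list)       # user -> tweets they posted
timelines = defaultdict(deque)   # user -> precomputed home timeline cache

def follow(who: str, whom: str) -> None:
    follows[who].add(whom)
    followers[whom].add(who)

def post_tweet_fanout_on_write(user: str, text: str) -> None:
    # Approach 2: do the work at write time (~12k/sec) so that reads
    # (~300k/sec) become a cheap cache lookup. Expensive for users
    # with huge follower counts.
    tweet = (next(_seq), user, text)
    tweets[user].append(tweet)
    for f in followers[user]:
        timelines[f].appendleft(tweet)

def home_timeline_fanout_on_read(user: str) -> list:
    # Approach 1: do the join at read time by looking up everyone the
    # user follows and merging their tweets, newest first.
    merged = [t for u in follows[user] for t in tweets[u]]
    return sorted(merged, reverse=True)

def home_timeline_cached(user: str) -> list:
    return list(timelines[user])

follow("alice", "bob")
post_tweet_fanout_on_write("bob", "hello")
assert home_timeline_cached("alice") == home_timeline_fanout_on_read("alice")
```

The hybrid works because the write path pays the fan-out cost once per tweet, while a celebrity's tweets are merged in at read time only for the timelines that actually request them.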
Performance Metrics
In a batch processing system - we use throughput.
In online systems - the service's response time.
Response Time - What the client sees
Latency - The duration that a request is waiting to be handled, during which it is latent
Thus, Response Time and Latency aren't the same.
It is better to think of the Response Time as a distribution and not as a single value.
It is better to use percentiles for response time, rather than mean.
High percentiles of response times - tail latencies.
High percentiles are especially important in backend services that are called multiple times as part of serving a single end-user request: it takes just one slow call to make the entire request slow. This effect is called tail latency amplification.
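A back-of-the-envelope sketch of the amplification (the 1% figure is just the definition of p99): if each backend call exceeds its p99 latency 1% of the time, a request that fans out to n calls in parallel is slow whenever at least one call is.

```python
# Probability that a fan-out request hits at least one slow backend call.
def p_request_slow(p_call_slow: float, n_calls: int) -> float:
    return 1 - (1 - p_call_slow) ** n_calls

print(p_request_slow(0.01, 1))    # 0.01  -> 1% of requests affected
print(p_request_slow(0.01, 100))  # ~0.63 -> roughly 63% of requests affected
```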
Forward decay, t-digest, HdrHistogram - algorithms for computing good approximations of percentiles at minimal CPU and memory cost.
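The naive baseline those algorithms improve on is to keep every response time, sort, and index; a minimal sketch with made-up data (the distribution and numbers below are hypothetical):

```python
import math
import random

def percentile(sorted_times: list, p: float) -> float:
    # Nearest-rank method: the value below which p% of samples fall.
    rank = math.ceil(p / 100 * len(sorted_times))
    return sorted_times[max(0, rank - 1)]

# Simulated response times in ms, drawn from a skewed distribution.
times = sorted(random.expovariate(1 / 200) for _ in range(100_000))
for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile(times, p):.0f} ms")
```

Keeping and re-sorting every sample in a rolling window is too expensive at scale, which is exactly why the streaming approximations above exist.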
Approaches for coping with Load
Scaling Up - Vertical Scaling - A More Powerful Machine
Scaling Out - Horizontal Scaling - Distributing the load across multiple machines - aka Shared-Nothing Architecture (see the routing sketch at the end of this section)
Usually it is a mix of both.
Elastic systems - automatically add computing resources when they detect a load increase.
Common wisdom - keep your database on a single node (scale up) until scaling cost or high-availability requirements force you to make it distributed.
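The simplest building block of a shared-nothing architecture is deterministically routing each record key to one of N independent nodes; a hypothetical sketch (node names are made up):

```python
import hashlib

NODES = ["db-0", "db-1", "db-2"]

def node_for(key: str) -> str:
    # Hash the key so load spreads evenly, then pick a node by modulo.
    digest = hashlib.sha1(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for("user:42"))  # every caller routes this key to the same node
```

Note that naive mod-N routing moves almost every key when a node is added or removed, which is why real systems use consistent hashing or explicit partition rebalancing instead.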