Scalability

Ability of a system to cope with increased load.

Load can be described using load parameters, e.g. requests per second to a web server, the ratio of reads to writes in a database, the number of simultaneously active users, or the cache hit rate.

Twitter - November 2012 case study: delivering a tweet means fanning it out to each follower's home timeline, so the key load parameter is the distribution of followers per user, not just tweets per second.
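
A minimal sketch of the two approaches from that case study, using hypothetical in-memory dicts in place of Twitter's actual databases and caches:

```python
from collections import defaultdict

# Hypothetical in-memory stores standing in for real databases/caches.
tweets_by_user = defaultdict(list)   # user -> tweets they posted
followers = defaultdict(set)         # user -> set of their followers
home_timeline = defaultdict(list)    # user -> precomputed timeline cache

def post_tweet_fanout_on_write(user, tweet):
    """Do the work at write time: push the tweet into every follower's
    cached home timeline. Reads become cheap, but one tweet from a
    celebrity with millions of followers triggers millions of writes."""
    tweets_by_user[user].append(tweet)
    for follower in followers[user]:
        home_timeline[follower].append((user, tweet))

def read_timeline_fanout_on_read(user, following):
    """Do the work at read time: merge the tweets of everyone the user
    follows. Writes become cheap, but every timeline load is a big join."""
    return [(u, t) for u in following for t in tweets_by_user[u]]
```

Twitter moved from fan-out on read to fan-out on write, and then to a hybrid: fan-out on write for most users, fan-out on read for accounts with huge follower counts.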

Performance Metrics

In a batch processing system - we care about throughput (e.g. the number of records processed per second).
In online systems - we care about the service's response time.

Response Time - What the client sees: besides the actual time to process the request (the service time), it includes network delays and queueing delays.
Latency - The duration that a request is waiting to be handled, during which it is latent

Thus, Response Time and Latency aren't the same.

It is better to think of the response time as a distribution of values, not as a single number.

It is better to summarize response times using percentiles rather than the mean: the median (p50) shows how long users typically wait, while higher percentiles (p95, p99, p99.9) show how bad the outliers are.
High percentiles of response times are known as tail latencies.
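
A small sketch of reading percentiles off a set of measured response times (the sample values are made up; numpy's percentile function does the work):

```python
import numpy as np

# Hypothetical response-time samples in milliseconds.
response_times_ms = np.array([12, 15, 14, 480, 16, 13, 18, 17, 950, 14])

print("mean:", response_times_ms.mean())              # dragged up by outliers
print("p50 :", np.percentile(response_times_ms, 50))  # the typical request
print("p95 :", np.percentile(response_times_ms, 95))
print("p99 :", np.percentile(response_times_ms, 99))  # the tail
```

Here the mean (~155 ms) is dominated by two outliers, while the median (~15 ms) still describes the typical request - which is why percentiles are the more honest summary.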

High percentiles are especially important in backend services that are called multiple times as part of serving a single end-user request. It takes just one slow call to make the entire request slow - an effect called tail latency amplification.
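
A back-of-the-envelope illustration of the amplification: if each backend call is independently slow with probability p, a request that waits on n such calls is slow with probability 1 - (1 - p)^n (the independence assumption is mine, for illustration):

```python
def p_request_slow(p_call_slow: float, n_calls: int) -> float:
    # The end-user request is slow iff at least one backend call is slow,
    # assuming each call is slow independently of the others.
    return 1 - (1 - p_call_slow) ** n_calls

# Even if only 1% of individual calls are slow, fanning out to 100
# backends makes roughly 63% of end-user requests slow.
for n in (1, 10, 100):
    print(n, round(p_request_slow(0.01, n), 3))
```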

Algorithms for efficiently approximating response-time percentiles over a stream: Forward Decay, t-digest, HdrHistogram.
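
For contrast, a naive but exact approach - keep a window of recent samples and sort on every query - shows the cost those algorithms are designed to avoid. A minimal sketch:

```python
from collections import deque

class RollingPercentiles:
    """Naive exact percentiles over the last `window` response times.

    O(window) memory and O(window log window) per query; forward decay,
    t-digest, and HdrHistogram get close to the same answers far cheaper.
    """
    def __init__(self, window: int = 10_000):
        self.samples = deque(maxlen=window)

    def record(self, response_time_ms: float) -> None:
        self.samples.append(response_time_ms)

    def percentile(self, p: float) -> float:
        if not self.samples:
            raise ValueError("no samples recorded yet")
        ordered = sorted(self.samples)
        idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
        return ordered[idx]
```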

Approaches for coping with Load

Scaling Up - Vertical Scaling - moving to a more powerful machine
Scaling Out - Horizontal Scaling - distributing the load across multiple smaller machines - aka a Shared-Nothing Architecture

Usually it is a pragmatic mixture of both approaches.

Elastic Systems -> Automatically add computing resources when they detect a load increase; non-elastic systems are scaled manually.
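
A sketch of the control loop an elastic system might run - the thresholds, metric source, and provisioning hooks are all hypothetical placeholders:

```python
import time

def autoscale_loop(get_load, add_node, remove_node,
                   scale_up_at=0.8, scale_down_at=0.3, interval_s=60):
    """Hypothetical elastic-scaling loop.

    `get_load` returns current utilisation in [0, 1]; `add_node` and
    `remove_node` would call whatever provisioning API the platform exposes.
    """
    while True:
        load = get_load()
        if load > scale_up_at:        # load increase detected -> scale out
            add_node()
        elif load < scale_down_at:    # load dropped -> release resources
            remove_node()
        time.sleep(interval_s)
```

Manually scaled systems are simpler and have fewer operational surprises; elasticity helps most when load is highly unpredictable.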

Common Wisdom - keep your database on a single node (scale up) until scaling cost or high-availability requirements force you to make it distributed.