This performance modeling typically involve the following steps.

- Build a graph model based on the component interaction.
- Express the mathematical relationship between input traffic, the resource consumption at the processing node (CPU and Memory based on the processing algorithm), and the output traffic (which will become the input of downstream processing nodes)
- Model external workload as random variable (with a workload distribution function)
- Run a simulation exercise to compute the corresponding workload distribution function for the workload of each link and node, such workload unit includes CPU, Memory and Network requirement (latency and bandwidth).
- Based on business requirement, pick a peak external load target (say 95%). Vary the external workload from 0 to the max workload and compute the corresponding range of workload at each node and link in the graph.
- The max CPU, Memory, I/O of each node defines capacity needed to provision for that node. The max value of each link defines the network bandwidth / latency requirement of that link

Notice that the resource are typically provisioned at the peak load target which means the resources are idle most of the time, impacting the efficiency of the overall system. On the other hand, SaaS based system introduce a more dynamic relationship (anyone can call anyone) between components which makes this tradition way of performance modeling more challenging. The performance modeling exercise need to be conducted whenever new clients or new services are introduced into the system, resulting in a non-trivial on going maintenance cost.

Thanks for the cloud computing phenomenon the underlying dynamics and economics has shifted quite significantly over the last few years and now doing capacity planning is quite different from before.

First of all, making a wrong capacity estimation is less costly when deploying additional resources are talking about minutes rather than month. Instead to attempting to construct the fully picture of the system, the cloud practices is to focus at each individual component to make sure each can "scale independently". The steps are as follows ...

- Each component scale independently using horizontal scaling. ie: f(a.x) = a.f(x)
- Instead of establish a formal mathematical model, just deploy the system in the cloud, adjust the input workload and measure the utilization at each node and link (e.g. AWS Cloudwatch)
- Based on the utility measurement, define the initial deployment capacity based on average load (not peak load).
- Use auto-scaling to adjust pool size of independent components according to runtime workload.
- Sync workload is typically frontend by Load balancer. Async workload will be frontend by scalable queues. Output can be a callout, stored in queue, or stored in scalable storage

By focusing in "scale independently", each component can plug and play much easier with other component due to less assumption is made on each other as each component can dynamically adjusted its capacity according to run-time need. This results in not only a more scalable, but also more flexible system.