There are at least two primary motivations for scaling a monolithic web application. The most obvious is for better performance under load as the number of concurrent users increases. Another important motivation is to allow for scaled development and maintenance, thus increasing agility and decreasing time between releases. I think it is important to consider both when evaluating strategies for scaling.
Scaling for Performance Under Load
Scalability and performance, though related, are not the same thing. An application could be lightning fast for a single user, but have response time increase exponentially as the number of concurrent users increases. The key to scalability is to ensure performance does not degrade under load, even if that means making some trade-offs in single-user performance (that can often be mitigated) and in increased complexity. Before considering any strategies for increasing performance under load, it is important to understand where the performance bottlenecks actually are using load testing. In my experience, they are not always obvious without careful analysis and can often be in unexpected places. At the lowest level, an application’s performance can be bound by CPU or IO on the server, by network transfer speeds and latency, and by the client’s CPU and even disk IO in some cases. Knowledge of these low-level bottlenecks can then be used to find pinch-points within specific technologies in the stack, at different application layers, and in the overall architecture.
Scaling for Development and Maintenance
As Fred Brooks observed in The Mythical Man-Month, adding more people to a software project does not always speed development and can often actually slow things down. Brooks also noted that, over time, as a system evolves, it becomes less ordered and will eventually have to be redesigned. Since that book was published, a number of strategies for scaling development have emerged that address these and other issues. These strategies often involve breaking a system into components that can be developed and maintained independently, thus allowing more people to be added in the form of independent teams. The key to these approaches is to have well-defined, well-documented, versioned interfaces.
Strategies for Scaling
Below are three strategies that address scaling for performance under load, scaling for development and maintenance, or both, along with some advantages and disadvantages of each. These strategies are not mutually exclusive and can be used in combination to further increase scalability. They are ordered by simplicity of implementation.
1. Caching
One of the simplest ways to increase performance under load is to introduce an in-memory data store for caching frequently accessed data. While caching can be implemented within the model layer of the application itself, there exist a number of robust off-the-shelf technologies (e.g. memcached and Redis) that can be integrated with minimal effort, since they provide a simple API, client libraries for popular languages, and integration with popular ORMs. These caching solutions generally use a client-server model and thus can be distributed to allow for greater scalability.
One of the key advantages of this strategy is that it generally doesn’t require any redesign of the application and can often be implemented with little or no code in modern frameworks. This makes it an easy win when performance is degrading due to contention for access to data storage resources. These solutions can also cache data in a highly denormalized form, eliminating the slowdown that occurs when retrieving highly normalized data while still maintaining the benefits of reduced redundancy in the authoritative data store. Finally, the distributed nature of these caching technologies lends itself well to load balancing over multiple application instances since the cache can be easily replicated.
The main disadvantage to using a cache is that it has to be updated whenever data changes, so there is a potential for stale data. Unlike a centralized database, caches are not locked while writes to the authoritative data store are occurring and may not be refreshed immediately. Caching also adds some complexity in that an additional service has to be configured and maintained. Additionally, use of a cache incurs some overhead, which means that, in certain situations, data is returned more slowly than it would be with a direct retrieval from the authoritative data store, and writes may be slower if the cache is not updated asynchronously. This leads to the next strategy for increasing scalability.
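The read-and-invalidate flow described above is commonly called the cache-aside pattern. The following is a minimal sketch of it, using an in-memory dict to stand in for an external cache such as Redis or memcached; `db_fetch`, the `database` dict, and the keys used are all illustrative stand-ins for the authoritative data store.

```python
import time

cache = {}                                  # stand-in for Redis/memcached
database = {"user:1": {"name": "Ada"}}      # stand-in for the authoritative store

def db_fetch(key):
    time.sleep(0.01)  # simulate the relatively slow authoritative-store access
    return database.get(key)

def get(key):
    # Cache-aside read: try the cache first, fall back to the database,
    # then populate the cache so subsequent reads are fast.
    if key in cache:
        return cache[key]
    value = db_fetch(key)
    cache[key] = value
    return value

def update(key, value):
    # Write to the authoritative store, then invalidate the cached copy
    # so the next read repopulates it rather than serving stale data.
    database[key] = value
    cache.pop(key, None)

get("user:1")                                # cache miss: hits the database
assert "user:1" in cache                     # now cached
update("user:1", {"name": "Ada Lovelace"})
assert "user:1" not in cache                 # invalidated on write
```

Invalidate-on-write, as sketched here, narrows but does not eliminate the stale-data window; a concurrent read between the database write and the invalidation can still repopulate the cache with old data.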
2. Asynchronous Architecture
Another way to increase scalability is to make some operations asynchronous so that they do not affect response times and can be completed as resources become available. This is particularly effective when slow or resource-intensive operations can be performed asynchronously, with email notifications and other asynchronous communications being good examples of operations that do not need to complete immediately in order to return a response to the user. Message queues such as RabbitMQ or SQS can be used to implement this. To request an asynchronous operation, a message is published containing either the data to be processed or identifying information that can be used by a message consumer to retrieve and process the data. These systems generally support advanced routing configurations and many concurrent consumers.
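The publish-and-consume flow can be sketched with the standard-library `queue` module standing in for a broker such as RabbitMQ or SQS. The handler name, the user id, and the "email" operation are all illustrative; the point is that the request handler only publishes a message and returns, while a consumer processes it independently.

```python
import queue
import threading

tasks = queue.Queue()   # stand-in for a RabbitMQ/SQS queue
sent = []               # records of "emails" sent, for illustration

def send_welcome_email(user_id):
    # Hypothetical slow operation performed outside the request-response cycle.
    sent.append(f"welcome email to user {user_id}")

def consumer():
    # The consumer drains the queue independently of the web application.
    while True:
        user_id = tasks.get()
        if user_id is None:      # sentinel used here to stop the worker
            break
        send_welcome_email(user_id)
        tasks.task_done()

worker = threading.Thread(target=consumer)
worker.start()

def handle_signup(user_id):
    # The request handler publishes identifying information (not the full
    # payload) and returns immediately.
    tasks.put(user_id)
    return "202 Accepted"

handle_signup(42)
tasks.put(None)   # shut down the worker for this demonstration
worker.join()
```

In a real deployment the consumer would run in a separate process, or on a separate machine entirely, rather than in a thread of the web application.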
The main advantage to this approach is that slow operations are removed from the request-response cycle. It also allows consumers to be distributed so they are not using valuable resources on user-facing machines. While this approach may require more effort than caching, it can be relatively straightforward to implement if the application was designed with proper separation of concerns. For the use cases where it fits, asynchronous processing can offer significant gains in scalability. This strategy can also allow for scaled development to some extent. As message consumers run independently of the main application, they can be developed independently and don’t even have to be written in the same language.
A disadvantage to asynchronous architecture is that, for the operations that are made asynchronous, there is no immediate feedback to the user when they fail. Most queuing systems do not remove messages from the queue until the consumer indicates success, but this does not guarantee all messages will be processable. Consumers have to rely on asynchronous notifications, such as email, which could also fail without the user being made aware. Introducing message queues and moving operations into message consumers also increases the complexity of an application and adds an additional dependency that must be configured and maintained.
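The ack-on-success behavior mentioned above is usually paired with a retry limit and a dead-letter destination for messages that repeatedly fail. A minimal sketch of that policy, with illustrative names and an in-process list standing in for a dead-letter queue:

```python
MAX_ATTEMPTS = 3  # assumed retry limit; real brokers make this configurable

def process_with_retry(messages, handler):
    # A message is only "acknowledged" (dropped from the loop) after the
    # handler succeeds; messages that keep failing are parked in a
    # dead-letter list for later inspection instead of blocking the queue.
    dead_letters = []
    for msg in messages:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                handler(msg)
                break  # success: message acknowledged and removed
            except Exception:
                if attempt == MAX_ATTEMPTS:
                    dead_letters.append(msg)  # give up after the retry limit
    return dead_letters

calls = []
def flaky(msg):
    calls.append(msg)
    if msg == "bad":
        raise ValueError("unprocessable message")

assert process_with_retry(["ok", "bad"], flaky) == ["bad"]
assert calls.count("bad") == 3  # retried before being dead-lettered
```

Dead-lettering doesn't notify the user either; it only ensures failed work is captured somewhere an operator can find it.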
3. Microservices Architecture
A third strategy for increasing scalability is to abandon the monolithic architecture and decompose the application into small, independent services that can be distributed over hardware, networks, and even geography. This follows the Unix philosophy of “do one thing and do it well.” The main application would make calls into these various services and aggregate the data into a response. This may even all happen client-side in the user’s browser with no central back-end component. This approach requires the most effort among the three listed here, but can potentially have the most benefit in terms of scalability.
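The call-and-aggregate step can be sketched as follows. The service names and stub functions are hypothetical stand-ins for HTTP calls to separate microservices; the key idea is that the fan-out happens in parallel, so total latency approaches that of the slowest service rather than the sum of all of them.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_profile(user_id):
    # Stand-in for e.g. GET profile-service/users/<id>
    return {"name": "Ada"}

def fetch_orders(user_id):
    # Stand-in for e.g. GET order-service/orders?user=<id>
    return [{"order": 1}]

def user_dashboard(user_id):
    # Fan out to the independent services concurrently, then aggregate
    # their responses into a single result for the client.
    with ThreadPoolExecutor() as pool:
        profile = pool.submit(fetch_profile, user_id)
        orders = pool.submit(fetch_orders, user_id)
        return {"profile": profile.result(), "orders": orders.result()}

user_dashboard(42)
```

A production aggregator would also need per-call timeouts and a policy for partial failure (e.g. rendering the dashboard without orders when the order service is down).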
The advantages to this strategy are many and allow for scalability both in number of concurrent users and in development. Each piece is, in a sense, its own application and can be developed, tested, tuned, released, and maintained independently of the other pieces. Each service can have multiple instances running for high availability, and the systems they run on can be customized to meet their particular demands. This kind of functional decomposition also increases fault tolerance, as a failure of one service does not necessarily affect the rest of the application. Furthermore, like message consumers, the independent nature of the services means they can be implemented in whatever language or technology stack best suits their requirements. They can be developed by independent teams that only have to agree on the API contract, thus allowing faster rates of change and implementation of new features.
The biggest disadvantage to a microservices architecture is that it can add a great deal of complexity to an application. Microservices also make for a complex production environment and require additional overhead to deploy and manage because the independent nature of the services means they can have heterogeneous dependencies, often do not follow the same release cycle, and are usually spread among independent containers. When each service uses its own database instance, which is usually necessary to maximize scalability, managing transactions can be challenging and additional complexity is needed to ensure consistency. An event-driven architecture, where each service publishes an event when its data changes that triggers a refresh in other services, works well to ensure eventual consistency, but also adds additional complexity.
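The event-driven pattern just described can be sketched with a minimal in-process publish/subscribe mechanism. The service names, event type, and payload are illustrative; in production the events would flow through a broker rather than direct function calls.

```python
subscribers = {}

def subscribe(event_type, handler):
    subscribers.setdefault(event_type, []).append(handler)

def publish(event_type, payload):
    # In a real system this dispatch would go through a message broker,
    # so subscribers are updated eventually rather than synchronously.
    for handler in subscribers.get(event_type, []):
        handler(payload)

# A hypothetical order service keeps a denormalized copy of customer
# names and refreshes it when the customer service announces a change.
order_service_names = {1: "Ada"}

def on_customer_updated(event):
    order_service_names[event["id"]] = event["name"]

subscribe("customer.updated", on_customer_updated)

# The customer service updates its own database, then publishes the event.
publish("customer.updated", {"id": 1, "name": "Ada Lovelace"})
assert order_service_names[1] == "Ada Lovelace"
```

The cost noted above shows up here too: each service must now maintain subscriptions, handle out-of-order or duplicate events, and tolerate the window during which its copy of the data is stale.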
In summary, the most scalable applications will combine these and other strategies. Fortunately, many excellent technologies exist that can be leveraged to implement them rather than having to build everything from scratch. As more and more organizations get into the business of developing web applications, the demand for high-quality tools for scalability continues to increase and drives the continued improvement of these products.