1. Scaling Out CloudSQL with Read Replicas
As part of our strategy to handle database load during super-peak periods, we recommend scaling out our CloudSQL instances with read replicas. Here’s why:
Why Read Replicas?
- Separation of Read and Write Workloads: In a typical CloudSQL setup, write operations are performed on the primary instance, while read replicas handle the read queries. By offloading read traffic to the replicas, the primary instance is freed up to focus on write-heavy operations. This balance optimises performance.
- Horizontal Scalability: By adding read replicas, we horizontally scale out the database to accommodate increased read traffic, which is often the majority of traffic during peak times. Replicas can serve multiple clients, allowing us to handle a much larger number of simultaneous connections.
- Fault Tolerance and Availability: Read replicas add redundancy. In the event of a failure in the primary instance, we can quickly promote a replica to the primary role, ensuring minimal downtime.
- Improved Query Performance: Offloading read queries enhances the response time and overall query throughput, providing a seamless experience to the end users.
- Elasticity in Scaling: As traffic grows, additional replicas can be added to handle the excess load. After the peak period, unused replicas can be easily decommissioned to save costs.
Implementation Plan for Read Replicas
- Identify the read-heavy queries and configure the application to route those queries to read replicas.
- Provision read replicas and monitor them continuously to track replication lag and replica performance.
- Adjust the number of replicas based on traffic patterns.
- Ensure that replicas are correctly synchronized with the primary database and that failover policies are in place.
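The routing step above can be sketched in application code. This is a minimal sketch assuming the application holds one connection to the primary and one to a replica; the `ReadWriteRouter` class and its method names are illustrative, not an existing CloudSQL or driver API:

```python
class ReadWriteRouter:
    """Route read-only statements to a replica, everything else to the primary.

    Illustrative sketch: primary_conn and replica_conn are any DB-API
    connections (in production, one per CloudSQL endpoint).
    """

    # Statements safe to serve from a replica.
    READ_PREFIXES = ("select", "show", "explain")

    def __init__(self, primary_conn, replica_conn):
        self.primary = primary_conn
        self.replica = replica_conn

    def connection_for(self, sql):
        # Route by statement type: reads go to the replica;
        # writes, and anything ambiguous, stay on the primary.
        first_word = sql.lstrip().split(None, 1)[0].lower()
        if first_word in self.READ_PREFIXES:
            return self.replica
        return self.primary

    def execute(self, sql, params=()):
        cur = self.connection_for(sql).cursor()
        cur.execute(sql, params)
        return cur
```

One caveat this sketch makes visible: replication is asynchronous, so a query routed to the replica may not yet see a just-committed write. Read-your-own-writes paths should be pinned to the primary.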
2. Autoscaling for Sidekiqs
In addition to scaling out the database, we must address the scalability of the microservices that interact with the database and handle user traffic. GoBolt currently has several microservices (Sidekiqs) deployed as single pods, which limits their ability to handle surges in traffic.
Why Use Autoscaling?
- Dynamic Resource Allocation: Autoscaling allows us to automatically increase or decrease the number of running pods based on current traffic and resource consumption (CPU, memory). This ensures that we always have the right amount of compute resources to serve incoming requests.
- Preventing Overloading of Individual Pods: Without autoscaling, a single pod might become a bottleneck, causing slow response times or even service crashes. Autoscaling distributes the load evenly across multiple instances of the service.
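Under the hood, the Kubernetes Horizontal Pod Autoscaler picks the pod count with a simple ratio: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to configured bounds. A minimal sketch of that calculation (the function name and the min/max defaults are illustrative, not a Kubernetes API):

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Compute the pod count the way the Kubernetes HPA does:
    desired = ceil(current * currentMetricValue / targetMetricValue),
    clamped to the configured min/max replica bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))
```

For example, 3 pods averaging 90% CPU against a 60% target scale to ceil(3 × 90 / 60) = 5 pods; when load drops, the same formula scales them back down, subject to the minimum bound.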