Amazon Aurora: How AWS improved the performance of MySQL & PostgreSQL for the cloud
Updated: Sep 16
Ever wondered what technical problem Aurora solves? Is it just MySQL or PostgreSQL as a service? Don't we already have RDS? If these questions bother you, then read on. This blog goes under the hood to examine what Aurora is or is not and whether you should consider using it. The discussion is based on the Amazon Aurora paper listed below as a reference.
In the very early days of cloud services, when you deployed a service to the cloud that needed a database, you also had to provision the database in the cloud yourself. This meant spinning up another VM, then downloading, installing, and configuring the MySQL binary. Not only was this time consuming, it was also neither resilient nor scalable. If you wanted to add replicas or schedule backups, that was additional work.
Then came database as a service in the form of RDS. RDS made life easier for many. The service handles provisioning, replication, backups, and so on for a price. Developers could focus on application development without worrying about the database VM going down or restarting.
3.0 The problem
Based on customer experiences with RDS, AWS observed the following:
In cloud environments, failures happen all the time. Disks, network components, and various other things fail, and one failure is often followed by others. It is important to recover from failures quickly.
Failures can be addressed with redundancy and replication. But when replication is used, whether for resiliency or scalability, it adds a lot of traffic, leading to network congestion. The network then becomes the bottleneck, hurting throughput.
Aurora introduces a few innovations that address the above issues; it is the next stage of improvement. Before we get into details, let us quickly see how it is used.
4.0 Amazon Aurora's relation to RDS
Aurora shares responsibility with Amazon RDS. RDS does the hosting; query tuning remains the responsibility of the application developer. You choose Aurora MySQL or Aurora PostgreSQL as the DB engine when picking one in RDS.
5.0 Traditional database in the cloud
For each write that a client initiates, the DB engine performs several writes to disk:
1. write-ahead log
2. B+tree page and/or data page
3. bin log
4. write to EBS, which is then copied to the EBS mirror (cloud specific)
Then it might be replicated to 1 or more nodes, where it is all repeated. This generates too much network traffic.
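The write amplification above can be sketched with some back-of-the-envelope counting. This is an illustrative simplification, not figures from the paper: it just multiplies the per-node write set by the number of nodes that repeat it.

```python
# Illustrative sketch (assumed accounting, not from the Aurora paper):
# count the writes triggered by one client write in a traditional
# mirrored MySQL-on-EBS deployment with replication.

def writes_per_client_write(replicas: int) -> int:
    per_node_writes = [
        "write-ahead log",
        "B+tree / data page",
        "bin log",
        "EBS write",
        "EBS mirror copy",
    ]
    # the primary plus each replica repeats the same set of writes
    return len(per_node_writes) * (1 + replicas)

print(writes_per_client_write(replicas=2))  # 15 writes for 1 primary + 2 replicas
```

The point is not the exact count, but that every replica multiplies the full write set across the network.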
Earlier it was thought that a cluster with 1 master and 2 replicas in different AZs provided enough resiliency. That would be 3 servers with a quorum of 2/3. However, it was observed that when failures occur they propagate at a fast pace and can affect multiple AZs. Hence this is considered inadequate.
6.0 Aurora improvements
Aurora attempts to solve 2 problems:
1. Resilience in cloud environments
2. Reduce the network bottleneck (caused by 1) and increase throughput.
It does this by:
1. Separating storage from compute
2. Reducing disk i/o by relying mostly on redo logs
Since the 2/3 quorum was observed to be inadequate, Aurora went with 6 replicas: a write quorum of 4/6 and a read quorum of 3/6, with 2 replicas of the data in each of 3 AZs. Now you can lose an entire AZ plus 1 additional node without losing data, and you can lose an entire AZ without losing the ability to write.
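The quorum arithmetic above can be checked with a small sketch. The constants mirror the 6-replica, 3-AZ layout described in the text:

```python
# Aurora's quorum rule: 6 replicas (2 per AZ across 3 AZs),
# write quorum = 4/6, read quorum = 3/6.

REPLICAS_PER_AZ = 2
AZS = 3
TOTAL = REPLICAS_PER_AZ * AZS  # 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def can_write(replicas_lost: int) -> bool:
    return TOTAL - replicas_lost >= WRITE_QUORUM

def can_read(replicas_lost: int) -> bool:
    return TOTAL - replicas_lost >= READ_QUORUM

# Lose an entire AZ (2 replicas): 4 survivors, writes still possible.
print(can_write(REPLICAS_PER_AZ))        # True
# Lose an AZ + 1 more node (3 replicas): reads (and hence repair) still possible,
print(can_read(REPLICAS_PER_AZ + 1))     # True
# but writes are not.
print(can_write(REPLICAS_PER_AZ + 1))    # False
```

This is why an AZ failure costs write availability only when combined with a further node loss, and data is never lost as long as a read quorum survives.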
To reduce mean time to repair:
Database (more accurately the redo log) is partitioned into segments of size 10GB
6 replicas of each segment (together called a protection group)
2 replicas in each AZ
A concatenated group of such segments forms a storage volume, provisioned on storage hosts with attached SSDs. The maximum database size is 64 TB per cluster.
A segment is the unit of failure and repair. Two copies in one AZ plus a copy in a second AZ would all need to fail for the segment to become unavailable. And a 10GB segment can be repaired very fast (say, by restoring from a backup); compare that with repairing a 1 TB dataset.
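The repair-time argument is simple arithmetic. Assuming a 10 Gbit/s link for restoring a segment (an illustrative figure, not one from the paper):

```python
# Rough arithmetic: why a 10 GB segment is a good unit of repair.
# 10 Gbit/s restore bandwidth is an assumed figure for illustration.

def repair_seconds(size_gb: float, gbit_per_s: float = 10.0) -> float:
    """Time to re-copy a failed segment of the given size, in seconds."""
    return size_gb * 8 / gbit_per_s  # GB -> Gbit, divided by link speed

print(repair_seconds(10))    # 8.0  -> a 10 GB segment repairs in seconds
print(repair_seconds(1024))  # 819.2 -> a 1 TB dataset takes ~14 minutes
```

A short mean time to repair keeps the window during which a second or third correlated failure could break quorum very small.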
The above model gives resiliency but unfortunately creates too much network traffic.
6.2 Reduce I/O and network traffic
In Aurora, only the redo log is written to disk. Pages are not written to disk on eviction. Only the log is replicated; pages are generated on demand from the log.
"The log is the database". Any B+Tree page generated is simply a cache of the log.
Some technical details about the log (can be skipped if not interested):
Each log record has a monotonically increasing log sequence number (LSN).
VCL, or volume complete LSN, is the highest LSN up to which all log records have been replicated with quorum.
During recovery, LSNs greater than the VCL are truncated.
The database can declare consistency-point LSNs, called CPLs.
VDL, or volume durable LSN, is the highest CPL that is less than or equal to the VCL.
The database can truncate log records greater than the VDL.
The final log record of a mini-transaction is a CPL.
SCL, or segment complete LSN, is the highest LSN below which all log records have been received by a segment.
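The relationship between VCL, CPLs, and VDL can be sketched directly from the definitions above. This is a simplified model of the bookkeeping, not Aurora's actual recovery protocol:

```python
# Sketch of the LSN bookkeeping described above (simplified): given the
# quorum-complete point (VCL) and the database-declared consistency
# points (CPLs), the VDL is the highest CPL not exceeding the VCL.

def compute_vdl(vcl: int, cpls: list[int]) -> int:
    """Return the volume durable LSN, or 0 if no CPL is durable yet."""
    eligible = [cpl for cpl in cpls if cpl <= vcl]
    return max(eligible) if eligible else 0

# CPLs mark mini-transaction boundaries; VCL is the highest
# quorum-replicated LSN on the storage volume.
print(compute_vdl(vcl=1007, cpls=[900, 1000, 1100]))  # 1000
# On recovery, records above the VDL (here, LSNs > 1000) are truncated,
# so no partially durable mini-transaction survives.
```

The CPL at 1100 is ignored because its mini-transaction may not have fully reached quorum, even though some of its records (up to LSN 1007) did.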
The pages in memory from which reads are served are continuously updated in the background from the redo log. If a page that needs to be read is not in memory, some page must be evicted and the required page constructed from the redo log. The Aurora paper does not discuss the actual algorithm for redo apply.
But for discussion's sake, we can guess that techniques used to apply the log efficiently might include aggregating changes in the background, truncating overwritten changes, and using multiple threads to read the log. Even in traditional database storage, the rows of a table are not necessarily contiguous on disk; for each row, the database uses indexes that hold pointers to locations on disk. In a traditional database, the redo log is applied only on recovery or startup; once applied, it is converted and stored as B+trees and data pages. The Aurora paper says that the redo log is continuously applied and B+tree pages are generated on the fly. My guess is that since they scan the log continuously, they must maintain pointers to rows so that pages not in memory can be generated quickly.
What is important to understand is that only updates to the redo log are sent over the network. Once the redo log is there, the compute and/or storage can do whatever preprocessing is necessary to generate pages quickly on demand.
By increasing resiliency and throughput for MySQL and PostgreSQL, Amazon Aurora has brought these databases back into the cloud database game. Enterprises that have been comfortable with MySQL or PostgreSQL no longer need to feel pressured to move to a NoSQL database when moving to the cloud. They can stay with these databases in the cloud without worrying about resiliency. The 64 TB limit is more than sufficient for most enterprises. However, be aware that this is still MySQL or PostgreSQL, and there is no server-side sharding. When sharding is necessary, either for localization of data or to scale beyond the 64 TB limit, you would need to do it manually by setting up additional clusters.