5+1 short-term ways to avoid high-load disasters in managed database services
In retail, high network traffic is generally a good thing that means people are flocking to your site and buying your stuff. But the flip side is that times of heavy load really highlight every weakness in your system. The services become slow or unresponsive, the customers get irate, sysadmins field phone calls from execs demanding to know why their Cousin Alice couldn’t get through to buy that discounted power drill…
We’ve all been there, right? And none of us want to be there again. So here are some tips on what to do when that traffic load starts going through the roof.
As a general note, do keep an eye on your metrics. For Aiven customers, the Metrics tab in the Aiven Console keeps you informed about your system status. We do send warning e-mails about critical levels (for example, >90% disk usage, sustained high load average, and running out of memory), but if you know a peak is coming it’s simply good practice to monitor them yourself.
The single easiest way to handle additional traffic is to upgrade your service plan. The Aiven Console even has a button for it.
What that Aiven service upgrade does is save you a lot of fiddling and grief. You get new nodes with additional CPUs, RAM, and storage; the latest data then gets streamed across and the system performs a controlled failover to the new nodes. And that’s it.
For small services, this takes minutes; for larger ones, it might take some hours. The catch is that you really should upgrade before your load peaks, so that the failover is in fact controlled and modifications to data don’t slow down the process.
Let’s have a look at what you can do if your services are about to get steamrollered.
1. Take some downtime
No, this doesn’t mean escape to the beach or hide in a closet.
If you’ve left upgrading your plan too late, or if even the new plan isn’t enough to handle the traffic, you can bite the bullet and cut writes from your application code. This reduces the load on the existing nodes and helps the node replacement complete faster. To do this, you need to take your services offline for a while, so it’s obviously not ideal.
2. Make more disk space available
When the disk is getting full, the first things to go are backups because, left to its own devices, your system will just keep writing new data. As far as Aiven services are concerned, many of them will stop accepting writes when the disk is almost full, precisely to make sure that there’s always enough disk space to take backups.
You can always delete some data to make space for new writes, but that’s rather like putting a band-aid on a burn — sure it helps, but you still need to get it seen to. We highly recommend upgrading your plan so that you don’t lose data at peak loads.
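As a minimal sketch of the kind of check worth automating before a peak, the snippet below flags a filesystem that has passed the 90% usage level mentioned earlier. It assumes a host you can run Python on; for a managed service you would feed `disk_needs_attention` the usage figure from your metrics instead. The function names and the default threshold constant are illustrative, not part of any Aiven tooling.

```python
import shutil

# Assumption: reuse the >90% disk usage level at which warning e-mails are sent.
WARN_FRACTION = 0.90


def disk_used_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_needs_attention(used_fraction: float,
                         threshold: float = WARN_FRACTION) -> bool:
    """Return True once usage has passed the warning threshold."""
    return used_fraction > threshold
```

Calling `disk_needs_attention(disk_used_fraction("/"))` from a scheduled job gives you an early signal, which leaves time to upgrade the plan rather than delete data in a hurry.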
3. Add processing power
Make sure that you have enough CPU power to handle the increased load. Keep monitoring your load average and make sure that it stays below the number of CPUs available on your service plan.
If the load average gets too high, you can try and track down the cause from your monitoring software and resolve it. But if time is of the essence, you can temporarily add more CPUs by upgrading your plan.
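The rule of thumb above (load average below the CPU count) can be sketched as a small check. This is an illustration only: `is_overloaded` and `local_snapshot` are hypothetical helper names, and `os.getloadavg()` is available on Unix-like systems but not on Windows. For a managed service, you would pass in the load average from your metrics and the CPU count of your plan.

```python
import os


def is_overloaded(load_average: float, cpu_count: int) -> bool:
    """Return True when the load average has reached the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average >= cpu_count


def local_snapshot() -> tuple:
    """Return (1-minute load average, CPU count) for the local host.

    Unix-only: os.getloadavg() is not available on Windows.
    """
    one_minute, _five_minute, _fifteen_minute = os.getloadavg()
    return one_minute, os.cpu_count() or 1
```

For example, a load average of 4.5 on a 4-CPU plan counts as overloaded, while 1.0 on the same plan does not.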
4. Create more database connections
As your application scales horizontally to accommodate increased traffic, it will usually also consume more database connections. Check the service-specific documentation for the connection limits that apply to you.
If you’re running Postgres, you can use transaction-level or statement-level connection pooling to accommodate more connections. For everyone else, though, the easiest way out is, again, to upgrade your plan to get a higher connection limit.
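To see why transaction-level pooling lets many clients share a few server connections, here is a toy model in plain Python. It is not how any real pooler is implemented and it talks to no database. `TransactionPool` and `simulate` are hypothetical names, and the "transaction" is an empty stand-in. The point is only that a client holds a server connection for the length of one transaction and then hands it back.

```python
import queue
import threading


class TransactionPool:
    """Toy model: lends out a server connection for one transaction at a time."""

    def __init__(self, server_connections: int) -> None:
        self._idle = queue.Queue()
        for conn_id in range(server_connections):
            self._idle.put(conn_id)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0   # highest number of connections busy at once
        self.finished = 0      # transactions that have run to completion

    def run_transaction(self, work) -> None:
        conn_id = self._idle.get()  # blocks until a server connection is free
        with self._lock:
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)
        try:
            work(conn_id)  # stand-in for BEGIN ... COMMIT on that connection
        finally:
            with self._lock:
                self._in_use -= 1
                self.finished += 1
            self._idle.put(conn_id)  # hand the connection back to the pool


def simulate(clients: int, server_connections: int,
             transactions_per_client: int) -> TransactionPool:
    """Run `clients` threads against a pool of `server_connections`."""
    pool = TransactionPool(server_connections)

    def client() -> None:
        for _ in range(transactions_per_client):
            pool.run_transaction(lambda conn_id: None)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return pool
```

Running `simulate(50, 5, 10)` completes all 500 transactions while never using more than 5 server connections at once, which is the effect a pooler gives you in front of a real database.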
5. Speed up queries
If you are using Postgres or MySQL, inefficient queries can consume CPU time and memory unnecessarily. This time, you’ll be surprised to learn that although upgrading your plan might help, it’s often not enough to solve the issue.
One good way to speed up your queries is to create new indexes. First, go find the longest running or most frequent queries; in the Aiven Console this is easy to do by sorting the columns in the Query Statistics table. Then you can run EXPLAIN queries to see if any new indexes need to be created.
If you’re an Aiven for Postgres user on a Business or Premium plan, you have an HA setup with standby nodes. You can run read-only queries directly on these standby nodes to reduce the impact of slow queries on the primary node.
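The EXPLAIN-then-index workflow can be tried out locally. The sketch below uses Python’s built-in `sqlite3` module as a stand-in so that it runs without a server; SQLite’s `EXPLAIN QUERY PLAN` plays the role that `EXPLAIN` plays in Postgres or MySQL, and the plan text differs between engines and versions. The `orders` table, its columns, and the index name are made up for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for `sql` as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

slow_query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, the plan is a scan of the whole table.
plan_before = query_plan(conn, slow_query, (42,))

# Add an index on the filtered column and look at the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, slow_query, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a table scan, and the second names `idx_orders_customer`. Seeing that same change on your real slow query is what tells you a new index is being used.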
+1: Get a better support contract
All Aiven customers have access to Basic support using the chat widget in the Aiven Console and the help pages. However, when Black Friday rolls around, it’s a good time to re-evaluate your support contract. You want to make sure that if things do go haywire, you won’t be alone in front of your monitor trying to solve impossible problems.
To get started on upgrading your support, contact your account manager or send an e-mail to firstname.lastname@example.org and we’ll take it from there.
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!
In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.
Originally published at https://aiven.io.