Redis and the secret of fluctuating reads
I was sitting in my office, sipping bourbon and cleaning my trusty keyboard, when the phone rang. It was a customer, a recent one, and they had a problem.
The customer’s read rates were fluctuating oddly. They were expecting a quick turnaround on the job since, they said, the culprit was obvious: it was Redis, unable to handle their loads. They were testing a premium plan with Aiven for Redis, and now I was getting an earful about how their expectations weren’t being met. They wanted a powerful database, they said, but were getting offered a third-rate flutterbox of an embarrassment.
I shelved my witty comebacks for now. Instead, I grabbed my trench coat and fedora (the air conditioning in my office is always an issue), plugged my keyboard in again, and connected to their cloud.
After poking around for a few hours, I was mystified, much more so than the customer. Redis is a very steady player. It handles high loads easily, and the customer’s expansion to a premium plan (six servers: 1 primary and 5 replicas) should have been more than enough. And if worse came to worst, I should have been able to see a sharp drop-off point where the database buckled under an increasing load.
Instead, the graph drew a mountain range, with reads going up and coming down and then going up again. I’d never seen Redis do that.
I called the customer back and laid into them. I wanted to know what they knew. What were they doing to poor Redis? The customer wasn’t happy with my grilling, but I did find out that they were using a custom testing suite. Well I’ll be darned, I thought.
I returned to my poking, and sure enough, now that I knew where to stick my nose, I got the scent. The customer tests were not going as planned and I needed to right this ship.
Rather than using a customer-specific testing harness, I turned to the industry standard memtier_benchmark for an unbiased opinion. I tested three Aiven plans: Business (primary + replica), Premium (primary + 2 replicas) and Custom (primary + 5 replicas). I needed to make sure that the WRITE rate to the primary node was consistent and that the READ rates climbed from plan to plan as the replica count grew.
Things started out quiet. The WRITE rate was consistent across the three service plans when attempting to write 10M entries, with a payload size of 2500 bytes. As the replica count increases from plan to plan, the write rate is expected to decrease, since additional IOPS go to replicating data. Here’s the command that writes the data to Redis and benchmarks it:
sudo docker run --rm redislabs/memtier_benchmark -s <HOST> \
-p <PORT> -a <PASSWORD> --hide-histogram --key-maximum=10000000 \
-n allkeys -d 2500 --key-pattern=P:P --ratio=1:0
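For a sense of scale, the run above pushes roughly 25 GB of raw payload at the primary. Here is a quick back-of-the-envelope check, with the numbers taken from the --key-maximum and -d flags. It ignores key names and Redis per-entry overhead, so treat it as a lower bound rather than a measurement:

```shell
#!/bin/sh
# Rough size of the write test: number of keys times payload bytes per key.
# Key names and Redis per-entry overhead are ignored, so this is a lower bound.
keys=10000000        # --key-maximum
payload_bytes=2500   # -d
total_bytes=$((keys * payload_bytes))
echo "total payload: ${total_bytes} bytes (~$((total_bytes / 1000000000)) GB)"
```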
With the WRITE operations done, I turned to benchmarking the READ rates. I did it consistently, of course, with READS done against the PRIMARY nodes, to ensure nothing fishy was going on between the three Aiven service plans. As expected, the READS fell in a narrow band across all plan types.
sudo docker run --rm redislabs/memtier_benchmark \
-s <PRIMARY / REPLICA HOST> -p <PORT> -a <PASSWORD> \
--hide-histogram -t 30 -c 10 --pipeline=1 --ratio=0:1
Lastly, I tested the READ rate directly against the replica nodes.
So far so good. The Business plan (blue line) matched the read performance of the primary node, as expected. The Premium plan with 2 replicas (red line) showed almost double the throughput (not exactly double, but that was to be expected), so looking good!
Then things took a sideways turn as the line for the Custom six-node plan (green line) took a trip to the south. My jaw fell. What was going on? How was I getting only marginally better performance with lots more nodes in the game?
Just as I was ready to throw in the towel, I remembered where this quest started. The clue was in the benchmark test itself. Just as the customer was showing abysmal metrics with their homegrown solution, mine was also struggling, but for a different reason: NOT ENOUGH LOAD!
Of course! I couldn’t expect the same test, with the same connections, to automatically generate more operations per second if the test itself was bottlenecked. Just because I had a bigger bucket, I couldn’t expect the same garden hose to fill it in the same time. I had to increase the amount of water to fill the bucket! Duh!
With this eureka in hand I fired up a second ‘memtier_benchmark’ on a separate VM and started filling the pail — with two hoses.
That was more like it!
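For the record, the two-hose setup looks roughly like this. It is a minimal sketch, assuming each load-generator VM runs the same read-only command as before with the same -t and -c values. The GENERATORS count, the RUN switch and the environment variable names are mine, added for illustration. In memtier_benchmark, -t sets the number of threads and -c the number of client connections per thread, so a single run tops out at 30 × 10 = 300 connections no matter how many replicas sit behind it.

```shell
#!/bin/sh
# Sketch: the same read-only benchmark, meant to be started on each
# load-generator VM at roughly the same time.
# HOST, PORT and PASSWORD default to the article's placeholders.
HOST="${HOST:-<REPLICA HOST>}"
PORT="${PORT:-<PORT>}"
PASSWORD="${PASSWORD:-<PASSWORD>}"

THREADS=30             # -t: worker threads per benchmark process
CLIENTS_PER_THREAD=10  # -c: connections opened by each thread
GENERATORS=2           # assumption: two VMs, one benchmark process each

# Connections one benchmark process can drive, and the total across VMs.
per_generator=$((THREADS * CLIENTS_PER_THREAD))
total=$((per_generator * GENERATORS))
echo "connections per generator: ${per_generator}, total: ${total}"

run_read_benchmark() {
  sudo docker run --rm redislabs/memtier_benchmark \
    -s "$HOST" -p "$PORT" -a "$PASSWORD" \
    --hide-histogram -t "$THREADS" -c "$CLIENTS_PER_THREAD" \
    --pipeline=1 --ratio=0:1
}

# Dry run by default; set RUN=1 (with real connection details) to execute.
if [ "${RUN:-0}" = "1" ]; then
  run_read_benchmark
else
  echo "dry run: set RUN=1 to start the benchmark against ${HOST}"
fi
```

Adding a second generator doubles the client side of the equation to 600 connections, which is the extra hose the bigger bucket needed.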
So the tests were failing not because of Redis, but because of how the tests themselves were set up and executed. I sighed with contentment and drained my glass.
I was now faced with a problem of my own: how to let the customer know they were causing their own problems? I opened a second bottle of bourbon to help me think.
The next morning, I had a hangover but no better answers, so I just dialled the customer and plainly described what I’d found. Yep, they sure weren’t happy, but I got the message across. What’s more, they were so happy with my work, as well as Redis, after the testing issues were fixed, that they finalised their upgrade to the premium plan with no qualms.
Once again, I’d proved that Redis is a solid, dependable solution that simply works when you plug it in, and that Aiven gives better support than any other cloud company in the business. Make of that what you will.
Here’s a PSA: Trust your gut, walk the walk, and you may find your answer in an unexpected place. And lay off the bourbon if you want a clear head the next morning…
And if you had no idea what this article was about, check this out: An introduction to Redis.
If you’re not using Aiven services yet, sign up for a free trial at https://console.aiven.io/signup. Otherwise, you might just face… issues.
Originally published at https://aiven.io.