How overall performance plummeted by half after upgrading to OpenSSL 3.0

You read the title, so you know the cause. But I spent 9 days investigating the issue, as the chart shows.

It happened when I upgraded Debian, the world's most stable Linux distribution, from Bullseye to Bookworm. As always, 5 different things broke right after the reboot and were quickly fixed. But when the bot started, it ate all the CPU and delivered only ½ of its usual performance. The feed refresh interval doubled from 6 minutes to 12. Welcome to disaster.

During the investigation, I learnt a couple of things.

  1. It's a bad idea to profile code by counting CPU time inside a virtual machine. And the bot was running in a VM that takes up exactly ½ of a real server (provided by AlpineDC). Moreover, it's an even worse idea to profile code inside a busy VPS.
  2. It's damn hard to find the cause of a performance degradation in this bot. The profiler always says it's something about XML parsing. That's a feedback loop: the worse the performance, the longer the feed refresh interval; the longer the interval, the higher the chance that a feed has new posts; and the more new posts, the lower the chance of receiving 304 Not Modified and skipping all the heavy work.
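
To make the 304 shortcut in point 2 concrete, here is a minimal sketch of a conditional GET in Ruby. It is not the bot's actual code: the method names and the idea of storing an ETag and a Last-Modified value per feed are assumptions for illustration, and it uses Ruby's standard Net::HTTP rather than libcurl.

```ruby
require "net/http"
require "uri"

# Build conditional-request headers from the validators saved on the
# previous fetch (hypothetical storage; either value may be missing).
def conditional_headers(etag: nil, last_modified: nil)
  headers = {}
  headers["If-None-Match"] = etag if etag
  headers["If-Modified-Since"] = last_modified if last_modified
  headers
end

# Fetch a feed. Returns nil on 304 Not Modified, so the caller can skip
# XML parsing entirely; otherwise returns the response body.
def fetch_if_changed(url, etag: nil, last_modified: nil)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri, conditional_headers(etag: etag, last_modified: last_modified))
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  return nil if response.is_a?(Net::HTTPNotModified)

  response.body
end
```

The longer the gap between two calls for the same feed, the more often the server has something new and answers 200 with a full body instead of 304, so parsing takes a growing share of the profile even when the real slowdown is elsewhere.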

Later I simply plugged in the old Debian repositories, pulled old packages one by one, rebuilt the application and ran it again. The performance came back the moment I downgraded libcurl4-openssl-dev.

I made a test case: fetched 100 different feeds over HTTP and did nothing with them. The new version of libcurl4-openssl-dev turned out to be 3 times slower.
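
A test case of this shape might look like the sketch below. It is an approximation, not the original test: it goes through Ruby's standard Net::HTTP (which talks to OpenSSL directly) instead of libcurl, and the method names and the URL list are hypothetical.

```ruby
require "benchmark"
require "net/http"
require "uri"

# Download one URL and throw the response away: no parsing, no database.
# Each call opens a fresh connection, so every https URL pays for a full
# TLS handshake.
def fetch_and_discard(url)
  Net::HTTP.get_response(URI(url))
  nil
end

# Time one full pass over the URLs and return wall-clock seconds.
# The fetcher is injectable so the harness can be checked offline.
def time_fetches(urls, fetcher: method(:fetch_and_discard))
  Benchmark.realtime do
    urls.each do |url|
      begin
        fetcher.call(url)
      rescue StandardError => e
        # A dead feed should not abort the whole measurement.
        warn "#{url}: #{e.class}"
      end
    end
  end
end

# Usage (feeds.txt is a hypothetical file with one feed URL per line):
#   urls = File.readlines("feeds.txt", chomp: true).first(100)
#   puts "OpenSSL: #{OpenSSL::OPENSSL_LIBRARY_VERSION}"
#   puts format("%.2f s", time_fetches(urls))
```

Running the same script on Bullseye and on Bookworm against the same URL list gives two numbers to compare. Wall-clock time also includes network latency, so several runs are needed before trusting the ratio.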

Finally, using linux-perf, I figured out that the root of the problem was not in libcurl but in the underlying OpenSSL. With the new Bookworm release, the Debian maintainers included OpenSSL 3.0.9 instead of 1.1.1.

OpenSSL 3.0 is slow as fuck. I was genuinely shocked. I wrote the app in one of the world's slowest programming languages (Ruby), used a bloated ORM (damn you ActiveRecord), was parsing 50 megabits of compressed XML per second, and what killed the performance was upgrading a widely used low-level library underneath it all.

I still know nothing about what happened with OpenSSL. I'm just a script kiddie. Apparently the whole internet has been crying about its poor performance for a couple of years already. The only answer from the OpenSSL developers is a recommendation to do fewer handshakes.

Doing fewer handshakes is just a "perfect" idea for this bot. Right now the bot has 84,004 feeds from 19,166 domain names. And thanks to 16-bit TCP port numbers, I could theoretically keep only 65,535 outgoing connections open.
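
The numbers in that paragraph can be worked through in a few lines. This is a back-of-the-envelope sketch: it takes the 65,535 ceiling at face value and assumes one fetch per feed per 6-minute refresh cycle, and the method names are made up for illustration.

```ruby
# Figures quoted in the post, plus the original 6-minute refresh interval.
def bot_stats
  { feeds: 84_004, domains: 19_166, max_ports: 65_535, refresh_seconds: 6 * 60 }
end

# Average number of feeds served from one domain name (about 4.38).
def feeds_per_domain(stats)
  stats[:feeds].fdiv(stats[:domains])
end

# Fetches per second needed to visit every feed once per interval
# (about 233 at a 6-minute interval).
def fetches_per_second(stats)
  stats[:feeds].fdiv(stats[:refresh_seconds])
end

# Could every feed keep its own persistent connection under the port
# ceiling? With these numbers: no, 84,004 is more than 65,535.
def fits_one_connection_per_feed?(stats)
  stats[:feeds] <= stats[:max_ports]
end
```

If every fetch opens a new connection, that is a couple of hundred TLS handshakes per second, which is why the cost of a single handshake matters so much here.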

They probably think doing fewer handshakes is a good solution because that's what http2 and http3 are for.

Frankly, I hate that everybody (including me) uses https everywhere. I hate http2 for being a stupid implementation of TCP over TCP, and I hate http3 for being only a slightly better solution. But that would be another story.

And if you look at the first chart, you'll see a small rise in performance at the end. That's because I switched from openssl to nss.

And just a day before that, nss support was removed from libcurl. No luck for me.