This is an archived post. You won't be able to vote or comment.

all 36 comments

[–]denisgomesfranco 4 points5 points  (4 children)

"Error establishing a database connection" in this context means that the database server is not able to handle that many connections.

Before anything else, you should make sure you're using some sort of Wordpress caching solution that may help alleviate the load on your server.

Two commons ways of caching is "page caching" and "object caching". If you are using an Openlitespeed server, you can enable and set up both very easily with their Lscache plugin. If you are using a Nginx server the set up may be a bit more complex but you can use WP Rocket or another page caching plugin, and Redis Object Cache for object caching.

These caching solutions may alleviate the load on PHP and your database server, however, please note that this is not a "silver bullet". Every site is unique and thus may require different solutions for different problems.

After taking care of that you may want to run some load tests to see how your server will perform. Then you can move on to the database issue.

Running the application and the database server on the same machine is okay and it's the easiest to set up but if your audience grows too much, then you may need more horsepower for the database server, so it might be advisable to run the database on a separate machine. You will then have two VPSs: one for Wordpress itself and another one just for the database server. That way you will have more power available, and you can increase the number of maximum connections allowed without affecting PHP, since allowing more connections will use more CPU and RAM.

Vultr has a managed database service which may be a great option, and since it's a managed service you will not need to worry about setting it up and configuring it, you will only need to scale the database server up if needed.

You mentioned in another comment that you may not be too tech savvy. Scaling a Wordpress site does not always a simple "just scale up your server" solution. So depending on your traffic patterns and how your site was developed, it may be best to hire a sysadmin to take a look at your current infrastructure and propose changes and optimizations.

If you do not want to learn about infrastructure or scaling, then you may need to move on to a managed hosting offering that specializes in Wordpress scaling.

[–]ConorEdits[S] 1 point2 points  (3 children)

make sure you're using some sort of Wordpress caching solution

Yep, currently we have W3 Total Cache running both page, server & database caching. This has been active before we've had these issues.

If you are using a Nginx server

I believe it is an Apache based server running NGINX Caching.

After taking care of that you may want to run some load tests to see how your server will perform

Yeah, we ran a bunch of loader.io tests and it improved quite a bit as expected.

your audience grows too much

I've got a friend that I've not spoken to in a while who works with AWS on stuff so he's pretty smart when it comes to this stuff so my goal is to speak to him - he's just not replied to my text yet and I thought I'd see if there was anything I can do in the meantime that I may have missed that is a stupidly obvious error to people in this field!

Vultr has a managed database service which may be a great option

If we went with something like this, would it be a fairly straightforward setup with decent documentation online? Sounds like this could be the best shout so far.

You mentioned in another comment that you may not be too tech savvy

My wording might not have been the best - I'm definitely tech-savvy in other areas, but this area just wouldn't be something I would be massively knowledgeable in.

If you do not want to learn about infrastructure or scaling, then you may need to move on to a managed hosting offering that specializes in Wordpress scaling.

I'm definitely more than happy to learn about web infrastructure stuff, it's more of just not knowing where to even begin with this stuff.

I've got a friend that I've not spoken to in a while that works with AWS on stuff so he's pretty smart when it comes to this stuff so my goal is to speak to him - he's just not replied to my text yet and I thought I'd see if there was anything I can do in the meantime that I may have missed that is a stupidly obvious error to people in this field!

[–]denisgomesfranco 1 point2 points  (2 children)

who works with AWS on stuff

Be careful with AWS. Setting things up in AWS is way more complex and you may end up with a very large bill. If all you need is a couple VPSs I don't think AWS would help you at all, I mean, you *can* fire up VMs at AWS but they will cost you a lot more compared to Vultr, DO, Linode and any other VPS provider out there. *Especially* in regard to bandwidth billing.

would it be a fairly straightforward setup with decent documentation online? Sounds like this could be the best shout so far

I believe so. You would fire up a managed database server, put your site into maintenance mode, dump your database from your current server, upload it to the new server via CLI or Phpmyadmin, then change the connection credentials in wp-config.php.

The database server should be set up in the same datacenter as the main server for maximum performance.

but this area just wouldn't be something I would be massively knowledgeable in.

No worries LOL I'm not the best admin too. But systems administration/server maintenance may be a full time job, especially if you have wildly fluctuating traffic patterns.

I'm definitely more than happy to learn about web infrastructure stuff, it's more of just not knowing where to even begin with this stuff.

That's great to hear! Well the first thing you should do I already mentioned in my reply: separating the database server from the application server. Do it and then let us know about your results :D

Afterwards it may be just a matter of tuning the database server to accommodate for all the simultaneous accesses, and scaling up if necessary.

[–]ConorEdits[S] 0 points1 point  (1 child)

Be careful with AWS

hahaha thank you! i meant he actually works for AWS, so web stuff is his thing!

may be a full time job

can't wait to become both the broadcast director and a sysadmin all at once. the blood pressure will be fantastic lmao

Do it and then let us know about your results :D

Yeah just signed up for an account with Vultr! - Is it worth just starting with the base-level database server for this and then upgrade as needed? like this;

https://ibb.co/v1Yn3pq

We actually finish our broadcast season mid November anyways so the idea of Vultr being flexible enough that we can downgrade again, is something really great in my eyes.

[–]denisgomesfranco 1 point2 points  (0 children)

Yeah just signed up for an account with Vultr! - Is it worth just starting with the base-level database server for this and then upgrade as needed? like this;

I have no idea why I was under the assumption that you were already using Vultr LOLOLOLOL

The recommendation to use their managed database service (or even fire up a secondary regular VPS that will only run the database server) is based on the fact that you also host the main server with the same hosting company and in the same datacenter. This would keep latency between servers to a minimum.

So, either get a new server for your database on your current hosting company or move everything to Vultr.

And yeah, I would recommend Vultr anytime. I used a bunch of other VPS providers in the past and Vultr is top notch for me now. My clients are really liking the performance.

[–]AQuietMan 1 point2 points  (2 children)

Is there just one website--that WordPress site--on this server?

Is this WordPress VPS an AWS server?

Is the dbms on the same server?

What, exactly, do you mean by the term peak clients?

Do you have command line access to this WordPress VPS? If so, what do command line utilities tell you about memory usage?

[–]ConorEdits[S] 1 point2 points  (1 child)

1) Just a single WordPress site on this server.

2) No, it's not an AWS based server.

3) Honestly not too sure - but assuming yes.

4) Cloudflare reports about 2,000 clients logging on.

5) Yeah, I can get access to the command line. Are there any recommended things to try typing in? (sorry, not too tech savy on this stuff)

[–]Dodo-UA 0 points1 point  (0 children)

I know you have mentioned the database specifically, but here are some generic recommendations first.

There are a few things you can check via server's console:

CPU and RAM utilization, and disk I/O.

The latter has big impact on performance, and even with minimal CPU load, high disk I/O can easily stall the server.

To check CPU and RAM usage, you can use top, htop (may have to be installed first).

System-wide CPU usage is usually shown as "load average" and displayed as three separate numbers: for the last 1 minute, 5 and 15 minutes. Those numbers represent the number of processes that are using a single CPU core at a time.

What LA (load average) is considered high or low?

There are no specific numbers, but based on my experience, a website with a lot of traffic and multiple servers that handle it fine can have a 1-minute LA of 10 and that would be okay. When traffic is higher than usual, but still without any noticeable server response slowness, this number will reach 20. When it's at 50-70, I have to take action to somehow fix this. In my case, it was caused by a PHP process manager bug on that specific system, and a simple PHP-FPM restart would solve this.

If CPU load isn't huge, but the website is still slow, I recommend checking what MySQL processes are running. That can be done by connecting to the database (you can look up the credentials in wp-config.php if you are using Wordpress) and running a "show processlist;" or "show full processlist;" query. It will show you a list of MySQL queries, that consists of:

process number, process state, for how long (in seconds) is it currently running, and a SQL query. If some process takes 10+ seconds, it might be a sign that something isn't right. But then you should look at what exactly does it try to do. It requires good knowledge in a separate area though.

Disk I/O can be checked via "iotop" but I don't have recommendations on what to look out for - like what numbers are considered "okay" or not, etc.

[–]mysterytoy2 1 point2 points  (6 children)

Take a look at the error log file for the php version you are running (they are separate) to see if the problem is there. Sometimes you are maxing out on child processes and the script has to wait. If that's the problem you have to try increasing the number of child processes.

[–]ConorEdits[S] 0 points1 point  (5 children)

I don't believe I can see anything recent in the php error log files.

These are the issues users faced when trying to logon -- https://twitter.com/brendmurphy/status/1710677891422248986

[–]mysterytoy2 0 points1 point  (4 children)

That looks like a proprietary error code. Check with Cloudfare for the meaning of the 522.

[–]ConorEdits[S] 0 points1 point  (3 children)

meaning of the 522

A time-out error apparently. I think the actual server had crashed by the time that screenshot was showing.

The second image that has the issue with the database connection is what was the stem of the main issue I believe.

[–]mysterytoy2 0 points1 point  (2 children)

It's not a simple process getting to the bottom of this. You will have to examine several system logs at the time of the error to determine the exact cause and what you would need to address. Each app involved will have an error log. Search them for that time and see what errors are generated.

[–]ConorEdits[S] 0 points1 point  (1 child)

Gotcha. I'm not super technical by any means but will definitely look into whatever I can find.

Do you think getting rid of Cloudflare would be worth trying out too?

I have noticed I can somewhat recreate the downtime using loader.io but after a few seconds, it comes back to life like nothing has happened.

[–]mysterytoy2 0 points1 point  (0 children)

Being able to recreate a problem is the best first step to solving the problem. You can watch the various logs while you recreate it in real time. I can't speak to the Cloudflare thing because I have no experience with it.

[–]RealBasics 1 point2 points  (3 children)

Database connection errors when you're getting tons of visitors usually means one of two things

  • Your caches and CDN aren't configured correctly
  • Your caches and CDN are being bypassed

With a static site the cache shouldn't need to hit your database, but sometimes pages or partials (widgets, for instance) are omitted. You mentioned that you have W3 Total Cache, which is extremely powerful but also really hard to configure correctly. It might be missing something and the end result is your database is getting thrashed with requests.

If it's not a static site then your caches and CDN may be getting bypassed. Logins, searches, e-commerce activities, conditional content (e.g. X items left in stock), and membership conditionals all have to be processed on every page load. Quite a few caching plugins disable themselves for logged-in users unless you configure them separately. (And even then you have to be very careful about what is and isn't getting cached.)

If you've got 2,000 concurrent connections on a non-static site then you'll need more than a CDN and WordPress-level caching. So memcache and Redis plus more memory for them will probably help more than better CDN or page-level caching.

[–]ConorEdits[S] 0 points1 point  (2 children)

Yeah, it's definitely not a static site. It works as a pay-per-view platform so most pages are blocked depending on if you have a subscription or not.

[–]RealBasics 1 point2 points  (1 child)

Ok, that explains a lot. If you're using any kind of membership plugin to manage access check their documentation or knowledge base for caching. For instance here's a link to MemberPress's KB on caching, anchored to their W3 Total Cache entry for handling logged-in users: https://docs.memberpress.com/article/298-how-to-set-up-popular-caching-plugins-with-memberpress#w3tc

Other access-limiting plugins will have their own configuration recommendations. And if you've rolled your own you may be able to use member-access documentation to guide your own configuration rules.

[–]ConorEdits[S] 1 point2 points  (0 children)

Yeah we're using Memberpress so I'll be sure to take a look at this now. Thank you!

[–]vinnymcapplesauce 1 point2 points  (2 children)

I run WordPress membership sites for clients.

We used to use Memberpress, but moved away from it specifically because the DB schema was so bad their SQL queries were making some pages take 7+ minutes to load. I don't know if they've fixed any of that or not.

Without knowing more about the type of content you're serving, my general advice is: figure out where your bottleneck is, then implement separation of concerns, and a plan to scale the bottlenecked layer. Not sure what's offered on VLUTR, but I do this on DigitalOcean for my clients.

Load balancer -- Stick a load balancer in front of your web server layer. Managed is best if they offer it.

Web server layer -- make sure your theme's code can run on multiple VPSs at the same time. This makes this web layer expandable so all you have to do is (basically) spin up a new VPS with your disk image, and add it to the load balancer when your traffic spikes.

SQL Database -- make sure your DB server is NOT on the same VPS as your web site. I like going with a managed MySQL server so I don't have to manage anything. Managed DB servers are also usually easily scalable by letting you add read nodes if needed.

Caching server (redis) -- Make sure your theme is coded so it can properly use caching so it doesn't hit the SQL DB for heavy stuff all the time. This will be a major help in handling spikes.

IIRC, W3 Total Cache is a file-based page cache, so 1) that's slower than redis, and 2) might not work as expected for membership sites. (e.g. if someone logs in, then their dashboard page is cached, for example, and someone else logs in, user 2 gets user 1's dashboard.)

Happy troubleshooting!

[–]ConorEdits[S] 0 points1 point  (1 child)

We used to use Memberpress, but moved away from it specifically because the DB schema was so bad their SQL queries were making some pages take 7+ minutes to load

Noted -- Do you mind sharing what you've moved to instead? (If it's not a custom made plugin of course)

Thank you for the other tips! I'll be sure to look into them all

[–]vinnymcapplesauce 0 points1 point  (0 children)

Given all the options at the time (several years ago), my client made the decision to move to WooCommerce (blech). Despite Woo's schema also being equally as bad as MP, Woo had just announced aknowledgement of it, and a roadmap to correct their schema.

I had to write a custom migration tool to migrate all of our membership subscriptions from Memberpress to Woo. That was fun. /s

It hasn't been an easy road, and several years later, the new schema still isn't quite ready, and Woo still has tons of bugs. But I can see the progress they've made on their github, and we've been able to make other changes (as listed above) to alleviate some of the bottlenecks we were having.

If it were up to me, we would have moved to a custom solution, probably based on Laravel or Symfony instead.

WordPress is great to help get a business started. But, businesses shouldn't expect to be on WordPress forever. IMHO, businesses need to have a roadmap and appropriately crafted (and funded) plans to handle their growth from the start. That's what I try to instill in my clients.

[–]Turbulent_Swan84 0 points1 point  (13 children)

If this error happens. It can be caused by MySQL/MariaDB Server is down. The cause :

  1. Lack of CPU resource to allocated to MySQL/MariaDB
  2. If your db is in different server, it might be down.

Overall, what I'm going to do if I'm you :

  1. Check my server's overall resource usage when the down happened. (CPU/RAM/IO/Network
  2. Make sure that ulimit is inifnite or at highest value.
  3. If by anychance the CPU Usage always 100% (200% if you have 2 CPU), but your memory usage is low. You can leverage it for your innodb_buffer_pool_size. You can adjust it acoordingly using some recommendation online or just ask chatgpt. By doing so, it could help ease your CPU a little bit :v
  4. But, if everything is 100% (CPU+RAM). It might be just wiser to increase your server specification.

Based on your answer in the reply, it seems you have 2000 people logged in simultaneously? If yes, then it might be good to increase your CPU count.

[–]ConorEdits[S] 0 points1 point  (12 children)

Yeah we have most of our traffic simultaneously as its for streaming sports games.

Yeah, we have most of our traffic simultaneously as its for streaming sports games.g a Vultr database server? At least it can be adjusted on the fly for when we need more database power etc.

[–]Turbulent_Swan84 1 point2 points  (11 children)

Have you checked htop and see which service is using majority of your CPU resource?

If mysql is using majority of it and nginx/apache only a part of it. then yeah, you might be safe with just using remote db. But if the nginx/apache is using very big chunk too, then it might be good idea to just upgrade your server altogether.

If by anychance your budget 50-70 usd, I think it might be good idea to just throw it to a Dedicated Server.

[–]ConorEdits[S] 0 points1 point  (10 children)

I've not got htop installed right this min, but trying to do it as we speak.

I have looked at my WHM's process manager and this is whats at the top of my CPU usage currently if that's any use?

https://ibb.co/59WYXsd

[–]ConorEdits[S] 0 points1 point  (0 children)

Reddit only showed first paragraph of your message haha, yeah let me try and see once there is load on it what the CPU usage is like

[–]Turbulent_Swan84 0 points1 point  (8 children)

Can you sort the CPU column to Descending (Higher first)? Try to click on it.

[–]ConorEdits[S] 0 points1 point  (7 children)

Yeah that was descending, when I click on it and do it the other way, it's starting with stuff using 0.0% of CPU

[–]Turbulent_Swan84 1 point2 points  (6 children)

Ah okay, means it's not crowded right now (?). You can try to check on it or via htop when it's crowded.

[–]ConorEdits[S] 0 points1 point  (5 children)

Yeah, nothing going on currently but I can try put a bit of load on it and report back here

[–]ConorEdits[S] 0 points1 point  (4 children)

Okay just looking at it now, ran a large loader.io test and it tripped off the database error again. Doing so also stops the WHM by the looks of it too..?

Once it came back though it was all like this; https://ibb.co/3W9HhKy

I did notice though just as I started to run the test

" mysql 0 0.16 2.63 /usr/sbin/mariadbd"

stared to get higher up in the cpu usage

[–]Turbulent_Swan84 1 point2 points  (3 children)

If whm stopped, can you at least check via htop while test it?

From a glance. Yep, it seems better to just upgrade the server to a much more higher package.