Description
Describe the bug
Please help me set up PgCat correctly. During a VM backup (VMware) of the database server to which the PgCat host has active connections, a snapshot is created and then deleted once the backup is complete. This is a very I/O-intensive operation, and as a result the database may respond more slowly. Sometimes, right when the snapshot is being deleted, the following errors appear in the PgCat log (for clarity, I removed the parameters listed in {}):
Terminating server Address because of: SocketError("Error flushing socket - Error: Os { code: 110, kind: TimedOut, message: "Connection timed out" }")
Failed health check on instance Address error: SocketError("Error flushing socket - Error: Os { code: 110, kind: TimedOut, message: "Connection timed out" }")
Server Address marked bad, reason: failed health check
Server connection terminated Address
Could not get connection from pool error: "AllServersDown"
I managed to fix this sequence of errors by increasing the tcp_user_timeout parameter (which has a default value of 10s).
Unfortunately, I still have a problem with another (slightly different) error scenario:
Health check timeout on instance Address error: Elapsed(())
Server Address marked bad, reason: failed health check
Server connection terminated Address
Could not get connection from pool error: "AllServersDown"
So far, no parameter adjustment has reliably helped. In particular, I have tried increasing connect_timeout, healthcheck_timeout, and healthcheck_delay.
Please help!! Thank you!
To Reproduce
Steps to reproduce the behavior:
Run a VM backup of the PostgreSQL host while the PgCat host has active connections to it.
Expected behavior
No errors. No terminated connections.
Additional context
OS: AlmaLinux 9
PgCat version: v1.2.0
Selected config parameters
pool_size = 160 (definitely a sufficient size)
min_pool_size = 2
connect_timeout = 60000
healthcheck_timeout = 60000
tcp_user_timeout = 60000
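
For reference, here is a rough sketch of where these values sit in pgcat.toml. It assumes the usual layout, with the timeouts under [general] (in milliseconds) and the pool sizes under a per-user pool section. The pool name `mydb` and the user index `0` are placeholders, not my real names, and unrelated settings are omitted.

```toml
# Sketch only: section layout assumed from the standard pgcat.toml structure.
[general]
connect_timeout = 60000       # ms, raised to 60 s
healthcheck_timeout = 60000   # ms, raised to 60 s
tcp_user_timeout = 60000      # ms, raised from the 10 s default

# "mydb" and user index 0 are hypothetical placeholders.
[pools.mydb.users.0]
pool_size = 160
min_pool_size = 2
```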