Skip to content

Commit b4cc982

Browse files
authored
fix: ensure embedded-postgres state is wiped between retries (#20809)
Retries were previously added when starting embedded postgres to mitigate port allocation conflicts (we can't use an ephemeral port for tests). Retries alone seemingly did not fix the test flakes. A new failure mode appeared on the retries: timing out connecting to the database. When a port discovery error occurrs, embedded-postgres does not create the database. If the data directory exists on the next attempt, embedded-postgres will assume the database has already been created. This seems to cause the timeout error. Wipe all state between retries to ensure attempts execute the same logic that creates the database. [#658](coder/internal#658)
1 parent a61b8bc commit b4cc982

File tree

1 file changed

+18
-11
lines changed

1 file changed

+18
-11
lines changed

cli/server.go

Lines changed: 18 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2143,21 +2143,33 @@ func startBuiltinPostgres(ctx context.Context, cfg config.Root, logger slog.Logg
21432143
}
21442144
stdlibLogger := slog.Stdlib(ctx, logger.Named("postgres"), slog.LevelDebug)
21452145

2146-
// If the port is not defined, an available port will be found dynamically.
2146+
// If the port is not defined, an available port will be found dynamically. This has
2147+
// implications in CI because here is no way to tell Postgres to use an ephemeral
2148+
// port, so to avoid flaky tests in CI we need to retry EmbeddedPostgres.Start in
2149+
// case of a race condition where the port we quickly listen on and close in
2150+
// embeddedPostgresURL() is not free by the time the embedded postgres starts up.
2151+
// The maximum retry attempts _should_ cover most cases where port conflicts occur
2152+
// in CI and cause flaky tests.
21472153
maxAttempts := 1
21482154
_, err = cfg.PostgresPort().Read()
2155+
// Important: if retryPortDiscovery is changed to not include testing.Testing(),
2156+
// the retry logic below also needs to be updated to ensure we don't delete an
2157+
// existing database
21492158
retryPortDiscovery := errors.Is(err, os.ErrNotExist) && testing.Testing()
21502159
if retryPortDiscovery {
2151-
// There is no way to tell Postgres to use an ephemeral port, so in order to avoid
2152-
// flaky tests in CI we need to retry EmbeddedPostgres.Start in case of a race
2153-
// condition where the port we quickly listen on and close in embeddedPostgresURL()
2154-
// is not free by the time the embedded postgres starts up. This maximum_should
2155-
// cover most cases where port conflicts occur in CI and cause flaky tests.
21562160
maxAttempts = 3
21572161
}
21582162

21592163
var startErr error
21602164
for attempt := 0; attempt < maxAttempts; attempt++ {
2165+
if retryPortDiscovery && attempt > 0 {
2166+
// Clean up the data and runtime directories and the port file from the
2167+
// previous failed attempt to ensure a clean slate for the next attempt.
2168+
_ = os.RemoveAll(filepath.Join(cfg.PostgresPath(), "data"))
2169+
_ = os.RemoveAll(filepath.Join(cfg.PostgresPath(), "runtime"))
2170+
_ = cfg.PostgresPort().Delete()
2171+
}
2172+
21612173
// Ensure a password and port have been generated.
21622174
connectionURL, err := embeddedPostgresURL(cfg)
21632175
if err != nil {
@@ -2204,11 +2216,6 @@ func startBuiltinPostgres(ctx context.Context, cfg config.Root, logger slog.Logg
22042216
slog.F("port", pgPort),
22052217
slog.Error(startErr),
22062218
)
2207-
2208-
if retryPortDiscovery {
2209-
// Since a retry is needed, we wipe the port stored here at the beginning of the loop.
2210-
_ = cfg.PostgresPort().Delete()
2211-
}
22122219
}
22132220

22142221
return "", nil, xerrors.Errorf("failed to start built-in PostgreSQL after %d attempts. "+

0 commit comments

Comments
 (0)