Skip to content

Commit 84b7a03

Browse files
authored
chore: abstract pg test logic and double runner sizes (#21091)
This PR does two things, both in service of helping to (hopefully!) speed up CI: 1. abstracts the parallelism logic into a common action and has all PG-related jobs use it 2. doubles runner sizes from [8->16 CPUs & 32->64GiB RAM](https://depot.dev/docs/github-actions/runner-types)* and concomitantly increases parallelism I only focused on the PG-related jobs since they are generally slowest & most RAM-intensive. [<img width="2011" height="460" alt="image" src="/api/flow.js?q=https%3A%2F%2Fgithub.com%2Fcoder%2Fcoder%2Fcommit%2F%253Ca%2520href%3D"https://github.com/user-attachments/assets/c38ec3fc-dd93-4753-8df3-f22dfe54a3f7">https://github.com/user-attachments/assets/c38ec3fc-dd93-4753-8df3-f22dfe54a3f7" />](https://app.datadoghq.com/ci/pipelines/health?query=%40git.repository.id_v2%3Agithub.com%2Fcoder%2Fcoder%20-%40git.is_default_branch%3Atrue&fromUser=true&group=ci-cost&sort=-billableTime&sp=%5B%7B%22p%22%3A%7B%22fingerprint%22%3A%22pVB6pq7htXrn%22%2C%22env%22%3A%22none%22%7D%2C%22i%22%3A%22ci-health-pipeline-panel%22%7D%5D&start=1762245172958&end=1764837172958&paused=false) _* `test-go-race-pg` doubles from 16->32 CPUs & 64->128GiB RAM and likewise for the Windows runners; MacOS runners have [only one size](https://depot.dev/docs/github-actions/runner-types#macos-runners)_ _**NOTE:** don't use the speed of the PG-related jobs in this PR's CI run as indicative. Tests run outside `main` may use cache, so the speed may seem artificially low._ --------- Signed-off-by: Danny Kopping <danny@coder.com>
1 parent 8ed1c1d commit 84b7a03

File tree

3 files changed

+224
-148
lines changed

3 files changed

+224
-148
lines changed
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
name: "Test Go with PostgreSQL"
2+
description: "Run Go tests with PostgreSQL database"
3+
4+
inputs:
5+
postgres-version:
6+
description: "PostgreSQL version to use"
7+
required: false
8+
default: "13"
9+
test-parallelism-packages:
10+
description: "Number of packages to test in parallel (-p flag)"
11+
required: false
12+
default: "8"
13+
test-parallelism-tests:
14+
description: "Number of tests to run in parallel within each package (-parallel flag)"
15+
required: false
16+
default: "8"
17+
race-detection:
18+
description: "Enable race detection"
19+
required: false
20+
default: "false"
21+
test-count:
22+
description: "Number of times to run each test (empty for cached results)"
23+
required: false
24+
default: ""
25+
test-packages:
26+
description: "Packages to test (default: ./...)"
27+
required: false
28+
default: "./..."
29+
embedded-pg-path:
30+
description: "Path for embedded postgres data (Windows/macOS only)"
31+
required: false
32+
default: ""
33+
embedded-pg-cache:
34+
description: "Path for embedded postgres cache (Windows/macOS only)"
35+
required: false
36+
default: ""
37+
38+
runs:
39+
using: "composite"
40+
steps:
41+
- name: Start PostgreSQL Docker container (Linux)
42+
if: runner.os == 'Linux'
43+
shell: bash
44+
env:
45+
POSTGRES_VERSION: ${{ inputs.postgres-version }}
46+
run: make test-postgres-docker
47+
48+
- name: Setup Embedded Postgres (Windows/macOS)
49+
if: runner.os != 'Linux'
50+
shell: bash
51+
env:
52+
POSTGRES_VERSION: ${{ inputs.postgres-version }}
53+
EMBEDDED_PG_PATH: ${{ inputs.embedded-pg-path }}
54+
EMBEDDED_PG_CACHE_DIR: ${{ inputs.embedded-pg-cache }}
55+
run: |
56+
go run scripts/embedded-pg/main.go -path "${EMBEDDED_PG_PATH}" -cache "${EMBEDDED_PG_CACHE_DIR}"
57+
58+
- name: Run tests
59+
shell: bash
60+
env:
61+
TEST_NUM_PARALLEL_PACKAGES: ${{ inputs.test-parallelism-packages }}
62+
TEST_NUM_PARALLEL_TESTS: ${{ inputs.test-parallelism-tests }}
63+
TEST_COUNT: ${{ inputs.test-count }}
64+
TEST_PACKAGES: ${{ inputs.test-packages }}
65+
RACE_DETECTION: ${{ inputs.race-detection }}
66+
TS_DEBUG_DISCO: "true"
67+
LC_CTYPE: "en_US.UTF-8"
68+
LC_ALL: "en_US.UTF-8"
69+
run: |
70+
set -euo pipefail
71+
72+
if [[ ${RACE_DETECTION} == true ]]; then
73+
gotestsum --junitfile="gotests.xml" --packages="${TEST_PACKAGES}" -- \
74+
-race \
75+
-parallel "${TEST_NUM_PARALLEL_TESTS}" \
76+
-p "${TEST_NUM_PARALLEL_PACKAGES}"
77+
else
78+
make test
79+
fi

.github/workflows/ci.yaml

Lines changed: 108 additions & 81 deletions
Original file line numberDiff line numberDiff line change
@@ -327,7 +327,7 @@ jobs:
327327
test-go-pg:
328328
# make sure to adjust NUM_PARALLEL_PACKAGES and NUM_PARALLEL_TESTS below
329329
# when changing runner sizes
330-
runs-on: ${{ matrix.os == 'ubuntu-latest' && github.repository_owner == 'coder' && 'depot-ubuntu-22.04-8' || matrix.os && matrix.os == 'macos-latest' && github.repository_owner == 'coder' && 'depot-macos-latest' || matrix.os == 'windows-2022' && github.repository_owner == 'coder' && 'depot-windows-2022-16' || matrix.os }}
330+
runs-on: ${{ matrix.os == 'ubuntu-latest' && github.repository_owner == 'coder' && 'depot-ubuntu-22.04-16' || matrix.os && matrix.os == 'macos-latest' && github.repository_owner == 'coder' && 'depot-macos-latest' || matrix.os == 'windows-2022' && github.repository_owner == 'coder' && 'depot-windows-2022-32' || matrix.os }}
331331
needs: changes
332332
if: needs.changes.outputs.go == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main'
333333
# This timeout must be greater than the timeout set by `go test` in
@@ -336,6 +336,7 @@ jobs:
336336
# even if some of the preceding steps are slow.
337337
timeout-minutes: 25
338338
strategy:
339+
fail-fast: false
339340
matrix:
340341
os:
341342
- ubuntu-latest
@@ -416,82 +417,83 @@ jobs:
416417
find . -type f ! -path ./.git/\*\* | mtimehash
417418
find . -type d ! -path ./.git/\*\* -exec touch -t 200601010000 {} +
418419
419-
- name: Test with PostgreSQL Database
420-
env:
421-
POSTGRES_VERSION: "13"
422-
TS_DEBUG_DISCO: "true"
423-
LC_CTYPE: "en_US.UTF-8"
424-
LC_ALL: "en_US.UTF-8"
420+
- name: Normalize Terraform Path for Caching
425421
shell: bash
422+
# Terraform gets installed in a random directory, so we need to normalize
423+
# the path or many cached tests will be invalidated.
426424
run: |
427-
set -o errexit
428-
set -o pipefail
429-
430-
if [ "$RUNNER_OS" == "Windows" ]; then
431-
# Create a temp dir on the R: ramdisk drive for Windows. The default
432-
# C: drive is extremely slow: https://github.com/actions/runner-images/issues/8755
433-
mkdir -p "R:/temp/embedded-pg"
434-
go run scripts/embedded-pg/main.go -path "R:/temp/embedded-pg" -cache "${EMBEDDED_PG_CACHE_DIR}"
435-
elif [ "$RUNNER_OS" == "macOS" ]; then
436-
# Postgres runs faster on a ramdisk on macOS too
437-
mkdir -p /tmp/tmpfs
438-
sudo mount_tmpfs -o noowners -s 8g /tmp/tmpfs
439-
go run scripts/embedded-pg/main.go -path /tmp/tmpfs/embedded-pg -cache "${EMBEDDED_PG_CACHE_DIR}"
440-
elif [ "$RUNNER_OS" == "Linux" ]; then
441-
make test-postgres-docker
442-
fi
425+
mkdir -p "$RUNNER_TEMP/sym"
426+
source scripts/normalize_path.sh
427+
normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")"
443428
444-
# if macOS, install google-chrome for scaletests
445-
# As another concern, should we really have this kind of external dependency
446-
# requirement on standard CI?
447-
if [ "${RUNNER_OS}" == "macOS" ]; then
448-
brew install google-chrome
449-
fi
429+
- name: Setup RAM disk for Embedded Postgres (Windows)
430+
if: runner.os == 'Windows'
431+
shell: bash
432+
# The default C: drive is extremely slow:
433+
# https://github.com/actions/runner-images/issues/8755
434+
run: mkdir -p "R:/temp/embedded-pg"
450435

451-
# macOS will output "The default interactive shell is now zsh"
452-
# intermittently in CI...
453-
if [ "${RUNNER_OS}" == "macOS" ]; then
454-
touch ~/.bash_profile && echo "export BASH_SILENCE_DEPRECATION_WARNING=1" >> ~/.bash_profile
455-
fi
436+
- name: Setup RAM disk for Embedded Postgres (macOS)
437+
if: runner.os == 'macOS'
438+
shell: bash
439+
run: |
440+
# Postgres runs faster on a ramdisk on macOS.
441+
mkdir -p /tmp/tmpfs
442+
sudo mount_tmpfs -o noowners -s 8g /tmp/tmpfs
456443
457-
if [ "${RUNNER_OS}" == "Windows" ]; then
458-
# Our Windows runners have 16 cores.
459-
# On Windows Postgres chokes up when we have 16x16=256 tests
460-
# running in parallel, and dbtestutil.NewDB starts to take more than
461-
# 10s to complete sometimes causing test timeouts. With 16x8=128 tests
462-
# Postgres tends not to choke.
463-
export TEST_NUM_PARALLEL_PACKAGES=8
464-
export TEST_NUM_PARALLEL_TESTS=16
465-
# Only the CLI and Agent are officially supported on Windows and the rest are too flaky
466-
export TEST_PACKAGES="./cli/... ./enterprise/cli/... ./agent/..."
467-
elif [ "${RUNNER_OS}" == "macOS" ]; then
468-
# Our macOS runners have 8 cores. We set NUM_PARALLEL_TESTS to 16
469-
# because the tests complete faster and Postgres doesn't choke. It seems
470-
# that macOS's tmpfs is faster than the one on Windows.
471-
export TEST_NUM_PARALLEL_PACKAGES=8
472-
export TEST_NUM_PARALLEL_TESTS=16
473-
# Only the CLI and Agent are officially supported on macOS and the rest are too flaky
474-
export TEST_PACKAGES="./cli/... ./enterprise/cli/... ./agent/..."
475-
elif [ "${RUNNER_OS}" == "Linux" ]; then
476-
# Our Linux runners have 8 cores.
477-
export TEST_NUM_PARALLEL_PACKAGES=8
478-
export TEST_NUM_PARALLEL_TESTS=8
479-
fi
444+
# Install google-chrome for scaletests.
445+
# As another concern, should we really have this kind of external dependency
446+
# requirement on standard CI?
447+
brew install google-chrome
480448
481-
# by default, run tests with cache
482-
if [ "${GITHUB_REF}" == "refs/heads/main" ]; then
483-
# on main, run tests without cache
484-
export TEST_COUNT="1"
485-
fi
449+
# macOS will output "The default interactive shell is now zsh" intermittently in CI.
450+
touch ~/.bash_profile && echo "export BASH_SILENCE_DEPRECATION_WARNING=1" >> ~/.bash_profile
486451
487-
mkdir -p "$RUNNER_TEMP/sym"
488-
source scripts/normalize_path.sh
489-
# terraform gets installed in a random directory, so we need to normalize
490-
# the path to the terraform binary or a bunch of cached tests will be
491-
# invalidated. See scripts/normalize_path.sh for more details.
492-
normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")"
452+
- name: Test with PostgreSQL Database (Linux)
453+
if: runner.os == 'Linux'
454+
uses: ./.github/actions/test-go-pg
455+
with:
456+
postgres-version: "13"
457+
# Our Linux runners have 16 cores.
458+
test-parallelism-packages: "16"
459+
test-parallelism-tests: "8"
460+
# By default, run tests with cache for improved speed (possibly at the expense of correctness).
461+
# On main, run tests without cache for the inverse.
462+
test-count: ${{ github.ref == 'refs/heads/main' && '1' || '' }}
493463

494-
make test
464+
- name: Test with PostgreSQL Database (macOS)
465+
if: runner.os == 'macOS'
466+
uses: ./.github/actions/test-go-pg
467+
with:
468+
postgres-version: "13"
469+
# Our macOS runners have 8 cores.
470+
# Even though this parallelism seems high, we've observed relatively low flakiness in the past.
471+
# See https://github.com/coder/coder/pull/21091#discussion_r2609891540.
472+
test-parallelism-packages: "8"
473+
test-parallelism-tests: "16"
474+
# By default, run tests with cache for improved speed (possibly at the expense of correctness).
475+
# On main, run tests without cache for the inverse.
476+
test-count: ${{ github.ref == 'refs/heads/main' && '1' || '' }}
477+
# Only the CLI and Agent are officially supported on macOS; the rest are too flaky.
478+
test-packages: "./cli/... ./enterprise/cli/... ./agent/..."
479+
embedded-pg-path: "/tmp/tmpfs/embedded-pg"
480+
embedded-pg-cache: ${{ steps.embedded-pg-cache.outputs.embedded-pg-cache }}
481+
482+
- name: Test with PostgreSQL Database (Windows)
483+
if: runner.os == 'Windows'
484+
uses: ./.github/actions/test-go-pg
485+
with:
486+
postgres-version: "13"
487+
# Our Windows runners have 32 cores.
488+
test-parallelism-packages: "32"
489+
test-parallelism-tests: "16"
490+
# By default, run tests with cache for improved speed (possibly at the expense of correctness).
491+
# On main, run tests without cache for the inverse.
492+
test-count: ${{ github.ref == 'refs/heads/main' && '1' || '' }}
493+
# Only the CLI and Agent are officially supported on Windows; the rest are too flaky.
494+
test-packages: "./cli/... ./enterprise/cli/... ./agent/..."
495+
embedded-pg-path: "R:/temp/embedded-pg"
496+
embedded-pg-cache: ${{ steps.embedded-pg-cache.outputs.embedded-pg-cache }}
495497

496498
- name: Upload failed test db dumps
497499
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
@@ -521,7 +523,7 @@ jobs:
521523
api-key: ${{ secrets.DATADOG_API_KEY }}
522524

523525
test-go-pg-17:
524-
runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-8' || 'ubuntu-latest' }}
526+
runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-16' || 'ubuntu-latest' }}
525527
needs:
526528
- changes
527529
if: needs.changes.outputs.go == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main'
@@ -554,12 +556,25 @@ jobs:
554556
with:
555557
key-prefix: test-go-pg-17-${{ runner.os }}-${{ runner.arch }}
556558

557-
- name: Test with PostgreSQL Database
558-
env:
559-
POSTGRES_VERSION: "17"
560-
TS_DEBUG_DISCO: "true"
559+
- name: Normalize Terraform Path for Caching
560+
shell: bash
561+
# Terraform gets installed in a random directory, so we need to normalize
562+
# the path or many cached tests will be invalidated.
561563
run: |
562-
make test-postgres
564+
mkdir -p "$RUNNER_TEMP/sym"
565+
source scripts/normalize_path.sh
566+
normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")"
567+
568+
- name: Test with PostgreSQL Database
569+
uses: ./.github/actions/test-go-pg
570+
with:
571+
postgres-version: "17"
572+
# Our Linux runners have 16 cores.
573+
test-parallelism-packages: "16"
574+
test-parallelism-tests: "8"
575+
# By default, run tests with cache for improved speed (possibly at the expense of correctness).
576+
# On main, run tests without cache for the inverse.
577+
test-count: ${{ github.ref == 'refs/heads/main' && '1' || '' }}
563578

564579
- name: Upload Test Cache
565580
uses: ./.github/actions/test-cache/upload
@@ -575,7 +590,7 @@ jobs:
575590
api-key: ${{ secrets.DATADOG_API_KEY }}
576591

577592
test-go-race-pg:
578-
runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-16' || 'ubuntu-latest' }}
593+
runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-32' || 'ubuntu-latest' }}
579594
needs: changes
580595
if: needs.changes.outputs.go == 'true' || needs.changes.outputs.ci == 'true' || github.ref == 'refs/heads/main'
581596
timeout-minutes: 25
@@ -603,16 +618,28 @@ jobs:
603618
with:
604619
key-prefix: test-go-race-pg-${{ runner.os }}-${{ runner.arch }}
605620

621+
- name: Normalize Terraform Path for Caching
622+
shell: bash
623+
# Terraform gets installed in a random directory, so we need to normalize
624+
# the path or many cached tests will be invalidated.
625+
run: |
626+
mkdir -p "$RUNNER_TEMP/sym"
627+
source scripts/normalize_path.sh
628+
normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")"
629+
606630
# We run race tests with reduced parallelism because they use more CPU and we were finding
607631
# instances where tests appear to hang for multiple seconds, resulting in flaky tests when
608632
# short timeouts are used.
609633
# c.f. discussion on https://github.com/coder/coder/pull/15106
634+
# Our Linux runners have 32 cores, but we reduce parallelism since race detection adds a lot of overhead.
635+
# We aim to have parallelism match CPU count (8*4=32) to avoid making flakes worse.
610636
- name: Run Tests
611-
env:
612-
POSTGRES_VERSION: "17"
613-
run: |
614-
make test-postgres-docker
615-
gotestsum --junitfile="gotests.xml" --packages="./..." -- -race -parallel 4 -p 4
637+
uses: ./.github/actions/test-go-pg
638+
with:
639+
postgres-version: "17"
640+
test-parallelism-packages: "8"
641+
test-parallelism-tests: "4"
642+
race-detection: "true"
616643

617644
- name: Upload Test Cache
618645
uses: ./.github/actions/test-cache/upload

0 commit comments

Comments
 (0)