Add kernel keyring troubleshooting to v2 Docker-in-workspaces documentation #19093

@blink-so

Description

Problem

Users running long-lived workspaces with envbox and frequent Docker commands encounter this error:

error: failed to start container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: unable to join session keyring: unable to create session key: disk quota exceeded: unknown

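This error occurs when the Linux kernel's keyring quota is exceeded. Despite the "disk quota exceeded" wording, it has nothing to do with disk space: the kernel returns the same error code (EDQUOT) when a user hits their key quota. Each Docker container creates a session keyring via runc, and envbox with sysbox creates additional keyrings. The default per-user limits for non-root users are:

  • kernel.keys.maxkeys = 200 (maximum number of keys a user may own)
  • kernel.keys.maxbytes = 20000 (maximum total bytes of key data a user may own)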
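
To confirm that a node is actually approaching these limits, the kernel exposes per-UID quota usage in /proc/key-users. The following is a minimal sketch, not part of the original report; the function name `keyring_usage` is hypothetical, and the field layout is assumed to match the one described in keyrings(7).

```shell
#!/bin/sh
# keyring_usage (hypothetical helper): summarize per-UID key quota usage.
# Assumed line format, per keyrings(7):
#   <uid>: <usage> <nkeys>/<nikeys> <qnkeys>/<maxkeys> <qnbytes>/<maxbytes>
keyring_usage() {
    # $1: file to read; defaults to the live kernel table
    awk '{
        uid = $1
        sub(/:$/, "", uid)        # strip the trailing colon from the UID
        split($4, k, "/")         # k[1] = keys counted against quota, k[2] = maxkeys
        split($5, b, "/")         # b[1] = bytes counted against quota, b[2] = maxbytes
        printf "uid=%s keys=%d/%d (%d%%) bytes=%d/%d (%d%%)\n",
            uid, k[1], k[2], 100 * k[1] / k[2], b[1], b[2], 100 * b[1] / b[2]
    }' "${1:-/proc/key-users}"
}
```

Running `keyring_usage` with no argument on an affected node reads the live table. A UID whose key or byte percentage is close to 100 is the one hitting the quota.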

Proposed Solution

Add a troubleshooting section to the v2 Docker-in-workspaces documentation (similar to what existed in v1 docs) that includes:

Root Cause Explanation

  • Kernel keyring limits being exceeded
  • How Docker containers and envbox contribute to keyring usage

Solutions

  1. Increase kernel limits on nodes (takes effect immediately, lost on reboot):
     sudo sysctl -w kernel.keys.maxkeys=20000
     sudo sysctl -w kernel.keys.maxbytes=500000
  2. Make the change permanent via a sysctl config file:
     echo 'kernel.keys.maxkeys=20000' | sudo tee -a /etc/sysctl.d/99-keys.conf
     echo 'kernel.keys.maxbytes=500000' | sudo tee -a /etc/sysctl.d/99-keys.conf
     sudo sysctl -p /etc/sysctl.d/99-keys.conf
  3. For Karpenter users, add to EC2NodeClass userData:
     apiVersion: karpenter.k8s.aws/v1
     kind: EC2NodeClass
     metadata:
       name: default
     spec:
       amiFamily: AL2
       userData: |
         #!/bin/bash
         /etc/eks/bootstrap.sh your-cluster-name

         # Increase kernel keyring limits
         echo 'kernel.keys.maxkeys=20000' >> /etc/sysctl.d/99-keys.conf
         echo 'kernel.keys.maxbytes=500000' >> /etc/sysctl.d/99-keys.conf
         sysctl -p /etc/sysctl.d/99-keys.conf
  4. Deploy via DaemonSet (reference existing v1 documentation)
  5. Manual keyring clearing:
     keyctl clear @u  # clears the user keyring
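
One caveat with the persistent-config step: appending with `>>` adds duplicate lines each time the commands are re-run (for example, from a provisioning script). A possible alternative, sketched below, overwrites the drop-in file instead. The function name `write_keyring_conf` is hypothetical; the path and values are the ones proposed above, and the function needs to run as root when writing under /etc.

```shell
#!/bin/sh
# write_keyring_conf (hypothetical helper): write the keyring limits to a
# sysctl drop-in, replacing the file so repeated runs stay idempotent.
write_keyring_conf() {
    conf="${1:-/etc/sysctl.d/99-keys.conf}"   # target file
    maxkeys="${2:-20000}"                     # value for kernel.keys.maxkeys
    maxbytes="${3:-500000}"                   # value for kernel.keys.maxbytes
    printf 'kernel.keys.maxkeys=%s\nkernel.keys.maxbytes=%s\n' \
        "$maxkeys" "$maxbytes" > "$conf"
}
```

After writing the file, `sysctl -p /etc/sysctl.d/99-keys.conf` applies it, as in step 2.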

Impact

This affects users running Docker-in-Docker setups with envbox, especially in Kubernetes environments where workspaces run for extended periods. Node recycling currently "fixes" the issue by resetting kernel state, but proper kernel configuration prevents the problem entirely.
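
Since node recycling hides the problem until the quota fills up again, it may help to verify that a node is actually running with the raised limits. The check below is a sketch under the assumption that the target values are the ones proposed above; the function name `check_keyring_limits` is hypothetical.

```shell
#!/bin/sh
# check_keyring_limits (hypothetical helper): succeed only if both live limits
# are at least the target values. Returns 2 if a limit file cannot be read.
check_keyring_limits() {
    dir="${1:-/proc/sys/kernel/keys}"   # directory holding maxkeys and maxbytes
    want_keys="${2:-20000}"
    want_bytes="${3:-500000}"
    have_keys=$(cat "$dir/maxkeys") || return 2
    have_bytes=$(cat "$dir/maxbytes") || return 2
    [ "$have_keys" -ge "$want_keys" ] && [ "$have_bytes" -ge "$want_bytes" ]
}
```

For example, `check_keyring_limits || echo "keyring limits not raised on $(hostname)"` could run as part of a node health check.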

References

  • Original user report and troubleshooting discussion
  • This issue exists in both v1 and v2 since it's a kernel-level limitation
  • Similar documentation existed in Coder v1 troubleshooting guides

Co-authored-by: Chris Racioppo
Co-authored-by: Nick Spangler
