Prevent fork from happening in getnameinfo #13990

k0kubun · 2025-07-23T17:50:28Z

This PR applies the same fork-safety lock as getaddrinfo #10864 to getnameinfo.

Given that do_getaddrinfo and do_getnameinfo have a very similar structure and one of them needs a fork-safety lock, it seems likely that we need the same thing for the other one.

We're seeing segfaults on rb_getnameinfo at Shopify. I'm not sure if this change fixes the situation, but it helps us investigate the issue by ruling out the possibility of fork during rb_getnameinfo.

byroot · 2025-07-23T19:06:50Z

ext/socket/raddrinfo.c

@@ -752,7 +758,7 @@ rb_getnameinfo(const struct sockaddr *sa, socklen_t salen,
    }

    pthread_t th;
-    if (raddrinfo_pthread_create(&th, do_getnameinfo, arg) != 0) {
+    if (raddrinfo_pthread_create(&th, fork_safe_do_getnameinfo, arg) != 0) {


Do you have the crash info handy? IIRC you said the crash was in raddrinfo_pthread_create?

I'm wondering if the lock should be higher or not. For rb_getaddrinfo the lock is above the thread spawn and wait.

Do you have the crash info handy? IIRC you said the crash was in raddrinfo_pthread_create?

The backtrace is at https://github.com/shop/issues-ruby-infrastructure/issues/5#issuecomment-3098309280. It died on pthread_detach after raddrinfo_pthread_create. So,

I'm wondering if the lock should be higher or not.

I agree that it needs to be higher, but

For rb_getaddrinfo the lock is above the thread spawn and wait.

What rb_getaddrinfo locks today seems to be at the same level as what this PR does. For Linux, GETADDRINFO_IMPL would be 2, and the rb_getaddrinfo for 2 is

ruby/ext/socket/raddrinfo.c

Lines 531 to 537 in 78820e8

if (raddrinfo_pthread_create(&th, fork_safe_do_getaddrinfo, arg) != 0) {

int err = errno;

free_getaddrinfo_arg(arg);

errno = err;

return EAI_SYSTEM;

}

pthread_detach(th);

which takes the fork lock in the function used by the spawned thread.

Should we fix both rb_getaddrinfo and rb_getnameinfo to take the lock before spawning a thread and then release it at least after the detach?

Right, rereading the ticket, the concern in #10864 / [Feature #20590] was that getaddrinfo synchronize a mutex, so the goal was to avoid forking while it is held. And the issue it was trying to avoid is deadlocked child, not a crash in pthread.

I'm OK with moving the lock up so the pthread functions are covered, as I don't think it makes a meqaningful difference, but I think we should at least have a theory that would explain this crash.

The backtrace is at shop/issues-ruby-infrastructure#5 (comment).

For outsiders:

(gdb) bt #0 0x00007d12e809cb2c in pthread_kill () at /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007d12e804327e in raise () at /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007d12e86731c8 in ruby_default_signal (sig=<optimized out>) at signal.c:422 #3 0x00007d12e8434a00 in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x7d12e6f492c0, fmt=fmt@entry=0x7d12e88d5e6c "Segmentation fault at %p") at error.c:1134 #4 0x00007d12e8672446 in sigsegv (sig=11, info=0x7d12e6f493f0, ctx=0x7d12e6f492c0) at signal.c:933 #5 0x00007d12e8043330 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6 #6 0x00007d12e809bc90 in pthread_detach () at /lib/x86_64-linux-gnu/libc.so.6 #7 0x00007d12c5fb011f in rb_getnameinfo (sa=sa@entry=0x7d11ce7f1420, salen=16, host=host@entry=0x7fffdd5d8500 "\340\361-\326\021}", hostlen=hostlen@entry=1024, serv=serv@entry=0x7fffdd5d8900 "", servlen=servlen@entry=1024, flags=3) at raddrinfo.c:735 #8 0x00007d12c5fb0698 in addrinfo_getnameinfo (argc=argc@entry=1, argv=argv@entry=0x7fffdd5d8d40, self=self@entry=137513716759200) at raddrinfo.c:2399 #9 0x00007d12c5fb07a8 in addrinfo_ip_address (self=137513716759200) at raddrinfo.c:2457 #10 0x00007d12e8708ddf in vm_call_cfunc_with_frame_ (ec=0x7d12e786d550, reg_cfp=0x7d12e7983e38, calling=<optimized out>, argc=<optimized out>, argv=<optimized out>, stack_bottom=<optimized out>) at /tmp/ruby-build/ruby-3.4.4/vm_insnhelper.c:3794 #11 0x00007d12e871249c in vm_sendish (method_explorer=<optimized out>, block_handler=<optimized out>, cd=<optimized out>, reg_cfp=<optimized out>, ec=<optimized out>) at /tmp/ruby-build/ruby-3.4.4/vm_callinfo.h:415 #12 vm_exec_core (ec=0x7d1120a1f6c0, ec@entry=0x7d12e786d550) at /tmp/ruby-build/ruby-3.4.4/insns.def:898 #13 0x00007d12e87198b9 in rb_vm_exec (ec=0x7d12e786d550) at vm.c:2595

│ 711 int │ 712 rb_getnameinfo(const struct sockaddr *sa, socklen_t salen, │ 713 char *host, size_t hostlen, │ 714 char *serv, size_t servlen, int flags) │ 715 { │ 716 int retry; │ 717 struct getnameinfo_arg *arg; │ 718 int err, gni_errno = 0; │ 719 │ 720 start: │ 721 retry = 0; │ 722 │ 723 arg = allocate_getnameinfo_arg(sa, salen, hostlen, servlen, flags); │ 724 if (!arg) { │ 725 return EAI_MEMORY; │ 726 } │ 727 │ 728 pthread_t th; │ 729 if (raddrinfo_pthread_create(&th, do_getnameinfo, arg) != 0) { │ 730 int err = errno; │ 731 free_getnameinfo_arg(arg); │ 732 errno = err; │ 733 return EAI_SYSTEM; │ 734 } │ > 735 pthread_detach(th); │ 736

but I think we should at least have a theory that would explain this crash.

Looking at that backtrace, I don't quite see how fork could be the cause. Unless I'm misreading the code.

th is on the stack, so if somehow the GVL was released and another thread called fork, we wouldn't reach pthread_detach.

Unless it's pthread_create that has been broken by fork(2), and now in the child it's incapable of spawning threads?

I was thinking, a signal handler is invoked before pthread_detach(), it happens to start on the thread running getnameinfo, it also calls fork, and the child process (doesn't exit and) moves on to call pthread_detach().

I guess that was an unreasonable guess. But this is all I can come up with for fork.

Other than that, my guess is that a thread without GVL or a signal handler somehow discovers and detaches that thread. It doesn't seem like a good guess either, but otherwise pthread_detach() wouldn't crash.

Prevent fork from happening in getnameinfo

80dd749

k0kubun requested a review from byroot July 23, 2025 17:50

byroot reviewed Jul 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Prevent fork from happening in getnameinfo #13990

Prevent fork from happening in getnameinfo #13990

k0kubun commented Jul 23, 2025

Uh oh!

byroot Jul 23, 2025

Uh oh!

k0kubun Jul 23, 2025

Uh oh!

byroot Jul 24, 2025

Uh oh!

casperisfine Jul 24, 2025

Uh oh!

byroot Jul 24, 2025

Uh oh!

k0kubun Jul 24, 2025

Uh oh!

Uh oh!

	if (raddrinfo_pthread_create(&th, fork_safe_do_getaddrinfo, arg) != 0) {
	int err = errno;
	free_getaddrinfo_arg(arg);
	errno = err;
	return EAI_SYSTEM;
	}
	pthread_detach(th);

Prevent fork from happening in getnameinfo #13990

Are you sure you want to change the base?

Prevent fork from happening in getnameinfo #13990

Conversation

k0kubun commented Jul 23, 2025

Uh oh!

byroot Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

k0kubun Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

byroot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

casperisfine Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

byroot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

k0kubun Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!