Skip to content

Conversation

@johnsheu
Copy link

@johnsheu johnsheu commented Jul 12, 2025

wait_for() and wait_until() for condition_variable and condition_variable_any take their wait duration or timeout parameters by const-reference. This can lead to unexpected behavior if the referent changes while the wait_for() or wait_until() blocks.

For condition_variable_any, this is an actual data race if the timeout parameter is protected by the lock, as the implementation accesses the timeout outside of the lock.

Fix this by saving a local copy of the duration or timeout while the condition variable is waiting.

Fixes #148328

wait_for() and wait_until() for condition_variable and
condition_variable_any take their wait duration or timeout parameters
by const-reference.  This can lead to unexpected behavior if the
referent changes while the wait_for() or wait_until() blocks.

For condition_variable_any, this is an actual data race if the timeout
parameter is protected by the lock, as the implementation accesses the
timeout outside of the lock.

Fix this by saving a local copy of the duration or timeout while the
condition variable is waiting.
@johnsheu johnsheu requested a review from a team as a code owner July 12, 2025 02:44
@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jul 12, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 12, 2025

@llvm/pr-subscribers-libcxx

Author: John Sheu (johnsheu)

Changes

wait_for() and wait_until() for condition_variable and condition_variable_any take their wait duration or timeout parameters by const-reference. This can lead to unexpected behavior if the referent changes while the wait_for() or wait_until() blocks.

For condition_variable_any, this is an actual data race if the timeout parameter is protected by the lock, as the implementation accesses the timeout outside of the lock.

Fix this by saving a local copy of the duration or timeout while the condition variable is waiting.


Full diff: https://github.com/llvm/llvm-project/pull/148330.diff

2 Files Affected:

  • (modified) libcxx/include/__condition_variable/condition_variable.h (+10-7)
  • (modified) libcxx/include/condition_variable (+6-3)
diff --git a/libcxx/include/__condition_variable/condition_variable.h b/libcxx/include/__condition_variable/condition_variable.h
index 1e8edd5dcb009..07f81c532127a 100644
--- a/libcxx/include/__condition_variable/condition_variable.h
+++ b/libcxx/include/__condition_variable/condition_variable.h
@@ -118,21 +118,23 @@ class _LIBCPP_EXPORTED_FROM_ABI condition_variable {
     using namespace chrono;
     using __clock_tp_ns = time_point<_Clock, nanoseconds>;
 
-    typename _Clock::time_point __now = _Clock::now();
+    const typename _Clock::time_point __now = _Clock::now();
     if (__t <= __now)
       return cv_status::timeout;
+    const typename _Clock::time_point __t_local = __t;
 
-    __clock_tp_ns __t_ns = __clock_tp_ns(std::__safe_nanosecond_cast(__t.time_since_epoch()));
+    __clock_tp_ns __t_ns = __clock_tp_ns(std::__safe_nanosecond_cast(__t_local.time_since_epoch()));
 
     __do_timed_wait(__lk, __t_ns);
-    return _Clock::now() < __t ? cv_status::no_timeout : cv_status::timeout;
+    return _Clock::now() < __t_local ? cv_status::no_timeout : cv_status::timeout;
   }
 
   template <class _Clock, class _Duration, class _Predicate>
   _LIBCPP_HIDE_FROM_ABI bool
   wait_until(unique_lock<mutex>& __lk, const chrono::time_point<_Clock, _Duration>& __t, _Predicate __pred) {
+    const typename _Clock::time_point __t_local = __t;
     while (!__pred()) {
-      if (wait_until(__lk, __t) == cv_status::timeout)
+      if (wait_until(__lk, __t_local) == cv_status::timeout)
         return __pred();
     }
     return true;
@@ -144,7 +146,8 @@ class _LIBCPP_EXPORTED_FROM_ABI condition_variable {
     if (__d <= __d.zero())
       return cv_status::timeout;
     using __ns_rep                   = nanoseconds::rep;
-    steady_clock::time_point __c_now = steady_clock::now();
+    const steady_clock::time_point __c_now  = steady_clock::now();
+    const duration<_Rep, _Period> __d_local = __d;
 
 #  if _LIBCPP_HAS_COND_CLOCKWAIT
     using __clock_tp_ns     = time_point<steady_clock, nanoseconds>;
@@ -154,7 +157,7 @@ class _LIBCPP_EXPORTED_FROM_ABI condition_variable {
     __ns_rep __now_count_ns = std::__safe_nanosecond_cast(system_clock::now().time_since_epoch()).count();
 #  endif
 
-    __ns_rep __d_ns_count = std::__safe_nanosecond_cast(__d).count();
+    __ns_rep __d_ns_count = std::__safe_nanosecond_cast(__d_local).count();
 
     if (__now_count_ns > numeric_limits<__ns_rep>::max() - __d_ns_count) {
       __do_timed_wait(__lk, __clock_tp_ns::max());
@@ -162,7 +165,7 @@ class _LIBCPP_EXPORTED_FROM_ABI condition_variable {
       __do_timed_wait(__lk, __clock_tp_ns(nanoseconds(__now_count_ns + __d_ns_count)));
     }
 
-    return steady_clock::now() - __c_now < __d ? cv_status::no_timeout : cv_status::timeout;
+    return steady_clock::now() - __c_now < __d_local ? cv_status::no_timeout : cv_status::timeout;
   }
 
   template <class _Rep, class _Period, class _Predicate>
diff --git a/libcxx/include/condition_variable b/libcxx/include/condition_variable
index 99c74b02807ae..57e995f5224b1 100644
--- a/libcxx/include/condition_variable
+++ b/libcxx/include/condition_variable
@@ -188,9 +188,10 @@ public:
   _LIBCPP_HIDE_FROM_ABI cv_status wait_until(_Lock& __lock, const chrono::time_point<_Clock, _Duration>& __t) {
     shared_ptr<mutex> __mut = __mut_;
     unique_lock<mutex> __lk(*__mut);
+    const chrono::time_point<_Clock, _Duration> __t_local = __t;
     __unlock_guard<_Lock> __unlock(__lock);
     lock_guard<unique_lock<mutex> > __lx(__lk, adopt_lock_t());
-    return __cv_.wait_until(__lk, __t);
+    return __cv_.wait_until(__lk, __t_local);
   } // __mut_.unlock(), __lock.lock()
 
   template <class _Lock, class _Clock, class _Duration, class _Predicate>
@@ -240,8 +241,9 @@ inline void condition_variable_any::wait(_Lock& __lock, _Predicate __pred) {
 template <class _Lock, class _Clock, class _Duration, class _Predicate>
 inline bool
 condition_variable_any::wait_until(_Lock& __lock, const chrono::time_point<_Clock, _Duration>& __t, _Predicate __pred) {
+  const chrono::time_point<_Clock, _Duration> __t_local = __t;
   while (!__pred())
-    if (wait_until(__lock, __t) == cv_status::timeout)
+    if (wait_until(__lock, __t_local) == cv_status::timeout)
       return __pred();
   return true;
 }
@@ -313,6 +315,7 @@ bool condition_variable_any::wait_until(
 
   shared_ptr<mutex> __mut = __mut_;
   stop_callback __cb(__stoken, [this] { notify_all(); });
+  const chrono::time_point<_Clock, _Duration> __abs_time_local = __abs_time;
 
   while (true) {
     if (__pred())
@@ -326,7 +329,7 @@ bool condition_variable_any::wait_until(
     unique_lock<mutex> __internal_lock2(
         std::move(__internal_lock)); // switch unlock order between __internal_lock and __user_lock
 
-    if (__cv_.wait_until(__internal_lock2, __abs_time) == cv_status::timeout)
+    if (__cv_.wait_until(__internal_lock2, __abs_time_local) == cv_status::timeout)
       break;
   } // __internal_lock2.unlock(), __user_lock.lock()
   return __pred();

@zwuis
Copy link
Contributor

zwuis commented Jul 12, 2025

Can you explain why programmers using libc++ don't need to save a local copy to avoid data race, but libc++ should?

@philnik777
Copy link
Contributor

This looks to me like it's just obfuscating a data race.

@johnsheu
Copy link
Author

Consider this program.

  1. Is it well-formed?
  2. What should it output?
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <latch>
#include <mutex>
#include <thread>

int main(int argc, char** argv) {
  std::mutex mutex;
  std::condition_variable cv;
  std::latch latch(1);
  auto timeout = std::chrono::steady_clock::time_point::max();

  std::thread thread1([&](){
    std::unique_lock lock(mutex);
    latch.count_down();

    const auto status = cv.wait_until(lock, timeout);
    switch (status) {
      case std::cv_status::no_timeout:
        std::cout << "no_timeout" << std::endl;
        break;
      case std::cv_status::timeout:
        std::cout << "timeout" << std::endl;
        break;
    }
  });

  std::thread thread2([&](){
    latch.wait();
    std::unique_lock lock(mutex);

    cv.notify_one();
    timeout = std::chrono::steady_clock::time_point::min();
  });

  thread1.join();
  thread2.join();
  return 0;
}

Copy link
Member

@huixie90 huixie90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think Libc++ is the right place for this kind of things. If you believe there needs to be copies, you should ask SG1 or file LWG issue for making it pass by value.

@johnsheu
Copy link
Author

johnsheu commented Jul 12, 2025

To answer (2) above first:

sheu@sheu-mac ~ % clang++ -o cv_wait_until cv_wait_until.cc  -std=c++20 && ./cv_wait_until
timeout

Even though timeout is time_point::max() at the point that cv.wait_until() is invoked (which should cause no timeout on wakeup), the later change to time_point::min() causes the wait to report that the timeout occurred. Note that from the point of view of the programmer, timeout is itself properly locked.

To answer (1), the program is well-formed in the case that std::condition_variable is used, it just has unexpected behavior. However, in the case that std::condition_variable_any is used instead, the program is ill-formed, as the implementation actually unlocks the lock and then accesses timeout, causing a data race. This could be considered an actual bug in libc++ implementation.

@johnsheu
Copy link
Author

The relevant lines in the std::condition_variable_any implementation are highlighted here:

  template <class _Lock, class _Clock, class _Duration>
  _LIBCPP_HIDE_FROM_ABI cv_status wait_until(_Lock& __lock, const chrono::time_point<_Clock, _Duration>& __t) {
    shared_ptr<mutex> __mut = __mut_;
    unique_lock<mutex> __lk(*__mut);

    // User-supplied lock is unlocked here
    __unlock_guard<_Lock> __unlock(__lock);
    lock_guard<unique_lock<mutex> > __lx(__lk, adopt_lock_t());

    // User-supplied reference parameter is used here; bad news if it was protected by the above lock
    return __cv_.wait_until(__lk, __t);
  } // __mut_.unlock(), __lock.lock()

@ldionne
Copy link
Member

ldionne commented Jul 14, 2025

I tend to agree with other reviewers -- the API is specified to take a const&, so you're not allowed to assume that we make a copy locally. At the same time, I'm kind of conflicted because the user code does appear to be correct: accesses to timeout are guarded by a lock as one would expect.

Can you file a LWG issue for this? I'd be OK with moving ahead with a fix if WG21 agrees that this is not a wording bug, and that implementations are expected to make this work.

@johnsheu
Copy link
Author

Can you file a LWG issue for this? I'd be OK with moving ahead with a fix if WG21 agrees that this is not a wording bug, and that implementations are expected to make this work.

Sure, I think that's a reasonable route for the case with std::condition_variable. In this case, the implementation still accesses timeout under lock, so while it internally unlocks/locks and thus timeout can change between the beginning and the end of the call, it's still accessed under the lock and thus protected.. The behavior is surprising, but not "incorrect".

I think the situation with std::condition_variable_any is arguably "actually incorrect", since it accesses timeout while not under lock. I'd just like to make sure that the reviewers here are also considering this situation.

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a discussion with @huixie90 about this. We both agree that the spec should explicitly take these time_points by value instead of by reference. Hui will file a LWG issue about it. I would like to do this and consider it a DR -- it's not an ABI break for us and it seems like the simplest and most self-documenting way to handle this. CC @jwakely

Let's wait until we have a LWG issue and see whether there's appetite for moving to by-value.

_LIBCPP_HIDE_FROM_ABI cv_status wait_until(_Lock& __lock, const chrono::time_point<_Clock, _Duration>& __t) {
shared_ptr<mutex> __mut = __mut_;
unique_lock<mutex> __lk(*__mut);
const chrono::time_point<_Clock, _Duration> __t_local = __t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a bug right now. I wonder if maybe it would be conforming to change wait_until to take by value today, without waiting for the Standard to say so. I think it might be conforming, assuming there is no way for that change to be observable (and I don't see how taking by value instead of const& is observable).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "observable" here? The specification is for the call to wait_until() to take the timeout by-const-ref, so taking it by-value would be nonconforming. Note though that since time_point is also specified to have a copy constructor, changing the specification take it directly by-value would not break existing code.

If you mean taking the timeout by-const-ref and then making a value copy internally: I think that would be conforming, and would be unconditionally possible due to the time_point copy constructor. However, it would make an observable difference from current behavior:

  • Current behavior: reducing timeout while call is blocked may result in "spurious" timeout
  • New behavior: reducing timeout while call is blocked will not change timeout; value at the time the call was made will be used

As far as I can tell, both behaviors are technically permissible under the standard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "observable" here? The specification is for the call to wait_until() to take the timeout by-const-ref, so taking it by-value would be nonconforming.

Why? How do you observe the difference? "non-conforming" means that you somehow don't follow the standard. https://eel.is/c++draft/intro.abstract#1 gives us explicit permission to change things as long as they are not observable by users (aka the as-if rule).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "observable" here? The specification is for the call to wait_until() to take the timeout by-const-ref, so taking it by-value would be nonconforming.

Why? How do you observe the difference? "non-conforming" means that you somehow don't follow the standard. https://eel.is/c++draft/intro.abstract#1 gives us explicit permission to change things as long as they are not observable by users (aka the as-if rule).

Again, which proposed change are you referring to? The change to the specification to wait_until() to take the parameter by-value, or the change that internally makes a copy and leaves the external specification unchanged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A program-defined rep type can observe a copy, but it can't tell whether it happens when initializing the function argument or inside the function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the parameter by-value is directly nonconforming, as the standard calls for the parameter to be taken by-const-ref. This is a change that we cannot do without an amendment to the standard.

As @jwakely points out, taking the parameter by-const-ref and then making a copy internally, in addition to having the observable behavior of different timeout handling, also has a user-observable additional copy.

Copy link
Contributor

@jwakely jwakely Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those functions already need to be able to make new values, e.g. after converting from a timeout with nanosecond resolution to microsecond resolution, or some other resolution that matches the clock being used. My point is that if they're allowed to convert to another resolution, they can also copy it, and if they can copy it in the function body, they can do it on entry to initialize a parameter.

I think changing the signature is conforming.

You keep insisting it's not, but you seem to just be basing that on "but the standard says it's a reference" and the point is that you can't tell the difference, because copies are already permitted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep insisting it's not, but you seem to just be basing that on "but the standard says it's a reference" and the point is that you can't tell the difference, because copies are already permitted.

You're correct, my argument is purely that "the standard says it's a reference", and I assume that to be conforming it must be exactly so. Is there wording in the standard itself that allows this kind of variance? If that's the case I'd be happy to update pull request to take the parameter by-value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, my argument is purely that "the standard says it's a reference", and I assume that to be conforming it must be exactly so. Is there wording in the standard itself that allows this kind of variance? If that's the case I'd be happy to update pull request to take the parameter by-value.

Yes. As I already said, https://eel.is/c++draft/intro.abstract#1 allows for this. As long as there is no way for the user to observe the difference, it's OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also depends on the rule about addressable functions, so that users can't rely on the type of function pointer if they take its address

@johnsheu
Copy link
Author

I just had a discussion with @huixie90 about this. We both agree that the spec should explicitly take these time_points by value instead of by reference. Hui will file a LWG issue about it. I would like to do this and consider it a DR -- it's not an ABI break for us and it seems like the simplest and most self-documenting way to handle this. CC @jwakely

Let's wait until we have a LWG issue and see whether there's appetite for moving to by-value.

Thank you for the update! I agree also with your conclusions, obviously!

I'm unfamiliar with the LWG working process and would greatly appreciate it if you could keep me in the loop as a learning experience. Thank you!

@huixie90
Copy link
Member

I have filed an LWG issue here
https://cplusplus.github.io/LWG/issue4301

Let’s see the response from the reflector first

@huixie90
Copy link
Member

Hi @johnsheu , just FYI this LWG issue is Tentatively Ready and it will be voted in two weeks time in Kona

@frederick-vs-ja
Copy link
Contributor

This PR should also fix #171341. @johnsheu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

std::condition_variable and std::condition_variable_any

8 participants