[8.0] Update CI OSes #115502
base: release/8.0-staging
Pull Request Overview
This PR updates the CI pipeline configuration for Helix queues by modifying the OS image definitions used in various Linux job conditions.
- Added new entries for AzureLinux.3.0.Amd64.Open
- Adjusted OS image selections for different conditional branches
- Reordered some entries, including reintroducing the Centos.9.Amd64.Open image in one branch
Comments suppressed due to low confidence (2)
eng/pipelines/libraries/helix-queues-setup.yml:62
- The OS image tag in this line uses 'open' in lowercase, while other similar entries use 'Open'. Standardize the casing to ensure consistency.
- (AzureLinux.3.0.Amd64.Open)Ubuntu.2204.Amd64.open@mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64
eng/pipelines/libraries/helix-queues-setup.yml:71
- The casing of 'open' in the OS image tag does not match the uppercase pattern seen in other entries. Consider using 'Open' to maintain consistency.
- (AzureLinux.3.0.Amd64.Open)Ubuntu.2204.Amd64.open@mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-helix-amd64
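The casing observation in the two suppressed comments can be checked mechanically. The following is a minimal Python sketch, not part of the PR. It assumes each entry has the shape `(DisplayName)HostQueue@container-image`, inferred only from the two lines quoted above; `find_casing_mismatches` is a hypothetical helper name. The thread does not say whether the queue name's casing matters to Helix, so this only flags the inconsistency.

```python
import re

# Assumed entry shape, inferred from the two lines quoted above:
#   (DisplayName)HostQueue@container-image
ENTRY_RE = re.compile(r"^\((?P<display>[^)]+)\)(?P<queue>[^@]+)@(?P<image>.+)$")


def find_casing_mismatches(entries):
    """Return entries whose host queue ends in 'open' spelled other than 'Open'."""
    mismatches = []
    for entry in entries:
        # Tolerate a leading YAML list marker ("- ") and surrounding whitespace.
        text = entry.strip().lstrip("- ")
        match = ENTRY_RE.match(text)
        if match is None:
            continue  # not a container-on-queue entry; ignored in this sketch
        suffix = match.group("queue").rsplit(".", 1)[-1]
        if suffix.lower() == "open" and suffix != "Open":
            mismatches.append(entry)
    return mismatches
```

Passing the lines of the queue definition file to this function would return only the entries whose host queue uses a non-standard spelling of the `Open` suffix.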
/azp run runtime-libraries-coreclr outerloop-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-libraries-coreclr outerloop-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
These failures all look pre-existing. |
/azp run runtime-libraries-coreclr outerloop-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
Azure Pipelines successfully started running 1 pipeline(s). |
Linux-x64 extra platforms has this failure. It doesn't seem to be in the rolling build. I'll run it again to see if it is just flakiness, since the error is a timeout. Everything else looks like failures in other branches or failures that appear across multiple OSes. Note: Azure Linux 3 is/was tested in extra platforms before and after this PR. However, it transitioned from being container-based to VM-based (which we also did in
Console log: 'System.Net.Sockets.Tests' from job d391c3db-4624-42f9-ad04-d66836807c35 workitem e364940e-0f90-4302-82c3-e866a22447ee (azurelinux.3.amd64.open.svc) executed on machine a0004S4 running Linux-6.6.92.2-1.azl3-x86_64-with-glibc2.38
+ ./RunTests.sh --runtime-path /datadisks/disk1/work/A9B908EB/p
----- start Tue Jul 1 09:09:24 PM UTC 2025 =============== To repro directly: =====================================================
pushd .
/datadisks/disk1/work/A9B908EB/p/dotnet exec --runtimeconfig System.Net.Sockets.Tests.runtimeconfig.json --depsfile System.Net.Sockets.Tests.deps.json xunit.console.dll System.Net.Sockets.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing
popd
===========================================================================================================
/datadisks/disk1/work/A9B908EB/w/9D8108B8/e /datadisks/disk1/work/A9B908EB/w/9D8108B8/e
Discovering: System.Net.Sockets.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Net.Sockets.Tests (found 1447 of 1812 test cases)
Starting: System.Net.Sockets.Tests (parallel test collections = on, max threads = 2)
System.Net.Sockets.Tests.CreateSocket.Ctor_Raw_Supported_Success [SKIP]
Condition(s) not met: "SupportsRawSockets"
System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_AnyInterface_Succeeds [FAIL]
System.TimeoutException : The operation has timed out.
Stack Trace:
/_/src/libraries/System.Net.Sockets/tests/FunctionalTests/SocketOptionNameTest.cs(106,0): at System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_Helper(Int32 interfaceIndex)
/_/src/libraries/System.Net.Sockets/tests/FunctionalTests/SocketOptionNameTest.cs(72,0): at System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_AnyInterface_Succeeds()
--- End of stack trace from previous location ---
System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_IPv6_AnyInterface_Succeeds [FAIL]
System.TimeoutException : The operation has timed out.
Stack Trace:
/_/src/libraries/System.Net.Sockets/tests/FunctionalTests/SocketOptionNameTest.cs(213,0): at System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_IPv6_Helper(Int32 interfaceIndex)
/_/src/libraries/System.Net.Sockets/tests/FunctionalTests/SocketOptionNameTest.cs(136,0): at System.Net.Sockets.Tests.SocketOptionNameTest.MulticastInterface_Set_IPv6_AnyInterface_Succeeds()
--- End of stack trace from previous location ---
Finished: System.Net.Sockets.Tests
=== TEST EXECUTION SUMMARY ===
System.Net.Sockets.Tests Total: 2302, Errors: 0, Failed: 2, Skipped: 1, Time: 63.735s
@dotnet/ncl |
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
Same issue is there on re-run. https://github.com/dotnet/runtime/pull/115502/checks?check_run_id=45177352432 |
It may be OS configuration. Multicast is not that common. We could investigate the environment differences, make the test conditional, or both. |
Except it seems to be passing on Azure Linux 3 already. This is from yesterday's rolling run, including Azure Linux 3 (just in container not VM). |
Right. But there is a single kernel where routing and configuration happen, and it differs between Docker and a VM. E.g. AZLinux in Docker does not use the AZLinux kernel, and the configuration is likely different too. I don't think that would be difficult to fix, but fundamentally they are two different environments. |
Got it. Yes of course. So, this is the first time these tests are seeing the AL3 kernel, which is indeed the goal. I just double checked ... all the containers are (prior to this change) using the Ubuntu 22.04 VM/kernel. |
I am not seeing this failure in I made this change to double-validate: #117439 |
Scratch that. I can see this in This suggests to me (A) that we should change It seems like we're seeing a difference in behavior for this test when running in a VM rather than a container. It seems like the failure is in the former case and not the latter (even with the VM being the host for the container). That doesn't make obvious sense, so there must be something additional at play. |
it looks like azurelinux.3 images have firewall rules to filter traffic.
When I disable them, all the networking tests pass
|
We can try to detect that, but it may be tricky. It seems like disabling the firewall may be the best option. It would probably take until next week to get the updates out. Any thoughts on this @dotnet/ncl ? |
Thanks for investigating!
Per: #115415 (comment) We cannot change the IP tables settings. We need to make some change to the tests. We can set an ENV in the Azure Linux helix images to signal a generic test configuration (not When we make the change, let's make it in |
I'm not sure that comment is applicable. We can disable the tests, but I see no benefit for customers. I personally see more value in making sure multicast works when the system allows it. The other way is not interesting IMHO, e.g. testing that it breaks when blocked by a firewall. |
I think it applies the same. It is possible if we disable the firewall that some tests (at some point) will "false positive pass" in that configuration. It plays both ways. My take is that we test in the default configuration. If we get sufficient signal, we can fund a second Azure Linux configuration. This feature is still being tested, generally. We can accept that we are not getting coverage for this feature on Azure Linux (in absence of more user signal). |
We want our tests to be passing on the default OS configuration (for supported OSes at least). If the test is not compatible with a given OS's default configuration, it should be disabled. It can be disabled by either detecting the incompatible configuration (preferred - example of prior art There are a number of options for non-default configurations with different cost/benefit tradeoffs. In this case, I think it is fine to depend on indirect coverage via other distros. |
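The two options raised in this thread, detecting the incompatible configuration and signaling it through an environment variable set in the Helix image, could be combined roughly as below. This is an illustrative Python sketch, not the change that was merged and not the repo's C# test code. The variable name `EXAMPLE_HELIX_MULTICAST_BLOCKED` and the function `should_skip_multicast_tests` are hypothetical; the thread does not name either. The `probe` callable stands in for whatever detection the test suite would actually use.

```python
import os

# Hypothetical variable name; the thread does not say what the ENV would be called.
FIREWALL_ENV_VAR = "EXAMPLE_HELIX_MULTICAST_BLOCKED"


def should_skip_multicast_tests(env=None, probe=None):
    """Decide whether multicast tests should be skipped in this environment.

    probe: optional callable returning True when a multicast datagram sent on
           this machine is received back (direct detection, the preferred option).
    env:   mapping consulted for the fallback signal; defaults to os.environ.
    """
    if probe is not None:
        # Direct detection wins over any environment hint.
        return not probe()
    env = os.environ if env is None else env
    return env.get(FIREWALL_ENV_VAR, "").strip().lower() in {"1", "true"}
```

Injecting the probe keeps the decision logic separate from the networking code, so the skip condition can be exercised without sending any traffic.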
@wfurt @dotnet/ncl Ping. Can we make resolving this issue, in |
BTW there are other modifications in the Helix code, like increasing the number of file descriptors. Should we also change that back to the OS default? |
Thanks for asking. No. Slightly hypocritical, but we need that for our tests to run at all. Context: dotnet/dnceng#5728 (comment) I am hoping that AL4 matches the other distros. |
@richlander #117694 is merged in main. Lmk if you want this in 9.0 and 8.0 and I'll put up backports. |