Performance regression in RubyHash for large hashes (~8-10x slower than JRuby 1.7) #9113

@tbsvttr

Description

Summary

JRuby 10 shows significant performance regressions compared to JRuby 1.7 when working with large hashes (>100k entries). The regression primarily affects integer keys and to a lesser extent dynamic strings. Frozen string keys are NOT affected and actually perform better on JRuby 10.

Environment

  • JRuby 10: 10.0.2.0 (Ruby 3.4.2) + Java Temurin 21.0.9 LTS
  • JRuby 1.7: 1.7.27 (Ruby 1.9.3) + Java Corretto 8
  • MRI: 3.4.7 (baseline)
  • Platform: macOS arm64 (Darwin 25.1.0)

Regression by Key Type

| Key Type | MRI 3.4 | JRuby 1.7 | JRuby 10 | JRuby 10 vs 1.7 |
| --- | --- | --- | --- | --- |
| integer | 0.04s | 0.03s | 0.27s | ~8-10x slower |
| dynamic string | 0.20s | 0.58s | 0.71s | ~1.2x slower |
| symbol | 0.53s | 0.83s | 0.89s | ~1.1x slower |
| frozen string | 0.06s | 0.42s | 0.34s | 1.25x faster |
| Java HashMap (int) | - | 0.05s | 0.02s | fast ✓ |

Hash#keys Regression (scales with size, integer keys)

```ruby
hash = Hash[(1..n).map { |i| [i, i] }]
100.times { hash.keys }
```

| Entries | MRI 3.4 | JRuby 1.7 | JRuby 10 | JRuby 10 vs 1.7 |
| --- | --- | --- | --- | --- |
| 1k | 0.01ms | 0.08ms | 0.09ms | ~1x |
| 10k | 0.01ms | 0.12ms | 0.14ms | ~1.1x |
| 100k | 0.13ms | 0.55ms | 1.24ms | 2.3x |
| 500k | 0.34ms | 5.0ms | 31.5ms | 6.3x |
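Assuming one wants to reproduce the scaling behaviour directly, the snippet above can be expanded into a self-contained harness (sizes and iteration count match the table):

```ruby
require 'benchmark'

# Build an integer-keyed hash at each size and time 100 Hash#keys
# calls; on JRuby 10 the per-call cost should grow superlinearly
# with the hash size, per the table above.
[1_000, 10_000, 100_000, 500_000].each do |n|
  hash = Hash[(1..n).map { |i| [i, i] }]
  total = Benchmark.realtime { 100.times { hash.keys } }
  puts format('%7d entries: %.2fms per Hash#keys call', n, total * 1000 / 100)
end
```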

Key Observations

  1. Frozen string keys are NOT affected - actually 1.25x faster on JRuby 10
  2. Integer keys most affected - ~8-10x slower than JRuby 1.7
  3. Java HashMap is fast (~0.02s for 500k int puts) - JVM is not the issue
  4. Small hashes unaffected - regression only appears with large hashes (>100k)
  5. Regression scales with size - worse for larger hashes
  6. Both read and write affected - Hash#keys, Hash#values, Hash#[]=
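Observation 6 (both read and write paths affected) can be checked in one combined run; the 500k size and the three operations are taken from the measurements above:

```ruby
require 'benchmark'

n = 500_000
hash = {}
# Write path: Hash#[]= with integer keys.
write  = Benchmark.realtime { n.times { |i| hash[i] = i } }
# Read paths: Hash#keys and Hash#values over the full table.
keys   = Benchmark.realtime { 10.times { hash.keys } }
values = Benchmark.realtime { 10.times { hash.values } }
puts format('[]=: %.3fs  keys(x10): %.3fs  values(x10): %.3fs',
            write, keys, values)
```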

Minimal Reproducers

Integer key regression:

```ruby
require 'benchmark'

hash = {}
time = Benchmark.realtime { 500_000.times { |i| hash[i] = i } }
puts "Ruby Hash: #{time.round(4)}s"

if RUBY_ENGINE == 'jruby'
  map = java.util.HashMap.new
  time = Benchmark.realtime { 500_000.times { |i| map.put(i, i) } }
  puts "Java HashMap: #{time.round(4)}s"
end
```

Frozen string keys (NOT affected):

```ruby
require 'benchmark'

keys = (0...500_000).map { |i| "key#{i}".freeze }
hash = {}
time = Benchmark.realtime { keys.each_with_index { |k, i| hash[k] = i } }
puts "Frozen string keys: #{time.round(4)}s"
```
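For completeness, the "dynamic string" row from the table (a fresh, non-frozen string allocated per insertion) can be reproduced the same way:

```ruby
require 'benchmark'

hash = {}
# Each key is a freshly allocated (non-frozen) string, matching the
# "dynamic string" row in the regression table.
time = Benchmark.realtime { 500_000.times { |i| hash["key#{i}"] = i } }
puts "Dynamic string keys: #{time.round(4)}s"
```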

Hypothesis

The regression appears specific to integer key handling in RubyHash. The fact that:

  • Java HashMap with integer keys is fast
  • Frozen string keys perform well
  • The regression scales with hash size

...suggests the issue may be in how RubyHash handles Fixnum/Integer hash codes or bucket distribution when growing large hashes.
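One rough way to probe the bucket-distribution half of this hypothesis from plain Ruby is to check how Integer#hash values spread across a power-of-two table. This is only a proxy: JRuby's actual bucket computation is an implementation detail and may differ.

```ruby
# Count collisions of Integer#hash values masked to a power-of-two
# table size. A heavily skewed distribution would support the
# bucket-distribution hypothesis; an even spread would point at
# per-entry hashing cost instead. Proxy only: JRuby's real bucketing
# is internal and may differ from this mask.
table_size = 1 << 20
buckets = Hash.new(0)
500_000.times { |i| buckets[i.hash & (table_size - 1)] += 1 }
puts "occupied buckets: #{buckets.size} of #{table_size}, " \
     "longest chain: #{buckets.values.max}"
```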

Impact

Affects code using large hashes with integer keys (lookup tables, caches, ID-based indexing). Code using frozen string keys (a common pattern with `frozen_string_literal: true`) is NOT affected.
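As a possible mitigation (an assumption based on the Java HashMap numbers above, not a verified fix), JRuby code with large integer-keyed tables could fall back to java.util.HashMap:

```ruby
# Mitigation sketch, NOT a verified fix: on JRuby, back large
# integer-keyed tables with java.util.HashMap, which the benchmarks
# above show is unaffected. Plain Hash is used elsewhere.
if RUBY_ENGINE == 'jruby'
  store = java.util.HashMap.new
  500_000.times { |i| store.put(i, i) }
  puts store.get(123_456)
else
  store = {}
  500_000.times { |i| store[i] = i }
  puts store[123_456]
end
```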
