Performance regression in RubyHash for large hashes (~8-10x slower than JRuby 1.7) #9113

@tbsvttr

Description

Summary

JRuby 10 shows significant performance regressions compared to JRuby 1.7 when working with large hashes (>100k entries). The regression primarily affects integer keys and to a lesser extent dynamic strings. Frozen string keys are NOT affected and actually perform better on JRuby 10.

Environment

  • JRuby 10: 10.0.2.0 (Ruby 3.4.2) + Java Temurin 21.0.9 LTS
  • JRuby 1.7: 1.7.27 (Ruby 1.9.3) + Java Corretto 8
  • MRI: 3.4.7 (baseline)
  • Platform: macOS arm64 (Darwin 25.1.0)

Regression by Key Type

| Key Type | MRI 3.4 | JRuby 1.7 | JRuby 10 | JRuby 10 vs 1.7 |
| --- | --- | --- | --- | --- |
| integer | 0.04s | 0.03s | 0.27s | ~8-10x slower |
| dynamic string | 0.20s | 0.58s | 0.71s | ~1.2x slower |
| symbol | 0.53s | 0.83s | 0.89s | ~1.1x slower |
| frozen string | 0.06s | 0.42s | 0.34s | 1.25x faster |
| Java HashMap (int) | - | 0.05s | 0.02s | fast ✓ |

Hash#keys Regression (scales with size, integer keys)

```ruby
hash = Hash[(1..n).map { |i| [i, i] }]
100.times { hash.keys }
```

| Entries | MRI 3.4 | JRuby 1.7 | JRuby 10 | JRuby 10 vs 1.7 |
| --- | --- | --- | --- | --- |
| 1k | 0.01ms | 0.08ms | 0.09ms | ~1x |
| 10k | 0.01ms | 0.12ms | 0.14ms | ~1.1x |
| 100k | 0.13ms | 0.55ms | 1.24ms | 2.3x |
| 500k | 0.34ms | 5.0ms | 31.5ms | 6.3x |
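Assuming one wants to reproduce the scaling behaviour directly, the snippet above can be expanded into a self-contained harness (sizes and iteration count match the table):

```ruby
require 'benchmark'

# Build an integer-keyed hash at each size and time 100 Hash#keys
# calls; on JRuby 10 the per-call cost should grow superlinearly
# with the hash size, per the table above.
[1_000, 10_000, 100_000, 500_000].each do |n|
  hash = Hash[(1..n).map { |i| [i, i] }]
  total = Benchmark.realtime { 100.times { hash.keys } }
  puts format('%7d entries: %.2fms per Hash#keys call', n, total * 1000 / 100)
end
```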

Key Observations

  1. Frozen string keys are NOT affected - actually 1.25x faster on JRuby 10
  2. Integer keys most affected - ~8-10x slower than JRuby 1.7
  3. Java HashMap is fast (~0.02s for 500k int puts) - JVM is not the issue
  4. Small hashes unaffected - regression only appears with large hashes (>100k)
  5. Regression scales with size - worse for larger hashes
  6. Both read and write affected - Hash#keys, Hash#values, Hash#[]=
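Observation 6 (both read and write paths affected) can be checked in one combined run; the 500k size and the three operations are taken from the measurements above:

```ruby
require 'benchmark'

n = 500_000
hash = {}
# Write path: Hash#[]= with integer keys.
write  = Benchmark.realtime { n.times { |i| hash[i] = i } }
# Read paths: Hash#keys and Hash#values over the full table.
keys   = Benchmark.realtime { 10.times { hash.keys } }
values = Benchmark.realtime { 10.times { hash.values } }
puts format('[]=: %.3fs  keys(x10): %.3fs  values(x10): %.3fs',
            write, keys, values)
```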

Minimal Reproducers

Integer key regression:

```ruby
require 'benchmark'

hash = {}
time = Benchmark.realtime { 500_000.times { |i| hash[i] = i } }
puts "Ruby Hash: #{time.round(4)}s"

if RUBY_ENGINE == 'jruby'
  map = java.util.HashMap.new
  time = Benchmark.realtime { 500_000.times { |i| map.put(i, i) } }
  puts "Java HashMap: #{time.round(4)}s"
end
```

Frozen string keys (NOT affected):

```ruby
require 'benchmark'

keys = (0...500_000).map { |i| "key#{i}".freeze }
hash = {}
time = Benchmark.realtime { keys.each_with_index { |k, i| hash[k] = i } }
puts "Frozen string keys: #{time.round(4)}s"
```
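For completeness, the "dynamic string" row from the table (a fresh, non-frozen string allocated per insertion) can be reproduced the same way:

```ruby
require 'benchmark'

hash = {}
# Each key is a freshly allocated (non-frozen) string, matching the
# "dynamic string" row in the regression table.
time = Benchmark.realtime { 500_000.times { |i| hash["key#{i}"] = i } }
puts "Dynamic string keys: #{time.round(4)}s"
```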

Hypothesis

The regression appears specific to integer key handling in RubyHash. The fact that:

  • Java HashMap with integer keys is fast
  • Frozen string keys perform well
  • The regression scales with hash size

...suggests the issue may be in how RubyHash handles Fixnum/Integer hash codes or bucket distribution when growing large hashes.
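One rough way to probe the bucket-distribution half of this hypothesis from plain Ruby is to check how Integer#hash values spread across a power-of-two table. This is only a proxy: JRuby's actual bucket computation is an implementation detail and may differ.

```ruby
# Count collisions of Integer#hash values masked to a power-of-two
# table size. A heavily skewed distribution would support the
# bucket-distribution hypothesis; an even spread would point at
# per-entry hashing cost instead. Proxy only: JRuby's real bucketing
# is internal and may differ from this mask.
table_size = 1 << 20
buckets = Hash.new(0)
500_000.times { |i| buckets[i.hash & (table_size - 1)] += 1 }
puts "occupied buckets: #{buckets.size} of #{table_size}, " \
     "longest chain: #{buckets.values.max}"
```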

Impact

Affects code using large hashes with integer keys (lookup tables, caches, ID-based indexing). Code using frozen string keys (a common pattern with `frozen_string_literal: true`) is NOT affected.
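As a possible mitigation (an assumption based on the Java HashMap numbers above, not a verified fix), JRuby code with large integer-keyed tables could fall back to java.util.HashMap:

```ruby
# Mitigation sketch, NOT a verified fix: on JRuby, back large
# integer-keyed tables with java.util.HashMap, which the benchmarks
# above show is unaffected. Plain Hash is used elsewhere.
if RUBY_ENGINE == 'jruby'
  store = java.util.HashMap.new
  500_000.times { |i| store.put(i, i) }
  puts store.get(123_456)
else
  store = {}
  500_000.times { |i| store[i] = i }
  puts store[123_456]
end
```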
