DEV: Add `Migrations::SetStore` to work with nested sets of data #33593

gschlager · 2025-07-13T23:27:52Z

There are concrete implementations for a simple set, a key-value store, and nested sets with 2 or 3 keys. The API stays the same for all implementations, and the performance is more or less the same as without the wrapper (at least with YJIT enabled).

s3lase · 2025-07-15T15:55:47Z

migrations/lib/common/set_store/key_value_set.rb

+    end
+
+    def include?(key, value)
+      h = @store[key] or return false


This creates new entries for missing keys. I think `include?` shouldn't have such a side effect. Perhaps `@store.key?(key)` would be better here and in the other implementations?

gschlager · 2025-07-22

Thanks for catching that. It slipped in during a last-minute refactor.
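The side effect discussed in this thread can be reproduced in a few lines. The sketch below is illustrative only: both class names and the `key_count` helper are hypothetical and are not taken from the PR. It assumes a store backed by a `Hash` with a default proc, and compares a lookup through `[]` with one through `Hash#fetch`, which never invokes the default proc.

```ruby
require "set"

# Hypothetical stand-in for the reviewed class. The default proc mirrors the
# behavior described in the review comment, not the actual PR code.
class AutoVivifyingKeyValueSet
  def initialize
    @store = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def add(key, value)
    @store[key].add(value)
  end

  # Reading through `[]` triggers the default proc, which inserts an empty
  # Set for every key that is merely checked.
  def include?(key, value)
    @store[key].include?(value)
  end

  # Helper for this demonstration only.
  def key_count
    @store.size
  end
end

# Same store, but `include?` does not touch the default proc.
class SideEffectFreeKeyValueSet < AutoVivifyingKeyValueSet
  def include?(key, value)
    # `Hash#fetch` with a fallback value bypasses the default proc.
    set = @store.fetch(key, nil)
    set ? set.include?(value) : false
  end
end

leaky = AutoVivifyingKeyValueSet.new
leaky.include?(:missing, 1)
puts leaky.key_count # the lookup alone created an entry

clean = SideEffectFreeKeyValueSet.new
clean.include?(:missing, 1)
puts clean.key_count # the store is still empty
```

A `@store.key?(key)` guard, as suggested above, would have the same effect at the cost of a second hash lookup on the hit path.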

s3lase · 2025-07-15T18:48:34Z

migrations/lib/common/set_store/three_key_set.rb

+    end
+
+    def bulk_add(records)
+      records.each { |record| @store[record[0]][record[1]][record[2]].add(record[3]) }


Tiny nit: Since our source records are often hierarchical and can be sorted by key, caching intermediate hash lookups might speed things up. I saw a ~2.4x improvement in testing. The downside is that performance worsens with unsorted records. So if we optimize this, we’ll need to ensure `requires_set` queries are always ordered, or pay the cost of sorting here first, which might offset the gains.

gschlager · 2025-07-22

That's a good idea. I think we can require records to be sorted. And a quick benchmark of my implementation shows that even for unsorted data, the runtime stays roughly the same.
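One way to cache the intermediate lookups is to remember the previous record's keys and descend into the nested hashes again only when a key changes. The following is a minimal sketch of that idea, not the implementation from the PR. The class name `CachedThreeKeySet` and its internals are assumptions, and it uses a plain `Hash` with `||=` rather than a default proc.

```ruby
require "set"

# Hypothetical three-key store; names and layout are assumptions.
class CachedThreeKeySet
  def initialize
    @store = {}
  end

  # Each record is [key1, key2, key3, value]. With sorted input, most records
  # share their leading keys with the previous record, so the cached inner
  # hashes and Set are reused instead of being looked up again.
  def bulk_add(records)
    last_key1 = last_key2 = last_key3 = nil
    level2 = level3 = set = nil

    records.each do |key1, key2, key3, value|
      if set.nil? || key1 != last_key1
        # First record or new top-level key: redo all three lookups.
        level2 = (@store[key1] ||= {})
        level3 = (level2[key2] ||= {})
        set = (level3[key3] ||= Set.new)
      elsif key2 != last_key2
        level3 = (level2[key2] ||= {})
        set = (level3[key3] ||= Set.new)
      elsif key3 != last_key3
        set = (level3[key3] ||= Set.new)
      end

      last_key1, last_key2, last_key3 = key1, key2, key3
      set.add(value)
    end

    nil
  end

  def include?(key1, key2, key3, value)
    # `Hash#dig` returns nil as soon as an intermediate key is missing.
    set = @store.dig(key1, key2, key3)
    set ? set.include?(value) : false
  end
end

store = CachedThreeKeySet.new
store.bulk_add([[1, 10, 100, :a], [1, 10, 100, :b], [1, 20, 200, :c], [2, 10, 100, :d]])
puts store.include?(1, 10, 100, :b)
puts store.include?(2, 20, 200, :c)
```

Unsorted input still produces a correct store in this sketch; it only falls back to the full lookup path more often.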

s3lase · 2025-07-16T13:46:06Z

I've updated the requires_set implementation to use it #33626

gschlager · 2025-07-22T20:06:21Z

I pushed a commit with performance improvements:

* Replaced Hash default proc with `||=` for better performance and to prevent unintended key creation when using the `[]` operator. This change slightly reduces readability, but the performance gain justifies it.
* Optimized `bulk_add` by assuming input `records` are sorted by keys. Performance for unsorted input remains roughly unchanged.
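As a rough illustration of the first point, a two-key store built on a plain `Hash` might look like the sketch below. The class name `PlainHashTwoKeySet` and its method bodies are hypothetical and not copied from the commit. Because the hash has no default proc, the `... or return false` idiom from the reviewed diff becomes a pure read, and missing keys are created only in `add`.

```ruby
require "set"

# Hypothetical two-key store; an illustration of the `||=` approach only.
class PlainHashTwoKeySet
  def initialize
    @store = {} # no default proc, so `[]` returns nil for missing keys
  end

  def add(key1, key2, value)
    # Nested containers are created explicitly, and only when writing.
    inner = (@store[key1] ||= {})
    (inner[key2] ||= Set.new).add(value)
    self
  end

  def include?(key1, key2, value)
    inner = @store[key1] or return false
    set = inner[key2] or return false
    set.include?(value)
  end

  def empty?
    @store.empty?
  end
end

store = PlainHashTwoKeySet.new
store.include?(:user, :category, 42)
puts store.empty? # lookups leave the store untouched

store.add(:user, :category, 42)
puts store.include?(:user, :category, 42)
```

The readability cost mentioned above shows up in `add`, where the `||=` chain replaces a single chained `[]` expression.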

github-actions bot added the migrations-tooling label (PR which includes changes to migrations tooling) Jul 13, 2025

gschlager mentioned this pull request Jul 13, 2025

DEV: Add category_users converter and importer steps #33367

Merged

s3lase reviewed Jul 15, 2025

gschlager added 2 commits July 22, 2025 22:02

gschlager force-pushed the mt/set_store branch from 7273bea to ba6fc04 Compare July 22, 2025 20:02

gschlager requested a review from s3lase July 23, 2025 20:55

s3lase approved these changes Jul 24, 2025

gschlager merged commit 53d7a75 into main Jul 24, 2025
4 of 5 checks passed

gschlager deleted the mt/set_store branch July 24, 2025 10:11
