Skip to content

DEV: Add Migrations::SetStore to work with nested sets of data #33593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 24, 2025

Conversation

gschlager
Copy link
Member

There are concrete implementations for a simple set, a key-value store, and nested sets with 2 or 3 keys. The API stays the same for all implementations and the performances is more or less the same as without the wrapper (at least with YJIT enabled).

@github-actions github-actions bot added the migrations-tooling PR which includes changes to migrations tooling label Jul 13, 2025
end

def include?(key, value)
h = @store[key] or return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause new entries to be created for missing keys. I think include? shouldn't have such a side effect. Perhaps a @store.key?(key) might be better here and the other implementations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. That snuck in in a last-minute refactor.

end

def bulk_add(records)
records.each { |record| @store[record[0]][record[1]][record[2]].add(record[3]) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: Since our source records are often hierarchical and can be sorted by key, caching intermediate hash lookups might speed things up. I saw a ~2.4x improvement in testing. The downside is that performance worsens with unsorted records. So if we optimize this, we’ll need to ensure requires_set queries are always ordered, or pay the cost of sorting here first, which might offset the gains

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I think we can require records to be sorted. And a quick benchmark of my implementation shows that even for unsorted data, the runtime stays roughly the same.

@s3lase
Copy link
Contributor

s3lase commented Jul 16, 2025

I've updated the requires_set implementation to use it #33626

There are concrete implementations for a simple set, a key-value store, and nested sets with 2 or 3 keys. The API stays the same for all implementations and the performances is more or less the same as without the wrapper (at least with YJIT enabled).
* Replaced Hash default proc with `||=` for better performance and to prevent unintended key creation when using the `[]` operator. This change slightly reduces readability, but the performance gain justifies it.

* Optimized `bulk_add` by assuming input `records` are sorted by keys. Performance for unsorted input remains roughly unchanged.
@gschlager
Copy link
Member Author

I pushed a commit with performance improvements:

  • Replaced Hash default proc with ||= for better performance and to prevent unintended key creation when using the [] operator. This change slightly reduces readability, but the performance gain justifies it.

  • Optimized bulk_add by assuming input records are sorted by keys. Performance for unsorted input remains roughly unchanged.

@gschlager gschlager requested a review from s3lase July 23, 2025 20:55
@gschlager gschlager merged commit 53d7a75 into main Jul 24, 2025
4 of 5 checks passed
@gschlager gschlager deleted the mt/set_store branch July 24, 2025 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migrations-tooling PR which includes changes to migrations tooling
Development

Successfully merging this pull request may close these issues.

2 participants