DEV: Add Migrations::SetStore to work with nested sets of data #33593
Conversation
def include?(key, value)
  h = @store[key] or return false
This will cause new entries to be created for missing keys. I think `include?` shouldn't have such a side effect. Perhaps `@store.key?(key)` would be better here and in the other implementations?
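The side effect is easy to reproduce in isolation: a `Hash` built with a default proc materializes an entry on any `[]` read, while `Hash#key?` checks membership without invoking the proc. A minimal standalone sketch (not the actual Migrations::SetStore code):

```ruby
require "set"

# A Hash with a default proc auto-vivifies entries on [] access:
store = Hash.new { |hash, key| hash[key] = Set.new }
store[:missing]            # a read alone creates the key
puts store.size            # => 1

# Hash#key? checks for the key without triggering the default proc:
fresh = Hash.new { |hash, key| hash[key] = Set.new }
puts fresh.key?(:missing)  # => false
puts fresh.size            # => 0
```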
Thanks for catching that. That snuck in in a last-minute refactor.
def bulk_add(records)
  records.each { |record| @store[record[0]][record[1]][record[2]].add(record[3]) }
Tiny nit: since our source records are often hierarchical and can be sorted by key, caching intermediate hash lookups might speed things up. I saw a ~2.4x improvement in testing. The downside is that performance worsens with unsorted records. So if we optimize this, we'll need to ensure requires_set queries are always ordered, or pay the cost of sorting here first, which might offset the gains.
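The caching idea can be sketched as follows. This is a hypothetical standalone version, not the PR's actual code (the name `bulk_add_cached` and the nested-store construction are illustrative): the innermost set is remembered while consecutive records share the same three-key prefix, so sorted input pays the full nested lookup only once per key group.

```ruby
require "set"

# Hypothetical sketch: cache the innermost set across consecutive
# records that share the same (key1, key2, key3) prefix.
def bulk_add_cached(store, records)
  last_key1 = last_key2 = last_key3 = nil
  set = nil

  records.each do |key1, key2, key3, value|
    unless set && key1 == last_key1 && key2 == last_key2 && key3 == last_key3
      set = store[key1][key2][key3]
      last_key1, last_key2, last_key3 = key1, key2, key3
    end
    set.add(value)
  end
end

# Illustrative 3-level store with auto-vivifying nested hashes:
store =
  Hash.new do |h1, k1|
    h1[k1] = Hash.new { |h2, k2| h2[k2] = Hash.new { |h3, k3| h3[k3] = Set.new } }
  end

bulk_add_cached(store, [[:a, :b, :c, 1], [:a, :b, :c, 2], [:a, :x, :y, 3]])
```

With unsorted input the cache simply misses more often; the result stays correct, which matches the observation below that unsorted runtime is roughly unchanged.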
That's a good idea. I think we can require records to be sorted. And a quick benchmark of my implementation shows that even for unsorted data, the runtime stays roughly the same.
I've updated the
There are concrete implementations for a simple set, a key-value store, and nested sets with 2 or 3 keys. The API stays the same for all implementations, and the performance is more or less the same as without the wrapper (at least with YJIT enabled).
I pushed a commit with performance improvements:

* Replaced Hash default proc with `||=` for better performance and to prevent unintended key creation when using the `[]` operator. This change slightly reduces readability, but the performance gain justifies it.
* Optimized `bulk_add` by assuming input `records` are sorted by keys. Performance for unsorted input remains roughly unchanged.
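For illustration, the `||=` approach for the simple single-key case might look like the sketch below. `SimpleSetStore` is a hypothetical name, not the actual class from the PR: writes create the set lazily via `||=`, while reads against a plain `Hash` return `nil` for missing keys instead of materializing an entry.

```ruby
require "set"

# Hypothetical sketch of the ||= approach discussed above.
class SimpleSetStore
  def initialize
    @store = {}
  end

  # Writes create the set on demand; ||= only allocates on first use.
  def add(key, value)
    (@store[key] ||= Set.new).add(value)
  end

  # Reads are side-effect free: a plain Hash returns nil for missing
  # keys, so no entry is created by the lookup.
  def include?(key, value)
    set = @store[key] or return false
    set.include?(value)
  end
end
```

This captures the trade-off mentioned in the thread: every write re-evaluates the `||=` guard, but `[]` reads no longer auto-vivify entries the way a default proc does.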