Skip to content

Fixes: #12711 Added sparse vector support in Oracle #12712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

suraj-ora-2020
Copy link
Contributor

Description

A Sparse vector is a vector which has zero value for most of its dimensions.
Sparse vectors are supported in Oracle Database 23.7 and later.

This PR extends vector support by adding functionality for sparse vectors in Oracle

Fixes #12711

Checklist

This pull request is:

  • A documentation / typographical / small typing error fix
    • Good to go, no issue or tests are needed
  • A short code fix
    • please include the issue number, and create an issue if none exists, which
      must include a complete example of the issue. one line code fixes without an
      issue and demonstration will not be accepted.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests. one line code fixes without tests will not be accepted.
  • A new feature implementation
    • please include the issue number, and create an issue if none exists, which must
      include a complete example of how the feature would look.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests.

Have a nice day!

@CaselIT
Copy link
Member

CaselIT commented Jul 3, 2025

Hi, can you fix the link error? after that I'll add it to gerrit to

thanks.

You can install pre-commit to run the linters

@suraj-ora-2020 suraj-ora-2020 force-pushed the sparse_support1 branch 3 times, most recently from f4ae7d1 to d052522 Compare July 3, 2025 12:24
@suraj-ora-2020
Copy link
Contributor Author

I have fixed them. Thanks.

@CaselIT CaselIT requested a review from sqla-tester July 3, 2025 12:35
Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision d052522 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

New Gerrit review created for change d052522: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Federico Caselli (CaselIT) wrote:

left a suggestion regarding an improvement on the typing of the sparse array.

Also a question considering typing etc? Wouldn't this make more sense as another type? So VECTOR could be typed as VECTOR(TypeEngine[list[Any]]) and an ipotetical different class SparseVector could be typed as SparseVector(TypeEngine[SparseValue])?

again just a suggestion

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023


from sqlalchemy import insert, select

sparse_val = (10, [1, 2], array.array("d", [23.45, 221.22]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Federico Caselli (CaselIT) wrote:

we are adding python side objects to ease usage of more advantage types, like Range or BitString in postgresql. links https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#sqlalchemy.dialects.postgresql.Range and https://docs.sqlalchemy.org/en/21/dialects/postgresql.html#sqlalchemy.dialects.postgresql.BitString

Maybe we could provide a class for this? Or a named tuple? Something like

class SparseValue(NamedTuple):
  num_dimensions: int
  indexes: Sequence[int]
  values: array.array[Any] | np.ndarray  # assuming this is supported by the driver

This could be used also as the return type of sparse array columns?

Threat this as a suggestion, it's likely ok as is

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CaselIT Including the SparseVector object in sqlalchemy.dialects.oracle seems reasonable to me.

Also importing the value directly from oracledb also feels like a sensible approach—I’d go ahead with that change. The only consideration is that, with this setup, users will need to explicitly import oracledb and then create the object using oracledb.SparseVector().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably alias that in the oracle dialect module.

@zzzeek would you be ok doing a try/expect at module lever to try and import that oracledb class and otherwise use a function that would error when invoked?
I know that's not how we usually do this, but it may make sense in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MMMmmmm I'm a pretty strong -1 on having SQLAlchemy users need to import symbols from the oracledb DBAPI module directly. as you noted, this is exactly the thing we've been trying to move away from in the few cases where it was happening.

I would much prefer if we made our own SparseVector object in sqlalchemy.dialects.oracle , which would likely look very similar to oracledb's. But I think having our own object presented to the user is best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a value object returned per-row? even then I'd be +1 marshall

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's a value object returned per row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CaselIT Based on the points raised, I see a couple of possible approaches we could take. Before proceeding, I wanted to check with you on which approach you’d prefer I follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the easiest thing it likely to just make a simple class in sqlalchemy and marshal to-from that in the type, like mike suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've uploaded the latest changes based on your suggestion. Could you please review and let me know if anything needs to be updated?

@sqla-tester
Copy link
Collaborator

Federico Caselli (CaselIT) wrote:

code review left on gerrit

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023

@suraj-ora-2020
Copy link
Contributor Author

Federico Caselli (CaselIT) wrote:

left a suggestion regarding an improvement on the typing of the sparse array.

Also a question considering typing etc? Wouldn't this make more sense as another type? So VECTOR could be typed as VECTOR(TypeEngine[list[Any]]) and an ipotetical different class SparseVector could be typed as SparseVector(TypeEngine[SparseValue])?

again just a suggestion

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023

I feel it might be more consistent to keep Vector and SparseVector together, since SparseVector is essentially a format or variant of Vector, rather than a separate type. Treating it separately could potentially lead to redundancy or fragmentation. Would love to hear your thoughts on this approach.

@CaselIT
Copy link
Member

CaselIT commented Jul 10, 2025

It's ok for me, it was just an idea if useful from a typing prospective. We can always add such "useful for typing" subclasses later on if we get requests for it

@suraj-ora-2020 suraj-ora-2020 force-pushed the sparse_support1 branch 2 times, most recently from 4392553 to 9b542b8 Compare July 14, 2025 16:36
Fixes: sqlalchemy#12711 Added sparse vector support in Oracle

Fixes: sqlalchemy#12711 Added sparse vector support in Oracle

Fixes: sqlalchemy#12711 Added sparse vector support in Oracle

Fixes: sqlalchemy#12711 Added sparse vector support in Oracle

Fixes: sqlalchemy#12711 Added sparse vector support in Oracle

Fixes: sqlalchemy#12711 Added sparse vector support in Oracle
Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 0be516b of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset 0be516b added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/6023

@suraj-ora-2020 suraj-ora-2020 requested review from CaselIT and zzzeek July 22, 2025 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for sparse vector in oracle
4 participants