Skip to content

FEAT: Bulk Copy Wrapper #108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: jahnvi/bulk_copy_main_apis
Choose a base branch
from

Conversation

jahnvi480
Copy link
Contributor

@jahnvi480 jahnvi480 commented Jun 25, 2025

ADO Work Item Reference

AB#34946


Summary

This pull request introduces significant enhancements to the MSSQL Python library by integrating Bulk Copy Program (BCP) functionality. This includes the creation of a new BCPWrapper class, updates to the connection handling, and modifications to the build system to support the new functionality. Below is a summary of the most important changes:

BCP Integration

  • Added a new BCPWrapper class in mssql_python/pybind/bcp/bcp_wrapper.h to encapsulate BCP operations, including methods for initialization, column format definition, data binding, and execution. The class also ensures proper resource cleanup.
  • Exposed BCPWrapper to Python via pybind11 in mssql_python/pybind/ddbc_bindings.cpp, allowing Python users to perform bulk copy operations programmatically.

Connection Enhancements

  • Updated Connection and ConnectionHandle classes in mssql_python/pybind/connection/connection.h to provide access to the database connection handle (getDbcHandle) and the associated Connection object (getConnection). These changes support the integration of BCP functionality. [1] [2]
  • Modified Connection::applyAttrsBefore in mssql_python/pybind/connection/connection.cpp to handle the SQL_COPT_SS_BCP attribute, enabling BCP operations before connecting.

Build System Updates

  • Updated CMakeLists.txt in mssql_python/pybind to include the new bcp_wrapper.cpp file and its header directory, ensuring proper compilation of the BCP functionality. [1] [2]

Driver and API Integration

  • Added function pointer declarations and definitions for BCP-related APIs (e.g., bcp_initW, bcp_control, bcp_exec) in mssql_python/pybind/ddbc_bindings.h and mssql_python/pybind/ddbc_bindings.cpp. These APIs are dynamically loaded from the ODBC driver during runtime. [1] [2]
  • Updated the driver loading logic in LoadDriverOrThrowException to include BCP function pointers, ensuring they are available for use.

Code Cleanup and Minor Fixes

  • Removed redundant self.wrapper.finish() call in mssql_python/bcp_main.py as the new BCPWrapper class handles resource cleanup internally.
  • Corrected a comment in SQLGetData_wrap to accurately describe the conversion of timestamp fractions.

Copy link
Contributor

@Ramudaykumar Ramudaykumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no explicit thread safety; if the same BCPWrapper object is used from multiple threads, state flags and ODBC calls could race.

Some methods (e.g., bcp_control int overload) throw exceptions on error, others (e.g., bcp_control wstring overload, read_format_file) just return SQL_ERROR. For a pybind11-exposed wrapper, it's better to throw exceptions for all error states, so Python users get errors, not silent failures.

Recommendation:
Add a note/documentation about thread-safety, or add a mutex if needed.
Always throw std::runtime_error on unrecoverable errors.

@github-actions github-actions bot added the pr-size: large Substantial code update label Jul 2, 2025
@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: large Substantial code update labels Jul 4, 2025
@github-actions github-actions bot added pr-size: large Substantial code update and removed pr-size: large Substantial code update labels Jul 4, 2025
// Handle binary data
std::string binValue = data.cast<std::string>();
auto buffer = std::shared_ptr<unsigned char[]>(new unsigned char[binValue.length()]);
memcpy(buffer.get(), binValue.data(), binValue.length());

Check notice

Code scanning / devskim

There are a number of conditions in which memcpy can introduce a vulnerability (mismatched buffer sizes, null pointers, etc.). More secure alternitives perform additional validation of the source and destination buffer Note

Problematic C function detected (memcpy)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: large Substantial code update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants