
docs(cluster): enhance DBSCAN docstrings with clearer parameter guida… #31835


Open: sape94 wants to merge 1 commit into main


sape94 commented Jul 25, 2025

…nce and algorithm description

Reference Issues/PRs

N/A - Documentation improvement

What does this implement/fix? Explain your changes.

This PR improves the documentation quality of the DBSCAN clustering algorithm by enhancing docstrings with clearer explanations and practical guidance for users.

Key improvements:

Enhanced algorithm description: Added comprehensive explanation of what DBSCAN does, its advantages over K-means (no need to specify cluster count, finds arbitrary shapes, identifies outliers), and when to use it.
Improved parameter descriptions with practical guidance:

eps: Added guidance on how parameter values affect clustering results (smaller values → more clusters)
min_samples: Clarified relationship between values and cluster density
algorithm: Explained what 'auto' does (attempts to decide the most appropriate algorithm)
leaf_size: Added explanation of speed vs construction time trade-offs
p: Added concrete examples (p=1 for Manhattan, p=2 for Euclidean distance)
X: Added clarification that precomputed distance matrices must be square and symmetric
n_jobs: Fixed minor typo ("precomputed distance" → "precomputed distances")
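
For reviewers who want to sanity-check the `p` examples, the following is a minimal pure-Python sketch of the Minkowski distance. The helper name `minkowski_distance` is hypothetical and is not part of scikit-learn; it only illustrates why p=1 corresponds to Manhattan distance and p=2 to Euclidean distance.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two equal-length points (hypothetical helper)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)

# p=1 sums the absolute coordinate differences (Manhattan distance).
manhattan = minkowski_distance(a, b, p=1)

# p=2 is the straight-line length (Euclidean distance).
euclidean = minkowski_distance(a, b, p=2)

print(manhattan, euclidean)
```

For the 3-4-5 right triangle used here, the Manhattan distance is 3 + 4 = 7 and the Euclidean distance is 5.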

Better return value descriptions:

labels: Clarified that non-negative integers indicate cluster membership
Enhanced fit_predict method description to explain efficiency benefits over calling fit(X).labels_

Consistency improvements: Made parameter descriptions more consistent between the dbscan function and DBSCAN class.
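
To make the parameter guidance above concrete, here is a toy, pure-Python sketch of density-based clustering. It is not scikit-learn's implementation and is far slower; the names `toy_dbscan` and `region_query` are hypothetical. It assumes, as the scikit-learn docstring states, that `min_samples` counts the point itself and that noise is labelled -1.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def toy_dbscan(points, eps, min_samples):
    """Toy DBSCAN sketch: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep growing
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]

small_eps = toy_dbscan(points, eps=1.5, min_samples=3)
large_eps = toy_dbscan(points, eps=20, min_samples=3)
print(small_eps)  # [0, 0, 0, 1, 1, 1, -1]
print(large_eps)  # [0, 0, 0, 0, 0, 0, -1]
```

On this toy data the smaller `eps` keeps the two dense groups separate, the larger `eps` merges them into one cluster, and the isolated point is labelled -1 in both runs.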

These changes make the API more accessible to users, especially those new to density-based clustering, by providing clearer guidance on parameter selection and expected behavior.

Any other comments?

The changes are purely documentation improvements with no functional changes to the algorithm implementation. All existing functionality and API behavior remain unchanged. The improvements should help users better understand how to tune DBSCAN parameters for their specific use cases.

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 26b8fa0. Link to the linter CI: here
