docs(cluster): enhance DBSCAN docstrings with clearer parameter guida… #31835
+35
−11
…nce and algorithm description
Reference Issues/PRs
N/A - Documentation improvement
What does this implement/fix? Explain your changes.
This PR improves the documentation quality of the DBSCAN clustering algorithm by enhancing docstrings with clearer explanations and practical guidance for users.
Key improvements:
Enhanced algorithm description: Added comprehensive explanation of what DBSCAN does, its advantages over K-means (no need to specify cluster count, finds arbitrary shapes, identifies outliers), and when to use it.
Improved parameter descriptions with practical guidance:
eps: Added guidance on how the value affects clustering results (smaller values tend to produce more clusters)
min_samples: Clarified relationship between values and cluster density
algorithm: Explained what 'auto' does (attempts to decide the most appropriate algorithm)
leaf_size: Added explanation of the trade-off between query speed and tree construction time
p: Added concrete examples (p=1 for Manhattan, p=2 for Euclidean distance)
X: Added clarification that precomputed distance matrices must be square and symmetric
n_jobs: Fixed minor typo ("precomputed distance" → "precomputed distances")
Better return value descriptions:
labels: Clarified that non-negative integers indicate cluster membership
Enhanced fit_predict method description to explain efficiency benefits over calling fit(X).labels_
Consistency improvements: Made parameter descriptions more consistent between the dbscan function and DBSCAN class.
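To illustrate the parameter guidance listed above, here is a minimal sketch rather than text taken from the PR's docstrings. The toy data and the chosen eps and min_samples values are assumptions made up for this example. It shows how eps and p change the result, and that non-negative labels mark cluster membership while -1 marks noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (made up for illustration): two tight 2x2 squares and one far-away point.
X = np.array(
    [
        [0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11],
        [50, 50],
    ],
    dtype=float,
)

# Default Euclidean metric: each square forms one cluster, the far point is noise.
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X)
print(labels)  # [ 0  0  0  0  1  1  1  1 -1]

# p=1 selects Manhattan distance when the metric is Minkowski.
labels_manhattan = DBSCAN(
    eps=1.5, min_samples=3, metric="minkowski", p=1
).fit_predict(X)

# A much smaller eps leaves every point without enough neighbours,
# so every sample is labelled as noise (-1).
labels_small_eps = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# Count clusters by ignoring the noise label.
n_clusters = len(set(labels) - {-1})
print(n_clusters)  # 2
```

On this data the Manhattan run gives the same labels as the Euclidean one. That is a property of the toy points, not a general rule.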
These changes make the API more accessible to users, especially those new to density-based clustering, by providing clearer guidance on parameter selection and expected behavior.
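As a further sketch of the expected behavior, the snippet below passes a precomputed distance matrix and compares fit_predict with fit(X).labels_. The data and parameter values are assumptions chosen for illustration. The snippet only checks that both calls return the same labels; it does not measure any efficiency difference.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (made up for illustration): two tight squares and one outlier.
X = np.array(
    [
        [0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11],
        [50, 50],
    ],
    dtype=float,
)

# A precomputed input is an (n_samples, n_samples) matrix of pairwise distances.
D = pairwise_distances(X)  # Euclidean by default
is_square = D.shape[0] == D.shape[1]
is_symmetric = np.allclose(D, D.T)

# Both calls fit the same model on the same input and yield the same labels.
labels_from_predict = DBSCAN(
    eps=1.5, min_samples=3, metric="precomputed"
).fit_predict(D)
labels_from_fit = DBSCAN(
    eps=1.5, min_samples=3, metric="precomputed"
).fit(D).labels_
```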
Any other comments?
The changes are purely documentation improvements with no functional changes to the algorithm implementation. All existing functionality and API behavior remain unchanged. The improvements should help users better understand how to tune DBSCAN parameters for their specific use cases.