Questions tagged [k-means]
k-means is a method to partition data into clusters by finding a specified number of means, k, such that when data are assigned to the cluster with the nearest mean, the within-cluster sum of squares is minimized
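As a rough illustration of that definition, here is a minimal sketch in Python with NumPy. It uses Lloyd's algorithm, which is one common heuristic for this objective and only reaches a local minimum. The function names `wcss` and `lloyd_kmeans`, the random initialization from data points, and the toy data are assumptions made for this example, not part of the tag description.

```python
import numpy as np

def wcss(X, labels, centers):
    """Within-cluster sum of squares: squared distance of each point to its assigned mean."""
    return float(((X - centers[labels]) ** 2).sum())

def assign(X, centers):
    """Assign every point to the cluster with the nearest mean (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: alternate assignment and mean updates until the means stop moving."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize the means at k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old mean if a cluster becomes empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Toy one-dimensional data with two obvious groups.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels, centers.ravel(), wcss(X, labels, centers))
```

Because the result depends on the starting means, practical implementations usually run several random restarts and keep the solution with the smallest within-cluster sum of squares.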
1,053 questions
0
votes
0
answers
30
views
Selecting the number of K-means clusters during the feature selection
I'm using longitudinal data to answer many questions, some of which aren't suited to longitudinal data: I just wish I could use a single observation for each of them. My variables are mostly ...
0
votes
0
answers
57
views
How to perform clustering on heavily right-skewed and zero-inflated data
I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversion/visits)). The variables are heavily right-skewed with long tails and one variable is dominated ...
1
vote
0
answers
54
views
Are equal and diagonal variance matrices implicitly assumed in k-means clustering?
When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...
1
vote
0
answers
72
views
"How to validate if a dataset has natural clusters?"
I've recently learnt unsupervised learning methods such as KMeans and DBSCAN.
While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
3
votes
2
answers
577
views
How can I apply KMeans clustering if all variables are highly uncorrelated
I'm applying K-Means clustering to a dataset of ship voyages. The goal is to group voyages into performance-based clusters like cost-efficient, underperforming, etc.
I have 12 features in total:
10 ...
0
votes
0
answers
85
views
K-means clustering 1D proof for intervals
I am supposed to prove that, given sorted data points such that $X_1 \leq X_2 \leq \dots \leq X_n$, in an optimal cluster assignment each cluster corresponds to some interval of points.
Or in other words - ...
0
votes
1
answer
104
views
How to get a smaller number of optimal K in K-means clustering
I want to obtain a small optimal value of $k$ (with $k \leq 5$) for k-means clustering on a dataset of size $5000$. I have used the BIC and the Gap statistic to determine the optimal number of clusters, ...
2
votes
0
answers
87
views
K-means cost function
In Elements of Statistical Learning (ESL), they state in equation (14.31) that the k-Means objective function is
$$W(C) = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \lVert x_i - \bar{x}_k \rVert^2$$
where $K$ is the number ...
1
vote
0
answers
63
views
How to cluster/handle different length coefficient vectors of B-splines in K-means clustering
I want to cluster a dataset based on blood pressure (BP) measurements taken at 3 or 4 time points. For that, I’m modeling each individual’s BP trajectory using quadratic B-splines (via ...
1
vote
0
answers
105
views
Insights into median of means estimators in DiD designs
I’m currently working on a research project focused on robust estimators in Difference-in-Differences (DiD) designs, specifically in the classic two-period, two-group setup. My main interest is in ...
2
votes
0
answers
80
views
How to cluster based on x and y coordinates
I am trying to identify rows in groups of points using clustering algorithms. The bigger picture problem I'm trying to solve is to identify shelves given x and y coordinates of products. I can cluster ...
0
votes
0
answers
51
views
Identify predictors for clustering output?
I have a dataset with variables collected years ago, and many variables collected this year as outcome variables. I want to combine all the variables collected this year to get one outcome, e.g. ...
1
vote
0
answers
57
views
Question about running k-means cluster analysis
In a previous analysis I had 3 groups of subjects - group x with 35 subjects, control group y with 25 subjects, and control group z with 25 subjects. For each group I have levels of 6 different ...
1
vote
0
answers
89
views
Question on using the elbow method for calculating the ideal number of clusters for k-means cluster analysis
Newb to cluster analysis here. I have a group of 35 subjects. For all of the subjects I have data for different measures of IQ (verbal, math, etc) and different biomarkers. There are 6 IQ measures in ...
1
vote
1
answer
62
views
Is this the right approach to cluster using many different evaluations on the same dimension?
I'm working on a project where I want to sort political parties into two groups. I want to do so using the answers of many respondents in a survey who indicated for each party where they see them on a ...