Questions tagged [computer-vision]
Questions related to image representation, segmentation, visual object categorization and image processing algorithms in general.
479 questions
0
votes
0
answers
40
views
Looking for suggestions on point cloud analysis for computer vision
I have a point cloud consisting of room-like blocks, where one wall of each rectangular box is left open to serve as an entrance. The point cloud is extracted from the CAD ...
0
votes
0
answers
27
views
Is it a bad idea to use Transformer models on long-tailed datasets?
I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few.
More specifically, my dataset has around 9k samples and 3....
1
vote
0
answers
18
views
Assembling boxes on pallets (YOLOv8n), problems with detection
I'm developing a computer vision solution to count boxes on fractional pallets.
Problem: inconsistent detections. I don't know if it's due to a lack of data, annotations, architecture, or ...
1
vote
0
answers
40
views
Why aren't data augmentations used when training autoencoders or neural-based compressors for images / video?
I was curious if anyone happens to know why data augmentations (like color jitter, random cropping, etc.) do not always appear to be used when training autoencoders or neural-based compressors for images ...
0
votes
0
answers
51
views
I want to build a few-shot object detector. Is YOLO a good choice?
I am currently working on the problem of creating a virtual library of toys. The preferred flow: the user uploads a short video or series of photos and then, from the next session, the model can certainly ...
0
votes
0
answers
51
views
Kernel to detect straight lines and edges in Canny edge images
I have run the Canny edge detector on an image that contains doors, beds, etc., whose perimeters I am interested in. In the Canny output, I can clearly see the edges of these objects. ...
1
vote
0
answers
33
views
Trying to classify sequences of video frames as a particular posture for quadrupeds
I am fairly new to machine learning and I've been tinkering with transformers for a short while now.
I have written a transformer architecture that should, in my opinion, be able to understand why this ...
1
vote
0
answers
37
views
Is it possible to train a neural network to reconstruct the full image of an object from a partial image? [closed]
Let's say that I want to train a network where the input is an image of a small part of an object. For example: an image of a building with corners, part of the exterior walls, and part of the roof. I want ...
0
votes
0
answers
35
views
Is Cochran's Q test appropriate for comparing a few VLMs on a dataset with more than one question per image?
I have a dataset of N images, k questions per image, Y/N answer for each question. I want to compare the accuracy of m VLMs (Vision Language Models) over this dataset: these are models, similar to the ...
1
vote
0
answers
104
views
Reconstruct images with CLIP image embedding
I recently started working on a project that uses only the semantic information in an image embedding produced by a CLIP-based model (e.g., SigLIP) to reconstruct a semantically similar image.
...
0
votes
0
answers
61
views
Model comparison and experimentation for a thesis result
We are conducting a study to compare the accuracy of two computer vision models:
Model A: Trained on a non-augmented dataset of 11,200 real-world images.
Model B: Trained on an augmented dataset ...
0
votes
0
answers
65
views
Why does the image classification model perform worse when augmenting only the minority class?
I have a data imbalance problem (1:10 ratio) in an image classification task.
To cope with it, I tried different imbalance training strategies, including a weighted loss function, different loss ...
4
votes
1
answer
272
views
Does shuffling the training data cause information leakage in a time-series model with image sequences?
I am working on a predictive model for solar power production based on image sequences captured at 10-minute intervals. A single example my model receives as input consists of a sequence of images. My ...
0
votes
0
answers
100
views
Captioning an average of an image set
I'm looking for a captioning model that would be able to describe a group of images in a single sentence. Alternatively, I need a way to conceptually average a group of images before feeding that ...
1
vote
0
answers
41
views
How to perform user-assisted image segmentation using Gaussian Mixture Models?
I have a general idea of Gaussian Mixture Models. My understanding:
GMM is a way of clustering data points which, unlike k-means clustering, soft-assigns them to different distributions by ...