$\begingroup$

I'm looking for a captioning model that would be able to describe a group of images in a single sentence. Alternatively, I need a way to conceptually average a group of images before feeding that "concept" (presumably a feature vector) to a regular captioning model.

Why?

For LoRA training evaluation. It would be useful to test the trained generation model on a prompt that fits the dataset as a whole, instead of selecting captions of single images or trying to work out by hand what they have in common. It would also make it possible to produce a single negative prompt for testing how the model behaves on out-of-scope prompts.

What I've done so far: I've modified an existing CLIP+BLIP interrogator to work with image sets (it can also produce negatives). CLIP-based captioning lets me average the image features before using them to choose the best caption, but the result is far less accurate than captions produced by BLIP, which only works with single images. I need a captioning model that accepts feature vectors as input, as the CLIP approach does, so that I can preprocess (for example, average) them.
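
A minimal sketch of the averaging step described above, in plain Python. It assumes each image and each candidate caption has already been encoded into a vector in a shared embedding space (as CLIP's image and text encoders would do); the three-dimensional vectors, the caption strings, and the helper names `l2_normalize`, `mean_embedding`, and `rank_captions` are all made up for illustration and are not taken from any interrogator's actual code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mean_embedding(embeddings):
    """Average unit-length embeddings and renormalize the result.

    Normalizing each vector first keeps one image with a large norm
    from dominating the averaged "concept" vector.
    """
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def rank_captions(image_embeddings, caption_embeddings):
    """Return (similarity, caption) pairs sorted from best to worst match."""
    concept = mean_embedding(image_embeddings)
    scored = []
    for text, vec in caption_embeddings.items():
        unit = l2_normalize(vec)
        similarity = sum(a * b for a, b in zip(concept, unit))
        scored.append((similarity, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP features).
image_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [1.0, 0.0, 0.1],
]
caption_embeddings = {
    "a watercolor landscape": [1.0, 0.1, 0.0],
    "a pencil sketch": [0.2, 0.2, 1.0],
    "a photo of a car": [-0.5, 1.0, 0.0],
}

ranked = rank_captions(image_embeddings, caption_embeddings)
positive_prompt = ranked[0][1]   # caption closest to the averaged concept
negative_prompt = ranked[-1][1]  # caption farthest from it, usable as a negative
print(positive_prompt, "|", negative_prompt)
```

The least similar caption doubles as the negative prompt here. This only ranks a fixed list of candidate captions, which is the accuracy limitation mentioned above; it does not generate free-form text the way BLIP does.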

$\endgroup$
  • $\begingroup$ Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. $\endgroup$ Commented Nov 10, 2024 at 19:25
  • $\begingroup$ Outline: have one model turn each picture into a feature vector, then have another model ingest that set of feature vectors and emit text. $\endgroup$ Commented Nov 15, 2024 at 14:30
  • $\begingroup$ See also: ai.stackexchange.com/questions/47010/… $\endgroup$ Commented Nov 17, 2024 at 12:54
  • $\begingroup$ You're just repeating what I said; I need a model that is able to do that. So far, the ones I've found don't ingest feature vectors, only images. $\endgroup$ Commented Jan 14 at 7:44
  • $\begingroup$ Autoencoders turn unlabeled images into vectors. Grouping similar vectors is clustering. If you want to differentiate between groups, you could train a contrastive model, for example with a triplet loss, to get feature vectors. $\endgroup$ Commented Jan 15 at 1:18
