The AI model was trained with a huge dataset of images paired with word tags, and each tag was mapped to a distinct object in an image. Then the researchers fine tuned the pre trained model for captioning on the already captioned images dataset. The training process helped the model to learn how to compose a sentence. The new AI model leverages the visual vocabulary to generate captions for images containing novel objects accurately. While the researchers admit the AI isnt perfect, but its two...

Read the full article at WinBeta