TL;DR: In this article, a method for automatically generating an image labeling sentence is presented. The method comprises the following steps: extracting features of a given image to obtain image local features and image global features; finding a plurality of nearest neighbor training images in the training data set; obtaining multiple annotation statements; performing conversion processing on the annotation statement corresponding to the maximum average similarity to obtain a reference annotation statement vector; initializing the hidden layer state of the previous time step; and iteratively generating an image annotation statement including a plurality of image annotation terms.
Abstract: The embodiment of the invention discloses a method for automatically generating an image labeling sentence. The method comprises the following steps: extracting features of a given image to obtain image local features and image global features; finding a plurality of nearest neighbor training images in the training data set; obtaining multiple annotation statements; performing conversion processing on the annotation statement corresponding to the maximum average similarity to obtain a reference annotation statement vector; initializing the hidden layer state of the previous time step; and iteratively generating an image annotation statement including a plurality of image annotation terms. The embodiment of the invention effectively improves the quality of the automatically generated image annotation statement, and the generated statement conforms more closely to natural human language.
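
The retrieval and selection steps can be illustrated with a minimal sketch. The abstract does not name the similarity measures, so this sketch assumes cosine similarity between image global features and Jaccard token overlap between annotation statements. The function names (`nearest_neighbors`, `select_reference`) and the toy data are hypothetical and are not taken from the invention.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(query, train_features, k):
    """Indices of the k training images most similar to the query image."""
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: cosine(query, train_features[i]),
        reverse=True,
    )
    return ranked[:k]


def jaccard(s1, s2):
    """Token-overlap similarity between two statements (assumed measure)."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def select_reference(statements):
    """Statement with the maximum average similarity to the other candidates."""
    if len(statements) == 1:
        return statements[0]

    def average_similarity(i):
        sims = [jaccard(statements[i], s)
                for j, s in enumerate(statements) if j != i]
        return sum(sims) / len(sims)

    best = max(range(len(statements)), key=average_similarity)
    return statements[best]


# Toy data: 2-D "global features" and one annotation statement per training image.
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
train_statements = [
    "a dog runs on grass",
    "a dog plays on grass",
    "a plate of food",
    "a brown dog runs on the grass",
]

neighbors = nearest_neighbors([1.0, 0.05], train_features, k=3)
reference = select_reference([train_statements[i] for i in neighbors])
print(neighbors, reference)
```

In the described method, the selected statement would then be converted into a reference annotation statement vector that conditions the iterative generation. That conversion and the recurrent decoder are outside the scope of this sketch.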