As large volumes of multimodal data are continuously uploaded to the Internet, existing cross-modal retrieval methods struggle to adapt to the ever-growing data. The most direct solution is to retrain or fine-tune the retrieval model periodically on the accumulated data. However, retraining or fine-tuning may invalidate the features extracted by the previous model, and re-extracting those features incurs a heavy computational cost. Drawing on the human capacity for continual learning, the team of Research Fellow Zhang Huaiwen from the College of Computer Science (College of Software) of IMU proposed a continual online learning mechanism to formalize the data-growth challenge facing cross-modal retrieval systems, and on that basis developed a continual cross-modal retrieval method.
Fig. 1: Model framework
To learn continually from incremental data, the team proposes two components. The first, cross-modal correlation consistency, addresses the correlation deviation that arises in continual online learning by studying the relationships among samples from different sessions. The second, semantic representation coordination, alleviates semantic confusion in cross-session multimodal representations by absorbing knowledge from samples across sessions. Experimental results indicate that the work provides an effective method for continual cross-modal retrieval, significantly improves the retrieval performance of cross-modal retrieval models on ever-growing multimodal data, and opens a new research direction for cross-modal retrieval.
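As a rough illustration of the correlation-consistency idea only, and not the team's actual formulation, the sketch below compares image-text similarity matrices computed before and after a model update and penalizes how far they drift apart. The function names, the use of cosine similarity, and the mean-squared penalty are all assumptions made for this example.

```python
import math

def cosine(u, v):
    # Cosine similarity of two equal-length, non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(image_feats, text_feats):
    # Entry [i][j] is the similarity between image i and text j.
    return [[cosine(img, txt) for txt in text_feats] for img in image_feats]

def correlation_consistency_penalty(old_sim, new_sim):
    # Hypothetical penalty: mean squared difference between the
    # image-text correlations produced by the old and updated models.
    count = len(old_sim) * len(old_sim[0])
    total = sum(
        (o - n) ** 2
        for old_row, new_row in zip(old_sim, new_sim)
        for o, n in zip(old_row, new_row)
    )
    return total / count

# Toy example: features from an earlier session's model versus an updated model.
texts = [[1.0, 0.0], [0.0, 1.0]]
old_images = [[1.0, 0.0], [0.0, 1.0]]
new_images = [[0.0, 1.0], [1.0, 0.0]]  # the update has swapped the pairings

old_sim = similarity_matrix(old_images, texts)
new_sim = similarity_matrix(new_images, texts)
penalty = correlation_consistency_penalty(old_sim, new_sim)
```

A penalty of zero would mean the updated model preserves every image-text correlation from the earlier session; larger values signal the kind of correlation deviation the method is meant to limit.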
The paper, titled C2MR: Continual Cross-Modal Retrieval for Streaming Multi-modal Data, has been accepted by the ACM International Conference on Multimedia (ACM MM 2023), an international multimedia conference recommended as Class A by the China Computer Federation. Research Fellow Zhang Huaiwen and Yang Yang, a doctoral student who enrolled in 2021, both of the College of Computer Science (College of Software) of IMU, are the first and second authors respectively. The project is funded by the Youth Program of the National Natural Science Foundation of China, the Steed Plan of IMU, and the Program of Young Talents in Science and Technology in Colleges and Universities of Inner Mongolia Autonomous Region.