According to Meta, it is the first artificial intelligence model capable of linking information from six modalities.
Meta unveiled a new artificial intelligence tool called ImageBind, which can be used to generate images not only from text but also from other inputs, such as audio shared on the platform.
For example, using ImageBind, Meta's Make-A-Scene could create images from audio, such as generating an image based on the sounds of a rainforest or a bustling market.
Other future possibilities include more precise ways to recognize, connect, and moderate content, as well as driving creative design, such as generating richer media more seamlessly and enabling broader multi-modal search capabilities.
ImageBind is part of Meta’s efforts to create multi-modal AI systems that learn from all possible types of data around them.
As the number of modalities increases, ImageBind opens the door for researchers to develop new holistic systems, such as combining 3D and IMU sensors to design or experience immersive virtual worlds.
“ImageBind could also provide a rich way to explore memories: searching for images, videos, audio files, or text messages using a combination of text, audio, and images,” Meta noted.
Meta explained that by recognizing the relationships between these six modalities (images and video, audio, text, depth, thermal data, and inertial measurement unit (IMU) readings), the breakthrough helps advance artificial intelligence by allowing machines to analyze many different kinds of information together.
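The idea behind linking modalities is that a model like ImageBind maps inputs from every modality into one shared embedding space, so related content from different modalities lands close together. The following is a minimal sketch of that principle, not Meta's actual API: the embedding vectors here are synthetic stand-ins, and cross-modal retrieval reduces to cosine similarity in the shared space.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: in a joint embedding space, an image of a beach
# and the sound of waves should map to nearby vectors, while unrelated
# content maps far away. We fake this with correlated random vectors.
rng = np.random.default_rng(0)
image_emb = rng.standard_normal(1024)
audio_emb = image_emb + 0.1 * rng.standard_normal(1024)  # related audio clip
text_emb = rng.standard_normal(1024)                     # unrelated text

# Cross-modal retrieval: pick the candidate closest to the image embedding.
candidates = {"waves_audio": audio_emb, "unrelated_text": text_emb}
best = max(candidates, key=lambda k: cosine_similarity(image_emb, candidates[k]))
print(best)  # the related audio clip scores highest
```

In the real model the embeddings would come from modality-specific encoders trained so that paired data (e.g., a video and its soundtrack) produce similar vectors; the retrieval step shown here is the same regardless of how the vectors were produced.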