On Monday, researchers at Nvidia unveiled a new artificial intelligence (AI) technique that can move objects within an image. The method, called DiffUHaul, can relocate an object from one position to another without distorting the rest of the image or its backdrop, because it takes the spatial context of the image into account. Its distinctive feature is that it is training-free: it works on top of an existing image model and requires no additional training or fine-tuning. The company presented the technique at SIGGRAPH Asia 2024, the Asian edition of the conference of the Special Interest Group on Computer Graphics and Interactive Techniques.
The Nvidia researchers described the tool in a study. It was developed in partnership with Reichman University, Tel Aviv University, and The Hebrew University of Jerusalem. Their goal was to address a major difficulty with AI image generation models: the inability to move elements within an image in a spatially aware way.
The study notes that a lack of spatial reasoning in AI models has kept this editing task out of reach for researchers. Existing image models can grasp the overall content of a picture, but they do not understand how moving an object within the 2D image should be interpreted spatially.
Nvidia says DiffUHaul can resolve this problem. The tool is built on an image diffusion architecture and applies attention masking during the denoising steps, which helps preserve the object’s high-level appearance. Spatial awareness comes from BlobGEN, the localized model the tool builds on. The researchers also used additional techniques to reconstruct real images with the localized model so that objects in them can be edited in the appropriate location.
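To make the idea of attention masking more concrete, here is a minimal sketch in plain Python. It is not DiffUHaul’s code: the function name `masked_attention` and the `region_ids` parameter are hypothetical, and the sketch assumes a simple hard mask in which each image token may attend only to tokens that belong to the same region (for example, the same object).

```python
import math


def masked_attention(queries, keys, values, region_ids):
    """Scaled dot-product self-attention with a hard region mask.

    Illustrative assumption: token i may attend to token j only when
    region_ids[i] == region_ids[j]. All inputs are lists of equal length.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Score every key; keys from other regions are masked with -inf.
        scores = []
        for j, key in enumerate(keys):
            if region_ids[i] == region_ids[j]:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(dim))
            else:
                scores.append(-math.inf)

        # Numerically stable softmax; masked entries get weight 0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]

        # Weighted sum of the value vectors.
        width = len(values[0])
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        )
    return outputs


if __name__ == "__main__":
    # Four tokens: the first two belong to region 0, the last two to region 1.
    tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
    values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    for row in masked_attention(tokens, tokens, values, [0, 0, 1, 1]):
        print(row)
```

Because masked tokens receive zero weight, features from one region cannot leak into another, which is the general intuition behind using masks to keep an object’s appearance separate from the rest of the scene.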
Users enter a text prompt on the front end that identifies the object they wish to modify, and the AI then repositions that object and adjusts the background accordingly. The company’s demonstrations did not make clear whether the editing tool understands the shape changes that can accompany spatial movement. For example, a balloon in the air changes shape when it is moved to the ground. Because the method involves no dedicated training, the AI might not be able to account for that.
DiffUHaul uses BlobGEN for spatial understanding, which enables robust object dragging without fine-tuning.