At Google I/O, the company’s annual developer conference, Demis Hassabis, the head of Google DeepMind and the leader of Google’s AI efforts, unveiled a version of what he envisions as the universal AI assistant: Project Astra. It is a real-time, multimodal AI assistant that can see the world, remember what things are and where you left them, and help you with nearly anything. In a remarkable demo video, which Hassabis assures was not manipulated or doctored in any way, an Astra user at Google’s London office asks the system to identify a part of a speaker, find their missing glasses, review code, and more. Everything works almost instantly and conversationally.
Astra is just one of many Gemini announcements Google made at this year’s I/O. Gemini 1.5 Flash is a new model designed to handle common tasks like summarization and captioning more quickly. Veo, another new model, can generate video from a text prompt. Gemini Nano, the model meant to run locally on devices like your phone, is said to be faster than before. The context window for Gemini Pro, which indicates how much information the model can consider in a given query, is doubling to 2 million tokens, and Google says the model is better at following instructions than ever. Google is moving quickly to get both the models and their user experience in front of users.
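To make “typical tasks like summarization” concrete, here is a minimal sketch using Google’s google-generativeai Python SDK to call Gemini 1.5 Flash. The sample article text and prompt are illustrative assumptions, and you would need your own API key; this is a sketch of the pattern, not a definitive implementation.

```python
# Minimal sketch: summarizing text with Gemini 1.5 Flash via the
# google-generativeai Python SDK. The sample text and prompt are
# assumptions for illustration; supply your own API key.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with a real key

model = genai.GenerativeModel("gemini-1.5-flash")

article = (
    "At Google I/O, Google DeepMind unveiled Project Astra, a real-time, "
    "multimodal AI assistant built on the Gemini family of models..."
)

# A large context window means much longer inputs than this snippet
# could be passed in a single request.
response = model.generate_content(f"Summarize in one sentence: {article}")
print(response.text)
```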
According to Hassabis, the future of AI will be less about the models themselves and more about what they can do for you. Central to that vision are agents: bots that don’t just converse with you but carry out tasks on your behalf. “We have a longer history with agents than with generalized models,” he remarks, pointing to the game-playing AlphaGo system from roughly a decade ago. Some of those agents, he believes, will be extremely simple tools for getting things done, while others will act more like companions and collaborators. At some point, he says, “It might even come down to personal preference and understanding your context.”
Several of Google’s AI announcements at I/O focused on expanding and streamlining the Gemini app. With Gemini Live, a voice-only assistant, you can hold easy back-and-forth conversations: you can interrupt its lengthy answers, and it can refer back to earlier parts of the conversation. With Google Lens, you can now search the web by recording a video and narrating what you’re looking for. Much of this is enabled by Gemini’s large context window, which lets it draw on a vast amount of information at once; according to Hassabis, that’s essential for making the interaction between you and your assistant feel natural.
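The “refer back to earlier parts of the conversation” behavior comes down to conversation state. Here is a minimal sketch of a multi-turn chat using the same google-generativeai SDK, where the session history is what lets the model resolve a reference like “the second one”; the prompts are illustrative assumptions.

```python
# Minimal sketch of multi-turn conversational memory with the
# google-generativeai SDK. The chat session accumulates the full
# history and resends it with each request, which is what lets the
# model resolve references to earlier turns. Prompts are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with a real key

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat()

chat.send_message("Suggest three weekend trips within a day's drive of London.")
# This follow-up only makes sense because the prior turn is in context.
reply = chat.send_message("Plan a two-day itinerary for the second one.")
print(reply.text)
```

The larger the context window, the more of this history, and the richer the surrounding material, the model can keep in play at once, which is why Hassabis ties it to natural-feeling conversations.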
Google is not alone in that assessment. OpenAI has been discussing AI agents for some time now, and it recently demonstrated a product that looks a lot like Gemini Live. The two companies are vying for the same market share and seem to have broadly similar ideas about how AI might be used and how it could change people’s lives.
What exactly are those assistants for, and how will people use them? Even Hassabis doesn’t know for sure. Google’s current focus is trip planning: it built a new tool that lets you use Gemini to draft an itinerary for your vacation, which you can then edit alongside the assistant. Many more features along those lines are on the way. Hassabis believes phones and glasses will be key devices for these agents, though he concedes that “there is probably room for some exciting form factors.”
Astra is still in the early stages of development and is just one possible interface for a system like Gemini. The DeepMind team is still investigating how best to integrate multimodal models, and how to balance extremely large general models against smaller, more targeted ones.
We still live very much in the “speeds and feeds” era of AI, in which parameter counts are an obsession and every incremental model counts. But Hassabis predicts that we will soon start asking different questions about AI: what these assistants can do, how they do it, and how they can improve our lives. Although the technology is far from perfect, it is improving rapidly.