ChatGPT’s real-time video functionality lets the AI process visual data through a smartphone’s camera. Back in May, GPT-4o offered a sneak peek at the feature when it demonstrated the ability to observe a game and explain what was happening in a conversational fashion.
Last Thursday, OpenAI launched ChatGPT’s Advanced Voice Mode with Vision. The feature, which is rolling out to all ChatGPT Plus, Team, and Pro subscribers, lets the artificial intelligence (AI) chatbot use the smartphone’s camera to gather visual information about the user’s surroundings, adding real-time video and screen-sharing capabilities to Advanced Voice Mode. The AI company showed off the capability during its “12 Days of OpenAI” (Shipmas) event. The functionality draws on GPT-4o’s ability to deliver spoken responses in real time based on what the camera shows. The company first previewed its vision capabilities at ChatGPT’s Spring Update event in May.
Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week.
While you’ve been patiently waiting, we’ve added Custom Instructions, Memory, five new voices, and improved accents.
It can also say “Sorry I’m late” in over 50 languages. pic.twitter.com/APOqqhXtDg
— OpenAI (@OpenAI) September 24, 2024
The new ChatGPT feature went live on the sixth day of OpenAI’s 12-day feature release schedule. So far, the AI company has shipped the full version of the o1 model, the Sora video generation model, and a new Canvas tool. With Advanced Voice Mode with Vision, users can now let the AI view their surroundings and ask it questions about what it sees.
Users can share their screen with ChatGPT and ask for advice or for help with, say, a difficult math problem. At the bottom left, next to the ChatGPT chat bar, they will see a video icon they can tap to start the video. To share their screen instead, they can tap the three-dot menu and select “Share Screen.”
Users can also show the AI their refrigerator and ask for recipe ideas, or point the camera at their closet and ask for outfit suggestions. They can likewise point out a landmark outside and ask the AI questions about it. Combined with the chatbot’s expressive Advanced Voice Mode and low latency, the feature makes for natural, conversational interaction.
GPT-4o’s ability to observe a game and explain what was happening conversationally was the first hint of this functionality back in May. However, OpenAI reportedly felt the vision component was not ready for production, and it was delayed repeatedly.
Although some ChatGPT users gained access to Advanced Voice Mode in September, it still lacked vision at the time.
During a demonstration, OpenAI team members introduced several people to the chatbot while the camera was on. Afterwards, even when those individuals were no longer on screen, the AI could answer quiz questions about them. This shows that the vision mode has some memory, though the company did not specify how long it lasts.
ChatGPT Team, Plus, and Pro users can now access the feature in the iOS and Android mobile apps. Enterprise and Edu subscribers will be able to use ChatGPT Advanced Voice with Vision starting in early 2025, possibly January. However, while the majority of Plus and Pro users will get it, users in several European countries (Switzerland, Iceland, Norway, and Liechtenstein) will not have access to Advanced Voice Mode.
Once the feature is available, users can tap the Advanced Voice icon in the ChatGPT mobile app. The new interface shows a video option, and tapping it gives the AI access to the user’s camera stream. Tapping the three-dot menu reveals the screen-share feature.
With screen sharing, the AI can see the user’s device and whichever screens or apps they are using, so the chatbot can also help with smartphone-related questions and problems. Notably, OpenAI said the capability will roll out to all Team subscribers in the coming week via the latest version of the ChatGPT mobile app.
Google also revealed a similar feature a few days ago as part of Project Astra, which is currently undergoing Android testing.
To commemorate the holiday season, OpenAI also introduced “Santa Mode” as a default voice in ChatGPT.