TechBooky

OpenAI Launches New Audio Models for Agentic Workflows

by Akinola Ajibola
March 22, 2025
in Artificial Intelligence

“With releases like Operator, Deep Research, Computer-Using Agents, and the Responses API with built-in tools, we’ve invested in advancing the intelligence, capabilities, and usefulness of text-based agents (systems that independently accomplish tasks on behalf of users) over the past few months. But for agents to be truly useful, people must be able to have deeper, more intuitive interactions with agents beyond just text, using natural spoken language to communicate effectively,” OpenAI said in its announcement.

OpenAI released new audio models with improved accuracy and reliability through its application programming interface (API) on Thursday. The San Francisco-based AI company launched three new artificial intelligence (AI) models for speech-to-text transcription and text-to-speech (TTS). According to the company, these models will let developers build apps with agentic workflows, and businesses can use the API to automate tasks such as customer service. The new models are built on the company’s GPT-4o and GPT-4o mini AI models.

OpenAI says the new speech-to-text and text-to-speech models will enable developers to build more powerful, customizable, and intelligent voice agents. According to the company, its latest speech-to-text models set a new state of the art, outperforming existing solutions in accuracy and reliability, particularly in challenging scenarios involving accents, noisy environments, and varying speech speeds. These improvements increase transcription reliability, which makes the models well suited to use cases such as customer call centres and meeting note transcription.

In a blog post, the AI firm detailed the new API-specific models. The company said that over the past few months it has released several agent products and tools, such as Operator, Deep Research, Computer-Using Agents, and the Responses API with built-in tools. It added, however, that agents will not reach their full potential until they can communicate and work intuitively beyond text.

Three new audio models are available. GPT-4o-transcribe and GPT-4o-mini-transcribe are speech-to-text models, while GPT-4o-mini-tts is, as the name implies, a TTS model. According to OpenAI, these models outperform the company’s existing Whisper models, which were introduced in 2022. Unlike Whisper, however, the new models are not open source.

The AI company said that GPT-4o-transcribe achieves a lower word error rate (WER) on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, which evaluates models on multilingual speech across more than 100 languages. According to OpenAI, the improvements come from targeted training techniques, including reinforcement learning (RL) and extensive mid-training on high-quality audio datasets.
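
For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, assuming simple lowercasing and whitespace tokenization; it is not OpenAI's or FLEURS's evaluation code, and the function name `word_error_rate` is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better; a WER of 0 means the transcript matches the reference word for word. Real benchmarks usually apply more careful text normalization before scoring.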

According to OpenAI, these speech-to-text models can capture audio accurately even in challenging conditions such as noisy environments, heavy accents, and varying speech speeds.

The GPT-4o-mini-tts model also brings significant improvements, although it notably offers only artificial, preset voices. According to the AI company, the model can speak with customizable inflections, intonations, and emotional expressiveness, allowing developers to build applications for tasks ranging from customer service to creative storytelling.

According to OpenAI’s API pricing page, the GPT-4o-based audio model costs $40 per million input tokens and $80 per million output tokens, while the GPT-4o mini-based audio models cost $10 per million input tokens and $20 per million output tokens.
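
To make the per-token rates concrete, here is a small sketch that estimates a bill from token counts. It uses only the rates quoted above; the tier labels, the `TokenRates` class, and the `estimate_cost` function are illustrative names rather than anything from OpenAI's API, and current prices should always be checked against the pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    input_per_million: float   # USD per one million input tokens
    output_per_million: float  # USD per one million output tokens


# Rates as reported in this article; the tier keys are made-up labels.
RATES = {
    "gpt-4o-based": TokenRates(40.0, 80.0),
    "gpt-4o-mini-based": TokenRates(10.0, 20.0),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    rates = RATES[tier]
    total = (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    )
    return total / 1_000_000


# 500,000 input tokens and 250,000 output tokens on each tier.
for tier in RATES:
    print(tier, estimate_cost(tier, 500_000, 250_000))
```

At these rates the same workload costs four times as much on the GPT-4o-based tier as on the GPT-4o mini-based tier.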

All of the audio models are now available to developers through the API. To help developers build voice agents, OpenAI is also releasing an integration with its Agents software development kit (SDK).
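
As a rough idea of what calling the models looks like, the sketch below wraps the audio endpoints of OpenAI's official Python library (`openai`). It assumes the lowercase model identifiers `gpt-4o-transcribe` and `gpt-4o-mini-tts`, the preset voice `alloy`, and a client created elsewhere with a valid API key; the helper names `transcribe_file` and `synthesize_speech` are invented for this example, and the official API reference remains the authority on parameters.

```python
def transcribe_file(client, audio_path, model="gpt-4o-transcribe"):
    """Send an audio file to the transcription endpoint and return the text."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def synthesize_speech(client, text, out_path, voice="alloy",
                      instructions=None, model="gpt-4o-mini-tts"):
    """Turn text into speech, save the audio, and return the byte count."""
    params = {"model": model, "voice": voice, "input": text}
    if instructions is not None:
        # Optional free-text guidance on tone and delivery.
        params["instructions"] = instructions
    response = client.audio.speech.create(**params)
    audio_bytes = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio_bytes)
    return len(audio_bytes)


# Example usage (requires the `openai` package and an OPENAI_API_KEY):
#   from openai import OpenAI
#   client = OpenAI()
#   print(transcribe_file(client, "meeting.wav"))
#   synthesize_speech(client, "Your order has shipped.", "reply.mp3",
#                     instructions="Speak in a calm, friendly tone.")
```

Passing the client in as an argument keeps the helpers easy to test with a stand-in object; the file names in the usage comment are placeholders.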

OpenAI shares more about the technical innovations behind the models

  • Utilizing Real Audio Datasets for Pretraining

According to OpenAI, the new audio models build on the GPT-4o and GPT-4o-mini architectures and are extensively pretrained on specialized audio-centric datasets to maximize model performance. This targeted approach gives the models a deeper understanding of speech nuances and enables strong performance across audio-related tasks.

  • Sophisticated Techniques for Distillation

OpenAI says it has improved its distillation techniques, which transfer knowledge from its largest audio models to smaller, more efficient ones. Using advanced self-play methods, its distillation datasets capture realistic conversational dynamics and so replicate genuine user-assistant interactions. This helps the smaller models deliver strong conversational quality and responsiveness.

  • A Reinforcement Learning Paradigm

For its speech-to-text models, OpenAI adopted a reinforcement learning (RL)-heavy paradigm, which it says pushes transcription accuracy to state-of-the-art levels. According to the company, this approach significantly improves precision and reduces hallucination, making its speech-to-text models especially competitive in difficult speech recognition scenarios.

According to OpenAI, these advances mark a step forward in audio modelling, combining new techniques with practical improvements to voice application performance.

Tags: AI, audio models, OpenAI