Microsoft Research has introduced Muse, a new AI model that can generate game visuals and controller actions for video games.
On Wednesday, Microsoft researchers unveiled a new artificial intelligence (AI) model that can generate gameplay sequences. The model, called the World and Human Action Model (WHAM), or Muse, was created by Microsoft Research’s Game Intelligence and Teachable AI Experiences (Tai X) teams in partnership with Xbox Game Studios’ Ninja Theory. According to the company, the generative model can produce controller actions and game visuals to assist creatives in game development and aid game designers in the ideation stage.
To train the model, the teams used a large volume of gameplay data from the Xbox game Bleeding Edge. Muse can produce intricate gameplay sequences that remain consistent for up to two minutes at a time, demonstrating how well it has internalized the game’s physics and dynamics.
The Redmond-based tech giant described the Muse AI model in a blog post. While it is currently a research product, the company said it is open-sourcing the model’s weights and sample data, along with the WHAM Demonstrator, a concept prototype of a visual interface for interacting with the model, so developers can test it on Azure AI Foundry. A paper describing the model’s technical details has been published in the journal Nature.
The model weights, sample data, and WHAM Demonstrator tool are available on Azure AI Foundry for researchers to use and build upon in new projects.
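For developers who download the release, a first step might look like the following minimal PyTorch sketch. The checkpoint filename here is a hypothetical placeholder, and the actual loading procedure is whatever the Azure AI Foundry release documents; this only illustrates the general workflow.

```python
import torch

# Hypothetical filename: the real artefact name comes from the
# Azure AI Foundry download, not from this sketch.
CHECKPOINT = "wham_1.6b.ckpt"

# Load the released weights onto a GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
state_dict = torch.load(CHECKPOINT, map_location=device)

# Sanity-check the download by counting floating-point parameters;
# for WHAM-1.6B this should come out around 1.6 billion.
total = sum(v.numel() for v in state_dict.values()
            if torch.is_tensor(v) and v.is_floating_point())
print(f"Loaded ~{total / 1e9:.2f}B parameters")
```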
Training a model in such a complex domain is challenging. Microsoft researchers gathered a large quantity of human gameplay data from the 2020 Ninja Theory game Bleeding Edge: more than one billion image-action pairs, equivalent to over seven years of continuous human gameplay. Microsoft says the data was collected ethically and is used exclusively for research.
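A quick back-of-the-envelope check shows how the two reported figures relate, assuming the 10 frames-per-second rate implied by the demos described later in this piece (10 frames of prompting corresponding to one second of gameplay):

```python
# Relate "more than one billion image-action pairs" to
# "over seven years of continuous gameplay", assuming 10 fps.
FPS = 10
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # ~31.6 million seconds

years = 7
pairs = years * SECONDS_PER_YEAR * FPS
print(f"{years} years at {FPS} fps = {pairs / 1e9:.1f} billion image-action pairs")
# 7 years at 10 fps = ~2.2 billion pairs, consistent in order of
# magnitude with "more than one billion".
```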
One of the biggest challenges, according to the researchers, was scaling up model training. Muse was first trained on a cluster of Nvidia V100 GPUs before training was scaled up to multiple Nvidia H100 GPUs.
In terms of functionality, the Muse AI model accepts both visual prompts and controller actions. Once a game environment has been generated, controller actions can be used to modify it further: in response to the user’s moves, the AI generates new visuals that remain consistent with the rest of the game and the original prompt.
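Conceptually, this is a standard world-model rollout: condition on the frames and actions so far, predict the next frame, append it, and repeat. Below is a minimal sketch of that loop, with a hypothetical `predict_next_frame` interface standing in for Muse’s actual API.

```python
import torch

def generate_gameplay(model, prompt_frames, actions, horizon=600):
    """Illustrative autoregressive rollout (not Muse's published API).

    prompt_frames: (T, C, H, W) tensor -- the visual prompt
    actions:       (T + horizon, A) tensor -- controller inputs per step
    horizon:       frames to generate; 600 is roughly one minute at 10 fps
    """
    frames = list(prompt_frames)
    with torch.no_grad():
        for _ in range(horizon):
            # Condition on all frames generated so far, plus the
            # controller actions up to and including the next step.
            context = torch.stack(frames)
            next_frame = model.predict_next_frame(
                context, actions[: len(frames) + 1])
            frames.append(next_frame)
    return torch.stack(frames)
```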
Since Muse is a novel kind of model, its capabilities cannot be accurately assessed with standard benchmark tests. Instead, the researchers validated it internally on parameters including consistency, diversity, and persistency. Because the model is primarily a research effort, its outputs are limited to a resolution of 300×180 pixels.
The team ran into difficulties when scaling up model training and in making the best use of feedback from game creatives. They took a multidisciplinary approach, involving game designers from diverse backgrounds to help shape the technology from the outset.
Muse can generate a range of gameplay variants from a single starting point, incorporate user modifications into the gameplay, and produce sequences that adhere to the game’s rules. It could help with game ideation and lead to new AI-based gaming experiences.
As described in the Nature publication led by Katja Hofmann, Senior Principal Research Manager and head of the Game Intelligence team, the model produces game visuals and controller actions based on its learned understanding of 3D game environments.
Muse was trained on more than one billion image-action pairs from Ninja Theory’s Bleeding Edge, comparable to over seven years of nonstop human gameplay; the released model, WHAM-1.6B, has 1.6 billion parameters.
Muse was created in a partnership between Microsoft Research and Ninja Theory, both based in Cambridge, UK. The model was trained on gameplay data (visuals and controller actions at 300×180 pixel resolution) from the 2020 4v4 online game Bleeding Edge, collected ethically with player consent under the game’s End User License Agreement.
“I’m incredibly proud of our teams and the milestone we have achieved,” said Katja Hofmann, highlighting Muse’s capacity to learn complex game dynamics. To encourage further study, Microsoft is making the weights, sample data, and the WHAM Demonstrator, a visual interface for working with WHAM models, publicly available on Azure AI Foundry.
Muse generates complex gameplay sequences up to two minutes long. In demos, it predicts how the game will progress in “world model mode” after being prompted with 10 frames (one second) of human gameplay and the corresponding controller inputs.
Its outputs closely capture Bleeding Edge’s dynamics. Examples demonstrate both visual variety (such as different hoverboards) and behavioural variety (such as camera adjustments and path choices).
Assessed on consistency, diversity, and persistency, Muse generates gameplay that adheres to the game’s physics, offers varied outcomes, and preserves user modifications, such as a character inserted into a scene, which it then incorporates naturally into subsequent frames.
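Persistency, in the researchers’ sense, means an edit the user makes to a prompt frame should survive in the model’s continuation. Below is a toy probe of that property, reusing the rollout sketch above; the frame-editing interface and the naive pixel comparison are assumptions for illustration, not the paper’s actual evaluation.

```python
import torch

def persistence_probe(model, edited_frames, actions, edit_mask, horizon=100):
    """Toy persistency check (illustrative, not the paper's metric).

    edited_frames: prompt frames with the user's edit already applied
    edit_mask:     (H, W) boolean mask over the edited pixels
    """
    # Roll out the model from the edited prompt.
    rollout = generate_gameplay(model, edited_frames, actions, horizon)

    # Compare the edited region of each generated frame against the
    # edit itself; low error in later frames suggests the edit
    # persisted (this naive pixel check assumes a static camera).
    target = edited_frames[-1][:, edit_mask]            # (C, N)
    return [(frame[:, edit_mask] - target).abs().mean().item()
            for frame in rollout]
```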