Microsoft Launches MAI Models — Building Its Own AI Stack to Reduce Dependence on OpenAI

Microsoft has released three in-house AI models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — developed by Mustafa Suleyman’s MAI Superintelligence team, while also moving forward with a $10B investment in Japan. Together, these moves clearly signal that Microsoft is building a full-stack AI strategy to reduce reliance on OpenAI.

4 Apr 2026 · 12 min read · TechCrunch
Tags: Microsoft · AI · MAI · Foundational Models · Cloud · Enterprise AI

Introduction — When Microsoft Decides to “Build” Instead of “Buy from OpenAI”

If you work on Azure, use Microsoft 365, or are planning your AI strategy on the Microsoft stack, this is one of the most important stories of the week.

On April 2, 2026, Microsoft announced three foundational AI models developed entirely in-house under the MAI (Microsoft AI) brand. These are not from OpenAI, not licensed from another provider, but built by the MAI Superintelligence team led by Mustafa Suleyman, the DeepMind co-founder who now serves as CEO of Microsoft AI.

This marks Microsoft’s most significant turning point in AI since it first began investing in OpenAI.


Why Does Microsoft Need to “Build Its Own”?

Over the past three years, Microsoft has invested more than $13 billion in OpenAI and made OpenAI models the core of everything from Copilot to Azure OpenAI Service.

But that strategy has come with increasingly obvious risks:

  • Single point of dependency — if OpenAI runs into problems, Microsoft is affected too
  • Pricing control — Microsoft does not control OpenAI model pricing
  • Competitive tension — OpenAI is now building products that compete directly with Microsoft, such as ChatGPT Enterprise
  • Speed of innovation — Microsoft has had to wait for OpenAI to release new models instead of setting its own roadmap

In February 2026, Mustafa Suleyman made it clear that Microsoft was moving toward “AI self-sufficiency” — the ability to develop advanced AI models on its own without depending on any single partner.

Suleyman said Microsoft needs to build its own foundational models with gigawatt-scale computing power.

And on April 2, 2026, those words became reality.


The 3 MAI Models — Built for Real Users, Not Just Benchmarks

What stands out about the MAI models is their design philosophy. Suleyman said the team approached model development with a clear goal: “putting humans at the center” — designing models around how people actually communicate, rather than simply chasing leaderboard scores.

MAI-Transcribe-1 — Speech Recognition in 25 Languages, More Accurate Than Whisper

Key specs:

  • Type: Speech Recognition (Speech-to-Text)
  • Languages: 25
  • Benchmark: Ranked #1 on the FLEURS benchmark (WER 3.8%)
  • Comparison: Outperformed Whisper-large-v3 in 14 of 25 languages; beat Gemini 3.1 Flash in 11 of 14 languages
  • Speed: 2.5x faster than the previous Azure Fast model
  • GPU Cost: Around 50% cheaper than leading competitors
  • Price: $0.36 per audio hour

For organizations transcribing meetings, call center conversations, or podcasts, these numbers matter — both in terms of accuracy and cost.
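To make the listed $0.36-per-audio-hour rate concrete, here is a back-of-the-envelope cost estimate. The monthly call volume below is an illustrative assumption, not a figure from the announcement:

```python
# Back-of-the-envelope transcription cost at the announced MAI-Transcribe-1 rate.
# The call-center volume below is an illustrative assumption.
RATE_PER_AUDIO_HOUR = 0.36  # USD, per the announced pricing

def monthly_transcription_cost(hours_per_month: float) -> float:
    """Cost in USD of transcribing the given number of audio hours in a month."""
    return hours_per_month * RATE_PER_AUDIO_HOUR

# e.g. a call center recording 200 agent-hours per day over 22 working days:
hours = 200 * 22  # 4,400 audio hours
print(f"{hours} audio hours -> ${monthly_transcription_cost(hours):,.2f}/month")
```

At that assumed volume the bill stays in the low four figures per month, which is why per-hour pricing matters as much as accuracy for high-volume workloads.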

MAI-Voice-1 — Generate 60 Seconds of High-Quality Speech in 1 Second

Key specs:

  • Type: Speech Generation (Text-to-Speech)
  • Speed: Generates 60 seconds of audio in under 1 second on a single GPU
  • Personal Voice: Can clone a voice from a 10-second sample
  • Use Cases: Powers Copilot Audio Expressions and podcast features
  • Price: $22 per 1 million characters

At this level of speed, real-time voice AI in business applications becomes genuinely practical — without waiting 5–10 seconds for the system to generate a spoken response.
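Per-character pricing also makes speech generation cheap at typical script lengths. A quick estimate at the listed $22 per 1M characters rate (the script length is an illustrative assumption):

```python
# Cost estimate at the announced MAI-Voice-1 text-to-speech rate.
# The script length below is an illustrative assumption.
RATE_PER_MILLION_CHARS = 22.0  # USD, per the announced pricing

def tts_cost(num_chars: int) -> float:
    """Cost in USD of synthesizing a script of the given character count."""
    return num_chars / 1_000_000 * RATE_PER_MILLION_CHARS

# A ~10-minute training voiceover is roughly 9,000 characters of script:
print(f"${tts_cost(9_000):.3f} per voiceover")
```

At roughly $0.20 for a ten-minute voiceover, the economics favor generating audio on demand rather than recording it.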

MAI-Image-2 — Image Generation Ranked Among the World’s Top 3

Key specs:

  • Type: Text-to-Image Generation
  • Benchmark: Ranked #3 on the Arena.ai leaderboard (image model families)
  • Strengths: Photorealistic generation, in-image text rendering, complex layouts
  • Use Cases: Rolling out in Bing and PowerPoint
  • Price: $5 per 1M text input tokens / $33 per 1M image output tokens

The fact that MAI-Image-2 is being integrated directly into PowerPoint is a perfect example of where Microsoft has an advantage over OpenAI: connecting models directly into products people already use every day.


MAI Superintelligence — The Team Behind It All

All three models were developed by MAI Superintelligence, an AI research team established in November 2025 under Mustafa Suleyman’s leadership.

Why does Suleyman matter?

  • DeepMind co-founder — helped build one of the world’s most influential AI companies, now part of Google
  • CEO of Inflection AI — built the Pi chatbot before joining Microsoft
  • CEO of Microsoft AI — now oversees both AI products and model development

Microsoft bringing in Suleyman and setting up the Superintelligence team was a strong signal that the company was serious about building its own AI stack — not merely experimenting.

Just six months after the team was formed, MAI Superintelligence has already launched three models — rapid execution even by Big Tech standards.


Microsoft Foundry — A New Platform That Brings Everything Together

All three MAI models are available through Microsoft Foundry — the new name for Azure AI Studio — a platform that now brings together more than 1,900 AI models from multiple developers, including OpenAI, Anthropic, Meta, Mistral, DeepSeek, and NVIDIA.

What makes Microsoft Foundry different:

  • Model Catalog — quickly choose the right model for a specific use case
  • Benchmarking — compare model performance side by side
  • Enterprise Security — full governance and compliance capabilities
  • Serverless Deployment — pay-as-you-go usage without provisioning infrastructure
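To illustrate what "serverless, pay-as-you-go" means in practice, here is a hedged sketch of calling a deployed model over plain HTTPS. The endpoint URL, auth scheme, and JSON payload shape are hypothetical placeholders, not documented Foundry API details — consult the Microsoft Foundry documentation for the real contract:

```python
# Hedged sketch of a serverless model call over HTTPS. The endpoint URL,
# bearer-token auth, and payload shape are HYPOTHETICAL placeholders --
# the real API contract lives in the Microsoft Foundry documentation.
import json
import urllib.request

ENDPOINT = "https://example.invalid/deployments/mai-transcribe-1/invoke"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder credential

def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to a serverless deployment."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )

# Sending is a single call; the caller provisions no VMs or GPUs:
# with urllib.request.urlopen(build_request({"audio_url": "..."})) as resp:
#     result = json.load(resp)
```

The point of the sketch is the shape of the workflow: one authenticated HTTPS call, metered per use, with no infrastructure to manage on the caller's side.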

Microsoft has also launched MAI Playground (playground.microsoft.ai), where users can try the MAI models directly; it is rolling out first in the United States.


Pricing Strategy — Undercutting Competitors to Win Customers

In a market crowded with AI models, Microsoft is using competitive pricing as one of MAI’s main selling points:

  • MAI-Transcribe-1: $0.36 per audio hour (around 50% lower GPU cost than competitors)
  • MAI-Voice-1: $22 per 1M characters (extremely fast generation means lower compute usage)
  • MAI-Image-2: $5 per 1M text input tokens / $33 per 1M image output tokens (competitive with DALL-E 3 and Midjourney)

The strategy is clear: pull Azure customers toward MAI instead of OpenAI by offering lower pricing and tighter integration.


$10 Billion in Japan — Another Piece of the Puzzle

Just one day after announcing the MAI models, Microsoft also unveiled a $10 billion investment plan in Japan for 2026–2029. Brad Smith, Microsoft’s Vice Chair and President, traveled to Tokyo to announce it in person.

The 3 Pillars of the Investment

  1. Technology — expand AI infrastructure inside Japan in partnership with Sakura Internet and SoftBank
  2. Trust — deepen cybersecurity collaboration with national institutions in Japan
  3. Talent — train more than 1 million engineers and developers by 2030

Why Japan?

  • Japan is the largest enterprise market in Asia
  • Strong demand for data sovereignty — data must stay within the country
  • Strong local partners already in place — Fujitsu, Hitachi, NEC, NTT Data, SoftBank

The market responded immediately: Sakura Internet shares jumped 20% after the announcement.

What the Market Should Read Between the Lines

This $10B investment in Japan is not just about data centers — it is really about laying the regional infrastructure for MAI models. If Microsoft truly wants to reduce dependence on OpenAI, it needs its own compute power in every major region.


The Bigger Picture — Microsoft Is Building a “Full AI Stack”

If you look at each piece separately — MAI models, Foundry, the Japan investment — it may seem like routine news.

But taken together, the strategy becomes very clear:

Layer 1: Compute Infrastructure
├── Azure data centers worldwide
├── $10B in Japan + investments in SEA, India, and the EU
└── GPU partners (Sakura Internet, SoftBank)

Layer 2: Foundational Models
├── MAI models (Transcribe-1, Voice-1, Image-2)
├── OpenAI models (GPT, DALL-E, Whisper) ← still in use
└── Partner models (Claude, Llama, Mistral)

Layer 3: Platform
├── Microsoft Foundry (model catalog + deployment)
├── Azure AI Services
└── MAI Playground

Layer 4: Products
├── Microsoft 365 Copilot
├── Bing
├── PowerPoint, Word, Teams
└── GitHub Copilot

Microsoft is building every layer so it can operate more independently — while still keeping the platform open to partner models. This is a very smart multi-model strategy:

  • It does not cut ties with OpenAI overnight, because Microsoft still needs GPT for text generation
  • But it is gradually building its own alternatives across every modality
  • It gives Azure customers freedom of choice while steadily making MAI the default over time

What’s Still Missing — A Text Generation Model

One notable gap: none of the three MAI models is a large language model for text generation.

That means Microsoft still depends on OpenAI for GPT — the core engine behind Copilot.

Suleyman has acknowledged this directly, saying Microsoft “still cannot build the very largest models” today, but is expanding compute capacity so it can do so in 2026.

This is the final missing piece of the puzzle. If Microsoft can successfully build its own LLM, its dependence on OpenAI will drop significantly.


What This Means for Azure Customers and Thai Enterprises

For organizations already using Azure

  1. More options — no need to rely on OpenAI models alone
  2. Potentially lower costs — competition between MAI and OpenAI on Azure should push prices down
  3. Tighter integration — MAI models are designed specifically to work with Microsoft products
  4. Multi-model flexibility — choose the best model for each use case through Foundry

For Thai organizations using the Microsoft stack

  • Call Center / Customer Service — MAI-Transcribe-1 supports multiple languages at roughly 50% lower cost
  • Content Creation — MAI-Image-2 in PowerPoint can accelerate presentation creation
  • Internal Communications — MAI-Voice-1 can generate voiceovers for training videos or company announcements
  • Data Sovereignty — Microsoft’s investments in Asia (Japan $10B, Thailand $1B+) mean data infrastructure is getting closer to home

Things to Watch Out For

  • MAI models are still in public preview — not all are generally available yet
  • MAI Playground is currently launching in the US first — APAC users will have to wait
  • It is still unclear whether MAI-Transcribe-1 supports Thai language specifically (it supports 25 languages, but the full list has not yet been disclosed)

A Leadership Lesson — “Never Depend on a Single Vendor”

What Microsoft is doing in response to OpenAI offers an important lesson for every organization:

Even a company that invested $13 billion in a partner still needs to build its own alternatives.

For Thai enterprises, that lesson translates into four practical takeaways:

  1. Do not rely on a single AI vendor — design systems so models can be swapped out
  2. Invest in an abstraction layer — separate business logic from the underlying AI model
  3. Watch the pricing war closely — AI prices are falling fast; the long-term contract you signed yesterday may already be too expensive tomorrow
  4. Prepare a multi-model strategy — use the best model for each task instead of forcing one model to do everything
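The abstraction-layer advice in points 1 and 2 can be sketched concretely: define a narrow interface for each capability and hide each vendor behind an adapter, so swapping models is a configuration change rather than a rewrite. The class and provider names below are illustrative, and the adapters return canned strings where real API calls would go:

```python
# Sketch of a thin vendor-abstraction layer for transcription. Class names and
# return values are illustrative; real adapters would call each vendor's API.
from typing import Protocol

class Transcriber(Protocol):
    """The only surface business logic is allowed to depend on."""
    def transcribe(self, audio_path: str) -> str: ...

class MaiTranscriber:
    """Illustrative adapter for a MAI-Transcribe-1 deployment."""
    def transcribe(self, audio_path: str) -> str:
        return f"[mai] transcript of {audio_path}"  # real API call goes here

class WhisperTranscriber:
    """Illustrative adapter for an OpenAI Whisper deployment."""
    def transcribe(self, audio_path: str) -> str:
        return f"[whisper] transcript of {audio_path}"  # real API call goes here

def summarize_call(transcriber: Transcriber, audio_path: str) -> str:
    # Business logic sees only the Transcriber interface, so changing vendors
    # (or routing per use case, point 4) touches configuration, not this code.
    text = transcriber.transcribe(audio_path)
    return text.upper()  # stand-in for real post-processing

print(summarize_call(MaiTranscriber(), "meeting.wav"))
```

The same pattern applies per modality — one interface each for transcription, speech generation, and image generation — which is exactly what makes a pricing war useful to the buyer instead of disruptive.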

Conclusion — The Beginning of “AI Independence”

Microsoft’s MAI models are the beginning, not the end, of its effort to reduce dependence on OpenAI:

  • The first 3 models: Transcribe-1, Voice-1, and Image-2 cover speech recognition, speech generation, and image generation
  • An LLM is still missing — but Suleyman says it is coming in 2026
  • $10B in Japan plus global investments — building the compute infrastructure to support it
  • Microsoft Foundry — the platform that brings all models together in one place, both MAI and partner offerings

For organizations already using the Microsoft stack, this is good news: more choice, lower costs, and tighter integration.

But for OpenAI, it is a clear warning sign that its largest customer is preparing an exit path.


Ready to Build Your AI Strategy on the Microsoft Stack?

The Enersys team has experience helping Thai organizations plan AI strategy on Azure — from selecting the right models and designing multi-model architectures to deploying systems into production.

Whether you are just getting started or looking for ways to optimize AI costs, we can help.

Talk to the Enersys team for free

