Introduction — When Frontier AI Becomes Free and Runs on Your Phone
Imagine a top-3 AI model in the world that doesn’t need to send data to the cloud, doesn’t require monthly API fees, and doesn’t create constant anxiety around data privacy — and it runs directly on the phone in your hand.
That’s not a vision for five years from now. It’s what Google Gemma 4 is making possible in April 2026.
On April 2, 2026, Google announced Gemma 4 — its most powerful open AI model family yet, spanning 4 sizes from 2B to 31B parameters. It supports full multimodal capabilities across text, image, video, and audio, and now comes under the Apache 2.0 license, giving businesses full freedom for commercial use.
Most importantly, the smaller models are designed to run directly on Android devices through the newly available AICore Developer Preview.
For Thai enterprises, startups, and developers looking to build AI into products, this is a game changer worth paying attention to.
What Is Gemma 4? — Four Sizes for Every Use Case
Gemma 4 is built on the same core technology as Gemini 3 (Google’s flagship model), but optimized to be small enough to run on a wide range of hardware — from smartphones to servers.
The 4 Core Model Sizes
| Model | Total Parameters | Active Parameters | Context Window | Type |
|---|---|---|---|---|
| E2B | 5.1B | 2.3B | 128K | Dense |
| E4B | 8B | 4.5B | 128K | Dense |
| 26B | 26B | 4B | 256K | MoE (Mixture of Experts) |
| 31B | 31B | 31B | 256K | Dense |
One of the most interesting details is the “E” (Effective) label — it refers to the number of parameters actually activated during inference. Google designed these models with more total parameters than are used at runtime, helping improve quality without consuming too much RAM or battery.
The 26B model uses a Mixture of Experts (MoE) architecture, with 26B total parameters but only 4B activated per request — delivering performance close to the 31B model while using far fewer resources.
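The routing idea behind that trade-off can be sketched in a few lines: a small router scores every expert for each input, but only the top-k experts actually run, so compute scales with active parameters rather than total parameters. Everything below (expert count, dimensions, `top_k=2`) is illustrative and not Gemma 4's actual configuration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts step: score all experts, run only the top-k."""
    logits = x @ gate_w                      # router score for each expert
    top = np.argsort(logits)[-top_k:]        # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the selected experts only
    # Only the chosen experts execute; the rest of the parameters stay idle.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
gate_w = rng.standard_normal((4, 8))         # router for 8 hypothetical experts
experts = [(lambda s: (lambda v: v * s))(s) for s in range(8)]
y = moe_forward(x, gate_w, experts)
```

With 8 experts but `top_k=2`, only a quarter of the expert parameters are exercised per input, which is the same principle that lets the 26B model answer with roughly 4B active parameters.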
Why Gemma 4 Matters — The Numbers Speak for Themselves
Key Benchmark Results
The 31B Dense model in Gemma 4 ranks #3 in the world on the LMArena text leaderboard, with an Elo score of around 1452. The 26B MoE model ranks #6 with a score of 1441, despite activating only 4B parameters per request.
Notable benchmark results:
| Benchmark | 31B | 26B (MoE) | E4B | E2B |
|---|---|---|---|---|
| MMLU Pro (general knowledge) | 85.2% | 82.6% | 69.4% | 60.0% |
| AIME 2026 (math) | 89.2% | 88.3% | 42.5% | 37.5% |
| LiveCodeBench v6 (coding) | 80.0% | 77.1% | 52.0% | 44.0% |
| MMMU Pro (multimodal) | 76.9% | 73.8% | 52.6% | 44.2% |
The 26B model can outperform models more than 20 times larger. This is what Google describes as breakthrough “intelligence-per-parameter.”
Full Multimodal Capability in a Single Model Family
Gemma 4 is not just a text model — it can understand multiple modalities together.
What Each Model Size Can Do
| Capability | E2B | E4B | 26B | 31B |
|---|---|---|---|---|
| Text | Yes | Yes | Yes | Yes |
| Image (OCR, charts, photos) | Yes | Yes | Yes | Yes |
| Video | Yes | Yes | Yes | Yes |
| Audio (speech, transcription) | Yes | Yes | No | No |
An important detail: the smaller models (E2B, E4B) support audio, while the larger ones do not. That’s because Google specifically designed the smaller models for on-device use cases where they can process microphone input directly.
What You Can Do Right Away
- OCR — Extract text from images, receipts, and documents
- Chart Understanding — Analyze graphs, charts, and dashboard visuals
- GUI Detection — Recognize elements on an app screen
- Audio Transcription — Convert speech to text
- Video Understanding — Analyze video content
- Bounding Box Detection — Identify object locations in images
And with E2B or E4B running on-device, all of this can happen without sending data to the cloud.
Support for 140+ Languages — Including Thai
Gemma 4 was trained from the ground up to support more than 140 languages, including Thai. This is not just “basic support” — it’s native multilingual training, which helps the model better understand context and meaning across languages.
For Thai businesses building AI products for Thai users, that matters a lot. You don’t need to fine-tune from scratch or force prompts into English first.
Apache 2.0 — A Major Licensing Shift
One of the most important — and potentially overlooked — changes is that Gemma 4 has moved from the custom Gemma license to full Apache 2.0.
Why This Matters
| Issue | Previous Gemma License | New Apache 2.0 |
|---|---|---|
| Commercial use | Allowed (with terms) | Allowed freely |
| Modification | Allowed | Allowed |
| Redistribution | Restricted | Fully open |
| Use in paid products | Must review terms | Ready to use |
| Compatibility with other OSS | Potential issues | Highly compatible |
Apache 2.0 is one of the licenses that enterprises trust most. In many companies, legal teams can approve Apache 2.0 software immediately without a lengthy review process.
For startups and software vendors embedding AI into products, this is effectively Google saying: go ahead and ship it.
Agentic AI — Building AI Agents That Think and Act
Gemma 4 isn’t designed only for answering questions. It’s built to support AI agents that can:
- Reason — Use chain-of-thought style reasoning through thinking tokens
- Call tools — Support built-in function calling for external APIs
- Make decisions — Evaluate tool outputs and decide what to do next
- Handle workflows — Go beyond one-shot Q&A and execute multi-step tasks
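The reason-call-decide loop above can be sketched as a minimal agent driver: the model either requests a tool or produces a final answer, and tool results are fed back into the conversation. The message shapes, the `"tool"`/`"content"` keys, and the stub model below are all assumptions for illustration, not Gemma 4's actual function-calling schema.

```python
import json

def run_agent(model, tools, user_msg, max_steps=5):
    """Minimal tool-calling loop: the model either answers or asks for a tool."""
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = model(history)                  # model decides the next action
        if reply.get("tool"):                   # model requested a tool call
            result = tools[reply["tool"]](**reply.get("args", {}))
            history.append({"role": "tool", "content": json.dumps(result)})
        else:
            return reply["content"]             # final answer for the user
    return "max steps reached"

# Stub model standing in for an on-device LLM: first it asks for a lookup,
# then it answers once the tool result is in the history.
def stub_model(history):
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "lookup_order", "args": {"order_id": "A1"}}
    return {"content": "Order A1 has shipped."}

tools = {"lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"}}
answer = run_agent(stub_model, tools, "Where is order A1?")
```

Swapping the stub for a real on-device model and the lambda for a real backend call is all that changes in a production version of this loop; the control flow stays the same.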
Example Use Cases
- Customer Service Agent — Answer customer questions, check backend systems, and respond automatically, all on-device without sending customer data externally
- Field Service Assistant — A technician takes a photo of a machine, and the model analyzes it to recommend repair steps, even without internet access
- Document Processing — Read invoices, receipts, and business documents and extract key information, all processed on-device to protect privacy
“Once AI agents can run on-device, use cases previously blocked by latency, privacy, and connectivity concerns suddenly become practical.”
AICore Developer Preview — Available on Android Today
Google has introduced AICore Developer Preview, allowing developers to start testing Gemma 4 directly on supported Android devices.
What You Get with AICore Developer Preview
- Direct access to E2B and E4B models on supported devices
- ML Kit GenAI Prompt API for easier AI feature development through standard APIs
- Support for hardware accelerators from Google, MediaTek, and Qualcomm
- A path toward Gemini Nano 4, which will ship on flagship Android devices later this year
Big Performance Gains
Compared with previous generations:
- Up to 4x faster inference
- Up to 60% lower battery usage
- E2B is up to 3x faster than E4B for use cases that need quick responses
Gemma 4 and Gemini Nano 4
A key relationship to understand: Gemma 4 is the foundation of Gemini Nano 4. Code written today using Gemma 4 through ML Kit will run directly on Gemini Nano 4 on new flagship devices launching later this year.
This gives developers a chance to start building on-device AI features now, without waiting for the next hardware cycle.
A Strong Community — 400 Million Downloads
Since the first Gemma release, developers worldwide have downloaded Gemma models more than 400 million times and created over 100,000 customized variants — what Google calls the “Gemmaverse.”
Why that matters:
- A large ecosystem means it’s easier to find help and solutions
- Many variants means there are already fine-tuned models for specialized domains
- Day-0 support from major inference engines including transformers, llama.cpp, MLX, ONNX, and more
What This Means for Thai Developers and Businesses
Gemma 4 is more than just another AI product announcement — it changes the economics and practical reality of AI in several important ways.
1. AI Costs Drop to Near Zero for Inference
When models run on-device, there are no API fees, no cloud compute charges, and no per-token pricing. For startups worried about the unit economics of AI features, this is a real solution.
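The unit-economics shift is easy to make concrete. The figures below (tokens per user, user count, price per million tokens) are purely illustrative assumptions, not real quotes from any provider:

```python
def monthly_cloud_cost(tokens_per_user, users, usd_per_million_tokens):
    """Cloud inference bill for one month (illustrative arithmetic only)."""
    return tokens_per_user * users * usd_per_million_tokens / 1_000_000

# Hypothetical app: 10,000 users, each generating 50,000 tokens per month,
# at an assumed $0.50 per million tokens.
cloud_bill = monthly_cloud_cost(50_000, 10_000, 0.50)   # 500M tokens total
on_device_bill = 0.0  # inference runs on the user's phone, so no per-token cost
```

Whatever the real prices turn out to be, the cloud bill scales linearly with usage while the on-device bill stays at zero, which is why per-token pricing stops being a constraint on product design.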
2. Data Privacy Is No Longer a Barrier
For industries handling sensitive data — finance, healthcare, legal — on-device AI means the data never leaves the device. That aligns naturally with PDPA requirements and data residency concerns.
3. Offline AI Is Now Real
Construction sites, factories, and remote locations with unreliable internet can still use AI effectively, because everything runs locally on the device.
4. Thai Language Support Is Ready from Day One
There’s no need to fine-tune Thai language capability from scratch. Gemma 4 supports 140+ languages natively, reducing both time and cost for teams building AI products for the Thai market.
5. Apache 2.0 Means You Can Ship Faster
Most corporate legal teams in Thailand are already familiar with Apache 2.0. There’s no need for extended review of unusual licensing terms or hidden restrictions.
How Thai Organizations Can Get Started
Short Term (Start Immediately)
- Test Gemma 4 E2B/E4B through AICore Developer Preview on Android devices
- Identify use cases that need privacy, low latency, or offline capability
- Assess your data — even the best model still needs high-quality input data
Mid Term (Q2–Q3 2026)
- Fine-tune for specific domains using your organization’s own data
- Build prototypes of on-device AI agents for the selected use cases
- Prepare for Gemini Nano 4 — code written for Gemma 4 will transfer directly
Long Term (H2 2026+)
- Deploy on-device AI as a core feature inside the organization’s mobile app
- Create AI-first experiences — once AI lives on every device, user experience changes fundamentally
Strategic View — The Bigger Picture
Gemma 4 is part of a much larger shift in the AI industry: moving intelligence from the cloud to the edge and the device itself.
- Apple Intelligence is bringing more AI on-device to iPhone
- Qualcomm is pushing on-device AI through Snapdragon
- Google is responding with Gemma 4 + AICore + Gemini Nano 4
What’s happening is clear: AI is becoming a foundational layer of mobile computing, just like GPS, cameras, and internet connectivity once did. Every app will have AI inside it, and increasingly, that AI will run directly on-device rather than depending on the cloud.
For Thai organizations, the question is no longer “Should we use on-device AI?” It’s now “When do we start preparing for it?”
Every month you wait is a month your competitors can spend building AI features you still don’t have.
Comparing Gemma 4 with Other Open Models
| Model | Size | Multimodal | Audio | On-Device | License | LMArena Score |
|---|---|---|---|---|---|---|
| Gemma 4 31B | 31B | Yes | No | Server | Apache 2.0 | ~1452 |
| Gemma 4 E2B | 2.3B active | Yes | Yes | Yes | Apache 2.0 | — |
| Llama 3.3 70B | 70B | Text only | No | No | Llama License | ~1440+ |
| Qwen 2.5 32B | 32B | Yes | No | No | Apache 2.0 | ~1430+ |
| Phi-4 14B | 14B | Yes | No | Yes (partial) | MIT | ~1380+ |
Gemma 4’s advantage is clear: multimodal + audio + on-device + Apache 2.0 in a single compact model family. No major competitor currently offers that full combination in smaller models.
Conclusion — What to Remember
Google Gemma 4 marks a major turning point in the open AI ecosystem:
- 4 sizes (E2B, E4B, 26B MoE, 31B Dense) spanning mobile to server use cases
- 31B ranks #3 globally on the LMArena text leaderboard
- Full multimodal support — text, image, video, and audio (in smaller models)
- 140+ languages, including Thai
- Apache 2.0 — ready for commercial use with no special restrictions
- AICore Developer Preview — available now on Android
- 400 million downloads and 100,000+ variants in the Gemmaverse
When frontier-grade AI is free, runs on phones, and comes without licensing friction, the excuses for not adopting AI are disappearing fast.
Ready to Build On-Device AI for Your Business?
The Enersys team has experience helping Thai organizations build AI solutions — from selecting the right model and designing the architecture to deploying production-ready systems.
Whether you’re exploring on-device AI, AI agents for customer service, or privacy-preserving document processing, we can help.
Get a free consultation with the Enersys team