Grok Imagine 1.5 Preview: xAI’s New AI Video Model Explained
If you’ve been waiting for xAI to drop a serious AI video generator, the wait is over. Grok Imagine 1.5 Preview is xAI’s newest image-to-video model, and it’s already sitting at the top of major benchmark leaderboards while undercutting OpenAI’s deprecated Sora 2 Pro by roughly 72% on price.
I dug through xAI’s developer and model documentation, the Artificial Analysis leaderboard, and reporting from WIRED, TechCrunch, the BBC, and The Decoder. Here’s what Grok Imagine 1.5 can do, how it compares to the alternatives, and where it’s still rough.
What is Grok Imagine 1.5 Preview?
Grok Imagine 1.5 Preview is xAI’s image-to-video AI model that turns a still image into a 6-to-15-second clip with audio, dialogue, and lip-sync generated in the same pass. It launched as an API preview on May 30, 2026 (model alias grok-imagine-video-1.5-2026-05-30) and immediately topped the Artificial Analysis Image-to-Video Arena with a +52 Elo jump.
The release landed at a wild moment. OpenAI had just killed Sora. Google Veo 3.1 was expensive. ByteDance’s Seedance 2.0 was climbing. xAI stepped into the vacuum with something fast, cheap, and surprisingly competitive.
Pull quote: Grok Imagine 1.5 hit #1 on the Artificial Analysis Image-to-Video Arena with a 1404 Elo score on May 31, 2026 — a 52-point jump over the previous version in roughly five months. Source: Artificial Analysis.
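To put a 52-point gap in perspective, it can be translated into an expected head-to-head win rate. The sketch below assumes the arena uses the standard Elo logistic curve with a 400-point scale, which the leaderboard figures quoted here do not confirm; `expected_win_rate` is an illustrative name, not part of any published tool.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for the higher-rated model.

    Assumes the standard Elo logistic formula; the arena's exact rating
    method is an assumption, not something this article documents.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


if __name__ == "__main__":
    # 52 points is the reported gap between version 1.5 and version 1.0.
    print(f"{expected_win_rate(52):.3f}")
```

Under that assumption, a 52-point lead means version 1.5 would be preferred in roughly 57% of pairwise votes against 1.0: a clear edge, not a blowout.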
What’s New in 1.5 vs 1.0?
Version 1.5 is a meaningful upgrade. The headline changes: sharper motion consistency, better native audio, and a faster variant.
- Better camera work. Pans, dollies, and tracking shots read as directed instead of procedurally generated.
- Cleaner dialogue and lip-sync. The 1.0 version’s speech felt mechanical. 1.5 has natural pausing and intonation.
- Less quality loss on extensions. You can chain “Extend from Frame” clips more reliably.
- Video 1.5 Fast. A speed-optimized variant that generates a 6-second 720p clip in roughly 25 seconds — about 40% faster than 1.0.
All of this runs on Aurora, xAI’s in-house autoregressive mixture-of-experts architecture that processes frames sequentially instead of in parallel like diffusion-based competitors.
Aurora Architecture: Why It Actually Behaves Differently
Aurora is xAI’s autoregressive mixture-of-experts (MoE) model that generates each frame in sequence, conditioned on all frames that came before it. Most video models use diffusion, which denoises all frames in parallel.
That difference matters. With diffusion, there’s no baked-in causal structure between frames, so motion can drift and characters can wobble. With Aurora, every frame is generated with the full clip history as context, so a camera pan started in frame one carries through frame sixty naturally.
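The frame-by-frame idea can be shown with a toy loop. This is only an illustration of autoregressive generation in general, not Aurora’s implementation: each “frame” is a single number standing in for camera position, and `generate_autoregressive` and `continue_pan` are hypothetical names.

```python
from typing import Callable, List

Frame = float  # toy stand-in: one number per frame (e.g. camera x-position)


def generate_autoregressive(
    first_frame: Frame,
    num_frames: int,
    next_frame: Callable[[List[Frame]], Frame],
) -> List[Frame]:
    """Build a clip one frame at a time, passing the full history each step."""
    history: List[Frame] = [first_frame]
    while len(history) < num_frames:
        history.append(next_frame(history))
    return history


def continue_pan(history: List[Frame], default_speed: float = 1.0) -> Frame:
    """Toy predictor: keep moving at the speed seen in the last two frames."""
    if len(history) < 2:
        return history[-1] + default_speed
    return history[-1] + (history[-1] - history[-2])


if __name__ == "__main__":
    clip = generate_autoregressive(0.0, 60, continue_pan)
    print(clip[:3], clip[-1])
```

Because every step reads the history, the pan that starts in the first frame keeps the same speed through the sixtieth. A real model replaces `continue_pan` with a learned network, but the loop has the same shape.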
The training scale is also wild: Aurora was trained on roughly 110,000 NVIDIA GB200 GPUs at xAI’s Memphis Colossus supercluster. Most labs train on a fraction of that. That compute gap is a real advantage.
The practical output: native audio in a single inference pass. Dialogue, sound effects, ambient sound, and background music are generated alongside the video frames, not stitched together afterward. That’s a real production win for anyone who’s spent hours syncing audio to AI clips.
Grok Imagine 1.5 Specs and Pricing
Here’s the official spec sheet straight from xAI’s developer documentation:
| Spec | Value |
|---|---|
| Model name | grok-imagine-video-1.5-preview |
| Alias | grok-imagine-video-1.5-2026-05-30 |
| Resolution | 480p (drafts) or 720p (output) |
| Frame rate | 24 fps |
| Clip length | 6 to 15 seconds |
| Aspect ratios | 7 supported (16:9, 9:16, 1:1, 4:3, 3:4, 3:2, 2:3) |
| Audio | Native, generated in same pass |
| Input formats | JPG, JPEG, PNG, WEBP, GIF, AVIF |
| Output format | H.264 MP4 |
| Rate limit | 60 requests per minute |
| Regions | us-east-1, eu-west-1, us-west-2 |
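If you are building on the API, it can help to check requests against these limits before sending them. The following is a minimal sketch that encodes the table as constants; `VideoRequest` and `validate` are hypothetical names and not part of the xAI SDK, and the limits are a snapshot of a preview release.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Limits copied from the spec table above; a preview model may change them.
RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}
INPUT_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
MIN_SECONDS, MAX_SECONDS = 6, 15


@dataclass(frozen=True)
class VideoRequest:  # hypothetical container, not an xAI SDK type
    image_path: str
    duration_seconds: int
    resolution: str = "720p"
    aspect_ratio: str = "16:9"


def validate(request: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the table."""
    problems = []
    if PurePath(request.image_path).suffix.lower() not in INPUT_EXTENSIONS:
        problems.append(f"unsupported input format: {request.image_path}")
    if not MIN_SECONDS <= request.duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if request.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {request.resolution}")
    if request.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {request.aspect_ratio}")
    return problems
```

Returning every problem at once, instead of raising on the first, makes it easier to surface all issues to a user in one pass.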
API pricing is per second of generated video:
- 480p: $0.08/sec → $4.80 per minute
- 720p: $0.14/sec → $8.40 per minute
- Image input: $0.01 per input image
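The rates above make per-clip costs easy to estimate. Here is a small sketch, assuming the image fee is charged once per input image per generation (the pricing list does not spell this out); `estimate_cost` is an illustrative helper.

```python
from decimal import Decimal

# Per-second and per-image rates quoted in the list above (USD).
RATE_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}
RATE_PER_IMAGE = Decimal("0.01")


def estimate_cost(seconds: int, resolution: str = "720p", images: int = 1) -> Decimal:
    """Estimated API cost in USD for one generation.

    Assumes one flat fee per input image, which is an interpretation of the
    published pricing rather than a documented billing rule.
    """
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"no published rate for {resolution}")
    return RATE_PER_SECOND[resolution] * seconds + RATE_PER_IMAGE * images
```

Under these assumptions, a 15-second 720p clip from one image comes to $2.11, and the same clip at 480p to $1.21. `Decimal` is used instead of `float` so that cent-level totals stay exact.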
For consumer access, SuperGrok is $30/month and gives you 720p output up to 15 seconds. SuperGrok Lite at $10/month locks you to 480p and 6-second clips. You can also try Grok Imagine for free at grok.com/imagine, but with tight generation quotas.
How to Access Grok Imagine 1.5
You’ve got three paths in:
- API (developers). Grab an xAI API key at console.x.ai and call grok-imagine-video-1.5-preview via the xAI SDK or REST.
- Web app. Go to grok.com/imagine and use the Imagine Studio UI directly.
- Mobile. Open the Grok iOS or Android app. Video 1.5 Fast is the consumer-grade variant in mobile.
The model is still labeled “Preview,” which means behavior and limits can change. Plan around that if you’re shipping to production.
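One way to plan for changing limits is to keep throttling on your side and make the limit a setting. The sketch below is a generic sliding-window limiter defaulting to the 60 requests per minute from the spec sheet; `SlidingWindowLimiter` is a hypothetical class, and how xAI actually measures its window on the server is not something the docs quoted here specify.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` per `window_seconds`."""

    def __init__(
        self,
        max_calls: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self.window_seconds:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.window_seconds - (now - self._calls[0]))
```

Call `acquire()` immediately before each API request. The clock and sleep functions are injectable so the limiter can be tested without waiting in real time, and the limit can be updated in one place if the preview terms change.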
Grok Imagine vs Sora, Veo, Runway, and Kling
Now the part everyone actually cares about. How does it stack up against the rest?
Grok Imagine 1.5 vs Sora 2
OpenAI discontinued the Sora consumer app on April 26, 2026, and the Sora 2 API sunsets September 24, 2026, according to OpenAI’s help center. The BBC reported Sora made only $1.4 million in net in-app revenue over its lifetime while burning roughly $1 million per day to operate.
Before it died, Sora 2 Pro at the 1024p widescreen tier cost about $0.50/sec, or roughly $30 per minute. Grok Imagine 1.5 at 720p costs $0.14/sec, or $8.40 per minute. That’s a 72% discount (1 − 0.14/0.50) for output that’s now ranked higher on the Artificial Analysis leaderboard.
The catch: Sora 2 Pro went up to 1080p. Grok Imagine caps at 720p.
Grok Imagine 1.5 vs Google Veo 3.1
Google’s Veo 3.1 Lite launched March 31, 2026, with Fast and Quality tiers above it. Veo 3.1 runs roughly $0.40/sec for Quality ($24/min), $0.15/sec for Fast ($9/min), and $0.05/sec for Lite ($3/min). Veo offers 1080p and 60-second clips at higher tiers. At $0.14/sec, Grok Imagine costs about a third of Veo’s Quality tier and is roughly level with Fast, but Lite undercuts it, and Grok is capped lower on resolution and length.
Grok Imagine 1.5 vs Runway Gen-4
Runway’s Gen-4 series starts at $15/month for the Standard plan with 625 credits and runs about $0.20/sec at the API for 720p. It has a more mature editing suite with timeline-based post-production tools, which makes it better for branded series work. Grok Imagine is faster and cheaper per clip.
Grok Imagine 1.5 vs Kling 3.0
Kling 3.0 launched February 2026 with up to 4K output, 30-60 fps, native multilingual audio, and granular camera control, according to Kling’s official release. It runs about $0.12/sec at 720p. Kling gives you more resolution and frame rate flexibility; Grok Imagine wins on speed and motion coherence.
Quick Comparison Table
| Model | Max Resolution | Length | API $/sec (720p) | Native Audio | Arena Rank |
|---|---|---|---|---|---|
| Grok Imagine 1.5 Preview | 720p | 15s | $0.14 | Yes | #1 (1467 Elo, arena.ai) |
| Google Veo 3.1 Quality | 1080p | 60s+ | $0.40 | Yes | #6 |
| Runway Gen-4.5 | 1080p | 10s | ~$0.20 | No (separate) | #40 |
| Kling 3.0 Pro | 1080p (4K option) | 15s | $0.12 | Yes | #12 |
| ByteDance Seedance 2.0 | 1080p | 20s | $0.10 | Yes | #2 |
| OpenAI Sora 2 Pro (deprecated) | 1080p | 20s | $0.50 | Yes | Discontinued |
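The price column is easier to compare as per-minute costs and relative differences. This sketch copies the per-second figures from the table above (the Runway number is approximate in the source); `percent_cheaper` is an illustrative helper.

```python
# API prices from the table above, in USD per second of generated video.
PRICE_PER_SECOND = {
    "Grok Imagine 1.5 Preview": 0.14,
    "Google Veo 3.1 Quality": 0.40,
    "Runway Gen-4.5": 0.20,  # listed as approximate
    "Kling 3.0 Pro": 0.12,
    "ByteDance Seedance 2.0": 0.10,
    "OpenAI Sora 2 Pro (deprecated)": 0.50,
}


def percent_cheaper(model: str, baseline: str) -> float:
    """How much cheaper `model` is than `baseline`, as a percentage.

    Negative values mean `model` is the more expensive of the two.
    """
    return (1 - PRICE_PER_SECOND[model] / PRICE_PER_SECOND[baseline]) * 100


if __name__ == "__main__":
    grok = "Grok Imagine 1.5 Preview"
    for name, price in sorted(PRICE_PER_SECOND.items(), key=lambda kv: kv[1]):
        print(f"{name:32s} ${price * 60:6.2f}/min  {percent_cheaper(grok, name):+6.1f}%")
```

The last column shows how much cheaper Grok Imagine is than each row. It comes out negative for Kling and Seedance, which are priced below Grok on a per-second basis.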
Where Grok Imagine 1.5 Actually Wins
The model is purpose-built for short-form video. If that’s what you make, it’s the best option right now.
- Social clips. TikTok, Reels, Shorts — all sit inside the 6-to-15-second sweet spot.
- Concept testing. With generation times of 5 to 30 seconds, you can test five hooks in the time Runway finishes one.
- Image-to-video anchoring. Upload a product shot and the model preserves composition, subject identity, and lighting while animating forward.
- Native audio. No post-production stitching. The clip arrives with dialogue, SFX, and music already synced.
xAI also offers a SuperGrok Heavy tier at $300/month with access to Grok Build and higher generation limits, according to Metronome’s 2026 pricing tracker.
Where It Falls Short
Every model has limits. Here are Grok Imagine 1.5’s:
- Capped at 720p. If you need 1080p or higher for broadcast or large-format deliverables, this isn’t your tool yet. xAI says a higher-resolution Pro Mode is on the roadmap but hasn’t committed to a date.
- 15-second max per generation. Longer sequences need the Extend from Frame feature. Community testing shows visible quality degradation after two or three chained extensions.
- Image-to-video only in current API. If you need text-to-video with no input image, you can’t use Grok Imagine 1.5 for that workflow.
- Frame rate locked at 24 fps. Fine for cinematic content, but too low for gameplay footage or 60 fps sports.
- Content moderation concerns. More on this below.
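For the 15-second cap in the list above, it is worth working out up front how many chained generations a longer sequence needs. The sketch assumes each Extend from Frame step adds up to 15 seconds of new footage, which is not confirmed here, and it flags plans that reach the two-extension mark where community testers report visible degradation; `plan_generations` is a hypothetical helper.

```python
import math

MAX_CLIP_SECONDS = 15       # per-generation cap quoted above
DEGRADATION_THRESHOLD = 2   # extensions at which testers report visible loss


def plan_generations(target_seconds: int, clip_seconds: int = MAX_CLIP_SECONDS) -> dict:
    """Estimate how many generations and chained extensions a target length needs.

    Assumes each extension adds up to `clip_seconds` of new footage; that is
    an assumption about how Extend from Frame works, not a documented rule.
    """
    if target_seconds <= 0 or not 0 < clip_seconds <= MAX_CLIP_SECONDS:
        raise ValueError("durations must be positive and within the clip cap")
    generations = math.ceil(target_seconds / clip_seconds)
    extensions = generations - 1
    return {
        "generations": generations,
        "extensions": extensions,
        "degradation_risk": extensions >= DEGRADATION_THRESHOLD,
    }
```

A 30-second sequence needs one extension and stays under the threshold. A 40-second one needs two, which is where you would want to review the output or cut between separately generated shots.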
The Safety Question You Should Be Asking
I’d be doing you a disservice if I didn’t mention this. Grok Imagine has a content moderation history that you need to understand before building on it.
In late 2025 and early January 2026, Grok’s image generation on X was used at scale to produce non-consensual sexualized imagery, including images of minors. A class-action lawsuit was filed in March 2026, and a coalition urged the U.S. government to suspend Grok from federal agencies in a letter reported by TechCrunch on February 2, 2026.
Then on June 11, 2026, WIRED published an investigation finding Grok.com was still hosting “nudified” deepfake videos of celebrities and at least one U.S. politician. SpaceX, xAI’s parent, set aside $530 million for Grok-related legal claims ahead of its IPO.
xAI has restricted image generation to paid subscribers, refined content classifiers, and added technical blocks. The company prohibits non-consensual intimate imagery in its terms. For production workflows, especially in education or healthcare, plan for stricter content filtering on your end and read the xAI acceptable use policy before shipping.
What’s Next for xAI’s Video Stack
Four different models have led the leaderboard since January 2026. Grok Imagine 1.0 held the top in January. ByteDance’s Seedance 2.0 took over in February. Alibaba’s HappyHorse-1.0 jumped to #1 in April. Grok Imagine 1.5 reclaimed it in May. On those dates, the top spot turns over every one to two months.
That pace is the real story. No single model is a durable moat. The teams that win long-term are the ones building routing layers that can swap models without rebuilding their product.
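A routing layer in this sense can be very small. The sketch below shows one possible shape, with product code talking to a router and each model sitting behind a plain callable; `VideoJob`, `VideoRouter`, and the backend names are all hypothetical, and the backends would wrap whichever vendor SDKs you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class VideoJob:
    image_path: str
    prompt: str
    seconds: int


# A backend is any callable that takes a job and returns a URL or path to the clip.
Backend = Callable[[VideoJob], str]


class VideoRouter:
    """Keeps product code independent of whichever model is active."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._active: Optional[str] = None

    def register(self, name: str, backend: Backend) -> None:
        """Add a backend; the first one registered becomes the default."""
        self._backends[name] = backend
        if self._active is None:
            self._active = name

    def use(self, name: str) -> None:
        """Switch the active backend without touching calling code."""
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def generate(self, job: VideoJob) -> str:
        if self._active is None:
            raise RuntimeError("no backend registered")
        return self._backends[self._active](job)
```

Swapping models then becomes a single `router.use("...")` call or a config change, and the rest of the product keeps calling `router.generate(job)`.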
For solo creators and small teams, Grok Imagine 1.5 is the best cheap, fast, image-to-video AI generator available today. Pair it with a clear content moderation workflow and keep an eye on the leaderboard. Whatever’s #1 in August probably doesn’t exist yet.
Frequently Asked Questions
What is Grok Imagine 1.5? xAI’s image-to-video model that turns a still image into a 6-15 second clip at 720p with native audio. Launched May 30, 2026. Currently #1 on the Artificial Analysis Image-to-Video Arena.
How much does it cost? API is $0.08/sec at 480p and $0.14/sec at 720p, plus $0.01 per input image. SuperGrok consumer access is $30/month for 720p and up to 15-second clips.
Can it do text-to-video? Not in the current API release. Image-to-video only. You’d need to generate an image first.
Is it better than Sora 2? Yes, by current leaderboard rankings, and it’s also about 72% cheaper per second ($0.14 vs $0.50). But Sora 2 Pro offered 1080p output, which Grok Imagine 1.5 doesn’t match. Sora’s consumer app is discontinued as of April 26, 2026.
What is the Aurora architecture? Aurora is xAI’s autoregressive MoE model. It generates each frame sequentially, conditioned on all previous frames. That’s different from diffusion-based competitors like Veo and Runway. The result is tighter motion coherence and built-in native audio.
Last updated: June 3, 2026. Pricing and feature data verified against xAI developer docs, Artificial Analysis, and arena.ai.