Step 3.7 Flash Pricing & Features Guide 2026

AIUnpacker Editorial

AIUnpacker

Jun 5, 2026 · Updated Jun 5, 2026 · 11 min read


Key Takeaways

Everything you need to know about Step 3.7 Flash: pricing tiers, reasoning level configurations, context window limits, and where this speed demon actually shines.

Editorial Disclosure & Affiliate Notice

This content is published for informational and educational purposes only. It is not intended as a substitute for professional, legal, financial, or medical advice. AIUnpacker is funded by sponsorships, affiliate commissions, and display advertising — nothing here is free to produce. When you buy through our links, we may earn a commission at no extra cost to you. Our editorial picks are never influenced by compensation.

For educational purposes only. Nothing here should be taken as a guarantee, recommendation, or professional advice.
AI-assisted editing. Drafts are produced with AI assistance and reviewed by our human editorial team.
Opinions are our own. Also, we are not affiliated with most tools we cover unless explicitly stated.
Information may be outdated. Verify pricing, features, and policies directly with the vendor.
Last reviewed: June 5, 2026. Published June 5, 2026.

Read more on our About page, Terms and Editorial Policy.

What Is Step 3.7 Flash?

StepFun dropped Step 3.7 Flash on May 28, 2026, and it’s already doing serious numbers - 578 billion tokens processed per week on OpenRouter alone.

It’s a 198B-parameter sparse Mixture-of-Experts vision-language model. 196B parameters in the language backbone. 1.8B in the vision encoder. But here’s the clever bit: it only fires up about 11B parameters per token. That’s how you get big-model smarts on a Flash inference budget.

StepFun calls it their “flagship multimodal reasoning model.” I’d call it the model you reach for when you need real speed without sacrificing the ability to see, think, and act across text, images, and video - all in one API call.

Step 3.7 Flash Pricing: What You’ll Actually Pay

Let’s get straight to the numbers. There’s no subscription plan to decode - StepFun charges per-token, pay-as-you-go.

API Pricing Table

Token Type	Global (USD)	China Platform (RMB)
Input (cache miss)	$0.20 / 1M tokens	¥1.35 / 1M tokens
Input (cache hit)	$0.04 / 1M tokens	¥0.27 / 1M tokens
Output	$1.15 / 1M tokens	¥8.10 / 1M tokens

Cache hits give you an 80% discount on input. If you’re sending repetitive system prompts or large context windows, that adds up fast.
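To see how the cache discount plays out, here is a minimal sketch that prices a single request from the Global rates in the table above. The function name and the token counts in the example workload are illustrative assumptions, not part of any StepFun SDK.

```python
# Global (USD) rates per 1M tokens, copied from the pricing table above.
INPUT_MISS_PER_M = 0.20
INPUT_HIT_PER_M = 0.04
OUTPUT_PER_M = 1.15


def request_cost_usd(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens served from cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    miss_tokens = input_tokens - cached_tokens
    return (
        miss_tokens * INPUT_MISS_PER_M
        + cached_tokens * INPUT_HIT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


# Hypothetical workload: 100K-token prompt, 2K-token answer.
cold = request_cost_usd(100_000, 0, 2_000)       # nothing cached
warm = request_cost_usd(100_000, 90_000, 2_000)  # 90% of the prompt cached
print(f"cold: ${cold:.4f}  warm: ${warm:.4f}")
```

With these made-up numbers, caching 90% of the prompt cuts the request from about $0.0223 to about $0.0079, and the output tokens become the largest line item.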

How Pricing Compares

For context, Step 3.5 Flash - the text-only predecessor - costs $0.09/M input and $0.30/M output on OpenRouter. So you’re paying a premium for the vision encoder and stronger agentic performance. Compared to frontier closed models, though, Step 3.7 Flash is aggressively cheap. StepFun’s own analysis shows their Advisor Mode setup reaches 97% of Claude Opus 4.6’s coding performance at roughly one-ninth the per-task cost ($0.19 vs $1.76 per SWE-bench task).
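The ratios behind that comparison are easy to check. This small sketch only re-does the arithmetic on the figures quoted above; the variable names are mine.

```python
# Prices in USD per 1M tokens, as quoted in this article.
STEP_35_FLASH = {"input": 0.09, "output": 0.30}  # OpenRouter listing
STEP_37_FLASH = {"input": 0.20, "output": 1.15}  # cache-miss input rate

# Per-task SWE-bench costs from StepFun's Advisor Mode analysis.
ADVISOR_COST_PER_TASK = 0.19
OPUS_COST_PER_TASK = 1.76

input_premium = STEP_37_FLASH["input"] / STEP_35_FLASH["input"]
output_premium = STEP_37_FLASH["output"] / STEP_35_FLASH["output"]
opus_ratio = OPUS_COST_PER_TASK / ADVISOR_COST_PER_TASK

print(f"input premium over 3.5 Flash:  {input_premium:.2f}x")
print(f"output premium over 3.5 Flash: {output_premium:.2f}x")
print(f"Opus 4.6 cost per task vs Advisor Mode: {opus_ratio:.2f}x")
```

So the premium over Step 3.5 Flash is a bit over 2x on input and close to 4x on output, while the Opus comparison works out to a little more than 9x per task.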

Rate Limits by Spend Tier

StepFun uses progressive rate limiting based on how much you’ve topped up.

Tier	Cumulative Top-Up	Concurrent Requests	RPM	TPM
V0	¥0	5	10	5,000,000
V1	¥100	100	1,000	20,000,000
V2	¥500	200	5,000	30,000,000
V3	¥2,000	400	10,000	40,000,000
V4	¥5,000	1,000	20,000	50,000,000
V5	¥10,000	10,000	200,000	100,000,000

V0 is fine for testing. V1 gets you into real production territory. V5 is where you’re running agent swarms.
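If you want to plan capacity in code, the tier table can be encoded as a lookup. This is a sketch under one assumption: that a tier unlocks as soon as your cumulative top-up reaches its threshold. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_top_up_rmb: int
    concurrency: int
    rpm: int
    tpm: int


# Values copied from the rate-limit table above, lowest tier first.
TIERS = [
    Tier("V0", 0, 5, 10, 5_000_000),
    Tier("V1", 100, 100, 1_000, 20_000_000),
    Tier("V2", 500, 200, 5_000, 30_000_000),
    Tier("V3", 2_000, 400, 10_000, 40_000_000),
    Tier("V4", 5_000, 1_000, 20_000, 50_000_000),
    Tier("V5", 10_000, 10_000, 200_000, 100_000_000),
]


def tier_for(top_up_rmb: float) -> Tier:
    """Return the highest tier whose threshold the top-up meets (assumed inclusive)."""
    if top_up_rmb < 0:
        raise ValueError("top_up_rmb cannot be negative")
    eligible = [tier for tier in TIERS if top_up_rmb >= tier.min_top_up_rmb]
    return eligible[-1]


print(tier_for(750))  # a ¥750 top-up lands in V2
```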

Additional Costs

Images get billed at roughly 400 tokens per image in low detail mode. High detail mode scales with resolution. Video files are tokenized based on duration and resolution. StepFun also charges separately for web search (¥0.04 per call) and file storage (¥0.50/GB/day).
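A rough budget for these extras might look like the sketch below. It assumes image tokens are billed at the China-platform input rate (cache miss), which the article does not state explicitly, and it only covers low detail images. All names and the sample workload are illustrative.

```python
LOW_DETAIL_TOKENS_PER_IMAGE = 400  # approximate figure from the text above
INPUT_RMB_PER_M = 1.35             # assumption: images billed as cache-miss input
WEB_SEARCH_RMB_PER_CALL = 0.04
STORAGE_RMB_PER_GB_DAY = 0.50


def extras_rmb(images: int, searches: int, stored_gb: float, days: int) -> float:
    """Estimate add-on costs in RMB for low detail images, web search, and storage."""
    image_cost = images * LOW_DETAIL_TOKENS_PER_IMAGE * INPUT_RMB_PER_M / 1_000_000
    search_cost = searches * WEB_SEARCH_RMB_PER_CALL
    storage_cost = stored_gb * days * STORAGE_RMB_PER_GB_DAY
    return image_cost + search_cost + storage_cost


# Hypothetical month: 1,000 images, 250 searches, 2 GB stored for 30 days.
total = extras_rmb(images=1_000, searches=250, stored_gb=2, days=30)
print(f"estimated extras: ¥{total:.2f}")
```

In this example the images are the cheapest part (about ¥0.54); storage dominates, so clean up uploaded files you no longer need.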

Context Window: 256K Tokens

Step 3.7 Flash supports a 256,000-token context window. That’s enough to ingest entire codebases, 500-page documents, or multi-hour conversation histories in one go.

On the AA-LCR long-context benchmark, it scores 63.9% avg@16 accuracy - not chart-topping (GPT 5.5 hits 74.3%), but solid for a Flash-tier model and well ahead of Step 3.5 Flash’s 45.5%.

Practically, you can dump a massive financial report, a full GitHub repo, or a 2-hour meeting transcript into the context window and have the model work with it holistically. No chunking required.
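Before sending a huge payload, it helps to sanity-check that it fits. The sketch below uses a crude 4-characters-per-token heuristic, which is my assumption and not the model's real tokenizer, and it reserves some room for the response. Treat the result as a rough estimate only.

```python
CONTEXT_WINDOW_TOKENS = 256_000
CHARS_PER_TOKEN = 4  # rough heuristic for English text; NOT the actual tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserve_for_output: int = 8_000) -> bool:
    """Return True if the combined texts plus an output reserve fit in 256K tokens."""
    total = sum(estimate_tokens(text) for text in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS


report = "x" * 400_000  # stand-in for a ~400K-character document
print(estimate_tokens(report), fits_in_context([report]))
```

For anything close to the limit, count tokens with the provider's own tooling instead of a character heuristic.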

Reasoning Levels: Low, Medium, High

This is one of Step 3.7 Flash’s smartest design choices. You get three selectable reasoning intensities that let you trade speed for depth on a per-request basis.

The Three Levels

Level	Best For	What Happens
Low	Simple Q&A, summarization, rewriting, info extraction	Model skips deep reasoning. Lightning-fast. Cheapest per request, since the output is shorter.
Medium (default)	General reasoning, multi-step tasks	Balanced. Good for most daily use. Default setting.
High	Complex math, planning, code analysis, architecture decisions	Model does deep chain-of-thought before answering. Slower. More output tokens.

How to Set It

Chat Completions API (OpenAI-compatible):

curl https://api.stepfun.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEPFUN_API_KEY" \
  -d '{
    "model": "step-3.7-flash",
    "messages": [{"role": "user", "content": "Explain backpropagation mathematically."}],
    "reasoning_effort": "high",
    "max_tokens": 2048
  }'

Messages API (Anthropic-compatible):

curl https://api.stepfun.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEPFUN_API_KEY" \
  -d '{
    "model": "step-3.7-flash",
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Draft an architecture plan for a microservices migration."}],
    "output_config": {"effort": "high"}
  }'

The model streams its internal reasoning process via the reasoning field (or reasoning_content if you pass reasoning_format="deepseek-style"). That’s gold for debugging why the model made a particular decision.
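To keep the reasoning trace separate from the final answer while streaming, you can split the deltas as they arrive. The sketch below works on plain dictionaries shaped like OpenAI-style streaming chunks. That chunk shape, and the idea that the reasoning field sits on the delta, are assumptions on my part, so check a real response before relying on it.

```python
from typing import Iterable


def split_stream(chunks: Iterable[dict]) -> tuple[str, str]:
    """Collect reasoning text and answer text from streamed chunk dicts.

    Assumes each chunk looks like {"choices": [{"delta": {...}}]} and that the
    delta carries "reasoning" (or "reasoning_content") and/or "content".
    """
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            thought = delta.get("reasoning") or delta.get("reasoning_content")
            if thought:
                reasoning_parts.append(thought)
            if delta.get("content"):
                answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


# Fake chunks for illustration only.
fake_stream = [
    {"choices": [{"delta": {"reasoning": "Compare both options. "}}]},
    {"choices": [{"delta": {"reasoning_content": "Option B is cheaper."}}]},
    {"choices": [{"delta": {"content": "Go with "}}]},
    {"choices": [{"delta": {"content": "option B."}}]},
]
reasoning, answer = split_stream(fake_stream)
print("REASONING:", reasoning)
print("ANSWER:", answer)
```

Logging the two strings separately makes it much easier to audit a bad decision after the fact.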

When to Use Which

I keep it on medium for coding, data extraction, and routine agent tasks. Bump to high when I need the model to think hard - complex refactors, math proofs, architecture proposals. Drop to low for classification, formatting, or simple rewrites where speed matters more than depth.
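That rule of thumb is simple enough to encode as a lookup. The task labels below are my own, hypothetical categories; the request body mirrors the Chat Completions example earlier in this section.

```python
# Hypothetical task labels mapped to the three reasoning levels.
EFFORT_BY_TASK = {
    "classification": "low",
    "formatting": "low",
    "rewrite": "low",
    "coding": "medium",
    "data_extraction": "medium",
    "agent_step": "medium",
    "refactor": "high",
    "math_proof": "high",
    "architecture": "high",
}


def build_request(task_type: str, prompt: str) -> dict:
    """Build a Chat Completions payload, falling back to the default 'medium'."""
    effort = EFFORT_BY_TASK.get(task_type, "medium")
    return {
        "model": "step-3.7-flash",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }


payload = build_request("math_proof", "Prove that the square root of 2 is irrational.")
print(payload["reasoning_effort"])
```

Unknown task types fall back to medium, which matches the model's default setting.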

Key Features That Actually Matter

Native Multimodal: Images and Video, No Extra Models

Step 3.7 Flash handles images and video natively. You don’t need a separate vision model or a vision MCP server. Drop an image URL (or Base64) directly into your message, and the model sees it.

Supported image formats: JPG/JPEG, PNG, WebP, static GIF. Supported video: MP4, MOV, MKV (up to 128 MB, recommended under 5 minutes).
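A small pre-flight check can catch unsupported files before you spend a request on them. This sketch validates extensions and the video size cap listed above, and wraps image bytes in a Base64 data URL. Two assumptions: that "128 MB" means 128 × 1024 × 1024 bytes, and that the API accepts Base64 images in data-URL form inside image_url. The helper names are hypothetical.

```python
import base64
from pathlib import Path

IMAGE_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
    ".gif": "image/gif",  # static GIF only, per the format list above
}
VIDEO_EXTS = {".mp4", ".mov", ".mkv"}
MAX_VIDEO_BYTES = 128 * 1024 * 1024  # assumption: MB interpreted as MiB


def media_kind(filename: str, size_bytes: int) -> str:
    """Return 'image' or 'video', or raise ValueError for unsupported input."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_MIME:
        return "image"
    if ext in VIDEO_EXTS:
        if size_bytes > MAX_VIDEO_BYTES:
            raise ValueError(f"{filename} exceeds the 128 MB video limit")
        return "video"
    raise ValueError(f"unsupported media type: {ext or filename}")


def image_part(filename: str, data: bytes) -> dict:
    """Build an image_url content part carrying the image as a Base64 data URL."""
    mime = IMAGE_MIME[Path(filename).suffix.lower()]
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


print(media_kind("demo.MOV", 5_000_000))
print(image_part("tiny.png", b"hi")["image_url"]["url"])
```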

This matters enormously for agent workflows. When you’re using Claude Code, KiloCode, or Hermes Agent with Step 3.7 Flash as the backend, you can paste screenshots, UI mockups, or error dialogs directly into the conversation. The model processes them in the same inference pass.

On vision benchmarks, it’s genuinely strong: 79.2 on SimpleVQA (first place among tested models), 95.3 on V* with Python tool (frontier parity), and 61.87 on Android Daily GUI benchmark.

Tool Calling and Agent Framework Compatibility

Step 3.7 Flash was engineered for agents. It works with Claude Code, KiloCode, Hermes Agent, OpenClaw, Cline, Roo Code, Open Code, Zed, Cherry Studio, and Goose - essentially every major agent harness and coding assistant.

The tool-calling benchmarks back this up: 67.1 on ClawEval-1.1 (first place, significantly ahead of the next competitor at 59.8), 49.5 on Toolathlon, and an average of 67.08% across six agent harnesses on StepFun’s internal Step-SWE-Bench.

The model is less likely to drift, less prone to hallucinating tool schemas, and better at recovering from failures mid-trajectory than its predecessor.

Advisor Mode: Frontier Quality at Flash Prices

This is the feature that sold me. Step 3.7 Flash can run as the primary executor in a dual-model setup, calling a larger “advisor” model only at inflection points - planning, recovering from repeated failures, complex decision points.

The result: 97% of Claude Opus 4.6’s SWE-bench performance at $0.19 per task vs $1.76.

You keep most of the run at Flash prices and only burn expensive tokens when the model genuinely needs deeper reasoning. StepFun built this as their implementation of Anthropic’s advisor strategy, and it works.
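The control flow is easier to picture in code. What follows is my own minimal sketch of an executor/advisor loop, not StepFun's implementation: the cheap executor does the work, and the advisor is consulted once for planning and again after repeated failures. All function names, the failure threshold, and the stub models are hypothetical.

```python
from typing import Callable, Optional


def run_with_advisor(
    task: str,
    executor: Callable[[str, str], str],
    advisor: Callable[[str, list], str],
    check: Callable[[str], bool],
    max_steps: int = 10,
    fail_threshold: int = 2,
) -> tuple[Optional[str], int]:
    """Run the executor, escalating to the advisor only at inflection points.

    Returns (successful attempt or None, number of advisor calls).
    """
    hint = advisor(task, [])  # inflection point 1: up-front planning
    advisor_calls = 1
    failures = 0
    history: list = []
    for _ in range(max_steps):
        attempt = executor(task, hint)
        history.append(attempt)
        if check(attempt):
            return attempt, advisor_calls
        failures += 1
        if failures >= fail_threshold:  # inflection point 2: repeated failure
            hint = advisor(task, history)
            advisor_calls += 1
            failures = 0
    return None, advisor_calls


# Stubs standing in for real model calls.
def stub_executor(task: str, hint: str) -> str:
    return hint  # the "patch" simply echoes the current guidance


def stub_advisor(task: str, history: list) -> str:
    return "plan" if not history else "fix"


result, calls = run_with_advisor("demo", stub_executor, stub_advisor, lambda a: a == "fix")
print(result, calls)
```

In this toy run the executor fails twice on the initial plan, the advisor is called a second time, and the next attempt passes. Most steps never touch the expensive model, which is the whole point of the setup.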

Search Enhancement

Step 3.7 Flash has built-in web search and visual search that it can invoke autonomously. On search-heavy benchmarks: 75.82% on BrowseComp, 92.82% F1 on DeepSearchQA, 71.68% on ResearchRubrics.

What’s notable is the visual search capability. It can recognize long-tail entities and freshly emerged concepts that text-only search misses. In practice, this means you can show it a photo of an obscure product, and it’ll go find the manufacturer and specs on its own.

Speed: Up to 400 Tokens/Second

The model hits throughput of up to 400 tokens per second on StepFun’s API infrastructure. With the NVFP4 quantized checkpoint and Multi-Token Prediction (MTP) on a GB200 TP=4 setup, it reaches 8,229 tokens per second at concurrency 64 - a 1.45x speedup over the non-MTP variant.

For context, 400 tok/s is 24,000 tokens per minute, or about 15,000 words. Responses arrive faster than you can scroll.
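Here is the arithmetic behind those throughput figures, as a quick sketch. The per-stream number assumes the 8,229 tok/s figure is aggregate throughput across all 64 concurrent requests, which the source does not spell out.

```python
API_TOKENS_PER_SEC = 400    # peak API throughput quoted above
MTP_TOKENS_PER_SEC = 8_229  # NVFP4 + MTP on GB200 TP=4 at concurrency 64
MTP_SPEEDUP = 1.45
CONCURRENCY = 64


def seconds_to_generate(tokens: int, tokens_per_sec: float = API_TOKENS_PER_SEC) -> float:
    """Time to stream a response of the given length at a steady rate."""
    return tokens / tokens_per_sec


tokens_per_minute = API_TOKENS_PER_SEC * 60
implied_words_per_token = 15_000 / tokens_per_minute  # ratio behind the 15,000-word figure
non_mtp_baseline = MTP_TOKENS_PER_SEC / MTP_SPEEDUP   # implied throughput without MTP
per_stream = MTP_TOKENS_PER_SEC / CONCURRENCY         # assumes aggregate throughput

print(f"{tokens_per_minute} tokens/min, {implied_words_per_token} words per token")
print(f"2,048-token reply: {seconds_to_generate(2048):.2f} s")
print(f"non-MTP baseline: ~{non_mtp_baseline:.0f} tok/s, per stream: ~{per_stream:.1f} tok/s")
```

A full 2,048-token response streams in roughly five seconds at the peak API rate, and the 1.45x claim implies a non-MTP baseline of around 5,675 tok/s.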

API Access: How to Get Started

Authentication

Get an API key from the StepFun Open Platform: platform.stepfun.com/interface-key (China) or platform.stepfun.ai (Global).

API Endpoints

Two regional platforms with separate base URLs:

Region	Platform	Base URL
Global	platform.stepfun.ai	`https://api.stepfun.ai/v1`
China	platform.stepfun.com	`https://api.stepfun.com/v1`

For Step Plan subscribers (includes model routing between DeepSeek V4 Pro and Step 3.5 Flash), the Chat Completions endpoint is https://api.stepfun.com/step_plan/v1, and the Messages API endpoint is https://api.stepfun.com/step_plan.

Protocol Compatibility

OpenAI SDK: Use base_url="https://api.stepfun.com/v1" (China) or base_url="https://api.stepfun.ai/v1" (Global) with the standard OpenAI Python/JS client
Anthropic SDK: Use the Messages API at https://api.stepfun.com/v1/messages for native Anthropic protocol support
OpenRouter: Available at standard OpenRouter endpoints with model ID stepfun/step-3.7-flash
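Since keys are issued per platform, it is worth making the region explicit in your config rather than hard-coding one URL. A minimal sketch, with a hypothetical helper name:

```python
# Base URLs from the regional endpoints table above.
BASE_URLS = {
    "global": "https://api.stepfun.ai/v1",
    "china": "https://api.stepfun.com/v1",
}


def client_kwargs(region: str, api_key: str) -> dict:
    """Return keyword arguments suitable for an OpenAI-compatible client."""
    try:
        base_url = BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(BASE_URLS)}") from None
    return {"api_key": api_key, "base_url": base_url}


kwargs = client_kwargs("Global", "YOUR_STEP_API_KEY")
print(kwargs["base_url"])
# With the openai package installed: client = OpenAI(**kwargs)
```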

Simple Python Example

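import os

from openai import OpenAI

# OpenAI-compatible client pointed at StepFun (China base URL shown;
# use https://api.stepfun.ai/v1 on the Global platform).
client = OpenAI(
    api_key=os.environ["STEPFUN_API_KEY"],
    base_url="https://api.stepfun.com/v1",
)

# One user message that mixes text with an image URL.
response = client.chat.completions.create(
    model="step-3.7-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart tell you about Q3 revenue?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    reasoning_effort="high",
)

# choices is a list; the first entry holds the reply.
print(response.choices[0].message.content)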

Local Deployment

Step 3.7 Flash is open-source under Apache 2.0. You can run it locally via:

vLLM (StepFun provides prebuilt Docker images: vllm/vllm-openai:stepfun37)
SGLang (Docker: lmsysorg/sglang:dev-step-3.7-flash)
HuggingFace Transformers (v5.0+ required)
llama.cpp (GGUF quants available)

Hardware requirements: ~120 GB unified memory/VRAM minimum, 128 GB recommended. Runs on Mac Studio, NVIDIA DGX Station, AMD Ryzen AI Max+ 395 systems, or standard data center GPUs with tensor parallelism.
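For a feel of where a figure like ~120 GB comes from, here is a back-of-envelope estimate of weight memory at different precisions. It counts weights only and ignores KV cache, activations, and quantization overhead, so read it as a lower bound. The link to the stated hardware requirement is my inference, not something the source confirms.

```python
TOTAL_PARAMS = 198e9  # total parameter count quoted earlier in this article


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for bits in (4, 8, 16):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At 4 bits per parameter the weights alone come to roughly 99 GB, which would leave some headroom for cache and runtime on a 120 to 128 GB machine. Higher precisions need multi-GPU setups.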

Available via: StepFun Open Platform, OpenRouter, NVIDIA NIM, HuggingFace, ModelScope. Coming soon to DeepInfra, Fireworks AI, and Modal.

Real-World Use Cases

Real-Time Coding and Agentic Software Engineering

This is where Step 3.7 Flash earns its keep. Plug it into Claude Code, KiloCode, or Hermes Agent, and you’ve got an autonomous coding assistant that sees your UI, reads your error messages, and writes patches that pass tests.

On SWE-Bench Pro, it scores 56.3 - behind Claude Opus 4.7 (64.3) and GPT 5.5 (58.6), but ahead of both DeepSeek V4 Flash (55.6) and Gemini 3.5 Flash (55.1). Across six agent harnesses, it averages 67.08% on Step-SWE-Bench.

I use it for: multi-file refactors, bug hunting through unfamiliar repos, generating test suites from requirement docs, and the endless boilerplate that comes with building anything real.

Screenshot-to-Code

Drop a UI screenshot into the chat. The model describes the layout, identifies components, and generates working React + Tailwind code. It’s not pixel-perfect, but it gives you an 80% starter in seconds.

Document Intelligence at Scale

Step 3.7 Flash reads dense visual interfaces - PDF reports, spreadsheets, invoices, contracts - and extracts structured data directly. The cookbook examples include receipt-to-CSV, chart-to-JSON, and whiteboard-to-project-plan pipelines.
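The model call is only half of a pipeline like receipt-to-CSV; you still have to turn its output into a file. The sketch below handles that last step, assuming you prompted the model to return JSON with an "items" list. That schema is hypothetical and is not taken from StepFun's cookbook.

```python
import csv
import io
import json

FIELDS = ["item", "qty", "price"]  # hypothetical schema requested in the prompt


def receipt_json_to_csv(model_output: str) -> str:
    """Convert a JSON string like {"items": [...]} into CSV text."""
    items = json.loads(model_output)["items"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in items:
        # Missing keys become empty cells; unexpected keys are dropped.
        writer.writerow({key: row.get(key, "") for key in FIELDS})
    return buffer.getvalue()


sample = '{"items": [{"item": "Coffee", "qty": 2, "price": 3.5}, {"item": "Bagel", "qty": 1}]}'
print(receipt_json_to_csv(sample))
```

Validating the JSON before writing anything is worth the extra lines, since extraction output from any model can occasionally miss a field.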

For enterprise users, this is a big deal. Finance teams can point it at quarterly reports and get structured tables. Legal teams can feed it contracts and get clause summaries. Support teams can analyze bug-report screenshots with full visual context.

Mobile GUI Agents

StepFun ships GELab-Zero, an open-source framework that connects Step 3.7 Flash to Android phones via ADB. The model sees phone screenshots, plans actions, and executes them - clicking, typing, swiping through multi-app workflows. On Android Daily, it scores 61.87%, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%).

Rapid Research with Visual Search

Research tasks that mix text and visual information are a natural fit. The model scored 92.82% F1 on DeepSearchQA and 71.68% on ResearchRubrics. It browses the web autonomously, cross-references sources, and can visually verify information by searching for and examining images. I’ve used it for competitive analysis, market research, and technical documentation research - it finds things I’d miss in a manual search.

Interactive Applications

The combination of 400 tok/s throughput and multimodal input makes Step 3.7 Flash viable for interactive apps. Think: real-time document Q&A where users upload PDFs and ask questions, visual troubleshooting assistants for hardware/software support, or educational tools where students photograph a problem and get step-by-step solutions.

Comparison: Step 3.7 Flash vs the Field

Metric	Step 3.7 Flash	DeepSeek V4 Flash	Gemini 3.5 Flash	GPT 5.5	Claude Opus 4.7
Params (Active)	196B (11B)	284B (13B)	Unknown	Unknown	Unknown
Multimodal	Yes	No	Yes	Yes	Yes
Context Window	256K	-	-	-	-
SWE-Bench Pro	56.3	55.6	55.1	58.6	64.3
Terminal-Bench 2.1	59.5	62.0	76.2	82.7	69.4
ClawEval-1.1	67.1	57.8	-	-	70.8*
SimpleVQA	79.2	-	-	79.1	-
HLE w. Tool	47.2	45.1	40.2	52.2	54.7
GDPval	45.8	44.0	57.8	63.0	63.0
Input Price (/1M)	$0.20	-	-	-	-
Output Price (/1M)	$1.15	-	-	-	-

*Claude Opus 4.6 on ClawEval-1.1
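Tables like this are easier to reason about once sorted. As a quick sketch, here is the SWE-Bench Pro row ranked in code, using only the scores from the table above.

```python
# SWE-Bench Pro scores copied from the comparison table.
SWE_BENCH_PRO = {
    "Step 3.7 Flash": 56.3,
    "DeepSeek V4 Flash": 55.6,
    "Gemini 3.5 Flash": 55.1,
    "GPT 5.5": 58.6,
    "Claude Opus 4.7": 64.3,
}


def rank(scores: dict[str, float]) -> list[str]:
    """Return model names ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank(SWE_BENCH_PRO)
for place, model in enumerate(ranking, start=1):
    print(f"{place}. {model}: {SWE_BENCH_PRO[model]}")
```

On this row, Step 3.7 Flash lands third overall and first among the three Flash-tier models.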

Where to Get It

API: platform.stepfun.ai (Global) / platform.stepfun.com (China)
OpenRouter: openrouter.ai/stepfun/step-3.7-flash
NVIDIA NIM: Available as inference microservice
HuggingFace: huggingface.co/stepfun-ai/Step-3.7-Flash
GitHub: github.com/stepfun-ai/Step-3.7-Flash
Discord: discord.gg/RcMJhNVAQc
Web Chat: stepfun.ai (EN) / stepfun.com (CN)
iOS App: Available on the App Store
Android App: Available on Google Play

Bottom Line

Step 3.7 Flash is the clearest value proposition in the Flash model tier right now. You get native multimodal, three reasoning levels, reliable tool calling, and a 256K context window - all at $0.20/M input and $1.15/M output. The advisor mode pushes it into frontier territory for coding at a fraction of the cost. And it’s fully open-source, so you can run it locally if you’ve got the hardware.

If you’re building agents, coding assistants, or any application where speed + vision + reasoning all matter, this model deserves a spot in your stack.

Sources

OpenRouter, “Step 3.7 Flash - Overview,” 2026. https://openrouter.ai/stepfun/step-3.7-flash
StepFun, “Step 3.7 Flash - HuggingFace Model Card,” 2026. https://huggingface.co/stepfun-ai/Step-3.7-Flash
StepFun, “Step 3.7 Flash Model Overview,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash
StepFun, “Pricing Details,” 2026. https://platform.stepfun.com/docs/zh/guides/pricing/details
StepFun, “Step 3.7 Flash - GitHub README (Pricing Section),” 2026. https://github.com/stepfun-ai/Step-3.7-Flash
OpenRouter, “Step 3.5 Flash - Overview,” 2026. https://openrouter.ai/stepfun/step-3.5-flash
StepFun, “Step 3.7 Flash - Official Blog & Benchmarks,” May 29, 2026. https://static.stepfun.com/blog/step-3.7-flash/
StepFun, “Pricing & Rate Limits,” 2026. https://platform.stepfun.com/docs/zh/guides/pricing/details
OpenRouter, “Step 3.7 Flash - Model Card,” 2026. https://openrouter.ai/stepfun/step-3.7-flash
StepFun, “Reasoning Model Developer Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/developer/reasoning
StepFun, “Step 3.7 Flash Quickstart Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-quickstart
StepFun, “Step Plan Quick Start,” 2026. https://platform.stepfun.com/docs/zh/step-plan/quick-start
StepFun, “Step 3.7 Flash - GitHub README (Deployment Section),” 2026. https://github.com/stepfun-ai/Step-3.7-Flash
StepFun, “StepFun Open Platform,” 2026. https://platform.stepfun.com
StepFun, “Chat Completions API Reference,” 2026. https://platform.stepfun.com/docs/zh/api-reference/chat/chat-completion-create
StepFun, “Step 3.7 Flash Cookbook,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-cookbook
StepFun, “Step 3.7 Flash Mobile Agent Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-mobile-agent
