What Is Step 3.7 Flash?
StepFun dropped Step 3.7 Flash on May 28, 2026, and it’s already doing serious numbers - 578 billion tokens processed per week on OpenRouter alone.
It’s a 198B-parameter sparse Mixture-of-Experts vision-language model. 196B parameters in the language backbone. 1.8B in the vision encoder. But here’s the clever bit: it only fires up about 11B parameters per token. That’s how you get big-model smarts on a Flash inference budget.
StepFun calls it their “flagship multimodal reasoning model.” I’d call it the model you reach for when you need real speed without sacrificing the ability to see, think, and act across text, images, and video - all in one API call.
Step 3.7 Flash Pricing: What You’ll Actually Pay
Let’s get straight to the numbers. There’s no subscription plan to decode - StepFun charges per-token, pay-as-you-go.
API Pricing Table
| Token Type | Global (USD) | China Platform (RMB) |
|---|---|---|
| Input (cache miss) | $0.20 / 1M tokens | ¥1.35 / 1M tokens |
| Input (cache hit) | $0.04 / 1M tokens | ¥0.27 / 1M tokens |
| Output | $1.15 / 1M tokens | ¥8.10 / 1M tokens |
Cache hits give you an 80% discount on input. If you’re sending repetitive system prompts or large context windows, that adds up fast.
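To make the table concrete, here is a minimal cost-estimator sketch using the global USD rates listed above. The helper name `estimate_cost_usd` and the `cache_hit_ratio` parameter are my own illustrative assumptions, not part of any StepFun SDK.

```python
# Global USD rates per 1M tokens, copied from the pricing table above.
PRICE_PER_M = {"input_miss": 0.20, "input_hit": 0.04, "output": 1.15}


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        miss_tokens * PRICE_PER_M["input_miss"]
        + hit_tokens * PRICE_PER_M["input_hit"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# 100K input tokens and 2K output tokens, with a cold and a warm cache.
cold = estimate_cost_usd(100_000, 2_000)
warm = estimate_cost_usd(100_000, 2_000, cache_hit_ratio=0.9)
print(f"cold: ${cold:.4f}, warm: ${warm:.4f}")
```

With a 90% hit ratio the same request drops from roughly $0.022 to roughly $0.008, which is where the caching discount starts to matter for long system prompts.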
How Pricing Compares
For context, Step 3.5 Flash - the text-only predecessor - costs $0.09/M input and $0.30/M output on OpenRouter. So you’re paying a premium for the vision encoder and stronger agentic performance. Compared to frontier closed models, though, Step 3.7 Flash is aggressively cheap. StepFun’s own analysis shows their Advisor Mode setup reaches 97% of Claude Opus 4.6’s coding performance at roughly one-ninth the per-task cost ($0.19 vs $1.76 per SWE-bench task).
Rate Limits by Spend Tier
StepFun uses progressive rate limiting based on how much you’ve topped up.
| Tier | Cumulative Top-Up | Concurrent Requests | RPM | TPM |
|---|---|---|---|---|
| V0 | ¥0 | 5 | 10 | 5,000,000 |
| V1 | ¥100 | 100 | 1,000 | 20,000,000 |
| V2 | ¥500 | 200 | 5,000 | 30,000,000 |
| V3 | ¥2,000 | 400 | 10,000 | 40,000,000 |
| V4 | ¥5,000 | 1,000 | 20,000 | 50,000,000 |
| V5 | ¥10,000 | 10,000 | 200,000 | 100,000,000 |
V0 is fine for testing. V1 gets you into real production territory. V5 is where you’re running agent swarms.
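If you want to work out which tier a workload needs, a small lookup does the job. This is only a sketch: the `Tier` type and `smallest_tier` function are names I made up, and the numbers are copied from the table above.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    top_up_rmb: int
    concurrency: int
    rpm: int
    tpm: int


# Values copied from the rate-limit table above, sorted by top-up amount.
TIERS = [
    Tier("V0", 0, 5, 10, 5_000_000),
    Tier("V1", 100, 100, 1_000, 20_000_000),
    Tier("V2", 500, 200, 5_000, 30_000_000),
    Tier("V3", 2_000, 400, 10_000, 40_000_000),
    Tier("V4", 5_000, 1_000, 20_000, 50_000_000),
    Tier("V5", 10_000, 10_000, 200_000, 100_000_000),
]


def smallest_tier(concurrency: int, rpm: int, tpm: int) -> Optional[Tier]:
    """Return the cheapest tier that satisfies all three limits, or None."""
    for tier in TIERS:
        if tier.concurrency >= concurrency and tier.rpm >= rpm and tier.tpm >= tpm:
            return tier
    return None


# A service needing 50 concurrent requests, 600 RPM and 10M TPM.
print(smallest_tier(concurrency=50, rpm=600, tpm=10_000_000))
```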
Additional Costs
Images are billed at roughly 400 tokens each in low detail mode; in high detail mode, the token count scales with resolution. Video files are tokenized based on duration and resolution. StepFun also charges separately for web search (¥0.04 per call) and file storage (¥0.50/GB/day).
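These extras are easy to forget in a budget, so here is a rough sketch that totals them in RMB. It assumes low detail images are charged at the China-platform cache-miss input rate from the pricing table, which is my reading rather than something StepFun spells out here; the function name is hypothetical.

```python
LOW_DETAIL_TOKENS_PER_IMAGE = 400  # approximate figure quoted above
INPUT_RMB_PER_M = 1.35             # assumed: cache-miss input rate applies to image tokens
SEARCH_RMB_PER_CALL = 0.04
STORAGE_RMB_PER_GB_DAY = 0.50


def extras_rmb(images: int, search_calls: int, stored_gb: float, days: int) -> dict:
    """Estimate image, search and storage charges in RMB (hypothetical helper)."""
    image_cost = images * LOW_DETAIL_TOKENS_PER_IMAGE * INPUT_RMB_PER_M / 1_000_000
    search_cost = search_calls * SEARCH_RMB_PER_CALL
    storage_cost = stored_gb * STORAGE_RMB_PER_GB_DAY * days
    return {
        "images": image_cost,
        "search": search_cost,
        "storage": storage_cost,
        "total": image_cost + search_cost + storage_cost,
    }


# 10,000 low detail images, 1,000 searches, 2 GB stored for 30 days.
monthly = extras_rmb(images=10_000, search_calls=1_000, stored_gb=2, days=30)
print({name: round(value, 2) for name, value in monthly.items()})
```

In this made-up workload, search and storage dominate: the images themselves come to only a few yuan.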
Context Window: 256K Tokens
Step 3.7 Flash supports a 256,000-token context window. That’s enough to ingest entire codebases, 500-page documents, or multi-hour conversation histories in one go.
On the AA-LCR long-context benchmark, it scores 63.9% avg@16 accuracy - not chart-topping (GPT 5.5 hits 74.3%), but solid for a Flash-tier model and well ahead of Step 3.5 Flash’s 45.5%.
Practically, you can dump a massive financial report, a full GitHub repo, or a 2-hour meeting transcript into the context window and have the model work with it holistically. No chunking required.
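Before dumping everything in, it is worth a quick sanity check that the material actually fits. The sketch below uses a crude 4-characters-per-token heuristic, which is an assumption and not StepFun's tokenizer, so treat the result as a ballpark only.

```python
CONTEXT_WINDOW = 256_000
CHARS_PER_TOKEN = 4  # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserved_output_tokens: int = 4_096) -> bool:
    """Check whether all texts fit while leaving room for the reply."""
    budget = CONTEXT_WINDOW - reserved_output_tokens
    return sum(estimate_tokens(t) for t in texts) <= budget


report = "x" * 1_000_000  # stand-in for a very large document
print(estimate_tokens(report), fits_in_context([report]))
```

Non-English text and source code often tokenize less efficiently than this heuristic suggests, so leave some headroom.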
Reasoning Levels: Low, Medium, High
This is one of Step 3.7 Flash’s smartest design choices. You get three selectable reasoning intensities that let you trade speed for depth on a per-request basis.
The Three Levels
| Level | Best For | What Happens |
|---|---|---|
| Low | Simple Q&A, summarization, rewriting, info extraction | Model skips deep reasoning. Lightning-fast. Cheapest per request, since the output is shorter. |
| Medium (default) | General reasoning, multi-step tasks | Balanced. Good for most daily use. |
| High | Complex math, planning, code analysis, architecture decisions | Model does deep chain-of-thought before answering. Slower. More output tokens. |
How to Set It
Chat Completions API (OpenAI-compatible):
curl https://api.stepfun.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEPFUN_API_KEY" \
  -d '{
    "model": "step-3.7-flash",
    "messages": [{"role": "user", "content": "Explain backpropagation mathematically."}],
    "reasoning_effort": "high",
    "max_tokens": 2048
  }'
Messages API (Anthropic-compatible):
curl https://api.stepfun.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEPFUN_API_KEY" \
  -d '{
    "model": "step-3.7-flash",
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Draft an architecture plan for a microservices migration."}],
    "output_config": {"effort": "high"}
  }'
The model streams its internal reasoning process via the reasoning field (or reasoning_content if you pass reasoning_format="deepseek-style"). That’s gold for debugging why the model made a particular decision.
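Since the reasoning can arrive under either field name, a tiny helper keeps the rest of your code from caring which one is in play. This is a sketch that works on plain dicts; the exact shape of the streamed deltas is an assumption modeled on OpenAI-style chunks, and both function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def extract_reasoning(message: dict) -> Optional[str]:
    """Return the reasoning text from whichever field carries it, if any."""
    for field in ("reasoning", "reasoning_content"):
        value = message.get(field)
        if value:
            return value
    return None


def join_stream(deltas: Iterable[dict]) -> Tuple[str, str]:
    """Concatenate streamed delta dicts into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for delta in deltas:
        reasoning_parts.append(extract_reasoning(delta) or "")
        answer_parts.append(delta.get("content") or "")
    return "".join(reasoning_parts), "".join(answer_parts)


# Simulated stream: reasoning chunks first, then the visible answer.
chunks = [
    {"reasoning": "Compare the two totals. "},
    {"reasoning": "Q3 is higher."},
    {"content": "Revenue grew "},
    {"content": "in Q3."},
]
thoughts, answer = join_stream(chunks)
print(thoughts, "|", answer)
```

Logging `thoughts` separately from `answer` makes it easier to trace a bad decision back to the step where the reasoning went wrong.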
When to Use Which
I keep it on medium for coding, data extraction, and routine agent tasks. Bump to high when I need the model to think hard - complex refactors, math proofs, architecture proposals. Drop to low for classification, formatting, or simple rewrites where speed matters more than depth.
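That habit is easy to encode as a routing table. A minimal sketch, assuming you tag requests with a task type yourself; the task labels and the `build_request` helper are mine, and only the `reasoning_effort` field comes from the API examples above.

```python
# Task labels are illustrative; map them to the three documented levels.
EFFORT_BY_TASK = {
    "classification": "low",
    "formatting": "low",
    "rewrite": "low",
    "coding": "medium",
    "data_extraction": "medium",
    "agent_task": "medium",
    "refactor": "high",
    "math_proof": "high",
    "architecture": "high",
}


def build_request(task_type: str, prompt: str, model: str = "step-3.7-flash") -> dict:
    """Build a Chat Completions payload, falling back to medium effort."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": EFFORT_BY_TASK.get(task_type, "medium"),
    }


payload = build_request("refactor", "Split this module into smaller services.")
print(payload["reasoning_effort"])
```

Unknown task types fall back to medium, which matches the default level.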
Key Features That Actually Matter
Native Multimodal: Images and Video, No Extra Models
Step 3.7 Flash handles images and video natively. You don’t need a separate vision model or a vision MCP server. Drop an image URL (or Base64) directly into your message, and the model sees it.
Supported image formats: JPG/JPEG, PNG, WebP, static GIF. Supported video: MP4, MOV, MKV (up to 128 MB, recommended under 5 minutes).
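A pre-flight check saves a failed upload. The sketch below validates a file against the formats and the 128 MB video cap listed above, then wraps image bytes as a Base64 data URL. The data-URL form and both helper names are assumptions on my part; check the API reference for the exact Base64 format StepFun expects.

```python
import base64
from pathlib import PurePath

IMAGE_MIME = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
    ".webp": "image/webp", ".gif": "image/gif",  # GIFs must be static; not checked here
}
VIDEO_EXTS = {".mp4", ".mov", ".mkv"}
MAX_VIDEO_BYTES = 128 * 1024 * 1024  # 128 MB cap quoted above (binary MB assumed)


def media_kind(filename: str, size_bytes: int) -> str:
    """Return 'image' or 'video', or raise ValueError if unsupported."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_MIME:
        return "image"
    if ext in VIDEO_EXTS:
        if size_bytes > MAX_VIDEO_BYTES:
            raise ValueError(f"video too large: {size_bytes} bytes")
        return "video"
    raise ValueError(f"unsupported format: {ext or filename}")


def image_part(filename: str, data: bytes) -> dict:
    """Build an OpenAI-style image content part from raw bytes."""
    if media_kind(filename, len(data)) != "image":
        raise ValueError("image_part only handles images")
    mime = IMAGE_MIME[PurePath(filename).suffix.lower()]
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


print(media_kind("demo.mp4", 5 * 1024 * 1024))
```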
This matters enormously for agent workflows. When you’re using Claude Code, KiloCode, or Hermes Agent with Step 3.7 Flash as the backend, you can paste screenshots, UI mockups, or error dialogs directly into the conversation. The model processes them in the same inference pass.
On vision benchmarks, it’s genuinely strong: 79.2 on SimpleVQA (first place among tested models), 95.3 on V* with Python tool (frontier parity), and 61.87 on the Android Daily GUI benchmark.
Tool Calling and Agent Framework Compatibility
Step 3.7 Flash was engineered for agents. It works with Claude Code, KiloCode, Hermes Agent, OpenClaw, Cline, Roo Code, Open Code, Zed, Cherry Studio, and Goose - essentially every major agent harness and coding assistant.
The tool-calling benchmarks back this up: 67.1 on ClawEval-1.1 (first place, significantly ahead of the next competitor at 59.8), 49.5 on Toolathlon, and an average of 67.08% across six agent harnesses on StepFun’s internal Step-SWE-Bench.
The model is less likely to drift, less prone to hallucinating tool schemas, and better at recovering from failures mid-trajectory than its predecessor.
Advisor Mode: Frontier Quality at Flash Prices
This is the feature that sold me. Step 3.7 Flash can run as the primary executor in a dual-model setup, calling a larger “advisor” model only at inflection points - planning, recovering from repeated failures, complex decision points.
The result: 97% of Claude Opus 4.6’s SWE-bench performance at $0.19 per task vs $1.76.
You keep most of the run at Flash prices and only burn expensive tokens when the model genuinely needs deeper reasoning. StepFun built this as their implementation of Anthropic’s advisor strategy, and it works.
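To show the control flow, here is a toy sketch of an executor/advisor loop. It is not StepFun's implementation: the callables, the result dict shape, and the failure threshold are all hypothetical stand-ins, with fake models so the example runs offline.

```python
from typing import Callable, Dict, List


def run_with_advisor(
    task: str,
    executor: Callable[[str, List[str]], Dict],
    advisor: Callable[[str, List[str]], str],
    max_steps: int = 10,
    failure_threshold: int = 2,
) -> Dict:
    """Run the cheap executor, consulting the advisor only at inflection points."""
    history: List[str] = [f"plan: {advisor(task, [])}"]  # inflection point: planning
    advisor_calls, failures = 1, 0
    for step in range(1, max_steps + 1):
        result = executor(task, history)
        history.append(f"step {step}: {result['note']}")
        if result.get("done"):
            return {"done": True, "steps": step, "advisor_calls": advisor_calls}
        failures = 0 if result.get("ok") else failures + 1
        if failures >= failure_threshold:  # inflection point: repeated failure
            history.append(f"advice: {advisor(task, history)}")
            advisor_calls += 1
            failures = 0
    return {"done": False, "steps": max_steps, "advisor_calls": advisor_calls}


def fake_executor(task: str, history: List[str]) -> Dict:
    # Keeps failing until some advice shows up in the history.
    if any(entry.startswith("advice:") for entry in history):
        return {"ok": True, "done": True, "note": "patched, tests pass"}
    return {"ok": False, "done": False, "note": "tests still failing"}


def fake_advisor(task: str, history: List[str]) -> str:
    return "check the failing import" if history else "reproduce, patch, re-run"


outcome = run_with_advisor("fix the bug", fake_executor, fake_advisor)
print(outcome)
```

In this toy run the advisor is called twice (once to plan, once after two failures) across three executor steps, which is the shape of the cost saving: most steps stay on the cheap model.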
Search Enhancement
Step 3.7 Flash has built-in web search and visual search that it can invoke autonomously. On search-heavy benchmarks: 75.82% on BrowseComp, 92.82% F1 on DeepSearchQA, 71.68% on ResearchRubrics.
What’s notable is the visual search capability. It can recognize long-tail entities and freshly emerged concepts that text-only search misses. In practice, this means you can show it a photo of an obscure product, and it’ll go find the manufacturer and specs on its own.
Speed: Up to 400 Tokens/Second
The model hits throughput of up to 400 tokens per second on StepFun’s API infrastructure. With the NVFP4 quantized checkpoint and Multi-Token Prediction (MTP) on a GB200 TP=4 setup, it reaches 8,229 tokens per second at concurrency 64 - a 1.45x speedup over the non-MTP variant.
For context, at 400 tok/s, the model outputs about 15,000 words per minute. Responses arrive faster than you can scroll through them.
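The arithmetic behind those figures is quick to check. Note the words-per-token ratio is an assumption: the 15,000 figure corresponds to about 0.625 words per token, while the common 0.75 rule of thumb for English would give a higher number.

```python
def words_per_minute(tokens_per_second: float, words_per_token: float) -> float:
    """Convert token throughput into an approximate word rate."""
    return tokens_per_second * 60 * words_per_token


def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a reply of the given length at a steady rate."""
    return output_tokens / tokens_per_second


print(words_per_minute(400, 0.625))      # ratio implied by the 15,000 figure
print(words_per_minute(400, 0.75))       # common English rule of thumb
print(seconds_to_generate(2048, 400))    # a full 2,048-token reply

# Implied non-MTP throughput from the quoted 1.45x speedup.
baseline_tps = 8229 / 1.45
print(round(baseline_tps))
```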
API Access: How to Get Started
Authentication
Get an API key from the StepFun Open Platform: platform.stepfun.com/interface-key (China) or platform.stepfun.ai (Global).
API Endpoints
Two regional platforms with separate base URLs:
| Region | Platform | Base URL |
|---|---|---|
| Global | platform.stepfun.ai | https://api.stepfun.ai/v1 |
| China | platform.stepfun.com | https://api.stepfun.com/v1 |
For Step Plan subscribers (includes model routing between DeepSeek V4 Pro and Step 3.5 Flash), the Chat Completions endpoint is https://api.stepfun.com/step_plan/v1, and the Messages API endpoint is https://api.stepfun.com/step_plan.
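With this many base URLs in play, I find it safer to resolve them in one place. A minimal sketch using only the URLs listed above; the function names and the `STEPFUN_API_KEY` environment-variable convention (borrowed from the curl examples) are assumptions.

```python
import os

BASE_URLS = {
    "global": "https://api.stepfun.ai/v1",
    "china": "https://api.stepfun.com/v1",
}
STEP_PLAN_URLS = {
    "chat_completions": "https://api.stepfun.com/step_plan/v1",
    "messages": "https://api.stepfun.com/step_plan",
}


def resolve_base_url(region: str = "global", step_plan: bool = False,
                     protocol: str = "chat_completions") -> str:
    """Pick the base URL for a region, or the Step Plan URL for a protocol."""
    if step_plan:
        return STEP_PLAN_URLS[protocol]
    return BASE_URLS[region]


def client_kwargs(region: str = "global", step_plan: bool = False) -> dict:
    """Keyword arguments suitable for an OpenAI-compatible client constructor."""
    return {
        "api_key": os.environ.get("STEPFUN_API_KEY", ""),
        "base_url": resolve_base_url(region, step_plan),
    }


print(resolve_base_url("china"))
print(resolve_base_url(step_plan=True, protocol="messages"))
```

The dict from `client_kwargs` can be unpacked straight into `OpenAI(**client_kwargs("global"))`.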
Protocol Compatibility
- OpenAI SDK: Use base_url="https://api.stepfun.com/v1" with the standard OpenAI Python/JS client
- Anthropic SDK: Use the Messages API at https://api.stepfun.com/v1/messages for native Anthropic protocol support
- OpenRouter: Available at standard OpenRouter endpoints with model ID stepfun/step-3.7-flash
Simple Python Example
from openai import OpenAI

# Point the standard OpenAI client at StepFun's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_STEP_API_KEY",
    base_url="https://api.stepfun.com/v1",
)

# Send text plus an image URL in a single user message.
response = client.chat.completions.create(
    model="step-3.7-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart tell you about Q3 revenue?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    reasoning_effort="high",
)

# `choices` is a list; the reply lives on its first element.
print(response.choices[0].message.content)
Local Deployment
Step 3.7 Flash is open-source under Apache 2.0. You can run it locally via:
- vLLM (StepFun provides prebuilt Docker images: vllm/vllm-openai:stepfun37)
- SGLang (Docker: lmsysorg/sglang:dev-step-3.7-flash)
- HuggingFace Transformers (v5.0+ required)
- llama.cpp (GGUF quants available)
Hardware requirements: ~120 GB unified memory/VRAM minimum, 128 GB recommended. Runs on Mac Studio, NVIDIA DGX Station, AMD Ryzen AI Max+ 395 systems, or standard data center GPUs with tensor parallelism.
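A back-of-the-envelope weight calculation shows why the requirement lands where it does. This sketch counts weights only (no KV cache, activations, or runtime overhead), and the idea that the ~120 GB figure corresponds to a roughly 4-bit quantization is my inference, not something the requirements state.

```python
TOTAL_PARAMS = 198e9  # 196B language backbone + 1.8B vision encoder, rounded


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for bits in (4, 8, 16):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At 4 bits the weights alone take about 99 GB, which leaves some room for context and overhead inside a 120 to 128 GB budget. At 8 or 16 bits you are into multi-GPU territory.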
Available via: StepFun Open Platform, OpenRouter, NVIDIA NIM, HuggingFace, ModelScope. Coming soon to DeepInfra, Fireworks AI, and Modal.
Real-World Use Cases
Real-Time Coding and Agentic Software Engineering
This is where Step 3.7 Flash earns its keep. Plug it into Claude Code, KiloCode, or Hermes Agent, and you’ve got an autonomous coding assistant that sees your UI, reads your error messages, and writes patches that pass tests.
On SWE-Bench Pro, it scores 56.3 - behind Claude Opus 4.7 (64.3) and GPT 5.5 (58.6) in the comparison table below, but ahead of both DeepSeek V4 Flash (55.6) and Gemini 3.5 Flash (55.1). Across six agent harnesses, it averages 67.08% on Step-SWE-Bench.
I use it for: multi-file refactors, bug hunting through unfamiliar repos, generating test suites from requirement docs, and the endless boilerplate that comes with building anything real.
Screenshot-to-Code
Drop a UI screenshot into the chat. The model describes the layout, identifies components, and generates working React + Tailwind code. It’s not pixel-perfect, but it gives you an 80% starter in seconds.
Document Intelligence at Scale
Step 3.7 Flash reads dense visual interfaces - PDF reports, spreadsheets, invoices, contracts - and extracts structured data directly. The cookbook examples include receipt-to-CSV, chart-to-JSON, and whiteboard-to-project-plan pipelines.
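The last mile of a receipt-to-CSV pipeline is just turning the model's JSON into rows. Here is a sketch of that step, assuming you prompted the model to return a JSON array of objects with `item`, `qty`, and `price` keys. That schema is hypothetical and not taken from StepFun's cookbook.

```python
import csv
import io
import json


def json_items_to_csv(model_output: str, columns=("item", "qty", "price")) -> str:
    """Convert a JSON array of line items into CSV text."""
    rows = json.loads(model_output)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of line items")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells; unexpected keys are dropped.
        writer.writerow({column: row.get(column, "") for column in columns})
    return buffer.getvalue()


# Stand-in for what the model might return for a receipt photo.
sample_output = '[{"item": "Coffee", "qty": 2, "price": 3.5}, {"item": "Bagel", "qty": 1, "price": 2.25}]'
csv_text = json_items_to_csv(sample_output)
print(csv_text)
```

Validating the JSON before writing it out also gives you a natural place to retry the request when the model returns something malformed.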
For enterprise users, this is a big deal. Finance teams can point it at quarterly reports and get structured tables. Legal teams can feed it contracts and get clause summaries. Support teams can analyze bug-report screenshots with full visual context.
Mobile GUI Agents
StepFun ships GELab-Zero, an open-source framework that connects Step 3.7 Flash to Android phones via ADB. The model sees phone screenshots, plans actions, and executes them - clicking, typing, swiping through multi-app workflows. On Android Daily, it scores 61.87%, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%).
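To give a feel for the execution side, here is a sketch that turns a planned action into an `adb` command line. The action dict schema is invented for illustration and is not GELab-Zero's API; the `adb shell input` subcommands themselves are standard Android tooling. Nothing is executed, so it runs without a device attached.

```python
from typing import List


def action_to_adb(action: dict) -> List[str]:
    """Translate a hypothetical action dict into an adb argument list."""
    kind = action["type"]
    if kind == "tap":
        return ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[key]) for key in ("x1", "y1", "x2", "y2")]
        duration = str(action.get("duration_ms", 300))
        return ["adb", "shell", "input", "swipe", *coords, duration]
    if kind == "type":
        # `input text` does not accept raw spaces; %s is the usual escape.
        return ["adb", "shell", "input", "text", action["text"].replace(" ", "%s")]
    if kind == "screenshot":
        return ["adb", "exec-out", "screencap", "-p"]
    raise ValueError(f"unknown action type: {kind}")


plan = [
    {"type": "tap", "x": 540, "y": 1200},
    {"type": "type", "text": "hello world"},
    {"type": "swipe", "x1": 500, "y1": 1500, "x2": 500, "y2": 500},
]
for step in plan:
    print(" ".join(action_to_adb(step)))
```

To actually drive a phone you would pass each list to `subprocess.run`, capture a fresh screenshot, and feed it back to the model for the next step.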
Rapid Research with Visual Search
Research tasks that mix text and visual information are a natural fit. The model scored 92.82% F1 on DeepSearchQA and 71.68% on ResearchRubrics. It browses the web autonomously, cross-references sources, and can visually verify information by searching for and examining images. I’ve used it for competitive analysis, market research, and technical documentation research - it finds things I’d miss in a manual search.
Interactive Applications
The combination of 400 tok/s throughput and multimodal input makes Step 3.7 Flash viable for interactive apps. Think: real-time document Q&A where users upload PDFs and ask questions, visual troubleshooting assistants for hardware/software support, or educational tools where students photograph a problem and get step-by-step solutions.
Comparison: Step 3.7 Flash vs the Field
| Metric | Step 3.7 Flash | DeepSeek V4 Flash | Gemini 3.5 Flash | GPT 5.5 | Claude Opus 4.7 |
|---|---|---|---|---|---|
| Params (Active) | 196B (11B) | 284B (13B) | Unknown | Unknown | Unknown |
| Multimodal | Yes | No | Yes | Yes | Yes |
| Context Window | 256K | - | - | - | - |
| SWE-Bench Pro | 56.3 | 55.6 | 55.1 | 58.6 | 64.3 |
| Terminal-Bench 2.1 | 59.5 | 62.0 | 76.2 | 82.7 | 69.4 |
| ClawEval-1.1 | 67.1 | 57.8 | - | - | 70.8* |
| SimpleVQA | 79.2 | - | - | 79.1 | - |
| HLE w. Tool | 47.2 | 45.1 | 40.2 | 52.2 | 54.7 |
| GDPval | 45.8 | 44.0 | 57.8 | 63.0 | 63.0 |
| Input Price (/1M) | $0.20 | - | - | - | - |
| Output Price (/1M) | $1.15 | - | - | - | - |
*Claude Opus 4.6 on ClawEval-1.1
Where to Get It
- API: platform.stepfun.ai (Global) / platform.stepfun.com (China)
- OpenRouter: openrouter.ai/stepfun/step-3.7-flash
- NVIDIA NIM: Available as inference microservice
- HuggingFace: huggingface.co/stepfun-ai/Step-3.7-Flash
- GitHub: github.com/stepfun-ai/Step-3.7-Flash
- Discord: discord.gg/RcMJhNVAQc
- Web Chat: stepfun.ai (EN) / stepfun.com (CN)
- iOS App: Available on the App Store
- Android App: Available on Google Play
Bottom Line
Step 3.7 Flash is the clearest value proposition in the Flash model tier right now. You get native multimodal input, three reasoning levels, reliable tool calling, and a 256K context window - all at $0.20/M input and $1.15/M output. The advisor mode pushes it into frontier territory for coding at a fraction of the cost. And it’s fully open-source, so you can run it locally if you’ve got the hardware.
If you’re building agents, coding assistants, or any application where speed + vision + reasoning all matter, this model deserves a spot in your stack.
Sources
- OpenRouter, “Step 3.7 Flash - Overview,” 2026. https://openrouter.ai/stepfun/step-3.7-flash
- StepFun, “Step 3.7 Flash - HuggingFace Model Card,” 2026. https://huggingface.co/stepfun-ai/Step-3.7-Flash
- StepFun, “Step 3.7 Flash Model Overview,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash
- StepFun, “Pricing Details,” 2026. https://platform.stepfun.com/docs/zh/guides/pricing/details
- StepFun, “Step 3.7 Flash - GitHub README (Pricing Section),” 2026. https://github.com/stepfun-ai/Step-3.7-Flash
- OpenRouter, “Step 3.5 Flash - Overview,” 2026. https://openrouter.ai/stepfun/step-3.5-flash
- StepFun, “Step 3.7 Flash - Official Blog & Benchmarks,” May 29, 2026. https://static.stepfun.com/blog/step-3.7-flash/
- StepFun, “Pricing & Rate Limits,” 2026. https://platform.stepfun.com/docs/zh/guides/pricing/details
- OpenRouter, “Step 3.7 Flash - Model Card,” 2026. https://openrouter.ai/stepfun/step-3.7-flash
- StepFun, “Reasoning Model Developer Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/developer/reasoning
- StepFun, “Step 3.7 Flash Quickstart Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-quickstart
- StepFun, “Step Plan Quick Start,” 2026. https://platform.stepfun.com/docs/zh/step-plan/quick-start
- StepFun, “Step 3.7 Flash - GitHub README (Deployment Section),” 2026. https://github.com/stepfun-ai/Step-3.7-Flash
- StepFun, “StepFun Open Platform,” 2026. https://platform.stepfun.com
- StepFun, “Chat Completions API Reference,” 2026. https://platform.stepfun.com/docs/zh/api-reference/chat/chat-completion-create
- StepFun, “Step 3.7 Flash Cookbook,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-cookbook
- StepFun, “Step 3.7 Flash Mobile Agent Guide,” 2026. https://platform.stepfun.com/docs/zh/guides/models/step-3.7-flash-mobile-agent