BigQuery’s on-demand pricing charges by the byte processed. A query scanning 100 GB costs 100 times more than one scanning 1 GB, even if they return identical results. For teams running BigQuery at scale, query optimization isn’t a nice-to-have performance tweak. It’s a direct cost control mechanism.
Most optimization advice is either too generic (“avoid SELECT *”) or too specialized (“use APPROX_COUNT_DISTINCT instead of COUNT DISTINCT”). What engineers actually need is guidance that accounts for their specific query, their data shape, and their use case.
Gemini 3 Pro fills that gap. Give it your actual query and schema context, and it identifies specific optimization opportunities while explaining why each matters for BigQuery’s architecture. Here are 10 prompts that unlock that capability.
Key Takeaways
- BigQuery costs scale with bytes processed, not result set size
- Partitioning and clustering deliver the highest-impact optimizations
- Approximate functions can cut computation by orders of magnitude with ~2% error rate
- JOIN strategy dramatically affects performance
- Always examine query execution plans to understand what BigQuery actually does
Why BigQuery Optimization Hits Harder Than Other Databases
Unlike traditional databases where optimization focuses on speed, BigQuery optimization is fundamentally about cost. Most optimizations reduce bytes processed, which directly reduces your bill while improving speed.
Under on-demand pricing, a slower query that scans fewer bytes is usually preferable because it costs less to run. JOINs that duplicate data across large tables are expensive. Subqueries that process the same intermediate result multiple times are expensive. Scanning tables without partition filters is expensive. All of these are fixable with better SQL.
Organizations that address key optimization areas typically see 20-35% reductions in monthly BigQuery costs (Revefi, 2026).
BigQuery Pricing at a Glance (2026)
| Pricing Model | Cost |
|---|---|
| On-demand (per TiB scanned) | $6.25 per TiB |
| Standard Edition (capacity) | $0.04/slot/hour |
| Enterprise Edition (capacity) | $0.06/slot/hour |
For a baseline of 100 Standard Edition slots (about $2,920 per month at $0.04 per slot-hour), the break-even point between on-demand and capacity pricing is around 467 TiB scanned per month. Long-term storage (data untouched for 90+ days) automatically drops to 50% of active pricing.
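The break-even figure can be reproduced with simple arithmetic. The sketch below uses the prices from the table above and assumes a 730-hour month and a 100-slot baseline; the function names are illustrative, not part of any BigQuery API.

```python
ON_DEMAND_USD_PER_TIB = 6.25        # on-demand price from the table above
STANDARD_USD_PER_SLOT_HOUR = 0.04   # Standard Edition price from the table above
HOURS_PER_MONTH = 730               # assumption: average month length in hours


def on_demand_cost(tib_scanned: float) -> float:
    """Monthly on-demand cost in USD for a given scan volume."""
    return tib_scanned * ON_DEMAND_USD_PER_TIB


def capacity_cost(slots: int, usd_per_slot_hour: float = STANDARD_USD_PER_SLOT_HOUR) -> float:
    """Monthly cost in USD of keeping `slots` reserved for the whole month."""
    return slots * usd_per_slot_hour * HOURS_PER_MONTH


def break_even_tib(slots: int) -> float:
    """Scan volume at which on-demand cost equals the capacity cost."""
    return capacity_cost(slots) / ON_DEMAND_USD_PER_TIB


if __name__ == "__main__":
    print(f"100 slots: ${capacity_cost(100):,.0f} per month")
    print(f"break-even: {break_even_tib(100):.0f} TiB per month")
```

Below the break-even volume, on-demand is the cheaper option under these assumptions; above it, the reservation is. Autoscaling and commitment discounts are not modeled here.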
10 Best Gemini 3 Pro SQL Query Optimization Prompts for BigQuery
Prompt 1: General Query Cost Reduction
Analyze and optimize the following BigQuery SQL query for cost reduction. I want to reduce bytes processed while maintaining result accuracy.
Query:
[ paste your SQL query here ]
Schema context:
- Table being queried: [table name]
- Table size: [approximate size in GB/TB if known]
- Partition field: [field used for partitioning, if any]
- Clustering fields: [fields used for clustering, if any]
Specific concerns:
- [e.g., this query runs daily on a cron, cost is becoming high / this query times out / result accuracy can be approximate]
Provide:
1. Byte reduction estimate for each suggested optimization
2. Specific rewrite of problematic clauses
3. Alternative approaches if the current approach is fundamentally expensive
4. Partition and cluster utilization analysis
Why this prompt works: It gives Gemini the query, schema context, and your specific concerns: everything needed for targeted recommendations instead of generic advice.
Prompt 2: JOIN Performance Analysis
Analyze the following BigQuery query for JOIN performance issues:
Query:
[ paste your SQL with JOINs ]
Table sizes:
- Table A: [size and whether partitioned/clustered]
- Table B: [size and whether partitioned/clustered]
- Table C: [size and whether partitioned/clustered]
JOIN keys:
- A to B: [join condition]
- B to C: [join condition]
Current issue: [e.g., query is slow / query produces unexpected row multiplication / query runs out of memory]
Provide:
1. Analysis of why the JOIN is expensive (broadcast vs. shuffle, cardinality issues)
2. Rewrite that handles the JOIN more efficiently
3. Recommended table ordering for JOINs
4. Handling of NULLs in join keys
5. If using a JOIN strategy that requires assumptions, state them explicitly
Why this works: JOIN performance depends heavily on table sizes, data distribution, and join key characteristics. This prompt gives Gemini the context it needs.
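Unexpected row multiplication usually comes from join keys that repeat on both sides. As a rough illustration in plain Python (not BigQuery, and with made-up sample keys), the number of rows an inner equi-join emits is the sum, over matching keys, of the product of the per-side counts:

```python
from collections import Counter
from typing import Hashable, Iterable


def inner_join_row_count(left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]) -> int:
    """Rows produced by an inner equi-join, given each side's join-key column.

    NULL keys (None here) never match in SQL, so they are skipped.
    """
    left = Counter(k for k in left_keys if k is not None)
    right = Counter(k for k in right_keys if k is not None)
    # A key seen n times on the left and m times on the right yields n * m rows.
    return sum(n * right[k] for k, n in left.items())


# Hypothetical data: key 1 appears twice on the left and three times on the right.
orders = [1, 1, 2, None]
line_items = [1, 1, 1, 2, 3, None]
print(inner_join_row_count(orders, line_items))  # 2*3 + 1*1 = 7
```

Running the same count per key against your real join columns is a quick way to spot which keys fan out before asking for a rewrite.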
Prompt 3: Partition Filter Optimization
The following query does not utilize table partitions efficiently:
Query:
[ paste your SQL query ]
Table: [table name]
Partition field: [field name]
Typical query filter: [what you typically filter on]
Current behavior: [e.g., query scans entire table / partition filter is not being recognized / query filters on a field that is not the partition field]
Provide:
1. Explanation of why the partition filter is not being utilized
2. Rewrite that ensures partition pruning
3. Alternative approach if the required filter cannot be applied to the partition field
4. Monitoring query to verify partition utilization in execution plan
Why this works: Partition pruning is the most effective cost optimization for large tables. This prompt diagnoses why it isn’t happening and provides rewrites to enable it.
Prompt 4: Approximate Function Conversion
Convert the following exact BigQuery aggregation query to use approximate functions where accuracy is acceptable:
Query:
[ paste your SQL with COUNT(DISTINCT ...) or other expensive aggregations ]
Aggregation that needs optimization:
[ e.g., COUNT(DISTINCT user_id) - we need approximately 95% accuracy or better ]
Business use case:
[ e.g., daily active user reporting / unique visitor counts for dashboard ]
Required accuracy: [ percentage or whether exact count is required ]
Provide:
1. Conversion to APPROX_COUNT_DISTINCT or other approximate functions
2. Expected error rate with approximate approach
3. Comparison of cost reduction vs. accuracy trade-off
4. Validation query to confirm approximate results are within acceptable bounds
Why this works: APPROX_COUNT_DISTINCT can cut computation dramatically. Some teams report up to 93% cost reduction when replacing exact COUNT DISTINCT with HyperLogLog-based approximations. Error rates stay around 2% or less.
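Once you have run both `COUNT(DISTINCT user_id)` and `APPROX_COUNT_DISTINCT(user_id)` over the same slice of data, the comparison itself is simple. A minimal sketch of that check, assuming a 2% tolerance and using invented counts:

```python
def relative_error(exact: int, approx: int) -> float:
    """Relative error of an approximate count against the exact count."""
    if exact == 0:
        return 0.0 if approx == 0 else float("inf")
    return abs(approx - exact) / exact


def within_tolerance(exact: int, approx: int, max_error: float = 0.02) -> bool:
    """True when the approximate count is within `max_error` of the exact one."""
    return relative_error(exact, approx) <= max_error


# Hypothetical numbers from running both versions once on the same day of data.
exact_users = 1_204_331
approx_users = 1_189_870
print(f"{relative_error(exact_users, approx_users):.2%}",
      within_tolerance(exact_users, approx_users))
```

Checking several days or segments rather than a single sample gives a better picture, since the error varies from run to run.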
Prompt 5: Subquery Optimization
Optimize the following BigQuery query that uses subqueries:
Query:
[ paste your SQL with subqueries ]
Subquery usage:
- [ e.g., correlated subquery in WHERE clause / multiple subqueries that could share intermediate results ]
Performance issue:
[ e.g., subquery runs for every row / intermediate result is recomputed multiple times / query is timing out ]
Provide:
1. Explanation of why the current subquery approach is expensive
2. Rewrite using window functions, CTEs, or JOINs instead
3. Shared intermediate computation approach if multiple subqueries compute similar results
4. Cost comparison between original and rewritten approach
Why this works: Subqueries, especially correlated subqueries in WHERE clauses, are one of the most common sources of expensive BigQuery queries. This prompt generates alternatives using BigQuery’s strengths.
Prompt 6: Repeated Query Pattern Optimization
We run variations of this query repeatedly with different filter values:
Base query:
[ paste your SQL query ]
Typical filter variations:
- filter_field = [list of typical values]
- date_range typically covers [typical range]
- This query runs [frequency, e.g., hourly/daily]
Cost per run: [estimate if known]
Total monthly cost: [estimate if known]
Provide:
1. Analysis of what changes between runs and what stays the same
2. Caching recommendations to avoid recomputation
3. Materialized view or table approach if underlying data changes infrequently
4. Query parameterization suggestions for BI tool integration
5. Estimated cost reduction from recommended changes
Why this works: Repeated queries are the biggest cost opportunity for teams running dashboards or scheduled jobs. This prompt identifies what can be cached or pre-computed.
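To fill in the "Cost per run" and "Total monthly cost" fields, a back-of-the-envelope estimate is usually enough. The sketch below assumes on-demand pricing at $6.25 per TiB and a 30-day month; the 50 GiB and 2 GiB figures are invented for illustration.

```python
ON_DEMAND_USD_PER_TIB = 6.25  # on-demand price quoted earlier in this article
GIB_PER_TIB = 1024


def monthly_cost(gib_per_run: float, runs_per_day: float, days: int = 30) -> float:
    """On-demand cost in USD of a query that scans `gib_per_run` on every run."""
    tib_scanned = gib_per_run * runs_per_day * days / GIB_PER_TIB
    return tib_scanned * ON_DEMAND_USD_PER_TIB


# Hypothetical dashboard query: 50 GiB per run, refreshed hourly.
before = monthly_cost(gib_per_run=50, runs_per_day=24)
# Same query reading a pre-aggregated table that is assumed to be 2 GiB.
after = monthly_cost(gib_per_run=2, runs_per_day=24)
print(f"before: ${before:,.2f} per month, after: ${after:,.2f} per month")
```

The estimate ignores the cost of refreshing the pre-aggregated table and any cache hits, so treat it as a rough bound rather than a forecast.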
Prompt 7: ARRAY and STRUCT Query Optimization
Optimize the following BigQuery query that processes ARRAY or STRUCT data types:
Query:
[ paste your SQL that unnests arrays or accesses struct fields ]
Data structure:
- [ describe the array/struct schema ]
Current performance issue:
[ e.g., UNNEST creates large row expansion / repeated array access in WHERE clause is slow ]
Provide:
1. Explanation of why array processing is expensive
2. Rewrite using BigQuery ARRAY functions that avoid row expansion
3. Alternative approach using subselects or correlated joins over UNNEST
4. Index/clustering recommendations for array-heavy access patterns
Why this works: ARRAY and STRUCT processing in BigQuery requires understanding how UNNEST operations affect row counts. This prompt generates alternatives without the performance cost of row expansion.
Prompt 8: Date/Time Manipulation Optimization
Optimize the following BigQuery query with expensive date/time operations:
Query:
[ paste your SQL with EXTRACT, DATE_TRUNC, TIMESTAMP_DIFF, or other date manipulations ]
Date operations used:
[ list the date functions being used and on what fields ]
Performance issue:
[ e.g., DATE_TRUNC on unpartitioned field is slow / current_timestamp() prevents caching / date parsing from string is expensive ]
Provide:
1. Rewrite that optimizes date operations for BigQuery
2. Partition and clustering recommendations for date fields
3. current_timestamp replacement that enables query caching
4. Cost comparison if query runs frequently
Why this works: Non-deterministic functions like CURRENT_TIMESTAMP() prevent BigQuery from serving cached results. Date parsing from strings is expensive. These compound in queries that run on schedules.
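One common way to keep a scheduled query cache-friendly is to compute the date bounds in client code and embed them as literals, so the query text contains no non-deterministic function. A minimal sketch, assuming a hypothetical table with a DATE column named `event_date`:

```python
from datetime import date, timedelta

TABLE = "my_project.analytics.events"  # hypothetical table name


def build_daily_query(today: date, lookback_days: int = 7) -> str:
    """Return SQL whose date bounds are literals rather than CURRENT_TIMESTAMP().

    The text is identical for every call made on the same day, so repeated
    runs are eligible for cached results as long as the table is unchanged.
    """
    start = today - timedelta(days=lookback_days)
    return (
        "SELECT event_date, COUNT(*) AS events\n"
        f"FROM `{TABLE}`\n"
        f"WHERE event_date >= DATE '{start.isoformat()}'\n"
        f"  AND event_date < DATE '{today.isoformat()}'\n"
        "GROUP BY event_date"
    )


print(build_daily_query(date(2026, 1, 8)))
```

If `event_date` is also the partition column, the literal bounds let BigQuery prune partitions at planning time.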
Prompt 9: Full Table Scan Prevention
This query is scanning more data than necessary:
Query:
[ paste your SQL query ]
Table: [table name]
Table size: [size]
Partition field: [field]
Cluster fields: [fields]
What I expect to be scanned: [e.g., last 7 days based on filter]
What BigQuery actually scans: [e.g., entire table]
WHERE clause breakdown:
[ describe your filters ]
Provide:
1. Analysis of why full table scan occurs despite filter
2. Rewrite that ensures selective scanning
3. Filter order recommendations
4. Partition and clustering field recommendations
5. Execution plan query to verify what is actually scanned
Why this works: Full table scans on large tables are the most expensive BigQuery pattern. This prompt diagnoses the specific filter issue causing the full scan.
Prompt 10: Query Review for BI Tool Integration
Review the following query for use in a BI tool (Looker Studio/Tableau/Metabase) where it will be run with different filter values by end users:
Query:
[ paste your SQL query ]
BI tool context:
- Dashboard loads [number] views per day
- Users typically filter by [fields]
- Query result should support [chart types or granularity]
Security context:
- Row-level security required: [field that determines what users see]
- Users should only see their own data: [Y/N]
Provide:
1. Recommended parameterization approach for BI tool integration
2. Row-level security implementation
3. Aggregation level recommendations for dashboard performance
4. Caching strategy for common filter combinations
5. Cost estimate for typical dashboard usage patterns
Why this works: BI tool queries introduce complexity around parameterization, row-level security, and caching that standard query optimization doesn’t address.
How to Get Better Results from BigQuery Optimization Prompts
Provide table schema. BigQuery optimization requires understanding your table’s partitioning, clustering, and data types. Include your schema in prompts for accurate recommendations.
Explain business context. The same query might have different optimal implementations depending on whether it runs once or a million times a day. Frequency, user count, and result accuracy requirements all affect optimization decisions.
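Verify execution plans. AI recommendations should be checked against what BigQuery actually does. BigQuery has no EXPLAIN statement; use a dry run to estimate bytes processed before running, and the Execution details tab in the console to inspect the query plan afterward. Compare bytes processed before and after.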
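One way to compare bytes processed before and after a rewrite is a dry run through the `google-cloud-bigquery` Python client. The sketch below is a minimal example under that assumption; the helper names are illustrative. For clustered tables the dry-run figure is an upper bound, so the real savings may be larger.

```python
def dry_run_bytes(client, sql: str) -> int:
    """Bytes BigQuery estimates `sql` would process; the query is not executed."""
    # Imported lazily so percent_reduction() works without the library installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def percent_reduction(before_bytes: int, after_bytes: int) -> float:
    """Percentage of bytes saved by the rewritten query."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return 100.0 * (before_bytes - after_bytes) / before_bytes


# Usage (needs Google Cloud credentials; both SQL strings are placeholders):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   before = dry_run_bytes(client, ORIGINAL_SQL)
#   after = dry_run_bytes(client, REWRITTEN_SQL)
#   print(f"{percent_reduction(before, after):.1f}% fewer bytes")
```

Dry runs are not billed, so this check can run on every proposed rewrite before anything is deployed.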
Test accuracy trade-offs. Approximate function conversions may introduce acceptable accuracy trade-offs. Always test that results stay within acceptable bounds for your use case.
FAQ
Does BigQuery optimization also speed up queries?
Usually. Optimizations that reduce bytes processed generally reduce execution time as well, because there is less data to read, shuffle, and aggregate, though the speedup is not always proportional. The main exception is queries limited by network latency or result set size rather than computation.
How much can I reduce BigQuery costs with optimization?
Typical optimization reduces costs by 20-35%, though teams implementing comprehensive optimization programs, including AI-powered monitoring, have reported 40-60% reductions in BigQuery spend. Queries with no partition filters, SELECT *, and exact COUNT DISTINCT on large tables have the highest reduction potential. Already-optimized queries have less room for improvement.
Should I use approximate functions for all COUNT DISTINCT?
No. Approximate functions are appropriate for exploratory analysis, dashboards, and reports where ~2% error is acceptable. Don’t use approximate functions for financial calculations, user-facing counts that affect business logic, or any case where exact answers are required.
How do I verify that a partition filter is actually being used?
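BigQuery has no EXPLAIN statement. Instead, do a dry run (or read the bytes estimate the console shows before you run the query) with and without the partition filter: if pruning works, the estimate drops sharply. After the query runs, the Execution details tab in the BigQuery console shows how much data each stage read.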
Does LIMIT reduce query costs in BigQuery?
No. BigQuery scans the necessary data before applying the LIMIT clause, so you’re billed for the full amount of data processed even if the query returns only a small number of rows. To explore data cheaply, use TABLESAMPLE instead.
When should I switch from on-demand to capacity pricing?
A good sign it’s time to move from on-demand to BigQuery capacity pricing is when your environment consistently consumes more than 100 average slots. At Standard Edition, capacity pricing starts at about $0.04 per slot-hour, which means 100 baseline slots cost roughly $2,920 per month. At that level of usage, capacity pricing often becomes more cost-effective than on-demand, especially for teams processing large and predictable workloads.
Conclusion
BigQuery’s pricing model makes query optimization a direct cost management strategy. Every byte not processed is a byte not billed.
The 10 prompts cover the main optimization scenarios: cost reduction, JOIN performance, partition utilization, approximate functions, subquery elimination, repeated query patterns, array processing, date manipulation, full table scan prevention, and BI tool integration.
Use these prompts to audit your most expensive queries. Start with queries that run most frequently or process the most data. Small optimizations compound when applied to queries running on hourly schedules.
The goal isn’t perfect SQL on the first try. It’s continuous improvement: run the query, see what BigQuery actually does with it, optimize based on what you learn, and repeat.
Sources
- BigQuery Pricing (Google Cloud)
- Approximate Aggregate Functions (Google Cloud Documentation)
- Google BigQuery Cost Optimization: The Complete Guide (Revefi, April 2026)
- Cost Optimization Best Practices for BigQuery (Google Cloud Blog)
- BigQuery HLL: How We Cut COUNT(DISTINCT) Query Costs by 93% (Doit.com)
- How to Implement Approximate Aggregation Functions in BigQuery (OneUpTime)
- Estimate and Control Costs in BigQuery (Google Cloud Documentation)