10 Best Gemini 3 Pro SQL Query Optimization Prompts for BigQuery

Stop overpaying for slow BigQuery queries. This guide gives you 10 powerful Gemini prompts to cut costs, speed up SQL, and avoid expensive full table scans using techniques like APPROX_COUNT_DISTINCT and partition pruning.

February 2, 2026 · Updated: May 19, 2026 · 11 min read
AIUnpacker Editorial Team

Under on-demand pricing, BigQuery charges by the byte processed. A query scanning 100GB costs 100 times as much as one scanning 1GB, even if they return identical results. For teams running BigQuery at scale, query optimization isn’t a nice-to-have performance tweak. It’s a direct cost control mechanism.

Most optimization advice is either too generic (“avoid SELECT *”) or too specialized (“use APPROX_COUNT_DISTINCT instead of COUNT DISTINCT”). What engineers actually need is guidance that accounts for their specific query, their data shape, and their use case.

Gemini 3 Pro fills that gap. Give it your actual query and schema context, and it identifies specific optimization opportunities while explaining why each matters for BigQuery’s architecture. Here are 10 prompts that unlock that capability.

Key Takeaways

  • BigQuery costs scale with bytes processed, not result set size
  • Partitioning and clustering deliver the highest-impact optimizations
  • Approximate functions can cut computation by orders of magnitude with ~2% error rate
  • JOIN strategy dramatically affects performance
  • Always examine query execution plans to understand what BigQuery actually does

Why BigQuery Optimization Hits Harder Than Other Databases

Unlike traditional databases where optimization focuses on speed, BigQuery optimization is fundamentally about cost. Most optimizations reduce bytes processed, which directly reduces your bill while improving speed.

Under on-demand pricing, a slower query that scans fewer bytes is usually preferable because it costs less to run. JOINs that duplicate data across large tables are expensive. Subqueries that process the same intermediate result multiple times are expensive. Scanning tables without partition filters is expensive. All of these are fixable with better SQL.

Organizations that address key optimization areas typically see 20-35% reductions in monthly BigQuery costs (Revefi, 2026).

BigQuery Pricing at a Glance (2026)

Pricing Model | Cost
On-demand (per TiB scanned) | $6.25 per TiB
Standard Edition (capacity) | $0.04 per slot-hour
Enterprise Edition (capacity) | $0.06 per slot-hour

For a baseline of 100 Standard Edition slots (roughly $2,920 per month), the break-even point between on-demand and capacity pricing is around 467 TiB scanned per month. Long-term storage (data untouched for 90+ days) automatically drops to 50% of active pricing.
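The break-even figure can be reproduced from the list prices in the table. The sketch below is back-of-the-envelope arithmetic, not a billing tool. The 730 hours per month and the 100-slot baseline are assumptions (they match the $2,920 figure quoted in the FAQ), and the function names are illustrative.

```python
# Back-of-the-envelope BigQuery pricing arithmetic using the list prices
# quoted in this article. HOURS_PER_MONTH and the 100-slot baseline are
# assumptions chosen for illustration, not official billing rules.

ON_DEMAND_USD_PER_TIB = 6.25
STANDARD_USD_PER_SLOT_HOUR = 0.04
HOURS_PER_MONTH = 730  # assumed average month length in hours
BYTES_PER_TIB = 1024 ** 4


def on_demand_cost(bytes_processed: int) -> float:
    """Cost in USD of scanning `bytes_processed` bytes at the on-demand rate."""
    return bytes_processed / BYTES_PER_TIB * ON_DEMAND_USD_PER_TIB


def monthly_capacity_cost(slots: int,
                          usd_per_slot_hour: float = STANDARD_USD_PER_SLOT_HOUR) -> float:
    """Cost in USD of keeping `slots` slots running for a whole month."""
    return slots * usd_per_slot_hour * HOURS_PER_MONTH


def break_even_tib(slots: int) -> float:
    """TiB scanned per month at which on-demand spend equals the capacity bill."""
    return monthly_capacity_cost(slots) / ON_DEMAND_USD_PER_TIB


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"1 GiB scan:        ${on_demand_cost(1 * gib):.4f}")
    print(f"100 GiB scan:      ${on_demand_cost(100 * gib):.4f}")
    print(f"100 slots / month: ${monthly_capacity_cost(100):,.2f}")
    print(f"Break-even:        {break_even_tib(100):.1f} TiB / month")
```

The break-even point scales linearly with the number of slots, so a smaller reservation breaks even at a proportionally smaller monthly scan volume.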

10 Best Gemini 3 Pro SQL Query Optimization Prompts for BigQuery

Prompt 1: General Query Cost Reduction

Analyze and optimize the following BigQuery SQL query for cost reduction. I want to reduce bytes processed while maintaining result accuracy.

Query:
[ paste your SQL query here ]

Schema context:
- Table being queried: [table name]
- Table size: [approximate size in GB/TB if known]
- Partition field: [field used for partitioning, if any]
- Clustering fields: [fields used for clustering, if any]

Specific concerns:
- [e.g., this query runs daily on a cron, cost is becoming high / this query times out / result accuracy can be approximate]

Provide:
1. Byte reduction estimate for each suggested optimization
2. Specific rewrite of problematic clauses
3. Alternative approaches if the current approach is fundamentally expensive
4. Partition and cluster utilization analysis

Why this prompt works: It gives Gemini the query, schema context, and your specific concerns: everything needed for targeted recommendations instead of generic advice.

Prompt 2: JOIN Performance Analysis

Analyze the following BigQuery query for JOIN performance issues:

Query:
[ paste your SQL with JOINs ]

Table sizes:
- Table A: [size and whether partitioned/clustered]
- Table B: [size and whether partitioned/clustered]
- Table C: [size and whether partitioned/clustered]

JOIN keys:
- A to B: [join condition]
- B to C: [join condition]

Current issue: [e.g., query is slow / query produces unexpected row multiplication / query runs out of memory]

Provide:
1. Analysis of why the JOIN is expensive (broadcast vs. shuffle, cardinality issues)
2. Rewrite that handles the JOIN more efficiently
3. Recommended table ordering for JOINs
4. Handling of NULLs in join keys
5. If using a JOIN strategy that requires assumptions, state them explicitly

Why this works: JOIN performance depends heavily on table sizes, data distribution, and join key characteristics. This prompt gives Gemini the context it needs.

Prompt 3: Partition Filter Optimization

The following query does not utilize table partitions efficiently:

Query:
[ paste your SQL query ]

Table: [table name]
Partition field: [field name]
Typical query filter: [what you typically filter on]

Current behavior: [e.g., query scans entire table / partition filter is not being recognized / query filters on a field that is not the partition field]

Provide:
1. Explanation of why the partition filter is not being utilized
2. Rewrite that ensures partition pruning
3. Alternative approach if the required filter cannot be applied to the partition field
4. Monitoring query to verify partition utilization in execution plan

Why this works: Partition pruning is the most effective cost optimization for large tables. This prompt diagnoses why it isn’t happening and provides rewrites to enable it.

Prompt 4: Approximate Function Conversion

Convert the following exact BigQuery aggregation query to use approximate functions where accuracy is acceptable:

Query:
[ paste your SQL with COUNT(DISTINCT ...) or other expensive aggregations ]

Aggregation that needs optimization:
[ e.g., COUNT(DISTINCT user_id) - we need approximately 95% accuracy or better ]

Business use case:
[ e.g., daily active user reporting / unique visitor counts for dashboard ]

Required accuracy: [ percentage or whether exact count is required ]

Provide:
1. Conversion to APPROX_COUNT_DISTINCT or other approximate functions
2. Expected error rate with approximate approach
3. Comparison of cost reduction vs. accuracy trade-off
4. Validation query to confirm approximate results are within acceptable bounds

Why this works: APPROX_COUNT_DISTINCT can cut computation dramatically. Some teams report up to 93% cost reduction when replacing exact COUNT DISTINCT with HyperLogLog-based approximations. Error rates stay around 2% or less.
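The validation step the prompt asks for comes down to a relative-error check. The sketch below assumes you have already run both the exact and the approximate aggregation on the same sample and have the two counts in hand. The function names are hypothetical and the 2% default simply mirrors the figure quoted above.

```python
# Minimal sketch: compare an exact COUNT(DISTINCT ...) result with an
# APPROX_COUNT_DISTINCT result obtained separately from BigQuery.
# The counts are plain integers supplied by the caller; nothing here
# talks to BigQuery.


def relative_error(exact: int, approx: int) -> float:
    """Absolute difference between the counts as a fraction of the exact count."""
    if exact == 0:
        # With nothing to count, any non-zero estimate is infinitely off.
        return 0.0 if approx == 0 else float("inf")
    return abs(approx - exact) / exact


def within_tolerance(exact: int, approx: int, max_rel_error: float = 0.02) -> bool:
    """True when the approximate count is close enough for the use case."""
    return relative_error(exact, approx) <= max_rel_error


if __name__ == "__main__":
    exact_users, approx_users = 1_000_000, 1_012_345  # made-up sample values
    print(f"relative error: {relative_error(exact_users, approx_users):.4%}")
    print("acceptable" if within_tolerance(exact_users, approx_users) else "too far off")
```

Running the check on a few representative date ranges, not just one, gives a better picture of whether the error stays within bounds as cardinality changes.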

Prompt 5: Subquery Optimization

Optimize the following BigQuery query that uses subqueries:

Query:
[ paste your SQL with subqueries ]

Subquery usage:
- [ e.g., correlated subquery in WHERE clause / multiple subqueries that could share intermediate results ]

Performance issue:
[ e.g., subquery runs for every row / intermediate result is recomputed multiple times / query is timing out ]

Provide:
1. Explanation of why the current subquery approach is expensive
2. Rewrite using window functions, CTEs, or JOINs instead
3. Shared intermediate computation approach if multiple subqueries compute similar results
4. Cost comparison between original and rewritten approach

Why this works: Subqueries, especially correlated subqueries in WHERE clauses, are one of the most common sources of expensive BigQuery queries. This prompt generates alternatives using BigQuery’s strengths.
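Before adopting a rewrite like this, it is worth confirming that it returns the same rows as the original. The sketch below uses an in-memory SQLite database as a local stand-in, purely to check logical equivalence. It says nothing about how BigQuery plans or prices either form, and the `orders` table and its values are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here; only result equivalence is
# checked, not cost or execution strategy. Table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10, 50.0), (2, 10, 150.0), (3, 20, 30.0),
        (4, 20, 30.0), (5, 30, 500.0);
""")

# Correlated form: the inner average is logically evaluated per outer row.
CORRELATED = """
    SELECT o.order_id
    FROM orders AS o
    WHERE o.amount > (
        SELECT AVG(i.amount)
        FROM orders AS i
        WHERE i.customer_id = o.customer_id
    )
    ORDER BY o.order_id
"""

# Rewritten form: compute each customer's average once in a CTE, then join.
REWRITTEN = """
    WITH customer_avg AS (
        SELECT customer_id, AVG(amount) AS avg_amount
        FROM orders
        GROUP BY customer_id
    )
    SELECT o.order_id
    FROM orders AS o
    JOIN customer_avg AS c ON c.customer_id = o.customer_id
    WHERE o.amount > c.avg_amount
    ORDER BY o.order_id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()

if __name__ == "__main__":
    print("same result:", correlated_rows == rewritten_rows)
```

The same comparison can be done in BigQuery itself on a small date range, where both versions are cheap to run.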

Prompt 6: Repeated Query Pattern Optimization

We run variations of this query repeatedly with different filter values:

Base query:
[ paste your SQL query ]

Typical filter variations:
- filter_field = [list of typical values]
- date_range typically covers [typical range]
- This query runs [frequency, e.g., hourly/daily]

Cost per run: [estimate if known]
Total monthly cost: [estimate if known]

Provide:
1. Analysis of what changes between runs and what stays the same
2. Caching recommendations to avoid recomputation
3. Materialized view or table approach if underlying data changes infrequently
4. Query parameterization suggestions for BI tool integration
5. Estimated cost reduction from recommended changes

Why this works: Repeated queries are the biggest cost opportunity for teams running dashboards or scheduled jobs. This prompt identifies what can be cached or pre-computed.

Prompt 7: ARRAY and STRUCT Query Optimization

Optimize the following BigQuery query that processes ARRAY or STRUCT data types:

Query:
[ paste your SQL that unnests arrays or accesses struct fields ]

Data structure:
- [ describe the array/struct schema ]

Current performance issue:
[ e.g., UNNEST creates large row expansion / repeated array access in WHERE clause is slow ]

Provide:
1. Explanation of why array processing is expensive
2. Rewrite using BigQuery ARRAY functions that avoid row expansion
3. Alternative approach using subselects or lateral joins
4. Index/clustering recommendations for array-heavy access patterns

Why this works: ARRAY and STRUCT processing in BigQuery requires understanding how UNNEST operations affect row counts. This prompt generates alternatives without the performance cost of row expansion.

Prompt 8: Date/Time Manipulation Optimization

Optimize the following BigQuery query with expensive date/time operations:

Query:
[ paste your SQL with EXTRACT, DATE_TRUNC, TIMESTAMP_DIFF, or other date manipulations ]

Date operations used:
[ list the date functions being used and on what fields ]

Performance issue:
[ e.g., DATE_TRUNC on unpartitioned field is slow / current_timestamp() prevents caching / date parsing from string is expensive ]

Provide:
1. Rewrite that optimizes date operations for BigQuery
2. Partition and clustering recommendations for date fields
3. current_timestamp replacement that enables query caching
4. Cost comparison if query runs frequently

Why this works: Non-deterministic functions such as CURRENT_TIMESTAMP() prevent BigQuery from serving results from its query cache. Date parsing from strings is expensive. These costs compound in queries that run on schedules.
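One common way to keep a scheduled query cache-friendly is to compute the date boundaries in the calling code and embed them as literals, so the SQL text contains no non-deterministic function. The sketch below assumes a date-partitioned table. The table name `my_project.analytics.events` and its columns are hypothetical, and the helper only builds the query string.

```python
from datetime import date, timedelta


def last_n_days_query(today: date, days: int) -> str:
    """Build a query over the `days` days before `today` using date literals.

    The boundaries are computed client-side, so the SQL contains no
    CURRENT_DATE() or CURRENT_TIMESTAMP() call and is identical for every
    run made with the same arguments.
    """
    start = today - timedelta(days=days)
    return (
        "SELECT event_name, COUNT(*) AS events\n"
        "FROM `my_project.analytics.events`  -- hypothetical table\n"
        f"WHERE event_date >= DATE '{start.isoformat()}'\n"
        f"  AND event_date < DATE '{today.isoformat()}'\n"
        "GROUP BY event_name"
    )


if __name__ == "__main__":
    print(last_n_days_query(date.today(), 7))
```

Identical query text is only one of the conditions for a cache hit. The underlying tables must also be unchanged since the cached result was produced. For values that come from users, query parameters are a safer choice than string formatting.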

Prompt 9: Full Table Scan Prevention

This query is scanning more data than necessary:

Query:
[ paste your SQL query ]

Table: [table name]
Table size: [size]
Partition field: [field]
Cluster fields: [fields]

What I expect to be scanned: [e.g., last 7 days based on filter]
What BigQuery actually scans: [e.g., entire table]

WHERE clause breakdown:
[ describe your filters ]

Provide:
1. Analysis of why full table scan occurs despite filter
2. Rewrite that ensures selective scanning
3. Filter order recommendations
4. Partition and clustering field recommendations
5. Execution plan query to verify what is actually scanned

Why this works: Full table scans on large tables are the most expensive BigQuery pattern. This prompt diagnoses the specific filter issue causing the full scan.

Prompt 10: Query Review for BI Tool Integration

Review the following query for use in a BI tool (Looker Studio/Tableau/Metabase) where it will be run with different filter values by end users:

Query:
[ paste your SQL query ]

BI tool context:
- Dashboard loads [number] views per day
- Users typically filter by [fields]
- Query result should support [chart types or granularity]

Security context:
- Row-level security required: [field that determines what users see]
- Users should only see their own data: [Y/N]

Provide:
1. Recommended parameterization approach for BI tool integration
2. Row-level security implementation
3. Aggregation level recommendations for dashboard performance
4. Caching strategy for common filter combinations
5. Cost estimate for typical dashboard usage patterns

Why this works: BI tool queries introduce complexity around parameterization, row-level security, and caching that standard query optimization doesn’t address.

How to Get Better Results from BigQuery Optimization Prompts

Provide table schema. BigQuery optimization requires understanding your table’s partitioning, clustering, and data types. Include your schema in prompts for accurate recommendations.

Explain business context. The same query might have different optimal implementations depending on whether it runs once or a million times a day. Frequency, user count, and result accuracy requirements all affect optimization decisions.

Verify execution plans. BigQuery has no EXPLAIN statement, so check AI recommendations with a dry run, which reports the bytes a query would process, and with the execution details shown for a completed job in the BigQuery console. Compare bytes processed before and after.
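Once you have the bytes-processed figures for the original and the rewritten query, the comparison is simple arithmetic. The sketch below assumes on-demand pricing at the $6.25 per TiB rate quoted earlier. The function names and the sample numbers are illustrative, and the byte counts are supplied by you from BigQuery's own reporting.

```python
# Minimal before/after comparison for a rewritten query. The byte counts
# are inputs taken from BigQuery's reporting; this code does not query
# BigQuery itself.

BYTES_PER_TIB = 1024 ** 4


def bytes_reduction(before: int, after: int) -> float:
    """Fraction of scanned bytes saved by the rewrite (0.75 means 75% fewer)."""
    if before <= 0:
        raise ValueError("before must be a positive byte count")
    return (before - after) / before


def monthly_savings_usd(before: int, after: int, runs_per_month: int,
                        usd_per_tib: float = 6.25) -> float:
    """On-demand dollars saved per month if the query runs `runs_per_month` times."""
    saved_tib_per_run = (before - after) / BYTES_PER_TIB
    return saved_tib_per_run * usd_per_tib * runs_per_month


if __name__ == "__main__":
    before_bytes = 2 * BYTES_PER_TIB      # made-up: original scans 2 TiB
    after_bytes = BYTES_PER_TIB // 2      # made-up: rewrite scans 0.5 TiB
    print(f"reduction: {bytes_reduction(before_bytes, after_bytes):.0%}")
    print(f"monthly savings at 30 runs: "
          f"${monthly_savings_usd(before_bytes, after_bytes, 30):,.2f}")
```

A negative result means the rewrite scans more data than the original, which is a useful signal that a suggested optimization backfired.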

Test accuracy trade-offs. Approximate function conversions may introduce acceptable accuracy trade-offs. Always test that results stay within acceptable bounds for your use case.

FAQ

Does BigQuery optimization also speed up queries?

Usually, yes. Reading fewer bytes means less I/O and less work for each slot, so optimizations that reduce bytes processed tend to reduce execution time as well, though not necessarily in proportion. The main exception is queries limited by network latency or result set size rather than computation.

How much can I reduce BigQuery costs with optimization?

Typical optimization reduces costs by 20-35%, though teams implementing comprehensive optimization programs, including AI-powered monitoring, have reported 40-60% reductions in BigQuery spend. Queries with no partition filters, SELECT *, and exact COUNT DISTINCT on large tables have the highest reduction potential. Already-optimized queries have less room for improvement.

Should I use approximate functions for all COUNT DISTINCT?

No. Approximate functions are appropriate for exploratory analysis, dashboards, and reports where ~2% error is acceptable. Don’t use approximate functions for financial calculations, user-facing counts that affect business logic, or any case where exact answers are required.

How do I verify that a partition filter is actually being used?

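BigQuery does not support EXPLAIN. Instead, do a dry run (or read the query validator’s estimate in the BigQuery console) and compare the estimated bytes against the full table size: if the estimate is close to the size of the whole table, the partition filter is not pruning. After the query runs, the execution details in the console show how much data each stage read.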

Does LIMIT reduce query costs in BigQuery?

No. BigQuery scans the necessary data before applying the LIMIT clause, so you’re billed for the full amount of data processed even if the query returns only a small number of rows. To explore data cheaply, use TABLESAMPLE instead.

When should I switch from on-demand to capacity pricing?

A good sign it’s time to move from on-demand to BigQuery capacity pricing is when your environment consistently consumes more than 100 average slots. At Standard Edition, capacity pricing starts at about $0.04 per slot-hour, which means 100 baseline slots cost roughly $2,920 per month. At that level of usage, capacity pricing often becomes more cost-effective than on-demand, especially for teams processing large and predictable workloads.

Conclusion

BigQuery’s pricing model makes query optimization a direct cost management strategy. Every byte not processed is a byte not billed.

The 10 prompts cover the main optimization scenarios: cost reduction, JOIN performance, partition utilization, approximate functions, subquery elimination, repeated query patterns, array processing, date manipulation, full table scan prevention, and BI tool integration.

Use these prompts to audit your most expensive queries. Start with queries that run most frequently or process the most data. Small optimizations compound when applied to queries running on hourly schedules.

The goal isn’t perfect SQL on the first try. It’s continuous improvement: run the query, see what BigQuery actually does with it, optimize based on what you learn, and repeat.
