Quick Answer
We identify the best AI prompts for SQL query generation with Claude by focusing on context and iterative refinement. Our approach ensures you get accurate, optimized code by treating the AI as a collaborative partner. This guide provides proven prompt strategies to eliminate syntax errors and bridge the gap between business questions and database logic.
The 'Explain First' Rule
Before asking Claude to write code, prompt it to explain the logic for the query. This forces the AI to articulate its reasoning and reveal flawed assumptions, preventing costly errors before you run a query.
Unlocking the Power of AI for SQL Mastery
You know that sinking feeling when a critical report is due, but your SQL query keeps throwing a syntax error, or worse, it runs for ten minutes and returns nonsensical data? You’re not alone. In 2025, the demand for SQL skills has exploded beyond the data team. Marketers analyzing campaign funnels, product managers tracking user behavior, and sales ops teams building forecasts are all expected to query databases directly. Yet, the learning curve remains brutal. A single misplaced comma, a misunderstood join logic, or an inefficient query that scans terabytes of data can derail your entire day and run up a significant cloud bill. This is where the gap between asking a business question and writing flawless SQL code becomes a major productivity killer.
This is precisely why we’re focusing on Claude for SQL query generation. While many AI models can write SQL, I’ve found Claude to be a superior partner for database tasks due to its exceptional reasoning and context retention. In my own workflow, I’ve seen it excel at deconstructing a complex request—like calculating a 6-month rolling average of customer retention—into logical, understandable steps. Unlike other models that might spit out a monolithic, unexplainable query, Claude acts more like a senior data analyst. It explains the why behind its choices, especially when dealing with nested sub-queries or intricate window functions, ensuring you not only get a working query but also understand the logic behind it.
In this guide, we’ll move beyond simple “write a query for X” prompts. We’ll explore a progression of techniques designed to make you a more effective and efficient SQL practitioner. Here’s what we’ll cover:
- The Art of the Prompt: How to provide the necessary schema context and business logic to get accurate results on the first try.
- Deconstructing Complexity: Using Claude to break down intimidating nested sub-queries and multi-layered joins into digestible parts.
- From Generation to Optimization: Techniques for asking Claude to critique its own work, suggest performance improvements, and even debug errors you’re encountering.
Golden Nugget: The most powerful prompt I use isn’t “write this query.” It’s “explain the logic for this query, then write the code.” This forces the AI to articulate its reasoning first, which often reveals flawed assumptions before you ever run a costly query.
Mastering the Basics: Crafting Effective Prompts for Simple Queries
What’s the single biggest mistake that causes AI to generate incorrect SQL? It’s not a complex logic error; it’s a lack of context. You can ask a powerful model like Claude to “get me the list of top customers,” but without knowing your database structure, you’re asking for a guess. In my experience, that guess is wrong about 90% of the time. It might assume a customers table exists, that “top” means total_revenue, and that the data is in a sales table. This is why the “Context is King” principle is the non-negotiable foundation of effective prompt engineering for SQL.
When you provide schema details, you transform a vague request into a precise instruction. You’re not just telling Claude what you want; you’re showing it where to find it and how the pieces connect. This is the difference between getting a query that runs and a query that gives you the correct business answer.
The “Context is King” Principle: Why Schema is Your Secret Weapon
Think of it like giving directions to a taxi driver. Saying “take me to the airport” works if you’re in a one-airport city. But if you’re in a major hub with multiple airports, you need to specify which one, which terminal, and which entrance. Your database is that complex city. Your tables are the buildings, and your columns are the street addresses.
Providing the schema is the single most important step because it eliminates ambiguity. AI models don’t have telepathic access to your data dictionary. You must feed it the essential information. A robust prompt includes:
- Table Names: The exact names of the tables involved (e.g., `users`, `orders`, `order_items`).
- Key Column Names: The columns you’ll be filtering on or joining with (e.g., `user_id`, `order_id`, `created_at`).
- Data Types: Briefly mentioning whether a column is a `DATE`, `VARCHAR`, `INTEGER`, etc., can prevent logic errors, especially with functions like `DATE_TRUNC` or `CAST`.
- Relationships: Explicitly stating the join keys is a game-changer (e.g., “the `orders` table links to `users` via `user_id`”).
Golden Nugget: My go-to prompt structure for any new query starts with a simple schema dump. I’ll write: “`users` table: `user_id` (int), `signup_date` (timestamp). `orders` table: `order_id` (int), `user_id` (int), `amount` (decimal).” This 30-second step saves me 10 minutes of debugging and re-prompting. It’s the highest ROI activity in the entire process.
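If you query the same tables often, the schema dump is worth generating rather than retyping. Here is a small sketch of that idea; the helper name `schema_preamble` and its layout are my own invention, not from any library:

```python
# Hypothetical helper: renders table definitions into the kind of
# schema dump described above, ready to paste at the top of a prompt.

def schema_preamble(tables: dict) -> str:
    """Build a 'Context is King' schema dump for the top of a prompt."""
    lines = []
    for table, columns in tables.items():
        cols = ", ".join(f"{name} ({dtype})" for name, dtype in columns)
        lines.append(f"{table} table: {cols}.")
    return "\n".join(lines)

prompt_context = schema_preamble({
    "users": [("user_id", "int"), ("signup_date", "timestamp")],
    "orders": [("order_id", "int"), ("user_id", "int"), ("amount", "decimal")],
})
print(prompt_context)
```

Prepend the returned string to every prompt that touches these tables, and the AI stops guessing at names and types.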
Prompting for SELECT, WHERE, and JOINs: From Vague to Precise
Let’s see this principle in action. The difference between a junior analyst’s prompt and a senior analyst’s prompt is all in the specificity.
Vague Request (The “Hope and Pray” Method):
“Hey, can you write a query to find our most valuable customers from last month?”
This is a recipe for disaster. Claude has to guess the table names, the definition of “valuable” (highest spend? most orders?), and the date format.
Precise Request (The Expert Method):
“I need a list of the top 10 customers by total spend for October 2025.
Schema:

- `customers` table: `customer_id` (int), `customer_name` (varchar)
- `payments` table: `payment_id` (int), `customer_id` (int), `payment_amount` (decimal), `payment_date` (timestamp)

Logic:

- Join `payments` to `customers` on `customer_id`.
- Filter where `payment_date` is between ‘2025-10-01’ and ‘2025-10-31’.
- Group by `customer_id` and `customer_name`.
- Sum the `payment_amount` to get total spend.
- Order by total spend in descending order and limit to 10.

Please provide the query in standard SQL.”
This prompt is a blueprint for success. It defines the business question, provides the necessary data dictionary, and outlines the logical steps. The AI isn’t guessing; it’s executing your well-defined plan. This same principle applies to filtering and joins. Instead of “filter for active users,” specify WHERE status = 'active'. Instead of “join users and orders,” specify LEFT JOIN orders ON users.id = orders.user_id.
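As a sanity check, the blueprint above translates directly into runnable SQL. A minimal sketch using Python’s built-in sqlite3 with invented rows (SQLite here rather than the “standard SQL” the prompt requests, and the date filter is a simple string comparison):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
CREATE TABLE payments (payment_id INTEGER, customer_id INTEGER,
                       payment_amount REAL, payment_date TEXT);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO payments VALUES
  (10, 1, 500.0, '2025-10-05'),
  (11, 1, 250.0, '2025-10-20'),
  (12, 2, 900.0, '2025-10-15'),
  (13, 2, 100.0, '2025-11-02');  -- outside October, excluded by the filter
""")

rows = conn.execute("""
    SELECT c.customer_id, c.customer_name, SUM(p.payment_amount) AS total_spend
    FROM payments p
    JOIN customers c ON p.customer_id = c.customer_id
    WHERE p.payment_date BETWEEN '2025-10-01' AND '2025-10-31'
    GROUP BY c.customer_id, c.customer_name
    ORDER BY total_spend DESC
    LIMIT 10
""").fetchall()
print(rows)  # Globex (900.0) ranks above Acme (750.0)
```

Every line of the query maps back to a bullet in the prompt, which is exactly why the prompt works.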
Controlling Output Format: Dialects and Copy-Paste Readiness
Your work doesn’t end with a correct query; it needs to be runnable in your specific environment. A query written for MySQL might fail in PostgreSQL due to subtle syntax differences. You can eliminate this friction by explicitly telling Claude which SQL dialect you need.
This is also where you can request formatting that makes your life easier. A wall of unformatted SQL is hard to read and debug. Asking for clean, indented code is a simple instruction that pays dividends.
Here are two examples of how to control the output:
1. Prompting for a Specific Dialect:
“Write a query to calculate the average session duration per user. The data is in a table named `user_sessions` with columns `user_id` (int) and `session_duration_seconds` (int). Generate the query in T-SQL syntax for SQL Server.”
2. Prompting for Readability:
“Using the schema from the previous example, write a query to find the total number of orders per customer. Please format the SQL with standard indentation and use aliases for all tables (e.g., `c` for `customers`, `o` for `orders`) to keep it concise and readable.”
By mastering these foundational prompting techniques—providing rich context, being specific in your requests, and controlling the output format—you build a reliable workflow. You move from hoping the AI gets it right to architecting prompts that guarantee it.
Deconstructing Complexity: How to Generate and Understand Nested Sub-queries
Ever stared at a wall of nested SQL code and felt your brain short-circuit? You know what you want the data to tell you, but the labyrinth of parentheses and sub-queries feels like a foreign language. This is the point where most people either give up or copy-paste code they don’t fully trust. I’ve been there, and it’s a terrible way to work. The key isn’t to become a better SQL typist; it’s to become a better SQL architect. And with modern AI, you can have an architect on demand.
The real power of a tool like Claude isn’t just in writing the code, but in its ability to reason through the logic before a single line is executed. By changing how you ask, you transform the AI from a simple code generator into a collaborative partner that helps you think through the problem. This approach de-risks your analysis and builds your own expertise in the process.
The “Explain the Logic First” Technique
The most common mistake I see analysts make is jumping straight to the request: “Write a query to show me…” This is like asking a builder to construct a house without first looking at the blueprints. You’ll get a structure, but it might not be the one you need. My go-to strategy, honed from countless data projects, is to force the AI to articulate its reasoning first.
Instead of asking for code, I prompt for a step-by-step plan. It looks something like this:
“I need to find the top 10% of sales reps by revenue in Q3 2025, but only for clients who have been with us for over a year. Please first describe the logical steps you would take to solve this with SQL, without writing the code yet. Explain how you’ll handle the ranking and the client filtering.”
This prompt does several things. It forces the AI to break the problem into discrete parts: 1) identify long-term clients, 2) filter sales data for Q3, 3) join the datasets, 4) aggregate revenue per rep, and 5) apply a percentile ranking. By seeing this logic laid out in plain English, I can immediately spot flaws. Maybe the AI’s plan doesn’t account for a client’s start date correctly. I can correct it before it generates 50 lines of complex, potentially buggy code. This single change in prompting strategy saves me hours of debugging and is a cornerstone of using AI effectively.
Breaking Down Nested Queries with a Real-World Example
Let’s apply this to a classic scenario: finding your top-performing sales reps in a specific region, but with a twist. We only want reps whose total sales across all regions exceed the company average. This naturally leads to a nested query.
Here’s how I would prompt Claude to not only generate the query but also make it understandable:
“Generate a SQL query to find the top 5 sales reps (by total sales) who work in the ‘North America’ region. The catch: only include reps whose lifetime sales across all regions are greater than the average lifetime sales of all reps. Please use a nested sub-query for the average calculation and explain what each part of the query does.”
Claude’s response would typically break down into two parts: the explanation and the code.
The Explanation (Interpreted):
- Inner Query (`SELECT AVG(lifetime_sales) ...`): This is the sub-query. It first calculates a single number—the average lifetime sales for every single sales rep in the entire company, regardless of their region. This is your benchmark.
- Outer Query (`SELECT rep_name, SUM(sales_amount) ...`): This is the main query. It filters for reps in ‘North America’, groups them by name, calculates their total sales, and then compares that total against the benchmark number produced by the inner query.
The Generated SQL:
```sql
SELECT
    rep_name,
    SUM(sales_amount) AS region_total_sales
FROM
    sales_data
WHERE
    region = 'North America'
    AND rep_name IN (
        -- Sub-query to get reps whose lifetime sales are above average
        SELECT rep_name
        FROM sales_data
        GROUP BY rep_name
        HAVING SUM(sales_amount) > (SELECT AVG(lifetime_sales) FROM company_rep_summary)
    )
GROUP BY
    rep_name
ORDER BY
    region_total_sales DESC
LIMIT 5;
```
Seeing the logic explained separately from the code is a game-changer. It builds trust. You understand why the WHERE ... IN (...) clause is there and what the sub-query is feeding it. You’re no longer just running code; you’re executing a validated plan.
Using CTEs (Common Table Expressions) for Clarity
Nested sub-queries work, but they can be a nightmare to read or modify later. As your queries get more complex, you end up with a “pyramid of doom” that’s hard to debug. A more modern, readable, and maintainable approach is to use Common Table Expressions (CTEs). The WITH clause lets you break the logic into named, sequential steps.
The prompt here is key. You need to guide the AI to restructure its thinking.
“Rewrite the previous query using Common Table Expressions (CTEs) instead of nested sub-queries. Name the CTEs clearly to show the logical steps, like ‘RegionalSales’ and ‘AboveAverageReps’.”
The AI will translate the nested logic into a clean, linear flow:
```sql
-- Step 1: Calculate the company-wide average lifetime sales benchmark
WITH CompanyBenchmark AS (
    SELECT AVG(lifetime_sales) AS avg_sales
    FROM company_rep_summary
),
-- Step 2: Identify all reps whose lifetime sales exceed that benchmark
AboveAverageReps AS (
    SELECT rep_name
    FROM sales_data
    GROUP BY rep_name
    HAVING SUM(sales_amount) > (SELECT avg_sales FROM CompanyBenchmark)
)
-- Step 3: Get the final results for North America, filtering by the list from Step 2
SELECT
    s.rep_name,
    SUM(s.sales_amount) AS region_total_sales
FROM
    sales_data s
JOIN
    AboveAverageReps aar ON s.rep_name = aar.rep_name
WHERE
    s.region = 'North America'
GROUP BY
    s.rep_name
ORDER BY
    region_total_sales DESC
LIMIT 5;
```
By prompting for CTEs, you’re not just getting cleaner code. You’re building a query that reads like a recipe. Each CTE is a distinct preparation step, and the final SELECT is the assembly. This is far easier to hand off to a colleague, revisit six months later, or modify for a slightly different question. It’s a hallmark of professional, production-ready SQL.
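A side benefit of the CTE version is that it is trivial to unit-test on toy data. The sketch below runs the CTE query unchanged in Python’s built-in sqlite3; all rows are invented (three reps with lifetime sales of 150, 40, and 200, giving a company average of 130):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (rep_name TEXT, region TEXT, sales_amount REAL);
CREATE TABLE company_rep_summary (rep_name TEXT, lifetime_sales REAL);
INSERT INTO sales_data VALUES
  ('Ana',  'North America', 100), ('Ana', 'Europe', 50),
  ('Ben',  'North America',  40),
  ('Caro', 'Europe',        200);
INSERT INTO company_rep_summary VALUES ('Ana', 150), ('Ben', 40), ('Caro', 200);
""")

rows = conn.execute("""
    WITH CompanyBenchmark AS (
        SELECT AVG(lifetime_sales) AS avg_sales FROM company_rep_summary
    ),
    AboveAverageReps AS (
        SELECT rep_name
        FROM sales_data
        GROUP BY rep_name
        HAVING SUM(sales_amount) > (SELECT avg_sales FROM CompanyBenchmark)
    )
    SELECT s.rep_name, SUM(s.sales_amount) AS region_total_sales
    FROM sales_data s
    JOIN AboveAverageReps aar ON s.rep_name = aar.rep_name
    WHERE s.region = 'North America'
    GROUP BY s.rep_name
    ORDER BY region_total_sales DESC
    LIMIT 5
""").fetchall()
print(rows)  # only Ana clears the company-wide average of 130
```

Caro exceeds the benchmark too, but has no North America sales, so only Ana appears; being able to reason through a result like that on five rows is the whole point of the CTE structure.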
Advanced Prompting Strategies: Window Functions, CTEs, and Recursive Queries
You’ve mastered the basics, but now you’re facing queries that require more than a simple SELECT and WHERE. Your boss needs a running total for a financial report, or you need to untangle a complex organizational hierarchy. This is where basic prompting fails and advanced strategy begins. How do you communicate this multi-layered logic to an AI without it hallucinating or producing an unrunnable mess? The answer lies in treating Claude like a senior developer: you don’t just give it a task, you provide a blueprint.
Unlocking Window Functions: From Business Question to Analytical Power
Window functions are notorious for being difficult to write from scratch because they require you to think about data in “windows” or partitions. A common mistake is to ask the AI vaguely, “Give me a rank of products.” This often leads to incorrect results because the AI doesn’t know how to partition the data. Instead, you need to frame the prompt with the business question and the specific logic.
Here’s a template I use frequently for a running total, a classic window function:
Prompt Template: Running Total

“Write a SQL query to calculate a running total of daily sales from the `sales` table. The table has columns `sale_date` and `amount`. I need the running total to reset for each month. Please use a window function with `PARTITION BY` for the month and `ORDER BY` the sale date. Explain the `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` clause in the context of this calculation.”

This prompt works because it specifies the function (running total), the partitioning logic (reset each month), and the ordering (`ORDER BY sale_date`). By asking for an explanation of the frame clause, you force the AI to articulate the logic, ensuring you understand why the numbers are correct before you run the query.
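The query this template asks for can be verified locally: SQLite (3.25+) supports the same window-function syntax. In this sketch the rows are invented, and `strftime('%Y-%m', ...)` plays the role that `DATE_TRUNC('month', ...)` would in PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2025-01-10', 100), ('2025-01-20', 50),
  ('2025-02-03',  70), ('2025-02-15', 30);
""")

rows = conn.execute("""
    SELECT sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY strftime('%Y-%m', sale_date)  -- resets each month
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY sale_date
""").fetchall()
print(rows)  # running_total restarts at 70.0 on 2025-02-03
```

The partition is what makes the total reset: January accumulates 100, 150, then February starts over at 70, 100.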
Prompting for Multi-layered CTEs: Building a Logical Query Chain
When a query requires multiple steps, dumping it all into a single CTE is a recipe for confusion. The key is to guide the AI to build the query sequentially, just as a human analyst would. You define the logical flow from one CTE to the next, making the output auditable and easy to modify.
Imagine you need to calculate the Month-over-Month (MoM) growth rate of active users. This requires first identifying active users per month, then lagging that count to calculate the growth. Here’s how you’d structure the prompt:
Prompt Template: Multi-layered CTE

“Generate a SQL query to calculate the Month-over-Month growth rate of active users. Use Common Table Expressions (CTEs) for clarity.

- CTE 1 (`monthly_active_users`): First, count the distinct `user_id` for each month from the `events` table. Filter for events after January 1, 2025.
- CTE 2 (`growth_calculation`): Build on `monthly_active_users`, using a window function (`LAG`) to get the previous month’s user count. Calculate the MoM growth percentage.
- Final `SELECT`: Return the current month, current user count, previous user count, and the calculated growth rate. Explain the purpose of each CTE in the final output.”
This prompt acts as a project manager, assigning specific tasks to each CTE. It explicitly defines the data flow: CTE 1 prepares the data, CTE 2 performs the calculation, and the final SELECT presents the results. This structure is invaluable for complex business logic and is a hallmark of production-ready code.
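The CTE chain the template describes can be sketched and run end-to-end in Python’s sqlite3. The `events` rows are invented, and `strftime` again substitutes for `DATE_TRUNC`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event_time TEXT);
INSERT INTO events VALUES
  (1, '2025-02-01'), (2, '2025-02-10'),                     -- 2 active in Feb
  (1, '2025-03-05'), (2, '2025-03-06'), (3, '2025-03-07');  -- 3 active in Mar
""")

rows = conn.execute("""
    WITH monthly_active_users AS (
        -- CTE 1: distinct active users per month
        SELECT strftime('%Y-%m', event_time) AS month,
               COUNT(DISTINCT user_id) AS active_users
        FROM events
        WHERE event_time >= '2025-01-01'
        GROUP BY 1
    ),
    growth_calculation AS (
        -- CTE 2: pull in the previous month's count with LAG
        SELECT month, active_users,
               LAG(active_users) OVER (ORDER BY month) AS prev_users
        FROM monthly_active_users
    )
    SELECT month, active_users, prev_users,
           ROUND(100.0 * (active_users - prev_users) / prev_users, 2) AS mom_growth_pct
    FROM growth_calculation
    ORDER BY month
""").fetchall()
print(rows)  # February has no prior month (NULL); March grows 50.0%
```

Note how the NULL for the first month falls out naturally from `LAG`: there is no previous month to compare against, so the growth rate is NULL rather than a misleading zero.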
Handling Hierarchical Data with Recursive Queries: Taming the Trees
Recursive queries are the final boss of SQL generation. They are essential for organizational charts, bill of materials (BOM), or any parent-child relationship. A vague prompt like “show me the org chart” will almost certainly fail. You must provide the base case and the recursive step.
Here is a specialized prompt structure for generating an employee organizational chart, a classic recursive CTE example:
Prompt Template: Recursive CTE

“Write a recursive CTE in SQL to map an employee’s management chain.

Table Schema: `employees` with columns `employee_id`, `manager_id`, and `employee_name`.

Logic:

- Anchor Member: Start with a specific employee, for example `employee_id = 101`. Select their details.
- Recursive Member: Join the `employees` table to the CTE itself, linking `employees.employee_id` to `cte.manager_id`, to find the next manager up the chain.
- Termination Condition: The recursion stops when `manager_id` is `NULL`.

Please output the full query and explain how the `UNION ALL` connects the anchor and recursive members.”
By breaking down the recursion into its core components—anchor, recursive join, and termination condition—you provide the AI with a foolproof recipe. This prevents it from getting stuck in an infinite loop or misinterpreting the join logic, saving you significant debugging time and frustration.
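The anchor/recursive/termination recipe maps one-to-one onto SQL. Here is a minimal sketch in SQLite (which supports `WITH RECURSIVE`) with an invented three-level hierarchy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, employee_name TEXT);
INSERT INTO employees VALUES
  (1,   NULL, 'CEO'),      -- manager_id IS NULL terminates the recursion
  (50,  1,    'VP'),
  (101, 50,   'Analyst');
""")

rows = conn.execute("""
    WITH RECURSIVE chain AS (
        -- Anchor member: the starting employee
        SELECT employee_id, manager_id, employee_name
        FROM employees
        WHERE employee_id = 101
        UNION ALL
        -- Recursive member: walk up to each successive manager
        SELECT e.employee_id, e.manager_id, e.employee_name
        FROM employees e
        JOIN chain c ON e.employee_id = c.manager_id
    )
    SELECT employee_name FROM chain
""").fetchall()
print([r[0] for r in rows])  # ['Analyst', 'VP', 'CEO']
```

The recursion halts on its own: once the CEO row (with a NULL `manager_id`) is in the chain, the recursive join finds no matching employee and produces no new rows.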
From Prompt to Production: Debugging and Optimizing Your SQL with Claude
Even the most skilled data professionals write flawed SQL. The difference between a junior analyst and a seasoned pro isn’t a lack of errors; it’s the speed and precision with which they find and fix them. This is where Claude transforms from a query generator into a production-ready debugging partner. By treating it as a collaborative code review tool, you can catch costly mistakes before they hit your production environment.
The “Fix My Query” Prompt: Your Instant Code Reviewer
We’ve all been there: a query that looks perfect but throws a cryptic syntax error or, worse, returns nonsensical data. Instead of staring at the screen for an hour, you can get a diagnosis in seconds. The key is to provide context, not just the broken code.
A Workflow for Debugging:
- Paste the Error: Start by giving Claude the raw, broken SQL.
- Provide the Context: Tell it what you were trying to do. For example: “I’m trying to get the total sales for each product category, but this query is giving me a single row with a massive number.”
- Share the Schema (Optional but Recommended): A quick summary of your table structure can help Claude spot logical flaws like joining on the wrong key.
Example Prompt:
“Here is my SQL query that’s supposed to calculate monthly active users, but it’s throwing a ‘column ambiguity’ error in BigQuery. Can you identify the syntax errors, logical flaws, and suggest the corrected code?
My Query:

```sql
SELECT user_id, COUNT(DISTINCT session_id) AS monthly_sessions, signup_date
FROM `project.dataset.users` u
JOIN `project.dataset.sessions` s ON u.id = s.user_id
WHERE session_date >= '2025-01-01'
GROUP BY user_id;
```

Schema: The `users` table has `id` and `signup_date`. The `sessions` table has `user_id` and `session_date`.”
Claude will quickly identify that `signup_date` appears in the `SELECT` clause but not in the `GROUP BY` clause, which is the logical flaw behind the error. It will then provide a corrected query, likely suggesting an aggregate function like `MAX(u.signup_date)` or moving the column to a subquery.
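SQLite is lenient about non-aggregated columns, so it won’t reproduce BigQuery’s error, but the corrected pattern can still be demonstrated on invented data: aggregate `signup_date` so that every selected column is either grouped or aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, signup_date TEXT);
CREATE TABLE sessions (user_id INTEGER, session_id INTEGER, session_date TEXT);
INSERT INTO users VALUES (1, '2024-12-01'), (2, '2025-01-15');
INSERT INTO sessions VALUES
  (1, 100, '2025-01-02'), (1, 101, '2025-01-03'),
  (2, 200, '2025-01-20');
""")

rows = conn.execute("""
    SELECT u.id AS user_id,
           COUNT(DISTINCT s.session_id) AS monthly_sessions,
           MAX(u.signup_date) AS signup_date  -- aggregated, so GROUP BY stays valid
    FROM users u
    JOIN sessions s ON u.id = s.user_id
    WHERE s.session_date >= '2025-01-01'
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
print(rows)  # [(1, 2, '2024-12-01'), (2, 1, '2025-01-15')]
```

Because each user has exactly one signup date, `MAX` is a harmless way to carry the column through the aggregation, which is the standard fix a strict engine like BigQuery requires.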
Performance Tuning and Optimization: From Slow to Swift
A query that works isn’t always a query that works well. In 2025, with data volumes and cloud compute costs continuing to climb, writing inefficient queries is a direct hit to your budget. Claude excels at analyzing query structure and suggesting performance enhancements.
How to Ask for Optimization:
Your prompt should focus on the goal of efficiency. You can ask it to review for common anti-patterns.
- Avoid `SELECT *`: Ask Claude to rewrite a query to select only the necessary columns. This is the simplest way to reduce data scanned.
- Suggest Indexes: While Claude can’t create indexes for you, you can prompt it: “Based on this query’s `WHERE` and `JOIN` clauses, what indexes would you recommend for the `orders` and `customers` tables to improve performance?”
- Rewrite for Readability and Efficiency: Ask it to use Common Table Expressions (CTEs) to break down complex logic. As I’ve found in my own work, a CTE-based query is not only easier for a human to understand but often allows the query optimizer to work more effectively.
Golden Nugget: The most powerful optimization prompt I use is: “Rewrite this query to be more performant in a modern cloud data warehouse. Explain why each change you made will improve execution time and reduce cost.” This forces the AI to articulate its reasoning, teaching you optimization principles while it fixes your code.
Unit Testing and Data Validation: Trust, But Verify
Before you schedule a complex query to run daily, you need to be 100% confident in its output. Running a massive, untested query on production data is a recipe for bad reports and wasted compute. The smartest approach is to use Claude to generate a validation plan.
A Workflow for Validation:
- Generate Sample Data: Ask Claude to create a small, predictable dataset that mimics your table structure.
- Run Your Query: Execute your complex query against this sample data.
- Validate the Logic: Ask Claude to write a separate, simple `SELECT` statement that manually calculates the expected result based on your sample data. Compare the two outputs.
Example Prompt:
“I need to validate my query that calculates a 30-day rolling average of sales. Please generate a sample dataset for a `sales` table with `sale_date` and `amount` for 5 sales over 35 days. Then, write a simple query to calculate the rolling average for day 31 manually, so I can verify my complex query’s output.”
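This validation workflow can be scripted end-to-end. In the sketch below the sample rows are invented, and the window definition (strictly greater than `day - 30`, inclusive of the day itself) is an assumption you would pin down with Claude before comparing outputs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2025-01-01', 100),  -- just outside the 30-day window ending 2025-01-31
  ('2025-01-05', 200),
  ('2025-01-15', 300),
  ('2025-01-31', 400),
  ('2025-02-04', 500);  -- after the target day, excluded
""")

# Manual "unit test": average of all sales in the 30 days ending 2025-01-31
(manual,) = conn.execute("""
    SELECT AVG(amount)
    FROM sales
    WHERE sale_date >  date('2025-01-31', '-30 days')
      AND sale_date <= '2025-01-31'
""").fetchone()
print(manual)  # (200 + 300 + 400) / 3 = 300.0
```

If your complex rolling-average query doesn’t produce 300.0 for day 31 on this dataset, you’ve found a boundary-condition bug before it ever touched production data.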
This process acts as a unit test for your SQL. It ensures your logic holds up against known inputs and gives you the confidence to deploy your query to production. By combining debugging, optimization, and validation, you move from simply writing SQL to engineering reliable, cost-effective data solutions.
Real-World Case Studies: Applying AI Prompts to Common Business Problems
The true power of AI for SQL query generation isn’t in writing simple SELECT * statements. It’s in tackling the messy, multi-table, logic-heavy problems that define your actual job. You know the ones: the cohort analysis that requires multiple self-joins, the MRR churn calculation that needs to track subscription state changes over time, or the marketing attribution query that joins a dozen tables. This is where a well-crafted prompt transforms from a convenience into a critical business tool.
Let’s move beyond theory and dive into three common scenarios. For each, I’ll show you the prompt, the resulting SQL, and—most importantly—break down the logic so you can see exactly what’s happening under the hood.
E-commerce: Analyzing Customer Cohorts and Lifetime Value
A classic but notoriously tricky analysis is calculating the Month 1, Month 2, and Month 3 retention rates for customers who first purchased in a given month. A naive approach often fails to correctly group and track users over time.
The Prompt:
“I need a PostgreSQL query for cohort analysis. I have two tables:

- `users` (`user_id`, `created_at`)
- `orders` (`order_id`, `user_id`, `order_date`, `order_total`)

The goal is to calculate the percentage of users who made a purchase in their signup month (Month 0), the following month (Month 1), and the month after that (Month 2). Group the results by the user’s signup month. Please use Common Table Expressions (CTEs) to structure the logic step-by-step and explain each part.”
The Generated SQL & Logic Breakdown:
Claude would likely generate a query like this, which we can then dissect:
```sql
-- CTE 1: Identify each user's first purchase month (their cohort)
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM orders
    GROUP BY user_id
),
-- CTE 2: Calculate the month offset for each subsequent purchase
purchase_months AS (
    SELECT
        uc.cohort_month,
        o.user_id,
        -- Difference in months between cohort and purchase. Include the year
        -- component so offsets beyond 11 months don't wrap back to zero.
        (DATE_PART('year', AGE(DATE_TRUNC('month', o.order_date), uc.cohort_month)) * 12
         + DATE_PART('month', AGE(DATE_TRUNC('month', o.order_date), uc.cohort_month))) AS month_number
    FROM orders o
    JOIN user_cohorts uc ON o.user_id = uc.user_id
    GROUP BY 1, 2, 3
),
-- CTE 3: Count unique users for each month offset
cohort_size AS (
    SELECT
        cohort_month,
        month_number,
        COUNT(DISTINCT user_id) AS active_users
    FROM purchase_months
    GROUP BY 1, 2
)
-- Final step: Pivot the data to show retention by month
SELECT
    cohort_month,
    -- Total cohort size (Month 0 count) for the percentage calculation
    MAX(CASE WHEN month_number = 0 THEN active_users END) AS initial_users,
    -- Retention for Month 1
    ROUND(100.0 * MAX(CASE WHEN month_number = 1 THEN active_users END)
          / MAX(CASE WHEN month_number = 0 THEN active_users END), 2) AS month_1_retention,
    -- Retention for Month 2
    ROUND(100.0 * MAX(CASE WHEN month_number = 2 THEN active_users END)
          / MAX(CASE WHEN month_number = 0 THEN active_users END), 2) AS month_2_retention
FROM cohort_size
GROUP BY 1
ORDER BY 1 DESC;
```
- `user_cohorts` CTE: This is the foundation. It finds the very first purchase date for every user and truncates it to the month. This becomes that user’s “cohort.” Without this, you can’t track them.
- `purchase_months` CTE: Here’s the core logic. For every order a user places, we calculate the `month_number` (0, 1, 2, etc.) by comparing the order’s month to their `cohort_month`. This is the key to tracking them over time.
- `cohort_size` CTE: This simply counts how many unique users from each cohort were active in each `month_number`. It’s the raw data we need for the final percentages.
- Final `SELECT`: This is the presentation layer. We use conditional aggregation (`CASE WHEN`) to pivot the rows from `cohort_size` into columns (`month_1_retention`, `month_2_retention`) and calculate the percentages.
Golden Nugget: When prompting for cohort analysis, always ask the AI to “find the first event date” in a separate CTE first. This single step prevents a world of pain where you accidentally count a user’s second purchase as their “first” in the cohort, completely corrupting your retention numbers.
SaaS: Calculating Monthly Recurring Revenue (MRR) Churn
Calculating MRR churn isn’t just about counting cancellations. It’s about tracking the status of every subscription over time to see which ones downgrade, upgrade, or churn completely. This requires a stateful analysis that’s perfect for AI assistance.
The Prompt:
“Write a query to calculate net MRR churn for the last 6 months. Assume a `subscriptions` table with `subscription_id`, `user_id`, `mrr`, `status` (active, canceled, paused), and `canceled_at` (timestamp). Also, a `payments` table with `payment_id`, `subscription_id`, and `payment_date`. We need to identify the MRR from subscriptions that were active at the start of each month but were canceled by the end of it. The final output should be `month`, `starting_mrr`, `churned_mrr`, and `churn_rate`.”
The Generated SQL & Logic Breakdown:
This scenario is complex because it requires comparing a subscription’s state at two different points in time.
```sql
WITH monthly_activity AS (
    -- One row per subscription per month in which it made a payment.
    -- mrr lives on subscriptions, so we join to it; DISTINCT collapses
    -- multiple payments in the same month into a single row.
    SELECT DISTINCT
        p.subscription_id,
        s.mrr,
        DATE_TRUNC('month', p.payment_date) AS activity_month,
        -- Window function: the last payment each subscription made that month
        MAX(p.payment_date) OVER (
            PARTITION BY p.subscription_id, DATE_TRUNC('month', p.payment_date)
        ) AS last_payment_in_month
    FROM payments p
    JOIN subscriptions s ON s.subscription_id = p.subscription_id
    WHERE p.payment_date >= NOW() - INTERVAL '6 months'
),
churned_subscriptions AS (
    -- Identify which subscriptions churned in a given month
    SELECT
        s.subscription_id,
        s.mrr,
        DATE_TRUNC('month', s.canceled_at) AS churn_month
    FROM subscriptions s
    WHERE s.status = 'canceled'
      AND s.canceled_at >= NOW() - INTERVAL '6 months'
),
monthly_mrr_summary AS (
    -- Calculate the starting MRR for each month
    SELECT
        DATE_TRUNC('month', series_month) AS report_month,
        COALESCE(SUM(ma.mrr), 0) AS starting_mrr
    FROM generate_series(NOW() - INTERVAL '6 months', NOW(), '1 month'::interval) series_month
    LEFT JOIN monthly_activity ma ON ma.activity_month = DATE_TRUNC('month', series_month)
    GROUP BY 1
),
monthly_churn_summary AS (
    -- Calculate the churned MRR for each month
    SELECT
        churn_month,
        SUM(mrr) AS churned_mrr
    FROM churned_subscriptions
    GROUP BY 1
)
-- Final step: Combine starting MRR and churned MRR to calculate the rate
SELECT
    mms.report_month,
    mms.starting_mrr,
    COALESCE(mcs.churned_mrr, 0) AS churned_mrr,
    ROUND(100.0 * COALESCE(mcs.churned_mrr, 0) / NULLIF(mms.starting_mrr, 0), 2) AS churn_rate
FROM monthly_mrr_summary mms
LEFT JOIN monthly_churn_summary mcs ON mms.report_month = mcs.churn_month
ORDER BY 1;
```
- `monthly_activity` CTE: This is the tricky part. We can’t just look at the `subscriptions` table’s final status; we need to know what was active during each month. By joining to the `payments` table, we can see who made a payment that month, indicating activity. The window function helps pinpoint activity within the month.
- `churned_subscriptions` CTE: This is more straightforward. It isolates the subscriptions that were explicitly canceled, and when.
- `monthly_mrr_summary` & `monthly_churn_summary`: These CTEs aggregate the raw data into monthly totals for starting MRR and churned MRR, respectively.
- Final `SELECT`: This joins the two summaries together. The `COALESCE` and `NULLIF` functions are crucial here to handle months with zero churn or zero starting MRR, preventing division-by-zero errors.
Marketing: Identifying High-Value Lead Sources
Marketing teams constantly need to know which channels are driving revenue, not just leads. This means joining lead generation data (top of funnel) with closed-won deal data (bottom of funnel), which often live in different tables with different IDs.
The Prompt:
“I need a query to attribute closed-won revenue to the original marketing source. I have a `leads` table (`lead_id`, `email`, `utm_source`, `created_at`) and a `deals` table (`deal_id`, `lead_email`, `amount`, `close_date`, `status`). The `leads.email` and `deals.lead_email` columns are the join keys. Show me the total revenue and number of deals for each `utm_source` for deals closed in Q1 2025. Only include sources that generated at least one deal.”
The Generated SQL & Logic Breakdown:
This query is all about the `JOIN` and the `GROUP BY`.
SELECT
l.utm_source,
COUNT(DISTINCT d.deal_id) AS total_deals,
SUM(d.amount) AS total_revenue,
ROUND(SUM(d.amount) / COUNT(DISTINCT d.deal_id), 2) AS avg_deal_size
FROM deals d
-- Inner join ensures we only get deals that have a matching lead
INNER JOIN leads l ON d.lead_email = l.email
WHERE
d.status = 'closed-won'
AND d.close_date >= '2025-01-01'
AND d.close_date < '2025-04-01'
GROUP BY
l.utm_source
HAVING
COUNT(DISTINCT d.deal_id) >= 1
ORDER BY
total_revenue DESC;
- `INNER JOIN`: This is the most important part of the query. An `INNER JOIN` (the default `JOIN`) only returns rows where the `lead_email` exists in both the `deals` and `leads` tables. This automatically filters out any deals that can’t be traced back to a known marketing source, which is exactly what you want.
- `WHERE` clause: This filters the dataset down to only the relevant time period (Q1 2025) and the desired outcome (`status = 'closed-won'`). Pushing these filters early is key for performance.
- `GROUP BY l.utm_source`: This aggregates all the individual deals up to the level of their marketing source, allowing us to `SUM` the revenue and `COUNT` the deals for each channel.
- `HAVING` clause: While the prompt requested this, in practice the `INNER JOIN` already guarantees at least one deal per source. It’s a good safety net but often redundant. This is a nuance you’d learn to refine in your prompts over time.
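You can sanity-check this attribution logic end to end before running it on a real warehouse. The sketch below loads a few made-up rows into an in-memory SQLite database (table and column names match the prompt; the data is purely illustrative) and confirms that the `INNER JOIN` and `WHERE` filters behave as described.

```python
import sqlite3

# Toy reproduction of the leads/deals attribution query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (lead_id INTEGER, email TEXT, utm_source TEXT, created_at TEXT);
    CREATE TABLE deals (deal_id INTEGER, lead_email TEXT, amount REAL,
                        close_date TEXT, status TEXT);
    INSERT INTO leads VALUES
        (1, 'a@x.com', 'google',   '2024-11-02'),
        (2, 'b@x.com', 'linkedin', '2024-12-15'),
        (3, 'c@x.com', 'google',   '2025-01-08');
    INSERT INTO deals VALUES
        (10, 'a@x.com', 12000, '2025-02-01', 'closed-won'),
        (11, 'c@x.com',  8000, '2025-03-20', 'closed-won'),
        (12, 'b@x.com',  5000, '2025-02-10', 'closed-lost'),  -- dropped by status filter
        (13, 'z@x.com',  9000, '2025-01-15', 'closed-won');   -- dropped: no matching lead
""")

rows = conn.execute("""
    SELECT l.utm_source,
           COUNT(DISTINCT d.deal_id) AS total_deals,
           SUM(d.amount) AS total_revenue
    FROM deals d
    INNER JOIN leads l ON d.lead_email = l.email
    WHERE d.status = 'closed-won'
      AND d.close_date >= '2025-01-01'
      AND d.close_date < '2025-04-01'
    GROUP BY l.utm_source
    ORDER BY total_revenue DESC
""").fetchall()
print(rows)  # [('google', 2, 20000.0)]
```

Note how the untraceable deal (`z@x.com`) and the lost deal never appear in the result, exactly as the breakdown above predicts.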
Conclusion: Integrating Claude into Your Data Analysis Toolkit
You’ve now moved beyond simply asking for a query. You’ve learned to architect a prompt. The journey from a vague question to a production-ready, optimized SQL statement hinges on three core principles we’ve explored: providing rich context, demanding explanations, and embracing iteration. Giving Claude the schema, data types, and business logic isn’t just helpful—it’s the difference between a guess and a guarantee. Asking for a breakdown of a nested sub-query transforms the AI from a black box into a transparent partner, allowing you to learn and verify the logic. And remember, the first prompt is rarely the final one; refining your request based on the output is where the real power lies.
The Future of AI-Assisted Data Analysis
We are witnessing a fundamental shift in how organizations interact with their data. The role of AI is not to replace data professionals but to democratize data access, empowering product managers, marketers, and executives to ask complex questions directly. This evolution accelerates the path from question to insight, fostering a true data-driven culture. The analyst’s role is elevated from a query-writer to a strategist and validator, focusing on defining the right problems and interpreting the results, while the AI handles the heavy lifting of code generation.
Your Next Steps: From Knowledge to Practice
The most effective way to internalize these techniques is to apply them immediately. Don’t wait for the perfect project.
- Start with your own data: Grab a simple query from your current workload.
- Apply the principles: Add context, ask for a CTE, and request a step-by-step explanation.
- Build your library: Save your most effective prompts. This becomes your personal, reusable toolkit for future analysis.
Golden Nugget: The single most powerful prompt you can add to any complex request is: “First, outline the logical steps you will take to solve this, then write the code.” This forces a chain-of-thought process that dramatically reduces errors and makes the final query auditable.
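If you find yourself reusing that pattern, it can help to template it. Below is a minimal sketch of a prompt-builder that bakes in the three principles (schema context, the “explain first” rule, and a named dialect); the function name and structure are hypothetical, not part of any Claude API.

```python
def build_sql_prompt(question: str, schema: str, dialect: str = "PostgreSQL") -> str:
    """Assemble a context-rich SQL prompt with the 'explain first' rule baked in.

    This is an illustrative helper, not an official API: it simply concatenates
    schema context, the business question, and the chain-of-thought instruction.
    """
    return (
        f"You are a senior data analyst writing {dialect}.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Business question: {question}\n\n"
        "First, outline the logical steps you will take to solve this, "
        "then write the code."
    )

prompt = build_sql_prompt(
    question="Total revenue per utm_source for deals closed in Q1 2025",
    schema="leads(lead_id, email, utm_source, created_at); "
           "deals(deal_id, lead_email, amount, close_date, status)",
)
print(prompt.splitlines()[0])  # You are a senior data analyst writing PostgreSQL.
```

Saving a helper like this is one concrete way to “build your library” as suggested above.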
Performance Data
| Author | SEO Strategist |
|---|---|
| Focus | AI SQL Prompting |
| Tool | Claude AI |
| Year | 2026 Update |
| Goal | Query Accuracy & Speed |
Frequently Asked Questions
Q: Why is Claude better for SQL than other AI models?
Claude excels at reasoning and context retention, acting like a senior analyst that explains the ‘why’ behind complex joins and window functions rather than just spitting out monolithic code.
Q: What is the most common mistake when prompting AI for SQL?
The biggest mistake is a lack of context; failing to provide schema details (table names, columns, data types) leads to incorrect guesses and syntax errors.
Q: How can I optimize AI-generated SQL queries?
You can ask Claude to critique its own work, suggest performance improvements, and debug errors by providing the specific error messages and query structure.