
Best AI Prompts for SQL Query Generation with Claude

AIUnpacker


Editorial Team

28 min read

TL;DR — Quick Summary

This guide provides the best AI prompts for SQL query generation using Claude, designed to help data professionals and non-technical users write flawless code. It covers strategies for complex queries like recursive CTEs and techniques to reduce syntax errors. Learn how to bridge the gap between business questions and executable SQL to save time and improve accuracy.


Quick Answer

We identify the best AI prompts for SQL query generation with Claude by focusing on context and iterative refinement. Our approach ensures you get accurate, optimized code by treating the AI as a collaborative partner. This guide provides proven prompt strategies to eliminate syntax errors and bridge the gap between business questions and database logic.

The 'Explain First' Rule

Before asking Claude to write code, prompt it to explain the logic for the query. This forces the AI to articulate its reasoning and reveal flawed assumptions, preventing costly errors before you run a query.

Unlocking the Power of AI for SQL Mastery

You know that sinking feeling when a critical report is due, but your SQL query keeps throwing a syntax error, or worse, it runs for ten minutes and returns nonsensical data? You’re not alone. In 2025, the demand for SQL skills has exploded beyond the data team. Marketers analyzing campaign funnels, product managers tracking user behavior, and sales ops teams building forecasts are all expected to query databases directly. Yet, the learning curve remains brutal. A single misplaced comma, a misunderstood join logic, or an inefficient query that scans terabytes of data can derail your entire day and run up a significant cloud bill. This is where the gap between asking a business question and writing flawless SQL code becomes a major productivity killer.

This is precisely why we’re focusing on Claude for SQL query generation. While many AI models can write SQL, I’ve found Claude to be a superior partner for database tasks due to its exceptional reasoning and context retention. In my own workflow, I’ve seen it excel at deconstructing a complex request—like calculating a 6-month rolling average of customer retention—into logical, understandable steps. Unlike other models that might spit out a monolithic, unexplainable query, Claude acts more like a senior data analyst. It explains the why behind its choices, especially when dealing with nested sub-queries or intricate window functions, ensuring you not only get a working query but also understand the logic behind it.

In this guide, we’ll move beyond simple “write a query for X” prompts. We’ll explore a progression of techniques designed to make you a more effective and efficient SQL practitioner. Here’s what we’ll cover:

  • The Art of the Prompt: How to provide the necessary schema context and business logic to get accurate results on the first try.
  • Deconstructing Complexity: Using Claude to break down intimidating nested sub-queries and multi-layered joins into digestible parts.
  • From Generation to Optimization: Techniques for asking Claude to critique its own work, suggest performance improvements, and even debug errors you’re encountering.

Golden Nugget: The most powerful prompt I use isn’t “write this query.” It’s “explain the logic for this query, then write the code.” This forces the AI to articulate its reasoning first, which often reveals flawed assumptions before you ever run a costly query.

Mastering the Basics: Crafting Effective Prompts for Simple Queries

What’s the single biggest mistake that causes AI to generate incorrect SQL? It’s not a complex logic error; it’s a lack of context. You can ask a powerful model like Claude to “get me the list of top customers,” but without knowing your database structure, you’re asking for a guess. In my experience, that guess is wrong about 90% of the time. It might assume a customers table exists, that “top” means total_revenue, and that the data is in a sales table. This is why the “Context is King” principle is the non-negotiable foundation of effective prompt engineering for SQL.

When you provide schema details, you transform a vague request into a precise instruction. You’re not just telling Claude what you want; you’re showing it where to find it and how the pieces connect. This is the difference between getting a query that runs and a query that gives you the correct business answer.

The “Context is King” Principle: Why Schema is Your Secret Weapon

Think of it like giving directions to a taxi driver. Saying “take me to the airport” works if you’re in a one-airport city. But if you’re in a major hub with multiple airports, you need to specify which one, which terminal, and which entrance. Your database is that complex city. Your tables are the buildings, and your columns are the street addresses.

Providing the schema is the single most important step because it eliminates ambiguity. AI models don’t have telepathic access to your data dictionary. You must feed it the essential information. A robust prompt includes:

  • Table Names: The exact name of the tables involved (e.g., users, orders, order_items).
  • Key Column Names: The columns you’ll be filtering on or joining with (e.g., user_id, order_id, created_at).
  • Data Types: Briefly mentioning if a column is a DATE, VARCHAR, INTEGER, etc., can prevent logic errors, especially with functions like DATE_TRUNC or CAST.
  • Relationships: Explicitly stating the join keys is a game-changer (e.g., “orders table links to users via user_id”).

Golden Nugget: My go-to prompt structure for any new query starts with a simple schema dump. I’ll write: “users table: user_id (int), email (varchar), signup_date (timestamp). orders table: order_id (int), user_id (int), amount (decimal).” This 30-second step saves me 10 minutes of debugging and re-prompting. It’s the highest ROI activity in the entire process.
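That 30-second schema dump doesn't have to be typed by hand. Here's a minimal sketch using Python's built-in sqlite3 module that introspects a database and prints a prompt-ready summary; the users and orders tables mirror the toy example above, and the helper name is mine, not a standard API:

```python
import sqlite3

def schema_summary(conn: sqlite3.Connection) -> str:
    """Build a one-line-per-table schema summary you can paste into a prompt."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    lines = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} ({c[2].lower()})" for c in cols)
        lines.append(f"{table} table: {col_desc}.")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email VARCHAR, signup_date TIMESTAMP)")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount DECIMAL)")
print(schema_summary(conn))
# users table: user_id (integer), email (varchar), signup_date (timestamp).
# orders table: order_id (integer), user_id (integer), amount (decimal).
```

Point the same idea at your real connection (or your warehouse's information schema) and paste the output at the top of every query prompt.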

Prompting for SELECT, WHERE, and JOINs: From Vague to Precise

Let’s see this principle in action. The difference between a junior analyst’s prompt and a senior analyst’s prompt is all in the specificity.

Vague Request (The “Hope and Pray” Method):

“Hey, can you write a query to find our most valuable customers from last month?”

This is a recipe for disaster. Claude has to guess the table names, the definition of “valuable” (highest spend? most orders?), and the date format.

Precise Request (The Expert Method):

“I need a list of the top 10 customers by total spend for October 2025.

Schema:

  • customers table: customer_id (int), customer_name (varchar)
  • payments table: payment_id (int), customer_id (int), payment_amount (decimal), payment_date (timestamp)

Logic:

  1. Join payments to customers on customer_id.
  2. Filter where payment_date is between ‘2025-10-01’ and ‘2025-10-31’.
  3. Group by customer_id and customer_name.
  4. Sum the payment_amount to get total spend.
  5. Order by total spend in descending order and limit to 10.

Please provide the query in standard SQL.”

This prompt is a blueprint for success. It defines the business question, provides the necessary data dictionary, and outlines the logical steps. The AI isn’t guessing; it’s executing your well-defined plan. This same principle applies to filtering and joins. Instead of “filter for active users,” specify WHERE status = 'active'. Instead of “join users and orders,” specify LEFT JOIN orders ON users.id = orders.user_id.
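To see that the five steps really do pin down one specific query, here's a sketch that runs the resulting SQL against invented sample rows using Python's sqlite3 module (in practice Claude returns the SQL and you run it in your own warehouse):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
CREATE TABLE payments  (payment_id INTEGER, customer_id INTEGER,
                        payment_amount REAL, payment_date TEXT);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO payments VALUES
  (1, 1, 500.0, '2025-10-05'),
  (2, 1, 250.0, '2025-10-20'),
  (3, 2, 900.0, '2025-10-11'),
  (4, 3, 100.0, '2025-09-30');  -- September: outside the requested window
""")

# Steps 1-5 from the prompt: join, filter, group, sum, order and limit
query = """
SELECT c.customer_id, c.customer_name,
       SUM(p.payment_amount) AS total_spend
FROM payments p
JOIN customers c ON p.customer_id = c.customer_id
WHERE p.payment_date BETWEEN '2025-10-01' AND '2025-10-31'
GROUP BY c.customer_id, c.customer_name
ORDER BY total_spend DESC
LIMIT 10;
"""
rows = conn.execute(query).fetchall()
print(rows)  # Globex (900.0) first, Acme (750.0) second; Initech excluded
```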

Controlling Output Format: Dialects and Copy-Paste Readiness

Your work doesn’t end with a correct query; it needs to be runnable in your specific environment. A query written for MySQL might fail in PostgreSQL due to subtle syntax differences. You can eliminate this friction by explicitly telling Claude which SQL dialect you need.

This is also where you can request formatting that makes your life easier. A wall of unformatted SQL is hard to read and debug. Asking for clean, indented code is a simple instruction that pays dividends.

Here are two examples of how to control the output:

1. Prompting for a Specific Dialect:

“Write a query to calculate the average session duration per user. The data is in a table named user_sessions with columns user_id (int) and session_duration_seconds (int). Generate the query in T-SQL syntax for SQL Server.”

2. Prompting for Readability:

“Using the schema from the previous example, write a query to find the total number of orders per customer. Please format the SQL with standard indentation and use aliases for all tables (e.g., c for customers, o for orders) to keep it concise and readable.”

By mastering these foundational prompting techniques—providing rich context, being specific in your requests, and controlling the output format—you build a reliable workflow. You move from hoping the AI gets it right to architecting prompts that guarantee it.
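To make the dialect point concrete: the same "average session duration" question is spelled differently per engine. In the sketch below (sample rows invented), the T-SQL string is shown for contrast only; SQL Server uses TOP, while PostgreSQL and SQLite use LIMIT, so only the second version runs here:

```python
import sqlite3

# Same business question, two dialects. T-SQL shown for contrast only.
tsql = """
SELECT TOP 5 user_id, AVG(session_duration_seconds) AS avg_duration
FROM user_sessions GROUP BY user_id ORDER BY avg_duration DESC;
"""
standard_sql = """
SELECT user_id, AVG(session_duration_seconds) AS avg_duration
FROM user_sessions GROUP BY user_id ORDER BY avg_duration DESC LIMIT 5;
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_sessions (user_id INTEGER, session_duration_seconds INTEGER)")
conn.executemany("INSERT INTO user_sessions VALUES (?, ?)",
                 [(1, 120), (1, 180), (2, 600), (2, 300)])
rows = conn.execute(standard_sql).fetchall()
print(rows)  # user 2 averages 450.0 seconds, user 1 averages 150.0
```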

Deconstructing Complexity: How to Generate and Understand Nested Sub-queries

Ever stared at a wall of nested SQL code and felt your brain short-circuit? You know what you want the data to tell you, but the labyrinth of parentheses and sub-queries feels like a foreign language. This is the point where most people either give up or copy-paste code they don’t fully trust. I’ve been there, and it’s a terrible way to work. The key isn’t to become a better SQL typist; it’s to become a better SQL architect. And with modern AI, you can have an architect on demand.

The real power of a tool like Claude isn’t just in writing the code, but in its ability to reason through the logic before a single line is executed. By changing how you ask, you transform the AI from a simple code generator into a collaborative partner that helps you think through the problem. This approach de-risks your analysis and builds your own expertise in the process.

The “Explain the Logic First” Technique

The most common mistake I see analysts make is jumping straight to the request: “Write a query to show me…” This is like asking a builder to construct a house without first looking at the blueprints. You’ll get a structure, but it might not be the one you need. My go-to strategy, honed from countless data projects, is to force the AI to articulate its reasoning first.

Instead of asking for code, I prompt for a step-by-step plan. It looks something like this:

“I need to find the top 10% of sales reps by revenue in Q3 2025, but only for clients who have been with us for over a year. Please first describe the logical steps you would take to solve this with SQL, without writing the code yet. Explain how you’ll handle the ranking and the client filtering.”

This prompt does several things. It forces the AI to break the problem into discrete parts: 1) identify long-term clients, 2) filter sales data for Q3, 3) join the datasets, 4) aggregate revenue per rep, and 5) apply a percentile ranking. By seeing this logic laid out in plain English, I can immediately spot flaws. Maybe the AI’s plan doesn’t account for a client’s start date correctly. I can correct it before it generates 50 lines of complex, potentially buggy code. This single change in prompting strategy saves me hours of debugging and is a cornerstone of using AI effectively.

Breaking Down Nested Queries with a Real-World Example

Let’s apply this to a classic scenario: finding your top-performing sales reps in a specific region, but with a twist. We only want reps whose total sales across all regions exceed the company average. This naturally leads to a nested query.

Here’s how I would prompt Claude to not only generate the query but also make it understandable:

“Generate a SQL query to find the top 5 sales reps (by total sales) who work in the ‘North America’ region. The catch: only include reps whose lifetime sales across all regions are greater than the average lifetime sales of all reps. Please use a nested sub-query for the average calculation and explain what each part of the query does.”

Claude’s response would typically break down into two parts: the explanation and the code.

The Explanation (Interpreted):

  1. Inner Query (SELECT AVG(lifetime_sales) ...): This is the sub-query. It first calculates a single number—the average lifetime sales for every single sales rep in the entire company, regardless of their region. This is your benchmark.
  2. Outer Query (SELECT rep_name, SUM(sales_amount) ...): This is the main query. It filters for reps in ‘North America’, groups them by name, calculates their total sales, and then compares that total against the benchmark number produced by the inner query.

The Generated SQL:

SELECT
    rep_name,
    SUM(sales_amount) AS region_total_sales
FROM
    sales_data
WHERE
    region = 'North America'
    AND rep_name IN (
        -- Sub-query to get reps whose lifetime sales are above average
        SELECT rep_name
        FROM sales_data
        GROUP BY rep_name
        -- company_rep_summary is assumed to be a pre-aggregated table
        -- with one lifetime_sales row per rep
        HAVING SUM(sales_amount) > (SELECT AVG(lifetime_sales) FROM company_rep_summary)
    )
GROUP BY
    rep_name
ORDER BY
    region_total_sales DESC
LIMIT 5;

Seeing the logic explained separately from the code is a game-changer. It builds trust. You understand why the WHERE ... IN (...) clause is there and what the sub-query is feeding it. You’re no longer just running code; you’re executing a validated plan.

Using CTEs (Common Table Expressions) for Clarity

Nested sub-queries work, but they can be a nightmare to read or modify later. As your queries get more complex, you end up with a “pyramid of doom” that’s hard to debug. A more modern, readable, and maintainable approach is to use Common Table Expressions (CTEs). The WITH clause lets you break the logic into named, sequential steps.

The prompt here is key. You need to guide the AI to restructure its thinking.

“Rewrite the previous query using Common Table Expressions (CTEs) instead of nested sub-queries. Name the CTEs clearly to show the logical steps, like ‘CompanyBenchmark’ and ‘AboveAverageReps’.”

The AI will translate the nested logic into a clean, linear flow:

-- Step 1: Calculate the company-wide average lifetime sales benchmark
WITH CompanyBenchmark AS (
    SELECT AVG(lifetime_sales) AS avg_sales
    FROM company_rep_summary
),

-- Step 2: Identify all reps whose lifetime sales exceed that benchmark
AboveAverageReps AS (
    SELECT rep_name
    FROM sales_data
    GROUP BY rep_name
    HAVING SUM(sales_amount) > (SELECT avg_sales FROM CompanyBenchmark)
)

-- Step 3: Get the final results for North America, filtering by the list from Step 2
SELECT
    s.rep_name,
    SUM(s.sales_amount) AS region_total_sales
FROM
    sales_data s
JOIN
    AboveAverageReps aar ON s.rep_name = aar.rep_name
WHERE
    s.region = 'North America'
GROUP BY
    s.rep_name
ORDER BY
    region_total_sales DESC
LIMIT 5;

By prompting for CTEs, you’re not just getting cleaner code. You’re building a query that reads like a recipe. Each CTE is a distinct preparation step, and the final SELECT is the assembly. This is far easier to hand off to a colleague, revisit six months later, or modify for a slightly different question. It’s a hallmark of professional, production-ready SQL.
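If you want to verify that the CTE recipe really computes what it claims, here's a self-contained sanity check with Python's sqlite3 and toy data. One adaptation, so the example stands alone: the benchmark is derived from sales_data itself instead of a separate company_rep_summary table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (rep_name TEXT, region TEXT, sales_amount REAL);
INSERT INTO sales_data VALUES
  ('Ana',  'North America', 900), ('Ana',  'Europe', 600),
  ('Ben',  'North America', 200), ('Ben',  'Europe', 100),
  ('Cara', 'North America', 700), ('Cara', 'Europe', 400);
""")

query = """
WITH CompanyBenchmark AS (   -- Step 1: average lifetime sales per rep
    SELECT AVG(total) AS avg_sales
    FROM (SELECT SUM(sales_amount) AS total
          FROM sales_data GROUP BY rep_name) AS per_rep
),
AboveAverageReps AS (        -- Step 2: reps whose lifetime sales beat it
    SELECT rep_name
    FROM sales_data
    GROUP BY rep_name
    HAVING SUM(sales_amount) > (SELECT avg_sales FROM CompanyBenchmark)
)
-- Step 3: North America totals, restricted to the above-average list
SELECT s.rep_name, SUM(s.sales_amount) AS region_total_sales
FROM sales_data s
JOIN AboveAverageReps aar ON s.rep_name = aar.rep_name
WHERE s.region = 'North America'
GROUP BY s.rep_name
ORDER BY region_total_sales DESC
LIMIT 5;
"""
rows = conn.execute(query).fetchall()
print(rows)  # Ana and Cara beat the company average (~966.67); Ben does not
```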

Advanced Prompting Strategies: Window Functions, CTEs, and Recursive Queries

You’ve mastered the basics, but now you’re facing queries that require more than a simple SELECT and WHERE. Your boss needs a running total for a financial report, or you need to untangle a complex organizational hierarchy. This is where basic prompting fails and advanced strategy begins. How do you communicate this multi-layered logic to an AI without it hallucinating or producing an unrunnable mess? The answer lies in treating Claude like a senior developer: you don’t just give it a task, you provide a blueprint.

Unlocking Window Functions: From Business Question to Analytical Power

Window functions are notorious for being difficult to write from scratch because they require you to think about data in “windows” or partitions. A common mistake is to ask the AI vaguely, “Give me a rank of products.” This often leads to incorrect results because the AI doesn’t know how to partition the data. Instead, you need to frame the prompt with the business question and the specific logic.

Here’s a template I use frequently for a running total, a classic window function:

Prompt Template: Running Total

“Write a SQL query to calculate a running total of daily sales from the sales table. The table has columns sale_date and amount. I need the running total to reset for each month. Please use a window function with PARTITION BY for the month and ORDER BY the sale date. Explain the ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW clause in the context of this calculation.”

This prompt works because it specifies the function (running total), the partitioning logic (reset for each month), and the ordering (ORDER BY sale_date). By asking for an explanation of the frame clause, you force the AI to articulate the logic, ensuring you understand why the numbers are correct before you run it.
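Here's roughly what that prompt should produce, run end-to-end on invented rows with Python's sqlite3 (SQLite supports standard window functions), so you can watch the total reset at the month boundary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ('2025-01-10', 100), ('2025-01-20', 50),
    ('2025-02-01', 70),  ('2025-02-15', 30),
])

# PARTITION BY the month so the running total resets; the frame clause
# accumulates from the first row of each month up to the current row.
query = """
SELECT sale_date, amount,
       SUM(amount) OVER (
           PARTITION BY strftime('%Y-%m', sale_date)
           ORDER BY sale_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM sales
ORDER BY sale_date;
"""
rows = conn.execute(query).fetchall()
print(rows)  # January climbs 100.0 -> 150.0; February resets: 70.0 -> 100.0
```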

Prompting for Multi-layered CTEs: Building a Logical Query Chain

When a query requires multiple steps, dumping it all into a single CTE is a recipe for confusion. The key is to guide the AI to build the query sequentially, just as a human analyst would. You define the logical flow from one CTE to the next, making the output auditable and easy to modify.

Imagine you need to calculate the Month-over-Month (MoM) growth rate of active users. This requires first identifying active users per month, then lagging that count to calculate the growth. Here’s how you’d structure the prompt:

Prompt Template: Multi-layered CTE

“Generate a SQL query to calculate the Month-over-Month growth rate of active users. Use Common Table Expressions (CTEs) for clarity.

CTE 1 (monthly_active_users): First, count the distinct user_id for each month from the events table. Filter for events after January 1, 2025.

CTE 2 (growth_calculation): Join monthly_active_users to itself using a window function (LAG) to get the previous month’s user count. Calculate the MoM growth percentage.

Final SELECT: Return the current month, current user count, previous user count, and the calculated growth rate. Explain the purpose of each CTE in the final output.”

This prompt acts as a project manager, assigning specific tasks to each CTE. It explicitly defines the data flow: CTE 1 prepares the data, CTE 2 performs the calculation, and the final SELECT presents the results. This structure is invaluable for complex business logic and is a hallmark of production-ready code.
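Here's a sketch of the query this prompt is steering toward, executed on invented event data with Python's sqlite3; strftime stands in for the DATE_TRUNC your warehouse would use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, '2025-02-03'), (2, '2025-02-10'),                     # Feb: 2 users
    (1, '2025-03-05'), (2, '2025-03-06'), (3, '2025-03-07'),  # Mar: 3 users
])

query = """
WITH monthly_active_users AS (   -- CTE 1: distinct users per month
    SELECT strftime('%Y-%m', event_time) AS month,
           COUNT(DISTINCT user_id)       AS active_users
    FROM events
    WHERE event_time >= '2025-01-01'
    GROUP BY 1
),
growth_calculation AS (          -- CTE 2: LAG pulls the prior month's count
    SELECT month, active_users,
           LAG(active_users) OVER (ORDER BY month) AS prev_users
    FROM monthly_active_users
)
SELECT month, active_users, prev_users,
       ROUND(100.0 * (active_users - prev_users) / prev_users, 1) AS mom_growth_pct
FROM growth_calculation
ORDER BY month;
"""
rows = conn.execute(query).fetchall()
print(rows)  # March grows from 2 to 3 active users: 50.0% MoM
```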

Handling Hierarchical Data with Recursive Queries: Taming the Trees

Recursive queries are the final boss of SQL generation. They are essential for organizational charts, bill of materials (BOM), or any parent-child relationship. A vague prompt like “show me the org chart” will almost certainly fail. You must provide the base case and the recursive step.

Here is a specialized prompt structure for generating an employee organizational chart, a classic recursive CTE example:

Prompt Template: Recursive CTE

“Write a recursive CTE in SQL to map an employee’s management chain.

Table Schema: employees with columns employee_id, manager_id, and employee_name.

Logic:

  1. Anchor Member: Start with a specific employee, for example employee_id = 101. Select their details.
  2. Recursive Member: Join the employees table to the CTE itself, linking employee.manager_id to cte.employee_id to find the next manager up the chain.
  3. Termination Condition: The recursion stops when a manager_id is NULL.

Please output the full query and explain how the UNION ALL connects the anchor and recursive members.”

By breaking down the recursion into its core components—anchor, recursive join, and termination condition—you provide the AI with a foolproof recipe. This prevents it from getting stuck in an infinite loop or misinterpreting the join logic, saving you significant debugging time and frustration.
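Here's the anchor/recursive/termination recipe executed end-to-end with Python's sqlite3 on a three-level toy hierarchy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, employee_name TEXT);
INSERT INTO employees VALUES
  (1, NULL, 'CEO'), (50, 1, 'VP'), (101, 50, 'Analyst');
""")

query = """
WITH RECURSIVE chain AS (
    -- Anchor member: start from the chosen employee
    SELECT employee_id, manager_id, employee_name
    FROM employees WHERE employee_id = 101
    UNION ALL
    -- Recursive member: walk up to each row's manager; recursion stops
    -- when manager_id is NULL, because the join then finds no row
    SELECT e.employee_id, e.manager_id, e.employee_name
    FROM employees e
    JOIN chain c ON e.employee_id = c.manager_id
)
SELECT employee_name FROM chain;
"""
names = [r[0] for r in conn.execute(query)]
print(names)  # ['Analyst', 'VP', 'CEO']
```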

From Prompt to Production: Debugging and Optimizing Your SQL with Claude

Even the most skilled data professionals write flawed SQL. The difference between a junior analyst and a seasoned pro isn’t a lack of errors; it’s the speed and precision with which they find and fix them. This is where Claude transforms from a query generator into a production-ready debugging partner. By treating it as a collaborative code review tool, you can catch costly mistakes before they hit your production environment.

The “Fix My Query” Prompt: Your Instant Code Reviewer

We’ve all been there: a query that looks perfect but throws a cryptic syntax error or, worse, returns nonsensical data. Instead of staring at the screen for an hour, you can get a diagnosis in seconds. The key is to provide context, not just the broken code.

A Workflow for Debugging:

  1. Paste the Error: Start by giving Claude the raw, broken SQL.
  2. Provide the Context: Tell it what you were trying to do. For example: “I’m trying to get the total sales for each product category, but this query is giving me a single row with a massive number.”
  3. Share the Schema (Optional but Recommended): A quick summary of your table structure can help Claude spot logical flaws like joining on the wrong key.

Example Prompt:

“Here is my SQL query that’s supposed to calculate monthly active users, but it’s throwing a ‘column ambiguity’ error in BigQuery. Can you identify the syntax errors, logical flaws, and suggest the corrected code?

My Query:

SELECT
  user_id,
  COUNT(DISTINCT session_id) as monthly_sessions,
  signup_date
FROM
  `project.dataset.users` u
  JOIN `project.dataset.sessions` s ON u.id = s.user_id
WHERE
  session_date >= '2025-01-01'
GROUP BY
  user_id;

Schema: users table has id and signup_date. sessions table has user_id and session_date.”

Claude will quickly identify that signup_date appears in the SELECT clause but is neither grouped nor aggregated, which is the actual flaw behind the error message. It will then provide a corrected query, likely suggesting an aggregate function like MAX(u.signup_date) or moving the column to a subquery.
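The corrected shape looks like this, run on toy data with Python's sqlite3. Note that SQLite is more permissive than BigQuery about ungrouped columns, so the point here is the fixed query, not reproducing the original error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, signup_date TEXT);
CREATE TABLE sessions (user_id INTEGER, session_id INTEGER, session_date TEXT);
INSERT INTO users VALUES (1, '2024-12-01'), (2, '2025-01-05');
INSERT INTO sessions VALUES
  (1, 10, '2025-01-02'), (1, 11, '2025-01-09'), (2, 20, '2025-01-06');
""")

# Every non-aggregated SELECT column is now either grouped (u.id) or
# wrapped in an aggregate (MAX), so the grouping error disappears.
fixed = """
SELECT u.id AS user_id,
       COUNT(DISTINCT s.session_id) AS monthly_sessions,
       MAX(u.signup_date)           AS signup_date
FROM users u
JOIN sessions s ON u.id = s.user_id
WHERE s.session_date >= '2025-01-01'
GROUP BY u.id
ORDER BY u.id;
"""
rows = conn.execute(fixed).fetchall()
print(rows)  # [(1, 2, '2024-12-01'), (2, 1, '2025-01-05')]
```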

Performance Tuning and Optimization: From Slow to Swift

A query that works isn’t always a query that works well. In 2025, with data volumes and cloud compute costs continuing to climb, writing inefficient queries is a direct hit to your budget. Claude excels at analyzing query structure and suggesting performance enhancements.

How to Ask for Optimization:

Your prompt should focus on the goal of efficiency. You can ask it to review for common anti-patterns.

  • Avoid SELECT *: Ask Claude to rewrite a query to select only the necessary columns. This is the simplest way to reduce data scanned.
  • Suggest Indexes: While Claude can’t create indexes for you, you can prompt it: “Based on this query’s WHERE and JOIN clauses, what indexes would you recommend for the orders and customers tables to improve performance?”
  • Rewrite for Readability and Efficiency: Ask it to use Common Table Expressions (CTEs) to break down complex logic. As I’ve found in my own work, a CTE-based query is not only easier for a human to understand but often allows the query optimizer to work more effectively.
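You can also sanity-check an index recommendation locally before touching production. A sketch using SQLite's EXPLAIN QUERY PLAN; cloud warehouse optimizers work differently, but the scan-versus-seek principle carries over, and the table and index names here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, created_at TEXT)")

query = "SELECT order_id FROM orders WHERE customer_id = 42"

def plan(conn, sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(conn, query)           # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)            # index search on customer_id
print(before)  # e.g. "SCAN orders"
print(after)   # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```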

Golden Nugget: The most powerful optimization prompt I use is: “Rewrite this query to be more performant in a modern cloud data warehouse. Explain why each change you made will improve execution time and reduce cost.” This forces the AI to articulate its reasoning, teaching you optimization principles while it fixes your code.

Unit Testing and Data Validation: Trust, But Verify

Before you schedule a complex query to run daily, you need to be 100% confident in its output. Running a massive, untested query on production data is a recipe for bad reports and wasted compute. The smartest approach is to use Claude to generate a validation plan.

A Workflow for Validation:

  1. Generate Sample Data: Ask Claude to create a small, predictable dataset that mimics your table structure.
  2. Run Your Query: Execute your complex query against this sample data.
  3. Validate the Logic: Ask Claude to write a separate, simple SELECT statement that manually calculates the expected result based on your sample data. Compare the two outputs.

Example Prompt:

“I need to validate my query that calculates a 30-day rolling average of sales. Please generate a sample dataset for a sales table with sale_date and amount for 5 sales over 35 days. Then, write a simple query to calculate the rolling average for day 31 manually, so I can verify my complex query’s output.”

This process acts as a unit test for your SQL. It ensures your logic holds up against known inputs and gives you the confidence to deploy your query to production. By combining debugging, optimization, and validation, you move from simply writing SQL to engineering reliable, cost-effective data solutions.
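The three-step workflow looks like this in practice. A sketch with Python's sqlite3, using a 3-row rolling average as a small stand-in for the 30-day version so the manual check fits on one line:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")

# Step 1: small, predictable sample data
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [('2025-01-01', 10), ('2025-01-02', 20),
                  ('2025-01-03', 30), ('2025-01-04', 40)])

# Step 2: the "complex" query under test (a 3-row rolling average)
complex_query = """
SELECT sale_date,
       AVG(amount) OVER (ORDER BY sale_date
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
FROM sales;
"""
result = {day: avg for day, avg in conn.execute(complex_query)}

# Step 3: a dead-simple manual query for one date, computed independently
manual = conn.execute("""
    SELECT AVG(amount) FROM sales
    WHERE sale_date BETWEEN '2025-01-02' AND '2025-01-04'
""").fetchone()[0]

assert result['2025-01-04'] == manual == 30.0
print("rolling-average logic validated:", manual)
```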

Real-World Case Studies: Applying AI Prompts to Common Business Problems

The true power of AI for SQL query generation isn’t in writing simple SELECT * statements. It’s in tackling the messy, multi-table, logic-heavy problems that define your actual job. You know the ones: the cohort analysis that requires multiple self-joins, the MRR churn calculation that needs to track subscription state changes over time, or the marketing attribution query that joins a dozen tables. This is where a well-crafted prompt transforms from a convenience into a critical business tool.

Let’s move beyond theory and dive into three common scenarios. For each, I’ll show you the prompt, the resulting SQL, and—most importantly—break down the logic so you can see exactly what’s happening under the hood.

E-commerce: Analyzing Customer Cohorts and Lifetime Value

A classic but notoriously tricky analysis is calculating the Month 1, Month 2, and Month 3 retention rates for customers who first purchased in a given month. A naive approach often fails to correctly group and track users over time.

The Prompt:

“I need a PostgreSQL query for cohort analysis. I have two tables:

  1. users (user_id, created_at)
  2. orders (order_id, user_id, order_date, order_total)

The goal is to calculate the percentage of users who made a purchase in their signup month (Month 0), the following month (Month 1), and the month after that (Month 2). Group the results by the user’s signup month. Please use Common Table Expressions (CTEs) to structure the logic step-by-step and explain each part.”

The Generated SQL & Logic Breakdown:

Claude would likely generate a query like this, which we can then dissect:

-- CTE 1: Identify each user's first purchase month (their cohort)
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM orders
    GROUP BY user_id
),

-- CTE 2: Calculate the month offset for each subsequent purchase
purchase_months AS (
    SELECT
        uc.cohort_month,
        o.user_id,
        -- Calculate difference in months between cohort and purchase
        DATE_PART('month', AGE(DATE_TRUNC('month', o.order_date), uc.cohort_month)) AS month_number
    FROM orders o
    JOIN user_cohorts uc ON o.user_id = uc.user_id
    GROUP BY 1, 2, 3
),

-- CTE 3: Count unique users for each month offset
cohort_size AS (
    SELECT
        cohort_month,
        month_number,
        COUNT(DISTINCT user_id) AS active_users
    FROM purchase_months
    GROUP BY 1, 2
)

-- Final Step: Pivot the data to show retention by month
SELECT
    cohort_month,
    -- Get the total cohort size (Month 0 count) for percentage calculation
    MAX(CASE WHEN month_number = 0 THEN active_users END) AS initial_users,
    -- Calculate retention for Month 1
    ROUND(100.0 * MAX(CASE WHEN month_number = 1 THEN active_users END) / MAX(CASE WHEN month_number = 0 THEN active_users END), 2) AS month_1_retention,
    -- Calculate retention for Month 2
    ROUND(100.0 * MAX(CASE WHEN month_number = 2 THEN active_users END) / MAX(CASE WHEN month_number = 0 THEN active_users END), 2) AS month_2_retention
FROM cohort_size
GROUP BY 1
ORDER BY 1 DESC;

  • user_cohorts CTE: This is the foundation. It finds the very first purchase date for every user and truncates it to the month. This becomes that user’s “cohort.” Without this, you can’t track them.
  • purchase_months CTE: Here’s the core logic. For every order a user places, we calculate the month_number (0, 1, 2, etc.) by comparing the order’s month to their cohort_month. This is the key to tracking them over time.
  • cohort_size CTE: This simply counts how many unique users from each cohort were active in each month_number. It’s the raw data we need for the final percentages.
  • Final SELECT: This is the presentation layer. We use conditional aggregation (CASE WHEN) to pivot the rows from cohort_size into columns (month_1_retention, month_2_retention) and calculate the percentages.

Golden Nugget: When prompting for cohort analysis, always ask the AI to “find the first event date” in a separate CTE first. This single step prevents a world of pain where you accidentally count a user’s second purchase as their “first” in the cohort, completely corrupting your retention numbers.
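Here is the "first event date in its own CTE" rule in miniature, runnable with Python's sqlite3; the year*12 + month arithmetic replaces PostgreSQL's AGE/DATE_PART for the month offset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
  (1, 1, '2025-01-05'),  -- user 1's cohort: January
  (2, 1, '2025-02-10'),  -- active again in Month 1
  (3, 2, '2025-01-20'),  -- user 2's cohort: January
  (4, 2, '2025-01-25');  -- second January purchase: still Month 0
""")

# The first CTE pins each user's cohort to their FIRST order, exactly as
# the golden nugget advises; month_number is a year*12+month difference.
query = """
WITH user_cohorts AS (
    SELECT user_id, MIN(order_date) AS first_order FROM orders GROUP BY user_id
)
SELECT o.user_id,
       (CAST(strftime('%Y', o.order_date) AS INTEGER) * 12
        + CAST(strftime('%m', o.order_date) AS INTEGER))
     - (CAST(strftime('%Y', uc.first_order) AS INTEGER) * 12
        + CAST(strftime('%m', uc.first_order) AS INTEGER)) AS month_number
FROM orders o
JOIN user_cohorts uc ON o.user_id = uc.user_id
ORDER BY o.order_id;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 0), (1, 1), (2, 0), (2, 0)]
```

User 2's second January purchase correctly stays in Month 0 instead of being miscounted as a new cohort entry.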

SaaS: Calculating Monthly Recurring Revenue (MRR) Churn

Calculating MRR churn isn’t just about counting cancellations. It’s about tracking the status of every subscription over time to see which ones downgrade, upgrade, or churn completely. This requires a stateful analysis that’s perfect for AI assistance.

The Prompt:

“Write a query to calculate net MRR churn for the last 6 months. Assume a subscriptions table with subscription_id, user_id, mrr, status (active, canceled, paused), and canceled_at (timestamp). Also, a payments table with payment_id, subscription_id, and payment_date. We need to identify the MRR from subscriptions that were active at the start of each month but were canceled by the end of it. The final output should be month, starting_mrr, churned_mrr, and churn_rate.”

The Generated SQL & Logic Breakdown:

This scenario is complex because it requires comparing a subscription’s state at two different points in time.

WITH monthly_activity AS (
    -- One row per subscription for each month in which it made a payment
    SELECT DISTINCT
        p.subscription_id,
        s.mrr,
        DATE_TRUNC('month', p.payment_date) AS activity_month
    FROM payments p
    JOIN subscriptions s ON s.subscription_id = p.subscription_id
    WHERE p.payment_date >= NOW() - INTERVAL '6 months'
),

churned_subscriptions AS (
    -- Identify which subscriptions churned in a given month
    SELECT
        s.subscription_id,
        s.mrr,
        DATE_TRUNC('month', s.canceled_at) AS churn_month
    FROM subscriptions s
    WHERE s.status = 'canceled'
      AND s.canceled_at >= NOW() - INTERVAL '6 months'
),

monthly_mrr_summary AS (
    -- Calculate the starting MRR for each month
    SELECT
        DATE_TRUNC('month', series_month) AS report_month,
        COALESCE(SUM(ma.mrr), 0) AS starting_mrr
    FROM generate_series(NOW() - INTERVAL '6 months', NOW(), '1 month'::interval) AS series_month
    LEFT JOIN monthly_activity ma ON ma.activity_month = DATE_TRUNC('month', series_month)
    GROUP BY 1
),

monthly_churn_summary AS (
    -- Calculate the churned MRR for each month
    SELECT
        churn_month,
        SUM(mrr) AS churned_mrr
    FROM churned_subscriptions
    GROUP BY 1
)

-- Final Step: Combine starting MRR and churned MRR to calculate the rate
SELECT
    mms.report_month,
    mms.starting_mrr,
    COALESCE(mcs.churned_mrr, 0) AS churned_mrr,
    ROUND(100.0 * COALESCE(mcs.churned_mrr, 0) / NULLIF(mms.starting_mrr, 0), 2) AS churn_rate
FROM monthly_mrr_summary mms
LEFT JOIN monthly_churn_summary mcs ON mms.report_month = mcs.churn_month
ORDER BY 1;
  • monthly_activity CTE: This is the tricky part. We can’t just look at the subscriptions table’s final status; we need to know what was active during each month. Joining payments to subscriptions tells us which subscriptions made a payment in a given month (indicating activity) and what each was worth. SELECT DISTINCT collapses multiple payments in the same month to a single row per subscription, so its MRR isn’t double-counted later.
  • churned_subscriptions CTE: This is more straightforward. It isolates the subscriptions that were explicitly canceled and when.
  • monthly_mrr_summary & monthly_churn_summary: These CTEs aggregate the raw data into monthly totals for starting MRR and churned MRR, respectively. The generate_series call produces one row per month, so months with no activity still appear in the report.
  • Final SELECT: This joins the two summaries together. The COALESCE and NULLIF functions are crucial here to handle months with zero churn or zero starting MRR, preventing division-by-zero errors.
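Before running logic like this against production data, it helps to sanity-check the shape of it on toy rows. Here is a minimal sketch in Python using SQLite, with entirely hypothetical data; since SQLite has no DATE_TRUNC or generate_series, strftime('%Y-%m', ...) stands in for month truncation and only months with activity appear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (subscription_id INT, mrr REAL, status TEXT, canceled_at TEXT);
CREATE TABLE payments (payment_id INT, subscription_id INT, payment_date TEXT);
INSERT INTO subscriptions VALUES
    (1, 100, 'active',   NULL),
    (2,  50, 'canceled', '2025-01-20'),
    (3, 200, 'active',   NULL);
INSERT INTO payments VALUES
    (1, 1, '2025-01-05'),
    (2, 2, '2025-01-03'),
    (3, 3, '2025-01-10');
""")

rows = conn.execute("""
WITH monthly_activity AS (
    -- One row per subscription per month with a payment
    SELECT DISTINCT p.subscription_id, s.mrr,
           strftime('%Y-%m', p.payment_date) AS activity_month
    FROM payments p
    JOIN subscriptions s ON s.subscription_id = p.subscription_id
),
monthly_mrr AS (
    SELECT activity_month, SUM(mrr) AS starting_mrr
    FROM monthly_activity GROUP BY 1
),
monthly_churn AS (
    SELECT strftime('%Y-%m', canceled_at) AS churn_month,
           SUM(mrr) AS churned_mrr
    FROM subscriptions WHERE status = 'canceled' GROUP BY 1
)
SELECT m.activity_month,
       m.starting_mrr,
       COALESCE(c.churned_mrr, 0) AS churned_mrr,
       ROUND(100.0 * COALESCE(c.churned_mrr, 0) / m.starting_mrr, 2) AS churn_rate
FROM monthly_mrr m
LEFT JOIN monthly_churn c ON c.churn_month = m.activity_month
""").fetchall()

print(rows)  # [('2025-01', 350.0, 50.0, 14.29)]
```

With $350 of active MRR in January and a $50 cancellation, the churn rate comes out to 14.29%, which you can verify by hand before trusting the full Postgres version on real data.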

Marketing: Identifying High-Value Lead Sources

Marketing teams constantly need to know which channels are driving revenue, not just leads. This means joining lead generation data (top of funnel) with closed-won deal data (bottom of funnel), which often live in different tables with different IDs.

The Prompt:

“I need a query to attribute closed-won revenue to the original marketing source. I have a leads table (lead_id, email, utm_source, created_at) and a deals table (deal_id, lead_email, amount, close_date, status). The leads.email and deals.lead_email are the join keys. Show me the total revenue and number of deals for each utm_source for deals closed in Q1 2025. Only include sources that generated at least one deal.”

The Generated SQL & Logic Breakdown:

This query is all about the JOIN and the GROUP BY.

SELECT
    l.utm_source,
    COUNT(DISTINCT d.deal_id) AS total_deals,
    SUM(d.amount) AS total_revenue,
    ROUND(SUM(d.amount) / COUNT(DISTINCT d.deal_id), 2) AS avg_deal_size
FROM deals d
-- Inner join ensures we only get deals that have a matching lead
INNER JOIN leads l ON d.lead_email = l.email
WHERE
    d.status = 'closed-won'
    AND d.close_date >= '2025-01-01'
    AND d.close_date < '2025-04-01'
GROUP BY
    l.utm_source
HAVING
    COUNT(DISTINCT d.deal_id) >= 1
ORDER BY
    total_revenue DESC;
  • INNER JOIN: This is the most important part of the query. An INNER JOIN (the default JOIN) only returns rows where the lead_email exists in both the deals and leads tables. This automatically filters out any deals that can’t be traced back to a known marketing source, which is exactly what you want.
  • WHERE Clause: This filters the dataset down to only the relevant time period (Q1 2025) and the desired outcome (status = 'closed-won'). Pushing these filters early is key for performance.
  • GROUP BY l.utm_source: This aggregates all the individual deals up to the level of their marketing source, allowing us to SUM the revenue and COUNT the deals for each channel.
  • HAVING Clause: The prompt asked for sources with at least one deal, but the INNER JOIN already guarantees this: any utm_source group that appears in the result must contain at least one matched deal. The clause is a harmless safety net, though redundant. Spotting nuances like this is how you refine your prompts over time.
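The join-then-aggregate pattern is easy to verify on toy data. Below is a minimal sketch with SQLite and hypothetical leads/deals rows (the join and grouping semantics are the same in Postgres); note that the out-of-quarter and non-won deals are filtered out, and linkedin drops from the result entirely because its only deal closed outside Q1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (lead_id INT, email TEXT, utm_source TEXT, created_at TEXT);
CREATE TABLE deals (deal_id INT, lead_email TEXT, amount REAL, close_date TEXT, status TEXT);
INSERT INTO leads VALUES
    (1, 'a@x.com', 'google',   '2024-11-01'),
    (2, 'b@x.com', 'linkedin', '2024-12-05'),
    (3, 'c@x.com', 'google',   '2025-01-02');
INSERT INTO deals VALUES
    (10, 'a@x.com', 5000, '2025-02-10', 'closed-won'),
    (11, 'c@x.com', 3000, '2025-03-01', 'closed-won'),
    (12, 'b@x.com', 9000, '2025-05-01', 'closed-won'),  -- outside Q1: filtered out
    (13, 'a@x.com', 1000, '2025-01-15', 'open');        -- not closed-won: filtered out
""")

rows = conn.execute("""
SELECT l.utm_source,
       COUNT(DISTINCT d.deal_id) AS total_deals,
       SUM(d.amount) AS total_revenue
FROM deals d
INNER JOIN leads l ON d.lead_email = l.email
WHERE d.status = 'closed-won'
  AND d.close_date >= '2025-01-01'
  AND d.close_date < '2025-04-01'
GROUP BY l.utm_source
ORDER BY total_revenue DESC
""").fetchall()

print(rows)  # [('google', 2, 8000.0)]
```

Two Q1 closed-won deals trace back to google for $8,000 total, exactly what a hand count of the toy rows predicts.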

Conclusion: Integrating Claude into Your Data Analysis Toolkit

You’ve now moved beyond simply asking for a query. You’ve learned to architect a prompt. The journey from a vague question to a production-ready, optimized SQL statement hinges on three core principles we’ve explored: providing rich context, demanding explanations, and embracing iteration. Giving Claude the schema, data types, and business logic isn’t just helpful—it’s the difference between a guess and a guarantee. Asking for a breakdown of a nested sub-query transforms the AI from a black box into a transparent partner, allowing you to learn and verify the logic. And remember, the first prompt is rarely the final one; refining your request based on the output is where the real power lies.

The Future of AI-Assisted Data Analysis

We are witnessing a fundamental shift in how organizations interact with their data. The role of AI is not to replace data professionals but to democratize data access, empowering product managers, marketers, and executives to ask complex questions directly. This evolution accelerates the path from question to insight, fostering a true data-driven culture. The analyst’s role is elevated from a query-writer to a strategist and validator, focusing on defining the right problems and interpreting the results, while the AI handles the heavy lifting of code generation.

Your Next Steps: From Knowledge to Practice

The most effective way to internalize these techniques is to apply them immediately. Don’t wait for the perfect project.

  • Start with your own data: Grab a simple query from your current workload.
  • Apply the principles: Add context, ask for a CTE, and request a step-by-step explanation.
  • Build your library: Save your most effective prompts. This becomes your personal, reusable toolkit for future analysis.

Golden Nugget: The single most powerful prompt you can add to any complex request is: “First, outline the logical steps you will take to solve this, then write the code.” This forces a chain-of-thought process that dramatically reduces errors and makes the final query auditable.
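If you send prompts programmatically, you can bake this pattern into a reusable helper so every complex request gets the same treatment. Here is a minimal sketch; the function name and template structure are our own invention, not part of any SDK:

```python
def explain_first_prompt(schema: str, task: str) -> str:
    """Wrap a SQL request in the 'outline first, then code' pattern."""
    return (
        f"Here is my database schema:\n{schema}\n\n"
        f"Task: {task}\n\n"
        "First, outline the logical steps you will take to solve this, "
        "then write the code."
    )

prompt = explain_first_prompt(
    schema="subscriptions(subscription_id, user_id, mrr, status)",
    task="Calculate net MRR churn for the last 6 months.",
)
print(prompt)
```

Every request built this way carries both the schema context and the chain-of-thought instruction, so you never forget either under deadline pressure.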


Frequently Asked Questions

Q: Why is Claude better for SQL than other AI models?

Claude excels at reasoning and context retention, acting like a senior analyst that explains the ‘why’ behind complex joins and window functions rather than just spitting out monolithic code.

Q: What is the most common mistake when prompting AI for SQL?

The biggest mistake is a lack of context; failing to provide schema details (table names, columns, data types) leads to incorrect guesses and syntax errors.

Q: How can I optimize AI-generated SQL queries?

You can ask Claude to critique its own work, suggest performance improvements, and debug errors by providing the specific error messages and query structure.


