How AI Chatbots Turn Messy Excel Sheets into Clean, Actionable Data

Graziano Stefanelli
Apr 29

A hands-on walkthrough with GPT-4o, Claude 3.5, and Gemini 1.5 Pro

Why This Matters

Even the best analysts waste hours fixing spreadsheets: duplicate rows; inconsistent naming; dates in multiple formats; blank cells where numbers should live. Modern AI chatbots can compress that data-cleaning slog into a handful of smart prompts. In this article you’ll see exactly how to do it—three different ways—using the 2025 versions of GPT-4o (OpenAI), Claude 3.5 Sonnet (Anthropic) and Gemini 1.5 Pro (Google).

______________

The Common Workflow at a Glance

Upload the raw XLSX
Describe the problems you see (duplicates, wrong date formats, etc.)
Ask the chatbot for a preview of its cleaning plan
Approve / tweak the plan
Generate the cleaned file and download it

The devil, of course, is in the prompt details—let’s dive in.

Method 1 – Fast, Iterative Fixes with GPT-4o

Prompt template: “You are a data quality assistant. Clean this spreadsheet so it is ready for a customer-lifetime-value model. • Remove duplicate rows that share the same Email, keeping the first occurrence; • Standardize date fields to ISO yyyy-mm-dd; • Convert the ‘Country’ column to two-letter ISO codes; • Flag rows missing ‘Annual Spend’, then fill them with the median of that column; • Return a new workbook with a single sheet called clean_customers. Show me a summary of the changes before you export.”
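If you ask for the code, the script you get back might resemble the sketch below. It is an illustration of the four cleaning steps in the prompt, not the chatbot's actual output. The column name Signup Date, the COUNTRY_CODES lookup, and the list of date formats are assumptions made for the example; Email, Country, and Annual Spend come from the prompt.

```python
from datetime import datetime

import pandas as pd

# Hypothetical lookup; a real one would cover every country name in the data.
COUNTRY_CODES = {"usa": "US", "united states": "US", "italy": "IT", "germany": "DE"}

# Date layouts assumed to occur in the raw sheet; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def to_iso_date(value):
    """Return the date as yyyy-mm-dd, or None if no known format matches."""
    if pd.isna(value):
        return None
    if isinstance(value, datetime):  # Excel dates often arrive already parsed
        return value.strftime("%Y-%m-%d")
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_customers(df):
    # Keep the first row seen for each Email.
    out = df.drop_duplicates(subset="Email", keep="first").copy()
    out["Signup Date"] = out["Signup Date"].map(to_iso_date)
    # Country names missing from the lookup become NaN rather than being guessed.
    out["Country"] = out["Country"].str.strip().str.lower().map(COUNTRY_CODES)
    # Record which rows were missing before filling them with the median.
    out["Annual Spend Missing"] = out["Annual Spend"].isna()
    out["Annual Spend"] = out["Annual Spend"].fillna(out["Annual Spend"].median())
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "Email": ["a@x.com", "a@x.com", "b@x.com", "c@x.com"],
    "Signup Date": ["2024-03-05", "2024-03-05", "05/03/2024", "05.03.2024"],
    "Country": ["USA", "USA", " Italy ", "Germany"],
    "Annual Spend": [100.0, 100.0, None, 300.0],
})
clean = clean_customers(raw)
```

Keeping the missing-value flag as its own column means the median fill stays visible to whoever builds the model later.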

Why GPT-4o shines

Speed – transforms ten-thousand-row sheets in seconds;
Natural follow-ups – “Undo step 3 and show the difference” works reliably;
Python under the hood – if you add “show code,” you’ll get pandas scripts you can reuse.

Watch-outs

Tends to infer column types; if your headers are vague, spell them out in the prompt;
Larger than ~50 MB? The session may cut off—split files first.
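One way to split a large file before uploading is to cut it into fixed-size row blocks. The sketch below assumes row count is a good enough proxy for file size; split_frame and the max_rows value are hypothetical names chosen for the example.

```python
import pandas as pd


def split_frame(df, max_rows):
    """Yield consecutive slices of df with at most max_rows rows each."""
    for start in range(0, len(df), max_rows):
        yield df.iloc[start:start + max_rows]


# In practice the frame would come from pd.read_excel(...) on your raw workbook,
# and each part would be written out with part.to_excel(...).
demo = pd.DataFrame({"id": range(10)})
parts = list(split_frame(demo, max_rows=4))
```

Each part keeps the original header row, so the same prompt can be reused on every piece.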

Method 2 – Context-Rich Validation with Claude 3.5 Sonnet

Prompt template: “Act as a senior data steward. I’ve uploaded 2024_sales.xlsx. Audit the sheet for structural issues (merged cells, mixed numeric/text types, blank headers); Propose a cleansing spec in Markdown; Wait for my approval; Apply the spec and return a cleaned file plus an ‘audit_log’ sheet summarizing every change.”
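To sanity-check the audit the chatbot proposes, you can run a rough version of it yourself. The sketch below covers only two of the issues named in the prompt, blank headers and mixed numeric/text columns. Merged cells are not visible once a sheet is loaded into pandas, so they are left out. The function name audit_sheet and the sample data are assumptions for illustration.

```python
import numbers

import pandas as pd


def audit_sheet(df):
    """Return (column, issue) pairs describing structural problems."""
    findings = []
    for col in df.columns:
        name = str(col)
        # pandas labels header-less columns "Unnamed: N" when reading a file.
        if name.strip() == "" or name.startswith("Unnamed:"):
            findings.append((name, "blank header"))
        values = list(df[col].dropna())
        has_text = any(isinstance(v, str) for v in values)
        has_number = any(
            isinstance(v, numbers.Number) and not isinstance(v, bool) for v in values
        )
        if has_text and has_number:
            findings.append((name, "mixed numeric/text values"))
    return findings


sheet = pd.DataFrame({
    "Region": ["North", "South"],
    "Unnamed: 1": [1, 2],
    "Revenue": [1200, "1,350"],
})
findings = audit_sheet(sheet)
audit_log = pd.DataFrame(findings, columns=["column", "issue"])
```

The resulting audit_log frame has the same shape as the sheet the prompt asks for, which makes it easy to compare against what the chatbot returns.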

Why Claude 3.5 shines

Large context window – 200K tokens, enough to read many multi-tab financial models without truncation;
Human-style spec first – you get a written plan you can review and share with teammates or auditors;
Granular change log – perfect for regulated industries or grant reporting.

Watch-outs

Slower export time once the spec is approved;
Occasionally over-verbose—trim with, “Give me only key actions, no commentary.”

Method 3 – Table & Data-Type Mastery with Gemini 1.5 Pro

Prompt template: “Clean this marketing-spend file so it is ready for time-series modeling. • Detect and convert any numbers stored as text; • Identify outliers more than 3 σ from the mean and highlight them in red; • Unpivot monthly spend columns into a date / channel / spend long format; • Save the tidy data as a new sheet named melted_spend.”
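For reference, the reshaping this prompt describes can be sketched in pandas as follows. This is an illustration under stated assumptions, not Gemini's internal method: the channel column name and month labels are invented for the example, the mean and standard deviation are computed over all spend values after unpivoting, and outliers are marked with a boolean column instead of red cell formatting.

```python
import pandas as pd


def tidy_spend(wide, id_col="channel"):
    """Convert text numbers, unpivot month columns, and flag 3-sigma outliers."""
    month_cols = [c for c in wide.columns if c != id_col]
    fixed = wide.copy()
    for col in month_cols:
        # Strip thousands separators, then coerce; unparseable text becomes NaN.
        as_text = fixed[col].astype(str).str.replace(",", "", regex=False)
        fixed[col] = pd.to_numeric(as_text, errors="coerce")
    long = fixed.melt(id_vars=id_col, var_name="date", value_name="spend")
    mean, std = long["spend"].mean(), long["spend"].std()
    long["outlier"] = (long["spend"] - mean).abs() > 3 * std
    return long


# Eight hypothetical month columns, with every number stored as text.
months = [f"2024-{m:02d}" for m in range(1, 9)]
wide = pd.DataFrame({"channel": ["search", "social"],
                     **{m: ["100", "100"] for m in months}})
wide.loc[1, "2024-08"] = "10,000"

melted_spend = tidy_spend(wide)
flagged = melted_spend[melted_spend["outlier"]]
```

A 3 σ rule only fires when there are enough data points: with very small tables, a single extreme value inflates the standard deviation so much that nothing crosses the threshold.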

Why Gemini 1.5 shines

Table structure detection – spots hidden totals, header rows in the wrong place;
Built-in stats helpers – natively flags outliers, draws missingness heatmaps, etc.;
Strong ‘unpivot / reshape’ prompts – ideal for BI tool ingestion.

Watch-outs

Max sheet size ~30 MB before timing out;
Doesn’t yet export intermediate Python/R code, only the final file.

____________

Pro Tips That Work Across All Three

Ground truth early – paste two sample bad rows and the perfect version you want; the model will generalize;
Name sheets explicitly – avoid default “Sheet 1 (2)”;
Iterate, don’t monologue – treat it like pair programming: ask, review, tweak;
Save versions – keep the raw file, chatbot-cleaned file, and final manually-verified file.
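Keeping both the raw and the chatbot-cleaned file also lets you run a quick check before the manual review. The sketch below is one possible comparison; verification_report and the sample frames are hypothetical, and the checks are only examples of what you might want to confirm.

```python
import pandas as pd


def verification_report(raw, cleaned, key):
    """Summarize what changed between the raw and cleaned versions."""
    return {
        "rows_raw": len(raw),
        "rows_cleaned": len(cleaned),
        "rows_dropped": len(raw) - len(cleaned),
        "duplicate_keys_left": int(cleaned[key].duplicated().sum()),
        "missing_values_left": int(cleaned.isna().sum().sum()),
    }


raw_demo = pd.DataFrame({
    "Email": ["a@x.com", "a@x.com", "b@x.com"],
    "Annual Spend": [100.0, 100.0, None],
})
cleaned_demo = pd.DataFrame({
    "Email": ["a@x.com", "b@x.com"],
    "Annual Spend": [100.0, 100.0],
})
report = verification_report(raw_demo, cleaned_demo, key="Email")
```

If rows_dropped is larger than the number of duplicates you expected, that is a signal to ask the chatbot which rows it removed and why.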