Excel Tips

Data Cleaning in Excel: A Realistic Walkthrough for Analysts

A practical guide to cleaning messy data in Excel, covering version control, formatting cleanup, and string manipulation techniques.

6 min read

The Least Discussed Topic in Data: Data Cleaning in Excel

What's the one thing the vast majority of data professionals say takes up most of their time?

Data cleaning in Excel.

And yet it's among the least discussed topics in the data sphere. Everyone wants to talk about insights and visualizations. Nobody wants to talk about the three hours you spent fixing date formats.

According to Wikipedia: Data cleansing is the process of detecting and correcting or removing corrupt, inaccurate records from a record set, table, or database.

But here's the part that's actually rewarding: data wrangling—transforming and mapping data from one raw form into another format with the intent of making it more appropriate and valuable.

Data analysts typically spend the majority of their time in data wrangling compared to actual analysis. This is where the transformation of data actually begins.

First Things First: Version Control

When I receive a data file, the first thing I do is save it under a different name.

Original: source_data.xlsx
New: volume_by_region_data_request_v1.xlsx

Why? Because clean data requires documentation. If anything happens to the file, or if I need to reference what version I was on, I have that history.

This also lets me create checkpoints throughout the process. If I muck something up, I can go back.

Getting Familiar with the File

Before touching anything, I want to understand what I'm working with:

  1. How many rows? Ctrl + Shift + End to see the extent of the data
  2. What's in each column? Scan the headers
  3. Any obvious issues? Look for:
    • Numbers that are left-aligned (probably stored as text)
    • Dates that look like text
    • Green triangles indicating errors
    • Blank cells where there shouldn't be any

Preserving Raw Data

Before cleaning, I make copies of the raw data tabs and hide them.

Why? If I ever need to cross-reference where I started, or if I've deleted something I shouldn't have, I can look back at the source data.

Steps:
1. Ctrl + Click on tab to copy
2. Rename with "_OG" suffix (for "original")
3. Right-click > Hide

These are my backups. In most cases, I'll never look at them again. But they're there just in case.

Quick Formatting Cleanup

Start with basic hygiene:

ShortcutAction
Ctrl + A, Ctrl + ASelect entire sheet
Alt + H + WToggle text wrap
Alt + H + O + IAuto-fit column width
Ctrl + TConvert to data table

Data tables are powerful because they:

  • Automatically expand when you add data
  • Allow structured references in formulas
  • Make it easy to filter and sort

Fixing Common Data Issues

Fill Down Blank Cells

Many systems export data with blanks where values should repeat:

Client A | Jan | 100
         | Feb | 150
         | Mar | 200
Client B | Jan | 80

To fill those blanks:

  1. Select the column: Ctrl + Space
  2. Go to Special > Blanks: Alt + H + F + D + S + K
  3. Type = and press Up arrow
  4. Ctrl + Enter to apply to all selected cells
  5. Copy and Paste Values to lock it in

Convert Text to Numbers/Dates

If dates or numbers are stored as text:

Method 1: Find and Replace

Ctrl + H
Find: - (dash)
Replace: - (same dash)
Replace All

Method 2: Text to Columns

Alt + A + E + F (Data > Text to Columns > Finish)

This trick forces Excel to re-evaluate the cell contents.

Checking for Duplicates

If a column should contain unique values (like Client IDs):

  1. Copy the column to an empty area
  2. Alt + A + M (Remove Duplicates)
  3. Compare the count to your original

If duplicates exist when they shouldn't, you have a data quality issue to investigate.

String Manipulation for Mismatched IDs

Often you'll have IDs that don't match between files:

File 1: C-1234567

File 2: 1234567

Use MID or RIGHT functions to extract what you need:

=MID(A2, 3, 7)    ' Start at character 3, grab 7 characters
=RIGHT(A2, 7)      ' Grab rightmost 7 characters

Always verify your results match before moving forward.

Building Lookup References

Once your data is clean, you can connect tables using lookups:

=XLOOKUP(client_id, lookup_range, return_range)
=INDEX(return_range, MATCH(client_id, lookup_range, 0))

I prefer XLOOKUP when available because the syntax is cleaner. But INDEX/MATCH works everywhere.

The Most Important Habit: Save Often

I can't emphasize this enough: don't forget to save while you're working through this process.

Ctrl + S should become muscle memory.

And when you make a big structural change to the file, consider saving a new version.

The Bottom Line

Data cleaning isn't glamorous, but it's where the real work happens. A clean dataset makes everything downstream easier:

  • Analysis goes faster
  • Formulas work correctly
  • Stakeholders trust your numbers

Take the time to do it right. Document your process. And always keep a backup of your raw data.

Mastering data cleaning in Excel isn't glamorous, but it's what makes everything else possible.

Common Questions About Data Cleaning

Q: How do I know when my data is "clean enough"?

When you can run your analysis without errors and you trust the results enough to present them. If you're still getting #REF errors or seeing numbers that don't make sense, keep cleaning.

Q: Should I clean data in Excel or use Python/R?

Depends on your audience and your workflow. If you're delivering in Excel anyway, clean in Excel. If you're doing recurring data pipelines, automate it in Python. For one-off analyses? Excel is usually faster.

Q: What's the most common data cleaning mistake?

Not saving versions. You clean something, realize you went too far, and can't get back. Always save a copy of your raw data before you start.

Excel for Analytics

The complete course for finance professionals who want to level up their Excel skills.

Learn More
Matt Brattin
Matt Brattin

SaaS CFO turned educator. 20+ years in finance leadership, from Big 4 audit to building companies. Now helping 250,000+ professionals master the skills that actually move careers.