Data Cleaning in Excel: A Realistic Walkthrough for Analysts
A practical guide to cleaning messy data in Excel, covering version control, formatting cleanup, and string manipulation techniques.
The Least Discussed Topic in Data: Data Cleaning in Excel
What's the one thing the vast majority of data professionals say takes up most of their time?
Data cleaning in Excel.
And yet it's among the least discussed topics in the data sphere. Everyone wants to talk about insights and visualizations. Nobody wants to talk about the three hours you spent fixing date formats.
According to Wikipedia: Data cleansing is the process of detecting and correcting or removing corrupt, inaccurate records from a record set, table, or database.
But here's the part that's actually rewarding: data wrangling—transforming and mapping data from one raw form into another format with the intent of making it more appropriate and valuable.
Data analysts typically spend the majority of their time in data wrangling compared to actual analysis. This is where the transformation of data actually begins.
First Things First: Version Control
When I receive a data file, the first thing I do is save it under a different name.
Original: source_data.xlsx
New: volume_by_region_data_request_v1.xlsx
Why? Because clean data requires documentation. If anything happens to the file, or if I need to reference what version I was on, I have that history.
This also lets me create checkpoints throughout the process. If I muck something up, I can go back.
Getting Familiar with the File
Before touching anything, I want to understand what I'm working with:
- How many rows?
Ctrl + Shift + Endto see the extent of the data - What's in each column? Scan the headers
- Any obvious issues? Look for:
- Numbers that are left-aligned (probably stored as text)
- Dates that look like text
- Green triangles indicating errors
- Blank cells where there shouldn't be any
Preserving Raw Data
Before cleaning, I make copies of the raw data tabs and hide them.
Why? If I ever need to cross-reference where I started, or if I've deleted something I shouldn't have, I can look back at the source data.
Steps:
1. Ctrl + Click on tab to copy
2. Rename with "_OG" suffix (for "original")
3. Right-click > Hide
These are my backups. In most cases, I'll never look at them again. But they're there just in case.
Quick Formatting Cleanup
Start with basic hygiene:
| Shortcut | Action |
|---|---|
Ctrl + A, Ctrl + A | Select entire sheet |
Alt + H + W | Toggle text wrap |
Alt + H + O + I | Auto-fit column width |
Ctrl + T | Convert to data table |
Data tables are powerful because they:
- Automatically expand when you add data
- Allow structured references in formulas
- Make it easy to filter and sort
Fixing Common Data Issues
Fill Down Blank Cells
Many systems export data with blanks where values should repeat:
Client A | Jan | 100
| Feb | 150
| Mar | 200
Client B | Jan | 80
To fill those blanks:
- Select the column:
Ctrl + Space - Go to Special > Blanks:
Alt + H + F + D + S + K - Type
=and press Up arrow Ctrl + Enterto apply to all selected cells- Copy and Paste Values to lock it in
Convert Text to Numbers/Dates
If dates or numbers are stored as text:
Method 1: Find and Replace
Ctrl + H
Find: - (dash)
Replace: - (same dash)
Replace All
Method 2: Text to Columns
Alt + A + E + F (Data > Text to Columns > Finish)
This trick forces Excel to re-evaluate the cell contents.
Checking for Duplicates
If a column should contain unique values (like Client IDs):
- Copy the column to an empty area
Alt + A + M(Remove Duplicates)- Compare the count to your original
If duplicates exist when they shouldn't, you have a data quality issue to investigate.
String Manipulation for Mismatched IDs
Often you'll have IDs that don't match between files:
File 1: C-1234567
File 2: 1234567
Use MID or RIGHT functions to extract what you need:
=MID(A2, 3, 7) ' Start at character 3, grab 7 characters
=RIGHT(A2, 7) ' Grab rightmost 7 characters
Always verify your results match before moving forward.
Building Lookup References
Once your data is clean, you can connect tables using lookups:
=XLOOKUP(client_id, lookup_range, return_range)
=INDEX(return_range, MATCH(client_id, lookup_range, 0))
I prefer XLOOKUP when available because the syntax is cleaner. But INDEX/MATCH works everywhere.
The Most Important Habit: Save Often
I can't emphasize this enough: don't forget to save while you're working through this process.
Ctrl + S should become muscle memory.
And when you make a big structural change to the file, consider saving a new version.
The Bottom Line
Data cleaning isn't glamorous, but it's where the real work happens. A clean dataset makes everything downstream easier:
- Analysis goes faster
- Formulas work correctly
- Stakeholders trust your numbers
Take the time to do it right. Document your process. And always keep a backup of your raw data.
Mastering data cleaning in Excel isn't glamorous, but it's what makes everything else possible.
Common Questions About Data Cleaning
Q: How do I know when my data is "clean enough"?
When you can run your analysis without errors and you trust the results enough to present them. If you're still getting #REF errors or seeing numbers that don't make sense, keep cleaning.
Q: Should I clean data in Excel or use Python/R?
Depends on your audience and your workflow. If you're delivering in Excel anyway, clean in Excel. If you're doing recurring data pipelines, automate it in Python. For one-off analyses? Excel is usually faster.
Q: What's the most common data cleaning mistake?
Not saving versions. You clean something, realize you went too far, and can't get back. Always save a copy of your raw data before you start.
Excel for Analytics
The complete course for finance professionals who want to level up their Excel skills.

SaaS CFO turned educator. 20+ years in finance leadership, from Big 4 audit to building companies. Now helping 250,000+ professionals master the skills that actually move careers.