Data Analysis
Tools & Processes
Combined Data Cleaning Process Summary
G = Google
M = Microsoft
Create a backup copy of the original data (M)
Create a tab at the beginning of the Sheet and name it <START HERE>, <README> or something similar. Use the <START HERE> tab to document:
Definition / description of each tab in the Sheet.
If in a new Sheet organize and save with a recognizable name: <Original File Name>_<BACKUP COPY>_<DATE>.
Note the location of the Sheet, source of imported / merged data, errors found, process to fix and other data / information another analyst can use to understand the sheet, data, and status of the analysis.
Easiest way to do this is to combine cells to form text boxes and color code to the color of the respective tab.
Create a text box with the DCP (Data Cleaning Process) and note the status and date for each process.
Import the data from an external source (M)
Either in a separate tab or Sheet.
Import the data into a separate tab, name it something recognizable: <IMPORT_SOURCE_<DATE> and then copy / merge into the production Sheet.
Do tasks that don't require column manipulation:
Spell-checking (M)
Find and Replace dialog box. (M)
Select & remove blank cells - Filter (G)
Delete extra spaces (G)
Change text case (G)
Delete all formatting (G)
Conditional formatting (G) where text in a cell doesn't match the desired criteria ie LEN or number of characters.
Transpose the data (G)
Processes / Tools needing further research:
CountIF: if a value is above or below a certain range ie membership costs below or above the normal range.
=COUNTIF(range,"value")
LEN: the length of a text string.
=LEN(range)
LEFT / RIGHT: A set number of characters from the left side of a text string.
=LEFT(range,number of characters)
MID: A function that gives you a segment from the middle of a text string.
=MID(range,reference starting point, number of middle characters)
CONCATENATE: A function that joins together two or more text strings.
=CONCATENATE(item1,item2)
TRIM: A function that removes leading, trailing, and repeated spaces in data.
=TRIM(range)
_____
Summary
Import the data from an external source (M)
Create a backup copy of the original data (M)
Ensure data is in tabular (rows and columns) format (M)
Do tasks that don't require column manipulation first, such as spell-checking or using the Find and Replace dialog box. (M)
Next, do tasks that do require column manipulation. The general steps for manipulating a column are (M):
Insert a new column (B) next to the original column (A) that needs cleaning. (M)
Add a formula that will transform the data at the top of the new column (B). (M)
Fill down the formula in the new column (B). In an Excel table, a calculated column is automatically created with values filled down. (M)
Select the new column (B), copy it, and then paste as values into the new column (B). (M)
Remove the original column (A), which converts the new column from B to A. (M)
Process Summary - Google
Select and remove blank cells: Create and use a Filter
Transpose the data
Get rid of the extra spaces in the cells with string data
Change Text Lower/Uppercase/Proper Case
Delete all formatting
Reference
Process Summary - Microsoft
Import the data from an external source
Create a backup copy of the original data
Ensure data is in tabular (rows and columns) format
Do tasks that don't require column manipulation first, such as spell-checking or using the Find and Replace dialog box.
Next, do tasks that do require column manipulation. The general steps for manipulating a column are:
Insert a new column (B) next to the original column (A) that needs cleaning.
Add a formula that will transform the data at the top of the new column (B).
Fill down the formula in the new column (B). In an Excel table, a calculated column is automatically created with values filled down.
Select the new column (B), copy it, and then paste as values into the new column (B).
Remove the original column (A), which converts the new column from B to A.