Data Management Best Practices
Principles
File Naming Principles
| Principle | Do | Don't |
| --- | --- | --- |
| Consistency | elections_raw.xlsx; elections_clean.csv; elections.txt | rawdata.xlsx; electionsprojectdata.csv; dofile_eproject.txt |
| Brief, but descriptive | unemployment2020.csv; unemploy2020_v05.docx | data.csv; final.docx |
| No spaces or special characters | r_d_spending.csv | R&D spending data.csv |
| General to specific | econ378_healthcare_v03.txt; econ378_healthcare.csv | Final version health econ 378.txt; healthproject_econ378.csv |
| YYYYMMDD date format | testscores_2011_10_12.csv | testscores_10-12-11.csv |
| Version control | nba_players_v01.docx; nba_players_v02.docx; nba_players_v03.docx | Project.docx; Finalproject.docx; Finalfinalproject.docx |
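The naming principles in the table can be sketched in a few lines of Python. This is only an illustration: the helper names `is_safe_filename` and `build_filename`, the lowercase-only rule, and the choice to write dates without separators are assumptions made for the example, not part of the guidance above.

```python
import re
from datetime import date
from typing import Optional

# Assumed rule for this sketch: lowercase letters, digits, and underscores,
# followed by a dot and an extension. No spaces or special characters.
SAFE_NAME = re.compile(r"^[a-z0-9_]+\.[a-z0-9]+$")


def is_safe_filename(name: str) -> bool:
    """Return True if the name contains no spaces or special characters."""
    return bool(SAFE_NAME.match(name))


def build_filename(project: str, topic: str, ext: str,
                   day: Optional[date] = None,
                   version: Optional[int] = None) -> str:
    """Build a name that runs from general to specific:
    project, topic, optional date (YYYYMMDD), optional zero-padded version."""
    parts = [project, topic]
    if day is not None:
        parts.append(day.strftime("%Y%m%d"))
    if version is not None:
        parts.append(f"v{version:02d}")
    return "_".join(parts) + "." + ext


print(is_safe_filename("r_d_spending.csv"))
print(is_safe_filename("R&D spending data.csv"))
print(build_filename("econ378", "healthcare", "txt", version=3))
```

Zero-padding the version number (v01, v02, ...) and putting the year first in dates keeps files in chronological order when they are sorted by name.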
Variable Names
- Brief (use industry instead of IndustryOfOccupation)
- Descriptive (use birthyear instead of by)
- No spaces or special characters (use income instead of what is your income?)
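As a rough sketch of how these rules might be applied when importing survey data, the following Python snippet strips spaces and special characters from raw column labels and then applies a short rename table. The function name `tidy_variable_name` and the `RENAMES` mapping are hypothetical; the choice of brief, descriptive names still has to be made by hand.

```python
import re

# Hypothetical rename table: cleaned label -> brief, descriptive name.
RENAMES = {
    "what_is_your_income": "income",
    "industryofoccupation": "industry",
    "by": "birthyear",
}


def tidy_variable_name(raw: str) -> str:
    """Lowercase the label, replace runs of spaces or special characters
    with a single underscore, then apply the rename table if it matches."""
    name = raw.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name).strip("_")
    return RENAMES.get(name, name)


for label in ["What is your income?", "IndustryOfOccupation", "by", "Birth Year"]:
    print(label, "->", tidy_variable_name(label))
```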
Variable Format
- Consistency: express the same value the same way every time (e.g., choose one way to record a level of education instead of mixing grade 10, 10th, sophomore, and 10)
- Convert categorical and text variables to binary format for analysis and visualization
- Standardize error or no-value codes
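The three formatting rules above could look like this in Python. Treat it as a minimal sketch: the `EDUCATION_LEVELS` mapping, the set of `MISSING_CODES`, and the decision to represent every missing value as `None` are assumptions chosen for the example, and a real project would define its own codes.

```python
# Assumed mapping: every spelling of the same education level maps to one value.
EDUCATION_LEVELS = {"grade 10": 10, "10th": 10, "sophomore": 10, "10": 10}

# Assumed no-value codes; all of them are standardized to None.
MISSING_CODES = {"", "na", "n/a", "-99", "unknown"}


def standardize_education(value):
    """Return one consistent grade number, or None for a no-value code."""
    key = str(value).strip().lower()
    if key in MISSING_CODES:
        return None
    if key not in EDUCATION_LEVELS:
        raise ValueError(f"Unrecognized education level: {value!r}")
    return EDUCATION_LEVELS[key]


def to_binary(value, positive: str):
    """Encode a text variable as 1 (matches `positive`), 0 (anything else),
    or None when the value is a no-value code."""
    key = str(value).strip().lower()
    if key in MISSING_CODES:
        return None
    return 1 if key == positive.lower() else 0


print([standardize_education(v) for v in ["grade 10", "10th", "Sophomore", 10, "N/A"]])
print([to_binary(v, "yes") for v in ["Yes", "no", "-99"]])
```

Raising an error on an unrecognized value, rather than guessing, makes inconsistent entries visible so they can be added to the mapping deliberately.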