Data should stand the test of time (10 years for NIH) and be machine readable.
● Include a header line (first line or record)
● Label each column with a short but descriptive name
○ Names should be unique
○ Use letters, numbers or underscore _
○ Do not include blank spaces or symbols such as +-&*
● Columns of data should be consistent (use the same naming convention for text data)
● Each line should be complete
● Each column should hold only one kind of data: text (“string data”), integers, or floating-point (real) numbers
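The checks in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a complete validator: the function name `check_table` and the sample data are hypothetical, and treating an empty cell as an "incomplete" line is an assumption.

```python
import csv
import io
import re

# Rule taken from the list above: letters, numbers, or underscore only.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_table(text):
    """Return a list of problems found in CSV text (an empty list means none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty, so there is no header line"]
    header, records = rows[0], rows[1:]
    problems = []

    # Column names must be unique and free of spaces or symbols.
    seen = set()
    for name in header:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid column name: {name!r}")
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)

    # Each line should be complete: as many fields as the header,
    # and (by assumption) no empty cells.
    for line_no, record in enumerate(records, start=2):
        has_blank = any(cell.strip() == "" for cell in record)
        if len(record) != len(header) or has_blank:
            problems.append(f"line {line_no} is incomplete")

    return problems


# Hypothetical sample data for illustration.
good = "site_id,temp_c,species\nA1,12.5,oak\nB2,13.1,elm\n"
bad = "site id,temp_c,temp_c\nA1,12.5\n"
print(check_table(good))
print(check_table(bad))
```

The first call reports no problems; the second flags the space in `site id`, the repeated `temp_c`, and the short second line.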
Organizing Your Files and Directories
Organizing your files and directories will help with searching & finding, sharing, security, clarity and preservation.
● Name folders for major functions and activities
● Structure by date or event (especially subfolders)
● Names should be self-explanatory
● Avoid duplication
● Make it simple and consistent
● Use descriptive names
● Keep names short; use CamelCase
● Try to include the date or time
● Write dates as YYYYMMDD
● Use version numbers
● Don’t use spaces; use - or _ instead
● Don’t change default extensions
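One way to apply these naming rules consistently is to generate file names rather than type them by hand. The sketch below assumes a particular order of parts (description, date, version) and a `_v02` version style; neither is prescribed by the list above, and `build_file_name` is a hypothetical helper.

```python
import re
from datetime import date


def build_file_name(description, day, version, extension):
    """Compose a name such as SoilMoisture_20240315_v02.csv.

    The order of the parts and the "_v02" version style are assumptions
    made for this sketch.
    """
    # CamelCase the description: split on anything that is not a letter or digit.
    words = re.findall(r"[A-Za-z0-9]+", description)
    camel = "".join(word[0].upper() + word[1:] for word in words)
    # YYYYMMDD dates sort chronologically when files are listed by name.
    stamp = day.strftime("%Y%m%d")
    return f"{camel}_{stamp}_v{version:02d}.{extension}"


# Hypothetical example values.
name = build_file_name("soil moisture", date(2024, 3, 15), 2, "csv")
print(name)  # SoilMoisture_20240315_v02.csv
```

The extension is passed through unchanged, in keeping with the advice not to alter default extensions.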
Provide Good Metadata
● Who created the data?
● Who maintains it?
● When were the data collected? When were they published?
● Where were they collected (geographic location)?
● What is the content of the data? The structure?
● Why were the data created?
● How were they produced/analyzed?
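The questions above can be captured in a small metadata record stored next to the data. The following is a sketch only: the field names are hypothetical (one per question), the values are placeholders, and JSON is just one convenient plain-text format.

```python
import json

# Hypothetical field names, one per question in the list above.
REQUIRED_FIELDS = [
    "creator", "maintainer", "collected", "published",
    "location", "content", "purpose", "methods",
]


def missing_fields(metadata):
    """Return the required fields that are absent or left blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder values only; replace them with the real answers.
record = {
    "creator": "EXAMPLE: person or lab that created the data",
    "maintainer": "EXAMPLE: contact for questions",
    "collected": "EXAMPLE: 20240301 to 20240315",
    "published": "",
    "location": "EXAMPLE: site name or coordinates",
    "content": "EXAMPLE: one row per sample",
    "purpose": "EXAMPLE: why the data were created",
    "methods": "EXAMPLE: instruments and analysis steps",
}

print(missing_fields(record))  # ['published']
print(json.dumps(record, indent=2))
```

Running a check like this before sharing a dataset makes unanswered questions visible early.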
Dataset documentation should include:
● Variable names and descriptions
● Explanation of codes and schemas used
● Algorithms used to transform data
● File format and software (including version) used
● Readme file
● Data dictionary
● Structured documentation in XML-based metadata standards such as DDI, FGDC, EML
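A data dictionary can itself be kept as a small machine-readable table. The sketch below assumes a four-column layout (name, type, units, description) and uses hypothetical variables; it also shows how the dictionary might be checked against a data file's header so that no column goes undocumented.

```python
import csv
import io

# Hypothetical variables for illustration; the layout of the dictionary
# (name, type, units, description) is an assumption.
FIELDS = ["name", "type", "units", "description"]
DICTIONARY = [
    {"name": "site_id", "type": "text", "units": "",
     "description": "Sampling site code"},
    {"name": "temp_c", "type": "float", "units": "degrees Celsius",
     "description": "Air temperature"},
]


def undocumented_columns(header, dictionary):
    """Return column names in the data header that the dictionary omits."""
    documented = {entry["name"] for entry in dictionary}
    return [name for name in header if name not in documented]


def dictionary_as_csv(dictionary):
    """Render the data dictionary as CSV text with its own header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


print(undocumented_columns(["site_id", "temp_c", "species"], DICTIONARY))
print(dictionary_as_csv(DICTIONARY))
```

Here the `species` column is reported as undocumented, which signals that the dictionary needs one more entry before the dataset is shared.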