Data organization refers to the methods a researcher uses to store and classify data digitally. Data organization is often dictated by standards, which are issued at local, disciplinary, and international levels. Following standards saves the time otherwise spent devising and documenting organizational choices, and makes data more reusable by other researchers familiar with those standards. Deviations from a standard are acceptable, provided they are well documented.
The Research Commons keeps an up-to-date list of standards resources for relevant disciplines. Official DoD standardization documents can be found in the ASSIST database.
Choose file attributes based on the type of data you are working with and how it will be used. File selection can be broken down into three considerations: file type, software, and file format.
When writing this portion, ask yourself:
What is the best way to capture the data?
Data needs to be in a file type that best suits its purpose. For example, numerical data is better captured in a spreadsheet than in an image file.
Example file types: Algorithms, computer code, databases, digital images or video, spreadsheets, text, etc.
What is the best way to open the data?
After you decide on a file type, your options for software and file formats will narrow. Consider which program is best suited to storing your data. For example, if you have numerical data that you want to keep in a spreadsheet, saving it in .xlsx format for use in Microsoft Excel would make sense.
That said, always consider whether you can save your data in an open format rather than a proprietary one. Open formats tend to have more longevity and community support, making long-term access to your data more likely. For more information on choosing a file format, see this primer on file formats for long-term access from MIT.
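A minimal sketch in Python, assuming your numerical data is already held in memory as rows of values: the standard-library `csv` module writes it to CSV, an open plain-text format that spreadsheet programs, including Microsoft Excel, can open. The function names, column names, and values below are hypothetical illustrations.

```python
import csv


def save_as_csv(path, header, rows):
    """Write a header row plus data rows to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read a CSV file back as a list of rows; every value comes back as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical measurements, used only for illustration.
    header = ["sample_id", "depth_m", "temperature_c"]
    rows = [[1, 10.0, 12.4], [2, 20.0, 11.8]]
    save_as_csv("readings.csv", header, rows)
    print(load_csv("readings.csv"))
```

Note that CSV stores only values: formulas, formatting, and multiple sheets from an .xlsx workbook are not preserved, and reading the file back returns every value as text.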
Version control is the practice of saving a new copy of a file each time it is updated, rather than overwriting the existing file. Version control provides a safety net: it lets you retrace your steps when problems arise, or revert to an older version of the file when necessary.
Version control can be done manually, by saving the file under a new name each time you update it, or automatically with the help of software. Automatic version control leaves less room for human error. NUWC Newport employees can use Microsoft SharePoint for versioning.
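If you version manually, a small script can take over the repetitive part. The sketch below is a hypothetical helper, not a feature of SharePoint or any other tool: before you edit a working file, it copies the file to the next unused `_v001`, `_v002`, and so on, name in the same folder, so earlier versions are never overwritten. The `_vNNN` suffix is an assumed convention.

```python
import re
import shutil
from pathlib import Path


def save_new_version(path):
    """Copy `path` to the next unused versioned name and return the new path.

    For example, results.csv is copied to results_v001.csv, then results_v002.csv,
    and so on. The working file itself is never renamed or overwritten.
    """
    src = Path(path)
    # Matches earlier copies of this file, e.g. results_v007.csv.
    pattern = re.compile(re.escape(src.stem) + r"_v(\d+)" + re.escape(src.suffix))
    used = [
        int(m.group(1))
        for p in src.parent.iterdir()
        if (m := pattern.fullmatch(p.name))
    ]
    next_version = max(used, default=0) + 1
    # Three-digit zero padding keeps versions in order when sorted by name.
    dest = src.with_name(f"{src.stem}_v{next_version:03d}{src.suffix}")
    shutil.copy2(src, dest)  # copy2 also attempts to preserve metadata such as timestamps
    return dest


if __name__ == "__main__":
    working = Path("results.csv")
    working.write_text("sample_id,value\n1,0.5\n")  # hypothetical working file
    print("saved", save_new_version(working))
```

Because each snapshot is a plain copy, this approach still depends on you remembering to run it, which is the human-error risk that automatic tools avoid.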
When writing this portion, ask yourself:
Will I be using version control?
Yes or no.
How will I implement version control?
You can either implement version control manually, where you save a new document each time, or automatically using a program.
One of the simplest steps you can take toward more organized data is choosing a consistent file naming convention. Some standards suggest naming conventions, or you can create your own. Whatever convention you choose, apply it consistently and document it.
Here are some file naming best practices:
Make your file names meaningful by including elements such as the project's name or acronym, location, researcher name, date, type of data, and/or version number
File names should be no more than 25 characters
Do not use spaces; instead, use underscores (_), hyphens (-) or camel case (CamelCase)
Write dates in YYYYMMDD format
Use leading zeroes (001, 002, etc.) when saving files in numerical order
Avoid special characters
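The practices above can be checked with a short helper. This is a minimal sketch under stated assumptions: the name is built from a project acronym, date, data type, and version, joined by underscores; the 25-character limit is assumed to apply to the name without its extension; and "special characters" is taken to mean anything other than letters, digits, underscores, and hyphens. The function name and the `ABC` project acronym are hypothetical.

```python
import re
from datetime import date

# Assumption: the 25-character limit applies to the name before the extension.
MAX_STEM_LENGTH = 25
# Assumption: only letters, digits, underscores, and hyphens are allowed.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_file_name(project: str, data_type: str, when: date,
                    version: int, extension: str) -> str:
    """Return a name such as ABC_20240315_temp_v003.csv, or raise ValueError."""
    stem = "_".join([
        project,
        when.strftime("%Y%m%d"),  # date in YYYYMMDD format
        data_type,
        f"v{version:03d}",        # leading zeroes: v001, v002, ...
    ])
    if not ALLOWED.fullmatch(stem):
        raise ValueError(f"spaces or special characters in {stem!r}")
    if len(stem) > MAX_STEM_LENGTH:
        raise ValueError(f"{stem!r} is {len(stem)} characters; limit is {MAX_STEM_LENGTH}")
    return f"{stem}.{extension}"


if __name__ == "__main__":
    print(build_file_name("ABC", "temp", date(2024, 3, 15), 3, "csv"))
    # prints ABC_20240315_temp_v003.csv
```

Zero-padded numbers and YYYYMMDD dates both sort correctly as plain text, which keeps files in order in a folder listing: `sorted(["v10", "v2", "v1"])` gives `['v1', 'v10', 'v2']`, while `sorted(["v010", "v002", "v001"])` gives `['v001', 'v002', 'v010']`.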