Recommended formats for FAIR data - Research data management @UNIMI

Each file name usually takes the form prefix.suffix. The prefix identifies the file and is the name you choose for your data file (be sure to follow the naming conventions of your research discipline), whilst the suffix indicates the format of the file.
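
As a minimal sketch of this naming scheme, the snippet below splits a file name into its prefix and suffix using Python's standard pathlib module. The function name split_file_name and the example file name are hypothetical and serve only as illustration.

```python
from pathlib import Path


def split_file_name(file_name):
    """Return (prefix, suffix) for a file name of the form 'prefix.suffix'."""
    path = Path(file_name)
    # Path.suffix keeps the leading dot (".csv"), so strip it for readability.
    return path.stem, path.suffix.lstrip(".")


# Hypothetical file name, used only as an example.
prefix, suffix = split_file_name("survey_results_2023.csv")
print(prefix)  # survey_results_2023
print(suffix)  # csv
```

Note that only the last dot counts: for a name such as archive.tar.gz, this sketch reports gz as the suffix and archive.tar as the prefix.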

File formats can be divided into open (non-proprietary) and closed (proprietary) formats. It is important to know whether a data file format is open or proprietary, because this affects the re-use and the interoperability of research data, which are two key FAIR principles:

A proprietary file format is produced for profit by a business, such as Microsoft. This means that products tied to a proprietary format (such as Microsoft Word, a Microsoft product) can only be used after purchasing the associated usage license. To open files saved in proprietary formats, it is necessary to have the proprietary software or a usage license. This limits the interoperability of research data: not all researchers have proprietary software installed, so they might be unable to open research data saved in proprietary formats or to re-use that data in future research.
An open file format is produced with open-source and public-domain software, which anyone can access, download and use. This means that files saved in open formats carry fewer restrictions than files saved in proprietary formats, enhancing the interoperability and the re-use of research data.

To enhance interoperability, you can either work with open formats from the very beginning of your research (the FAIR by design approach), or work with proprietary formats and then convert your files into open formats when you disseminate your research results. Browse the table below to learn more about open and closed formats:

USEFUL LINKS:

Note that tabular files, which are among the most commonly used files for research data in all disciplinary areas, can be tricky to upload to an open data repository. Read more about how to manage tabular files for sharing on Dataverse UNIMI:

As the user uploads data files to a dataset, the Dataverse application tries to process and convert them into an archival format: this process is called 'ingestion'. The goal of the ingest process is to extract the data content from the user’s files and archive it in an application-neutral, easily readable format. The ingestion process may fail for several reasons, especially when it comes to tabular data. If you upload an Excel file, the Dataverse application will automatically try to ingest it and convert it into an open-format tabular file. For this operation to succeed, please check the following points:

If the original tabular file has multiple sheets, only the first sheet of the file will be ingested (i.e. the file will not be available for preview). The other sheets will be available when a user downloads the original file. To have all sheets of a tabular file ingested and searchable at the variable level, upload each sheet as an individual file in your dataset.
You may encounter ingest errors after uploading a tabular file if the file is formatted in a way that can’t be ingested by the Dataverse software. Ingest errors can be caused by a variety of formatting inconsistencies, including: line breaks in a cell, blank cells, single cells that span multiple rows, missing headers. Check here how to automatically remove carriage returns and formatting.
If the file contains tables as well as images, graphs, captions or other elements, only the table will be ingested and visible. Keep your tabular file as simple as possible!
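
Some of the formatting problems listed above can be checked before upload. The following is a minimal sketch, assuming the sheet has already been exported to CSV text; it uses only Python's standard csv module. The function names find_ingest_risks and strip_line_breaks are hypothetical, and the checks merely mirror the list above: they are not the validation that the Dataverse software itself performs. Cells that span multiple rows cannot be detected this way, because that information is lost when a sheet is exported to CSV.

```python
import csv
import io


def find_ingest_risks(csv_text):
    """Return a list of warnings about formatting that may break ingestion."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return ["file is empty"]
    warnings = []
    header = rows[0]
    # A missing header shows up as an empty column name in the first record.
    for col, name in enumerate(header, start=1):
        if not name.strip():
            warnings.append(f"column {col}: missing header")
    # Records are counted from 1, with the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            warnings.append(
                f"record {record_no}: expected {len(header)} cells, found {len(row)}"
            )
        for col, cell in enumerate(row, start=1):
            if "\n" in cell or "\r" in cell:
                warnings.append(f"record {record_no}, column {col}: line break in cell")
            elif not cell.strip():
                warnings.append(f"record {record_no}, column {col}: blank cell")
    return warnings


def strip_line_breaks(csv_text):
    """Return the CSV text with whitespace runs inside cells (including
    line breaks and carriage returns) collapsed into single spaces."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([" ".join(cell.split()) for cell in row])
    return out.getvalue()


# Hypothetical sample: an unnamed third column, a line break and a blank cell.
sample = 'id,name,\n1,"Anna\nRossi",x\n2,,y\n'
for warning in find_ingest_risks(sample):
    print(warning)
print(strip_line_breaks(sample))
```

Blank cells are reported rather than filled in, since the right replacement value (if any) depends on your data.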

More information about tabular data file ingest is available in the Dataverse User Guide, whilst an example of an Excel file that was successfully ingested is available in the Dataverse Sample Data GitHub repository.