Although Open Science and research data management are part of a pan-disciplinary movement, they take different forms depending on the field of study, discipline, academic culture, and national evaluation system. Research data in the social sciences and humanities (SSH) therefore requires management suited to these fields.
Data on people, for example, must be shared, archived and published appropriately in order to avoid legal and ethical problems. Similarly, working with images or historical texts requires skills such as organizing materials consistently, naming sources, and exploring archives. It is crucial to ensure that the data you generate or re-use is consistent with the FAIR principles (standard principles that can be applied to all disciplinary fields) and, where possible, is open data. This page outlines some key points to keep in mind when working with specific kinds of data, along with a list of useful links and resources to guide you.

Working with data from surveys, interviews, audio and video interviews
What type of data is it? This is any data derived from surveys, interviews, video interviews, etc., in which individuals are asked to respond to specific questions. Such data is very widespread in the sociological, political, economic and psychological disciplines (but also in arts, humanities and language studies). Surveys and interviews can serve either exploratory purposes or specific research questions, and the answers are then often processed according to the methodologies of the discipline concerned.
Where can I find survey and interview data? Besides the data you collect yourself, this kind of data can be found in dedicated national or international public databases, which gather survey results according to specific parameters. Examples include the European Social Survey, the Eurobarometer survey, Eurostat and many more.
GDPR aspects & data management best practices: all survey and interview data that is NOT anonymised should be treated as sensitive data. The key steps for managing survey data appropriately are therefore to follow appropriate anonymisation procedures, obtain the ethics committee's authorization, and prepare informed consent forms and submit them to participants before the surveys and interviews take place.
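As a rough illustration of the anonymisation step mentioned above, the following sketch removes direct identifiers from a survey record, replaces the respondent ID with a keyed pseudonym, and coarsens the exact age into a band. The field names (`name`, `email`, `phone`, `respondent_id`, `age`) and the helper functions are hypothetical and must be adapted to your own dataset. Note that this is pseudonymisation rather than full anonymisation: as long as the secret key exists or the remaining fields allow re-identification, the data still counts as personal data under the GDPR.

```python
import hashlib
import hmac

# Hypothetical column names for direct identifiers; adapt to your survey export.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonym(respondent_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a respondent ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, respondent_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers,
    with a pseudonymised ID and a coarsened age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["respondent_id"] = pseudonym(str(record["respondent_id"]), secret_key)
    if "age" in cleaned:
        cleaned["age"] = age_band(int(cleaned["age"]))
    return cleaned
```

Calling `deidentify(record, secret_key)` on each row before sharing leaves the answers intact while removing the most obvious identifiers. The key should be stored separately from the data and never published with it.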
Useful resources:
- EU's Open Data and Accessible Source Materials Guidelines for Humanities and Social Sciences
- Social Sciences and Humanities Open Science Cloud
- Economic surveys
- European Social Survey (ESS)
- Survey Bank
- Eurobarometer
- Eurostat's database
- CLARIN for language data
- CESSDA for survey data
- Guide to Social Science data preparation and archiving
- A guide on how to anonymise video interviews
- Anonymising video-audio data
- Documenting consent
Working with Social Media Data
What type of data is it? Posts published on social media, including comments and media content, are increasingly used by researchers to investigate a wide range of phenomena across the sociological, economic, political and psychological disciplines (but also in arts, humanities and language studies). Typically, analyses are carried out by extracting information on the content, frequency and diffusion of what is published on social media, which today is a crucial part of global communication and of the way networks between people emerge and develop.
Where can I find social media data? Social media data can be found on the platforms people use, which can vary across countries, such as Facebook, TikTok, Instagram, and X, amongst others. It can be extracted through data scraping techniques and the use of APIs.
GDPR aspects & data management best practices: using social media as both a research tool and data source offers valuable opportunities, but also introduces distinct challenges that must be thoughtfully addressed throughout the entire research process. Since social media users are typically human participants, researchers are strongly encouraged to familiarize themselves with the platform they intend to study - understanding not only its policies but also its cultural dynamics. Additionally, social media companies may impose restrictions on data access or seek to influence how their platforms are represented, making it essential to critically assess the reliability and integrity of the data. Researchers should also remain aware that content shared on these platforms is often highly searchable, which can lead to the identification of users and potentially expose them to harm or vulnerability. Furthermore, for this type of research it is important to understand what consent users give the platforms for the re-use of the content they publish, much as consent forms work in other kinds of research. As a rule of thumb, always make sure you understand, first and foremost, what information each platform collects from its users and what rights to disseminate that information it obtains from them.
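One small, partial mitigation for the searchability problem described above is to mask user mentions and links in post text before storing or sharing it. The sketch below is a minimal illustration under that assumption; the function name `scrub_post` and the placeholder tokens are hypothetical. It does not make a post anonymous, since a verbatim quote can still be found through a web search, so paraphrasing or aggregating may also be needed.

```python
import re

# An "@handle" that is not preceded by a word character, so that
# e-mail addresses such as a@b.com are left alone.
MENTION = re.compile(r"(?<!\w)@\w+")
# Any http(s) link up to the next whitespace.
URL = re.compile(r"https?://\S+")


def scrub_post(text: str) -> str:
    """Replace links and @mentions in a post with neutral placeholders."""
    text = URL.sub("[URL]", text)  # links first, since they may contain '@'
    text = MENTION.sub("[USER]", text)
    return text
```

For example, `scrub_post("Thanks @maria_r, see https://example.org/p/1 !")` returns `"Thanks [USER], see [URL] !"`.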
Working with cultural heritage data (historical, artistic, literary, cultural, musical, media and performance, archeological data)
What type of data is it? This kind of data includes artworks, performance recordings, exhibition catalogues, critical reviews, artist interviews, images of artworks, archeological findings, historical publications, images of historical scripts, manuscripts, letters, newspapers, government/political publications, and other kinds of works belonging to literature, music, the figurative arts, architecture, theatre and film.
Where can I find this kind of data? Cultural heritage data can be found in dedicated physical archives, where the original documents and artworks are preserved, or in digital archives, which provide high-resolution images of the original documents and artworks, digitised through scanning and photography and accompanied by appropriate metadata. Archeological, musical, media and performance data can also be found in dedicated databases and can be physically collected by researchers. Importantly, if you find useful cultural heritage data online (whether in digital libraries, in databases or through a general web search), always check the terms of use of the website and of the material made available, as well as whether a licence is present: works that are freely available on the internet are not automatically free to use.
Copyright aspects & data management best practices: the GDPR does not normally apply to this type of data, unless you are collecting data from interviews, performances, audio and video recordings in which human subjects appear: in this case, please refer to the first section of this page. Nonetheless, when working with cultural heritage data it is essential to pay attention to copyright. Importantly, different kinds of ownership, with different associated rights, can apply to this data:
- privately owned - in this case you must always ask the owner what can legally be done with the data and how to manage it, and formalise this by signing an agreement or other related documentation.
- public but bound by the protection of cultural heritage - you need to verify what is legally permitted with the institution that preserves the original work and against the Codice dei beni culturali e del paesaggio (the Italian Code of Cultural Heritage and Landscape).
- public but with confidentiality clauses - confidential and classified data (mainly archival sources) is data which is not yet publicly available for consultation and is closed to the public for a number of years (often 30 years). If during your research you are granted restricted access to classified data, make sure that you do not share or publish classified information and that you have the necessary agreements in place.
- public and in the public domain - this type of data is by definition not subject to copyright protection and can therefore be freely used, modified and shared for study and research purposes (including doctoral theses); in this case, you are allowed to share your own elaboration of the original data on data repositories. If, on the other hand, you intend to share such data for editorial purposes (e.g. in a scientific publication), you need to ask permission and follow the regulations of the institution that preserves the original work (which may sometimes ask for a fee).
- public, but subject to copyright - copyright protects creative works belonging to literature, music, figurative arts, architecture, theatre, cinematography and science, whatever their mode or form of expression (see Legge 22 Aprile 1941, n. 633). Copyright in Italy expires 70 years after the death of the author, or 70 years after first publication for collective works. To avoid infringing copyright, first check whether a licence for use exists and check that the uses you intend to make of the work are consistent with the terms of the licence.
Useful resources:
- Social Sciences and Humanities Open Science Cloud
- ICOM's guide on copyright and open licenses for cultural data
- Linee guida per l’acquisizione, la circolazione e il riuso delle riproduzioni dei beni culturali in ambiente digitale (Guidelines for the acquisition, circulation and reuse of reproductions of cultural property in the digital environment)
- Zotero's library on Data Management Best Practices in the Humanities
- The Heritage Data Reuse Charter
- Gli Open Data per il patrimonio culturale: aspetti teorici ed esperienze in Italia (Open Data for cultural heritage: theoretical aspects and experiences in Italy)
- Reproductions of State Cultural Property: the new Ministerial Decree 108/2024
- A focus on the regulations for Archaeology
- Archaeological data management best practice guidance
- DARIAH, the pan-European Infrastructure for Arts and Humanities
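As a rough aid for applying the 70-year term described in this section, the sketch below estimates the first year in which a work is no longer protected. It assumes, as is the usual convention for copyright terms in the EU, that the term is counted from 1 January of the year following the relevant event (the author's death, or first publication for collective works). The function names are hypothetical, and the calculation ignores special cases such as multiple authors or related rights, so the result is only a starting point for checking with the institution that preserves the work.

```python
from datetime import date
from typing import Optional

TERM_YEARS = 70  # term stated in the text for Italy


def public_domain_year(event_year: int, term_years: int = TERM_YEARS) -> int:
    """First calendar year in which the work is no longer protected.

    Assumption: protection lasts until 31 December of the 70th year
    after the event (author's death or first publication).
    """
    return event_year + term_years + 1


def is_public_domain(event_year: int, on: Optional[date] = None) -> bool:
    """Check whether the estimated term has expired on a given date (default: today)."""
    on = on or date.today()
    return on.year >= public_domain_year(event_year)
```

For an author who died in 1950, `public_domain_year(1950)` gives 2021, so under this assumption the work is protected through 31 December 2020.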
Working with ethnographic, anthropological and geographical data
What type of data is it? It can be field notes, drawings, audio and visual recordings, images of material objects and artifacts, records from participant observation, and many more.
Where can I find this kind of data? It can be found in dedicated physical archives and museums, or in digital archives, which provide high-resolution images of the original materials, digitised through scanning and photography and accompanied by appropriate metadata. The vast majority of this type of data can also be physically collected by researchers. Importantly, if you find useful data online (whether in databases or through a general web search), always check the terms of use of the website and of the material made available, as well as whether a licence is present: works that are freely available on the internet are not automatically free to use.
GDPR aspects & data management best practices: unless it is ecological data, geographical data is usually not considered sensitive data; nevertheless, it can fall under the legislation on cultural heritage, in which case please refer to the section above. Ethnographic and anthropological data, on the other hand, fall under the GDPR, so the key steps for managing this kind of data appropriately are to follow appropriate anonymisation procedures, obtain the ethics committee's authorization, and prepare and submit informed consent forms. Importantly, when managing ethnographic and anthropological data, it is in some cases crucial to comply not only with the FAIR principles (so that data is Findable, Accessible, Interoperable and Reusable) but also with the CARE principles, in order to fully engage with Indigenous Peoples' rights and interests.