R for Data Science starter

R is a free, open-source, multi-platform statistical environment that has become a de facto standard for quantitative analysis in the social sciences and in many other research fields. 

Because every step, from data import to the final plot, is expressed in shareable script files, R supports reproducibility and transparency, in line with the FAIR principles and the Research Data Management policy and practices of the University of Milan. 

More than a programming language, R is a fast-evolving ecosystem whose global community tends to fill methodological gaps quickly. For FAIR data this means: 

  • End-to-end transparency – scripts and notebooks document the entire workflow. 
  • Zero cost, open licences – free to install on any OS; integrates with Git, Docker and Singularity. 
  • Rapid innovation – hundreds of new packages appear every year; e.g. 2024-25 saw dsld (discrimination analysis), keras3 (deep learning), extended survey functions for ultra-complex samples, and causal-inference tools such as causalweight and causalQual. 

How R fits into the research data management cycle

  1. Plan - In the DMP declare open formats (CSV, Parquet) and version-controlled scripts.
  2. Collect - Use httr2 or rvest to fetch data; auto-generate metadata.  
  3. Analyse - Organise code in clear folders (R/ data/ output/); document with Quarto.  
  4. Share - Deposit datasets together with the scripts and notebooks (.R, .Rmd, .qmd) in Data@UNIMI – Dataverse, obtain a DOI, and cite it in the paper. 
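  5. Preserve - Save an renv.lock file so that others can restore the same package environment and rerun the project years later. The lock file records the R version, the repositories in use (for example the CRAN mirror), and for each package its name, exact version, source, and a hash. It does not capture the operating system or system libraries, which is where a container (Docker, Singularity) helps. 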
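
Steps 1, 3 and 5 above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not a prescribed workflow: the project path, the toy data frame and the file name survey_clean.csv are hypothetical, and the renv calls appear only as comments because they assume the renv package is installed.

```r
# Hypothetical project root; a temporary directory keeps the sketch self-contained.
project <- file.path(tempdir(), "my-project")

# Step 3 (Analyse): create the R/, data/ and output/ folders named above.
folders <- file.path(project, c("R", "data", "output"))
for (f in folders) dir.create(f, recursive = TRUE, showWarnings = FALSE)

# Step 1 (Plan): store data in an open format. CSV needs only base R.
toy <- data.frame(id = 1:3, score = c(4.2, 3.9, 5.0))  # made-up example data
csv_path <- file.path(project, "data", "survey_clean.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Reading the file back checks that the round trip preserves the values.
reloaded <- read.csv(csv_path)

# Step 5 (Preserve): with renv installed, run these from the project root.
# renv::init()      # set up a project-local library and an renv.lock file
# renv::snapshot()  # record the package versions currently in use
# renv::restore()   # later, reinstall the versions recorded in renv.lock
```

Writing Parquet instead of CSV would need an additional package, so the sketch stays with base R.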

The advantages of R

  • Free and open source: the code is public and anyone can improve it, so there are no licence fees and no vendor lock-in.
  • Script-based: every operation lives in .R or notebook files, so readers can inspect and replicate each step.
  • Wide community: thousands of packages, forums and conferences, so someone has likely solved a problem similar to yours.
  • Integration with RDM: R works well with Git, Dataverse, Docker and Quarto, which makes FAIR compliance more straightforward.

Glossary

Key packages: what they do in simple terms

Resources for getting started (basic to advanced level)

First steps 

  • Install R from https://cran.r-project.org, then install RStudio.  
  • Open RStudio and run the following in the console: install.packages(c("tidyverse", "survey", "sf")) 
  • You’re ready to replicate the first examples and embark on your FAIR, reproducible-analysis journey. Happy coding! 
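
As a possible variation on the install step, the sketch below first checks which of the three packages are already present, so that nothing is downloaded unnecessarily. The helper name missing_packages is hypothetical, and the actual install call is left commented out because it needs an internet connection.

```r
# Packages named in the step above.
pkgs <- c("tidyverse", "survey", "sf")

# Hypothetical helper: return the entries of `wanted` that are not installed
# in any library visible to this R session.
missing_packages <- function(wanted) {
  setdiff(wanted, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install; needs internet access
} else {
  message("All three packages are already installed.")
}
```

On Linux, building sf from source typically also requires the GDAL, GEOS and PROJ system libraries, so that package may need extra setup outside R.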

Use case walk-throughs