Using appropriate relative file paths is essential for creating portable and reproducible projects. Portable projects are self-contained, meaning all code, data and other files needed by your project are contained within it.
Relative paths play a central role in portability, allowing you to specify file locations relative to a known reference point (e.g., the project root). Compared to system-specific absolute paths, relative file paths work on any system, allowing others to reproduce your work without modifying file paths.
This vignette demonstrates how relative paths from different project roots can be created using the fromhere package, with discussion on how each path type affects the portability of your project.
How you organise your project files and specify paths to them changes the portability of your project. Less portable file paths require more specific file structures. For example, in order of increasing portability:
Absolute paths require the exact same file structure from the file system root (including the username John)
e.g. "/Users/John/Documents/my-awesome-project/data/survey_cleaned_2024.csv"
Home directory paths start from the user’s home directory (usernames can differ, but location cannot)
e.g. "~/Documents/my-awesome-project/data/survey_cleaned_2024.csv"
Project paths start from the project directory (the project location can differ, but project organisation cannot)
e.g. from_rproj("data/survey_cleaned_2024.csv")
Execution paths start from the current working directory (execution folder)
e.g. from_wd("resources/logo.png") within "docs/analysis.qmd"
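The working directory can differ depending on context. In general, it is the project folder when you open an RStudio Project, and the document’s folder when rendering R Markdown/Quarto documents. It can also be changed with setwd() (but please don’t do this!)
The files (and paths) used within each part of your project affect its portability.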
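To see where the home directory and working directory starting points resolve on your own machine, a quick check with base R can help. This is only an illustrative sketch: it uses base R functions rather than fromhere, and the file names are the examples from this vignette, so the files need not exist.

```r
# Where do "~" and the working directory point on this machine?
home <- path.expand("~")  # the user's home directory
wd <- getwd()             # the current working directory

# A home directory path: the same code gives a different result per user
home_path <- path.expand(
  "~/Documents/my-awesome-project/data/survey_cleaned_2024.csv"
)

# An execution path: resolved against wherever R is currently running
wd_path <- file.path(wd, "resources", "logo.png")

print(home_path)
print(wd_path)
```

Running this from the console and again inside a rendered document shows how the second path changes with context while the first depends only on the user.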
Consider the R project "my-awesome-project", which has been saved to the Documents folder on John’s computer ("/Users/John/Documents/"). It has the following contents:
my-awesome-project/
├── data-raw/                     # Raw / unprocessed data files
├── data/                         # Clean / processed data files
│   ├── survey_cleaned_2024.csv
│   └── population_summary.csv
├── R/                            # R scripts
├── docs/                         # Documents
│   ├── analysis.qmd
│   ├── memo.qmd
│   └── resources/                # Document resources
│       ├── logo.png
│       └── signature.png
├── outputs/                      # Results, figures, tables, and other outputs
│   ├── figures/                  # Graphs and charts
│   └── tables/                   # Data tables and results
│       └── regression_summary.csv
├── README.md                     # Project description and instructions
└── my-awesome-project.Rproj      # R Project file
Consider the two documents in this project, "docs/analysis.qmd" and "docs/memo.qmd":
"docs/analysis.qmd" is a complete analysis using from_rproj() to access data, code, and outputs from the project. When sharing this document with others, you will need to share the entire project for the document to be reproducible.
"docs/memo.qmd" is a brief message using only the logo and signature resources with from_wd(). To reproduce this document, the entire project is not needed since everything it needs is contained within the document folder ("docs").
The memo.qmd document is more portable than analysis.qmd since it can be moved independently of the broader project. It is good practice to organise your project such that individual components are portable. In this case, the logo and signature are kept within the docs folder because they are only needed for documents. While less portable than the memo, the analysis is still reasonably portable since everything is contained in the project folder (rather than in files scattered around the computer).
By adopting relative paths and understanding project portability, you can create projects that are robust, shareable, and adaptable to new environments. The fromhere package simplifies this process, allowing you to declare paths relative to various project roots, enhancing the portability of your work.
An absolute path includes the full location of the files on the computer, so reading the survey data would be:
# Path hardcoded to John's specific system
read.csv("/Users/John/Documents/my-awesome-project/data/survey_cleaned_2024.csv")
This is NOT portable. Imagine this project was shared with Jane, who saved it in the Downloads folder of her Windows computer. In this case, the file path starting with "/Users/John/Documents" doesn’t exist on Jane’s computer.
Instead, the suitable path for Jane is:
# Path hardcoded to Jane's specific system
# (backslashes must be doubled inside R strings)
read.csv("C:\\Users\\Jane\\Downloads\\my-awesome-project\\data\\survey_cleaned_2024.csv")
As a result, the code cannot be reproduced without modification. Jane will have to update all the paths for them to work on her computer.
A relative path specifies the location of files from a starting folder. In an R session, relative paths start from the current working directory. When you open an RStudio Project, the working directory is automatically set to your project folder.
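The code for reading the survey data with a relative path is:
# Path relative to the working directory (the project folder)
read.csv("data/survey_cleaned_2024.csv")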
Notice that the relative path doesn’t include any details about the location of the project on the computer. This allows the project folder to be moved around (on the same computer or shared with others) without needing to change the file paths.
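The effect of moving a project can be checked with a small experiment. The sketch below builds a throwaway copy of the project layout in a temporary folder, reads the data with a relative path, then copies the project elsewhere and reads it again with identical code. The survey values and the read_survey_from() helper are made up for illustration, and setwd() is used only to simulate opening the project in RStudio.

```r
# Create a throwaway project in a temporary folder (a stand-in for
# "my-awesome-project"; the survey values are invented for this example)
project <- file.path(tempdir(), "my-awesome-project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(id = 1:3, score = c(4, 5, 3)),
  file.path(project, "data", "survey_cleaned_2024.csv"),
  row.names = FALSE
)

# Hypothetical helper: read the survey as if `dir` were the open project.
# The working directory is restored when the function exits.
read_survey_from <- function(dir) {
  old_wd <- setwd(dir)
  on.exit(setwd(old_wd))
  read.csv("data/survey_cleaned_2024.csv")
}

survey_original <- read_survey_from(project)

# "Move" the project to a different folder and read it with the same code
moved_parent <- file.path(tempdir(), "Downloads")
dir.create(moved_parent, showWarnings = FALSE)
file.copy(project, moved_parent, recursive = TRUE)
survey_moved <- read_survey_from(file.path(moved_parent, "my-awesome-project"))

# Compare the two reads
identical(survey_original, survey_moved)
```

The path passed to read.csv() never changes; only the folder treated as the project does.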
The fromhere package allows you to specify the starting point of relative paths explicitly. This can be useful since the paths will work even if the working directory in R changes. For example, the working directory is changed to the document’s folder ("docs") when rendering R Markdown/Quarto documents. Using fromhere::from_rproj() will explicitly specify paths starting from the project folder even in R Markdown/Quarto documents:
# Path relative to the project folder
library(fromhere)
read.csv(from_rproj("data/survey_cleaned_2024.csv"))
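To build intuition for what "starting from the project folder" means, here is a minimal base R sketch that walks up from a starting folder until it finds one containing an .Rproj file. This vignette does not describe how from_rproj() locates the project, so find_rproj_root() is a hypothetical helper written for illustration, not fromhere's implementation.

```r
# Hypothetical helper (not part of fromhere): walk up from `start` until a
# folder containing an .Rproj file is found.
find_rproj_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # Reached the file system root without finding a project file
      stop("No .Rproj file found at or above ", start)
    }
    dir <- parent
  }
}

# Throwaway project with a "docs" folder, mimicking the example layout
project <- file.path(tempdir(), "root-demo")
dir.create(file.path(project, "docs"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "root-demo.Rproj"))

# Starting from "docs" (where a document would render) still finds the
# project folder, so paths built from it do not depend on the working directory
root <- find_rproj_root(file.path(project, "docs"))
file.path(root, "data", "survey_cleaned_2024.csv")
```

Because the search starts from a folder inside the project and looks upward, the result is the same whether the code runs from the project folder or from "docs".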
. and ..
File paths can also include . and .., which refer to the current directory (.) and the parent directory (..).
This allows you to navigate up folders. For example, from_rproj("..") will give you the folder above the project ("/Users/John/Documents" for John, and "C:\Users\Jane\Downloads" for Jane).
Using from_rproj("..") results in a less portable project, since your work now depends on the file structure above your project folder.
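How . and .. resolve can be seen with base R's normalizePath(), which converts a path into its full canonical form. This is a small sketch using a throwaway folder layout (the "dots-demo" parent folder is made up for the example), and it does not use fromhere.

```r
# Throwaway project nested inside a parent folder
project <- file.path(tempdir(), "dots-demo", "my-awesome-project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)

# "." stays in the same folder
same <- normalizePath(file.path(project, "."), winslash = "/")

# ".." moves up one folder, here to the folder containing the project
above <- normalizePath(file.path(project, ".."), winslash = "/")

# ".." can also appear mid-path: into "data", then back up to the project
round_trip <- normalizePath(file.path(project, "data", ".."), winslash = "/")

# Name of the folder above the project
basename(above)
```

Anything reached through `above` lives outside the project, which is why such paths break when only the project folder is shared.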