File naming convention
How to save your files so you can find them when you need them
Start with the Why #
FUSI #
For me, a good file name must be:
- Found: the name makes it easy to find the file.
- Unique: the file itself is as unique as its name, not only within its folder but across the entire project it belongs to.
- Semantic: the name makes it easy to know what the file contains, even if you don’t know the convention.
- Interoperable: the name works across operating systems.
As well, it would be nice if it could:
- order naturally in the file system display
- not be too long
- not be too hard to maintain
- follow a convention that is accessible for newcomers
Basename #
[DATE] [GROUP] [TAG] [VERSION] [MARK] [EXTENSION]
Date span #
[VALUE DATE]--[PERIOD END DATE]
All file names start with an ISO date (YYYY-MM-DD).
The date should be what we call the value date, meaning the date of the
underlying significant event the file is related to (the file system should be
responsible for annotating metadata related to creation and modification of the files).
The interpretation of this really differs depending on the context.
Let’s say you have a banking statement for all the transactions of the month of
June 2022 (from the first to the last day of June).
Then what we call the value date is 2022-06-30.
This holds even if the document itself was generated on the 5th of July.
Because the document covers a period rather than a date, we can give it a span:
2022-06-30--2022-06-01 which truncates to 2022-06.
If it’s a statement of the bank fees for the last 3 months (produced ex post),
then it would be 2022-06-30--2022-04-01 which truncates to 2022-06-30--04-01.
But this feature is only useful in some contexts and might lead to confusion.
So in this case, I would just use 2022-06-30. And if the statements recur
every 3 months, maybe add t1 as a tag. See below.
As well, if it were a loan amortization schedule for a newly subscribed mortgage
of 10 years with a monthly amortization starting next month (file is produced ex ante),
it could be 2022-06-30--2032-07-30.
But that is probably overwhelming, and it is more convenient to use
2022-06-30 with tags and versioning instead of a period.
Truncation #
YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DD--DD
YYYY-MM-DD--MM-DD
YYYY-MM-DD--YYYY-MM-DD
- 2022 => 2022-01-01--2022-12-31
- 2022-01 => 2022-01-01--2022-01-31
- 2022-01-01 => 2022-01-01
- 2022-01-01--2022-01-01 => 2022-01-01
- 2022-01-01--2022-12-31 => 2022
- 2022-01-01--2022-01-31 => 2022-01
- 2022-01-01--2022-01-15 => 2022-01-01--15
- 2022-01-01--2022-02-15 => 2022-01-01--02-15
- 2022-01-01--2022-02-28 => 2022-01-01--02-28
calendr::date_span("2022-01-01--03")
Registered S3 method overwritten by 'calendr':
method from
c.Date base
[1] "2022-01-01--03"
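As a rough illustration of the rules above, here is a minimal base R sketch that expands a truncated span back into two full ISO dates. The helper name `expand_span` is hypothetical and is not part of calendr; it assumes a well-formed input and returns the two dates in the order they appear in the span.

```r
# Hypothetical helper (not part of calendr): expand a truncated span
# into two full ISO dates, in the order they appear in the span.
expand_span <- function(span) {
  parts <- strsplit(span, "--", fixed = TRUE)[[1]]
  first <- parts[1]
  if (grepl("^[0-9]{4}$", first)) {
    # YYYY: the whole calendar year
    return(c(paste0(first, "-01-01"), paste0(first, "-12-31")))
  }
  if (grepl("^[0-9]{4}-[0-9]{2}$", first)) {
    # YYYY-MM: the whole month; the last day is the day before
    # the first day of the following month
    start <- as.Date(paste0(first, "-01"))
    next_month <- seq(start, by = "month", length.out = 2)[2]
    return(c(format(start), format(next_month - 1)))
  }
  if (length(parts) == 1) {
    # A single full date stands for a one-day span
    return(c(first, first))
  }
  # The second part borrows the leading characters it lacks from the first
  second <- parts[2]
  n_borrowed <- nchar(first) - nchar(second)
  c(first, paste0(substr(first, 1, n_borrowed), second))
}

expand_span("2022-01-01--02-15")
```

Because the second part only ever drops leading components, borrowing the missing prefix from the first date is enough to cover the DD, MM-DD and full YYYY-MM-DD forms.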
Group #
upper case
Tag #
lower case
Names are useless; use tags instead. They will help you grasp the meaning and find the file more easily.
And please don’t break FUSI by using your OS tags instead.
Extension #
lower case (.pdf, .md etc.)
Version #
SemVer is the only formal versioning convention I am aware of that works for files.
A semantic version number is made of:
- MAJOR for backwards-incompatible (breaking) changes to the public API
- MINOR for backwards-compatible changes to the public API (new features)
- PATCH for backwards-compatible bug fixes
- Pre-release identifier (optional)
- Build identifier (optional)
The semver R package provides tools for parsing, rendering and operating on semantic version strings.
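For plain MAJOR.MINOR.PATCH strings, base R's `numeric_version()` is enough to compare and sort versions without extra packages. It does not understand pre-release or build identifiers, so this is only a sketch for the simple case, and the version strings below are made up for illustration.

```r
# Made-up version strings, for illustration only
versions <- c("1.9.3", "1.10.0", "2.0.0", "1.2.10")
parsed <- numeric_version(versions)

# Components are compared as integers, so 1.10.0 is newer than 1.9.3
# (a plain string comparison would get this wrong)
newer <- parsed[2] > parsed[1]

# Sort from oldest to newest and pick the latest
sorted <- as.character(sort(parsed))
latest <- as.character(max(parsed))
```

This is why a version written as three numeric fields sorts correctly only when parsed, not when compared as text.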
Marks #
- tmp as an indicator that the file is temporary
- old as an indicator that the file is obsolete (deprecated, expired, superseded content)
- bad as an indicator that there is a problem with the file (corruption)
- bak as an indicator that the file is an archive (backup) copy
Function #
If you can’t explain it to a computer, you don’t understand it yourself.1
#' Build the date part of a file name
#'
#' @param dates [character] or [Date] [vector]
#'
#' @return [character]
#' @export
file_date <- function(dates = calendr::today()) {
  dates <- calendr::as_date(dates)
  dates <- dates[order(dates)]
  start <- dates[1]
  end <- dates[length(dates)]
  if (start != end) {
    # Default: full span, with the value (end) date first
    date <- sprintf("%s--%s", end, start)
    if (calendr::year(start) == calendr::year(end)) {
      # Same year: drop the year from the second date
      date <- sprintf("%s--%s-%s", end, format(start, "%m"), format(start, "%d"))
      if (identical(start, calendr::year_first(start)) &&
        identical(end, calendr::year_last(end))) {
        # Full calendar year
        date <- sprintf("%s", calendr::year(start))
      } else if (calendr::month(end) == calendr::month(start)) {
        # Same month: keep only the day of the second date
        date <- sprintf("%s--%s", end, format(start, "%d"))
        if (identical(start, calendr::month_first(start)) &&
          identical(end, calendr::month_last(end))) {
          # Full calendar month
          date <- sprintf("%s-%s", calendr::year(start), format(start, "%m"))
        }
      }
    }
  } else {
    # A single date (or identical dates): no span needed
    date <- as.character(start)
  }
  date
}

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-01")),
  "2022-05-01"
)

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-05")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date(c("2022-05-05", "2022-05-01")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date("2022-05-05"),
  "2022-05-05"
)

testthat::expect_identical(
  file_date(c("2022-04-01", "2022-04-30")),
  "2022-04"
)

testthat::expect_identical(
  file_date(c("2022-01-01", "2022-12-31")),
  "2022"
)
make_file_name <- function(ext, tag, date = calendr::today(), category = NULL) {
  # Tags: lower case, joined with hyphens, followed by the extension
  name <- paste(tag, collapse = "-")
  name <- sprintf("%s.%s", tolower(name), tolower(ext))
  if (!is.null(category)) {
    # Group: upper case, placed before the tags
    category <- paste(toupper(category), collapse = "-")
    name <- sprintf("%s_%s", category, name)
  }
  name <- sprintf("%s_%s", file_date(date), name)
  # Transliterate accented characters to plain ASCII
  name <- stringi::stri_trans_general(name, "Latin-ASCII")
  # POSIX only guarantees 14 characters, but 255 is the usual limit on modern systems
  max_length <- getOption("archive.file.max.length", default = 255)
  if (nchar(name) > max_length) {
    cli::cli_warn("That name is more than {max_length} characters, this might prevent interoperability.")
  }
  allowed_chars <- strsplit("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._-", "")[[1]]
  chars <- unique(unlist(strsplit(name, "")))
  if (!all(chars %in% allowed_chars)) {
    cli::cli_abort("Invalid characters remain after sanitization: {.val {chars[!chars %in% allowed_chars]}}")
  }
  name
}
Real world example #
Let’s say, on 26th of April 2022, I am employed as a plumber by the
company ACME and receive my salary along with the pay slip for the month.
I am going to scan this slip as a pdf file and store it in my archives.
The date #
The value date is not today but the end of the month, so 2022-04-30. The salary covers a period ranging from 2022-04-01 to the value date.
dates <- c("2022-04-30", "2022-04-01")
file_date(dates)
[1] "2022-04"
The rest #
make_file_name(
  ext = "pdf",
  tag = c("salary", "plumber"),
  date = dates,
  category = c("Acme")
)
[1] "2022-04_ACME_salary-plumber.pdf"
[1] "2022-04_ACME_salary-plumber.pdf"
[2] "2022-04-30_ACME_salary-plumber.pdf"
[3] "2022-04-30--02_ACME_salary-plumber.pdf"
[4] "2022-04-30--03-15_ACME_salary-plumber.pdf"

The four names above correspond, in order, to:
- Started working on 2022-04-01 and invoiced on 2022-04-30
- Invoiced on 2022-04-30
- Started working on 2022-04-02 and invoiced on 2022-04-30
- Started working on 2022-03-15 and invoiced on 2022-04-30
Reading #
read_file_name <- function(file) {
  file <- basename(file)
  sapply(file, function(x) {
    # (parsing of each name is not implemented yet)
  }, USE.NAMES = FALSE)
}
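The reading function is left unimplemented above. As a minimal sketch, assuming names follow the DATE_GROUP_tags.ext layout produced by make_file_name (with the group being optional and no version or mark present), the parts could be split in base R as follows. The name `parse_file_name` and the layout of the returned list are assumptions made for illustration.

```r
# Hypothetical sketch: split a name built as DATE_GROUP_tag-tag.ext
# (or DATE_tag-tag.ext when there is no group) into its parts.
parse_file_name <- function(file) {
  file <- basename(file)
  ext <- tools::file_ext(file)
  stem <- tools::file_path_sans_ext(file)
  # Underscores separate the fields; hyphens separate items inside a field
  fields <- strsplit(stem, "_", fixed = TRUE)[[1]]
  has_group <- length(fields) == 3
  list(
    date = fields[1],
    group = if (has_group) strsplit(fields[2], "-", fixed = TRUE)[[1]] else character(0),
    tag = strsplit(fields[length(fields)], "-", fixed = TRUE)[[1]],
    ext = ext
  )
}

parts <- parse_file_name("archives/2022-04_ACME_salary-plumber.pdf")
```

The date field is returned as written in the name, so a truncated span such as 2022-04 still has to be expanded separately if full dates are needed.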
1. Free interpretation of Albert Einstein’s “If you can’t explain it simply, you don’t understand it well enough”.