
File naming convention

How to save your files so you can find them when you need them

Start with the Why

FUSI

For me, a good file name must be:

It would also be nice if it could:

Basename

[DATE] [GROUP] [TAG] [VERSION] [MARK] [EXTENSION]

Date span

[VALUE DATE]--[OTHER END OF THE PERIOD]

All file names start with an ISO 8601 date (YYYY-MM-DD).
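Starting every name with an ISO date means that a plain alphabetical sort is also a chronological sort. A quick base R check; the file names below are made up for illustration, and `method = "radix"` is used so the comparison happens in the C locale whatever the system settings:

```r
# Hypothetical file names, only for illustration
iso <- c("2022-06_BANK_statement.pdf",
         "2021-12_BANK_statement.pdf",
         "2022-01_BANK_statement.pdf")
dmy <- c("30-06-2022_BANK_statement.pdf",
         "31-12-2021_BANK_statement.pdf",
         "31-01-2022_BANK_statement.pdf")

# ISO dates: alphabetical order is chronological order
iso_sorted <- sort(iso, method = "radix")
iso_sorted  # 2021-12, 2022-01, 2022-06

# Day-first dates: sorted by day, so June 2022 comes before December 2021
dmy_sorted <- sort(dmy, method = "radix")
dmy_sorted  # 30-06-2022, 31-01-2022, 31-12-2021
```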

The date should be what we call the value date, meaning the date of the significant event the file relates to. Creation and modification dates are metadata that the file system, not the file name, is responsible for recording.
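A small base R sketch of that split, using a throwaway file in `tempdir()` whose name is made up: the value date is read from the name, while the modification time comes from the file system.

```r
# Throwaway file whose (hypothetical) name carries the value date
path <- file.path(tempdir(), "2022-06_BANK_statement.pdf")
invisible(file.create(path))

# The value date lives in the name: everything before the first "_"
value_date <- sub("_.*$", "", basename(path))

# The modification time lives in the file system metadata
modified <- file.info(path)$mtime

value_date  # "2022-06"
modified    # the moment the file was created, not June 2022
```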

How this is interpreted depends heavily on context. Let’s say you have a bank statement covering all the transactions of June 2022 (from the first to the last day of the month). The value date is then 2022-06-30, even if the document itself was generated on the 5th of July.

Because the document covers a period rather than a single date, we can give it a span: 2022-06-30--2022-06-01, which truncates to 2022-06.
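Computing the two ends of such a month is easy in base R: take the first of the month, step one month forward with `seq()`, and subtract a day. A sketch; `month_bounds()` is a hypothetical helper, not part of any package mentioned in this post:

```r
# Hypothetical helper: first and last day of the month containing `date`
month_bounds <- function(date) {
  first <- as.Date(format(as.Date(date), "%Y-%m-01"))
  # The first day of the next month, minus one day, is the last day of this one
  next_first <- seq(first, by = "month", length.out = 2)[2]
  list(first = first, last = next_first - 1)
}

b <- month_bounds("2022-06-05")
# Value date first, then the other end of the period
span <- sprintf("%s--%s", format(b$last), format(b$first))
span  # "2022-06-30--2022-06-01"
```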

If it’s a statement of the bank fees for the last 3 months (produced ex post), the span would be 2022-06-30--2022-04-01, which truncates to 2022-06-30--04-01. But this feature is only useful in some contexts and might lead to confusion, so in this case I would just use 2022-06-30. And if the statements recur every 3 months, maybe add t1 as a tag (see Tag below).

Likewise, for the amortization schedule of a newly subscribed 10-year mortgage with monthly payments starting next month (the file is produced ex ante), the span could be 2022-06-30--2032-07-30. In practice, though, that is probably overkill, and it is more convenient to use 2022-06-30 with tags and versioning instead of a period.

Truncation

YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DD--DD
YYYY-MM-DD--MM-DD
YYYY-MM-DD--YYYY-MM-DD
calendr::date_span("2022-01-01--03")
[1] "2022-01-01--03"
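Going the other way, a truncated span can be expanded back into full dates by borrowing the missing year and month from the first date. A base R sketch covering the six forms listed above; `expand_span()` is hypothetical (it is not `calendr::date_span()`), and it returns the earliest and latest dates whatever order they are written in:

```r
# Hypothetical helper: expand a (possibly truncated) span into full dates
expand_span <- function(span) {
  parts <- strsplit(span, "--", fixed = TRUE)[[1]]
  first <- parts[1]
  if (length(parts) == 2) {
    # Borrow the missing leading components (year, month) from the first date
    second <- paste0(substr(first, 1, 10 - nchar(parts[2])), parts[2])
    dates <- as.Date(c(first, second))
  } else if (nchar(first) == 4) {      # YYYY: the whole year
    dates <- as.Date(paste0(first, c("-01-01", "-12-31")))
  } else if (nchar(first) == 7) {      # YYYY-MM: the whole month
    start <- as.Date(paste0(first, "-01"))
    dates <- c(start, seq(start, by = "month", length.out = 2)[2] - 1)
  } else {                             # YYYY-MM-DD: a single day
    dates <- as.Date(c(first, first))
  }
  range(dates)                         # earliest, latest
}

expand_span("2022-06")            # "2022-06-01" "2022-06-30"
expand_span("2022-06-30--04-01")  # "2022-04-01" "2022-06-30"
```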

Group

Groups are written in upper case (e.g. ACME).

Tag

Tags are written in lower case and separated by hyphens (e.g. salary-plumber).

Names are useless; use tags instead. They help you grasp what a file is about and find it more easily.

And please don’t break FUSI by relying on your operating system’s tags instead.
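Because tags are plain lower-case words in the name, a simple string match is enough to find files by tag, with no OS-specific metadata involved. A base R sketch; the file names and the `has_tag()` helper are hypothetical and assume the DATE_GROUP_tag-tag.ext layout used in the examples of this post:

```r
# Hypothetical archive listing, only for illustration
files <- c(
  "2022-04_ACME_salary-plumber.pdf",
  "2022-05_ACME_salary-plumber.pdf",
  "2022-06_BANK_statement.pdf",
  "2022-06-30_BANK_fees-t1.pdf"
)

# Hypothetical helper: TRUE for each file whose tags include `tag`
has_tag <- function(files, tag) {
  base <- tools::file_path_sans_ext(basename(files))
  # Tags are the last "_"-separated field, themselves separated by "-"
  tags <- sub("^.*_", "", base)
  vapply(strsplit(tags, "-", fixed = TRUE), function(t) tag %in% t, logical(1))
}

salary_files <- files[has_tag(files, "salary")]
salary_files  # the two ACME salary slips
```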

Extension

Extensions are written in lower case (.pdf, .md, etc.).

Version

SemVer is the only formal versioning convention I am aware of that works for files.

A semantic version number has the form MAJOR.MINOR.PATCH.

The semver R package provides tools for parsing, rendering and operating on semantic version strings.
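Even without extra packages, base R’s `numeric_version()` compares dotted version numbers component by component, which a plain string comparison does not. The `bump_version()` helper below is a hypothetical sketch that assumes plain MAJOR.MINOR.PATCH strings without pre-release or build labels; how the version is written into the file name is not covered here.

```r
# As strings, "1.10.0" sorts before "1.2.10"...
"1.10.0" < "1.2.10"                                    # TRUE
# ...but numeric_version() compares each component as a number
numeric_version("1.10.0") > numeric_version("1.2.10")  # TRUE

# Hypothetical helper: increment one component and reset the lower ones
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  x <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  x <- switch(part,
    major = c(x[1] + 1L, 0L, 0L),
    minor = c(x[1], x[2] + 1L, 0L),
    patch = c(x[1], x[2], x[3] + 1L)
  )
  paste(x, collapse = ".")
}

bump_version("1.2.3")           # "1.2.4"
bump_version("1.2.3", "minor")  # "1.3.0"
```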

Marks

Function

If you can’t explain it to a computer, you don’t understand it yourself.[1]

#' Build the date part of a file name
#'
#' @param dates [character] or [Date] [vector]
#'
#' @return [character] the (possibly truncated) date or date span
#' @export
file_date <- function(dates = calendr::today()) {
  dates <- calendr::as_date(dates)
  start <- min(dates)
  end <- max(dates)
  if (start != end) {
    # Full span: latest date first, then earliest
    date <- sprintf("%s--%s", format(end), format(start))
    if (calendr::year(start) == calendr::year(end)) {
      # Same year: drop the year from the second date
      date <- sprintf("%s--%s", format(end), format(start, "%m-%d"))
      if (identical(start, calendr::year_first(start)) &&
            identical(end, calendr::year_last(end))) {
        # Whole calendar year
        date <- format(start, "%Y")
      } else if (calendr::month(end) == calendr::month(start)) {
        # Same month: keep only the day of the second date
        date <- sprintf("%s--%s", format(end), format(start, "%d"))
        if (identical(start, calendr::month_first(start)) &&
              identical(end, calendr::month_last(end))) {
          # Whole calendar month
          date <- format(start, "%Y-%m")
        }
      }
    }
  } else {
    date <- format(start)
  }
  date
}

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-01")),
  "2022-05-01"
)

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-05")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date(c("2022-05-05", "2022-05-01")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date("2022-05-05"),
  "2022-05-05"
)

testthat::expect_identical(
  file_date(c("2022-04-01", "2022-04-30")),
  "2022-04"
)

testthat::expect_identical(
  file_date(c("2022-01-01", "2022-12-31")),
  "2022"
)
make_file_name <- function(ext, tag, date = calendr::today(), category = NULL) {
  # Tags: lower case, joined with hyphens
  name <- paste(tag, collapse = "-")
  name <- sprintf("%s.%s", tolower(name), tolower(ext))
  # Optional group: upper case, joined with hyphens
  if (!is.null(category)) {
    category <- paste(toupper(category), collapse = "-")
    name <- sprintf("%s_%s", category, name)
  }
  name <- sprintf("%s_%s", file_date(date), name)
  # Transliterate accented characters to plain ASCII (e.g. "é" becomes "e")
  name <- stringi::stri_trans_general(name, "Latin-ASCII")
  # POSIX only guarantees 14 characters (_POSIX_NAME_MAX), but 255 is the rule on modern systems
  max_length <- getOption("archive.file.max.length", default = 255)
  if (nchar(name) > max_length) {
    cli::cli_warn("That name is more than {max_length} characters, this might prevent interoperability.")
  }
  allowed_chars <- strsplit("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._-", "")[[1]]
  chars <- unique(unlist(strsplit(name, "")))
  invalid <- setdiff(chars, allowed_chars)
  if (length(invalid) > 0) {
    cli::cli_abort("Invalid characters {.val {invalid}} survived sanitisation.")
  }
  name
}

Real-world example

Let’s say that on 26 April 2022 I receive my salary from ACME, the company that employs me as a plumber, along with the pay slip for the month. I am going to scan the slip to a PDF file and store it in my archives.

The date

The value date is not today but the end of the month: 2022-04-30. The salary covers the period from 2022-04-01 to the value date.

dates <- c("2022-04-30", "2022-04-01")
file_date(dates)
[1] "2022-04"

The rest

make_file_name(
  ext = "pdf",
  tag = c("salary", "plumber"),
  date = dates,
  category = c("Acme")
)
[1] "2022-04_ACME_salary-plumber.pdf"

Depending on the period the pay slip covers, the same file would be named:

  1. 2022-04_ACME_salary-plumber.pdf: started working on 2022-04-01 and invoiced on 2022-04-30
  2. 2022-04-30_ACME_salary-plumber.pdf: invoiced on 2022-04-30
  3. 2022-04-30--02_ACME_salary-plumber.pdf: started working on 2022-04-02 and invoiced on 2022-04-30
  4. 2022-04-30--03-15_ACME_salary-plumber.pdf: started working on 2022-03-15 and invoiced on 2022-04-30

Reading

read_file_name <- function(file) {
  file <- basename(file)
  sapply(file, function(x) {
    # TODO: split x back into its components
  }, USE.NAMES = FALSE)
}
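Here is one possible way to split a name back into its parts in base R. `parse_file_name()` is a hypothetical sketch, not the author’s implementation; it assumes names laid out like the output of `make_file_name()` (DATE_GROUP_tag-tag.ext, with an optional group) and ignores the VERSION and MARK parts of the basename, since their separators are not specified above.

```r
# Hypothetical parser for names shaped like DATE_GROUP_tag-tag.ext (GROUP optional)
parse_file_name <- function(file) {
  file <- basename(file)
  fields <- strsplit(tools::file_path_sans_ext(file), "_", fixed = TRUE)
  data.frame(
    date = vapply(fields, function(f) f[1], character(1)),
    # A group is only present when there are three "_"-separated fields
    group = vapply(fields, function(f) if (length(f) == 3) f[2] else NA_character_, character(1)),
    tags = vapply(fields, function(f) f[length(f)], character(1)),
    extension = tools::file_ext(file)
  )
}

p <- parse_file_name(c(
  "archive/2022-04_ACME_salary-plumber.pdf",
  "2022-06-30--04-01_bank-fees.pdf"
))
p
```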

[1] Free interpretation of Albert Einstein’s “If you can’t explain it simply, you don’t understand it well enough”.

Metadata
Publication: May 10, 2022
Last edit: May 30, 2022
License: Creative Commons Attribution (some rights reserved)