File naming convention

How to save your files so you can find them when you need them

Start with the Why #

FUSI #

For me, a good file name must be:

Found: the name makes it easy to find the file.
Unique: the name identifies exactly one file, not only within its folder but across the entire project it belongs to.
Semantic: the name makes it easy to know what the file contains, even if you don’t know the convention.
Interoperable: the name works across operating systems.

It would also be nice if it could:

order naturally in the file system display
not be too long
not be too hard to maintain
follow a convention that is accessible for newcomers

Basename #

[DATE] [GROUP] [TAG] [VERSION] [MARK] [EXTENSION]

Date span #

[VALUE DATE]--[PERIOD END DATE]

All file names start with an ISO 8601 date (YYYY-MM-DD).

The date should be what we call the value date, meaning the date of the underlying significant event the file relates to. Creation and modification dates are metadata that the file system should be responsible for recording.

The interpretation of this really differs depending on the context. Let’s say you have a bank statement for all the transactions of June 2022 (from the first to the last day of June). Then what we call the value date is 2022-06-30, even if the document itself was generated on the 5th of July.

Because the document covers a period rather than a date, we can give it a span: 2022-06-30--2022-06-01 which truncates to 2022-06.

If it’s a statement of the bank fees for the last 3 months (produced ex post), then it would be 2022-06-30--2022-04-01, which truncates to 2022-06-30--04-01. But this feature is only useful in some contexts and might lead to confusion, so in this case I would just use 2022-06-30. And if the statements recur every 3 months, maybe add t1 as a tag (see the Tag section below).

Likewise, for the amortization schedule of a newly subscribed 10-year mortgage with monthly payments starting next month (the file is produced ex ante), it could be 2022-06-30--2032-07-30. But that is probably overwhelming, and it is more convenient to use 2022-06-30 with tags and versioning instead of a period.

Truncation #

YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DD--DD
YYYY-MM-DD--MM-DD
YYYY-MM-DD--YYYY-MM-DD

2022 => 2022-01-01--2022-12-31
2022-01 => 2022-01-01--2022-01-31
2022-01-01 => 2022-01-01
2022-01-01--2022-01-01 => 2022-01-01
2022-01-01--2022-12-31 => 2022
2022-01-01--2022-01-31 => 2022-01
2022-01-01--2022-01-15 => 2022-01-01--15
2022-01-01--2022-02-15 => 2022-01-01--02-15
2022-01-01--2022-02-28 => 2022-01-01--02-28

calendr::date_span("2022-01-01--03")

Registered S3 method overwritten by 'calendr':
  method from
  c.Date base

[1] "2022-01-01--03"
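
To read a truncated span back, the rules above can be reversed. Here is a minimal base R sketch of that idea. It does not use calendr, `expand_date_span` is a name I made up for illustration, and it assumes the input is well formed. It returns the two full dates in the order they appear in the name, so it takes no stance on which bound comes first.

```r
# Expand a truncated date span into its two full ISO dates.
# Hypothetical helper (not part of calendr), base R only, no input validation.
expand_date_span <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  parts <- strsplit(x, "--", fixed = TRUE)[[1]]
  first <- parts[1]

  # Last day of the month containing `date`: the first of the month plus
  # 31 days always lands in the next month, then step back one day from
  # the first day of that next month.
  month_last <- function(date) {
    next_month <- as.Date(format(date, "%Y-%m-01")) + 31
    as.Date(format(next_month, "%Y-%m-01")) - 1
  }

  if (length(parts) == 1) {
    if (grepl("^[0-9]{4}$", first)) {
      # A bare year covers the whole year.
      return(c(paste0(first, "-01-01"), paste0(first, "-12-31")))
    }
    if (grepl("^[0-9]{4}-[0-9]{2}$", first)) {
      # A bare month covers the whole month.
      start <- as.Date(paste0(first, "-01"))
      return(c(format(start), format(month_last(start))))
    }
    # A single full date is a span of one day.
    return(c(first, first))
  }

  # The second date borrows its missing leading components from the first:
  # "15" lacks year and month, "02-15" lacks only the year.
  second <- parts[2]
  prefix_length <- nchar(first) - nchar(second)
  c(first, paste0(substr(first, 1, prefix_length), second))
}

expand_date_span("2022-02")
expand_date_span("2022-06-30--04-01")
```

The first call gives "2022-02-01" and "2022-02-28", the second gives "2022-06-30" and "2022-04-01".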

Group #

upper case

Tag #

lower case

Names are useless; use tags instead. They will help you catch the meaning and find the file more easily.

And please don’t break FUSI by using your OS’s tags instead.

Extension #

lower case (.pdf, .md etc.)

Version #

SemVer is the only formal versioning convention I am aware of that works for files.

A semantic version number is made of:

MAJOR for backwards incompatible (breaking) changes to the public API
MINOR for backwards compatible changes to the public API (new features)
PATCH for backwards compatible bug fixes
Pre-release identifier (optional)
Build identifier (optional)

The semver R package provides tools for parsing, rendering and operating on semantic version strings.
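
If you would rather not add a dependency, the components listed above can be pulled apart with a regular expression. This is only a rough base R sketch: `parse_semver` is a hypothetical helper, not the semver package API, and the pattern is looser than the full SemVer grammar (it accepts leading zeros, for instance).

```r
# Split a semantic version string into its components (base R only).
# `parse_semver` is a hypothetical helper, not the semver package API.
parse_semver <- function(x) {
  pattern <- paste0(
    "^([0-9]+)\\.([0-9]+)\\.([0-9]+)",  # MAJOR.MINOR.PATCH
    "(-([0-9A-Za-z.-]+))?",             # optional pre-release identifier
    "(\\+([0-9A-Za-z.-]+))?$"           # optional build identifier
  )
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) {
    stop("Not a semantic version: ", x)
  }
  # Optional parts that are absent are reported as NA.
  optional <- function(value) if (nzchar(value)) value else NA_character_
  list(
    major = as.integer(m[2]),
    minor = as.integer(m[3]),
    patch = as.integer(m[4]),
    prerelease = optional(m[6]),
    build = optional(m[8])
  )
}

v <- parse_semver("1.4.2-alpha.1+001")
v$minor
v$prerelease
```

For plain MAJOR.MINOR.PATCH strings, base R's `numeric_version()` already compares versions correctly, so `numeric_version("1.10.0") > numeric_version("1.9.0")` is `TRUE` where a text sort would get it wrong.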

Marks #

tmp as an indicator that the file is temporary
old as an indicator that the file is obsolete (deprecated, expired, superseded content)
bad as an indicator that there is a problem with the file (corruption)
bak as an indicator that the file is a backup (archive) copy

Function #

If you can’t explain it to a computer, you don’t understand it yourself.¹

#' Format a date or a date span for use in a file name
#'
#' @param dates [character] or [Date] [vector]
#'
#' @return [character]
#' @export
file_date <- function(dates = calendr::today()) {
  dates <- calendr::as_date(dates)
  # Sort once so the first and last elements are the bounds of the period.
  dates <- dates[order(dates)]
  start <- dates[1]
  end <- dates[length(dates)]
  if (!is.na(end) && start != end) {
    # Default: full span, with the value date (the end) first.
    date <- sprintf("%s--%s", end, start)
    if (calendr::year(start) == calendr::year(end)) {
      # Same year: drop the year from the second date.
      date <- sprintf("%s--%s-%s", end, format(start, "%m"), format(start, "%d"))
      if (identical(start, calendr::year_first(start)) &&
            identical(end, calendr::year_last(end))) {
        # Whole year: keep the year only.
        date <- sprintf("%s", calendr::year(start))
      } else if (calendr::month(end) == calendr::month(start)) {
        # Same month: drop the month from the second date as well.
        date <- sprintf("%s--%s", end, format(start, "%d"))
        if (identical(start, calendr::month_first(start)) &&
              identical(end, calendr::month_last(end))) {
          # Whole month: keep the year and month only.
          date <- sprintf("%s-%s", calendr::year(start), format(start, "%m"))
        }
      }
    }
  } else {
    # A single date, or identical bounds.
    date <- as.character(start)
  }
  date
}

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-01")),
  "2022-05-01"
)

testthat::expect_identical(
  file_date(c("2022-05-01", "2022-05-05")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date(c("2022-05-05", "2022-05-01")),
  "2022-05-05--01"
)

testthat::expect_identical(
  file_date("2022-05-05"),
  "2022-05-05"
)

testthat::expect_identical(
  file_date(c("2022-04-01", "2022-04-30")),
  "2022-04"
)

testthat::expect_identical(
  file_date(c("2022-01-01", "2022-12-31")),
  "2022"
)

make_file_name <- function(ext, tag, date = calendr::today(), category = NULL) {
  # Tags are joined with "-" and lower-cased, like the extension.
  name <- paste(tag, collapse = "-")
  name <- sprintf("%s.%s", tolower(name), tolower(ext))
  if (!is.null(category)) {
    # The group (category) is upper-cased and placed before the tags.
    category <- paste(toupper(category), collapse = "-")
    name <- sprintf("%s_%s", category, name)
  }
  name <- sprintf("%s_%s", file_date(date), name)
  # Transliterate accented characters to their ASCII equivalents.
  name <- stringi::stri_trans_general(name, "Latin-ASCII")
  # POSIX only guarantees 14 characters, but 255 is the limit on most modern file systems
  max_length <- getOption("archive.file.max.length", default = 255)
  if (nchar(name) > max_length) {
    cli::cli_warn("That name is more than {max_length} characters, this might prevent interoperability.")
  }
  # Only the POSIX portable file name characters are allowed.
  allowed_chars <- strsplit("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._-", "")[[1]]
  chars <- unique(unlist(strsplit(name, "")))
  if (!all(chars %in% allowed_chars)) {
    cli::cli_abort("Invalid characters after sanitization: {.val {chars[!chars %in% allowed_chars]}}")
  }
  name
}

Real world example #

Let’s say that, on the 26th of April 2022, I am employed as a plumber by the company ACME and receive my salary along with the pay slip for the month. I am going to scan this slip as a pdf file and store it in my archives.

The date #

The value date is not today but the end of the month, so: 2022-04-30. The salary covers the period from 2022-04-01 to the value date.

dates <- c("2022-04-30", "2022-04-01")
file_date(dates)

[1] "2022-04"

The rest #

make_file_name(
  ext = "pdf",
  tag = c("salary", "plumber"),
  date = dates,
  category = c("Acme")
)

[1] "2022-04_ACME_salary-plumber.pdf"

Depending on the period covered, the same slip could be named:

[1] "2022-04_ACME_salary-plumber.pdf"
[2] "2022-04-30_ACME_salary-plumber.pdf"
[3] "2022-04-30--02_ACME_salary-plumber.pdf"
[4] "2022-04-30--03-15_ACME_salary-plumber.pdf"

In the same order, these names mean:

Started working on 2022-04-01 and invoiced on 2022-04-30
Invoiced on 2022-04-30
Started working on 2022-04-02 and invoiced on 2022-04-30
Started working on 2022-03-15 and invoiced on 2022-04-30

Reading #

read_file_name <- function(file) {
  file <- basename(file)
  sapply(file, function(x) {
    # parsing of a single file name is not implemented yet
  }, USE.NAMES = FALSE)
}
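
Until the reader is filled in, here is one way the parsing could go. It is a sketch under assumptions: it handles a single name laid out as `DATE_CATEGORY_tag-tag.ext` (what `make_file_name()` produces), treats the category as optional, and ignores versions and marks. `parse_file_name` is a hypothetical name and relies on base R only.

```r
# Split a file name built as DATE_CATEGORY_tag-tag.ext into its parts.
# Hypothetical helper, base R only; versions and marks are not handled.
parse_file_name <- function(file) {
  file <- basename(file)
  ext <- tools::file_ext(file)
  stem <- tools::file_path_sans_ext(file)
  # "_" separates the fields, "-" separates values inside a field.
  fields <- strsplit(stem, "_", fixed = TRUE)[[1]]
  split_values <- function(field) strsplit(field, "-", fixed = TRUE)[[1]]
  # With a category there are three fields, without it only two.
  has_category <- length(fields) == 3
  list(
    date = fields[1],
    category = if (has_category) split_values(fields[2]) else character(0),
    tag = split_values(fields[length(fields)]),
    ext = ext
  )
}

parsed <- parse_file_name("archives/2022-04_ACME_salary-plumber.pdf")
parsed$date
parsed$tag
```

Here `parsed$date` is "2022-04" and `parsed$tag` holds "salary" and "plumber". The date field is kept as written, truncated or not, so that expanding it stays a separate step.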

¹ Free interpretation of Albert Einstein’s “If you can’t explain it simply, you don’t understand it well enough”.

Metadata

Publication	May 10, 2022
Last edit	May 30, 2022
License	- Some rights reserved