
Introducing calendr R Package

I made a lightweight R Package to ease the pain of working with Dates

1. Start with the Why #

I started calendr while working on a project that required working extensively with French holidays.

While Python has python-holidays, there isn’t quite a similar project in R.

Furthermore, even though there is an impressive number of R packages providing utilities to work with Date and Datetime objects, I found that most often they don't really fit my needs, because their interface or architecture is too complex (so they cannot be extended easily).

That is usually because they try to tame date and time representations altogether, which requires handling time zones, DST, leap seconds, etc. However, in my case, as is often the case, I only needed dates.

So the goal of calendr is to provide utilities to manipulate pure Date objects.

2. Features #

3. On Dates #

Dates are a purely conventional representation of time. So far, we've implemented two of the most standard conventions.

Gregorian calendar #

The Gregorian standard (the most common, in the western world at least) was introduced in 1582 by Pope Gregory XIII with a design based on the Christian celebration of Easter.

It defines a 400-year cycle (146097 days) starting on Year 1. Each year is divided into 12 months of a defined length ranging from 28 to 31 days, starting on January 1st and ending on December 31st.

A common Gregorian year has 365 days (52 weeks of 7 days + 1 day). To compensate for the fact that a period of 365 days is shorter than a tropical year by almost 6 hours (leading to lagging seasons), 97 years in the cycle have an extra day (February 29th), resulting in years of 366 days (52 weeks + 2 days). They are called leap years and occur every 4 years. Since this slightly overcompensates, one leap year is skipped every 100 years, unless the year is divisible by 400.
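The leap-year rule and the cycle figures quoted above can be checked with a few lines of base R. This is only a sketch; `is_leap_year()` is a hypothetical helper written for this post, not a calendr function.

```r
# Gregorian leap-year rule: every 4th year is leap, except century years
# that are not divisible by 400.
# NOTE: is_leap_year() is a hypothetical helper, not part of calendr.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

cycle  <- 1:400                          # one full 400-year cycle
n_leap <- sum(is_leap_year(cycle))       # 97 leap years
n_days <- 365 * length(cycle) + n_leap   # 146097 days
n_days %% 7                              # 0: the cycle is a whole number of weeks
```

Because the cycle contains a whole number of weeks, weekdays fall on the same dates every 400 years.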

ISO Standard #

Because people localized the Gregorian calendar while adopting it, the same expression can nowadays have different meanings depending on the context. For instance, 01/05/2021 is the 1st of May for some and the 5th of January for others. This is why ISO, a private organization based in Switzerland, worked on a new uniform standard, which was introduced in 1988 as ISO 8601.

It is also based on a 400-year cycle, but one that starts on Year 0 (the same as 1 BCE¹), and it uses weeks instead of months. An ISO year starts on the Monday of the week that contains the year's first Thursday (the midpoint of the week). The difference between the start of the ISO year and the start of the standard year ranges from -3 to 3 days. An ISO year has either 364 or 371 days (the latter being an ISO leap year, which occurs every 5, 6 or 7 years) and 52 weeks (53 for ISO leap years). For a given 400-year cycle, 71 years are leap.
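These ISO figures can be reproduced in base R. The sketch below relies on the fact that January 4th always belongs to ISO week 1 (it is the week containing the first Thursday); `iso_year_start()` and `iso_year_length()` are hypothetical helpers, not calendr functions.

```r
# Monday of the week that contains January 4th, i.e. the first day of the ISO year.
# NOTE: both helpers are hypothetical illustrations, not part of calendr.
iso_year_start <- function(year) {
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # POSIXlt's wday is 0 for Sunday; convert it to "days elapsed since Monday"
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6L) %% 7L
  jan4 - days_since_monday
}

# Number of days in an ISO year: 364 (52 weeks) or 371 (53 weeks)
iso_year_length <- function(year) {
  as.integer(iso_year_start(year + 1L) - iso_year_start(year))
}

years   <- 2001:2400                      # one full 400-year cycle
offsets <- as.integer(iso_year_start(years) - as.Date(sprintf("%04d-01-01", years)))
range(offsets)                            # -3  3
sum(iso_year_length(years) == 371)        # 71 ISO leap years
```

For example, `iso_year_start(2020)` is 2019-12-30, three days before the Gregorian year begins.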

4. Dates in R #

Base R comes with Date objects, which are whole numbers of days since the Unix Epoch² (1970-01-01), with negative values for earlier dates. The numbers are wrapped in an S3 class named Date, which provides them with superpowers (methods for formatting, arithmetic, comparison, etc.).

date <- as.Date("2022-05-02")
date
unclass(date)
[1] "2022-05-02"
[1] 19114

So 2022-05-02 is represented as 19114 days.

epoch <- as.Date("1970-01-01")
unclass(epoch)
unclass(epoch) - unclass(date)
[1] 0
[1] -19114

It matches the number of days since 1970-01-01 (which is represented as 0).

Time spans #

epoch - date
Time difference of -19114 days

Because we used an arithmetic method on Date objects, we get the result as a difftime, which is another S3 class: a number carrying a units attribute that controls its representation (one of secs, mins, hours, days, weeks). For datetimes, R automatically picks the largest suitable unit; subtracting two Date objects gives a result in days.
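A short base R sketch makes the role of the units attribute more concrete: changing the units changes how the span is represented, not the span itself. The variable name `span` is only illustrative.

```r
# Subtracting two Date objects yields a difftime expressed in days
span <- as.Date("2022-05-02") - as.Date("1970-01-01")
class(span)                         # "difftime"
units(span)                         # "days"

# The same span read in another unit, without modifying the object
as.numeric(span, units = "weeks")   # 19114 / 7

# Changing the units attribute rescales the stored number
units(span) <- "hours"
as.numeric(span)                    # 458736, i.e. 19114 * 24
```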

Under the hood, the Date objects have been converted to POSIXct, which is yet another class: it represents a datetime as a number of seconds since the epoch. A Date is taken at midnight in a given time zone (defaulting to UTC).

posix_ct <- as.POSIXct(date)
unclass(posix_ct)
[1] 1651449600
sec_in_day <- 86400
unclass(posix_ct) / sec_in_day
[1] 19114

When working with dates this is not a problem, but when you need to represent time precisely, you should know that difftime only handles fixed-length units (a day is always 86400 seconds, a week always 7 days), so it is aware of neither DST nor leap years.

Along with POSIXct there is POSIXlt, which is closer to a human-readable form: it is a list whose components are integer vectors, except sec and zone.

posix_lt <- as.POSIXlt(date)
unlist(unclass(posix_lt))
  sec   min  hour  mday   mon  year  wday  yday isdst 
    0     0     0     2     4   122     1   121     0 

A virtual class “POSIXt” exists from which both of the classes inherit: it is used to allow operations such as subtraction to mix the two classes.

Then comes lubridate, the tidyverse package dedicated to datetime manipulation. It provides 3 alternative representations of time spans: durations, periods and intervals.

Time series #

Calendars #

5. Date formats #

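| code    | format       | convention       | abbr  | pad   | locale | range            |
|---------|--------------|------------------|-------|-------|--------|------------------|
| %C      | Century      | Gregorian        | FALSE | TRUE  | FALSE  | 00-99            |
| %x      | Date         | Gregorian        | FALSE | FALSE | TRUE   | NA               |
| %F      | Date         | ISO 8601         | FALSE | FALSE | FALSE  | NA               |
| %D      | Date         | US / C99         | FALSE | FALSE | FALSE  | NA               |
| %d      | Day of Month | Gregorian        | FALSE | TRUE  | FALSE  | 01-31            |
| %e, %-d | Day of Month | Gregorian        | FALSE | FALSE | FALSE  | 1-31             |
| %u      | Day of Week  | From Monday (UK) | FALSE | FALSE | FALSE  | 1-7              |
| %w      | Day of Week  | From Sunday (US) | FALSE | FALSE | FALSE  | 0-6              |
| %a      | Day of Week  | Name             | TRUE  | FALSE | TRUE   | sun.-sat.        |
| %A      | Day of Week  | Name             | FALSE | FALSE | TRUE   | sunday-saturday  |
| %j      | Day of Year  | Gregorian        | FALSE | TRUE  | FALSE  | 001-366          |
| %-j     | Day of Year  | Gregorian        | FALSE | FALSE | FALSE  | 1-366            |
| %-m     | Month        | Gregorian        | FALSE | FALSE | FALSE  | 1-12             |
| %m      | Month        | Gregorian        | FALSE | TRUE  | FALSE  | 01-12            |
| %b, %h  | Month        | Name             | TRUE  | FALSE | TRUE   | jan.-dec.        |
| %B      | Month        | Name             | FALSE | FALSE | TRUE   | january-december |
| %W      | Week         | From Monday (UK) | FALSE | TRUE  | FALSE  | 00-53            |
| %U      | Week         | From Sunday (US) | FALSE | TRUE  | FALSE  | NA               |
| %V      | Week         | ISO 8601         | FALSE | TRUE  | FALSE  | 01-53            |
| %Y      | Year         | Gregorian        | FALSE | FALSE | FALSE  | NA               |
| %-y     | Year         | Gregorian        | TRUE  | FALSE | FALSE  | 0-99             |
| %y      | Year         | Gregorian        | TRUE  | TRUE  | FALSE  | 00-99            |
| %G      | Year         | ISO 8601         | FALSE | FALSE | FALSE  | NA               |
| %-g     | Year         | ISO 8601         | TRUE  | FALSE | FALSE  | 0-99             |
| %g      | Year         | ISO 8601         | TRUE  | TRUE  | FALSE  | 00-99            |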

Which we can summarize a bit:

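| full (abbr)  | ISO 8601 | Gregorian US | Gregorian UK | locale  |
|--------------|----------|--------------|--------------|---------|
| Date         | %F       | %D           | %D           | %x      |
| Year         | %G (%g)  | %Y (%y)      | %Y (%y)      | NA      |
| Month        | NA       | %m           | %m           | %B (%b) |
| Week         | %V       | %U           | %W           | NA      |
| Day of Year  | NA       | %j           | %j           | NA      |
| Day of Week  | NA       | %w           | %u           | %A (%a) |
| Day of Month | NA       | %d           | %d           | NA      |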

Only the date-related formats (not the datetime ones) are described here. Find more formats with ?strptime.
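As a quick illustration of the difference between the Gregorian and ISO 8601 codes, here is a base R sketch using `format()` on two dates. It sticks to locale-independent codes; the `-` flag (as in %-d) and the name codes depend on the platform and locale, so they are left out.

```r
d <- as.Date(c("2022-05-02", "2021-01-01"))

format(d, "%F")          # "2022-05-02" "2021-01-01"  (ISO 8601 date)
format(d, "%Y-%j")       # "2022-122"   "2021-001"    (Gregorian year + day of year)
format(d, "%G-W%V-%u")   # "2022-W18-1" "2020-W53-5"  (ISO year, week, weekday)
```

Note how 2021-01-01 belongs to ISO year 2020: it falls in week 53 of a 371-day ISO year, which is why %G and %Y must not be mixed.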

6. Representations #

In calendr, the only way to input

These are called the lossless date formats. Internally, these function calls all result in an object of class c("calDate", "Date") with the same value and a varying format attribute.

Lossy formats (output only) #

         day         week    fortnight      bimonth        month     bimester 
       %d/%a         %7.d        %15.d         %.5m        %m/%b         %2.m 
   trimester quadrimester     semester         year      biennum 
        %3.m         %4.m         %6.m           %y         %2.y 
   year isoweek     day 
    "y"     "w"     "d" 

format.Date <- function(date, format, locale) {
  # set the locale
  # handle padding
}

as_date <- function(date, format, locale) {}

Padding is removed with -.

If the lossless argument is set to TRUE, we abort if year, month and day cannot be deduced from the requested output.

Accessors #

By default, accessors always return an integer. When the user asks for names, they are returned as an ordered factor.
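To make this contract concrete, here is a minimal base R sketch of such an accessor. The `names` argument and the function body are assumptions made for illustration; they are not calendr's actual signature or implementation, and the labels are hard-coded in English to stay locale-independent.

```r
# Hypothetical accessor: integer by default, ordered factor when names are requested.
day_of_week <- function(date, names = FALSE) {
  # ISO numbering: 1 = Monday ... 7 = Sunday (POSIXlt's wday is 0 for Sunday)
  idx <- (as.POSIXlt(date)$wday + 6L) %% 7L + 1L
  if (!names) {
    return(idx)
  }
  labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday")
  factor(labels[idx], levels = labels, ordered = TRUE)
}

d <- as.Date(c("2022-05-02", "2022-05-08"))
day_of_week(d)                 # 1 7
day_of_week(d, names = TRUE)   # Monday < Sunday, as an ordered factor
```

Returning an ordered factor keeps names sortable and comparable in calendar order rather than alphabetical order.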

.count_dates(date, what = c("days", "fortnights", "weeks", "isoweeks", "months", "year"), within = c("week", "year"…))

When dealing with weeks, the default is the ISO convention, whereas when dealing with months, the default is the Gregorian convention (iso = FALSE).

shortcuts #

For any x_of_y function (like day_of_week), there is a series of functions:

For month-isoweek https://stackoverflow.com/questions/46319137/aggregate-iso-weeks-into-months-with-a-dataset-containing-just-iso-weeks

It might seem that fortnight and bimonth are the same. Think again: a fortnight is exactly 14 or 15 days (set by the user) counted from January 1st (or from the start of another fiscal year), while a bimonth is half of a month (the remainder goes to the first or the last half, depending on the user's choice), so it lasts between 14 and 16 days depending on the month but is always aligned on the 1st of the month.
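The distinction is easier to see in code. The following base R sketch uses two hypothetical helpers (names, arguments and defaults are assumptions, not calendr's API): fortnights are fixed-length blocks counted from January 1st, while bimonths split each month after a given day, here the 15th, so the remainder goes to the second half.

```r
# Hypothetical helpers, for illustration only (not part of calendr).

# Fixed-length blocks of `length` days counted from January 1st
fortnight_of_year <- function(date, length = 14L) {
  as.POSIXlt(date)$yday %/% length + 1L   # yday is 0-based
}

# Two halves per month; days after `split` fall in the second half
bimonth_of_year <- function(date, split = 15L) {
  lt <- as.POSIXlt(date)
  2L * lt$mon + (lt$mday > split) + 1L    # mon is 0-based
}

d <- as.Date(c("2022-01-01", "2022-03-15", "2022-12-31"))
fortnight_of_year(d)   # 1  6 27
bimonth_of_year(d)     # 1  5 24
```

With 14-day fortnights, a common year ends with a 27th block of a single day, whereas there are always exactly 24 bimonths.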

*within* #

days_within_month(date)

targets_within_group(date, group_bundle, year_bundle)

*_first and *_last #

ISO #

ISO year doesn’t necessarily start on January 1st.

Use iso = TRUE to change this behavior.

Fiscal #

By default, the first day of the year is January 1st, but this is not always the case. Most schools use September 1st, and your company might use any date that best fits its interests.

So year_start lets you change this.
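Here is a rough base R sketch of what a configurable year start implies. The `fiscal_year()` helper and the "MM-DD" form of its `year_start` argument are assumptions made for illustration (only the name year_start comes from the package notes), and the fiscal year is labeled by the calendar year in which it begins, which is just one possible convention.

```r
# Hypothetical helper: year a date belongs to, given a custom year start ("MM-DD").
fiscal_year <- function(date, year_start = "01-01") {
  year  <- as.integer(format(date, "%Y"))
  # Start of the fiscal year within the same calendar year as `date`
  start <- as.Date(paste0(year, "-", year_start))
  # Dates before that start still belong to the previous fiscal year
  ifelse(date < start, year - 1L, year)
}

d <- as.Date(c("2022-05-02", "2022-09-01"))
fiscal_year(d)                         # 2022 2022
fiscal_year(d, year_start = "09-01")   # 2021 2022
```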

7. Date spans #

%--% infix to create a time span

2021-04-01 %in% (2021-01-01 %--% 05-02)
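The idea of a span infix can be approximated in base R by materializing every day of the span, so that %in% works as usual. This is only a stand-in, spelled here with two ASCII hyphens: the definition below is an assumption, not calendr's implementation, and it does not support the shortened end date ("05-02") used above. Note that lubridate exports an operator with the same name, so this sketch would mask it.

```r
# Hypothetical span operator: a daily sequence of Dates from `from` to `to`, inclusive.
`%--%` <- function(from, to) {
  seq(as.Date(from), as.Date(to), by = "day")
}

span <- "2021-01-01" %--% "2021-05-02"
length(span)                       # 122 days
as.Date("2021-04-01") %in% span    # TRUE
as.Date("2021-06-01") %in% span    # FALSE
```

The parentheses in the original expression matter: user-defined infix operators share the same precedence as %in% and are evaluated left to right.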

8. Bizdays #

Holidays #

Weekends #

9. SimpleR Time Series #


  1. BCE, which stands for Before Common Era, has the exact same meaning as BC, which stands for Before Christ.

  2. In chronology and periodization, an epoch or reference epoch is an instant in time chosen as the origin of a particular calendar era. The "epoch" serves as a reference point from which time is measured. The Unix Epoch is set arbitrarily on "1970-01-01" at midnight UTC, and systems based on UNIX count time in seconds from this base (POSIX standard).

Metadata
Publication: May 2, 2022
Last edit: May 30, 2022
License: b4D8 © 2022 - All rights reserved