Introducing calendr R Package
I made a lightweight R Package to ease the pain of working with Dates
1. Start with the Why #
I started calendr while working on a project that required working extensively
with French holidays.
While Python has python-holidays, there isn’t quite a similar project in R.
Furthermore, even though there is an impressive number
of R packages providing utilities to work with Date and Datetime objects, I found
that most of them don’t really fit my needs, because they are too complex in their
interface or architecture (so they cannot be extended easily).
That is usually because they try to tame date and time representations
together, which requires handling time zones, DST,
leap seconds, etc. However, in my case, as is often true, I only needed dates.
So the goal of calendr is to provide utilities to manipulate pure Date objects.
2. Features #
- Calendaring API
- French holidays calendar
- Date formatters
- Localized labels
3. On Dates #
Dates are a purely conventional representation of time. So far, we’ve implemented two of the most standard conventions.
Gregorian calendar #
The Gregorian standard (the most common, in the western world at least) was introduced in 1582 by Pope Gregory XIII, with a design based on the Christian celebration of Easter.
It defines a 400-year cycle (146097 days) starting on Year 1. Each year is divided into 12 months of a defined length ranging from 28 to 31 days, starting on January 1st and ending on December 31st.
A common Gregorian year has 365 days (52 weeks of 7 days + 1 day). To compensate for the fact that a period of 365 days is shorter than a tropical year by almost 6 hours (leading to lagging seasons), 97 years in the cycle have an extra day (February 29th), resulting in years of 366 days (52 weeks + 2 days). They are called leap years and occur every 4 years. This still slightly overcompensates, so every 100 years one leap year is skipped, unless the year is divisible by 400.
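The leap-year rule and the cycle length can be checked with a few lines of base R. This is only a minimal sketch; `is_leap_year` is a helper name chosen for the illustration and is not part of calendr.

```r
# Gregorian leap-year rule: every 4 years, except century years
# that are not divisible by 400
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

cycle <- 1:400                        # one full 400-year cycle

sum(is_leap_year(cycle))              # 97 leap years
sum(365 + is_leap_year(cycle))        # 146097 days
sum(365 + is_leap_year(cycle)) %% 7   # 0: the cycle is a whole number of weeks
```

Because the cycle holds a whole number of weeks, weekdays fall on the same dates again every 400 years.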
ISO Standard #
Because people localized the Gregorian calendar while adopting it, the same expression can nowadays have different meanings depending on the context. For instance, 01/05/2021 is the 1st of May for some and the 5th of January for others. This is why the ISO, a non-governmental organization based in Switzerland, worked on a new uniform standard, which was introduced in 1988 as ISO 8601.
It is also based on a 400-year cycle, but one that starts on Year 0 (same as 1 BCE[^1]) and uses weeks instead of months. An ISO year starts on the Monday of the week that contains the year’s first Thursday (the midpoint of the week). The difference between the start of the ISO year and the start of the standard year ranges from -3 to 3 days. An ISO year has either 364 or 371 days (the latter being an ISO leap year, which occurs every 5, 6 or 7 years) and 52 weeks (53 for ISO leap years). In a given 400-year cycle, 71 years are leap.
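The rule for the start of the ISO year can be sketched in base R. It relies on the fact that January 4th always belongs to ISO week 1, so the ISO year starts on the Monday of that week. `iso_year_start` is a hypothetical helper written for this illustration, not a calendr function.

```r
# First day (Monday) of the ISO year: the Monday of the week containing January 4th
iso_year_start <- function(year) {
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(year)))
  wday <- as.POSIXlt(jan4)$wday       # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  jan4 - (wday + 6L) %% 7L            # step back to the Monday of that week
}

years <- 2000:2399                    # any 400 consecutive years
n_days <- as.integer(iso_year_start(years + 1L) - iso_year_start(years))
offset <- as.integer(iso_year_start(years) -
                       as.Date(sprintf("%04d-01-01", years)))

table(n_days)                         # only 364 and 371 occur
sum(n_days == 371)                    # 71 long (53-week) years per cycle
range(offset)                         # -3 to 3 days from January 1st
```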
4. Dates in R #
Base R comes with Date objects, which are whole numbers of days since the
Unix Epoch[^2] (1970-01-01), with negative values for earlier dates.
The numbers are wrapped in an S3 class named Date which provides them with
superpowers (methods for formatting, arithmetic, comparison etc.).
date <- as.Date("2022-05-02")
date
unclass(date)
[1] "2022-05-02"
[1] 19114
So 2022-05-02 is represented as 19114 days.
epoch <- as.Date("1970-01-01")
unclass(epoch)
unclass(epoch) - unclass(date)
[1] 0
[1] -19114
It matches the number of days since 1970-01-01 (which is represented as 0).
Time spans #
epoch - date
Time difference of -19114 days
Because we used an arithmetic method on Date objects, we get the result as a difftime,
which is another S3 class wrapping a number, with a units attribute
(one of secs, mins, hours, days or weeks) that controls its representation.
Under the hood, the Date objects have been converted to POSIXct, which is yet
another class, this time for date-time representation; a Date is taken at midnight in a given time zone
(defaulting to UTC).
posix_ct <- as.POSIXct(date)
unclass(posix_ct)
[1] 1651449600
# 86400 is the number of seconds in a day (24 * 60 * 60)
sec_in_day <- 86400
unclass(posix_ct) / sec_in_day
[1] 19114
When working with dates this is not a problem, but when you need to represent
time precisely, you should know that difftime is aware of neither
DST nor leap years.
Along with POSIXct there is POSIXlt, which is closer to human-readable forms:
its components are integer vectors, except sec (double) and zone (character).
posix_lt <- as.POSIXlt(date)
unlist(unclass(posix_lt))
sec min hour mday mon year wday yday isdst
0 0 0 2 4 122 1 121 0
A virtual class “POSIXt” exists from which both of the classes inherit: it is used to allow operations such as subtraction to mix the two classes.
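A short base R check illustrates this shared parent class. It is only a sketch: both objects are built from the same Date, so they describe the same instant (midnight UTC), and subtracting one from the other works even though their classes differ.

```r
date <- as.Date("2022-05-02")
ct <- as.POSIXct(date)     # seconds since the epoch
lt <- as.POSIXlt(date)     # list of calendar components

class(ct)                  # "POSIXct" "POSIXt"
class(lt)                  # "POSIXlt" "POSIXt"

# Mixing the two classes in a subtraction returns a difftime
diff_mixed <- ct - lt
class(diff_mixed)          # "difftime"
as.numeric(diff_mixed)     # 0: same instant
```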
This is where lubridate comes in, the tidyverse package dedicated to date-time manipulation.
It provides three alternative representations of time spans:
- lubridate::duration: the exact number of seconds between two date-times (basically a difftime expressed in seconds). Durations measure the exact passage of time, but they do not always align with human units such as hours, months and years.
- lubridate::period: records the change in clock time between two date-times. Periods are measured in human units: years, months, days, hours, minutes and seconds (like POSIXlt).
- lubridate::interval: a time span bound by two real date-times.
Time series #
- stats::ts is somewhat inflexible.
- tis, with ti (Time Index) and tis (Time Indexed Series), and tsibble are great.
Calendars #
- timeDate, which is poorly documented and doesn’t provide a nice API (in my opinion).
5. Date formats #
| code | format | convention | abbr | pad | locale | range |
|---|---|---|---|---|---|---|
| %C | Century | Gregorian | FALSE | TRUE | FALSE | 00-99 |
| %x | Date | Gregorian | FALSE | FALSE | TRUE | NA |
| %F | Date | ISO 8601 | FALSE | FALSE | FALSE | NA |
| %D | Date | US / C99 | FALSE | FALSE | FALSE | NA |
| %d | Day of Month | Gregorian | FALSE | TRUE | FALSE | 01-31 |
| %e, %-d | Day of Month | Gregorian | FALSE | FALSE | FALSE | 1-31 |
| %u | Day of Week | From Monday (UK) | FALSE | FALSE | FALSE | 1-7 |
| %w | Day of Week | From Sunday (US) | FALSE | FALSE | FALSE | 0-6 |
| %a | Day of Week | Name | TRUE | FALSE | TRUE | sun.-sat. |
| %A | Day of Week | Name | FALSE | FALSE | TRUE | sunday-saturday |
| %j | Day of Year | Gregorian | FALSE | TRUE | FALSE | 001-366 |
| %-j | Day of Year | Gregorian | FALSE | FALSE | FALSE | 1-366 |
| %-m | Month | Gregorian | FALSE | FALSE | FALSE | 1-12 |
| %m | Month | Gregorian | FALSE | TRUE | FALSE | 01-12 |
| %b,%h | Month | Name | TRUE | FALSE | TRUE | jan.-dec. |
| %B | Month | Name | FALSE | FALSE | TRUE | january-december |
| %W | Week | From Monday (UK) | FALSE | TRUE | FALSE | 00-53 |
| %U | Week | From Sunday (US) | FALSE | TRUE | FALSE | 00-53 |
| %V | Week | ISO 8601 | FALSE | TRUE | FALSE | 01-53 |
| %Y | Year | Gregorian | FALSE | FALSE | FALSE | NA |
| %-y | Year | Gregorian | TRUE | FALSE | FALSE | 0-99 |
| %y | Year | Gregorian | TRUE | TRUE | FALSE | 00-99 |
| %G | Year | ISO 8601 | FALSE | FALSE | FALSE | NA |
| %-g | Year | ISO 8601 | TRUE | FALSE | FALSE | 0-99 |
| %g | Year | ISO 8601 | TRUE | TRUE | FALSE | 00-99 |
- abbr: abbreviated
- pad: left-padded (prefixed) with 0 for fixed width
- locale: result changes based on the LC_TIME locale
Which we can summarize a bit:
| full (abbr) | ISO 8601 | Gregorian US | Gregorian UK | locale |
|---|---|---|---|---|
| Date | %F | %D | %D | %x |
| Year | %G (%g) | %Y (%y) | %Y (%y) | NA |
| Month | NA | %m | %m | %B (%b) |
| Week | %V | %U | %W | NA |
| Day of Year | NA | %j | %j | NA |
| Day of Week | NA | %w | %u | %A (%a) |
| Day of Month | NA | %d | %d | NA |
Only the date-related formats (not the datetime ones) are described here.
Find more formats with ?strptime.
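To see how the conventions diverge, here is a small base R illustration using only locale-independent codes. The two dates are picked for the example: a regular Monday, and a January 1st that ISO 8601 assigns to the last week of the previous year.

```r
dates <- as.Date(c("2022-05-02", "2021-01-01"))

format(dates, "%F")          # "2022-05-02" "2021-01-01"  (ISO 8601 calendar date)
format(dates, "%Y-%j")       # "2022-122"   "2021-001"    (Gregorian year, day of year)
format(dates, "%G-W%V-%u")   # "2022-W18-1" "2020-W53-5"  (ISO year, ISO week, weekday)
format(dates, "%U")          # "18" "00"  (weeks starting on Sunday)
format(dates, "%W")          # "18" "00"  (weeks starting on Monday)
```

Note how `%G` and `%Y` disagree for 2021-01-01: mixing `%Y` with `%V` is a classic source of off-by-one-year bugs around New Year.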
6. Representations #
In calendr, the only way to input
- Ywd => iso_date => “2022-W18-01”
- Yfd => year_fortnight_day => yearmon_date(date, .5) => “2022-f5-XX”
- Ybd => year_bimonth_day => yearmon_date(date, .5) => “2022-05-b1-XX”
- Ymd => date => yearmon_date(date, 1) => “2022-05-02”
- YBd => year_bimester_day => yearmon_date(date, 2) => “2022-BX-XX”
- YTd => year_quarter_day => year_trimester_day => yearmon_date(date, 3) => “2022-T2-31”
- YQd => year_quadrimester_day => yearmon_date(date, 4) => “2022-QX-XX”
- YSd => year_semester_day => yearmon_date(date, 6) => “2022-S1-122”
These are called the lossless date formats.
Internally, these function calls all return an object of class c("calDate", "Date")
with the same value and a varying format attribute.
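The idea of "same value, different format attribute" can be sketched with a small S3 class. This is a hypothetical illustration, not the actual calendr implementation: the constructor name `new_cal_date` and the use of strftime codes in the format attribute are assumptions.

```r
# Hypothetical constructor: a Date with an extra "format" attribute
new_cal_date <- function(x, format = "%Y-%m-%d") {
  structure(unclass(as.Date(x)), format = format, class = c("calDate", "Date"))
}

# Formatting uses the stored attribute instead of the default Date format
format.calDate <- function(x, ...) {
  fmt <- attr(x, "format")
  plain <- as.Date(as.numeric(x), origin = "1970-01-01")  # back to a plain Date
  format(plain, format = fmt)
}

ymd <- new_cal_date("2022-05-02")
ywd <- new_cal_date("2022-05-02", format = "%G-W%V-%u")

as.numeric(ymd) == as.numeric(ywd)   # TRUE: same underlying day count
format(ymd)                          # "2022-05-02"
format(ywd)                          # "2022-W18-1"
```

Keeping "Date" in the class vector means that base methods such as comparison still apply to these objects.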
Lossy formats (output only) #
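| unit | format |
|---|---|
| day | %d/%a |
| week | %7.d |
| fortnight | %15.d |
| bimonth | %.5m |
| month | %m/%b |
| bimester | %2.m |
| trimester | %3.m |
| quadrimester | %4.m |
| semester | %6.m |
| year | %y |
| biennum | %2.y |

| year | isoweek | day |
|---|---|---|
| "y" | "w" | "d" |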
format.Date(date, format, locale) {
setlocale(locale)
pad
}
as_date(date, format, locale) {}
Padding can be removed with -.
If the argument lossless is set to TRUE, then we’ll abort if year, month and day
cannot be deduced from the requested output.
Accessors #
By default, accessors always return an integer.
When the user asks for names, they are returned as an ordered factor.
.count_dates(date, what = c("days", "fortnights", "weeks", "isoweeks", "months", "year"), within = c("week", "year", …))
When dealing with weeks, the default is the ISO convention.
When dealing with months, the default is the Gregorian convention (iso = FALSE).
- day_of_week(date, iso)
- day_of_month(date, iso, month_bundle = 1)
- day_of_year(date, iso, year_bundle = 1)
- fortnight_of_month(date, month_bundle = 1)
- fortnight_of_year(date, year_bundle)
- week_of_month(date, month_bundle = 1)
- week_of_year(date, year_bundle = 1, iso)
- month_of_year(date, month_bundle = 1, year_bundle = 1)
- year(date, iso, year_bundle = 1)
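As an illustration of the accessor contract (an integer by default, an ordered factor when names are requested), here is a minimal base R sketch of day_of_week. The `names` argument, the hard-coded English labels and the 0-6 Sunday-based numbering for iso = FALSE are assumptions made for the example; the real calendr behavior may differ.

```r
# Sketch of a day-of-week accessor; not the actual calendr implementation
day_of_week <- function(date, iso = TRUE, names = FALSE) {
  wday <- as.POSIXlt(as.Date(date))$wday   # 0 = Sunday, ..., 6 = Saturday
  if (names) {
    labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
    # ISO weeks start on Monday, otherwise on Sunday
    lvls <- if (iso) labels[c(2:7, 1)] else labels
    return(factor(labels[wday + 1L], levels = lvls, ordered = TRUE))
  }
  if (iso) (wday + 6L) %% 7L + 1L else wday  # ISO: 1 = Monday, ..., 7 = Sunday
}

day_of_week("2022-05-02")                 # 1 (Monday)
day_of_week("2022-05-01")                 # 7 (Sunday, ISO numbering)
day_of_week("2022-05-01", iso = FALSE)    # 0 (Sunday, US numbering)
day_of_week("2022-05-02", names = TRUE)   # ordered factor, Monday is the first level
```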
Shortcuts #
- day => day_of_month
- fortnight => fortnight_of_year
- week => week_of_year

For any x_of_y function (like day_of_week) there is a series of companion functions:
- x_of_y to transform a date into an integer
- assert_x to check that a value falls between the given boundaries
- x_in_y returning a vector of dates
- y_first
- y_last
For aggregating ISO weeks into months, see https://stackoverflow.com/questions/46319137/aggregate-iso-weeks-into-months-with-a-dataset-containing-just-iso-weeks
It might seem that fortnight and bimonth are the same. Think again: a fortnight is exactly 14 or 15 days (set by the user), counted from January 1st (or from the start of another fiscal year), while a bimonth is half of a month (the remainder is assigned to the first or last half, depending on the user), so it lasts between 14 and 16 days depending on the month, but the first bimonth always starts on the 1st of the month.
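The difference can be made concrete with a minimal base R sketch. The signatures below are simplified assumptions (no bundles, no fiscal year start), and the remainder of an odd-length month is arbitrarily given to the second half; `bimonth_of_month` and `days_in_month` are names chosen for the example.

```r
# Number of days in the month of each date
days_in_month <- function(date) {
  lt <- as.POSIXlt(as.Date(date))
  year <- lt$year + 1900L
  mon <- lt$mon + 1L                               # 1 to 12
  first <- as.Date(sprintf("%04d-%02d-01", year, mon))
  next_first <- as.Date(sprintf("%04d-%02d-01",
                                year + (mon == 12L), mon %% 12L + 1L))
  as.integer(next_first - first)
}

# Fortnight: fixed-length blocks counted from January 1st
fortnight_of_year <- function(date, length = 14L) {
  as.POSIXlt(as.Date(date))$yday %/% length + 1L   # yday is 0-based
}

# Bimonth: first or second half of the month (remainder goes to the second half)
bimonth_of_month <- function(date) {
  half <- days_in_month(date) %/% 2L
  ifelse(as.POSIXlt(as.Date(date))$mday <= half, 1L, 2L)
}

fortnight_of_year("2022-01-15")   # 2: the 15th day of the year opens the second fortnight
bimonth_of_month("2022-01-15")    # 1: still in the first half of January (15 + 16 days)
```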
*within* #
days_within_month(date)
targets_within_group(date, group_bundle, year_bundle)
- target (= days): always plural
- within ==> %within% infix
- group (= month): always singular
*_first and *_last #
ISO #
An ISO year doesn’t necessarily start on January 1st.
Use iso = TRUE to change this behavior.
Fiscal #
By default, the first day of the year is January 1st, but this is not always the case. Most schools use September 1st, and your company might use whichever date best fits its interests.
So year_start lets you change this.
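How a year_start setting could shift the year boundary is sketched below in base R. The function name `fiscal_year`, the "MM-DD" string form of year_start and the choice to label a fiscal year by the calendar year in which it begins are all assumptions for this example, not the calendr API.

```r
# Sketch: year to which a date belongs when the year starts on `year_start` ("MM-DD")
fiscal_year <- function(date, year_start = "01-01") {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  start <- as.Date(paste0(year, "-", year_start))  # this calendar year's start day
  # Before the start day, the date still belongs to the year that began last year
  ifelse(date < start, year - 1L, year)
}

fiscal_year("2022-05-02")                        # 2022 with the default January 1st
fiscal_year("2022-05-02", year_start = "09-01")  # 2021: school year 2021-2022
fiscal_year("2022-09-01", year_start = "09-01")  # 2022: first day of the new year
```

A start day of "02-29" would need special handling, since it does not exist in common years.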
7. Date spans #
%--% infix to create a time span
2021-04-01 %in% (2021-01-01 %--% 05-02)
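One simple way to get this behavior in base R is to let the infix expand the span into a vector of days, so that the standard %in% operator works unchanged. This is only a sketch under that assumption: the operator name `%--%` is written here with two ASCII hyphens, both bounds must be full dates (the "05-02" shorthand above is not handled), and calendr may represent spans differently.

```r
# Sketch: a span is simply the vector of all days from start to end (inclusive)
`%--%` <- function(start, end) {
  seq(as.Date(start), as.Date(end), by = "day")
}

span <- "2021-01-01" %--% "2021-05-02"

length(span)                        # 122 days
as.Date("2021-04-01") %in% span     # TRUE
as.Date("2021-06-01") %in% span     # FALSE
```

Expanding to a full vector is wasteful for long spans; storing only the two bounds and comparing against them would scale better.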
8. Bizdays #
Holidays #
Weekends #
9. SimpleR Time Series #
[^1]: BCE, which stands for Before Common Era, has exactly the same meaning as BC, which stands for Before Christ.
[^2]: In chronology and periodization, an epoch or reference epoch is an instant in time chosen as the origin of a particular calendar era. The “epoch” serves as a reference point from which time is measured. The Unix Epoch is set arbitrarily on “1970-01-01” at midnight UTC, and systems based on UNIX count time in seconds from this base (POSIX standard).