7  OPTIONAL - Time Series

Week 7 - Cleaning and plotting time series data

This is an optional tutorial for those who plan to work with time-dependent data, i.e. time series data.

Note

THIS TUTORIAL WILL NOT BE IN THE EXAM. It is an optional extra learning tool.

This tutorial contains examples from longitudinal survey data (i.e. surveys repeated across time), weather data (both date-based and date-time-based), and age records used to look at growth over time.

To work with dates and times, we will need the R package lubridate. Some of the tutorials also require more specialised packages, such as tsibble and feasts, to plot and work with time series data easily.

# If you need to install new packages, the code is commented out below.
#install.packages("tsibble")
#install.packages("feasts") # CRAN release, or the development version:
#remotes::install_github("tidyverts/feasts")

# Load libraries
library(tidyverse)
library(lubridate) # for converting dates and time
library(lterdatasampler) # for accessing ecological data
library(tsibble) # to tidy temporal data with wrangling tools
library(feasts) # extension of tsibble to produce time series features, decompositions, statistical summaries and convenient visualisations
library(broom) # for tidying model outputs into data frames

lubridate

All of the example data in this tutorial have been pre-cleaned. However, if you load raw data from a csv file, the date column is often stored as character. In this example, the date is formatted as year-month-day, so to convert it from character to date format, use the ymd() function as in the following code.

# Use example dataset and only keep the date and temperature
test_data <- luq_streamchem %>%
  select(sample_date, temp)

# convert character to date
test_data <- test_data %>%
  dplyr::mutate(sample_date = lubridate::ymd(sample_date))
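When converting raw character dates, it is worth checking whether any values failed to parse. As far as we know, ymd() returns NA for strings it cannot read (and prints a "failed to parse" warning), so counting the NAs is a quick way to catch typos. A minimal sketch with made-up strings, not taken from luq_streamchem:

```r
library(lubridate)

# Made-up raw dates, as they might appear in a csv column (the last one is invalid)
raw_dates <- c("1987-01-05", "1987-01-13", "1987-13-45")

# Unparseable strings become NA; suppressWarnings() only hides lubridate's message
parsed <- suppressWarnings(ymd(raw_dates))

# Count and inspect the entries that failed to convert
sum(is.na(parsed))
raw_dates[is.na(parsed)]
```

If the count is not zero, look at the failed strings before deciding whether to fix or drop them.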

lubridate can handle many formats, but which function you use depends on how the original dates are written.

dmy("05-01-1987")   # day-month-year
[1] "1987-01-05"
mdy("01/05/1987")   # month-day-year
[1] "1987-01-05"
ymd_hms("1987-01-05 14:30:00")  # date + time
[1] "1987-01-05 14:30:00 UTC"

Here are some examples of which function to use depending on how the dates were formatted.
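A few more sketches of matching the function name to the order in which the date components are written. The example strings and the time zone are made up for illustration:

```r
library(lubridate)

ymd("1987/01/05")            # separators other than "-" are accepted
ymd("19870105")              # no separators at all
dmy("5 January 1987")        # month written as a word (English locale)
mdy_hm("01/05/1987 14:30")   # month-day-year plus hours and minutes
dmy_hms("05-01-1987 14:30:00", tz = "Australia/Sydney")  # date-time with an explicit time zone
```

The letters in the function name (y, m, d, h, m, s) only describe the order of the components; the separators do not need to match.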

You can also break down the dates into useful variables:

test_data <- test_data %>%
  mutate(
    year  = lubridate::year(sample_date), # extract year only
    month = lubridate::month(sample_date), # extract month only as numeric
    month_lab = lubridate::month(sample_date, label = TRUE),  # extract month only as labels (Jan, Feb, Mar)
    doy   = lubridate::yday(sample_date),     # day of year
    week  = lubridate::isoweek(sample_date)  # ISO 8601 week number (1-53), with weeks starting on Monday
  )
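isoweek() returns the week number, not the name of the day. If you need the day of the week, wday() gives it, and floor_date() is handy for grouping dates by month. A small sketch using 5 January 1987, which was a Monday:

```r
library(lubridate)

d <- ymd("1987-01-05")

isoweek(d)                     # ISO 8601 week number
yday(d)                        # day of year
wday(d)                        # day of week as a number (Sunday = 1 by default)
wday(d, label = TRUE)          # day of week as an ordered factor label
wday(d, week_start = 1)        # day of week counting Monday as 1
floor_date(d, unit = "month")  # round down to the first day of the month
```

The numeric result of wday() depends on which day is treated as the start of the week, so it is safer to set week_start explicitly.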

Some ecological time series have uneven sampling. You can check whether the dates are evenly spaced by calculating the time difference between consecutive samples (a lag of one) with dplyr's lag() function.

test_data %>%
  arrange(sample_date) %>%
  mutate(time_diff = sample_date - lag(sample_date))
# A tibble: 317 × 8
   sample_date  temp  year month month_lab   doy  week time_diff
   <date>      <dbl> <dbl> <dbl> <ord>     <dbl> <dbl> <drtn>   
 1 1987-01-05     20  1987     1 Jan           5     2 NA days  
 2 1987-01-13     20  1987     1 Jan          13     3  8 days  
 3 1987-01-20     20  1987     1 Jan          20     4  7 days  
 4 1987-01-27     20  1987     1 Jan          27     5  7 days  
 5 1987-02-03     20  1987     2 Feb          34     6  7 days  
 6 1987-02-10     20  1987     2 Feb          41     7  7 days  
 7 1987-02-17     20  1987     2 Feb          48     8  7 days  
 8 1987-02-24     19  1987     2 Feb          55     9  7 days  
 9 1987-03-03     20  1987     3 Mar          62    10  7 days  
10 1987-03-10     20  1987     3 Mar          69    11  7 days  
# ℹ 307 more rows

This helps identify missing sampling periods or inconsistent effort.
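To list only the problem intervals, you can filter the differences against the expected spacing. The sketch below assumes weekly sampling (7 days) and uses a small made-up set of dates with one missed visit:

```r
library(dplyr)
library(tibble)
library(lubridate)

# Made-up weekly sampling dates; the visit on 1987-01-26 was missed
samples <- tibble(
  sample_date = ymd(c("1987-01-05", "1987-01-12", "1987-01-19",
                      "1987-02-02", "1987-02-09"))
)

gaps <- samples %>%
  arrange(sample_date) %>%
  # Days since the previous sample (NA for the first row)
  mutate(time_diff = as.numeric(sample_date - dplyr::lag(sample_date), units = "days")) %>%
  # Keep intervals longer than the assumed weekly spacing; NA rows are dropped
  filter(time_diff > 7)

gaps
```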

Example dataset

Bad time data

Bad data can occur for a number of reasons: your logger may have been damaged or exposed to something that artificially changed the readings, or it might have a software bug. The following figure shows an example dataset with badly recorded data (likely an equipment issue).

Time series plot illustrating bad data recorded after 1st November. A sharp dip is observed showing outlier data.
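A plot like this can be drawn with ggplot2 (loaded with tidyverse). The sketch below uses a small made-up logger dataset, not the data in the figure, with two faulty readings after 1 November; a dashed line at zero makes the implausible values easy to spot:

```r
library(ggplot2)
library(tibble)
library(lubridate)

# Made-up daily logger temperatures with two faulty readings
logger <- tibble(
  date = seq(ymd("2023-10-25"), ymd("2023-11-05"), by = "day"),
  temp = c(18.2, 18.5, 19.1, 18.8, 18.0, 17.9, 18.3,
           -40.0, -39.5, 18.1, 18.4, 18.6)
)

p <- ggplot(logger, aes(x = date, y = temp)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  scale_x_date(date_labels = "%d %b") +              # axis labels such as "01 Nov"
  labs(x = "Date", y = "Temperature (°C)")

p
```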

What do you do with these data?

These data need to be filtered out of your dataset before you perform any kind of analysis. How you filter them will depend on the type of data you are working with and how the errors manifest. For example:

  • Filter out outliers using a predefined threshold. In this example, you can filter out values below zero.

  • Filter out outliers based on the interquartile range (IQR). See week 1’s workshop for how to filter data using this method.

  • Filter out outliers based on time. If you know the time period during which the equipment was damaged (e.g. a storm event), you can filter out observations from that period.
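The three approaches can be sketched with dplyr::filter(). The logger data, the damage window and the 1.5 * IQR multiplier are assumptions for illustration (1.5 is a common choice; week 1's workshop may use a different rule):

```r
library(dplyr)
library(tibble)
library(lubridate)

# Made-up daily logger temperatures; the readings on 1-2 November are faulty
logger <- tibble(
  date = seq(ymd("2023-10-25"), ymd("2023-11-05"), by = "day"),
  temp = c(18.2, 18.5, 19.1, 18.8, 18.0, 17.9, 18.3,
           -40.0, -39.5, 18.1, 18.4, 18.6)
)

# 1. Predefined threshold: drop implausible values below zero
by_threshold <- logger %>%
  filter(temp >= 0)

# 2. IQR rule: keep values within 1.5 * IQR below Q1 and above Q3
q1  <- quantile(logger$temp, 0.25)
q3  <- quantile(logger$temp, 0.75)
iqr <- IQR(logger$temp)
by_iqr <- logger %>%
  filter(temp >= q1 - 1.5 * iqr, temp <= q3 + 1.5 * iqr)

# 3. Known bad period: drop everything recorded in a hypothetical damage window
bad_start <- ymd("2023-11-01")
bad_end   <- ymd("2023-11-02")
by_time <- logger %>%
  filter(date < bad_start | date > bad_end)
```

Here all three remove the same two rows, but they can disagree on real data: the IQR rule becomes unreliable when a large share of readings are bad, because the quartiles themselves shift.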

Analysis

You are now ready to analyse time-series data. Time-series analysis is beyond the scope of this second-year unit, but here are some things you can do with time data:

  • Repeated measures ANOVA: Test whether the means of three or more related groups differ, where the same individuals or communities are measured multiple times (such as in longitudinal studies or before/after treatments).

  • Population dynamics: You can quantify changes in populations or numbers of individuals across time using generalised linear models, or generalised additive models for nonlinear effects. Simple forecasting tools can identify trends, seasonality (breeding seasons, migration, lunar cycles, predator/prey dynamics), cycles and irregular fluctuations, and predict future numbers.

  • Forecasting: More advanced forecasting models can deal with temporal autocorrelation, lagged effects, non-Gaussian data, missing observations, measurement error, time-varying effects, non-linearities, and multi-series clustering. Common time-series models that incorporate uncertainty in forecasting (i.e. stochastic processes) include random walk, autoregressive, autoregressive integrated moving average (ARIMA), and exponential smoothing models. Here is an example of a live forecast of the population and community dynamics of desert rodents.

Change in Merriam’s kangaroo rat (Dipodomys merriami) captures over time, from 1994 to the present.

  • Movement ecology: Temporal and spatial data from animal tracking can help identify where animals move, how fast they move, whether there are daily or seasonal patterns in their movement, what kinds of habitats they move through, and what behavioural states can be inferred from the data.
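As a rough sketch of the population dynamics idea, a Poisson GLM can describe a log-linear trend in counts and project it forward. The counts below are simulated (they are not the kangaroo rat captures), and this model ignores temporal autocorrelation, which the more advanced forecasting models described above are designed to handle:

```r
library(tibble)
library(broom)

set.seed(42)

# Simulated annual capture counts declining by about 5% per year (made-up data)
counts <- tibble(
  year = 1994:2023,
  captures = rpois(30, lambda = 200 * exp(-0.05 * (year - 1994)))
)

# Poisson GLM: log(expected captures) changes linearly with year
fit <- glm(captures ~ year, family = poisson, data = counts)

# Tidy coefficient table; exp(year estimate) is the multiplicative change per year
tidy(fit)

# Predicted captures for the next three years, on the count scale
predict(fit, newdata = tibble(year = 2024:2026), type = "response")
```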

Additional resource

  • Coding Club Analysing Time Series Data: An R tutorial to understand the various temporal patterns in data by decomposing data into different cyclic trends.