Splits a data frame of time-stamped sequences into discrete time windows, computes per-site entropy and Mclust clustering for each window.
Usage
partition_time_windows(
data,
n_sites,
window_length = 1,
window_type = 3,
start_date = NULL,
end_date = NULL,
date_format = "%b-%Y",
verbose = FALSE,
...
)Arguments
- data
Data frame. Must contain a column named
Datecoercible toDate. Columns1throughn_sitesmust be the numeric site columns; any additional columns (such asCountryor other per-sequence metadata) may follow.- n_sites
Integer. Number of site columns. Site columns must occupy positions
1throughn_sitesofdata; any other columns (Date, optionallyCountry, etc.) must come after.- window_length
Integer. Window duration in months. Default is
1.- window_type
Integer (1, 2, or 3). Windowing strategy:
1= Cumulative,2= Sliding,3= Disjoint. Default is3.- start_date
Date or character. Window start. Defaults to earliest date in
data.- end_date
Date or character. Window end (cutoff). Defaults to latest date in
data.- date_format
Character. Format string for date labels. Default is
"%b-%Y".- verbose
Logical. If
TRUE, prints a progress bar and completion summary. Default isFALSE.- ...
Additional arguments passed to
cluster_sites_by_entropy.
Value
A named list:
- Partitions
List of data frame chunks, one per window.
- Entropies
List of numeric entropy vectors, one per window.
- Clusters
List of clustering results from
cluster_sites_by_entropy, one per window.- Max_Entropy
Numeric vector. Maximum cluster class label per window (equals number of clusters found;
NAif window was empty).- Dates_Labels
Character vector of window label strings.
- N_partitions
Integer. Total number of windows.
Details
Three windowing strategies are supported via window_type.
Throughout the descriptions below, T denotes the number of whole
months between start_date and end_date, and w
denotes window_length:
1 — Cumulative: Start is fixed; end expands by
wmonths each step. Each window includes all prior data plus one more period. Producesfloor(T / w)chunks.2 — Sliding: A window of
wmonths slides one month at a time. ProducesT - w + 1chunks.3 — Disjoint: Consecutive non-overlapping windows of
wmonths. Producesfloor(T / w)chunks. Default.
For example, with T = 12 months and w = 2: cumulative
produces 6 chunks (each progressively larger); sliding produces 11
chunks (overlapping, each 2 months wide); disjoint produces 6 chunks
(each exactly 2 months, non-overlapping).
Empty windows. When a window contains no observations, the
corresponding entries in the returned lists carry placeholder values:
Entropies is a zero-vector of length n_sites;
Clusters is a schema-consistent empty result matching
cluster_sites_by_entropy's empty-input return; Max_Entropy
is NA_integer_. Downstream consumers can guard on
nrow(Partitions[[i]]) > 0 or !is.na(Max_Entropy[i]).
Column layout requirement. The function extracts site columns
as data[, seq_len(n_sites), drop = FALSE]. The site columns
must therefore occupy positions 1 through n_sites.
Date (and any other metadata columns such as Country)
must come after the site columns. A common error is to place
Date first; this will produce incorrect entropy values.
See also
calculate_entropy for the per-site entropy
computation; cluster_sites_by_entropy for the GMM
clustering applied per window; relabel_entropy_classes
for standardizing the cluster labels in downstream consumers; and
calculate_hellinger_matrix for typical downstream use
of the Partitions output.
Examples
dates <- seq(as.Date("2020-01-01"), as.Date("2020-06-01"), by = "month")
df <- data.frame(
s1 = 1L,
s2 = c(rep(1L, 15), rep(2L, 15)),
Date = rep(dates, each = 5)
)
res <- partition_time_windows(df, n_sites = 2, window_length = 2,
window_type = 3, verbose = TRUE)
#> Partitioning 2 disjoint windows ...
#>
|
| | 0%
|
|========================= | 50%
|
|==================================================| 100%
#> Partitioning complete: 2 partitions generated (Jan-2020 to Jun-2020).
print(res$Dates_Labels)
#> [1] "Jan-2020 - Feb-2020" "Mar-2020 - Apr-2020"
print(res$Max_Entropy)
#> [1] NA 1