Customising demography
demography.Rmd
Introduction
This vignette introduces how demographic processes are represented and customized in the PREVAIL model.
Demography is a critical foundation for infectious disease models like PREVAIL because the underlying population structure directly shapes both current immunity profiles and future epidemic projections. Age distribution, birth rates, death rates, and migration all influence how susceptible, immune, and at-risk populations evolve over time. Just as importantly, demographic structure determines the configuration of contact matrices, which define how different age groups interact and transmit infection within the population. Accurately capturing these dynamics allows the model to reflect realistic patterns of immunity buildup, population turnover, and the flow of infection between groups. This is especially important for long-term projections or settings with rapid demographic change, where shifts in both population structure and contact patterns can substantially alter the risk of outbreaks and the effectiveness of interventions.
While the model has built-in data to handle each of these aspects, users with their own data can integrate it flexibly into the model design. This should give better estimates of population-level immunity and of the impact of different scenarios. We explain how users can define and modify the age structure (from single-year to multi-year age groups), specify age-specific or overall rates for migration, deaths, and births, and supply contact matrices. Each of these has substantial implications for simulation outputs. By the end of this section, you will be able to adapt demographic assumptions (such as age groupings, population movement, and vital rates) to match your data sources or scenario needs, so that model results reflect your setting of interest.
1. Prepare your custom data
First we need to load the required packages. We use the pacman package here, which also installs any of these packages that are missing.
#If pacman is not already present, install
if(!require("pacman")) install.packages("pacman")
#Install missing packages, and load
pacman::p_load(
PREVAIL,
tidyverse
)
To use our own custom data, we need to make sure that it is in the format required. This will depend on whether the parameter is age independent (migration) or age dependent (fertility, mortality, population).
For parameters that are age independent, we only require a data.frame with two columns, year and value.
custom_migration_df <- data.frame(
year = 1950:2023,
value = sample(x = -5:5, size = 74, replace = TRUE)
)
head(custom_migration_df)
#> year value
#> 1 1950 1
#> 2 1951 -1
#> 3 1952 0
#> 4 1953 -2
#> 5 1954 0
#> 6 1955 3
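Before passing a table on, it can help to confirm that it has this shape. The helper below is a minimal sketch: check_year_value() is a hypothetical name, not a PREVAIL function, and the specific checks (numeric columns, one row per year) are assumptions based on the format described above rather than the package's own validation.

```r
# Hypothetical helper (not part of PREVAIL): checks the two-column layout
# described above for age-independent parameters.
check_year_value <- function(df) {
  stopifnot(is.data.frame(df))
  missing_cols <- setdiff(c("year", "value"), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: both columns should be numeric
  if (!is.numeric(df$year) || !is.numeric(df$value)) {
    stop("`year` and `value` must both be numeric")
  }
  # Assumption: one value per year
  if (anyDuplicated(df$year) > 0) {
    stop("Each year should appear only once")
  }
  invisible(TRUE)
}

example_migration <- data.frame(year = 1950:1952, value = c(1, -1, 0))
check_year_value(example_migration)
```

A table that fails a check stops with a message naming the problem, which is usually easier to act on than an error raised deeper in the processing.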
For parameters that are age dependent, we can also include a column called age for age-specific values. However, if this information is missing, then we can auto-fill it later.
custom_mortality_df <- expand.grid(
year = 1950:2023,
age = 0:100
) %>%
dplyr::group_by(year, age) %>%
dplyr::mutate(value = sample(0:5, size = 1))
head(custom_mortality_df)
#> # A tibble: 6 × 3
#> # Groups: year, age [6]
#> year age value
#> <int> <int> <int>
#> 1 1950 0 3
#> 2 1951 0 2
#> 3 1952 0 2
#> 4 1953 0 4
#> 5 1954 0 0
#> 6 1955 0 2
This information is expanded within the function custom_data_process_wrapper(), which is what we use to incorporate our custom data. This function is very similar to data_load_process_wrapper() but allows us to customise the datasets used, rather than only allowing the defaults.
Reformatting demographic data
Inside custom_data_process_wrapper() is the function reformat_demographic_data(), which checks our custom data, re-formats it, and expands it to the format expected by the functions that prepare our data and run our model.
reformat_demographic_data() has several approaches to handling missing data; which one custom_data_process_wrapper() uses depends on the dataset being formatted. Generally, it fills in missing data that the model requires by using the closest year and age available. Where no age data is present, it uses the year value across every age group (no age dependence), and where a year is missing it uses the closest year's values.
This auto-fill applies when our data cover only some years, only some ages, or both. For example, if we only had data for years 1970 to 2015 and ages 20-50, we would still be able to include it.
custom_mortality <- expand.grid(
year = 1970:2015,
age = 20:50
) %>%
dplyr::group_by(year, age) %>%
dplyr::mutate(value = sample(0:5, size = 1))
head(custom_mortality)
#> # A tibble: 6 × 3
#> # Groups: year, age [6]
#> year age value
#> <int> <int> <int>
#> 1 1970 20 5
#> 2 1971 20 5
#> 3 1972 20 5
#> 4 1973 20 0
#> 5 1974 20 1
#> 6 1975 20 2
This will also work if you have a range of ages, separated by "-".
custom_mortality <- expand.grid(
year = 1970:2015,
age = c("20-29", "30-59", "60-100")
) %>%
dplyr::group_by(year, age) %>%
dplyr::mutate(value = sample(0:5, size = 1))
head(custom_mortality)
#> # A tibble: 6 × 3
#> # Groups: year, age [6]
#> year age value
#> <int> <fct> <int>
#> 1 1970 20-29 5
#> 2 1971 20-29 1
#> 3 1972 20-29 2
#> 4 1973 20-29 3
#> 5 1974 20-29 5
#> 6 1975 20-29 4
Ages that are not present in the data but are required by the model (ages 0-100) take the values of the nearest age and year.
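To make the range expansion and nearest-age fill concrete, here is a minimal sketch for a single year of data. Both helpers, expand_age_range() and fill_nearest_age(), are hypothetical and are not PREVAIL functions; the real reformat_demographic_data() may handle ties, years, and edge cases differently.

```r
# Illustrative only: these helpers are hypothetical and are not PREVAIL
# functions. They mimic the behaviour described above for a single year.

# Expand a label such as "20-29" into the single ages it covers.
expand_age_range <- function(label) {
  bounds <- as.integer(strsplit(as.character(label), "-", fixed = TRUE)[[1]])
  seq(bounds[1], bounds[length(bounds)])
}

# For every model age, take the value of the closest age that has data
# (the first match is used when two ages are equally close).
fill_nearest_age <- function(ages, values, model_ages = 0:100) {
  nearest <- vapply(
    model_ages,
    function(a) which.min(abs(ages - a)),
    integer(1)
  )
  data.frame(age = model_ages, value = values[nearest])
}

grouped <- data.frame(age = c("20-29", "30-59"), value = c(2, 4))

# One row per single year of age within each group
single_ages <- do.call(rbind, lapply(seq_len(nrow(grouped)), function(i) {
  data.frame(age = expand_age_range(grouped$age[i]), value = grouped$value[i])
}))

# Ages below 20 inherit the value at age 20; ages above 59 inherit age 59
filled <- fill_nearest_age(single_ages$age, single_ages$value)
filled[filled$age %in% c(0, 25, 45, 100), ]
```

In this sketch, ages 0-19 take the value of the 20-29 group and ages 60-100 take the value of the 30-59 group, so sparse data at the extremes are carried outwards unchanged.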
Contact matrices
Contact matrices require slightly different formatting. They are time independent in our model, and so do not require a year column. However, they do require age_from, age_to, and value columns.
The auto-fill logic and the handling of age ranges ("-") described previously apply here in the same way.
custom_contact <- expand.grid(
age_from = c("15-30", "31-50", "50+"),
age_to = c("15-30", "31-50", "50+")
) %>%
group_by(age_from, age_to) %>%
mutate(
value = pmax(rnorm(1, mean = 3, sd = 1), 0)
)
head(custom_contact)
#> # A tibble: 6 × 3
#> # Groups: age_from, age_to [6]
#> age_from age_to value
#> <fct> <fct> <dbl>
#> 1 15-30 15-30 3.57
#> 2 31-50 15-30 4.19
#> 3 50+ 15-30 1.93
#> 4 15-30 31-50 2.47
#> 5 31-50 31-50 3.36
#> 6 50+ 31-50 2.97
Values are normalized during the parameter smoothing process.
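To see how the long format relates to a square matrix, the sketch below reshapes a small table of fixed values using base R. It is illustrative only: the row normalisation at the end is an assumed scheme chosen to show the idea, and the normalisation PREVAIL applies during parameter smoothing may differ.

```r
# Illustrative only: PREVAIL's own reshaping and normalisation may differ.
age_groups <- c("15-30", "31-50", "50+")
contact_long <- expand.grid(
  age_from = age_groups,
  age_to = age_groups,
  stringsAsFactors = FALSE
)
# Fixed values so the example is reproducible
contact_long$value <- c(3, 4, 2, 2, 3, 3, 1, 2, 5)

# Reshape to a square matrix: rows are age_from, columns are age_to.
contact_matrix <- matrix(
  0,
  nrow = length(age_groups),
  ncol = length(age_groups),
  dimnames = list(age_from = age_groups, age_to = age_groups)
)
contact_matrix[cbind(contact_long$age_from, contact_long$age_to)] <-
  contact_long$value

# Assumed normalisation for illustration: scale each row to sum to 1.
row_normalised <- contact_matrix / rowSums(contact_matrix)
row_normalised
```

Viewing the data as a matrix makes it easy to spot missing age_from and age_to pairs, which would otherwise remain as zeros in this sketch.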
2. Customising our demographic parameters
Here we use custom_data_process_wrapper() instead of data_load_process_wrapper(). The approach is generally the same: it takes all of the arguments used in data_load_process_wrapper(), plus the following, each of which has a default value of NA (which tells the model to use the inbuilt data).
- custom_migration
- custom_fertility
- custom_mortality
- custom_population
- custom_contact_matricies
- custom_routine_vaccination
- custom_sia_vaccination
- custom_disease_data
- custom_vaccination_schedule
- custom_disease_parameters
- custom_vaccine_parameters
Here we will be focusing on the demographic inputs:
- custom_migration
- custom_fertility
- custom_mortality
- custom_population
- custom_contact_matricies
Running the function
To run the model with custom data, pass your custom data.frame to the matching argument. So to use our custom_mortality_df data.frame, we fill in the initial arguments as usual and then specify custom_mortality = custom_mortality_df.
custom_params <- custom_data_process_wrapper(
iso = "GBR",
disease = "measles",
R0 = 15,
custom_mortality = custom_mortality_df
)
If we wanted to run the model with both our custom mortality and our custom migration data, it would be:
custom_params <- custom_data_process_wrapper(
iso = "GBR",
disease = "measles",
R0 = 15,
custom_mortality = custom_mortality_df,
custom_migration = custom_migration_df
)
This provides our custom parameters for use in model_run().
Note: This function is slower than data_load_process_wrapper(), so allow for a longer run time, especially when you have multiple custom datasets.
3. Running the model with custom parameters
As described in the vignette Getting Started with PREVAIL, to run the model we then just have to supply the required parameters.
custom_model_run <- run_model_unpack_results(
params = custom_params,
no_runs = 1
)
From this, you can then run the additional plotting functions, extract current susceptibility and look at onward scenarios as previously described.