Generate probabilities of imprinting to custom subsets of influenza A viruses
By default, get_imprinting_probabilities() calculates subtype-specific probabilities of imprinting to influenza A H1N1, H2N2, or H3N2. Researchers may want to calculate different kinds of imprinting probabilities. For example, we may want to study imprinting to specific influenza isolates, clades, or glycosylation states, or to a multivalent vaccine.
To calculate imprinting to custom groups of influenza A viruses, use the annual_frequencies option in get_imprinting_probabilities().
The annual_frequencies input must be a list whose names match the countries input. Each element of the list must be a data frame or tibble with the following columns:
- year, containing numeric values from 1918 to max(observation_years)
- any number of frequency columns whose names indicate the imprinting category, and whose values indicate the annual fraction of circulating influenza A exposures in that category. Within each year (row), these values must sum to one.
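Before building real inputs, it can help to check a candidate table against these requirements. The sketch below is only an illustration: check_frequencies() is a hypothetical helper, not a function from the imprinting package, and it assumes that every year from 1918 to the last observation year must appear in the year column and that every column other than year is a frequency column. The toy table and its column names are invented.

```r
# Hypothetical helper (not part of the imprinting package): report problems
# with a single country's frequency table.
check_frequencies <- function(freqs, max_obs_year, tol = 1e-8) {
  stopifnot(is.data.frame(freqs), "year" %in% names(freqs))
  # Years between 1918 and the last observation year that have no row
  missing_years <- setdiff(1918:max_obs_year, freqs$year)
  # Treat every column other than `year` as a frequency column
  freq_cols <- setdiff(names(freqs), "year")
  row_totals <- rowSums(freqs[, freq_cols, drop = FALSE])
  # Compare with a tolerance, because floating-point sums are rarely exact
  bad_years <- freqs$year[abs(row_totals - 1) > tol]
  list(missing_years = missing_years, years_not_summing_to_one = bad_years)
}

# Toy input with two invented categories and a deliberate error in 1920
toy <- data.frame(year   = 1918:1921,
                  groupA = c(1, 0.6, 0.5, 0.2),
                  groupB = c(0, 0.4, 0.4, 0.8))
result <- check_frequencies(toy, max_obs_year = 1922)
print(result)
```

Here the helper should flag 1922 as a missing year and 1920 as a row whose frequencies do not sum to one.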
As an example, let’s imagine we want to calculate subtype-specific imprinting probabilities, with some probability of imprinting by vaccination in the United States and Germany. Note that the pediatric influenza vaccination rates used in this example are PURELY HYPOTHETICAL, and not based on data or actual vaccine policies in these countries.
Let’s start by making a data frame of circulation frequencies for the United States.
library(imprinting)
library(dplyr) # provides %>%, select(), and mutate()
## Start with subtype-specific fractions for H1N1, H2N2, H3N2
US_frequencies <- get_country_cocirculation_data(country = 'United States', max_year = 2022) %>%
  select(1:4) # keep year plus the three subtype columns
head(US_frequencies)
#> # A tibble: 6 × 4
#> year `A/H1N1` `A/H2N2` `A/H3N2`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1918 1 0 0
#> 2 1919 1 0 0
#> 3 1920 1 0 0
#> 4 1921 1 0 0
#> 5 1922 1 0 0
#> 6 1923 1 0 0
Now, add in a vaccination column. Not all countries vaccinate healthy infants against influenza, and infant influenza vaccination has only been widely practiced for the past few decades, even in countries where coverage is now high. Hypothetically, let’s assume that 50% of US infants were vaccinated against influenza starting in 1995, increasing steadily to 75% coverage in 2020 and remaining at 75% thereafter. (Again, this is purely hypothetical, and not based on data.)
## Add a vaccination column
## The vector is positional: 77 zeros (1918-1994), a 26-step linear ramp
## (1995-2020), then two years held at 0.75 (2021-2022).
US_frequencies <- US_frequencies %>%
  mutate(vaccination = c(rep(0, 77), seq(.5, .75, length = 26), .75, .75),
         `A/H1N1` = `A/H1N1`*(1-vaccination), # Assume only non-vaccinated children have primary
         `A/H2N2` = `A/H2N2`*(1-vaccination), # infections; multiply the subtype-specific circulation
         `A/H3N2` = `A/H3N2`*(1-vaccination)) # fractions by one minus the year's vaccination probability.
tail(US_frequencies, n = 30)
#> # A tibble: 30 × 5
#> year `A/H1N1` `A/H2N2` `A/H3N2` vaccination
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1993 0.105 0 0.895 0
#> 2 1994 0.00522 0 0.995 0
#> 3 1995 0.0108 0 0.489 0.5
#> 4 1996 0.288 0 0.202 0.51
#> 5 1997 0 0 0.48 0.52
#> 6 1998 0.00444 0 0.466 0.53
#> 7 1999 0.00372 0 0.456 0.54
#> 8 2000 0.130 0 0.320 0.55
#> 9 2001 0.335 0 0.105 0.56
#> 10 2002 0.0155 0 0.415 0.57
#> # … with 20 more rows
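The vaccination vector above is assigned by row position, which only lines up with the intended calendar years if the table has exactly one row per year with no gaps. As an alternative, here is a minimal sketch that keys coverage on the year column instead. It uses base R on an invented toy table, so the numbers are not real circulation data, and coverage_by_year() is a hypothetical helper rather than part of the imprinting package.

```r
# Toy circulation table standing in for the package data (values invented)
freqs <- data.frame(year = 1993:1998,
                    `A/H1N1` = c(0.1, 0.0, 0.2, 0.5, 0.0, 0.3),
                    `A/H3N2` = c(0.9, 1.0, 0.8, 0.5, 1.0, 0.7),
                    check.names = FALSE)

# Hypothetical coverage schedule: 0 before start_year, a linear ramp from
# start_cov to end_cov between start_year and end_year, constant afterwards
coverage_by_year <- function(year, start_year, end_year, start_cov, end_cov) {
  ramp <- approx(x = c(start_year, end_year), y = c(start_cov, end_cov),
                 xout = year, rule = 2)$y
  ifelse(year < start_year, 0, ramp)
}

freqs$vaccination <- coverage_by_year(freqs$year,
                                      start_year = 1995, end_year = 1997,
                                      start_cov = 0.1, end_cov = 0.3)

# Scale each subtype fraction by that year's unvaccinated share, so that
# subtype fractions plus vaccination still sum to one in every row
subtype_cols <- c("A/H1N1", "A/H3N2")
freqs[subtype_cols] <- freqs[subtype_cols] * (1 - freqs$vaccination)
print(freqs)
```

Because coverage is looked up by year, a missing or extra row shifts nothing: each year still receives the coverage intended for it.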
Assume Germany adopted its infant vaccination policy 10 years later, in 2005. Generate a Germany-specific table of frequencies:
Germany_frequencies <- get_country_cocirculation_data(country = 'Germany',
                                                      max_year = 2022) %>%
  select(1:4) %>%
  mutate(vaccination = c(rep(0, 87), seq(.05, .75, length = 16), .75, .75),
         `A/H1N1` = `A/H1N1`*(1-vaccination), # Assume only non-vaccinated children have primary
         `A/H2N2` = `A/H2N2`*(1-vaccination), # infections; multiply the subtype-specific circulation
         `A/H3N2` = `A/H3N2`*(1-vaccination)) # fractions by one minus the year's vaccination probability.
tail(Germany_frequencies, 20)
#> # A tibble: 20 × 5
#> year `A/H1N1` `A/H2N2` `A/H3N2` vaccination
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2002 0 0 1 0
#> 2 2003 0.000727 0 0.999 0
#> 3 2004 0 0 0.95 0.05
#> 4 2005 0.190 0 0.713 0.0967
#> 5 2006 0.145 0 0.712 0.143
#> 6 2007 0.0820 0 0.728 0.19
#> 7 2008 0.0537 0 0.710 0.237
#> 8 2009 0.447 0 0.269 0.283
#> 9 2010 0.644 0 0.0264 0.33
#> 10 2011 0.612 0 0.0111 0.377
#> 11 2012 0.0352 0 0.541 0.423
#> 12 2013 0.280 0 0.250 0.47
#> 13 2014 0.149 0 0.335 0.517
#> 14 2015 0.0977 0 0.339 0.563
#> 15 2016 0.296 0 0.0945 0.61
#> 16 2017 0.0128 0 0.331 0.657
#> 17 2018 0.255 0 0.0415 0.703
#> 18 2019 0.126 0 0.124 0.75
#> 19 2020 0.121 0 0.129 0.75
#> 20 2022 0.0110 0 0.239 0.75
Input the custom frequencies into get_imprinting_probabilities()
## Check that all frequencies sum to 1
rowSums(US_frequencies[,2:5])
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
rowSums(Germany_frequencies[,2:5])
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## Wrap the country-specific frequencies into a named list
input_list <- list("United States" = US_frequencies,
                   "Germany" = Germany_frequencies)
## Calculate probabilities
get_imprinting_probabilities(observation_years = 2022,
                             countries = c("United States", "Germany"),
                             annual_frequencies = input_list,
                             df_format = "wide")
#> # A tibble: 210 × 8
#> year country birth_year `A/H1N1` `A/H2N2` `A/H3N2` vaccination naive
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2022 United States 1918 1 0 0 0 0
#> 2 2022 United States 1919 1 0 0 0 0
#> 3 2022 United States 1920 1 0 0 0 0
#> 4 2022 United States 1921 1 0 0 0 0
#> 5 2022 United States 1922 1 0 0 0 0
#> 6 2022 United States 1923 1 0 0 0 0
#> 7 2022 United States 1924 1 0 0 0 0
#> 8 2022 United States 1925 1 0 0 0 0
#> 9 2022 United States 1926 1 0 0 0 0
#> 10 2022 United States 1927 1 0 0 0 0
#> # … with 200 more rows
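To build intuition for how custom frequency columns turn into the probabilities above, here is a deliberately simplified sketch. It is not the package's actual computation: it assumes a constant annual attack rate p, allows first exposure only at ages 0 to max_age, and treats the naive column as the probability of not yet having been exposed. The function imprint_sketch(), its parameters, and the toy table are all hypothetical.

```r
# Simplified, hypothetical model of imprinting for one birth cohort.
# NOT the imprinting package's implementation.
imprint_sketch <- function(freqs, birth_year, obs_year, p = 0.3, max_age = 12) {
  # Years in which this cohort could have had its first exposure
  years <- birth_year:min(birth_year + max_age, obs_year)
  stopifnot(all(years %in% freqs$year))
  rows <- freqs[match(years, freqs$year), , drop = FALSE]
  # Probability that the first exposure happens in each successive year:
  # escape exposure in all earlier years, then get exposed in this one
  p_first <- p * (1 - p)^(seq_along(years) - 1)
  cats <- setdiff(names(freqs), "year")
  # Weight each year's category frequencies by the first-exposure probability
  probs <- colSums(rows[, cats, drop = FALSE] * p_first)
  # Whoever escaped exposure in every year is still naive
  c(probs, naive = (1 - p)^length(years))
}

# Toy table with one invented virus category plus vaccination
toy <- data.frame(year        = 2018:2022,
                  groupA      = c(1, 0.5, 0.5, 0, 0),
                  vaccination = c(0, 0.5, 0.5, 1, 1))
out <- imprint_sketch(toy, birth_year = 2020, obs_year = 2022)
print(out)
```

Under these assumptions the returned probabilities sum to one only because each row of the input sums to one, which illustrates why that requirement is placed on annual_frequencies.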