intro-to-imprinting

1. Install the imprinting package

The imprinting package isn’t yet available on CRAN, so it can’t yet be installed using the normal install.packages("imprinting") command in R. (But this feature is coming soon!)

For now, we have to install the package from github using the following steps.

1.1 Install the devtools package

If prompted: you do not need to researt R prior to lading, and you do not need to install from souce.

install.packages("devtools", repos='http://cran.us.r-project.org')

1.2 Load the devtools package

library(devtools) # Load the package
#> Loading required package: usethis

Now that we have devtools installed, we can use the install_github() function to install the imprinting package.

1.3 Install the imprinting package from github

This builds and installs the package using source files from https://github.com/cobeylab/imprinting.

devtools::install_github("cobeylab/imprinting")

1.4 Load the imprinting package

library(imprinting)

Now the package should be installed and loaded into your R workspace.

2. Use the package to calculate imprinting probabilities

The main reason to use the imprinting package is to calculate birth year-specific probabilities of imprinting to a specific subtype of influenza A. You can read more about the biology and methods behind these calculations in Gostic et al. (2016).

2.1 Calculate imprinting probabilities for one country and year of observation

Use the function get_imprinting_probabilities(). Run ?get_imprinting_probabilities for help.

get_imprinting_probabilities(observation_years = 2022, countries = "United States")
#> # A tibble: 420 × 5
#>     year country       birth_year subtype imprinting_prob
#>    <dbl> <chr>              <dbl> <chr>             <dbl>
#>  1  2022 United States       2022 A/H1N1        0.0000297
#>  2  2022 United States       2021 A/H1N1        0.0000679
#>  3  2022 United States       2020 A/H1N1        0.0702   
#>  4  2022 United States       2019 A/H1N1        0.152    
#>  5  2022 United States       2018 A/H1N1        0.171    
#>  6  2022 United States       2017 A/H1N1        0.147    
#>  7  2022 United States       2016 A/H1N1        0.225    
#>  8  2022 United States       2015 A/H1N1        0.169    
#>  9  2022 United States       2014 A/H1N1        0.308    
#> 10  2022 United States       2013 A/H1N1        0.321    
#> # … with 410 more rows

The function returns a tibble wtih five columns: subtype, year, country, birth_year, and imprinting_prob. The column imprinting_prob gives the probability that someone born in birth_year and observed in year has imprinted to subtype.

We can run the same command use the df_format='wide' option to output the same results in wide format. This displays all imprinting probabilities for the cohort side-by-side

get_imprinting_probabilities(observation_years = 2022, 
                             countries = "United States", 
                             df_format = 'wide')
#> # A tibble: 105 × 7
#>     year country       birth_year `A/H1N1` `A/H2N2` `A/H3N2` naive
#>    <dbl> <chr>              <dbl>    <dbl>    <dbl>    <dbl> <dbl>
#>  1  2022 United States       1918        1        0        0     0
#>  2  2022 United States       1919        1        0        0     0
#>  3  2022 United States       1920        1        0        0     0
#>  4  2022 United States       1921        1        0        0     0
#>  5  2022 United States       1922        1        0        0     0
#>  6  2022 United States       1923        1        0        0     0
#>  7  2022 United States       1924        1        0        0     0
#>  8  2022 United States       1925        1        0        0     0
#>  9  2022 United States       1926        1        0        0     0
#> 10  2022 United States       1927        1        0        0     0
#> # … with 95 more rows

2.2 Why do we specify the `observation_year`?

The observation_year affects imprinting probabilities in birth cohorts who are young enough to still be in the process of imprinting. Our model assumes that everyone has been infected by influenza before age 12, so in cohorts <12 years of age at the time of observation, imprinting probabilities depend on the observation year.

E.g. consider the cohort born in 2000:

If the observation year is 2005, then this birth cohort is five years old, and still young enough to remain naive to influenza. The outputs below show that 22.9% of the cohort has not yet imprinted to any subtype.
If we calculate imprinting probabilities for the same birth cohort observed in 2011 (at age 11), the model shows that most of the cohort has imprinted, but there is still a very small non-zero probability of remaining naive.
When the cohort reaches age 12, (observation year 2012) the model enforces the assumption that everyone in the cohort has imprinted. It normalizes the probabilities of subtype-specific imprinting so that the H1N1, H2N2, and H3N2-specific probabilities sum to 1, and it sets the naive probability to zero.
Once the cohort reaches age 12, imprinting probabilities are fixed. The outputs below show that imprinting probabilities for the 2000 cohort are identical if observed at age 12 (in 2012), or at age 22 (in 2022).

Note: we added the age_at_observation column to the outputs below for clarity.

get_imprinting_probabilities(observation_years = c(2005, 2011, 2012, 2022), 
                             countries = "United States", 
                             df_format = 'wide') %>%
  dplyr::filter(birth_year == 2000) %>%
  mutate(age_at_observation = year-birth_year) %>%
  select(c(1,2,3,8,4:7))
#> # A tibble: 4 × 8
#>    year country    birth_year age_at_observat… `A/H1N1` `A/H2N2` `A/H3N2`  naive
#>   <dbl> <chr>           <dbl>            <dbl>    <dbl>    <dbl>    <dbl>  <dbl>
#> 1  2005 United St…       2000                5    0.196        0    0.574 0.229 
#> 2  2011 United St…       2000               11    0.322        0    0.667 0.0104
#> 3  2012 United St…       2000               12    0.324        0    0.676 0     
#> 4  2022 United St…       2000               22    0.324        0    0.676 0

2.2 View a list of the available countries

We can only calculate imprinting probabilities for countries with data in WHO Flu Net. The input name and spelling must match the outputs below:

show_available_countries() %>%
  print(n = 200)
#> # A tibble: 175 × 1
#>     country                         
#>     <chr>                           
#>   1 Afghanistan                     
#>   2 Albania                         
#>   3 Algeria                         
#>   4 Angola                          
#>   5 Anguilla                        
#>   6 Antigua and Barbuda             
#>   7 Argentina                       
#>   8 Armenia                         
#>   9 Aruba                           
#>  10 Australia                       
#>  11 Austria                         
#>  12 Azerbaijan                      
#>  13 Bahamas                         
#>  14 Bahrain                         
#>  15 Bangladesh                      
#>  16 Barbados                        
#>  17 Belarus                         
#>  18 Belgium                         
#>  19 Belize                          
#>  20 Bermuda                         
#>  21 Bhutan                          
#>  22 Bolivia                         
#>  23 Bosnia and Herzegovina          
#>  24 Brazil                          
#>  25 British Virgin Islands          
#>  26 Bulgaria                        
#>  27 Burkina Faso                    
#>  28 Cabo Verde                      
#>  29 Cambodia                        
#>  30 Cameroon                        
#>  31 Canada                          
#>  32 Cayman Islands                  
#>  33 Central African Republic        
#>  34 Chad                            
#>  35 Chile                           
#>  36 China                           
#>  37 Colombia                        
#>  38 Congo                           
#>  39 Costa Rica                      
#>  40 Cote d'Ivoire                   
#>  41 Croatia                         
#>  42 Cuba                            
#>  43 Cyprus                          
#>  44 Czechia                         
#>  45 Democratic Republic of Congo    
#>  46 Denmark                         
#>  47 Dominica                        
#>  48 Dominican Republic              
#>  49 Ecuador                         
#>  50 Egypt                           
#>  51 El Salvador                     
#>  52 Estonia                         
#>  53 Ethiopia                        
#>  54 Fiji                            
#>  55 Finland                         
#>  56 France                          
#>  57 French Guiana                   
#>  58 Gambia                          
#>  59 Georgia                         
#>  60 Germany                         
#>  61 Ghana                           
#>  62 Greece                          
#>  63 Grenada                         
#>  64 Guadeloupe                      
#>  65 Guatemala                       
#>  66 Guinea                          
#>  67 Guinea-Bissau                   
#>  68 Guyana                          
#>  69 Haiti                           
#>  70 Honduras                        
#>  71 Hungary                         
#>  72 Iceland                         
#>  73 India                           
#>  74 Indonesia                       
#>  75 Iran                            
#>  76 Iraq                            
#>  77 Ireland                         
#>  78 Israel                          
#>  79 Italy                           
#>  80 Jamaica                         
#>  81 Japan                           
#>  82 Jordan                          
#>  83 Kazakhstan                      
#>  84 Kenya                           
#>  85 Kosovo                          
#>  86 Kuwait                          
#>  87 Kyrgyzstan                      
#>  88 Laos                            
#>  89 Latvia                          
#>  90 Lebanon                         
#>  91 Lithuania                       
#>  92 Luxembourg                      
#>  93 Madagascar                      
#>  94 Malaysia                        
#>  95 Maldives                        
#>  96 Mali                            
#>  97 Malta                           
#>  98 Martinique                      
#>  99 Mauritania                      
#> 100 Mauritius                       
#> 101 Mexico                          
#> 102 Moldova                         
#> 103 Mongolia                        
#> 104 Montenegro                      
#> 105 Morocco                         
#> 106 Mozambique                      
#> 107 Myanmar                         
#> 108 Namibia                         
#> 109 Nepal                           
#> 110 Netherlands                     
#> 111 New Caledonia                   
#> 112 New Zealand                     
#> 113 Nicaragua                       
#> 114 Niger                           
#> 115 Nigeria                         
#> 116 North Korea                     
#> 117 North Macedonia                 
#> 118 Norway                          
#> 119 Oman                            
#> 120 Pakistan                        
#> 121 Palestine                       
#> 122 Panama                          
#> 123 Papua New Guinea                
#> 124 Paraguay                        
#> 125 Peru                            
#> 126 Philippines                     
#> 127 Poland                          
#> 128 Portugal                        
#> 129 Qatar                           
#> 130 Romania                         
#> 131 Russia                          
#> 132 Rwanda                          
#> 133 Saint Kitts and Nevis           
#> 134 Saint Lucia                     
#> 135 Saint Vincent and the Grenadines
#> 136 Saudi Arabia                    
#> 137 Senegal                         
#> 138 Serbia                          
#> 139 Seychelles                      
#> 140 Sierra Leone                    
#> 141 Singapore                       
#> 142 Slovakia                        
#> 143 Slovenia                        
#> 144 South Africa                    
#> 145 South Korea                     
#> 146 South Sudan                     
#> 147 Spain                           
#> 148 Sri Lanka                       
#> 149 Sudan                           
#> 150 Suriname                        
#> 151 Sweden                          
#> 152 Switzerland                     
#> 153 Syria                           
#> 154 Tajikistan                      
#> 155 Tanzania                        
#> 156 Thailand                        
#> 157 Timor                           
#> 158 Togo                            
#> 159 Trinidad and Tobago             
#> 160 Tunisia                         
#> 161 Turkey                          
#> 162 Turkmenistan                    
#> 163 Turks and Caicos                
#> 164 Uganda                          
#> 165 Ukraine                         
#> 166 United Arab Emirates            
#> 167 United Kingdom                  
#> 168 United States                   
#> 169 Uruguay                         
#> 170 Uzbekistan                      
#> 171 Venezuela                       
#> 172 Vietnam                         
#> 173 Yemen                           
#> 174 Zambia                          
#> 175 Zimbabwe

2.3 Generate probabilities for many countries and observation years

many_probabilities = get_imprinting_probabilities(observation_years = c(2000, 2019, 2022),
                                                  countries = c('Brazil', 'Afghanistan', 'Estonia', 'Finland')) 
## Store the outputs in a variable called many_probabilities
many_probabilities ## View the outputs in the console
#> # A tibble: 5,040 × 5
#>     year country     birth_year subtype imprinting_prob
#>    <dbl> <chr>            <dbl> <chr>             <dbl>
#>  1  2000 Brazil            2022 A/H1N1          0.331  
#>  2  2019 Brazil            2022 A/H1N1          0.197  
#>  3  2022 Brazil            2022 A/H1N1          0      
#>  4  2000 Afghanistan       2022 A/H1N1          0.536  
#>  5  2019 Afghanistan       2022 A/H1N1          0.110  
#>  6  2022 Afghanistan       2022 A/H1N1          0.00138
#>  7  2000 Estonia           2022 A/H1N1          0.485  
#>  8  2019 Estonia           2022 A/H1N1          0.188  
#>  9  2022 Estonia           2022 A/H1N1          0      
#> 10  2000 Finland           2022 A/H1N1          0.239  
#> # … with 5,030 more rows

Alternatively, you can view the outputs in a separate window or save them as a .csv file on your hard drive.

# View the outputs in a separate window.
View(many_probabilities) 

# Save the outputs as a .csv file in your current working directory.
write_csv(many_probabilities, 'many_probabilities.csv')

3. Plot the outputs

3.1 Plot the first country-year combination in the output data frame

plot_one_country_year() takes a long-formatted output data frame and plots the first country and year combination.

head(many_probabilities)
#> # A tibble: 6 × 5
#>    year country     birth_year subtype imprinting_prob
#>   <dbl> <chr>            <dbl> <chr>             <dbl>
#> 1  2000 Brazil            2022 A/H1N1          0.331  
#> 2  2019 Brazil            2022 A/H1N1          0.197  
#> 3  2022 Brazil            2022 A/H1N1          0      
#> 4  2000 Afghanistan       2022 A/H1N1          0.536  
#> 5  2019 Afghanistan       2022 A/H1N1          0.110  
#> 6  2022 Afghanistan       2022 A/H1N1          0.00138
plot_one_country_year(many_probabilities)

Select a specific country and year to plot

You can use filter() to select a specific country and year for plotting.

plot_one_country_year(many_probabilities %>% 
                        dplyr::filter(country == 'Estonia', year == 2019))

3.2 Plot up to five countires at a time

plot_many_country_years() generates a plot of the first five countries in the imprinting outputs, across an arbitrary number of years.

The plots on the left show imprinting probabilities for the latest observation year
The plots on the right show how H3N2 imprinting probabilities have changed as cohorts grow older from the earliest to the latest observation year in the outputs. The vertical lines and arrow show aging of the 1968 birth cohort.

plot_many_country_years(many_probabilities)

4. Explore the underlying data and components of the imprinting probabilities

Get the fraction of influenza circulation caused by each subtype in each epidemic year from 1918-2022 in the United States using get_country_cocirculation. Run ?get_country_cocirculation_data for notes on data sources.

get_country_cocirculation_data('United States', 2022)
#> # A tibble: 105 × 9
#>     year `A/H1N1` `A/H2N2` `A/H3N2`     A B     group1 group2 data_from         
#>    <dbl>    <dbl>    <dbl>    <dbl> <dbl> <lgl>  <dbl>  <dbl> <chr>             
#>  1  1918        1        0        0     1 NA         1      0 Historical_assump…
#>  2  1919        1        0        0     1 NA         1      0 Historical_assump…
#>  3  1920        1        0        0     1 NA         1      0 Historical_assump…
#>  4  1921        1        0        0     1 NA         1      0 Historical_assump…
#>  5  1922        1        0        0     1 NA         1      0 Historical_assump…
#>  6  1923        1        0        0     1 NA         1      0 Historical_assump…
#>  7  1924        1        0        0     1 NA         1      0 Historical_assump…
#>  8  1925        1        0        0     1 NA         1      0 Historical_assump…
#>  9  1926        1        0        0     1 NA         1      0 Historical_assump…
#> 10  1927        1        0        0     1 NA         1      0 Historical_assump…
#> # … with 95 more rows

Get the circulation intensity of influenza A in each epidemic year using get_country_intensity_data().

Intensity of 1 is average. Higher values indicate more influenza A circulation, lower values indicate less circulation.

See ?get_country_intensity_data for details on underlying data.

get_country_intensity_data(country = 'China', max_year = 2022)
#> # A tibble: 113 × 2
#>     year intensity
#>    <dbl>     <dbl>
#>  1  1911      1   
#>  2  1912      1   
#>  3  1913      1   
#>  4  1914      1   
#>  5  1915      1.12
#>  6  1916      1.29
#>  7  1917      1.23
#>  8  1918      2.5 
#>  9  1919      2.5 
#> 10  1920      1.87
#> # … with 103 more rows

Get the probability that an individual imprinted 0, 1, 2, … 12 years after birth

Use the function get_p_infection_year()

probs = get_p_infection_year(birth_year = 2000,
                     observation_year = 2022,
                     intensity_df = get_country_intensity_data('Mexico', 2022),
                     max_year = 2022)
names(probs) = as.character(2000+(0:12))
probs
#>        2000        2001        2002        2003        2004        2005 
#> 0.014797424 0.038348694 0.014337222 0.078764203 0.015525076 0.093914147 
#>        2006        2007        2008        2009        2010        2011 
#> 0.007673805 0.058215925 0.017624843 0.462559063 0.119102374 0.013750798 
#>        2012 
#> 0.036377222
sum(probs) ## Raw probabilities are not yet normalized.
#> [1] 0.9709908

norm_probs = probs/sum(probs) ## Normalize
sum(norm_probs)
#> [1] 1