Skip to contents


imprinting is an R package to reconstruct birth year-specific probabilities of imprinting to influenza A. Reconstructions are done following the methods of Gostic et al. (2016).

By default the package calculates subtype-specific probabilities of imprinting to H1N1, H2N2, and H3N2, but it is possible to calculate other kinds of imprinting probabilities (e.g. clade-level, strain-level, vaccination) using user-provided circulation frequencies. We may consider including influenza B-specific probabilities in future updates if there is demand. Please submit an issue to let us know if you need influenza B-specific functionality.

The generated probabilities may be modified, plotted, and used in independent research. Please cite Gostic et al. (2016), and this package (citation("imprinting")).


imprinting is not yet available on CRAN, but the development version can be installed from github, using the devtools package:

# Install devtools
# Use devtools to install impritning from github



## Get imprinting probabilities
probs = get_imprinting_probabilities(observation_years = c(2000, 2022),
                             countries = c('United States', 'Brazil', 'Vietnam', 'Indonesia'))
#> # A tibble: 3,360 × 5
#>    year country       birth_year subtype imprinting_prob
#>   <dbl> <chr>              <dbl> <chr>             <dbl>
#> 1  2000 United States       2022 A/H1N1        0.297    
#> 2  2022 United States       2022 A/H1N1        0.0000297
#> 3  2000 Brazil              2022 A/H1N1        0.331    
#> 4  2022 Brazil              2022 A/H1N1        0        
#> 5  2000 Vietnam             2022 A/H1N1        0.267    
#> # … with 3,355 more rows
## Plot imprinting probabilities 
##  - up to five countries
##  - show aging of birth cohorts from first to last observation year

plot of chunk unnamed-chunk-3

## Plot imprinting probabilities
##  - select the first country year in the probs tibble by default

plot of chunk unnamed-chunk-4


There are two main kinds of data used to reconstruct birth year-specific imprinting probailities. Because there is no single epidemiological data source that documents the entire circulation history of influenza from 1918-present, we pull from various sources.

1. Cocirculation data

Cocirculation data describes the relative dominance of each influenza A subtype (H1N1, H2N2, H3N2) in circulation each year. It is informed by the following data:

  • 1918-1976 Based on the available historical data, we assume only one subtype circulated each year:

    • 1918-1956: H1N1
    • 1957-1967: H2N2
    • 1968-1976: H3N2
  • 1977-1996 Two subtypes, H1N1 and H3N2, co-circulated globally, but country-specific influenza surveillance data are not publicly available for this time period. We use data on the number of influenza specimens that tested positive for each influenza subtype in US surveillance (Table 1 of Thompson et al. JAMA, 2003) to calculate the fraction of circulation caused by each subtype.

  • 1996-present Two subtypes, H1N1 and H3N2, continue to co-circulate globally. We pull country-specific data from WHO Flu Mart() to estimate the fraction of specimens that tested positive for subtype H1N1 or H3N2. If a country reported fewer than min_samples (default 30) influenza A-positive specimens in a given year, we substitute data from the country’s WHO Region to ensure a sufficient sample size.

2. Intensity data

Children are more likely to imprint in years with high levels of influenza A circulation. We define an annual intensity score to characterize annual circulation intensity, and to scale the annual probability of primary infection. By definition, a circulation intensity of 1 indicates average levels of influenza ciruclation. We use three kinds of data to characterize annual circulation intensity:

  • For 1918-1996, we calculate annual intensity using the methods of Gostic et al. (2016). From 1918-1976, intensity scores are informed by Pneumonia and Influenza excess mortaltity estimates Housworth et al. 1974, and from 1976-1977, intensity scores are informed by the influenza A test-positive fraction reported in Table 1 of Thompson et al. JAMA, 2003.

  • For 1997-present, we calculate country or region-specific intensities using surveillance data from WHO Flu Mart. Intensity is calculated as: [fraction of processed samples positive for flu A]/[mean fraction of processed samples positive for flu A]. Country-specific data are used by default. Regional data are substituted when there are an insufficient number of country-specific specimens.


Development of this R package was supported by the NIH Centers of Excellence in Influenza Research and Response (CEIRR) Network of the National Institutes of Health under contract number 75N93021C00015. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


To report bugs, please file an issue with a reproducible example at