When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case–control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e., bias in model predictions), which is quantified by the ratio of the number of observed events, $\mathcal{O}$, to expected events, $\mathcal{E}$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration, which requires the weighted sum of the auxiliary statistics in the complete-data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $\mathcal{E}$. We derive analytic variance formulas for $\mathcal{O}/\mathcal{E}$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.
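The survey-calibration idea above can be illustrated with a short sketch. This is not the authors' implementation; all variable names, coefficients, and sampling fractions below are illustrative assumptions. It simulates a case-cohort-style subsample with known inclusion probabilities, forms inverse probability weights, then linearly calibrates them so the weighted sums of the auxiliary statistics (an intercept and a pseudo-risk built from a fully observed covariate) match their full-cohort totals, and finally computes $\mathcal{O}/\mathcal{E}$ with the adjusted weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical full cohort (all quantities here are illustrative) ---
n = 10_000
z = rng.normal(size=n)                      # covariate available for everyone
x = z + rng.normal(scale=0.5, size=n)       # expensive covariate, only in subset
true_risk = 1 / (1 + np.exp(-(-2.0 + 1.0 * x)))
y = rng.binomial(1, true_risk)              # event indicator, known for all

# Model-based risk depends on x, so it is computable only in the subset.
model_risk = 1 / (1 + np.exp(-(-2.0 + 0.9 * x)))
# Pseudo-risk: an approximation using only full-cohort variables (z).
pseudo_risk = 1 / (1 + np.exp(-(-2.0 + 0.9 * z)))

# --- Subsample with known inclusion probabilities (case-cohort style) ---
pi = np.where(y == 1, 1.0, 0.2)             # all cases, 20% of non-cases
in_sub = rng.random(n) < pi
w = 1.0 / pi[in_sub]                        # inverse probability weights

# --- Linear (GREG-type) survey calibration of the weights ---
# Auxiliary vector: intercept + pseudo-risk; totals known in the full cohort.
A = np.column_stack([np.ones(in_sub.sum()), pseudo_risk[in_sub]])
T = np.array([n, pseudo_risk.sum()])        # full-cohort auxiliary totals
lam = np.linalg.solve(A.T @ (w[:, None] * A), T - A.T @ w)
w_cal = w * (1.0 + A @ lam)

# The calibration constraint holds: weighted auxiliary sums equal the totals.
assert np.allclose(A.T @ w_cal, T)

# --- O/E with adjusted weights ---
O = y.sum()                                 # events observed in the full cohort
E = np.sum(w_cal * model_risk[in_sub])      # expected events, calibrated weights
print(f"O/E = {O / E:.3f}")
```

Because the pseudo-risk is highly correlated with the model-based risk, the calibrated estimate of $\mathcal{E}$ borrows information from the whole cohort, which is the source of the efficiency gain the abstract describes.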

This work is written by (a) US Government employee(s) and is in the public domain in the US.