The Challenge of Missing Data
Missing data is a pervasive issue in research, particularly in systematic reviews1. It arises when information expected for analysis is unavailable because of factors such as participant dropout, data loss, or incomplete reporting2. Missing data can severely compromise the validity and reliability of systematic review findings3. When handled improperly, it can skew statistical analyses, introduce bias, and lead to flawed conclusions3. Understanding its implications and applying appropriate handling methods is therefore essential.
Types of Missing Data
Missing data is categorized by the reason for its absence. A key distinction is made between data "missing at random" (MAR) and "not missing at random" (NMAR, also written MNAR)5. Even under plausible assumptions, missing data can significantly affect meta-analyses. One study found that in 6% to 22% of meta-analyses the effect estimate crossed the null effect threshold because of missing data, sometimes even reversing the direction of the effect6.
Data missing at random (MAR) occurs when missingness correlates with observed data but not the missing data itself. For instance, trial participants experiencing side effects might drop out, causing missing outcome data. If missingness relates to observed side effects, not the missing outcome, the data is considered MAR5.
Data not missing at random (NMAR) occurs when missingness relates to the missing data itself. For example, if relapse in a depression study leads to missed follow-up and missing outcome data, the missingness is directly related to the outcome (relapse), making it NMAR5.
Understanding the type of missing data is crucial for selecting appropriate handling methods.
Missing Data Mechanisms
Three primary mechanisms explain how data can be missing, defining the relationship between missing data and underlying values:
Missing completely at random (MCAR): The probability of missingness is entirely independent of observed and unobserved data. Randomly lost questionnaires exemplify MCAR.
Missing at random (MAR): As previously explained, MAR occurs when the probability of missingness depends on observed, not missing, data.
Missing not at random (MNAR): This is the category labeled NMAR above. As previously described, it occurs when missingness is related to the missing values themselves5.
These mechanisms are crucial for selecting appropriate handling methods and interpreting analysis results.
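The three mechanisms can be made concrete with a small simulation. The following is a minimal sketch, not a reference implementation: the sample size, dropout probabilities, outcome scale, and the helper names `observed_mean` and `stratified_mean` are all assumptions chosen for illustration. It compares the mean of the outcomes that remain observed against the mean of the full data under each mechanism, and for MAR it also shows a simple adjustment that uses the observed side-effect flag.

```python
import random
import statistics

# Assumed toy trial: a side-effect flag that is always observed, and an
# outcome score that is lower on average for participants with side effects.
rng = random.Random(1)
rows = []
for _ in range(20000):
    side_effect = rng.random() < 0.4
    outcome = rng.gauss(50, 10) - (8 if side_effect else 0)
    rows.append((side_effect, outcome))

full_mean = statistics.mean([y for _, y in rows])

# MCAR: every outcome has the same 30% chance of being lost.
mcar_keep = [rng.random() >= 0.3 for _ in rows]
# MAR: dropout depends only on the observed side-effect flag.
mar_keep = [rng.random() >= (0.6 if se else 0.1) for se, _ in rows]
# MNAR: dropout depends on the (unobserved) outcome value itself.
mnar_keep = [rng.random() >= (0.8 if y > 55 else 0.0) for _, y in rows]

def observed_mean(rows, keep):
    """Mean of the outcomes that were not lost."""
    return statistics.mean([y for (_, y), k in zip(rows, keep) if k])

def stratified_mean(rows, keep):
    """Average the observed outcomes within each side-effect group, then
    weight the groups by their share of all participants."""
    total = 0.0
    for flag in (True, False):
        group = [(y, k) for (se, y), k in zip(rows, keep) if se == flag]
        seen = [y for y, k in group if k]
        total += statistics.mean(seen) * len(group) / len(rows)
    return total

mcar_mean = observed_mean(rows, mcar_keep)
mar_mean = observed_mean(rows, mar_keep)
mar_adjusted = stratified_mean(rows, mar_keep)
mnar_mean = observed_mean(rows, mnar_keep)

print(f"full data      {full_mean:.2f}")
print(f"MCAR observed  {mcar_mean:.2f}")
print(f"MAR observed   {mar_mean:.2f}  adjusted {mar_adjusted:.2f}")
print(f"MNAR observed  {mnar_mean:.2f}")
```

In this setup the MCAR estimate stays close to the full-data mean, the unadjusted MAR estimate drifts but can be corrected with the observed covariate, and the MNAR estimate is biased in a way the observed data alone cannot reveal.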
Methods for Handling Missing Data
Several methods can handle missing data, but the most effective strategy is to minimize it during study design. This can be achieved by limiting data collection to essential information, minimizing the number of follow-up visits, and using user-friendly forms5. When missing data is unavoidable:
1. Analyzing Available Data
This common approach, often the default in statistical software7, analyzes only available data, ignoring missing data. However, it can bias estimates if missing data are not MCAR7.
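As a minimal sketch of what analyzing only the available data means in practice, the example below uses made-up participant records (the field names and values are hypothetical). It contrasts two common variants: keeping only records that are complete on every variable, and using every observed value of a single variable.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 54, "baseline": 21.0, "followup": 15.0},
    {"id": 2, "age": 61, "baseline": None, "followup": 18.0},
    {"id": 3, "age": 47, "baseline": 25.0, "followup": None},
    {"id": 4, "age": 58, "baseline": 19.0, "followup": 12.0},
    {"id": 5, "age": None, "baseline": 23.0, "followup": 20.0},
]

def complete_cases(rows, variables):
    """Keep only rows with no missing value in any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def available_values(rows, variable):
    """Keep every observed value of one variable, ignoring the others."""
    return [r[variable] for r in rows if r[variable] is not None]

complete = complete_cases(records, ["age", "baseline", "followup"])
followup_values = available_values(records, "followup")

print([r["id"] for r in complete])   # ids of fully observed records
print(len(followup_values))          # observed follow-up values
```

Here three of five records are dropped from the complete-case set even though only one value is missing in each, which illustrates how quickly the usable sample can shrink.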
2. Imputation
Imputation replaces missing values with estimated values. Choosing the right method significantly influences conclusions8. Techniques include:
Mean imputation: Replacing missing values with the mean of observed values. This simple method can underestimate variance and bias results9.
Last observation carried forward: Carrying the last observed value forward to fill later missing values. Often used in longitudinal studies, it assumes the outcome does not change after the last observation and is unsuitable for non-monotonic data5.
Regression imputation: Predicting missing values using a regression model based on other variables. This can be more accurate but requires careful model specification9.
Multiple imputation: Creating multiple completed datasets, each with different plausible values in place of the missing ones, analyzing each dataset, and combining the results. This robust method accounts for imputation uncertainty10.
The choice of imputation method depends on the data, reasons for missingness, and research question.
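Some of the techniques above can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not a recommended implementation: the function names are hypothetical, the data are invented, and the multiple imputation step is deliberately simplified (it draws missing values from a normal distribution fitted to the observed values and does not propagate uncertainty in that fitted model, which a full procedure would). The pooling step follows Rubin's rules: the pooled estimate is the average across datasets, and the total variance is the average within-dataset variance plus (1 + 1/m) times the between-dataset variance.

```python
import random
import statistics

def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def locf(series):
    """Carry the last observed value forward over later gaps."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first value is seen
    return filled

def pool_rubin(estimates, variances):
    """Combine m per-dataset estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)  # divisor m - 1
    return pooled, within + (1 + 1 / m) * between

def multiply_impute_mean(values, m=20, seed=0):
    """Simplified multiple imputation for estimating a mean."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        # Each completed dataset gets its own random draws for the gaps.
        completed = [rng.gauss(mu, sd) if v is None else v for v in values]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)

scores = [4.0, None, 6.0, 8.0, None, 10.0]
filled = mean_impute(scores)
observed_var = statistics.variance([v for v in scores if v is not None])

print(filled)
print(observed_var, statistics.variance(filled))  # variance shrinks
print(locf([5.0, None, None, 7.0, None]))
print(multiply_impute_mean(scores))
```

The second printed line shows the drawback noted for mean imputation: the filled-in dataset has a smaller sample variance than the observed values alone, because every imputed value sits exactly at the mean.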
3. Sensitivity Analysis
Sensitivity analysis assesses how different assumptions about missing data affect results11. Missing data can reduce statistical power, and sensitivity analysis helps identify the extent of this impact12. For example, the analysis can be run once assuming that all missing values represent poor outcomes and again assuming that they all represent good outcomes; comparing the two results shows how much the conclusions could depend on the missing data11.
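The poor-outcome and good-outcome scenarios can be worked through for a binary outcome. The sketch below uses invented counts for a two-arm trial in which an "event" is a poor outcome; the function name `risk_ratio` and all numbers are assumptions for illustration only.

```python
def risk_ratio(events_t, n_t, events_c, n_c):
    """Risk of the event in the treatment arm relative to control."""
    return (events_t / n_t) / (events_c / n_c)

# Hypothetical arms: observed events, observed participants, missing participants.
treatment = {"events": 20, "observed": 80, "missing": 20}
control = {"events": 30, "observed": 90, "missing": 10}

n_t = treatment["observed"] + treatment["missing"]
n_c = control["observed"] + control["missing"]

# Available data only: missing participants are left out.
available = risk_ratio(treatment["events"], treatment["observed"],
                       control["events"], control["observed"])

# Scenario favoring treatment: missing treated participants had good
# outcomes, missing control participants had poor outcomes.
best = risk_ratio(treatment["events"], n_t,
                  control["events"] + control["missing"], n_c)

# Scenario against treatment: the reverse assumption.
worst = risk_ratio(treatment["events"] + treatment["missing"], n_t,
                   control["events"], n_c)

print(f"available data  RR = {available:.2f}")
print(f"best case       RR = {best:.2f}")
print(f"worst case      RR = {worst:.2f}")
```

With these made-up counts the risk ratio is below 1 in the available-data analysis but above 1 in the scenario against treatment, so the apparent benefit is not robust to extreme assumptions about the missing participants.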
Reporting Missing Data
Transparent reporting of missing data is crucial13. Authors should report the extent and patterns of missing data, including the amount missing for each outcome, the reasons for missingness, and the handling methods used13. Studies should not be excluded solely because summary data are missing; instead, include them and discuss the potential implications14. For comprehensive reporting, plan the analysis in advance, document the missing data, and specify the imputation and sensitivity analyses used15.
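The amount missing for each outcome and the stated reasons for missingness can be tabulated directly from the data. The following is a minimal sketch with hypothetical field names (`pain`, `qol`, `reason`) and invented rows; it is one possible way to produce the counts, not a prescribed reporting format.

```python
from collections import Counter

# Hypothetical participant rows; None marks a missing outcome or no reason.
rows = [
    {"pain": 3, "qol": 60, "reason": None},
    {"pain": None, "qol": None, "reason": "withdrew"},
    {"pain": 5, "qol": None, "reason": "form incomplete"},
    {"pain": 2, "qol": 72, "reason": None},
    {"pain": None, "qol": None, "reason": "withdrew"},
]

def missingness_report(rows, outcomes):
    """Count and percentage of missing values for each outcome."""
    report = {}
    for name in outcomes:
        missing = sum(1 for r in rows if r[name] is None)
        report[name] = {
            "missing": missing,
            "total": len(rows),
            "percent": round(100 * missing / len(rows), 1),
        }
    return report

report = missingness_report(rows, ["pain", "qol"])
reasons = Counter(r["reason"] for r in rows if r["reason"] is not None)

for name, stats in report.items():
    print(name, stats)
print(dict(reasons))
```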
Considerations for Healthcare Reviews
Missing data is prevalent in healthcare research because of patient dropout, loss to follow-up, and incomplete records16, with prevalence sometimes reaching 60%17. In reviews of healthcare interventions, the effect of missing data on treatment effect estimates is a central concern1. If patients who experience adverse events have more missing data, ignoring the missing data could overestimate treatment effectiveness1.
A study examined missing data handling in healthcare systematic reviews18.
Conclusion
Missing data is an unavoidable challenge in systematic reviews. Understanding the types of missing data, using appropriate handling methods, and reporting transparently can minimize its impact on validity and reliability. The MCAR, MAR, and MNAR mechanisms each pose distinct challenges, so the handling method must be chosen with care. Sensitivity analysis assesses how robust results are to different assumptions about missing data. Standardized reporting guidelines support transparency and aid interpretation. In healthcare reviews, the high prevalence of missing data calls for careful attention to its potential impact on treatment effect estimates. Minimizing missing data during study design and using robust handling and reporting methods are crucial for accurate and reliable evidence in healthcare decision-making.