Methods of Handling Missing Data in Network Meta-Analyses: A Comparison of Multiple Imputation Techniques

Network meta-analysis (NMA) is a powerful statistical technique that synthesizes evidence from multiple studies comparing different interventions. However, missing data is a common challenge in NMAs, potentially leading to biased estimates, reduced precision, and misleading conclusions if not addressed appropriately 1. Multiple imputation (MI) has gained prominence as a robust method for handling missing data in various statistical analyses, including NMAs. MI involves generating multiple plausible replacements for missing values, creating multiple complete datasets. These datasets are then analyzed separately, and the results are pooled to provide estimates that incorporate the uncertainty associated with the missing data 2. This article offers a comprehensive overview of MI techniques employed in NMAs, comparing their strengths and weaknesses, and discussing factors to consider when selecting the most suitable technique.

Research Methodology

This article is based on a comprehensive research process that involved the following steps:

  1. Identifying Relevant Literature: A thorough search was conducted to identify research papers, articles, and studies on multiple imputation techniques used in network meta-analyses. This included searching academic databases, online repositories, and relevant websites.
  2. Gathering Information on Missing Data Handling: Information was collected on different methods of handling missing data in network meta-analyses, including complete-case analysis, single imputation methods, and multiple imputation techniques.
  3. Comparing Multiple Imputation Techniques: Studies comparing the performance of different multiple imputation techniques in network meta-analyses were reviewed and analyzed.
  4. Evaluating Strengths and Weaknesses: Information on the strengths and weaknesses of each multiple imputation technique was gathered, considering factors such as flexibility, efficiency, and computational complexity.
  5. Assessing Impact on Results: Studies investigating the impact of different multiple imputation techniques on the results of network meta-analyses were examined to understand the potential influence on treatment effects, precision, and overall conclusions.
  6. Identifying Decision-Making Factors: Information was gathered on the factors to consider when choosing a multiple imputation technique for network meta-analyses, including the type of missing data, complexity of the data, and available resources.

The findings from these research steps are synthesized and presented in this article to provide a comprehensive overview of multiple imputation techniques for handling missing data in network meta-analyses.

Methods of Handling Missing Data in Network Meta-Analyses

Network meta-analyses frequently encounter missing data, which can arise from various sources, such as participant dropout, incomplete data collection, or study-level exclusions. The presence of missing data can introduce bias and uncertainty into the analysis, potentially leading to inaccurate conclusions. Several methods exist for handling missing data in NMAs, each with its own assumptions and limitations.

The simplest approach is a complete-case analysis, which excludes individuals with any missing data 3. While straightforward, this method can result in a substantial loss of information and biased results, especially if the missing data is not missing completely at random (MCAR). When data are not MCAR, the observed data may not accurately represent the entire population, leading to biased estimates.

More sophisticated approaches include single imputation methods, where missing values are replaced with a single estimate, such as the mean, median, or a value predicted from a regression model 4. However, single imputation methods fail to account for the uncertainty associated with the imputed values, leading to underestimation of standard errors and potentially misleading inferences.
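The understatement of uncertainty can be seen in a small sketch. The values below are hypothetical and the example uses only mean imputation; it simply shows that filling every gap with the observed mean shrinks the naive standard error, even though no new information has been added.

```python
import statistics

# Hypothetical outcome values; None marks a missing observation.
values = [4.0, 6.0, None, 5.0, None, 9.0, 6.0, None]

observed = [v for v in values if v is not None]
mean_obs = statistics.mean(observed)

# Single (mean) imputation: every gap receives the same value.
completed = [mean_obs if v is None else v for v in values]

# Naive standard error of the mean, treating imputed values as real data.
se_observed = statistics.stdev(observed) / len(observed) ** 0.5
se_completed = statistics.stdev(completed) / len(completed) ** 0.5

print(f"mean (observed only): {mean_obs:.3f}")
print(f"SE from observed values only: {se_observed:.3f}")
print(f"SE after mean imputation:     {se_completed:.3f}")
```

The point estimate is unchanged, but the standard error computed from the completed data is smaller because the imputed values add sample size without adding variability.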

Multiple imputation (MI) addresses this limitation by generating multiple imputed datasets, so that imputation uncertainty is carried into the final estimates. Standard MI procedures assume that the data are missing at random (MAR), meaning that, conditional on the observed data, the probability of missingness does not depend on the unobserved values. The MAR assumption cannot be verified from the observed data alone, so it is important to judge its plausibility and to run sensitivity analyses that assess how robust the results are to departures from MAR 5.
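The pooling step is usually carried out with Rubin's rules, which combine the within-imputation and between-imputation variance. The sketch below is a minimal illustration; the function name `pool_rubin` and the example numbers (three log odds ratios with equal variances) are assumptions made for demonstration, and the degrees-of-freedom adjustment used for confidence intervals is omitted.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical log odds ratios from m = 3 imputed datasets.
estimates = [0.40, 0.50, 0.60]
variances = [0.04, 0.04, 0.04]

pooled, se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, standard error = {se:.3f}")
```

The pooled standard error is larger than the standard error from any single imputed dataset (0.2 here), which is how the extra uncertainty due to missingness enters the final result.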

In addition to MI, pattern-mixture models offer another approach to handle missing outcome data in NMAs 6. Pattern-mixture models explicitly model the distribution of the outcome variable, conditional on the missing data pattern. This approach allows for exploring the potential impact of different missingness mechanisms on the results.
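For a binary outcome, one parameterization often used in pattern-mixture models is the informative missingness odds ratio (IMOR): the odds of an event among missing participants divided by the odds among observed participants. The IMOR is not named in the text above and is introduced here as an assumption for illustration; the function name and the counts are hypothetical. An IMOR of 1 corresponds to MAR within the arm.

```python
def risk_under_imor(events, observed, missing, imor):
    """Arm-level risk when missing participants have odds = imor * observed odds.

    Assumes 0 < events < observed so that the observed odds are finite.
    """
    p_obs = events / observed
    odds_missing = imor * p_obs / (1 - p_obs)
    p_missing = odds_missing / (1 + odds_missing)
    frac_missing = missing / (observed + missing)
    # Mix the two patterns (observed and missing) by their proportions.
    return (1 - frac_missing) * p_obs + frac_missing * p_missing

# Hypothetical arm: 30 events among 80 observed, 20 participants missing.
for imor in (0.5, 1.0, 2.0):
    risk = risk_under_imor(30, 80, 20, imor)
    print(f"IMOR = {imor}: overall risk = {risk:.3f}")
```

Varying the IMOR over a plausible range and re-running the analysis is one simple way to explore how sensitive the conclusions are to the missingness mechanism.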

The following table, adapted from 5, summarizes different methods for handling missing data in meta-analysis, including their descriptions and assumptions about missing outcome data:

| Method | Description | Assumptions about missing outcome data |
| --- | --- | --- |
| Available case analysis | Ignores missing participants | MAR |
| Impute failure | Imputes missing values as failures | Always failures |
| Worst (best)-case scenario | Imputes failures in the treatment arm and successes in the control (or vice versa) | Always failures or always successes, depending on arm |
| Last observation carried forward | Imputes missing values with the participants' last observation | The missing value for a participant has the same mean as the last observed value |
| Single imputation | Imputes missing values, usually borrowing information from observed outcomes (not necessarily from the same arm or study) | Missing values equal a prespecified value without uncertainty |
| Multiple imputation | Builds a model to predict missing outcome from the participants' observed outcome, and adds appropriate random error | MAR |
| Likelihood methods | Fits a model to the observed data | MAR |
| Likelihood methods | Fits a model to the observed data and the probability of being missing | MNAR |
| Pattern mixture model | Builds a model for the outcome conditional on whether it is missing or not and a model for the missingness mechanism | Addresses departures from the MAR assumption (MNAR) |
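The scenario-based rows of the table can be made concrete with a small worked example. The counts below are hypothetical, the event is taken to be a failure (for example, relapse), and the risk ratio is used only as an illustrative effect measure.

```python
def risk_ratio(events_t, n_t, events_c, n_c):
    """Risk ratio of treatment versus control."""
    return (events_t / n_t) / (events_c / n_c)

# Hypothetical two-arm trial; events are failures.
treat = {"events": 20, "observed": 80, "missing": 20}
ctrl = {"events": 30, "observed": 90, "missing": 10}
n_t = treat["observed"] + treat["missing"]
n_c = ctrl["observed"] + ctrl["missing"]

# Available case analysis: missing participants are ignored.
rr_available = risk_ratio(treat["events"], treat["observed"],
                          ctrl["events"], ctrl["observed"])

# Impute failure: every missing participant counts as a failure.
rr_failure = risk_ratio(treat["events"] + treat["missing"], n_t,
                        ctrl["events"] + ctrl["missing"], n_c)

# Worst case for treatment: failures in treatment, successes in control.
rr_worst = risk_ratio(treat["events"] + treat["missing"], n_t,
                      ctrl["events"], n_c)

# Best case for treatment: successes in treatment, failures in control.
rr_best = risk_ratio(treat["events"], n_t,
                     ctrl["events"] + ctrl["missing"], n_c)

for label, rr in [("available case", rr_available), ("impute failure", rr_failure),
                  ("worst case", rr_worst), ("best case", rr_best)]:
    print(f"{label:>15}: RR = {rr:.2f}")
```

In this made-up example the risk ratio ranges from 0.50 to 1.33 depending on the scenario, which shows how strongly these fixed assumptions can drive the result when missingness is substantial.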

Multiple Imputation Techniques in Network Meta-Analyses

Multiple imputation techniques provide a statistically sound approach to address missing data in NMAs by generating multiple plausible replacements for missing values. This accounts for the uncertainty associated with the missing data and leads to more valid inferences. Various MI techniques can be employed in NMAs, each with its own strengths and weaknesses.

Traditional Statistical Methods

  • Multiple Imputation by Chained Equations (MICE): This widely used approach imputes missing values sequentially for each variable, using a series of regression models 7. MICE is flexible and can handle different types of variables (continuous, categorical, etc.). It is particularly useful when the relationships between variables are complex or not well-defined.
  • Joint Modeling Multiple Imputation: This technique involves specifying a joint model for the observed and missing data, assuming a multivariate distribution 2. Joint modeling can be more efficient than MICE when the relationships between variables are well-defined, as it leverages the correlation structure between variables to improve imputation accuracy.
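The chained-equations idea can be sketched for two continuous variables using only the standard library. This is a simplified illustration, not the algorithm of any particular package: the function names and data are hypothetical, each conditional model is a simple linear regression, and only residual noise is added (a full implementation would also draw the regression parameters from their posterior distribution).

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for ys ~ xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd

def chained_imputation(rows, n_iter=10, seed=0):
    """Return one completed copy of rows; each row is [a, b], None = missing."""
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    # Step 0: fill every gap with the observed column mean.
    for j in range(2):
        col_mean = statistics.mean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = col_mean
    # Cycle through the variables, re-imputing each from the other.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other variable acts as predictor
            obs = [i for i, r in enumerate(rows) if r[j] is not None]
            a, b, sd = fit_line([data[i][k] for i in obs],
                                [data[i][j] for i in obs])
            for i, r in enumerate(rows):
                if r[j] is None:
                    data[i][j] = a + b * data[i][k] + rng.gauss(0, sd)
    return data

# Hypothetical data with gaps in both variables.
rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.1],
        [5.0, 9.8], [6.0, None], [7.0, 14.3], [None, 16.0]]

# m = 5 imputed datasets, one per random seed.
completed = [chained_imputation(rows, seed=s) for s in range(5)]
for d in completed:
    print([round(v, 2) for v in d[1]])
```

Each completed dataset would then be analyzed separately and the results pooled, so the variation between the imputed values across datasets is what carries the imputation uncertainty forward.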

Advanced Techniques

  • Predictive Mean Matching: This method fits a prediction model and then fills each missing value with an observed value drawn from "donor" cases whose predicted values are closest to the prediction for the missing case 8. Because every imputed value is one that was actually observed, predictive mean matching preserves the distribution of the observed data and is less prone to generating implausible values than methods that impute directly from a fitted model. It is particularly useful for continuous variables.
  • Meta-Analysis with Within-Site Multiple Imputation: This approach combines meta-analysis with within-site multiple imputation to estimate the average causal effect in multi-site studies with missing data 9. This method allows for handling missing data without the need for pooling individual-level data across sites, which can be advantageous in situations where data sharing is restricted.
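Predictive mean matching, described in the list above, can be sketched as follows. The function name, the single-predictor linear model, and the data are hypothetical choices for illustration; a full implementation would also draw the regression parameters at random for each imputed dataset.

```python
import random
import statistics

def pmm_impute(x, y, k=3, seed=0):
    """Impute None entries of y by predictive mean matching on predictor x."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    # Fit y ~ x by least squares on the observed cases.
    mx = statistics.mean(x[i] for i in obs)
    my = statistics.mean(y[i] for i in obs)
    sxx = sum((x[i] - mx) ** 2 for i in obs)
    slope = sum((x[i] - mx) * (y[i] - my) for i in obs) / sxx
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Donors: the k observed cases with the closest predicted values.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            # Impute the *observed* value of a randomly chosen donor.
            completed[i] = y[rng.choice(donors)]
    return completed

# Hypothetical data: y is missing for two cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.1, None, 8.2, 9.9, None, 14.1, 16.0]

imputed = pmm_impute(x, y)
print(imputed)
```

Every imputed entry is copied from an observed case, which is why the method cannot produce values outside the observed range.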

Machine Learning-Based Imputation

Recent advances in machine learning have led to the development of MI techniques based on algorithms like k-Nearest Neighbors (KNN) and Generative Adversarial Networks (GANs) 10. These methods can capture complex relationships in the data and may outperform traditional statistical methods in certain situations, particularly when dealing with high-dimensional data or complex missingness patterns. Deep learning approaches, such as variational auto-encoders (VAEs) and GANs, have shown promise in capturing the underlying distribution of the data and providing robust imputations 11. These methods leverage the power of neural networks to learn complex patterns and relationships in the data, potentially leading to more accurate imputations.
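The simplest of these ideas, k-nearest-neighbors imputation, can be sketched in a few lines. The function name and data are hypothetical, the distance is a root-mean-square difference over the features both rows have observed, and the sketch produces a single deterministic imputation; turning it into multiple imputation would require an added source of randomness, such as sampling among the neighbors.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows' values."""
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                # Compare only on features observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

# Hypothetical study-level features; the second row lacks its third value.
rows = [[1.0, 10.0, 100.0],
        [1.1, 11.0, None],
        [5.0, 50.0, 500.0],
        [0.9, 9.0, 90.0]]

filled = knn_impute(rows)
print(filled[1])
```

In practice the features would normally be standardized first, since an unscaled distance is dominated by whichever variable has the largest numeric range.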

Furthermore, integrative imputation techniques, which leverage information across multiple omics datasets, are expected to outperform single-omics approaches 12. By considering multiple data sources, these methods can capture a more comprehensive picture of the underlying biological processes and improve imputation accuracy.

Comparing and Assessing the Impact of Multiple Imputation Techniques

Studies comparing different MI techniques in NMAs have found that the choice of method can impact the results, particularly when the proportion of missing data is high or the missingness mechanism is complex 10. A study reported by Mirage News found that machine learning techniques, particularly GAN-based methods, outperformed traditional statistical approaches such as MICE in handling missing data in both longitudinal and cross-sectional datasets 10. However, a study indexed in PubMed found that MICE and joint model imputation produced similar results in a multilevel setting, even though the two approaches differ in theory 14. Taken together, these findings suggest that the best MI technique depends on the specific characteristics of the data and of the NMA being conducted.

The choice of MI technique can influence the results of NMAs, affecting the estimated treatment effects, their precision, and the overall conclusions. A study indexed in PubMed investigated the impact of MI on the relative effectiveness of treatments in psychiatric trials 15. It found that the choice of imputation method affected the heterogeneity estimates and, in some cases, the ranking of treatments. Another study, available on ResearchGate, examined the impact of different MI techniques on meta-regression analyses 16. The authors found that complete-case analysis led to biased estimates, while MI provided unbiased estimates with minimal loss of precision. These studies highlight the importance of carefully considering the potential impact of MI on NMA results.

Strengths and Weaknesses of Multiple Imputation Techniques

Each MI technique has its own strengths and weaknesses, which should be carefully considered when selecting a method for a specific NMA.

MICE is a versatile and widely applicable method, but it may be less efficient than joint modeling when strong relationships exist between variables 1. Joint modeling can be more powerful but requires careful specification of the joint distribution. Predictive mean matching is effective for preserving the distribution of continuous variables but may not be suitable for categorical variables. Machine learning-based methods can capture complex patterns but may be more computationally intensive and require larger datasets.

| Method | Strengths | Weaknesses |
| --- | --- | --- |
| MICE | Flexible, handles different variable types, widely applicable | May be less efficient than joint modeling when strong relationships exist between variables |
| Joint Modeling | More efficient when relationships are well-defined, leverages correlation structure for improved accuracy | Requires careful specification of the joint distribution |
| Predictive Mean Matching | Preserves distribution of continuous variables, less prone to generating implausible values | May not be suitable for categorical variables |
| Machine Learning-Based | Captures complex patterns, potentially more accurate for high-dimensional data | Computationally intensive, may require larger datasets |

Factors to Consider When Choosing a Multiple Imputation Technique

Selecting the most appropriate MI technique for an NMA requires careful consideration of several factors:

| Factor | Description |
| --- | --- |
| Type of missing data | The nature of the missing data (continuous, categorical, etc.) and the missingness mechanism (MCAR, MAR, MNAR) can influence the choice of imputation method 17. Continuous data may be better suited for methods like predictive mean matching, while categorical data may require different approaches. Understanding the missingness mechanism is crucial for selecting a method that aligns with the assumptions about the missing data. |
| Complexity of the data | The relationships between variables and the presence of multilevel structures should be considered when selecting an imputation model 18. Joint modeling may be preferred when strong relationships exist between variables, while MICE may be more suitable for complex data with less well-defined relationships. |
| Software availability | Different MI techniques are implemented in various software packages, and the availability of suitable software may influence the choice of method 7. Researchers should ensure that the chosen method is readily available in the software they are using for the NMA. |
| Computational resources | Some MI techniques, particularly those based on machine learning, can be computationally intensive, and the available computational resources may be a factor in the decision-making process 19. Researchers should consider the computational demands of different methods and choose a technique that is feasible given their available resources. |

Conclusion

Multiple imputation is a valuable tool for handling missing data in network meta-analyses, offering a statistically sound approach to address the challenges posed by missingness. However, the choice of imputation technique can influence the results, and careful consideration should be given to the type of missing data, the complexity of the data, and the available resources. Researchers should carefully evaluate the strengths and weaknesses of different MI techniques and select a method that aligns with the specific characteristics of their data and the research question.

While traditional statistical methods like MICE and joint modeling have been widely used, recent advances in machine learning have led to the development of promising new MI techniques. These methods, particularly those based on deep learning, have the potential to capture complex relationships in the data and improve imputation accuracy. Furthermore, integrative imputation techniques that leverage information across multiple datasets offer another avenue for enhancing the handling of missing data in NMAs.

Future research should focus on further developing and evaluating new MI techniques, particularly those based on machine learning and integrative approaches, to further improve the accuracy and efficiency of NMAs in the presence of missing data. This will contribute to more robust and reliable evidence synthesis in healthcare and other fields.

Works cited

1. A comparison of existing methods for multiple imputation in ..., accessed on January 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC5582994/

2. Missing data - IPD Meta-Analysis, accessed on January 16, 2025, https://www.ipdma.co.uk/missing-data

3. www.cmu.edu, accessed on January 16, 2025, https://www.cmu.edu/joss/content/articles/volume10/huisman.pdf

4. 16.1.2 General principles for dealing with missing data, accessed on January 16, 2025, https://handbook-5-1.cochrane.org/chapter_16/16_1_2_general_principles_for_dealing_with_missing_data.htm

5. Dealing with missing outcome data in meta‐analysis - PMC, accessed on January 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC7003862/

6. Continuous(ly) missing outcome data in network meta-analysis: A ..., accessed on January 16, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8209314/

7. Multiple Imputation with the mice and metafor Packages [The ..., accessed on January 16, 2025, https://www.metafor-project.org/doku.php/tips:multiple_imputation_with_mice_and_metafor

8. How should I determine what imputation method to use? - Cross Validated, accessed on January 16, 2025, https://stats.stackexchange.com/questions/541337/how-should-i-determine-what-imputation-method-to-use

9. Combining meta-analysis with multiple imputation for one-step ..., accessed on January 16, 2025, https://pubmed.ncbi.nlm.nih.gov/37527843/

10. AI Beats Stats in Tackling Missing Health Data | Mirage News, accessed on January 16, 2025, https://www.miragenews.com/ai-beats-stats-in-tackling-missing-health-data-1391435/

11. A Comparative Study on Imputation Techniques: Introducing a Transformer Model for Robust and Efficient Handling of Missing EEG Amplitude Data - MDPI, accessed on January 16, 2025, https://www.mdpi.com/2306-5354/11/8/740

12. A Review of Integrative Imputation for Multi-Omics Datasets - Frontiers, accessed on January 16, 2025, https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2020.570255/full

13. A Benchmark for Data Imputation Methods - Frontiers, accessed on January 16, 2025, https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2021.693674/full

14. A Comparison of Existing Methods for Multiple Imputation in ..., accessed on January 16, 2025, https://pubmed.ncbi.nlm.nih.gov/28695667/

15. Evaluating the impact of imputations for missing participant outcome ..., accessed on January 16, 2025, https://pubmed.ncbi.nlm.nih.gov/23321265/

16. Using multiple imputation to estimate missing data in meta-regression - ResearchGate, accessed on January 16, 2025, https://www.researchgate.net/publication/269041723_Using_multiple_imputation_to_estimate_missing_data_in_meta-regression

17. Effective Strategies to Handle Missing Values in Data Analysis, accessed on January 16, 2025, https://www.analyticsvidhya.com/blog/2021/10/handling-missing-value/

18. 6.1 Overview of modeling choices - Stef van Buuren, accessed on January 16, 2025, https://stefvanbuuren.name/fimd/sec-choices.html

19. Missing Data and Multiple Imputation | Columbia University Mailman ..., accessed on January 16, 2025, https://www.publichealth.columbia.edu/research/population-health-methods/missing-data-and-multiple-imputation