Machine Learning for SR Study Selection: Cost & Efficiency Analysis

Published: January 15, 2025 | 10 min read

Introduction

Systematic reviews are an essential part of evidence-based healthcare. They involve a comprehensive search for, and synthesis of, relevant literature to answer a specific research question. Conducting one is time-consuming and resource-intensive, particularly at the study selection stage, where reviewers must sift through a large number of potentially relevant citations to identify eligible studies. The increasing availability of data and computing power has made machine learning (ML) a viable way to automate parts of this process and improve its efficiency [1]. This article explores the application of ML to systematic review study selection, focusing on its cost-effectiveness.

Research Methodology

The information presented in this article is based on a comprehensive research process that involved the following steps:

Identifying Relevant Literature: A systematic search was conducted across multiple databases, including PubMed, Google Scholar, and relevant journals, using keywords such as "machine learning," "systematic review," "study selection," and "cost-effectiveness."
Screening and Selection: The initial search yielded a large number of citations. Titles and abstracts were screened to identify studies that specifically addressed the application of ML in systematic review study selection and its cost-effectiveness.
Data Extraction: Relevant data and information were extracted from the selected studies, including details on ML models, accuracy and efficiency measures, cost analyses, and limitations and challenges.
Synthesis and Analysis: The extracted data were synthesized and analyzed to provide a comprehensive overview of the topic, highlighting key findings and insights.

Machine Learning in Systematic Review Study Selection

Systematic reviews involve a rigorous process of identifying, appraising, and synthesizing research evidence to answer a defined research question. Traditionally, study selection has been performed manually by human reviewers, which is laborious and error-prone, especially with large volumes of literature. ML techniques offer the potential to automate and expedite this process, improving both efficiency and accuracy [2].

ML algorithms can be trained on labeled datasets of citations, in which each citation is classified as either relevant or irrelevant to the review question. The algorithms learn patterns and features in the text that distinguish relevant from irrelevant studies. Once trained, a model can classify new citations, helping reviewers identify potentially eligible studies [4]. While ML can significantly assist study selection, human expertise remains crucial for validating and interpreting the results these models generate [2].
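The train-then-classify loop described above can be sketched in a few lines. This is a minimal illustration, not the method of any cited study: it assumes a multinomial naive Bayes classifier with add-one smoothing written in plain Python, and the citation titles, labels, and function names (`tokenize`, `train`, `predict`) are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_citations):
    """Count words per class from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter()            # label -> number of citations
    vocabulary = set()
    for text, label in labeled_citations:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: titles already screened by reviewers.
training = [
    ("randomized trial of statin therapy in adults", "relevant"),
    ("statin therapy and cardiovascular outcomes a randomized trial", "relevant"),
    ("case report of a rare skin condition", "irrelevant"),
    ("editorial on hospital funding policy", "irrelevant"),
]
model = train(training)
print(predict(model, "randomized trial of statin dosing"))  # relevant
print(predict(model, "editorial on funding"))               # irrelevant
```

In practice the predicted label would be treated as a suggestion for a reviewer to confirm, in line with the point above that human validation remains necessary.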

Several ML models have been applied to study selection, including support vector machines (SVM), naive Bayes, and random forests. These models have shown promising results in terms of accuracy and efficiency, reducing the workload of human reviewers while maintaining or even improving the quality of study selection [5]. Studies suggest that higher reviewer agreement during the initial labeling of training data is associated with better predictive performance of the ML models. This highlights the importance of clear inclusion/exclusion criteria and consistent decision-making by human reviewers when building accurate and reliable ML models [3].
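Reviewer agreement on the training labels can be quantified before any model is trained. The sketch below uses Cohen's kappa, one common agreement statistic; the source does not say which measure the cited work used, so treat the choice as an assumption. The reviewer decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(freq_a) | set(freq_b)
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions on six citations.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")  # kappa = 0.67
```

A low value here would suggest refining the inclusion/exclusion criteria before the labels are used for training.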

Benefits and Advantages

The application of ML in systematic review study selection offers several potential benefits and advantages:

Increased Efficiency: ML can automate and expedite study selection, freeing reviewers' time for other tasks such as data extraction and quality assessment [6]. Faster completion of systematic reviews enables timely dissemination of research findings and may accelerate the translation of evidence into practice [2].
Improved Accuracy: ML models can potentially improve the accuracy of study selection by reducing human error and identifying relevant studies that manual screening might miss [7].
Reduced Costs: By improving efficiency and accuracy, ML can potentially reduce the overall cost of conducting systematic reviews [8], including the costs of personnel time, resources, and errors.
Enhanced Reproducibility: ML models can provide a more objective and reproducible approach to study selection than manual screening, which can be subjective and vary between reviewers [9]. This reproducibility can contribute to greater transparency and confidence in the findings of systematic reviews [9].

Limitations and Challenges

Despite the potential benefits, there are limitations and challenges associated with using ML in systematic review study selection:

Data Requirements: ML models require large, labeled datasets for training, which can be time-consuming and expensive to create [10]. This can be a significant barrier to adopting ML, especially for reviews in areas where labeled data are scarce.
Bias and Generalizability: ML models can be biased by the data they are trained on, potentially leading to inaccurate classifications or limited generalizability to new datasets [11]. It is crucial to consider the potential for bias in training data and to evaluate how well ML models generalize to different contexts and populations.
Interpretability: Some ML models, such as deep learning models, are complex and difficult to interpret, making it hard to understand the reasons behind their classifications [12]. This lack of interpretability can raise concerns about transparency and accountability, especially in healthcare settings where understanding the basis of decisions is critical.
Ethical Considerations: The use of ML in healthcare raises ethical considerations, such as the potential for bias and discrimination and the need for transparency and accountability [13]. ML models should be developed and used responsibly, with attention to potential biases, data privacy, and the impact on healthcare equity [14]. The difficulty of understanding how deep learning models reach their decisions also makes fairness and accountability harder to ensure [11].

Addressing these challenges is crucial for the responsible and effective implementation of ML in systematic reviews.

Cost-effectiveness Analysis

While the potential benefits of ML in systematic review study selection are evident, it is essential to consider the cost-effectiveness of implementing these technologies. Cost-effectiveness analyses evaluate the balance between the costs of an intervention and its outcomes, often expressed as a ratio of cost per unit of outcome gained.
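The "cost per unit of outcome gained" ratio mentioned above is often computed in incremental form, comparing a new option against an existing one. The sketch below shows that calculation; the function name and all figures are hypothetical, and the choice of "eligible studies identified" as the outcome unit is an assumption made for illustration.

```python
def incremental_cost_effectiveness_ratio(cost_new, cost_old, effect_new, effect_old):
    """Extra cost per extra unit of outcome when switching from old to new."""
    extra_effect = effect_new - effect_old
    if extra_effect == 0:
        raise ValueError("the two options have the same effect; the ratio is undefined")
    return (cost_new - cost_old) / extra_effect


# Hypothetical figures: total workflow cost and eligible studies identified.
icer = incremental_cost_effectiveness_ratio(
    cost_new=12_000, cost_old=10_000, effect_new=130, effect_old=120
)
print(f"{icer:.0f} per additional eligible study identified")  # 200 per ...
```

When the new option is both cheaper and more effective, the ratio is negative and the new option is usually described as dominant rather than summarized by the ratio itself.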

Data Acquisition Costs

Acquiring the necessary data for training ML models can be a significant cost factor. This includes the costs of accessing and collecting relevant data, as well as the costs of cleaning, processing, and labeling the data [15].

Model Training Costs

Training ML models involves computational resources, software, and expertise, all of which contribute to the overall cost. The complexity of the model and the size of the dataset can influence the training time and associated costs [15].

Infrastructure Costs

Implementing ML for study selection may require investments in infrastructure, such as servers, storage, and software tools, to support the computational demands of ML models [15].

Several studies have investigated the cost-effectiveness of ML for study selection. One study found that using ML for screening titles and abstracts in systematic reviews of quality improvement studies was more cost-effective than traditional manual screening [16]. It analyzed semi-automated workflows incorporating machine learning for updating living maps of research, demonstrating the potential for ML to improve efficiency and reduce costs in real-world applications [16]. Another study found that a machine learning-based risk prediction model for lung cancer screening was cost-effective compared to screening the entire population [17]; because it concerns patient screening rather than citation screening, it offers only indirect evidence here.

The cost-effectiveness of ML in systematic review study selection can be attributed to two main factors:

Reduced Reviewer Time: ML algorithms can significantly reduce the time reviewers spend screening citations, allowing them to focus on other critical tasks such as data extraction and quality assessment [2].
Faster Review Completion: By accelerating study selection, ML can contribute to faster completion of systematic reviews, enabling timely dissemination of research findings [4]. Faster dissemination can in turn lead to quicker updates of clinical guidelines and faster healthcare decisions, potentially improving patient outcomes [2].
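The trade-off between reviewer time saved and the up-front costs listed earlier (data acquisition, model training, infrastructure) can be sketched as a simple comparison. Every number below is hypothetical, as are the names `ScreeningScenario`, `manual_cost`, and `ml_assisted_cost`; the model also assumes dual independent screening and a flat hourly rate.

```python
from dataclasses import dataclass


@dataclass
class ScreeningScenario:
    """Hypothetical inputs for a title/abstract screening task."""
    citations: int
    minutes_per_citation: float
    hourly_rate: float
    reviewers_per_citation: int = 2  # assumes dual independent screening


def manual_cost(scenario):
    """Cost of screening every citation by hand."""
    hours = (
        scenario.citations
        * scenario.minutes_per_citation
        * scenario.reviewers_per_citation
        / 60
    )
    return hours * scenario.hourly_rate


def ml_assisted_cost(scenario, fraction_still_screened, fixed_costs):
    """Remaining reviewer cost plus the fixed ML costs."""
    return manual_cost(scenario) * fraction_still_screened + sum(fixed_costs.values())


scenario = ScreeningScenario(citations=6000, minutes_per_citation=0.5, hourly_rate=40.0)
fixed_costs = {"data_acquisition": 500.0, "model_training": 300.0, "infrastructure": 200.0}

baseline = manual_cost(scenario)
assisted = ml_assisted_cost(scenario, fraction_still_screened=0.5, fixed_costs=fixed_costs)
print(f"manual: {baseline:.0f}, ML-assisted: {assisted:.0f}, saving: {baseline - assisted:.0f}")
```

With these made-up inputs the fixed ML costs are recovered once reviewers can skip about a quarter of the citations; for a small review the fixed costs could exceed the saving.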

Machine Learning Models for Study Selection

Different ML models can be employed for study selection in systematic reviews. The choice of ML model depends on various factors, including the characteristics of the dataset, the research question, and the available resources [18]. Some of the commonly used models can be categorized as follows:

Classification Models: These models categorize citations as relevant or irrelevant based on their features. Example algorithms: support vector machines (SVM), naive Bayes, logistic regression.
Clustering Models: These models group similar citations together, which can help reviewers identify relevant studies more efficiently. Example algorithm: K-means clustering.
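To make the clustering category concrete, here is a minimal K-means sketch in plain Python. It assumes each citation has already been turned into a small numeric vector (here, hypothetical counts of two terms), and it seeds the centroids with the first k points, which is a simplification chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    """Assign each point to one of k clusters; returns (assignments, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic deterministic seeding
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignments, centroids


# Hypothetical vectors: (count of "trial", count of "editorial") per citation.
citation_vectors = [(3, 0), (0, 2), (2, 1), (0, 3), (4, 0), (1, 3)]
assignments, centroids = k_means(citation_vectors, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

Reviewers could then screen one cluster at a time, which is the grouping benefit described above; the clusters themselves carry no relevance label.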

Accuracy and Efficiency of Machine Learning Models

The accuracy and efficiency of ML models for study selection have been evaluated in various studies. These studies have generally shown that ML models can classify citations with high accuracy, comparable to or in some cases exceeding that of human reviewers [19]. However, the accuracy of ML models depends on factors such as the quality of the training data, the choice of algorithm, and the specific characteristics of the dataset [19].

For example, one study found that decision trees and ensemble techniques achieved 98.7% accuracy in classifying breast cancer tumors [20], although that result comes from a clinical classification task rather than citation screening. Another study found that ML models improved the efficiency of detecting high-quality clinical research publications by approximately 25% [5].
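Accuracy figures like these need care when relevant citations are rare. The sketch below, using hypothetical labels and a hypothetical helper `screening_metrics`, shows how a screener that excludes everything can still score high accuracy while finding no relevant studies, which is why recall is usually reported alongside accuracy.

```python
def screening_metrics(true_labels, predicted_labels, positive="relevant"):
    """Accuracy, recall, and precision for a binary screening task."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical review: 5 relevant citations out of 100.
truth = ["relevant"] * 5 + ["irrelevant"] * 95
exclude_everything = ["irrelevant"] * 100
metrics = screening_metrics(truth, exclude_everything)
print(metrics)  # accuracy 0.95, recall 0.0, precision 0.0
```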

The efficiency of ML models can be further enhanced by feature selection, which keeps only the most relevant features for classification, and by addressing class imbalance, where one class (e.g., relevant citations) is much smaller than the other [21].
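One simple way to address class imbalance is to weight each class inversely to its frequency so that errors on the rare class count for more during training. The sketch below computes such weights with the commonly used formula n_samples / (n_classes * class_count); the labels and the function name `balanced_class_weights` are hypothetical, and other remedies such as resampling are equally valid.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical screening labels: 5 relevant citations out of 100.
labels = ["relevant"] * 5 + ["irrelevant"] * 95
weights = balanced_class_weights(labels)
print(weights)
```

Here a mistake on a relevant citation is weighted 10.0, against roughly 0.53 for an irrelevant one, so a classifier that accepts per-class weights is pushed away from simply excluding everything.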

Conclusion

ML has the potential to revolutionize the process of systematic review study selection by offering significant advantages in terms of efficiency, accuracy, and cost-effectiveness. While challenges and limitations exist, ongoing research and development are actively addressing these issues, paving the way for wider adoption of ML in evidence-based healthcare. As ML technologies continue to evolve, they are likely to play an increasingly important role in ensuring that systematic reviews are conducted efficiently and effectively. This can lead to faster dissemination of research findings, more accurate and reliable reviews, and ultimately, better healthcare decisions. Future research should focus on developing more robust and interpretable ML models, addressing ethical considerations, and evaluating the long-term impact of ML on the quality and efficiency of systematic reviews.

Works cited

1. Full article: Systematic reviews of machine learning in healthcare: a ..., accessed on January 15, 2025, https://www.tandfonline.com/doi/full/10.1080/14737167.2023.2279107

2. Machine Learning Methods for Systematic Reviews: A Rapid ..., accessed on January 15, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10759980/

3. effectivehealthcare.ahrq.gov, accessed on January 15, 2025, https://effectivehealthcare.ahrq.gov/sites/default/files/pdf/machine-learning-quality_research.pdf

4. Automation of Article Selection Process ... - JMIR Research Protocols, accessed on January 15, 2025, https://www.researchprotocols.org/2021/6/e26448/

5. www.medrxiv.org, accessed on January 15, 2025, https://www.medrxiv.org/content/10.1101/2023.06.18.23291567v1.full.pdf

6. Contributions of Machine Learning Models towards Student ... - MDPI, accessed on January 15, 2025, https://www.mdpi.com/2076-3417/11/21/10007

7. Machine Learning | Types | Benefits - Adservio, accessed on January 15, 2025, https://www.adservio.fr/post/machine-learning-types-benefits

8. The benefits of an artificial intelligence and machine learning model ..., accessed on January 15, 2025, https://www.bakertilly.com/insights/modeling-the-benefits-of-an-artificial-intelligence

9. What Is Machine Learning (ML)? | IBM, accessed on January 15, 2025, https://www.ibm.com/think/topics/machine-learning

10. Challenges and Limitations of Machine Learning: What to Consider ..., accessed on January 15, 2025, https://medium.com/@shruti2402devshatwar/challenges-and-limitations-of-machine-learning-what-to-consider-before-implementation-d2c0af137647

11. 30 Major Machine Learning Limitations, Challenges & Risks, accessed on January 15, 2025, https://onix-systems.com/blog/limitations-of-machine-learning-algorithms

12. 10 Limitations of Machine Learning - Holistic SEO, accessed on January 15, 2025, https://www.holisticseo.digital/ai/machine-learning/limitation/

13. (PDF) Machine Learning, Its Limitations, and Solutions Over IT - ResearchGate, accessed on January 15, 2025, https://www.researchgate.net/publication/344784989_Machine_Learning_Its_Limitations_and_Solutions_Over_IT

14. The Opportunities and Challenges of Machine Learning in ..., accessed on January 15, 2025, https://pubsonline.informs.org/do/10.1287/LYTX.2024.02.08/full/

15. Machine Learning (ML) Costs: Price Factors and Real-World ..., accessed on January 15, 2025, https://itrexgroup.com/blog/machine-learning-costs-price-factors-and-estimates/

16. Cost-effectiveness of Microsoft Academic Graph with machine ..., accessed on January 15, 2025, https://wellcomeopenresearch.org/articles/6-210

17. Cost-effectiveness of a machine learning risk prediction model ..., accessed on January 15, 2025, https://pubmed.ncbi.nlm.nih.gov/39697091/

18. Model Selection for Machine Learning - ScholarHat, accessed on January 15, 2025, https://www.scholarhat.com/tutorial/machinelearning/model-selection-for-machine-learning

19. The Importance of Accuracy in Machine Learning: A ... - Artsyl, accessed on January 15, 2025, https://www.artsyltech.com/blog/Accuracy-In-Machine-Learning

20. Accuracy Assessment of Machine Learning Algorithms Used to Predict Breast Cancer, accessed on January 15, 2025, https://www.mdpi.com/2306-5729/8/2/35

21. [2303.14762] Approaches to Improving the Accuracy of Machine Learning Models in Requirements Elicitation Techniques Selection - arXiv, accessed on January 15, 2025, https://arxiv.org/abs/2303.14762