
Machine Learning for Study Selection: Algorithm Performance Compared

Systematic reviews are a cornerstone of evidence-based medicine, providing a comprehensive and unbiased synthesis of research findings on a specific topic. However, conducting a systematic review is time-consuming and labor-intensive, particularly the study selection stage, where reviewers must sift through thousands of potentially relevant articles to identify those that meet the review's inclusion criteria. This manual screening is prone to errors and inconsistencies, especially when dealing with a large number of citations1. In recent years, machine learning (ML) has emerged as a promising tool to automate and expedite the study selection process in systematic reviews. This article explores the applications of ML in study selection, compares the performance of different ML algorithms, and discusses the challenges and benefits associated with their use.

Research Methods

The information presented in this article is based on a comprehensive review of research papers and articles on the application of machine learning in study selection for systematic reviews. The research process involved the following steps:

  1. Literature Search: A systematic search was conducted across multiple databases, including PubMed, Scopus, and Web of Science, to identify relevant publications.
  2. Grey Literature Search: A search of grey literature sources, such as conference proceedings and pre-print servers, was performed to identify additional relevant studies.
  3. Hand Searching: Reference lists of included articles were hand-searched to identify any potentially relevant publications that may have been missed in the database searches.

This multi-faceted approach ensured a comprehensive and thorough gathering of information on the topic.

Machine Learning for Study Selection in Systematic Reviews

Traditionally, study selection in systematic reviews involves a manual process where reviewers independently screen titles and abstracts of potentially relevant articles. ML offers a potential solution by automating the screening process and improving the efficiency and accuracy of study selection.

ML algorithms can be trained on a set of labeled data, where each article is classified as either relevant or irrelevant to the review. The algorithm learns to identify patterns and features in the text that distinguish relevant articles from irrelevant ones. Once trained, the ML model can be used to predict the relevance of new, unseen articles, thereby assisting reviewers in the study selection process.
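A minimal sketch of this workflow, assuming scikit-learn is available: the toy abstracts, labels, and the choice of TF-IDF features with logistic regression are illustrative assumptions, not the specific setup of any tool discussed in this article.

```python
# Train a screening classifier on labeled citations, then predict the
# relevance of an unseen record. Toy data invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled corpus: 1 = relevant to the review, 0 = irrelevant.
abstracts = [
    "randomized controlled trial of statin therapy in adults",
    "cohort study of statin use and cardiovascular outcomes",
    "review of agricultural irrigation techniques",
    "survey of soil composition in arid farmland",
]
labels = [1, 1, 0, 0]

# TF-IDF turns each abstract into a sparse term-weight vector; the
# classifier learns which terms separate relevant from irrelevant records.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(abstracts, labels)

# Predict the relevance of a new, unseen citation.
unseen = ["trial of statin dosage and cardiovascular risk"]
print(model.predict(unseen))
```

In practice the training set would contain hundreds or thousands of screened records, and predicted probabilities (rather than hard labels) would be used to rank the remaining citations.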

AI-powered Tools for Study Selection

One notable example of an AI-powered tool for study selection is ASReview2. This open-source tool uses ML algorithms to prioritize articles for review, potentially leading to significant time savings in the study selection process. ASReview has been shown to support transparent and reliable systematic reviews, particularly by accelerating the title and abstract screening phase.
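The prioritization idea behind such tools can be sketched as an active-learning loop: a model scores the unscreened records, the reviewer labels the highest-ranked ones, and the model is updated. ASReview's actual models are far more sophisticated; the keyword-overlap scorer below is an invented stand-in used only to show the loop's shape.

```python
# Simplified screening-prioritization loop: rank unscreened records by
# similarity to records already labeled relevant.
def score(record, relevant_terms):
    """Score a record by its word overlap with terms from relevant records."""
    return len(set(record.lower().split()) & relevant_terms)

def prioritize(unscreened, relevant_terms):
    """Order unscreened records so the likeliest-relevant come first."""
    return sorted(unscreened, key=lambda r: score(r, relevant_terms), reverse=True)

# Terms harvested from records the reviewer has already labeled relevant.
relevant_terms = {"statin", "cardiovascular", "trial"}

queue = prioritize(
    ["soil composition survey", "statin trial outcomes", "irrigation methods"],
    relevant_terms,
)
print(queue[0])  # the statin record is presented to the reviewer first
```

After each batch of reviewer decisions, the term set (or, in a real system, the underlying model) is refreshed and the queue is re-ranked.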

Commonly Used Machine Learning Algorithms

Several ML algorithms have been applied to study selection in systematic reviews. The choice of model is influenced by the characteristics of the dataset and the problem domain4. Some of the most commonly used algorithms include:

  • Naive Bayes: This algorithm, based on Bayes' theorem, is known for its simplicity5 and efficiency6. However, it assumes that the features used to classify articles are independent of each other, which can lead to inaccuracies when features are correlated6.
  • Support Vector Machines (SVMs): SVMs aim to find the optimal hyperplane that separates relevant articles from irrelevant ones. They are effective in handling high-dimensional data and are less prone to overfitting compared to other algorithms8.
  • Random Forest: This algorithm constructs a multitude of decision trees during training and outputs the class chosen by a majority vote of the individual trees. Random Forest is robust to noise and outliers and can handle high-dimensional datasets effectively10.
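To make the independence assumption concrete, here is a from-scratch multinomial Naive Bayes classifier for toy screening data. It is a minimal sketch (invented documents and labels), not a production implementation.

```python
# Minimal multinomial Naive Bayes text classifier.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns per-class word counts and doc counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def predict(text, word_counts, class_counts):
    """Pick the class with the highest log-probability, treating each word
    as independent given the class (the 'naive' assumption)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        n_words = sum(word_counts[label].values())
        logp = math.log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the product.
            count = word_counts[label][word] + 1
            logp += math.log(count / (n_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

docs = [
    ("statin trial cardiovascular outcomes", "relevant"),
    ("statin cohort cardiovascular risk", "relevant"),
    ("irrigation soil farmland survey", "irrelevant"),
]
wc, cc = train(docs)
print(predict("statin cardiovascular study", wc, cc))  # relevant
```

The per-word log-probabilities are simply summed, which is exactly where the assumption of feature independence enters; correlated terms are double-counted, which is the weakness noted above.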

For a detailed comparison of the strengths and weaknesses of these algorithms, refer to the table presented in the "Strengths and Weaknesses of Different Algorithms" section.

Evaluation Metrics for Machine Learning Algorithms

To assess the performance of ML algorithms in study selection, various evaluation metrics are used. These metrics provide insights into how well the model identifies relevant articles. Some commonly used evaluation metrics include:

  • Precision: This metric measures the proportion of correctly identified relevant articles among all articles predicted as relevant. In other words, it tells us how many of the articles that the model identified as relevant were actually relevant11.
  • Recall: This metric measures the proportion of correctly identified relevant articles among all actual relevant articles. It tells us how many of the truly relevant articles the model was able to identify12.
  • F1-score: This metric combines precision and recall into a single score, providing a balanced measure of the model's performance. It is useful when you need to consider both precision and recall13.
  • Accuracy: This metric measures the overall correctness of the model in classifying articles as relevant or irrelevant. It gives us an overall sense of how well the model is performing14.
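All four metrics can be computed from the same four confusion-matrix tallies. A minimal sketch with invented counts, where tp, fp, fn, and tn are the true-positive, false-positive, false-negative, and true-negative totals:

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score, and accuracy from raw counts."""
    precision = tp / (tp + fp)                 # flagged articles that were truly relevant
    recall = tp / (tp + fn)                    # truly relevant articles that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Example: the model flagged 10 articles, 8 truly relevant; it missed 2
# relevant articles and correctly excluded 88 irrelevant ones.
precision, recall, f1, accuracy = screening_metrics(tp=8, fp=2, fn=2, tn=88)
print(precision, recall, f1, accuracy)
```

Note that in screening, where relevant articles are scarce, accuracy can look high even when recall is poor, which is why recall is usually the metric reviewers care about most.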

Comparing the Performance of Different Algorithms

Studies comparing the performance of different ML algorithms for study selection have reported varying results. The performance of an algorithm can depend on factors such as the specific dataset used, the characteristics of the review topic, and the choice of features used to train the model15.

One study found that neural networks were the most common modeling approach, followed by Support Vector Machines and Random Forest/Decision Trees15. Another study found that models trained after downsampling consistently achieved the best results across all algorithms16. Algorithms can also be compared using their confusion matrices, which reveal the types of errors a model makes17. However, no single algorithm consistently outperforms the others across all scenarios; the choice of algorithm should be based on the specific requirements of the systematic review and the characteristics of the data.
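The downsampling step mentioned above can be sketched as follows: screening datasets are typically dominated by irrelevant records, so the majority class is randomly undersampled until the training set is balanced. The record names and class sizes below are invented toy data.

```python
# Randomly trim the larger class to the size of the smaller one.
import random

def downsample(positives, negatives, seed=0):
    """Return class-balanced lists by undersampling the majority class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    if len(negatives) > len(positives):
        negatives = rng.sample(negatives, len(positives))
    else:
        positives = rng.sample(positives, len(negatives))
    return positives, negatives

# Toy screening set: 3 relevant records vs. 9 irrelevant ones.
relevant = [f"relevant-{i}" for i in range(3)]
irrelevant = [f"irrelevant-{i}" for i in range(9)]
pos, neg = downsample(relevant, irrelevant)
print(len(pos), len(neg))  # 3 3
```

The trade-off is that discarded majority-class records carry information too, so downsampling is usually validated against alternatives such as class weighting.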

Machine Learning for Data Extraction

In addition to study selection, machine learning can also be applied to the data extraction stage of systematic reviews1. This involves automatically extracting relevant data from the included articles, such as study characteristics, participant demographics, and outcome measures. By automating this process, ML can further reduce the workload of reviewers and potentially improve the accuracy and efficiency of data extraction.
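As an illustration of what automated extraction targets look like, here is a rule-based sketch that pulls a reported sample size out of an abstract. Real ML extraction systems use trained models rather than hand-written patterns; the regular expression and the example abstract are invented for this illustration.

```python
# Extract a reported sample size such as "n = 120" from an abstract.
import re

def extract_sample_size(abstract):
    """Return the first 'n = <number>' value found, or None if absent."""
    match = re.search(r"\bn\s*=\s*(\d+)", abstract, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

abstract = "We enrolled participants (n = 248) in a 12-week trial."
print(extract_sample_size(abstract))  # 248
```

Study characteristics, participant demographics, and outcome measures each need their own extraction logic, which is why learned models tend to outperform brittle patterns like this one at scale.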

Challenges and Limitations of Using Machine Learning

While ML offers significant potential for improving study selection in systematic reviews, there are also challenges and limitations associated with its use. Some of the key challenges include:

  • Data dependency: ML models are heavily reliant on the quality and quantity of training data. If the training data is biased or incomplete, the model may not generalize well to new articles18.
  • Black box nature: Many ML algorithms are considered "black boxes" because it is difficult to understand how they arrive at their predictions. This lack of transparency is a concern in systematic reviews, where it is essential to understand the reasoning behind the inclusion or exclusion of articles19.
  • Generalizability: ML models may not generalize well to different review topics or datasets. A model trained on one specific topic may not perform well on another topic due to differences in terminology, study designs, or other relevant factors4.
  • Need for expertise: Implementing and evaluating ML algorithms requires specialized knowledge and skills. Reviewers may need training and support to effectively use ML tools for study selection3.
  • Impact of reviewer disagreements: The predictive performance of machine learning models can be affected by disagreements between reviewers during the initial data labeling process. High levels of reviewer agreement are essential for optimal model performance8.
  • Technical debt: ML systems can accumulate technical debt, the implied cost of future rework incurred by choosing a quick solution now instead of a better approach that would take longer. This can lead to ongoing maintenance costs and challenges in updating and adapting ML models over time18.
  • Inadequate reporting: Inadequate reporting of data sources, study design, and modeling processes can create barriers to the use of ML prediction models in clinical practice. Transparent and comprehensive reporting is crucial for the wider adoption and validation of ML models20.

Potential Benefits and Advantages

Despite the challenges, ML offers several potential benefits and advantages for study selection in systematic reviews:

  • Increased efficiency: ML can significantly reduce the time and effort required for study selection, allowing reviewers to focus on other critical aspects of the review process1. For instance, one study demonstrated a potential reduction in citation screening time by 36.1%8.
  • Improved accuracy: ML algorithms can potentially improve the accuracy of study selection by reducing human error and bias8.
  • Reduced workload: By automating the screening process, ML can alleviate the workload of reviewers, particularly when dealing with large numbers of citations3.
  • Enhanced consistency: ML can help ensure consistency in the application of inclusion and exclusion criteria across different reviewers21.
  • User-friendliness: The adoption of ML in systematic reviews is heavily influenced by the user-friendliness of the tools. Tools that are easy to use and require minimal technical expertise are more likely to be adopted by reviewers1.

Strengths and Weaknesses of Different Algorithms

| Algorithm | Strengths | Weaknesses |
| --- | --- | --- |
| Naive Bayes | Simple and efficient5 6 | Assumes features are independent, which can reduce accuracy when they are correlated6 |
| Support Vector Machines | Effective on high-dimensional data; less prone to overfitting8 | Training can be computationally expensive and requires careful parameter tuning |
| Random Forest | Robust to noise and outliers; handles high-dimensional datasets10 | An ensemble of many trees is harder to interpret than a single model |

Works cited

1. Machine Learning Methods for Systematic Reviews: A Rapid Scoping Review - PMC, accessed on January 15, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10759980/

2. AI in action: The role of machine learning in systematic reviews, data organisation and management for medical researchers - Alzyood Public Health, accessed on January 15, 2025, https://alzyoodpublichealth.org/2024/06/04/ai-in-action-the-role-of-machine-learning-in-systematic-reviews-data-organisation-and-management-for-medical-researchers/

3. Artificial intelligence in systematic reviews: promising when appropriately used - BMJ Open, accessed on January 15, 2025, https://bmjopen.bmj.com/content/13/7/e072254

4. Systematic Literature Review on Machine Learning and Student Performance Prediction: Critical Gaps and Possible Remedies - MDPI, accessed on January 15, 2025, https://www.mdpi.com/2076-3417/11/22/10907

5. www.simplilearn.com, accessed on January 15, 2025, https://www.simplilearn.com/tutorials/machine-learning-tutorial/naive-bayes-classifier

6. Naive Bayes Classifiers: Types and Use Cases - Keylabs, accessed on January 15, 2025, https://keylabs.ai/blog/naive-bayes-classifiers-types-and-use-cases/

7. Naive Bayes Classifier Explained With Practical Problems - Analytics Vidhya, accessed on January 15, 2025, https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/

8. effectivehealthcare.ahrq.gov, accessed on January 15, 2025, https://effectivehealthcare.ahrq.gov/sites/default/files/pdf/machine-learning-quality_research.pdf

9. support vector machines (SVMs) - IBM, accessed on January 15, 2025, https://www.ibm.com/think/topics/support-vector-machine

10. Advantages and Disadvantages of Random Forest - Pickl.AI, accessed on January 15, 2025, https://www.pickl.ai/blog/advantages-and-disadvantages-random-forest/

11. 12 Important Model Evaluation Metrics for Machine Learning (2025) - Analytics Vidhya, accessed on January 15, 2025, https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/

12. Metrics to Evaluate your Machine Learning Algorithm - Towards Data Science, accessed on January 15, 2025, https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234

13. Evaluating machine learning models-metrics and techniques - AI Accelerator Institute, accessed on January 15, 2025, https://www.aiacceleratorinstitute.com/evaluating-machine-learning-models-metrics-and-techniques/

14. What are the metrics to evaluate a machine learning algorithm - Stack Overflow, accessed on January 15, 2025, https://stackoverflow.com/questions/21092188/what-are-the-metrics-to-evaluate-a-machine-learning-algorithm

15. Full article: Systematic reviews of machine learning in healthcare: a literature review, accessed on January 15, 2025, https://www.tandfonline.com/doi/full/10.1080/14737167.2023.2279107

16. Machine learning enables automated screening for systematic reviews and meta-analysis in urology - PMC, accessed on January 15, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC11236840/

17. Comparison of Machine Learning algorithms on detecting the confusion of students while watching MOOCs - DiVA portal, accessed on January 15, 2025, https://www.diva-portal.org/smash/get/diva2:1641701/FULLTEXT02.pdf

18. Maintainability Challenges in ML: A Systematic Literature Review - arXiv, accessed on January 15, 2025, https://arxiv.org/pdf/2408.09196

19. The Challenges of Machine Learning: A Critical Review - MDPI, accessed on January 15, 2025, https://www.mdpi.com/2079-9292/13/2/416

20. Protocol for a systematic review on the methodological and reporting quality of prediction model studies using machine learning techniques | BMJ Open, accessed on January 15, 2025, https://bmjopen.bmj.com/content/10/11/e038832

21. The effect of machine learning tools for evidence synthesis on resource use and time-to-completion: protocol for a retrospective pilot study, accessed on January 15, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9843684/