
Machine Learning to Assess the Scientific Soundness of Medical Papers

Papers are being published at an astonishing rate, estimated at about 7,000 per day.[1] And yet, by some early estimates, only 1% of studies in the biomedical literature meet the minimum criteria for scientific quality.[2]

So when a physician tries to update herself on the best therapies for a given disease, she not only has to slog through all of those papers, but also home in on the “good” ones that provide truly pivotal results, while avoiding research that wasn’t done rigorously.

A group of us from Evid Science, the University of Utah and McMaster University recently proposed a machine learning approach to this problem in our paper, “A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study.” The original paper was meant for biomedical informatics professionals, but the topic is applicable for data scientists and aspiring data scientists alike.

The most popular, state-of-the-art approach is the set of Clinical Query filters provided within PubMed.[2] Clinical Query filters are combinations of text-words and MeSH terms[3] that can return a higher proportion of rigorous papers than just PubMed searching alone.
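To make the idea concrete, here is a hedged sketch of what such a filter can look like in PubMed’s boolean query syntax. The specific terms below are invented for illustration and are not an actual Clinical Query filter; `[MeSH Terms]`, `[Publication Type]`, and `[Title/Abstract]` are standard PubMed field tags.

```python
# Illustrative only: combining a MeSH term with free-text words and a
# publication-type tag, in the spirit of a Clinical Query filter.
mesh_part = 'diabetes mellitus[MeSH Terms]'
rigor_filter = ('(randomized controlled trial[Publication Type]'
                ' OR randomized[Title/Abstract])')
query = f'{mesh_part} AND {rigor_filter}'
print(query)
```

The key point is that such filters mix human-assigned metadata (the MeSH tag) with plain text words, which is exactly why they depend on indexing having been done.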

And while they are in standard use today, and do a good job, they fundamentally rely on human-assigned MeSH terms, a process that can take annotators anywhere from a week to almost a year after publication.

Other machine learning approaches have also been proposed. However, they use particular hand-crafted features (e.g., variants on MeSH terms, UMLS concepts, etc.) as well as features such as bibliographic measures, which may themselves be proprietary to a company (and therefore not freely available).

Which is all to say: these approaches rely on features that go beyond the text, and therefore present problems of availability (e.g., can you even access bibliographic data for every paper?) and over-fitting (e.g., how can you be sure these features will generalize to other cases?).

Our approach instead uses a convolutional neural network (CNN) that learns its features directly from the text. For example, a learned feature might detect that a word related to “experiment” is followed by a word related to “success,” and would only output a high value at text positions where that pattern occurs.

So by sliding each window over the text, and recording its maximum value, the CNN can transform the entire input text into a single 300-dimensional 'feature vector,' representing which features are most present in the text.
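The sliding-window-plus-max step can be sketched in NumPy. This is an illustration of the general technique (convolutional filters with max-over-time pooling), not the paper’s actual architecture; the token count, embedding size, and random weights are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tokens, embed_dim = 50, 64        # assumed sizes, for illustration only
window, n_filters = 3, 300          # 300 filters -> 300-dim feature vector

text = rng.normal(size=(n_tokens, embed_dim))           # embedded abstract
filters = rng.normal(size=(n_filters, window, embed_dim))

# One response per filter per window position.
positions = n_tokens - window + 1
responses = np.empty((n_filters, positions))
for i in range(positions):
    window_slice = text[i:i + window].ravel()           # w*d values
    responses[:, i] = filters.reshape(n_filters, -1) @ window_slice

# Max-over-time pooling: keep each filter's strongest response anywhere
# in the text, collapsing variable-length input to a fixed-size vector.
feature_vector = responses.max(axis=1)
print(feature_vector.shape)                             # (300,)
```

Notice that the text length drops out entirely: whatever the abstract’s length, the output is one value per filter.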

Each feature is then scaled according to how important it is for the desired classification, and summed with all the other features to produce a final classifier score (for the more experienced reader: the vector is passed through a simple linear model and softmax layer to produce the final output).
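That final step can be sketched as follows. Again this is a generic illustration of a linear-plus-softmax head, with random stand-ins for the learned weights, not the paper’s trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

feature_vector = rng.normal(size=300)   # stand-in for the pooled CNN output
W = rng.normal(size=(2, 300))           # one learned weight row per class
b = np.zeros(2)

logits = W @ feature_vector + b         # importance-weighted sum per class

# Softmax turns the two scores into probabilities that sum to 1,
# e.g. [p_not_rigorous, p_rigorous].
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

The per-class weight rows are what “scaled according to how important it is” means in practice: each of the 300 features gets a learned multiplier for each class.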

Fortunately, these CNN approaches are robust to noise – that means that if we could find enough training data, even if it had some noise (e.g., misclassified examples), the model should be able to pick out enough signal to perform well.

Use-Case 1: Developing Evidence-Based Syntheses of the Medical Literature

In this scenario, the goal is to retrieve every possible paper (since otherwise a synthesis could be considered incomplete), while filtering out as many non-rigorous papers as possible (since a synthesis wouldn’t consider any paper that isn’t rigorous science).

And here, our CNN had recall equivalent to McMaster’s text-word filter (i.e., it returned the same share of rigorous articles), but significantly higher precision (+22.2%, meaning it filtered out many more non-rigorous ones).
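To make the recall/precision trade-off concrete, here is a toy calculation. The counts below are invented for illustration and are not the paper’s results; the point is only how two filters can retrieve the same rigorous papers (equal recall) while letting different numbers of non-rigorous ones through (different precision).

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: share of retrieved papers that are rigorous.
    Recall: share of rigorous papers that were retrieved."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Two hypothetical filters that retrieve the same 90 rigorous papers
# but admit different numbers of non-rigorous ones:
p_filter, r_filter = precision_recall(90, 210, 10)   # text-word-style filter
p_cnn, r_cnn = precision_recall(90, 110, 10)         # CNN-style filter

print(r_filter == r_cnn)      # equal recall
print(p_cnn - p_filter)       # the CNN-style filter's precision advantage
```

For a systematic review, this is exactly the desirable shape: no rigorous paper is lost, but the reviewer reads far fewer non-rigorous ones.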

It only looks at titles and abstracts, so it doesn’t rely on features that might be unavailable due to access restrictions (e.g., a subscription you don’t have) or timeliness (e.g., MeSH terms or bibliographic metrics that haven’t been assigned yet).

References

[3] MeSH stands for Medical Subject Headings, an ontology of medically related terms and their relationships. Annotators at the National Library of Medicine apply MeSH terms to the papers they deem most appropriate, in a process known as “indexing.”

Matthew Michelson is the CEO of Evid Science, a technology company using AI to make access to medical evidence as simple and seamless as possible.

His research includes investigating ways to integrate the best available clinical evidence within clinicians’ workflow to support decisions in the care of specific patients.
