Academic science has evolved over the last decade to the point that it is no longer “classical” knowledge, i.e. reliable, reproducible, and predictive. We need a simple test or rule for separating a precious, ontologically consistent finding from the bogus results that prevail in many modern scientific fields. Begley’s Six Rules offer a very interesting approach. (An abstract of the article)
The lack of robust
reproducibility in the scientific literature is both shocking and troubling,
and has been a widely covered topic over the past couple of years.
Then in 2011, real data were added to strengthen the case: a Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated (Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011). And earlier this year, Glenn Begley (formerly Amgen) and Lee Ellis (MDACC) showed that of 53 “landmark” oncology studies from 2001-2011, each highlighting an apparent big new advance in the field, only 11% (just 6!) could be robustly replicated in work done at Amgen (Begley & Ellis Nature 483, 531–533, 2012). Adding insult to injury, in the Amgen analysis the irreproducible findings actually outpaced the reproducible ones in citations: averaging 248 vs. 231 citations, respectively, for papers in high-impact-factor journals, and an even more astonishing 169 vs. 13 citations for papers from other journals.
These are
frightening statistics for an industry predicated upon building on the prior
work of others and the integrity of peer review for sorting the good from the
bad.
As we think about
starting new companies, initiating drug discovery campaigns, or even launching
clinical studies, how do we deal with this issue?
First and foremost, we need to get better at assessing the scientific literature, and all of us involved in translational medicine need to hold ourselves to a higher standard – including investigators, their institutions, journal editors, grant-funding bodies, VCs, Pharma, etc… From an industrial perspective, we should also do better diligence – assessing science with better filters for what’s robust and what’s not. Begley’s Six Rules provide exactly such a filter:
1) Were studies blinded?
2) Were all results shown?
3) Were experiments repeated?
4) Were positive and negative controls shown?
5) Were reagents validated?
6) Were the statistical tests appropriate?
Let’s take each of these in turn.
1) Most studies with experimental and control arms aren’t blinded. Furthermore, by my estimate, fewer than 20% of Methods sections even mention whether the work was blinded to prevent experimenter bias, and in most cases the blinding methodology isn’t described.
2) Results from multiple studies are rarely shown in the same paper; it’s usually only the “representative” example figure (read: the best single result). Outliers often disappear from figures (a telltale sign is n’s differing randomly between arms). Many western and northern blots show only a computer-generated slice of the gel, without size markers. It’s also often unclear whether the exposures were in the linear range of the staining.
3) N-of-1 experiments are sadly commonplace in the literature. Assays often don’t include replicate values, nor are aggregate n’s reported. It’s true that some long-term animal models are a chore to run multiple times, and critical reagents are often expensive – but repeating studies before publishing should be the bar (the first sketch after this walkthrough shows why even a few replicates matter).
4) Positive and negative controls to benchmark an experimental system are frequently omitted. In fairness, for a novel model there might not be a positive control. But if there is one, it should be included and described. Selection of the right controls is also an issue: e.g., when studying the role of a single kinase in a disease, a promiscuous, dirty kinase inhibitor that happens to hit the target of interest is probably not a great control.
5) Validated reagents are essential for drawing robust conclusions. Unfortunately, Begley and his colleagues found this to be frequently overlooked, especially the quality of immunohistochemistry probes and western antibodies (e.g., species cross-reactivity). Authors should state where the validated reagents were obtained.
6) Statistics is a big gap in most papers. Proper powering of animal studies with a pre-agreed statistical plan is a rarity. Showing n’s and SEM bars in figures is important. So is choosing the right p-value threshold: a cutoff of 0.05, for instance, isn’t appropriate for post hoc analyses hunting for signals on a chip, where corrections for multiple comparisons are needed (see the second sketch below).
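To make the replication point in Rule 3 concrete, here is a minimal sketch in Python. The assay values are invented purely for illustration (nothing here comes from the Begley paper); the point is simply that an n-of-1 result carries no estimate of variability, while even three replicates give a mean and SEM a reader can judge.

```python
import numpy as np

# Hypothetical assay readouts in arbitrary units; the numbers are
# invented for illustration only.
single_run = np.array([42.0])               # n-of-1: no error estimate possible
replicates = np.array([42.0, 37.5, 45.1])   # n=3 independent repeats

mean = replicates.mean()
sem = replicates.std(ddof=1) / np.sqrt(len(replicates))  # standard error of the mean

print(f"n-of-1 value: {single_run[0]:.1f} (variability unknown)")
print(f"n=3 result:   {mean:.1f} +/- {sem:.1f} (mean +/- SEM)")
```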
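And for the statistics point in Rule 6, a second sketch, again in Python (using statsmodels) and again with assumed numbers: first a pre-specified power calculation for a two-arm study, then a multiple-comparison correction (Benjamini-Hochberg, as one common choice) applied to post hoc “signal hunting” p-values. The effect size and the raw p-values are assumptions made up for illustration.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.multitest import multipletests

# 1) Pre-specified power calculation: how many animals per arm are
#    needed to detect an assumed effect size of 1.0 SD with 80% power
#    at alpha = 0.05? (The effect size is an up-front assumption.)
n_per_arm = TTestIndPower().solve_power(effect_size=1.0, power=0.80, alpha=0.05)
print(f"Animals needed per arm: {np.ceil(n_per_arm):.0f}")

# 2) Post hoc signal hunting: raw p-values from, say, many probes on a
#    chip (values invented for illustration). A naive 0.05 cutoff would
#    call five "hits"; Benjamini-Hochberg controls the false discovery
#    rate across all comparisons, and only two survive.
raw_p = np.array([0.001, 0.012, 0.030, 0.045, 0.049, 0.20, 0.51, 0.83])
rejected, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
for p, q, hit in zip(raw_p, adjusted_p, rejected):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.3f} {'HIT' if hit else ''}")
```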
These Six Rules are
good guidelines for those of us in the business of finding and commercializing
the next cutting-edge science. Thinking about these Rules during diligence on an
investigator’s work will undoubtedly improve the outcome of
academic-to-industry translational efforts. Furthermore, Tech Transfer Offices
should hold these Rules up when they are working on invention disclosures and
external outreach for the work. Lastly, more CROs should track the literature
and propose to do reproducibility work in line with these Rules for high impact
science out of top tier academic centers; my guess is many academic
institutions would support those studies.
Importantly,
adherence to these Rules won’t make reproducible translation work 100% of the
time. Fundamentally, there are “language” differences between academic and
industrial work. The often-used phrase “safe and well-tolerated” in an academic animal study means only that the animals didn’t look sick and didn’t die. It doesn’t mean that even gross organ pathology was ruled out, much less full
histopathology, chemistry and blood counts, liver enzyme levels, etc… This
language difference is an important factor in translation, but is much more
nuanced than Begley’s Six Rules and needs to be considered in any
academic-to-industry transfer.
But, as a parting
remark, let’s not forget that this is all about cutting-edge science. There
will always be studies that can’t be repeated – that’s part of the iterative
nature of the scientific method of articulating and challenging hypotheses. But
as a system we can’t continue to tolerate ‘hit rates’ of reproducibility below
50% from academic scientific literature, especially from top tier journals and
biomedical institutions.