We’re drowning in data that could transform healthcare. From digitising patient medical records to the increased popularity of genome sequencing, industries are sitting on a wealth of information that could dramatically speed up the way we diagnose and treat diseases.
Heralded as the cusp of a tremendous wave of innovation, ‘big data’ is a buzzword that has not fallen on deaf ears in the pharmaceutical industry as it looks to harness the possibilities for drug discovery and portfolio management. But is it all as simple as it sounds? After all, remove the context and data can be rendered meaningless. And what if you know the information is out there, but aren’t able to access it?
Big issue
By definition, ‘big data’ in healthcare refers to electronic health datasets so large they are difficult to manage with traditional software. However, some practitioners think that, while it’s all very well concentrating on zettabytes and yottabytes, there are some much smaller datasets we’re not yet using to the best of our ability. These include late-stage pipeline data, such as information from clinical trials. It might not be big, but it could certainly be clever.
"Clinical trial data compressed is never really more than a few hundred megabytes, we’re not even talking giga, but complexity of the data inside is such that it requires very advanced statistics and very careful angling of that information," says a scientific computing expert at a large pharmaceutical company who asked not to be named. The specialist uses advanced statistics to make sense of data from clinical trials.
They believe that systematically integrating such information could make the pharmaceutical industry much more data-driven in its late-stage pipeline. We’re already obsessed with information; we just aren’t using it in the best way, and we’re falling behind other industries.
"We should learn from companies such as Google and Facebook," our source opines. "We don’t use historical datasets in a consistent way. We do it for specific questions, but not across the board for every decision we make. This needs to change. Look around you in other types of industries like manufacturing and advertising, or web services, like Amazon and other online stores, which use different types of data to drive the next generation of products. Pharma needs to work more along these lines."
Access to this information is a problem, though. Many clinical studies are never published, and those that are tend to result in single reports that ask specific questions. The remaining data generated from projects lies unwanted and obsolete.
Knowledge economy
One person’s trash is another’s treasure, however. Some believe the industry is missing a trick, and the extra info could provide additional value, ultimately benefitting patients and their physicians. First, though, a revolution is sorely needed in terms of data access and sharing policy.
America’s Institute of Medicine (IoM), concerned that much clinical data is collected but never made available to other researchers, recently called for dissemination plans to be included as part of the registration of a study. Full data sets should be shared within 18 months of the study’s completion, regardless of whether the trial itself is published, it advises. Its latest report suggests the current system "fails to provide an adequate return on the investments of trial participants, investigators, and sponsors".
Jeffrey Drazen, New England Journal of Medicine editor-in-chief, said in an editorial that clinical data should be viewed "as a community resource, much like a shared park, rather than as personal property".
A culture of fear prevents this happening. Patient information is so sensitive that sharing it raises a huge policy issue. Likewise, because the initial analysis of the data probably required incredibly complex statistics, there is a concern that other people could use the information and analyse it incorrectly.
It’s not statistical technology that’s holding us back though. Many innovative companies are building applications and analytical tools that can help us further identify value and opportunities in data.
Generous genes
In addition to industries far from pharma, we can learn from sectors closer to home, such as genomics. The US Government is aware that research in this field advances our understanding of factors influencing health and disease. It believes sharing data from such work provides opportunities to accelerate scientific knowledge by combining large and information-rich databases. The National Institutes of Health (NIH) issued its final Genomic Data Sharing Policy in late August 2014.
If your research is publicly funded, you are now required by law to deposit the genomic data into NIH’s repository. Otherwise, you can’t publish the study.
"Advances in DNA sequencing technologies have enabled NIH to conduct and fund research that generates ever-greater volumes of genome-wide association studies and other types of genomic data," says Eric Green, National Human Genome Research Institute director, on the new data-sharing policy. "Access to these data – according to the data management practices laid out in the policy – allows researchers to accelerate research by combining and comparing large and information-rich datasets."
NIH’s policy didn’t happen overnight. Some expressed worries that others might use their data and end up drawing the wrong conclusions from it.
"It didn’t happen like that, though," says our source. "You have to go through a publication process, and journal peer review. The message I’d like to pass is that we should look into other domains in which policies on the public sharing of this kind of data have been enforced by governments. Publishing has not had a bad impact on the scientists: it did not destroy their reputation, or their research. Instead, it reinforced their work, because if no one can see it, data is less valuable than when it is publicly available."
Stock question
The crucial difference between pharma and academia, of course, is that every decision might impact the stock value of a company. Taking a drug from bench to bedside is extraordinarily expensive and businesses must remain viable in what is a highly competitive environment. Exposing previously confidential information to rivals is an understandably scary task, but the flipside is that increased transparency means improved trust in the industry as a whole.
Some initiatives actually involve rivals collaborating and sharing data. One is U-BIOPRED (Unbiased Biomarkers for the Prediction of Respiratory Disease Outcomes), a five-year, Europe-wide project that aims to understand more about severe asthma and uncover clues as to how to treat it effectively. It uses samples and medical information from hundreds of severe sufferers and compares them with samples from patients with mild symptoms, people without asthma and those with chronic obstructive pulmonary disease (COPD). The scheme hopes to identify different sub-types of the condition and involves scientists from various universities, research institutes and several pharmaceutical companies, including AstraZeneca and GSK.
Alternatively, there is Project Data Sphere, another venture that allows researchers to share, integrate and analyse patient-level data. In this case, it involves phase III cancer trial data, which providers are required to de-identify. Users share everything from protocols to data descriptors and even case report form templates.
The project’s goal is to spark innovation through improved trial design and statistical methodology, reduced duplication and smaller trial sizes. It’s hoped the initiative will ultimately unleash the full potential of data to advance research and benefit cancer patients.
Placebo effect
However, the members only share the placebo arm data (from the participants who received a sugar pill), so the value of the info is fairly limited. When we ask why the treated group’s data isn’t also shared, we find ourselves back at the same conundrum: the fear that precious information could be reinterpreted, or used by a competitor, to the disadvantage of the original company.
Perhaps total transparency could be too much to hope for in the near future, but advantages could well come from sharing datasets within pharmaceutical organisations.
Transparent solution
"It would be good to define sub-populations and new markets, and understand why some people do not respond correctly to a drug," said our source. "We’d also be able to design better trials and learn from the errors of the past, because it’s within the context of other trials if you’re able to integrate multiple datasets together."
Their team is looking into whether old clinical trial data can be reused to drive the business forward, without sharing the information publicly. But oddly, that’s not something that routinely happens within pharma.
Perhaps because the idea hasn’t really been considered before, people are highly suspicious of it and fail to recognise the ample benefits. But as results emerge from projects where sharing is key, things may start to change.
"It’s a fascinating issue," our source concludes. "What prevents us to share has been mainly unknown: it’s what makes people afraid. The same thing happened in other fields at the beginning, so it’s all about communication and understanding the benefits. Starting the dialogue really would change the industry."