We recently organised a workshop on 'Big Data in Cancer', as part of a year-long programme to launch Warwick's new Data Science Institute. It was a fascinating day, with four brilliant speakers (Florian Markowetz, Andrew Teschendorff, Paul Moss, and Sean Grimmond) covering a lot of important topics.
Florian was first to speak, and led off with some conceptual and philosophical points about Big Data, which I think are hugely important. There is, of course, a lot of hype surrounding Big Data and what it's capable of. The more 'enthusiastic' supporters even argue that it makes the scientific method obsolete. I share Florian's view that this is kind of silly; no matter how much data you have, there is still a difference between observing a correlation and establishing a causal link, for example. And finding patterns in data is not the same as establishing underlying general rules. Yes, you can map out a rule using data if you have enough of it, but wouldn't you rather just have a simple formula? Rather, Big Data is a technical challenge (how do we handle it?), with the pay-off being smaller variance in our estimates, more detail about the things we are studying, and the like. One example is getting a much more detailed description of the heterogeneity and evolutionary history of a given tumour.
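To make the 'smaller variance' point concrete, here is a minimal sketch (mine, not something shown at the workshop). It assumes the simplest possible setting: we are estimating a mean from independent, unit-variance Gaussian measurements. The function name `spread_of_mean` and the sample sizes are made up for illustration.

```python
import random
import statistics


def spread_of_mean(n, repeats=200, rng=None):
    """Empirical standard deviation of the sample mean for samples of size n.

    Assumes independent draws from a standard normal distribution; this is
    an illustrative toy, not a model of any real cancer dataset.
    """
    if rng is None:
        rng = random.Random(0)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


rng = random.Random(42)  # fixed seed so the run is repeatable
spreads = {n: spread_of_mean(n, rng=rng) for n in (10, 100, 1000)}

for n, spread in spreads.items():
    # Theory says the spread should shrink roughly as 1 / sqrt(n).
    print(f"n={n:>5}  spread of estimate ~ {spread:.3f}  (1/sqrt(n) = {n ** -0.5:.3f})")
```

More data tightens the estimate, but only at a square-root rate, and it does nothing about bias or confounding, which is exactly why the correlation-versus-causation distinction doesn't go away.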
Andrew told us about some of his work in epigenomics. This is a fascinating topic that I'm trying to learn more about, but it seems that tumours come with an explosion of epigenetic modification throughout the genome, which potentially contains all sorts of information that may allow us to diagnose cancer, its type, and its likely progression. The early detection of cancer is a hugely important area of translational research, which makes this doubly exciting.
Paul gave us a whistle-stop tour of some areas of cancer research. He talked a lot about the potential of electronic health records and Hospital Episode Statistics data, something in which the Queen Elizabeth Hospital in Birmingham is really leading the way, with a serious informatics infrastructure. To my mind, this is part of a coming revolution in healthcare where all relevant medical data are stored electronically in integrated databases, which turns medical research into a software/algorithms problem. Evidence from other areas of human endeavour suggests that this will be a huge driver of the pace of innovation.
Finally, we were treated to a great talk by Sean, who's heavily involved in the International Cancer Genome Consortium and has recently moved to the UK to pursue the translational aspects of genomic medicine (note: the NHS and the UK's science base make us world-leading in this area, and this means we can attract the world's best researchers). I was really grabbed by just how rapidly genomic medicine is scaling, in terms of data volume. The cost to sequence a whole human genome has dropped from ~$100m (2001) to ~$1000 (2014), an incredible five orders of magnitude in 13 years (think about that for a moment...). With the UK's 100,000 genome project already underway, the US recently announcing a 1,000,000 genome project, and China undertaking a project of similar scale, we are just about to be awash in genomic data.
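If you do want to think about that for a moment, here is a quick back-of-the-envelope sketch using the two rough figures quoted above. It assumes a smooth exponential decline between 2001 and 2014, which is a simplification (the real drop was almost certainly uneven), and the function name `cost_drop_summary` is just something I made up for illustration.

```python
import math


def cost_drop_summary(start_cost, end_cost, years):
    """Summarise a cost drop, assuming a smooth exponential decline.

    Returns (orders of magnitude, number of halvings, months per halving).
    """
    fold = start_cost / end_cost          # how many times cheaper
    orders = math.log10(fold)             # orders of magnitude
    halvings = math.log2(fold)            # how many times the cost halved
    months_per_halving = 12 * years / halvings
    return orders, halvings, months_per_halving


# Rough figures from the talk: ~$100m in 2001, ~$1000 in 2014.
orders, halvings, months_per_halving = cost_drop_summary(
    start_cost=100_000_000, end_cost=1_000, years=2014 - 2001
)

print(f"orders of magnitude: {orders:.1f}")
print(f"halvings:            {halvings:.1f}")
print(f"months per halving:  {months_per_halving:.1f}")
```

Under that assumption the cost has been halving roughly every nine to ten months, which is a good deal faster than the two years or so usually quoted for Moore's law.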
The limiting factor in translational cancer research is about to become our ability to effectively handle and use the flood of data that we're just seeing the first trickles of. This is a hugely exciting time to be involved in cancer research, but we all might need to buy bigger computers...