The limitations of Big Data in life science R&D

Big Data has become an increasingly large presence in the life science R&D world, but as I have blogged about previously, ever-larger datasets and better machine learning algorithms alone will not turn that data into bankable knowledge, and they can lead to erroneous inferences. My Amber Biology colleague, Gordon Webster, has a great post over on LinkedIn leavening the hype around Big Data, pointing out that analytics and visualizations alone are insufficient for making progress in extracting knowledge from biological datasets:

Applying the standard pantheon of data analytics and data visualization techniques to large biological datasets, and expecting to draw some meaningful biological insight from this approach, is like expecting to learn about the life of an Egyptian pharaoh by excavating his tomb with a bulldozer

“-omics” datasets, such as those produced by transcriptomic and proteomic analyses, are ultimately generated by dynamic processes involving individual genes, proteins and other molecules in a cellular network. Done properly, dynamic simulation and modeling of such networks, especially with agent-based approaches, can allow biologists to run in-silico “what-if” experiments, not as purely theoretical exercises, but to ask what the next experiment should be:

Modeling in the life sciences, has until now, been largely confined to the realm of theoretical biology – but used as an adjunct to an experimental approach in the laboratory, modeling can do a great deal more than just the kind of predictive “weather forecasting” for which it is generally best known. Models can suggest new hypotheses to test in the laboratory, and answer some of the most critical questions facing anybody who is trying to run a productive life science R&D program. What experiment should I do next?
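
To make the “what-if” idea a little more concrete, here is a minimal sketch in Python of the kind of in-silico experiment described above. It is a toy of my own, not taken from Gordon's post: a single hypothetical gene whose mRNA and protein molecules are each treated as independent agents that can be produced or degraded at every time step. The function name `simulate` and all of the probabilities are illustrative assumptions, not measured values.

```python
import random


def simulate(p_transcribe, steps=2000, seed=0,
             p_translate=0.5, p_mrna_decay=0.1, p_protein_decay=0.05):
    """Toy agent-style simulation of one gene (all rates are made-up assumptions).

    Returns the mean protein count over the second half of the run.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    mrna, protein = 0, 0
    history = []
    for _ in range(steps):
        # The gene produces at most one new mRNA per step.
        new_mrna = 1 if rng.random() < p_transcribe else 0
        # Each existing mRNA independently may be translated into one protein.
        new_protein = sum(rng.random() < p_translate for _ in range(mrna))
        # Each existing molecule independently may be degraded.
        lost_mrna = sum(rng.random() < p_mrna_decay for _ in range(mrna))
        lost_protein = sum(rng.random() < p_protein_decay for _ in range(protein))
        mrna += new_mrna - lost_mrna
        protein += new_protein - lost_protein
        history.append(protein)
    tail = history[steps // 2:]  # skip the initial ramp-up phase
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # "What if" we knocked transcription down five-fold?
    baseline = simulate(p_transcribe=0.5)
    knockdown = simulate(p_transcribe=0.1)
    print(f"mean protein, baseline:  {baseline:.1f}")
    print(f"mean protein, knockdown: {knockdown:.1f}")
```

Even a toy like this lets you compare a baseline against a perturbation before committing bench time to it. A model worth trusting would of course need rates and interactions grounded in real experimental data.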
