As I blogged about previously, my Amber Biology colleague Gordon Webster and I are writing a book, Python For The Life Sciences, and today we are releasing a sample chapter.
In this chapter we show you how to extract and examine data generated from cellular interaction networks, sometimes affectionately known as “hairball” data. In particular, we’ll show you an example of reading in data on transcription factor networks from yeast. We will take you through reading in files, creating set data structures and running simple queries. To give you a flavour, here’s a brief extract from the chapter (the full sample chapter is available as a PDF download):
A set, as you might recall from distantly remembered introductory maths classes, contains only unique members. In the context of the data structures for transcription networks, this means that, for each transcription factor, we only need to record that particular gene once.
So, let’s create an empty set of genes:
genes = set()
So that was easy! Let’s start adding genes to our set:
genes.add("ABF1")
genes.add("ACS1")
print genes
set(['ABF1', 'ACS1'])
Great, so we now have a set of genes. Sets, however, really come into their own because they will refuse to add duplicates. This is really handy, because instead of having to check whether the collection already contains an element (as we would with a list), we can just keep adding genes, and the set will magically ensure that there is only one entry. To continue with the example:
genes.add("ABF1")
print genes
set(['ABF1', 'ACS1'])
So note that there is still only one copy of the ABF1 gene! Pretty cool, eh? This turns out to come into its own when adding large numbers of genes to your dataset, because sometimes (and this will come as a shocking fact to biologists) there may be errors or duplicates in your input data!
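To see how this might carry over to a whole network, here is a minimal sketch rather than the code from the chapter. It assumes a made-up input layout of one tab-separated regulator and target per line, and the names TF2, GENE_A and GENE_B, the helper build_network and every regulator-target pairing are invented purely for illustration. It stores one set of target genes per transcription factor, so a duplicated record is simply absorbed, and then runs a couple of simple queries.

```python
# Hypothetical input: one "regulator<TAB>target" pair per line.
# The pairs are invented for illustration and are not real regulatory data;
# the file used in the chapter may well be laid out differently.
lines = [
    "ABF1\tACS1",
    "ABF1\tGENE_A",
    "ABF1\tACS1",    # duplicate record, which the set will quietly absorb
    "TF2\tGENE_A",
    "TF2\tGENE_B",
]


def build_network(records):
    """Map each transcription factor to the set of genes it regulates."""
    network = {}
    for record in records:
        factor, target = record.strip().split("\t")
        # setdefault creates an empty set the first time a factor is seen
        network.setdefault(factor, set()).add(target)
    return network


network = build_network(lines)

# Sets are unordered, so sort them to get a predictable listing
print(sorted(network["ABF1"]))

# Simple queries: targets shared by two factors, and a membership test
shared = network["ABF1"] & network["TF2"]
print(sorted(shared))
print("GENE_B" in network["TF2"])
```

With real data you would pass the lines of an open file to build_network instead of a hand-written list. The & operator gives the intersection of two sets, which is one easy way to ask which genes two transcription factors have in common.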
Read more and download the PDF…
More about the book
The book is written primarily for life scientists with little or no experience writing computer code, who would like to develop enough programming knowledge to be able to create software and algorithms that they can use to advance or accelerate their own research.
The aim of the book
The aim of the book is to teach the working biologist enough Python that he or she can get started using this incredibly versatile programming language in their own research, whether in academia or in industry. It also aims to provide a Python foundation upon which the biologist can build by extrapolating from the broad set of Python fundamentals that the book provides.
This book is not another comprehensive guide or reference to the Python programming language. The purpose is to introduce computational tools to jump-start your biological imagination. We will show the reader a range of quantitative biology questions, drawn from across the life sciences, that can be addressed using just one language. The examples are deliberately eclectic and range from bioinformatics, structural biology and systems biology to modeling cellular dynamics, ecology, evolution and artificial life.
Like a good tour, these biological examples have been chosen to be simple enough not to impede the reader’s ability to assimilate the Python coding principles being presented, but at the same time each scientific problem illustrates a simple but important biological principle or idea. By covering a wide variety of examples from different parts of biology, we also hope that the reader can identify common features between different kinds of models and data and encounter useful ideas and approaches that they may not have previously considered.
We believe that exploring biological data and biological systems should be fun! We want to take you from the nuts and bolts of writing Python code to the cutting edge as quickly as possible, so that you can get up and running on your own creative scientific projects.
In short, this is the book that we would like to have read when we were learning computational biology.