Counting bacteria : confidence intervals from one measurement

Here are two biologically-inspired statistics problem, with very simple solutions. They show that, when you have informations about the way the observations are collected, one observation may be enough to give a confidence interval.

Question 1

From a bacterial solution I sampled a small volume v in which I found no bacteria. Give an upper bound (with confidence 95%) for the concentration of bacteria in the solution.

Question 2

From a bacterial solution I sampled a small volume v in which I found n bacteria (n > 20). Give a 95% confidence interval for the concentration of bacteria in the solution.

Solutions

Short answers : for the first question c < (3/v) and for the second c \in [\frac{ n \pm 2\sqrt{n}}{v}] .

Yep, it’s that simple ! See the solutions.

Finding a subnetwork with a given topology in E. Coli

A few weeks ago I posted about how easy it was, using Python’s Networkx module and the RegulonDB database, to get and navigate through the genetic network of E. Coli. I didn’t mention at the time that you can also find all the subnetworks of E. Coli sharing a given topology with just a few lines of code:

import networkx.algorithms.isomorphism as iso
def find_pattern(graph,pattern,sign_sensitive = True):

    if sign_sensitive:
        edge_match = lambda e1,e2 : (e1['sign']==e2['sign'])
    else:
        edge_match = None
    matcher = iso.DiGraphMatcher(graph,pattern,edge_match=edge_match)
    return list(matcher.subgraph_isomorphisms())

Feedforward loops in E. coli

As an example, let us have a look at the feedforward loops in E. coli. A feedforward loop can be described as a gene having an action on another gene, both directly and through the intermediary of another gene, like in the following sketch, where the arrows can represent activations or repressions

A feedforward !

If you want to print all the feedforward loops in E. coli’s network (as represented by RegulonDB), try this :

import networkx as nx
feedforward = nx.DiGraph()
feedforward.add_edges_from([('A','I'),('I','C'),('A','C')])
ffwd_in_ecoli = find_pattern(ECN,feedforward,sign_sensitive=False)
for subgraph in ffwd_in_ecoli:
    print subgraph

This prints each feedforward loop with the respective roles of the different genes (A,I, or C). As you can see there are 63 feedforward loops in regulonDB’s network, which is many. Let us now get sign-specific :

import itertools as itt
labels = []
frequencies = []
for signs in itt.product(('+','-'),repeat = 3):

    ffwd = nx.DiGraph()
    ffwd.add_edges_from([('A','I',{'sign':signs[0]}),
                         ('I','B',{'sign':signs[1]}),
                         ('A','B',{'sign':signs[2]})])
    frequency = len(list(find_pattern(ECN,ffwd,True)))
    frequencies.append(frequency)
    labels.append(" ".join(signs))

frequencies = array([float(f) for f in frequencies])/sum(frequencies)
fig,ax = subplots(1, figsize=(5,5))
title('feedforward loops in E. coli, \n by signs of A->I,I->B,A->B')
pie(frequencies, labels=labels, autopct='%1.1f%%', shadow=True)
show()

Relative frequencies of the different feedforward loops in E. coli

I’m not the first one to do such reasearch, but that won’t prevent me from commenting 🙂 The most frequent pattern can be seen as a double repression : gene A represses gene B, and, as an additional punishment, also represses the activator I of B. Such feedforwards are called coherent, as the two actions of the gene A have the same effect on the gene B. The second most frequent feedforward is incoherent: gene A activates gene B while activating a repressor of B, which seems a little foolish, but can be a way of creating bumps of activity (B is awaken just a few minutes before its repressor puts it down again).

A word of caution

Note that the algorithm didn’t find any feedforward loop with activations only (+++), while some have been reported in the literature. The same way, I couldn’t find any mutual interaction between two genes (A has an action on B and B has an action on A). So if you are going to use regulonDB’s network of interactions for reasearch purposes, always have in mind :

  • The database is INCOMPLETE, meaning that you can use it (carefully) to deduce the existence of stuff, but not to prove the absence of something. In particular, many gene interactions that occur through the intermediary of metabolites are omited ! For instance, the gene cyaA produces the cyclic AMP that is necessary to activate the genes regulated by crp. However only crp appears as a regulator of these genes.
  • In the context of synthetic biology, the few genes added to the bacteria are often supposed to work independently of the rest of the bacteria (some biologists say that they are orthogonal to the rest of the system). Thus, if you design a feedforward loop, it will be a feedforward loop and nothing else. But keep in mind that when the pattern-matching algorithm finds a feedforward loop in E. coli’s networks, each gene of the triangle could also be under the influence of many other, and maybe the behavior of the circuit is not what could be inferred by looking only at these three genes.
  • The algorithm consider isomorphism between subnetworks, which implies that it won’t mind if the subnetwork is under the influence of external genes, but it will mind if the subnetwork has auto-regulations that do not appear on the provided pattern. For instance, fis regulates crp and vice-versa, but this won’t be reported if you simply look for the pattern A\leftrightarrow B because, in addition, fis and crp regulate themselves !

Overstressed ? Here, take some bacteria !

The European jamboree of the 2012 iGEM competition (a synthetic biology contest for undergraduates) is over. And our team didn’t make it through to the world finals at Boston’s MIT 😦 .
Given that the event took place in Amsterdam, maybe this small project of mine would have given us more chances:

Hallucicoli, symbiosis meets happiness

Not sure, however, that it would be appreciated in Boston, as the FBI is one of iGEM’s main sponsors !

So what do you think… Feasible ? Marketable ?

It is a little akward to put such things on the internet, expecially if you, reader, are my future employer, so here is a disclaimer: this is just for fun (note, however, that right now real people are leading a real project to make bacteria produce THC). I do not support the use of drugs under any form and I never, ever had weed in my life (which may not be the case of many of the European iGEMers, who spent three very happy days in Amsterdam !)

E. coli versus the sun

I am really happy to see that the Zurich 2012 iGEM team has been designing a bacteria for sun protection. I had a similar idea a few months ago and I made a few fun slides out of it:

Bioshield – Natural Sun Protection

It seems that the team focused on sun detection and sunscreen agent production, while my concern with a bacteria-based sunscreen had been things like how to make a bacterial biofilm skin-adhesive and water-resistant. Has anyone any idea ?

E. coli’s regulation network from RegulonDB to Python

RegulonDB provides a continuously updated network of interactions of the bacterium E. coli. In this post I will show how easy it is to explore this network using Python with its package NetworkX. In case you are in a hurry, I put the whole script with enough comments at the end of this post.

Getting the Graph

First we create an empty graph:

import networkx as nx
eColiNetwork = nx.DiGraph(name = 'E. Coli Network')

Then we fetch the RegulonDB file containing the informations on the network. To get an internet file in Python we use urllib2:

from urllib2 import urlopen
url = "http://regulondb.ccg.unam.mx/data/network_tf_gene.txt"
regulonFile = urlopen(url)

Now we will read this file line by line to fill our graph with nodes (genes) and edges (interactions). The file contains uninformative lines (comments begining with ‘#’ or empty lines) and lines of the form ‘gene1 gene2 sign …’ where the sign is ‘+’ or ‘-‘ according to whether the interaction is an activation or an inhibition.

for line in regulonFile :

	# ignore comments and empty lines

	if not line.startswith(('\n','\t','#')):

		# A line is of the form "gene1	gene2 sign ..."

		g1,g2, sign = line.split('\t')[:3]

		# standardize gene names ( all characters to lower case )

		g1,g2 = g1.lower(),g2.lower()

		# add the edge to the network

		eColiNetwork.add_edge(g1,g2)
		eColiNetwork[g1][g2]['sign'] = sign

Easy as pie with Python ! You can then save this graph for offline use :

nx.write_gpickle(eColiNetwork,'EcoliNetwork.gn') # To save
eColiNetwork = nx.read_gpickle('EcoliNetwork.gn') # To load

Study and Statistics

It is very easy to get informations on a NetworkX graph. Let us start with some basic statistics:

nx.info(eColiNetwork)
>>> Name: E. Coli Network
    Type: DiGraph
    Number of nodes: 1696
    Number of edges: 3809
    Average in degree:   2.2459
    Average out degree:   2.2459

Note that there are only 1700 genes out of the circa 4000 genes of E. coli. Moreover it doesn’t have so much edges, so we can expect it not to be fully connected, i.e. maybe you cannot always go from one gene to another in this graph. Let us check that out.

# undirected version of E. coli's network
undirected_net = eColiNetwork.to_undirected()
 
connected_components = nx.connected_components(undirected_net)

print 'Number of connected components : ', \
		len(connected_components )

print 'Length of the connected components : ', \
		[ len(c) for c in connected_components]

 

>>> Number of connected components :  22
>>> Length of the connected components :  [1603, 13, 7, 6, 6, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 1]

Indeed we have 21 small satellite networks and one big central component. Let’s learn more about the latter:

main_component = nx.connected_component_subgraphs(undirected_net)[0]

print "Average shortest path in E. coli's main connected component : ", \
	   nx.average_shortest_path_length(main_component)

print "Diameter of E. coli's main connected component : ", \
	   nx.diameter(main_component)

 

>>>  Average shortest path in E. coli's main connected component :  3.61111539459
>>>  Diameter of E. coli's main connected component :  9

We can also get information on a specific gene, like its regulators (which are its predecessors in the graph, i.e. the genes pointing at it) and the genes that it regulates (its successors). Let’s have a look at acs and allr :

acs_regulators = eColiNetwork.predecessors('acs')
print "Genes regulating acs : ", " ".join( acs_regulators)

allr_targets = eColiNetwork.successors('allr')
print "Genes regulated by allr : ", " ".join(allr_targets)
>>> Genes regulating acs :  fis crp ihf
>>> Genes regulated by allr :  allb alla hyi gcl glxr alls ybby ybbw glxk

Sketching a Subgraph

Because it is so easy, let us draw some networks with NetworkX. Displaying the whole graph, or even its main component, would be ugly, as they are too big, so we choose the second connected component:

component = nx.connected_component_subgraphs(undirected_net)[1]

We use pylab for its plotting capabilities :

from pylab import *

figure()
nx.draw(component, node_size = 1000)
show()

And voilà !

E. coli subnetwork

E. coli subnetwork

Complete Code

Have fun 🙂


# (F)utilities to explore the network of Escherichia coli
# as given on RegulonDB


##### FETCH THE NETWORK ON THE INTERNET ################################


import networkx as nx
from urllib2 import urlopen

# Fetch E. coli's network on Regulon DB as a .txt file

url = "http://regulondb.ccg.unam.mx/data/network_tf_gene.txt"
regulonFile = urlopen(url)

# Create an (empty) directed graph

eColiNetwork = nx.DiGraph(name = 'E. Coli Network')

# Read the file and fill our network

for line in regulonFile :

	# ignore comments and empty lines

	if not line.startswith(('\n','\t','#')):

		# A line is of the form "gene1	gene2	+ "

		g1,g2, sign = line.split('\t')[:3]

		# standardize gene names ( all characters to lower case )

		g1,g2 = g1.lower(),g2.lower()

		# add the edge to the network

		eColiNetwork.add_edge(g1,g2)
		eColiNetwork[g1][g2]['sign'] = sign


########### SAVE AND LOAD ##############################################


import pickle

# Save the graph as EcoliNetwork.gn

with open('EcoliNetwork.gn', 'wb') as output:
	pickle.dump(eColiNetwork, output, pickle.HIGHEST_PROTOCOL)

# Load the graph

eColiNetwork = pickle.load(open('EcoliNetwork.gn', 'r'))


#### PRINT INFOS ON THE NETWORK ########################################


# undirected version of E. coli's network

undirected_net = eColiNetwork.to_undirected()

# Stats

print '-- General informations : \n', \
		nx.info(eColiNetwork)

connected_components = nx.connected_components(undirected_net)

print 'Number of connected components : ', \
		len(connected_components )

print 'Length of the connected components : ', \
		[ len(c) for c in connected_components]

main_component = nx.connected_component_subgraphs(undirected_net)[0]

print "-- Average shortest path in E. coli's main connected component : ", \
	   nx.average_shortest_path_length(main_component)

print "-- Diameter of E. coli's main connected component : ", \
	   nx.diameter(main_component)

# Infos on a specific gene :

print "-- Genes regulating acs : ", \
	   " ".join( eColiNetwork.predecessors('acs') )

print "-- Genes regulated by allr : ", \
	   " ".join( eColiNetwork.successors('allr') )


###########  D R A W  ##################################################


from pylab import *

figure()
component = nx.connected_component_subgraphs(undirected_net)[1]
nx.draw(component, node_size = 1000)
show()

What synthetic biology (and your cat) can do for you

As they would say in my engineering school, a good scientific project is a scientific project with a good business model. Here is an idea that I developed one year ago for a start-up incubator’s workshop in Vienna, just  after I first heard of synthetic biology and the iGEM competition  :

Moppycat, your house’s best companion (pdf)

The potential market is just enormous… Anyone here wanting to invest ?