E. coli’s regulation network from RegulonDB to Python

RegulonDB provides a continuously updated network of interactions of the bacterium E. coli. In this post I will show how easy it is to explore this network using Python with its package NetworkX. In case you are in a hurry, I put the whole script with enough comments at the end of this post.

Getting the Graph

First we create an empty graph:

import networkx as nx
eColiNetwork = nx.DiGraph(name = 'E. Coli Network')

Then we fetch the RegulonDB file containing the informations on the network. To get an internet file in Python we use urllib2:

from urllib2 import urlopen
url = "http://regulondb.ccg.unam.mx/data/network_tf_gene.txt"
regulonFile = urlopen(url)

Now we will read this file line by line to fill our graph with nodes (genes) and edges (interactions). The file contains uninformative lines (comments begining with ‘#’ or empty lines) and lines of the form ‘gene1 gene2 sign …’ where the sign is ‘+’ or ‘-‘ according to whether the interaction is an activation or an inhibition.

for line in regulonFile :

	# ignore comments and empty lines

	if not line.startswith(('\n','\t','#')):

		# A line is of the form "gene1	gene2 sign ..."

		g1,g2, sign = line.split('\t')[:3]

		# standardize gene names ( all characters to lower case )

		g1,g2 = g1.lower(),g2.lower()

		# add the edge to the network

		eColiNetwork.add_edge(g1,g2)
		eColiNetwork[g1][g2]['sign'] = sign

Easy as pie with Python ! You can then save this graph for offline use :

nx.write_gpickle(eColiNetwork,'EcoliNetwork.gn') # To save
eColiNetwork = nx.read_gpickle('EcoliNetwork.gn') # To load

Study and Statistics

It is very easy to get informations on a NetworkX graph. Let us start with some basic statistics:

nx.info(eColiNetwork)
>>> Name: E. Coli Network
    Type: DiGraph
    Number of nodes: 1696
    Number of edges: 3809
    Average in degree:   2.2459
    Average out degree:   2.2459

Note that there are only 1700 genes out of the circa 4000 genes of E. coli. Moreover it doesn’t have so much edges, so we can expect it not to be fully connected, i.e. maybe you cannot always go from one gene to another in this graph. Let us check that out.

# undirected version of E. coli's network
undirected_net = eColiNetwork.to_undirected()
 
connected_components = nx.connected_components(undirected_net)

print 'Number of connected components : ', \
		len(connected_components )

print 'Length of the connected components : ', \
		[ len(c) for c in connected_components]

 

>>> Number of connected components :  22
>>> Length of the connected components :  [1603, 13, 7, 6, 6, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 1]

Indeed we have 21 small satellite networks and one big central component. Let’s learn more about the latter:

main_component = nx.connected_component_subgraphs(undirected_net)[0]

print "Average shortest path in E. coli's main connected component : ", \
	   nx.average_shortest_path_length(main_component)

print "Diameter of E. coli's main connected component : ", \
	   nx.diameter(main_component)

 

>>>  Average shortest path in E. coli's main connected component :  3.61111539459
>>>  Diameter of E. coli's main connected component :  9

We can also get information on a specific gene, like its regulators (which are its predecessors in the graph, i.e. the genes pointing at it) and the genes that it regulates (its successors). Let’s have a look at acs and allr :

acs_regulators = eColiNetwork.predecessors('acs')
print "Genes regulating acs : ", " ".join( acs_regulators)

allr_targets = eColiNetwork.successors('allr')
print "Genes regulated by allr : ", " ".join(allr_targets)
>>> Genes regulating acs :  fis crp ihf
>>> Genes regulated by allr :  allb alla hyi gcl glxr alls ybby ybbw glxk

Sketching a Subgraph

Because it is so easy, let us draw some networks with NetworkX. Displaying the whole graph, or even its main component, would be ugly, as they are too big, so we choose the second connected component:

component = nx.connected_component_subgraphs(undirected_net)[1]

We use pylab for its plotting capabilities :

from pylab import *

figure()
nx.draw(component, node_size = 1000)
show()

And voilà !

E. coli subnetwork

E. coli subnetwork

Complete Code

Have fun 🙂


# (F)utilities to explore the network of Escherichia coli
# as given on RegulonDB


##### FETCH THE NETWORK ON THE INTERNET ################################


import networkx as nx
from urllib2 import urlopen

# Fetch E. coli's network on Regulon DB as a .txt file

url = "http://regulondb.ccg.unam.mx/data/network_tf_gene.txt"
regulonFile = urlopen(url)

# Create an (empty) directed graph

eColiNetwork = nx.DiGraph(name = 'E. Coli Network')

# Read the file and fill our network

for line in regulonFile :

	# ignore comments and empty lines

	if not line.startswith(('\n','\t','#')):

		# A line is of the form "gene1	gene2	+ "

		g1,g2, sign = line.split('\t')[:3]

		# standardize gene names ( all characters to lower case )

		g1,g2 = g1.lower(),g2.lower()

		# add the edge to the network

		eColiNetwork.add_edge(g1,g2)
		eColiNetwork[g1][g2]['sign'] = sign


########### SAVE AND LOAD ##############################################


import pickle

# Save the graph as EcoliNetwork.gn

with open('EcoliNetwork.gn', 'wb') as output:
	pickle.dump(eColiNetwork, output, pickle.HIGHEST_PROTOCOL)

# Load the graph

eColiNetwork = pickle.load(open('EcoliNetwork.gn', 'r'))


#### PRINT INFOS ON THE NETWORK ########################################


# undirected version of E. coli's network

undirected_net = eColiNetwork.to_undirected()

# Stats

print '-- General informations : \n', \
		nx.info(eColiNetwork)

connected_components = nx.connected_components(undirected_net)

print 'Number of connected components : ', \
		len(connected_components )

print 'Length of the connected components : ', \
		[ len(c) for c in connected_components]

main_component = nx.connected_component_subgraphs(undirected_net)[0]

print "-- Average shortest path in E. coli's main connected component : ", \
	   nx.average_shortest_path_length(main_component)

print "-- Diameter of E. coli's main connected component : ", \
	   nx.diameter(main_component)

# Infos on a specific gene :

print "-- Genes regulating acs : ", \
	   " ".join( eColiNetwork.predecessors('acs') )

print "-- Genes regulated by allr : ", \
	   " ".join( eColiNetwork.successors('allr') )


###########  D R A W  ##################################################


from pylab import *

figure()
component = nx.connected_component_subgraphs(undirected_net)[1]
nx.draw(component, node_size = 1000)
show()

Advertisements

2 comments on “E. coli’s regulation network from RegulonDB to Python

  1. […] The Sugar High HomeAbout « E. coli’s regulation network from RegulonDB to Python […]

  2. […] A few weeks ago I posted about how easy it was, using Python’s Networkx module and the RegulonDB database, to get and navigate through the genetic network of E. Coli. I didn’t mention at the time that you can also find all the subnetworks of E. Coli sharing a given topology with just a few lines of code: […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s