Heterocyst Simulation

I first came across heterocysts in an undergraduate lecture, where they appeared in a paragraph about nitrogen metabolism. It was shortly before my finals, so I wasn't able to investigate further until after the exams were over. I was intrigued by the division of labour between cells in the filament, which was analogous to a multi-cellular organism. The patterning of heterocysts along the filament also reminded me of cellular automata and seemed to provide the simplest model for developmental biology.

After my exams, I created a simple mathematical model of a filament of cells containing heterocysts, and later, my first full scale cell simulation. Despite its limitations, it is still the most sophisticated evolution simulation that I've created. It's also different from my previous simulations of evolution in that the function of the enzymes and their rate is immutable; instead evolution works only by altering the amount of each enzyme in the cell. This is achieved by changing the rate of enzyme degradation, the promoter regions of genes and the activity of transcription factors. Evolution therefore works on the regulation of enzymes rather than the enzymes themselves.

It has been suggested that non-coding DNA accounts for more differences between organisms that coding DNA and in this simulation I hoped to explore that idea.

Heterocysts in the real world

Over the years I have started several cell simulations, some of which have even 'lived', but by far the most complete and complex simulation, was of heterocysts (and their evolution). The main reason for its completion was that I managed to curb my natural tendency to cram in ever more details. I hope to write about that simulation and its results at some point, and maybe even add the Java program, but first I want to explain what heterocysts are and why I think they are an interesting subject to simulate. You could of course read the Wikipedia page about heterocysts, which is actually one of the few pages on Wikipedia that I created. But I will try to write a better description of heterocysts here.

Vital atoms

Nearly all organic molecules are made from only a few types of atom: carbon, nitrogen, hydrogen and oxygen. These atoms are abundant in our environment, however not all the atoms are available for organisms to metabolise. For example, diamond is pure carbon, but I don’t think there is a single organism that can metabolise diamond and use the carbon to built its own molecules. Similarly, carbon dioxide is freely available in the atmosphere (albeit at only 0.038%), but we animals are unable to utilise it. Fortunately for us, plants (and other organisms, such as algae and some bacteria), can photosynthesise; they use energy from light to rip carbon atoms out of carbon dioxide (releasing oxygen) and use them to build sugars, fats and proteins, which we can metabolise.

Hydrogen and oxygen are readily available to all organisms in the form of water, which just leaves nitrogen. Nitrogen gas makes up 70% of the atmosphere, so is readily abundant. However, the nitrogen atoms in nitrogen gas are joined by a triple bond, which makes the molecule very inert. In fact, one source of fixed nitrogen is through lightning strikes, which are one of the few phenomena powerful enough to tear the atoms apart. Fortunately, some organisms have evolved an enzyme called nitrogenase, which converts nitrogen gas into ammonia, which they and other organisms can metabolism.

Nitrogenase uses a lot of energy, hydrolysing 16 molecules of ATP (the cell’s short-term energy store) for every molecule of nitrogen gas fixed. Nitrogenase is – necessarily – a very reactive enzyme, with a transition metal complex (which contains molybdenum, vanadium or iron), and so is highly susceptible to damage by oxygen. As a result, most organisms that make nitrogenase live in anaerobic conditions, such as deep under the soil.

Combining Photosynthesis with Nitrogenase

Since all organisms require fixed carbon and fixed nitrogen, for a organism to be self-sufficient, it must combine photosynthesis with nitrogenase. This poses organisms with a challenge, since photosynthesis produces oxygen, which nitrogenase is so sensitive to. As such few organisms can fix both carbon and nitrogen.

One group of organisms that appear to achieve this feat of fixing nitrogen and carbon are the legumes, such as pea and soyabean plants. These plants photosynthesise like all plants, but also produce fixed nitrogen, which is why they were traditionally used in crop rotation – they increase the amount of fixed nitrogen in the soil, thus reduce the need for nitrogen-containing fertilizers. However, legumes do not actually fix nitrogen themselves; they allow themselves to become infected by nitrogen-fixing bacteria called rhizobia, forming a symbiotic relationship with them. Rhizobia form cysts or nodules in the roots of legumes, metabolising fixed carbon (i.e. sugars) produced by the plant. In return, rhizobia use nitrogenase to fix nitrogen gas, providing themselves and the plant with fixed nitrogen. The root nodules have a low oxygen concentration, partly due to a protein called leghaemoglobin, which is produced by the plant. Like haemoglobin in our blood, leghaemoglobin binds oxygen (only more strongly), thus helping to protect the nitrogenase enzyme.

Some cyanobacteria (such as Anabaena sperica and Nostoc punctiforme), however, have accomplished the seemingly impossible task of fixing both carbon and nitrogen. Like legumes, these bacteria solve the problem by separating the tasks of carbon fixation and nitrogen fixation into separate compartments. Rather than have different organisms carry out the two reactions, individual cells specialise to carry out either photosynthesis or nitrogen-fixation and then exchange the products (a bit like two people or economies specialising in two different type of production then trading). The cells specialised in nitrogen-fixation are called heterocysts, due to the fact that they are different from the normal, vegetative (photosynthetic) cells and they have a cyst-like appearance.

In becoming a heterocyst, cells must synthesise the enzyme nitrogenase, but they must also halt photosynthesis. They acheive this by degrading the group of enzymes that make up photosystem II, which is the part of the photosynthetic machinery that produces oxygen. Heterocysts also produce additional cell walls, which help to protect them from oxygen (which is still produce by neighbouring cells). They also produce proteins that bind oxygen like leghaemoglobin in legumes. In order to cope with the increased energy demand of nitrogen-fixation, heterocysts also increase their rate of glycolysis – an set of reactions that produce energy from glucose. Since heterocyts are unable to produce glucose by photosynthesis, it is drawn in from neighbouring cells through pores. Fixed nitrogen flows through these pores in the opposite direction, from the heterocysts to the vegetative cells.


Specialisation (or differentiation) of cells is common (maybe ubiquitous) in multicellular organisms. For example, humans have cells that are specialised for roles in the heart, lung, liver etc.. However, it is very unusual for single-celled organisms, such as bacteria to differentiation. In fact, a filament of cyanobacteria containing heterocysts could be viewed as a single multicellular organism with two tissue types. In further support of this idea is the fact that the formation of heterocysts is a terminal differentiation – that is, not only is a heterocyst unable to revert back to being a vegetative cell, it is also unable to divide. This makes a heterocyst like the majority of the cells in an animal’s body – only stem cells are able to divide and replace cells such a nerves or red blood cells, and only the germ cells (sperm and eggs) can go on to produce a new organism. Some heterocyst-forming bacteria also differentiate to form spore-like cells called akinetes or motile cells called hormogonia.

For all collections of cells that specialise (such as multicellular organisms), it is vital that the right number of cells specialise in each task and that the cells specialised for each task are in the right position. For example, in humans, we don’t want all of our cells to become lung tissue, and nor do we want our cells in our brain to become liver cells. How multicellular organisms develop is a very complex question. Heterocysts provide a simple model that can help us to understand the principles of development since filaments of heterocysts-forming bacteria have only two cell types, and the pattern of cells is one-dimensional.

When fixed nitrogen becomes limited, it is important that the right number of cells become heterocysts: too few and the filament of cells will still be limited by fixed nitrogen, too many and fixed carbon will become limited as fewer cells photosynthesise and more sugar is metabolised to power nitrogenase. It is also important that the right cells in the filament become heterocysts: it is optimum to have the heterocysts evenly spaced along the filament, so fixed nitrogen is more evenly spread along the filament. If two heterocyst are next to one another on the filament then they will compete with each other for fixed carbon. In nature, heterocysts normally form about once every 9 to 19 cells.

How can a filament of cells create the right pattern when there is no central control? Each cell acts autonomously and can only directly communicate with the cells on either side of itself, yet each must decide whether or not to become a heterocyst. To make the problem more complex, the vegetative cells are constantly dividing, so even once the pattern is established, heterocysts will move further away from one another. Once heterocysts are about 19 cells apart, the cell halfway between them must somehow ‘know’ it is in the middle and become a heterocyst, thus reducing the distance between heterocyst to nine cells. [I should probably draw a diagram to illustrate this.]

I’ll write in a later post (or three) about how this is done, how I modelled heterocysts and this problem, and an analysis of what happened when I allowed my model to evolve a solution to the problem.

Virtual biochemistry


The first step in creating the simulation was to define the metabolites. I limited myself to five metabolites, which I have given one-letter abbreviations. Carbon dioxide and nitrogen gas were assumed to be present in the environment at fixed concentrations. Oxygen was also present in the atmosphere at a fixed amount, but its concentration in cells could be changed by cellular reactions. The five metabolites in cells were:

  • C: fixed carbon, e.g. carbohydrates, such as sugars
  • N: fixed nitrogen, e.g. ammonium and amino acids
  • E: energy, e.g. ATP, NADH and NAPDH
  • O: oxygen gas
  • P: protein and other biomolecules


To keep the simulation simple (for once), the reactions I included in the simulation corresponded to pathways of multiple reactions in real life. Each of these reactions were catalysed by a single enzyme, which I after the pathway itself or main enzyme in the pathway:

  1. Photosystem II: lightO + E
  2. Rubisco: 2EC
  3. Catabolism: O + C → E
  4. Nitrogenase: 2EN
  5. Anabolism: C + NP

The reactions (numbered) form the metabolic network below:

Heterocyst metabolism network

There were two further processes: diffusion and cell division. Diffusion caused the amounts of C, N and O to move between neighbouring cells, from where they were more concentrated to where they were less concentrated. Oxygen also diffused between cells and the environment, where its concentration was fixed. Cell division occurred when a cell accumulated 1000 units of P. At this point, 1000 units of P were removed and the amount of the other metabolites was halved. Then a daughter cell was created with the same amount of each metabolite as its mother.

Since the success of a genome was determined by the speed at which the filament reach a threshold number of cells, evolution was effectively selecting for cells (or filaments of cells) that maximised protein production.

Virtual genetics

In this simulation, the aim was to study evolution when enzyme function was fixed. Instead, all evolution could work with was the amount of enzyme present in a cell. This could be altered by modifying the rate at which proteins degraded and the rate at which there were expressed (transcription and translation were combined).

Organisation of the virtual genome

LEGEND Enz: enzyme; Op: operator: TF: transcription factor

Each of the five reactions described in the previous post is catalysed by an enzyme that is associated with four parameters. These parameters could be randomly mutated between generations, so act like genes in the simulation. The four genes determine the stability of an enzyme, and the strength of the operator region for three transcription factors.

Stability genes

The first gene for each enzyme determined its stability. The function of these genes is relatively straightforward: they have a value between 0 and 0.96, and during every unit of time, the amount of each enzyme is multiplied by this value. This number represents either the inherent stability of the enzyme or the rate at which it is bound and degraded by a peptidase. Since this value is always less than 1, the amount of enzyme decreases every time unit and cells constantly have to fight protein degradation in order to survive.

A small change in a stability gene can have quite a large effect on the amount of enzyme that can accumulate. If 1 unit of enzyme is produced every time unit, then with a stability of 0.9, the enzyme will accumulate to 10 units, while with a stability of 0.96, the enzyme will accumulate to 25 units.

The stability genes are also crucial for gene regulation because degradation is the only way the cell can reduce the amount of an enzyme in this simulation. This greatly limited how cells could regulate reactions, especially since heterocyst formation requires the quick removal of Photosystem II (reaction 1). Adding enzyme regulation by, for example, phosphorylation, would make cell much more effective, but is complex to implement. In this simulation the lack of other means to reduce the rate of reaction created a regulatory problem for evolution to overcome.

Operator genes

The amount of an enzyme was increased by transcription of the gene by transcription factors (the rate of translation was assumed to be constant). There were three transcription factors in the simulation, because that's the fewest I thought I could get away with. If I were to rerun the simulation, I might add a fourth just to see if it could be made useful.

Transcription Factors

In addition to the five enzymes, there were three transcription factors. The properties of each transcription factor was controlled by five genes. These genes determined which metabolite the transcription factor responded and how the amount of that signal metabolite affected the transcription factors activity. 

Graph of transcription activity

Developer diary 1

Because I created my heterocyst simulation a long time before I got a blog, I can't write about my progress in creating the simulation. However, I have collected some fragments of some emails I sent Guy while I was working on the program. Some of the analysis I write about, and include in later updates. I also found a brief history of the program at the top of the code. As you can see it took me quite a long time to get anywhere.

Jul 04  Got working with simpler reactions
Aug 04  Added enzyme kinetics and diffusion
Oct 04  Added protein degradation and transcription factors
Jan 05  Tidied enzyme and metabolite variables; added replication
Jan 05  Changed regulation system and added (im-/ex- portable) genetics
Feb 05  Integrated crossing program for evolution
Feb 05  Changed regulation to consensus system and separate inhibitors

July 2004

Wikipedia has no results for heterocysts. I'm tempted to write one later.

I did write one later; it's here. It remains one of the few articles that I've created from scratch.

20th January 2005

Those damn heterocysts kept me up until passed 4 the other morning. I can't remember why, but it probably had something to do with avoiding work, that I decided to look at my heterocyst program again and mess about with it. I managed to sort a few things out and of course thought of hundreds more things I should do...

... just as I hope to evolve some heterocysts some day.

30th January 2005

I finally got my heterocysts to replicate and spontaneous form along the growing filament at regular intervals. It's mesmerising to watch them appear, which is why I've spent most of the day staring at a screen of numbers and coloured boxes with excitement. I've also managed to generalise lots of variable for the regulation system and so I've started a little evolution. Of the fifteen mutants of my original heterocyst, three performed ever so slightly more efficiently (grew faster), and loads performed terribly. I think it will take hours to get any really nice results, especially as you need quite a large population for any decent evolution. And loads of generations.

It would be nice to start with no regulation system and try and evolve it, but that would take a long time.

7th Feburary 2005

My heterocyst evolution program is becoming more and more autonomous, so I can leave it evolving all day while I am at lectures. And they're doing well. It's amazing how fast they seem to improve. I wonder if they'll reach a limit some time soon.

8th Feburary 2005

I may email you stuff about heterocyst evolution that I have found. Last night, there was a great, super fast mutation, but after the program had been running for hours and I stopped the program to copy the data to Excel, I accidentally ran the program again, re-writing all the data. Man, I was annoyed. Fortunately, I have since got a mutation that I think is as fast, and if not another generation should do it. I suppose in all of evolution fit individuals have been unlucky (struck by lightening or whatever), so this just mimics that.

Developer diary 2

I've put this email separately as it's a bit longer. Later I'll add the graphs as part of the analysis of the simulation.

15th Feburary 2005

I've attached the promised graphs. I haven't labelled them (GCSE science sin) so this is what they are. The first shows the time it takes for the eight fastest grower of the population to go from 5 to 50 cells. I didn't put all the mutants of the generations there because the earlier generations have fewer cells than the later ones (16 compared to 64) and some of the mutants don't grow at all. As you can see the increase is pretty much linear for the first ten generations at least (it works out at roughly 500 time units increase per generation). However, they seem to reaching the maximum growth rate by Gen 13 and the population is getting more homogenous. They just about managed to break the 10 000 mark. Hooray!

The second graph shows the (nicely exponential) growth rates of the fastest growing mutants of every other generation (to make the graph clearer). I'm not sure what the best way to compare growth rates is. Adding an exponential trend line (final graph, which I wasn't going to add), you can see the first mutant (the ancestor) grew with an e to the 0.1619(time/1000) relationship. The best grower grew with an e to the 0.2396(time/1000) relationship. So what does that mean? Would it be right to say that the fastest grower is 48% faster (0.2369/0.1619)?

I've also started to analyse the mutations in the fastest grower (so this is where all my time's been going). I added them one by one to the ancestor to see the effect on growth. Most have some small effect, a couple actually make it grow slower, and one kills the whole filament, which is nice, because it shows how the mutations are dependent on each other. The one that kills the filament actually reduces the degradation of a certain enzyme thus massively increasing the concentration it reaches. The ancestor cell can't cope with this increase because it doesn't have a good enough regulation system to deal with it. This mutation occurred a few times, but died out. Then it occurred in the 7th generation (in 17th mutant of this generation to be precise), which could cope, and this individual was the fastest grower of the generation (it's about 1000 time units faster, as you can see on the first graph), and so spread the gene throughout the subsequent generations until everyone had it (unless they mutated it again, in which case they were much slower). I've also noticed that the enzyme concentrations in heterocysts are not stable as in the first organisms but oscillate quite a lot. Quite why this is or how it helps I don't know. So much to find out. So much more to try.


Developer diary 3


16th Feburary 2005

... gave me lots of time to finally get my new heterocyst program up and running. It's a little complicated to get an initial regulatory system up and running so I might try and evolve one from scratch, though it could take a long time. Especially now the program takes even longer to run. Still, we shall see. Or at least, I shall see and probably tell you all about it. It has quite a nice and realistic transcription interaction algorithm. The program is currently running with a 2% mutation rate to give me lots of diversity to work with. I'd still like to run the previous program from scratch to see which mutations occur again and if its rate of improvement is the same. All experiments need to be replicated.

17th Feburary 2005

Interesting (to me) development of heterocyst data visualisation techniques. It involves me drawing graph type things pixel by pixel in Paint. Fun.

Thankful I have improved my programming skills since then and can now automatically create these images as SVGs, so can avoid having to use Paint.

18th Feburary 2005

I really wish I had two laptops. One to run the program the other to analyse the data. It seems such a waste to do one when I could be doing the other.

I still wish that I had a computer dedicated to number crunching. Maybe one day.


Developer diary 4

10th May 2005

I got back just in time to see the last heterocyst of generation XVI do it's stuff. I've started playing with them again, and there still seems to be some evolution left in them, though the nice straight line of progress is now bent. Unsurprising really, given that it measures the time it takes to reach 50 cells from 5, and there is definite a limit to how little time that is possible. But they are still progressing and showing some weird mutations, that I'll have to check out some time. I stumbled upon the explanation guide looking for a picture of a heterocyst with a9. I don't know why it was there before Wikipedia, and I was a little surprised to see my work there.

19th May 2005

Annoyingly, I forgot to leave my heterocyst program running while I was out. It would be nice if I could make it a screensaver, like SETI and the Oxford drugs one. I was reading about it in Science just yesterday. Distributed computers are better than supercomputers and free.

31st May 2005

I found a blinking heterocyst. I think lots of the heterocyst that evolved go through oscillations, which might simply be because time is not continuous in the simulation, but in this one it's oxygen levels were oscillating right on the border between values coloured red, and those coloured black, so it looked like it was flashing.

3rd June 2005

I'm currently on generation 6 of a third run of heterocyst evolution. I've made also a made a visual basic program that saves me so much time in data manipulation. I wish I'd done it earlier.

Analysis of stability genes

The first group of genes I've analysed are the stability genes. Below shows the value of the five stability genes in the top organism after four separate runs of evolution. The horizontal line indicates the value of the gene in the ancestor organism. The colour of the bar gives an indication of the fitness of the top organism relative to the other runs of evolution (the darker the bar, the faster the filament of bacteria grew).

I'm part way through looking at detail at these genes and have been developing a way to visualise how genes change in the population over time, which I'll add to this analysis later.

Photosystem II

The stability of Photosystem II in the ancestral organism was much lower than the stability of the other enzymes. There were two reasons for this. First, the amount of Photosystem II was capped at 2.7 units, representing the amount of enzyme that could fit in the plastid membranes. Second, the reaction catalysed by Photosystem II produces oxygen, so it is vital that is enzyme is quickly removed from cells that are transforming into heterocysts.


The evolution of the nitrogenase stability gene is simplest to understand. In all four runs of evolution, the fastest organism has a mutation that increases the stability of nitrogenase to the maximum value of 0.96. In each case, this mutation appeared early on in evolution and quickly spread through the population. This is because the availability of fixed nitrogen is the limiting factor for cell growth, so the more that can be produced, the faster a filament of cells will grow.


The stability of the catabolism enzyme is also the same in all four runs of evolution, however, in this case, the optimum value appears to be the original value of 0.9. At first glance, this is surprising, since the rate of catabolism directly controls the rate of cell growth, so we might predict that the more enzyme the better.

I think the reason that a higher stability enzyme wasn't selected for is that this enzyme uses up fixed carbon and nitrogen. More enzyme will therefore reduce the amount of fixed carbon available to heterocysts to produce energy. In addition, the more stable the catabolism enzyme, the greater the chance that a heterocyst will replicate, which will result in two heterocysts next to one another, which is inefficient for filament growth. It is still surprising that the optimum value exactly 0.9, the value in the ancestral organism.