Science Simulation

One reason to create artificial life is to experiment on it – to see in what way it behaves like real life and what, if anything, it can tell us about real life. While considering what sort of experiments I would like to do, and, what tests I would require to ensure that the simulation was working as expected, it occurred to me that in order to test the artificial cells, I would have to build various tools – additional programs that could slot into the simulation and extract and display information about what was going on. For example, I might want to test the function of various proteins by removing them and measuring the effect on the cell (growth rate, metabolite concentrations etc.).

It struck me that an interesting way to display the information would be to mimic the output a real-life, analogous experiment would give. For example, if I wanted to see where various proteins bound the DNA, I could simulate a DNA footprinting experiment. I then thought that this idea could be expanded to simulate a complete basic molecular toolkit for a biochemist or molecular biologist. This would of course make analysing the organism much harder that simply peering into the code and getting a print out of the variables I was interested in, but it might be more fun and could be used to teach students about various laboratory techniques. It could even be the basis of a game in which players basically play at being microbiologists (although I’m not sure how many people would find that fun, or what the competitive element would be).


Simulating a protein gel

The blue bar is protein shortly after the gel run is started

I have started a simulation of a protein gel. As it runs, a blue bar of protein moves down the gel, getting increasingly diffuse - a little too diffuse at the moment. And the bar eventually disappears, which is annoying. Still, it's a start, and the program is all nice and object oriented. I just need to work out a better equation for simulating the movement of proteins in a gel. After that: more proteins in more lanes with different percentage gels and different colours (makes it less realistic, but more fun). Also, some lateral diffusion might be nice, but could make things unnecessarily computationally complex.

I have also started to reread Barbieri's The Organic Codes: An Introduction to Semantic BiologyI seem to recall that it has a lot of interesting ideas, even if they are presented as being more revolutionary than I think they are. Maybe I just haven't seen their full implications. Nonetheless, I think it will be useful to read while I try to simulate life.

Protein gel update

Four proteins of different sizes and concentrations

I have improved my protein gel. Firstly, I made it much more object-oriented, thus making it easier to create new proteins and samples. As the image shows, I can add a mixture of proteins of different sizes and concentrations (in arbitrary units). These will separate according to their size and show up with an intensity proportional to their concentration. I also made a tiny difference to the code (changing the concentrations from integers to floats) so the proteins no longer disappear and halt, and protein movement works much better. I think I could still alter the equation to get tighter bands and a more realistic simulation. I think it is now sufficiently developed that I should leave it and concentrate more on the artificial life simulation.

There is still more to do so I can add several samples in different lanes. I would also like the ability to change the percentage of the gel, the voltage (maybe with a needlessly complicated simulated voltage pack) and the 'exposure' of the gel. However, I am now leaning more towards making the simulation be for a column, partly so I don't have to deal with diffusion at the edge of the bands and partly because the proteins bands look too spread out to be from a gel.

Simulating electrophoresis

Comparison of virtual gels with different exposures and running times

I made a start on Pygame simulation of an agarose gel on which virtual DNA can be run. The simulation has a number of stages:

  • A gel is made with a given concentration of agarose and size (length and number of lanes).
  • Any of the lanes can be loaded with a solution of DNA. Solution are currently lists of DNA strand lengths and their concentrations, but this will have to improved allow for more complex experiments involving metabolites and proteins.
  • The gel is then run for a number for a specified number of minutes and at a chosen voltage. This moves the position of the various lengths of DNA. At present, they move at a rate proportional to 1 / length, which isn’t a very good simulation.
  • Finally, once the gel has been run, it can be exposed on a UV transilluminator for a certain length of time. The longer the exposure, the brighter the signal, but also the background.

In the gel image, the bottom band corresponds to 100 base pairs (bp) and the second band up to 200 bp. The 100 bp DNA has travelled twice as far as the 200 bp using my function, which is not a very realistic simulation. I have some ideas about how to improve it (and to take into account the agarose concentration), but I’m surprised how difficult it is to think of a sensible formula for movement [see the next post for a much better movement formula]. I have heard that the size separation actually only occurs because of the time it takes different lengths of DNA to line up in the gel, and that once lined up they travel the same speed regardless of length (hence why pulsed field gel electrophoresis can separate larger lengths of DNA), but I’m not convinced.

There is lots more I can do – adding noise, allowing magnification on the transilluminator, changing the aperture, adding diffusion, but first I should sort of the equation for determining how fast DNA of different lengths moves. The image produced is returned by the gel object, where it can be displayed or saved – perhaps to go into a virtual lab book.

Below is an updated version of the program based on an equation described at:

simGel.txt6.44 KB

The mathematics of gel electrophoresis

I have spent some time recently coming up with an equation that can model the movement of DNA through an agarose gel. I've written a brief introduction of how I think agarose gel electrophoresis works, which will help explain my model for the simulation.

Real gels

An agarose gel typically contains 0.7% – 2% agarose (a polymer of sugars). The agarose is dissolved in a heated buffered solution (typically TAE). As the solution cools, the agarose forms cross-links with itself, creating a networked structure like a sponge (which is why it forms a gel). DNA is added to slits at one end of the gel called lanes. Then an electric current is applied, which causes the negatively-charged DNA to move towards the positive electrode.

As the DNA moves through the gel, it is impeded by the agarose cross-links. The higher the concentration of agarose, the smaller the holes in the gel and so the more DNA is impeded. The shorter the DNA, the less likely it is to be impeded, which is why agarose gel electrophoresis can separate DNA of different lengths. Higher concentrations of agarose (say 2% agarose) are better for resolving shorter lengths of DNA, while lower concentrations are better for resolving longer lengths of DNA.

Mathematical Gels

We can assume that DNA is spherical with a diameter proportional to its length (assuming things are spherical seems to be the basis of various mathematical models, though in the case of DNA, it seems a particularly bad assumption, but I’ll explain my reasons later). Then we can view its passage through the gel as multiple encounters with holes of different sizes. If the sphere of DNA is smaller than the hole then it moves forward, otherwise it is impeded. I’m not quite sure what happens in real gels, but I think the idea is that the longer the DNA is, the more likely smaller holes are to impede its flow, perhaps because the DNA is more likely to drag along the sides of the holes.

I decided to describe the distribution of hole sizes in a gel with the gamma distribution.

There was no particular statistical reason for choosing the gamma distribution (though maybe there is a good statistical reason for it working). Instead I chose because it ranges from 0 to infinity (as opposed to, say, minus infinity to infinity in the case of the normal distribution) so will cover the range of possible lengths of DNA. In this model, the probability of a strand of DNA with length, x, moving through a random hole is the probability that the hole is smaller than the DNA (which I’m imagining is spherical), which corresponds to 1 minus the value for the cumulative distribution. This value of 1 minus the cumulative distribution value can also be considered the percentage of the maximum possible distance that the DNA is expected to travel. The maximum possible value will correspond to the gel front and should be proportional to the time the gel is run multiplied by the voltage at which the gel is run.

All that remains is to pick some values for the other parameters of the gamma function. I chose shape parameter (k) value of 2 because that seemed to give a nicely shaped graph and made the calculations easier. The scaling parameter (θ) determines the where the peak of the distribution is, so describes the mean size of holes. This parameter should therefore be a function of the concentration of agarose. I figured that since the size will depend on a molecule of agarose cross-linking with another in three dimensional space, θ should be proportional to 1/(concentration of agarose)³.

Virtual Gels

Using this system and some suitable constants, I got a motility function that is shown in the graph below. The graph shows how smaller DNA moves faster (with infinitely small strands moving at the maximum rate). It also  shows how in a 0.8 % agarose gel, very small lengths of DNA will move at about the same rate (since the curve is nearly flat), so be poorly resolved.  By contrast, in a 1.2 % agarose gel, longer lengths of DNA (7 Kb+) will be poorly resolved. DNA of lengths ~0.5 Kb – ~3 Kb will be resolved better since the curve is steeper, so a small difference in length will result in a large difference in distance travelled.

Electrophoresis motility graph

Interestingly, another way to ask what length of DNA a given gel can best resolve is to ask what lengths of DNA are most differentiated. We can answer this by mathematical differentiation (which is where the word comes from). The graph below is a differentiation of the first graph, which is also the non-cumulative gamma function and can be considered the graph of hole size in different gels. The graph shows that a 1.2 % gel differentiates DNA of length ~1 Kb best, whereas a 0.8 % gel generally differentiates lengths of DNA less well, though its peak is ~4 Kb.

Differentiation of DNA by different gels

Implementing this equation creates a pleasingly realistic result. I added in some unrealistic, but functional diffusion to soften the edges and make the image look that bit more realistic. The first two gels were loaded with a solution of containing lengths of DNA 0.5 – 10 Kb at 0.5 Kb intervals. DNA with lengths a multiple of 2 Kb are at a higher concentration to make measuring easier. The third and fourth gels loaded with a solution of containing lengths of DNA 0.2 – 4 Kb at 0.2 Kb intervals, with multiples of 1 Kb highlighted.

The virtual gels show that, as expected (and as in real life, give or take):

  • A 0.8 % agarose gel gives a good resolution of DNA between 2 – 4 Kb.
  • A 1.0 % agarose gel gives a good resolution of DNA between 1 – 2 Kb, while DNA >4 Kb is poorly resolved and bunches up at the top of the gel in quite a realistic way.
  • A 1.2 % agarose gel gives a good resolution of DNA 0.2 – 1 Kb, while DNA >2 Kb is poorly resolved.
Example gels

In the simulation, the gels were run for 55 minutes and at 20 volts, though both are arbitrary. By tweaking various parameters and constants I think this model could accurately mimic real agarose gel electrophoresis results.