February 23, 2011

Biochemical simulations on GPU

My second project is a CUDA implementation of Smoldyn simulator. Smoldyn is cell-scale biochemical simulator which simulates each molecule of interest individually to capture natural stochasticity and for nanometer-scale spatial resolution. This is a particle-based method that allows simulation of chemical molecules with Brownian motion and different reaction types.
The current version of GPU Smoldyn supports the following features:
  • Isotropic diffusion;
  • Zero/First/Second order reactions;
  • User commands;
  • Surfaces interactions;
So, using this functionality different molecule to surface interactions can be simulated. Reaction-diffusion systems, like a molecular mutation (A->B), synthesis of molecules, or even bi-molecular reactions (A+B->C) can be simulated as well.

Implementation details

This simulation is implemented for GPU using CUDA. It is implemented for Linux, so OpenGL was used for visualization.

I widely used Thrust in this simulation, especially for operations like stream compaction, reduction and scan. For second order reaction processing I needed sorting for spatial hashing. Firstly I used thrust::sort, however, it turned out, that NVIDIA's radix sort kernel is much faster than thrust::sort, because in NVIDIA's implementation you can specify maximum number of sorting bits, which is very convenient for cases, when the maximum value for a sorting key is known. In my case, it is a number of cells-1.

The tricky aspect of this simulation is a dynamic memory allocation, since molecules can die and re spawn during the simulation. Every molecule has a type field. In fact, a molecule is described as:

struct molecule_t
{
       float3 pos;
       int      type;
};

Type equal to -1 represents dead molecule. I allocate enough memory to handle new particles creation during the simulation. At the initialization stage the molecule array is filled with a desired number of molecules, and the rest of it is initialized with dead molecules (with type == -1). A pointer to the last alive molecule is maintained in a global memory. If during the simulation I need to allocate a molecule, the pointer is advanced using atomicAdd and new molecule is generated (new molecules are spawned at the end of the array, maintaining proper order of alive molecules). If I need to deallocate molecule, I just set its type to -1 and then do compaction after the kernel finishes. I use thrust::remove_if operation to move dead molecules to the end of the array.

I achieved average speedup of 200x for this implementation against original Smoldyn and this implementation is statistically correct. I wrote a paper about this implementation and I'll go to SpringSim 2011 to present it, so if you're interested, you can catch me there (or send me an email). I'll post the paper here after it is published in the proceedings.

Some screenshots




No comments: