The SR-ASM (Short Reads ASseMbly) algorithm is designed for assembling DNA from the short sequences produced by 454 sequencers.
Here you can download the source code of the SR-ASM algorithm together with sample data. The algorithm was implemented in C++ and tested under UNIX (SunOS 5.9). To build it, unpack the archive and type 'make' in the directory where the source files were unpacked. See the file "readme.txt" for more information.
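The build steps described above can be sketched as a small shell function. This is only an illustration: the archive name and target directory are hypothetical arguments, and the sketch assumes the source archive is a gzip-compressed tar file (as sample.tar.gz is) whose source files sit at the top level. readme.txt remains the authoritative reference.

```shell
#!/bin/sh
# Sketch: unpack a source archive into a directory and build it with make.
# Assumptions (not confirmed by this page): the archive is a .tar.gz file
# and its Makefile ends up directly in the target directory.

build_from_archive() {
    archive=$1   # hypothetical, e.g. the downloaded source archive
    dir=$2       # directory to unpack into
    mkdir -p "$dir" || return 1
    # gzip piped into tar also works with older UNIX tar lacking a -z option.
    gzip -dc "$archive" | (cd "$dir" && tar -xf -) || return 1
    # Run make in the directory where the source files were unpacked.
    (cd "$dir" && make)
}

# Usage (names are placeholders):
#   build_from_archive source.tar.gz sr-asm-src
```

The subshell around 'cd' keeps the caller's working directory unchanged after the build.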
The usefulness of the algorithm has been demonstrated in tests on raw data generated during sequencing of the whole 1.84 Mbp genome of the bacterium Prochlorococcus marinus. The tests of the SR-ASM algorithm were carried out on a SUN Fire 6800 at the Poznan Supercomputing and Networking Center.
Source code archive: 23.84 KB
readme.txt: 1.57 KB
sample.tar.gz: 59.55 KB
Detailed information about the algorithm is available here. The paper with its description and computational results is:
* J. Blazewicz, M. Bryja, M. Figlerowicz, P. Gawron, M. Kasprzak, E. Kirton, D. Platt, J. Przybytek, A. Swiercz, L. Szajkowski, "Whole genome assembly from 454 sequencing output via modified DNA graph concept", Computational Biology and Chemistry 33 (2009) 224-230.
The newest version of the algorithm, which can optionally be compiled for GPU: Download the source code
The paper including implementation details is to be published in 2013 in the journal Foundations of Computing and Decision Sciences.
Instructions for compiling the program are given in readme.txt. The algorithm can be run with different heuristics for finding the solution ('greedy', 'flow' or 'acyclic'). 'greedy' is the default; to choose another, run the program with the parameter --path-algorithm=<heuristic>. For example, to run the program on the data file 'dataset.fasta' using the GPU and the 'flow' heuristic:
./alignment.exe --path-algorithm=flow dataset.fasta
Run the program with no parameters to display help.
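To compare the three heuristics on the same data, the invocation above can be wrapped in a loop. The following is a minimal sketch, not part of the distributed code. It assumes that the program writes its results to standard output, that the default heuristic may also be requested explicitly as --path-algorithm=greedy, and that the log file names (result_<heuristic>.log) are free to choose. None of these points is confirmed here, so check the program's help output first.

```shell
#!/bin/sh
# Sketch: run one dataset through each path heuristic named in the text.
# Assumptions: results appear on stdout, and 'greedy' is accepted as an
# explicit value of --path-algorithm. Log file names are illustrative.

run_all_heuristics() {
    bin=$1    # path to the compiled program, e.g. ./alignment.exe
    data=$2   # input file, e.g. dataset.fasta
    for h in greedy flow acyclic; do
        # One log per heuristic so the results can be compared afterwards.
        "$bin" --path-algorithm="$h" "$data" > "result_$h.log" 2>&1 \
            || echo "heuristic $h failed" >&2
    done
}

# Usage:
#   run_all_heuristics ./alignment.exe dataset.fasta
```

A failing run is reported on standard error and the loop continues, so one failing heuristic does not prevent the others from being tried.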