What kind of scientists are those who do not invent?
Taste the fruits of our labour.

We hope you will like our tools.
Introduction
GRASShopPER (GPU overlap GRaph ASSembler using Paired End Reads) is a novel assembly method that follows the overlap–layout–consensus (OLC) approach. The method uses a very efficient GPU implementation of an exact read alignment algorithm to calculate the scores and shifts on the arcs of the graph. A two-part fork detection strategy has been introduced, which greatly reduces the misassembly rate in the resulting contigs. The first part is carried out during the graph traversal. In the second part, a greedy hyper-heuristic identifies previously undetected forks on the basis of paired-end read information. The results of computational experiments show high coverage of the tested genome.

Download
GRASShopPER can be downloaded at https://sourceforge.net/projects/grasshopper-assembler/
For the complete list of parameters, please see the Readme.txt file under the download link.

System requirements
GRASShopPER requires a computer with graphics processing units, and possibly an environment for running the program in parallel. The resources used in the assembly process depend on the size of the input library. For example, a bacterial genome of length 2 Mbp requires 17 GB of RAM, while one of the human chromosomes requires 82 GB.

Publication
To reference GRASShopPER, please cite:
A. Swiercz, W. Frohmberg, M. Kierzynka, P. Wojciechowski, P. Zurkowski, J. Badura, A. Laskowski, M. Kasprzak, J. Blazewicz, "GRASShopPER – a hybrid DNA de novo assembly algorithm", submitted.
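To illustrate what the scores and shifts on the arcs of a GRASShopPER-style overlap graph can look like, the following Python sketch computes them for a toy set of reads. It is not the GRASShopPER GPU implementation: as a simplifying assumption it uses error-free suffix-prefix matching instead of a full alignment, and the names best_overlap, build_overlap_arcs and min_overlap are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def best_overlap(a: str, b: str, min_overlap: int = 3) -> Optional[Tuple[int, int]]:
    """Longest exact overlap between a suffix of `a` and a prefix of `b`.

    Returns (score, shift), where score is the overlap length and shift is
    the offset of the start of `b` relative to the start of `a`.
    Returns None when no overlap of at least `min_overlap` characters exists.
    """
    # Smaller shifts mean longer overlaps, so the first hit is the best one.
    for shift in range(1, len(a) - min_overlap + 1):
        suffix = a[shift:]
        if b.startswith(suffix):
            return len(suffix), shift
    return None


def build_overlap_arcs(
    reads: List[str], min_overlap: int = 3
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map each ordered pair of read indices (i, j) to its (score, shift)."""
    arcs = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            hit = best_overlap(a, b, min_overlap)
            if hit is not None:
                arcs[(i, j)] = hit
    return arcs


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG", "ACGGTT"]
    for (i, j), (score, shift) in build_overlap_arcs(toy_reads).items():
        print(f"read {i} -> read {j}: score={score}, shift={shift}")
```

The all-pairs loop is quadratic in the number of reads, which is the kind of cost that makes this step a natural candidate for GPU acceleration.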
CLAIM-MS - CLAIM Multi Source, an expanded version of CLAIM

Authors: Marek Blazewicz (1,2), Giovanni Felici (3), Aleksandra Swiercz (1,4), Daniele Santoni (3), Marcin Jaroszewski (1), Agnieszka Zmienko (1,4), Marta Kasprzak (1,4)

CLAIM-MS is a method for finding functionally related genes. Its novelty lies in its flexibility: the method integrates information from many input data sources of different types. We successfully validated it on gene expression data produced by different technologies (microarray, RNA-seq) and experiment setups (case-control or multi-class, single-time-point or time-series), on protein-protein interaction networks, and on Gene Ontology annotations. For each dataset, a gene-gene distance metric needs to be derived in accordance with its nature and the experiment setup.

This approach expands our previous work with, among others:
* the ability to handle more than two data sources at once;
* a new robustly converging clustering algorithm (a neural gas method);
* a more efficient clique detection algorithm;
* deep analysis of the underlying distance matrices, which allows the evaluation of gene clusters to be tuned to a particular biological dataset; this procedure significantly improves the overall quality of the outcomes.

Instructions on how to run the application can be found in the README. The research was supported by grant No. 2012/05/B/ST6/03026 from the National Science Centre, Poland. A publication presenting both the method and the results is in preparation.

(1) Institute of Computing Science, Poznan University of Technology, Poznan, Poland.
(2) Poznan Supercomputing and Networking Center, Poznan, Poland.
(3) Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Rome, Italy.
(4) Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.
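The CLAIM-MS description states that a gene-gene distance metric has to be derived for each dataset, without prescribing one. As one possible example for expression data, the sketch below uses the correlation distance (1 minus the Pearson correlation of two expression profiles). The choice of this metric and the names pearson and correlation_distance_matrix are assumptions made for illustration, not part of CLAIM-MS.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return cov / (norm_x * norm_y)


def correlation_distance_matrix(
    profiles: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Distance 1 - r for every ordered pair of genes (0 = co-expressed, 2 = opposite)."""
    genes = sorted(profiles)
    return {
        (g, h): 1.0 - pearson(profiles[g], profiles[h])
        for g in genes
        for h in genes
    }


if __name__ == "__main__":
    toy_profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
        "geneC": [4.0, 3.0, 2.0, 1.0],  # falls while geneA rises
    }
    distances = correlation_distance_matrix(toy_profiles)
    print(round(distances[("geneA", "geneB")], 6))
    print(round(distances[("geneA", "geneC")], 6))
```

A matrix of this shape could then be passed to a clustering step; other data sources, such as interaction networks or ontology annotations, would need their own distance definitions.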
CLAIM - coupling co-expression data and protein-protein interaction networks for functional protein analysis

CLAIM (CLuster Analysis Integration Method) is a new method for integrating co-expression data obtained through microarray (MA) experiments and protein-protein interaction (PPI) network data. Microarray and PPI data are clustered separately; the clusters are then merged in a special graph, and the cliques of this graph identify groups of functionally related proteins. The biological insight provided by these groups is analyzed on the basis of co-localization and mRNA developmental expression, pointing out the new information that can be obtained by this method.

CLAIM can also be used to assign proteins whose functional role is unknown to pathways, using the cliques that are strongly associated with known pathways. The basic assumption is that, if a protein belongs to a clique and the other proteins in that clique are in a known pathway, then that protein is likely to belong to that pathway. Based on this assumption, pathway assignment was performed through a score prediction function based on the presence of a protein in pathway-enriched cliques.

The prediction power of the algorithm appears to be sufficiently high to make this method a useful semi-automated tool for protein functional analysis.

CLAIM has been tested on the model organism Arabidopsis thaliana.

For more detailed information, please read the CLAIM README.

Daniele Santoni (3), Aleksandra Swiercz (1,4), Agnieszka Żmieńko (1,4), Marta Kasprzak (1,4), Marek Blazewicz (1,2), Paola Bertolazzi (3), Giovanni Felici (3), An Integrated Approach (CLuster Analysis Integration Method) to Combine Expression Data and Protein–Protein Interaction Networks in Agrigenomics: Application on Arabidopsis thaliana, OMICS: A Journal of Integrative Biology. January 2014, 18(2): 155-165. doi:10.1089/omi.2013.0050.

(1) Institute of Computing Science, Poznan University of Technology, Poznan, Poland.
(2) Poznan Supercomputing and Networking Center, Poznan, Poland.
(3) Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Rome, Italy.
(4) Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.
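The clique-based assumption behind CLAIM's pathway assignment can be illustrated with a small Python sketch. The actual score prediction function is described in the cited paper and is not reproduced here. As a stand-in, this hypothetical pathway_scores function takes, for each pathway, the highest fraction of a protein's clique partners that already belong to that pathway.

```python
from typing import Dict, List, Set


def pathway_scores(
    protein: str,
    cliques: List[Set[str]],
    pathways: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Score each pathway for `protein` from the cliques that contain it.

    Illustrative scoring rule (an assumption, not the published function):
    the best fraction, over all cliques containing the protein, of the
    other clique members that are known members of the pathway.
    """
    scores = {}
    for name, members in pathways.items():
        best = 0.0
        for clique in cliques:
            if protein not in clique:
                continue
            others = clique - {protein}
            if not others:
                continue
            best = max(best, len(others & members) / len(others))
        scores[name] = best
    return scores


if __name__ == "__main__":
    toy_cliques = [{"P1", "P2", "P3", "X"}, {"X", "P4", "Q1"}]
    toy_pathways = {"pathwayA": {"P1", "P2", "P3", "P4"}, "pathwayB": {"Q1"}}
    print(pathway_scores("X", toy_cliques, toy_pathways))
```

In this toy input, protein X shares one clique exclusively with members of pathwayA, so pathwayA receives the top score. This mirrors the reasoning that a protein surrounded by members of a known pathway is likely to belong to it.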
SR-ASM algorithm
SR-ASM (Short Reads ASseMbly) is an algorithm designed for DNA assembly of short sequences produced by 454 sequencers. Here you can download the source code of the algorithm together with sample data. The algorithm was implemented in C++ and tested under a UNIX system (SunOS 5.9). To build the source, unpack the archive and type 'make' in the directory where the source files were unpacked. See the file "readme.txt" for more information.

The usefulness of the algorithm has been demonstrated in tests on raw data generated during sequencing of the whole 1.84 Mbp genome of the bacterium Prochlorococcus marinus. The tests of the SR-ASM algorithm were carried out on a SUN Fire 6800 at the Poznan Supercomputing and Networking Center.

sr_asm.tar.gz: 23.84 KB
readme.txt: 1.57 KB
sample.tar.gz: 59.55 KB

Detailed information about the algorithm is available here. The paper with its description and computational results is:
* J. Blazewicz, M. Bryja, M. Figlerowicz, P. Gawron, M. Kasprzak, E. Kirton, D. Platt, J. Przybytek, A. Swiercz, L. Szajkowski, "Whole genome assembly from 454 sequencing output via modified DNA graph concept", Computational Biology and Chemistry 33 (2009) 224-230.

The newest version of the algorithm, which can optionally be compiled for GPU:
Download the source code

The paper including implementation details is to be published in 2013 in the journal Foundations of Computing and Decision Sciences.

Instructions on how to compile the program are given in readme.txt. The algorithm can be run with different heuristics for searching for the solution ('greedy', 'flow' or 'acyclic'). Greedy is the default; to choose another, run the program with the parameter --path-algorithm <heuristic>.
For example, to run the program on the data file 'dataset.fasta' using the GPU and the 'flow' algorithm:

    cd algorithm
    ./configure
    make GPU=YES
    ./alignment.exe --path-algorithm=flow dataset.fasta

Run the program with no parameters to display help.