Debian Med Project
Help us to see Debian used by medical practitioners and researchers! Join us on the Alioth page.
Summary
Biology
Debian Med micro-biology packages

This meta package will install Debian packages related to molecular biology, structural biology and bioinformatics for use in life sciences.

The list to the right includes various software projects which are of some interest to the Debian Med Project. Currently, only a few of them are available as Debian packages. It is our goal, however, to include all software in Debian Med which can sensibly add to a high quality Debian Integrated Solution.

For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:

If you discover a project which looks like a good candidate for Debian Med to you, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Med mailing list.

Debian Med Biology packages

Official Debian packages

Adun.app
Molecular Simulator for GNUstep
Version: 0.8.2
License: DFSG free
Official Debian package -
This is a new extendible molecular simulation program that also includes data management and analysis capabilities.
Altree
program to perform phylogeny based analyses
Version: 1.0.1
License: DFSG free
Official Debian package -
ALTree was designed to perform phylogeny based analysis: first, it allows the detection of an association between a candidate gene and a disease, and second, it enables hypotheses to be made about the susceptibility loci.
Amap-align
Protein multiple alignment by sequence annealing
Version: 2.2
License: DFSG free
Official Debian package -
AMAP is a command line tool to perform multiple alignment of peptidic sequences. It utilizes posterior decoding and a sequence-annealing alignment instead of the traditional progressive alignment method. It is the only alignment program that allows control of the sensitivity / specificity tradeoff. It is based on the ProbCons source code, but uses alignment metric accuracy and eliminates the consistency transformation.
The Java visualisation tool of AMAP 2.2 is not yet packaged in Debian.
Arb
Integrated package for sequence database handling and analysis
Version: 0.0.20071207.1
License: non-free
Debian package in non-free -
The ARB software is a graphically oriented package comprising various tools for sequence database handling and data analysis. A central database of processed (aligned) sequences and any type of additional data linked to the respective sequence entries is structured according to phylogeny or other user defined criteria.
The ARB project (Latin, "arbor"=tree) is a joint initiative of the Lehrstuhl fuer Mikrobiologie http://www.mikro.biologie.tu-muenchen.de/ and the Lehrstuhl fuer Rechnertechnik und Rechnerorganisation http://wwwbode.informatik.tu-muenchen.de/ of the Technical University of Munich.
Autodock
analysis of ligand binding to protein structure
Version: 4.0.1
License: DFSG free
Official Debian package -
AutoDock is a prime representative of the programs addressing the simulation of the docking of fairly small chemical ligands to rather big protein receptors. Earlier versions had all flexibility in the ligands while the protein was kept rather rigid. The latest version 4 also allows for flexibility of selected sidechains of surface residues, i.e., it takes the rotamers into account.
The AutoDock program performs the docking of the ligand to a set of grids describing the target protein. AutoGrid pre-calculates these grids.
Autogrid
pre-calculate binding of ligands to their receptor
Version: 4.0.1
License: DFSG free
Official Debian package -
The AutoDockSuite addresses the molecular analysis of the docking of smaller chemical compounds to their receptors of known three-dimensional structure.
The AutoGrid program performs pre-calculations for the docking of a ligand to a set of grids that describe the effect that the protein has on point charges. The effect of these forces on the ligand is then analysed by the AutoDock program.
Biococoa.app
biological sequence file format conversion applet for GNUstep
Version: 1.6.0
License: DFSG free
Official Debian package -
Demo application to demonstrate the possibilities of the BioCocoa framework.
This package contains a GNUstep applet to convert between sequence file formats. The BioCocoa framework provides developers with the opportunity to add support for reading and writing BEAST, Clustal, EMBL, Fasta, GCG-MSF, GDE, Hennig86, NCBI, NEXUS, NONA, PDB, Phylip, PIR, Plain/Raw, Swiss-Prot and TNT files by writing only three lines of code. The framework is written in Cocoa (Objective-C).
Version 1.6 is the last upstream version that works with GNUstep. If newer versions are needed under Linux, try to convince upstream to support GNUstep.
Biosquid
utilities for biological sequence analysis
Version: 1.9g+cvs20050121
License: DFSG free
Official Debian package -
SQUID is a library of C code functions for sequence analysis. It also includes a number of small utility programs to convert, show statistics, manipulate and do other functions on sequence files.
The original name of the package is "squid", but since there is already a squid on the archive (a proxy cache), it was renamed to "biosquid".
Blast2
Basic Local Alignment Search Tool
Maintainer: Aaron M. Ucko
Version: 1:2.2.18.20080302
License: DFSG free
Official Debian package -
The famous sequence alignment program. This is the "official" NCBI version, #2. The blastall executable allows you to give a nucleotide or protein sequence to the program. It is compared against databases and a summary of matches is returned to the user.
Note that databases are not included in Debian; they must be retrieved manually.
Boxshade
Pretty-printing of multiple sequence alignments
Version: 3.3.1
License: DFSG free
Official Debian package -
Boxshade is a program for creating good looking printouts from multiple-aligned protein or DNA sequences. The program does not perform the alignment by itself and requires as input a file that was created by a multiple alignment program or manually edited with respective tools.
Boxshade reads multiple-aligned sequences from either PILEUP-MSF, CLUSTAL-ALN, MALIGNED-data and ESEE-save files (limited to a maximum of 150 sequences with up to 10000 elements each). Various kinds of shading can be applied to identical/similar residues. Output is written to screen or to a file in the following formats: ANSI/VT100, PS/EPS, RTF, HPGL, ReGIS, LJ250-printer, ASCII, xFIG, PICT, HTML
Clustalw
global multiple nucleotide or peptide sequence alignment
Version: 2.0.9
License: non-free
Debian package in non-free -
This program performs an alignment of multiple nucleotide or amino acid sequences. It recognizes the format of input sequences and whether the sequences are nucleic acid (DNA/RNA) or amino acid (proteins). The output format may be selected from various formats for multiple alignments such as Phylip or FASTA. Clustal W is very well accepted.
The output of Clustal W can be edited manually but preferably with an alignment editor like SeaView or within its companion Clustal X. A model built from your alignment can be applied for improved database searches. The Debian package hmmer creates such a model in the form of an HMM.
For details and citation purposes see paper "Clustal W and Clustal X version 2.0", Larkin M., et al. Bioinformatics 2007 23(21):2947-2948
Clustalw-mpi
MPI-distributed global sequence alignment with ClustalW
Version: 0.15
License: non-free
Debian package in non-free -
ClustalW is a popular tool for multiple sequence alignment. The alignment is achieved via three steps: pairwise alignment, guide-tree generation and progressive alignment. ClustalW-MPI is an MPI implementation of ClustalW. Based on version 1.82 of the original ClustalW, both the pairwise and progressive alignments are parallelized with MPI, a popular message passing programming standard. The pairwise alignments can be easily parallelized since the many alignments are independent of each other. However, the progressive alignments are essentially not parallelizable because of the dependencies between the alignments.
Here the recursive parallelism paradigm is applied to the linear space profile-profile alignment algorithm. This approach is more time efficient on computers with a distributed memory architecture. The traditional approach that relies on precomputing the profile-profile score matrix has also been implemented. Results show that the latter is indeed more appropriate for shared memory multiprocessor computers.
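The independence of the pairwise step described above can be illustrated with a minimal Python sketch. It is not ClustalW-MPI code: it uses a thread pool instead of MPI, and the function names and the toy mismatch-fraction distance are assumptions made purely for illustration, standing in for a real pairwise alignment score.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def identity_distance(pair):
    """Toy distance: fraction of mismatched positions over the shorter length.

    Hypothetical stand-in for a real pairwise alignment score.
    """
    a, b = pair
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / n


def pairwise_distances(seqs, workers=4):
    """Compute all pairwise distances; every pair is an independent task."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No task needs the result of another, so they can run in any order.
        values = list(pool.map(identity_distance,
                               ((seqs[i], seqs[j]) for i, j in pairs)))
    return dict(zip(pairs, values))
```

Because no pair depends on another, the work can be split across workers freely. A progressive alignment, by contrast, has to follow the guide tree, so each merge waits for the merges beneath it.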
ClustalX is suggested for its support for local realignments, seaview is a versatile editor of alignments.
The original ClustalW/ClustalX can be found at URL: http://www.clustal.org/download/pre-2/
Clustalx
GUI for Clustal W
Version: 1.83
License: non-free
Debian package in non-free -
This package offers a GUI for the Clustal W multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analysing the results. The sequence alignment is displayed in a window on the screen. A versatile coloring scheme has been incorporated to highlight conserved features in the alignment. For professional presentations, one should use the texshade LaTeX package or boxshade.
The pull-down menus at the top of the window allow you to select all the options required for traditional multiple sequence and profile alignment. You can cut-and-paste sequences to change the order of the alignment; you can select a subset of sequences to be aligned; you can select a sub-range of the alignment to be realigned and inserted back into the original alignment.
An alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted.
Dialign
Segment-based multiple sequence alignment
Version: 2.2.1
License: DFSG free
Official Debian package -
DIALIGN2 is a command line tool to perform multiple alignment of protein or DNA sequences. It constructs alignments from gapfree pairs of similar segments of the sequences. This scoring scheme for alignments is the basic difference between DIALIGN and other global or local alignment methods. Note that DIALIGN does not employ any kind of gap penalty. It has been published by Morgenstern B. in Bioinformatics. 1999 Mar;15(3):211-8.
Dialign-tx
Segment-based multiple sequence alignment
Version: 1.0.1
License: DFSG free
Official Debian package -
DIALIGN-TX is a command line tool to perform multiple alignment of protein or DNA sequences. It is a complete reimplementation of the segment-based approach including several new improvements and heuristics that significantly enhance the quality of the output alignments compared to DIALIGN 2.2 and DIALIGN-T. For pairwise alignment, DIALIGN-TX uses a fragment-chaining algorithm that favours chains of low-scoring local alignments over isolated high-scoring fragments. For multiple alignment, DIALIGN-TX uses an improved greedy procedure that is less sensitive to spurious local sequence similarities.
DIALIGN-TX has been published in Amarendran R. Subramanian, Michael Kaufmann, Burkhard Morgenstern: Improvement of the segment-based approach for multiple sequence alignment by combining greedy and progressive alignment strategies, Algorithms for Molecular Biology 3:6, 2008
Emboss
The European Molecular Biology Open Software Suite
Version: 5.0.0
License: DFSG free
Official Debian package -
EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.
Reference for EMBOSS: Rice,P. Longden,I. and Bleasby,A. "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics June 2000, vol 16, No 6. pp.276-277
Exonerate
generic tool for pairwise sequence comparison
Version: 2.1.0
License: DFSG free
Official Debian package -
Exonerate allows you to align sequences using many alignment models, using either exhaustive dynamic programming or a variety of heuristics. Much of the functionality of the Wise dynamic programming suite was reimplemented in C for better efficiency. Exonerate is an intrinsic component of the building of the Ensembl genome databases, providing similarity scores between RNA and DNA sequences and thus determining splice variants and coding sequences in general.
An In-silico PCR Experiment Simulation System (see the ipcress man page) is packaged with exonerate.
This package also comes with a selection of utilities for performing simple manipulations quickly on fasta files beyond 2Gb.
Fastdnaml
Tool for construction of phylogenetic trees of DNA sequences
Version: 1.2.2
License: DFSG free
Official Debian package -
fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML (part of his PHYLIP package). Users should consult the documentation for DNAML before using this program.
fastDNAml is an attempt to solve the same problem as DNAML, but to do so faster and using less memory, so that larger trees and/or more bootstrap replicates become tractable. Much of fastDNAml is merely a recoding of the PHYLIP 3.3 DNAML program from PASCAL to C.
Fastlink
A faster version of pedigree programs of Linkage
Version: 4.1P
License: DFSG free
Official Debian package -
Fastlink is much faster than the original Linkage but does not implement all the programs.
Garlic
A visualization program for biomolecules
Maintainer: Debichem Team
Version: 1.6
License: DFSG free
Official Debian package -
Garlic is written for the investigation of membrane proteins. It may be used to visualize other proteins, as well as some geometric objects. This version of garlic recognizes PDB format version 2.1. Garlic may also be used to analyze protein sequences.
It only depends on the X libraries, no other libraries are needed.
Features include:
 - The slab position and thickness are visible in a small window.
 - Atomic bonds as well as atoms are treated as independent drawable
   objects.
 - The atomic and bond colors depend on position. Five mapping modes
   are available (as for slab).
 - Capable of displaying stereo images.
 - Capable of displaying other geometric objects, like membranes.
 - Atomic information is available for the atom covered by the mouse
   pointer. No click required, just move the mouse pointer over the
   structure!
 - Capable of loading more than one structure.
 - Capable of drawing Ramachandran plots, helical wheels, Venn diagrams,
   averaged hydrophobicity and hydrophobic moment plots.
 - The command prompt is available at the bottom of the main window.
   It is able to display one error message and one command string.
Gdpc
visualiser of molecular dynamic simulations
Version: 2.2.4
License: DFSG free
Official Debian package -
gdpc is a graphical program for visualising output data from molecular dynamics simulations. It reads input in the standard xyz format, as well as other custom formats, and can output pictures of each frame in JPG or PNG format.
Gff2aplot
pair-wise alignment-plots for genomic sequences in PostScript
Version: 2.0
License: DFSG free
Official Debian package -
A program to visualize the alignment of two genomic sequences together with their annotations. From GFF-format input files it produces PostScript figures for that alignment. The following menu lists many features of gff2aplot:
 * Comprehensive alignment plots for any GFF-feature. Attributes are defined
   separately so you can modify only selected attributes for a given file or
   share the same customization across different data-sets.
 * All parameters are set by default within the program, but it can be also
   fully configured via gff2ps-like flexible customization files. Program can
   handle several of such files, summarizing all the settings before producing
   the corresponding figure. Moreover, all customization parameters can be set
   via command-line switches, which allows users to play with those parameters
   before adding any to a customization file.
 * Source order is taken from input files, if you swap file order you can
   visualize alignment and its annotation with the new input arrangement.
 * All alignment scores can be visualized in a PiP box below gff2aplot area,
   using grey-color scale, user-defined color scale or score-dependent
   gradients.
 * Scalable fonts, which can also be chosen among the basic PostScript default
   fonts. Feature and group labels can be rotated to improve readability in
   both annotation axes.
 * The program is still defined as a Unix filter so it can handle data from
   files, redirections and pipes, writing output to standard-output and
   warnings to standard error.
 * gff2aplot is able to manage many physical page formats (from A0 to A10, and
   more -see available page sizes in its manual-), including user-defined ones.
   This allows, for instance, the generation of poster size genomic maps, or
   the use of a continuous-paper supporting plotting device, either in portrait
   or landscape.
 * You can draw different alignments on same alignment plot and distinguish
   them by using different colors for each.
 * Shape dictionary has been expanded, so that further feature shapes are now
   available (see manual).
 * Annotation projections through alignment plots (so called ribbons) emulate
   transparencies via complementary color fill patterns. This feature allows
   to show color pseudo-blending when horizontal and vertical ribbons overlap.
Gff2ps
produces PostScript graphical output from GFF-files
Version: 0.98d
License: DFSG free
Official Debian package -
gff2ps is a script program developed with the aim of converting gff-formatted records into high quality one-dimensional plots in PostScript. Such plots may be useful for comparing genomic structures and for visualizing outputs from genome annotation programs. It can be used in a very simple way, because it assumes that the GFF file itself carries enough formatting information, but it also allows, through a number of options and/or a configuration file, for a great degree of customization.
Ghemical
A GNOME molecular modelling environment
Maintainer: Debichem Team
Version: 2.95
License: DFSG free
Official Debian package -
Ghemical is a computational chemistry software package written in C++. It has a graphical user interface and it supports both quantum-mechanics (semi-empirical) models and molecular mechanics models. Geometry optimization, molecular dynamics and a large set of visualization tools using OpenGL are currently available.
Ghemical relies on external code to provide the quantum-mechanical calculations. Semi-empirical methods MNDO, MINDO/3, AM1 and PM3 come from the MOPAC7 package (Public Domain), and are included in the package. The MPQC package is used to provide ab initio methods: the methods based on Hartree-Fock theory are currently supported with basis sets ranging from STO-3G to 6-31G**.
Glam2
gapped protein motifs from unaligned sequences
Version: 1064
License: DFSG free
Official Debian package -
GLAM2 is a software package for finding motifs in sequences, typically amino-acid or nucleotide sequences. A motif is a re-occurring sequence pattern: typical examples are the TATA box and the CAAX prenylation motif. The main innovation of GLAM2 is that it allows insertions and deletions in motifs.
The package includes these programs:
 glam2:       discovering motifs shared by a set of sequences;
 glam2scan:   finding matches, in a sequence database, to a motif discovered
              by glam2;
 glam2format: converting glam2 motifs to standard alignment formats;
 glam2mask:   masking glam2 motifs out of sequences, so that weaker motifs
              can be found;
 glam2-purge: removing highly similar members of a set of sequences.
In this package, the fast Fourier transform (FFT) algorithm was enabled for glam2.
If you use GLAM2, please cite: MC Frith, NFW Saunders, B Kobe, TL Bailey (2008) Discovering sequence motifs with arbitrary insertions and deletions, PLoS Computational Biology (in press).
Gromacs
Molecular dynamics simulator, with building and analysis tools
Maintainer: Debichem Team
Version: 3.3.3
License: DFSG free
Official Debian package -
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
GROMACS offers entirely too many features for a brief description to do it justice. A more complete listing is available at <http://www.gromacs.org/content/view/12/176/>.
Hmmer
profile hidden Markov models for protein sequence analysis
Version: 2.3.2
License: DFSG free
Official Debian package -
HMMER is an implementation of profile hidden Markov model methods for sensitive searches of biological sequence databases using multiple sequence alignments as queries.
Given a multiple sequence alignment as input, HMMER builds a statistical model called a "hidden Markov model" which can then be used as a query into a sequence database to find (and/or align) additional homologues of the sequence family.
Kalign
Global and progressive multiple sequence alignment
Version: 2.03
License: DFSG free
Official Debian package -
Kalign is a command line tool to perform multiple alignment of biological sequences. It employs the Wu-Manber string-matching algorithm to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global alignment. In comparisons made by its authors, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. It has been published in Lassmann and Sonnhammer, BMC Bioinformatics 2005, 6:298.
Loki
MCMC linkage analysis on general pedigrees
Version: 2.4.7.4
License: DFSG free
Official Debian package -
Performs Markov chain Monte Carlo multipoint linkage analysis on large, complex pedigrees. The current package supports analyses on quantitative traits only, although this restriction will be lifted in later versions. Joint estimation of QTL number, position and effects uses Reversible Jump MCMC. It is also possible to perform affected only IBD sharing analyses.
The homepage of this project used to be at http://loki.homeunix.net but the project is dead now and the homepage vanished. The Homepage field above points to the web archive.
Maq
maps short fixed-length polymorphic DNA sequence reads to reference sequences
Version: 0.6.7
License: GPL
Official Debian package -
Maq (short for Mapping and Assembly with Quality) builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functionality to handle ABI SOLiD data. Maq was previously known as mapass2.
With Maq you can:
 - Fast align Illumina/SOLiD reads to the reference genome. With the
   default options, one million pairs of reads can be mapped to the
   human genome in about 10 CPU hours with less than 1G memory.
 - Accurately measure the error probability of the alignment of each
   individual read.
 - Call the consensus genotypes, including homozygous and heterozygous
   polymorphisms, with a Phred probabilistic quality assigned to each base.
 - Find short indels with paired end reads.
 - Accurately find large scale genomic deletions and translocations with
   paired end reads.
 - Discover potential CNVs by checking read depth.
 - Evaluate the accuracy of raw base qualities from sequencers and help
   to check the systematic errors.
However, Maq can NOT:
 - Do de novo assembly. (Maq can only call the consensus by mapping reads
   to a known reference.)
 - Map short reads against themselves. (Maq can only find complete overlap
   between reads.)
 - Align capillary reads or 454 reads to the reference. (Maq cannot align
   reads longer than 63bp.)
This package is likely to be useful for users working with genetics or genomic studies in biology who need to assemble DNA sequences from fixed-length sequencers.
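The feature list above mentions error probabilities and Phred qualities. The relationship between the two can be sketched as follows, assuming the standard Phred definition Q = -10 log10(p); the function names are illustrative and are not part of Maq.

```python
import math


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality back to an error probability."""
    return 10.0 ** (-q / 10.0)
```

Under this definition, a quality of 20 corresponds to an error probability of 1 in 100, and a quality of 30 to 1 in 1000.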
Melting
computing the melting temperature of nucleic acid duplex
Version: 4.2h
License: DFSG free
Official Debian package -
This program computes, for a nucleic acid duplex, the enthalpy, the entropy and the melting temperature of the helix-coil transitions. Three types of hybridisation are possible: DNA/DNA, DNA/RNA, and RNA/RNA. The program first computes the hybridisation enthalpy and entropy from the elementary parameters of each Crick's pair by the nearest-neighbor method. Then the melting temperature is computed. The set of thermodynamic parameters can be easily changed, for instance to take newly published experimental values into account. Melting was published in Le Novère N. (2001), Bioinformatics, 17: 1226-1227.
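The two-step calculation described above can be sketched in Python. This is a simplified illustration, not the Melting implementation: it assumes a two-state transition with equal strand concentrations, ignores initiation terms and salt corrections, and expects the caller to supply the nearest-neighbor parameter table. The function names and the `params` layout are hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def sum_nearest_neighbors(seq, params):
    """Sum enthalpy and entropy over the dinucleotide steps of seq.

    params maps each dinucleotide step (e.g. "AC") to a tuple
    (delta_h in cal/mol, delta_s in cal/(mol*K)) supplied by the caller.
    """
    dh = 0.0
    ds = 0.0
    for i in range(len(seq) - 1):
        step_dh, step_ds = params[seq[i:i + 2]]
        dh += step_dh
        ds += step_ds
    return dh, ds


def melting_temperature_celsius(delta_h, delta_s, strand_conc,
                                self_complementary=False):
    """Two-state melting temperature from total enthalpy and entropy.

    strand_conc is the total strand concentration in mol/L. The factor 4
    applies to non-self-complementary duplexes with equal strand amounts.
    """
    factor = 1.0 if self_complementary else 4.0
    tm_kelvin = delta_h / (delta_s + R_CAL * math.log(strand_conc / factor))
    return tm_kelvin - 273.15
```

Keeping the parameter table outside the functions mirrors the point made above that the thermodynamic parameter set can be exchanged without touching the calculation.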
Mipe
Tools to store PCR-derived data
Version: 1.1
License: DFSG free
Official Debian package -
MIPE provides a standard format for the exchange and/or storage of all information associated with PCR experiments using a flat text file. This will:
 * allow for exchange of PCR data between researchers/laboratories
 * enable traceability of the data
 * prevent problems when submitting data to dbSTS or dbSNP
 * enable the writing of standard scripts to extract data (e.g. a
   list of PCR primers, SNP positions or haplotypes for different animals)
Although this tool can be used for data storage, its primary focus should be data exchange. For larger repositories, relational databases are more appropriate for storage of these data. The MIPE format could then be used as a standard format to import into and/or export from these databases.
MIPE was published in: Aerts J & Veenendaal T. MIPE - a XML-format to facilitate the storage and exchange of PCR-related data. Online Journal of Bioinformatics 6(2): 114-120 (2005).
Molphy
Program Package for MOLecular PHYlogenetics
Version: 2.3b3
License: non-free
Debian package in non-free -
ProtML is a main program in MOLPHY for inferring evolutionary trees from PROTein (amino acid) sequences by using the Maximum Likelihood method. Other programs (C language)
 NucML:  Maximum Likelihood Inference of Nucleic Acid Phylogeny
 ProtST: Basic Statistics of Protein Sequences
 NucST:  Basic Statistics of Nucleic Acid Sequences
 NJdist: Neighbor Joining Phylogeny from Distance Matrix
Utilities (Perl)
 mollist:  get identifiers list        molrev:   reverse DNA sequences
 molcat:   concatenate sequences       molcut:   get partial sequences
 molmerge: merge sequences             nuc2ptn:  DNA -> Amino acid
 rminsdel: remove INS/DEL sites        molcodon: get specified codon sites
 molinfo:  get varied sites            mol2mol:  MOLPHY format beautifier
 inl2mol:  Interleaved -> MOLPHY       mol2inl:  MOLPHY -> Interleaved
 mol2phy:  MOLPHY -> Sequential        phy2mol:  Sequential -> MOLPHY
 must2mol: MUST -> MOLPHY              etc.
Mozilla-biofox
extension of bioinformatics tools to Iceape and Iceweasel browsers
Version: 1.1.4
License: DFSG free
Official Debian package -
Code bioFOX aims at implementing various bioinformatics tools as an extension on the Iceape and Iceweasel browsers. Analysis of your favorite gene(s) usually require(s) retrieving it from a database like NCBI or Swiss-Prot and then performing one or more tasks including but not limited to:
 * Translation of a nucleotide sequence;
 * Blast search (eg. blastn, blastp etc.) of the desired nucleotide/protein
   sequence;
 * Calculation of properties (like PI, charge, molecular weight, AT/GC content
   etc.) of a protein/nucleotide sequence;
 * Conversion between formats (Genbank, Fasta, Swiss-Prot etc.);
 * Prediction of sequence for sub-cellular localization (PREDOTAR, TargetP,
   pSORT etc).
Mummer
Efficient sequence alignment of full genomes
Version: 3.20
License: DFSG free
Official Debian package -
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it handles the 100s or 1000s of contigs from a shotgun sequencing project with ease, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.
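To make "exact matches of at least a given length" concrete, here is a small Python sketch that finds maximal exact matches between two sequences. It is not how MUMmer works internally: this sketch uses a plain k-mer hash index, which is only practical for short inputs, and it does not enforce the uniqueness that defines a MUM. The function name is an assumption for illustration.

```python
from collections import defaultdict


def exact_matches(ref, query, min_len=20):
    """Return (ref_pos, query_pos, length) for maximal exact matches."""
    # Index every k-mer of the reference, with k = min_len.
    index = defaultdict(list)
    for i in range(len(ref) - min_len + 1):
        index[ref[i:i + min_len]].append(i)

    matches = []
    for j in range(len(query) - min_len + 1):
        for i in index.get(query[j:j + min_len], ()):
            # Skip seeds that merely continue a match starting one base earlier.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend the seed to the right as far as the bases agree.
            length = min_len
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches
```

Each match is reported once, from its leftmost position, so a long shared region yields a single entry rather than one entry per overlapping seed.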
Muscle
Multiple alignment program of protein sequences
Version: 3.70+fix1
License: DFSG free
Official Debian package -
MUSCLE is a multiple alignment program for protein sequences. MUSCLE stands for multiple sequence comparison by log-expectation. In the authors' tests, MUSCLE achieved the highest scores of all tested programs on several alignment accuracy benchmarks, and is also one of the fastest programs out there.
Ncbi-epcr
Tool to test a DNA sequence for the presence of sequence tagged sites
Version: 2.3.10
License: DFSG free
Official Debian package -
Electronic PCR (e-PCR) is a computational procedure that is used to identify sequence tagged sites (STSs) within DNA sequences. e-PCR looks for potential STSs in DNA sequences by searching for subsequences that closely match the PCR primers and have the correct order, orientation, and spacing that could represent the PCR primers used to generate known STSs.
The new version of e-PCR implements a fuzzy matching strategy. To reduce the likelihood that a true STS will be missed due to mismatches, multiple discontiguous words may be used instead of a single exact word. Each of these words has groups of significant positions separated by 'wildcard' positions that are not required to match. In addition, it is also possible to allow gaps in the primer alignments.
The main motivation for implementing reverse searching (called Reverse e-PCR) was to make it feasible to search the human genome sequence and other large genomes. The new version of e-PCR provides a search mode using a query sequence against a sequence database.
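The basic idea of checking primer order, orientation and spacing can be sketched in Python. This is a deliberately simplified illustration and not the e-PCR algorithm: it uses exact primer matches only (no discontiguous words, mismatches or gaps), scans one strand, and expects upper-case A/C/G/T input. The function names and parameters are assumptions for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_sts(template, forward, reverse, min_size, max_size):
    """Return (start, size) for candidate amplicons on the given strand.

    The forward primer must occur first, followed by the reverse complement
    of the reverse primer, with a product size within [min_size, max_size].
    """
    rc_rev = reverse_complement(reverse)
    hits = []
    start = template.find(forward)
    while start != -1:
        end = template.find(rc_rev, start + len(forward))
        while end != -1:
            size = end + len(rc_rev) - start
            if size > max_size:
                break  # later occurrences only give larger products
            if size >= min_size:
                hits.append((start, size))
            end = template.find(rc_rev, end + 1)
        start = template.find(forward, start + 1)
    return hits
```

For example, `find_sts("AAACCGGTTTTTTTTGATCAAAA", "ACCGG", "TGATC", 10, 30)` reports a single candidate starting at position 2 with a product size of 18.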
Ncbi-tools-bin
NCBI libraries for biology applications (text-based utilities)
Maintainer: Aaron M. Ucko
Version: 6.1.20080302
License: DFSG free
Official Debian package -
This package includes various utilities distributed with the NCBI C SDK. None of the programs in this package require X; you can find the X-based utilities in the ncbi-tools-x11 package. BLAST and related tools are in a separate package (blast2).
Ncbi-tools-x11
NCBI libraries for biology applications (X-based utilities)
Maintainer: Aaron M. Ucko
Version: 6.1.20080302
License: DFSG free
Official Debian package -
This package includes some X-based utilities distributed with the NCBI C SDK: Cn3D, Network Entrez, Sequin, ddv, and udv. These programs are not part of ncbi-tools-bin because they depend on several additional library packages.
Njplot
A tree drawing program
Version: 2.2
License: DFSG free
Official Debian package -
NJplot is able to draw any tree expressed in the standard phylogenetic tree format (e.g., the format used by the Phylip package). NJplot is especially convenient for rooting the unrooted trees obtained from parsimony, distance or maximum likelihood tree-building methods.
Perlprimer
Graphical design of primers for PCR
Version: 1.1.14
License: DFSG free
Official Debian package -
PerlPrimer is a free, open-source GUI application written in Perl that designs primers for standard Polymerase Chain Reaction (PCR), bisulphite PCR, real-time PCR (QPCR) and sequencing. It aims to automate and simplify the process of primer design.
If operated online, the tool communicates with the Ensembl project for further insight into the gene structure, so that the location of exons and introns can be taken into account when designing the primers. The sequences themselves can be retrieved, too.
Phylip
[Biology] A package of programs for inferring phylogenies
Version: 1:3.68
License: non-free
Debian package in non-free -
The PHYLogeny Inference Package is a package of programs for inferring phylogenies (evolutionary trees) from sequences. Methods that are available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites, distance matrices, and 0/1 discrete characters.
Plasmidomics
draw plasmids and vector maps with PostScript graphics export
Version: 0.2.0
License: DFSG free
Official Debian package -
Plasmidomics is written for easy drawing of plasmids and vector maps to use them in theses, presentations or other forms of publications. It natively supports PostScript as output format.
Poa
Partial Order Alignment for multiple sequence alignment
Version: 2.0+20060928
License: DFSG free
Official Debian package -
POA is Partial Order Alignment, a fast program for multiple sequence alignment (MSA) in bioinformatics. Its advantages are speed, scalability, sensitivity, and the superior ability to handle branching / indels in the alignment. Partial order alignment is an approach to MSA which can be combined with existing methods such as progressive alignment. POA optimally aligns a pair of MSAs and can therefore be applied directly to progressive alignment methods such as CLUSTAL. For large alignments, Progressive POA is 10-30 times faster than CLUSTALW. POA is published in Bioinformatics. 2004 Jul 10;20(10):1546-56.
Primer3
Tool to design flanking oligonucleotides for DNA amplification
Version: 1.1.4
License: DFSG free
Official Debian package -
Primer3 picks primers for Polymerase Chain Reactions (PCRs), considering as criteria oligonucleotide melting temperature, size, GC content and primer-dimer possibilities, PCR product size, positional constraints within the source sequence, and miscellaneous other constraints. All of these criteria are user-specifiable as constraints, and some are specifiable as terms in an objective function that characterizes an optimal primer pair.
It has been published in Rozen S and Skaletsky H, "Primer3 on the WWW for general users and for biologist programmers.", Methods Mol Biol. 2000;132:365-86.
The Whitehead Institute for Biomedical Research provides a web-based front end to Primer3.
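A few of the criteria listed for Primer3 can be illustrated with a short Python sketch. It is not Primer3's algorithm: the melting temperature uses the Wallace rule, a rule of thumb for short oligonucleotides that stands in for Primer3's own, more elaborate calculation, and all function names and default thresholds are assumptions chosen for this illustration.

```python
def gc_content(primer):
    """Fraction of G and C bases in an upper-case primer sequence."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2 * (A + T) + 4 * (G + C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_constraints(primer, min_len=18, max_len=27, min_gc=0.4,
                       max_gc=0.6, min_tm=50, max_tm=65):
    """Check a candidate primer against hypothetical user-specified limits."""
    return (
        min_len <= len(primer) <= max_len
        and min_gc <= gc_content(primer) <= max_gc
        and min_tm <= wallace_tm(primer) <= max_tm
    )
```

A 20-base primer with equal numbers of A, C, G and T has a GC content of 0.5 and a Wallace estimate of 60 degrees, so it passes these illustrative limits, while a run of 20 A bases fails on GC content.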
Probcons
PROBabilistic CONSistency-based multiple sequence alignment
Version: 1.12
License: DFSG free
Official Debian package -
Tool for generating multiple alignments of protein sequences. Using a combination of probabilistic modeling and consistency-based alignment techniques, PROBCONS has achieved the highest accuracies of all alignment methods to date. On the BAliBASE benchmark alignment database, alignments produced by PROBCONS show statistically significant improvement over current programs, containing an average of 7% more correctly aligned columns than those of T-Coffee, 11% more correctly aligned columns than those of CLUSTAL W, and 14% more correctly aligned columns than those of DIALIGN. Probcons is published in Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. Genome Research 15: 330-340.
Proda
multiple alignment of protein sequences
Version: 1.0
License: DFSG free
Official Debian package -
ProDA is a system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions.
ProDA is published in: Phuong T.M., Do C.B., Edgar R.C., and Batzoglou S. Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Research 2006 34(20), 5932-5942.
Pymol
Molecular Graphics System
Maintainer: Debichem Team
Version: 1.1
License: DFSG free
Official Debian package -
PyMOL is a molecular graphics system targeted at medium to large biomolecules like proteins. It can generate high-quality publication-ready molecular graphics images and animations.
Features include:
 * Visualization of molecules, molecular trajectories and surfaces
   of crystallography data or orbitals
 * Molecular builder and sculptor
 * Internal raytracer and movie generator
 * Fully extensible and scriptable via a Python interface
File formats PyMOL can read include PDB, XYZ, CIF, MDL Molfile, ChemDraw, CCP4 maps, XPLOR maps and Gaussian cube maps.
R-cran-qtl
GNU R package for genetic marker linkage analysis
Maintainer: Steffen Moeller
Version: 1.09
License: DFSG free
Official Debian package -
R/qtl is an extensible, interactive environment for mapping quantitative trait loci (QTLs) in experimental crosses. It is implemented as an add-on package for the freely available and widely used statistical language/software R (see http://www.r-project.org).
Developing this software as an add-on to R makes it possible to take advantage of the basic mathematical and statistical functions and the powerful graphics capabilities that R provides. Further, the user benefits from the seamless integration of the QTL mapping software into a general statistical analysis program. The goal is to make complex QTL mapping methods widely accessible and to allow users to focus on modeling rather than computing.
A key component of computational methods for QTL mapping is the hidden Markov model (HMM) technology for dealing with missing genotype data. We have implemented the main HMM algorithms, with allowance for the presence of genotyping errors, for backcrosses, intercrosses, and phase-known four-way crosses.
The current version of R/qtl includes facilities for estimating genetic maps, identifying genotyping errors, and performing single-QTL genome scans and two-QTL, two-dimensional genome scans, by interval mapping (with the EM algorithm), Haley-Knott regression, and multiple imputation. All of this may be done in the presence of covariates (such as sex, age or treatment). One may also fit higher-order QTL models by multiple imputation.
R-other-bio3d
GNU R package for biological structure analysis
Version: 1.0
License: DFSG free
Official Debian package -
The bio3d package contains utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform atom summaries, atom selection, re-orientation, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, and principal component analysis (PCA). In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data.
Rasmol
Visualize biological macromolecules
Version: 2.7.4.2
License: DFSG free
Official Debian package -
RasMol is a molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication quality images.
The program reads in a molecule coordinate file and interactively displays the molecule on the screen in a variety of colour schemes and molecule representations. Currently available representations include depth-cued wireframes, 'Dreiding' sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom labels and dot surfaces.
Supported input file formats include Protein Data Bank (PDB), Tripos Associates' Alchemy and Sybyl Mol2 formats, Molecular Design Limited's (MDL) Mol file format, Minnesota Supercomputer Center's (MSC) XYZ (XMol) format, CHARMm format, CIF format and mmCIF format files.
This package installs two versions of RasMol: rasmol-gtk has a modern GTK-based user interface, and rasmol-classic is the version with the old Xlib GUI.
Readseq
[Biology] Conversion between sequence formats
Version: 1
License: DFSG free
Official Debian package -
Reads and writes nucleic/protein sequences in various formats. Data files may have multiple sequences. Readseq is particularly useful as it automatically detects many sequence formats, and converts between them.
Seaview
Multiple sequence alignment editor
Version: 1:2.4
License: DFSG free
Official Debian package -
SeaView is a graphical multiple sequence alignment editor developed by Manolo Gouy. Multiple alignment formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE) are supported for reading and writing. Alignments can be manually edited. External programs are integrated as well, e.g., DOT-PLOT or MUSCLE can be run to improve the alignment locally.
When using SeaView for investigations that lead to a publication, please cite the following reference:
Galtier, N., Gouy, M. and Gautier, C. (1996) "SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny." Comput. Applic. Biosci. 12:543-548.
Sibsim4
align expressed RNA sequences on a DNA template
Version: 0.17
License: DFSG free
Official Debian package -
The SIBsim4 project is based on sim4, which is a program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns. SIBsim4 is a fairly extensive rewrite of the original code with the following goals:
 * speed improvement;
 * allow large, chromosome scale, DNA sequences to be used;
 * provide more detailed output about splice types;
 * provide more detailed output about polyA sites;
 * misc code cleanups and fixes.
Sigma-align
Simple greedy multiple alignment of non-coding DNA sequences
Version: 1.1.1
License: DFSG free
Official Debian package -
Sigma ("Simple greedy multiple alignment") is an alignment program with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. It uses a strategy of seeking the best possible gapless local alignments, at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.
Sigma has been published in BMC Bioinformatics. 2006 Mar 16;7:143.
Sim4
tool for aligning cDNA and genomic DNA
Version: 0.0.20030921
License: DFSG free
Official Debian package -
sim4 is a similarity-based tool for aligning an expressed DNA sequence (EST, cDNA, mRNA) with a genomic sequence for the gene. It also detects end matches when the two input sequences overlap at one end (i.e., the start of one sequence overlaps the end of the other).
sim4 employs a blast-based technique to first determine the basic matching blocks representing the "exon cores". In this first stage, it detects all possible exact matches of W-mers (i.e., DNA words of size W) between the two sequences and extends them to maximal scoring gap-free segments. In the second stage, the exon cores are extended into the adjacent as-yet-unmatched fragments using greedy alignment algorithms, and heuristics are used to favor configurations that conform to the splice-site recognition signals (GT-AG, CT-AC). If necessary, the process is repeated with less stringent parameters on the unmatched fragments.
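The first stage described above can be sketched in Python as follows. This is a simplified illustration, not sim4's code: it extends each W-mer hit only while bases match exactly, whereas sim4 extends to maximal scoring gap-free segments, and the second stage with its splice-site heuristics is omitted. The function name and the default `w` are assumptions.

```python
def seed_and_extend(cdna, genomic, w=4):
    """Find exact W-mer matches and extend each into a gap-free exact block.

    Returns a sorted list of (cdna_start, genomic_start, length) tuples.
    """
    # Index every W-mer of the genomic sequence by its start offsets.
    index = {}
    for j in range(len(genomic) - w + 1):
        index.setdefault(genomic[j:j + w], []).append(j)

    blocks = set()
    for i in range(len(cdna) - w + 1):
        for j in index.get(cdna[i:i + w], []):
            # Extend to the left while the bases keep matching.
            a, b = i, j
            while a > 0 and b > 0 and cdna[a - 1] == genomic[b - 1]:
                a, b = a - 1, b - 1
            # Extend to the right in the same way.
            end_a, end_b = i + w, j + w
            while (end_a < len(cdna) and end_b < len(genomic)
                   and cdna[end_a] == genomic[end_b]):
                end_a, end_b = end_a + 1, end_b + 1
            # Hits inside the same block collapse to one entry in the set.
            blocks.add((a, b, end_a - a))
    return sorted(blocks)
```

For the toy input `seed_and_extend("ACGTACGGTTCA", "ACGTACGTTTTTAGGGTTCA", w=5)` the result is `[(0, 0, 7), (6, 14, 6)]`: two blocks separated by an intron-like gap in the genomic sequence. The blocks overlap by one base in the expressed sequence, which is the kind of boundary ambiguity a later refinement stage has to settle.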
T-coffee
Multiple Sequence Alignment
Version: 5.72
License: DFSG free
Official Debian package -
T-Coffee is a multiple sequence alignment package. Given a set of sequences (Proteins or DNA), T-Coffee generates a multiple sequence alignment. Version 2.00 and higher can mix sequences and structures.
T-Coffee allows the combination of a collection of multiple/pairwise, global or local alignments into a single model. It also makes it possible to estimate the level of consistency of each position within the new alignment with the rest of the alignments. See the pre-print for more information.
T-Coffee has a special mode called M-Coffee that makes it possible to combine the output of many multiple sequence alignment packages. In its published version, it uses MUSCLE, PROBCONS, POA, DiAlign-TS, MAFFT, Clustal W, PCMA and T-Coffee. A special version, DM-Coffee, has been made for Debian; it uses only free software, replacing Clustal W with Kalign. Using all 8 methods of M-Coffee can sometimes be a bit heavy, so you can use a subset of your favorite methods if you prefer.
Tigr-glimmer
Gene detection in archaea and bacteria
Version: 3.02
License: DFSG free
Official Debian package -
Developed by the TIGR institute, this software detects coding sequences in bacteria and archaea.
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
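To give a feel for Markov-model scoring of coding versus noncoding DNA, here is a deliberately simplified Python sketch. It uses a single fixed-order Markov chain with pseudocounts and does not reproduce Glimmer's interpolation across context lengths; the function names, the `order` and `pseudocount` parameters and the toy training data are all assumptions made for this illustration.

```python
import math
from collections import defaultdict


def train_markov(sequences, order=2, pseudocount=1.0):
    """Return prob(context, base) estimated from upper-case DNA sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        # Unseen contexts fall back to a uniform estimate via pseudocounts.
        row = counts.get(context, {})
        total = sum(row.values()) + 4 * pseudocount
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq, coding, noncoding, order=2):
    """Sum of log(P_coding / P_noncoding) over every base with a full context."""
    return sum(
        math.log(coding(seq[i - order:i], seq[i])
                 / noncoding(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )
```

A positive score means the sequence resembles the coding training set more than the noncoding one. Trained on the toy sets `["ATGATGATGATG"]` and `["AAAAAAAAAAAA"]`, the score is positive for `ATGATG` and negative for `AAAAAA`.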
Tree-ppuzzle
Parallelized reconstruction of phylogenetic trees by maximum likelihood
Version: 5.2
License: DFSG free
Official Debian package -
TREE-PUZZLE (the new name for PUZZLE) is an interactive console program that implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can also be calculated under the clock-assumption. In addition, TREE-PUZZLE offers a novel method, likelihood mapping, to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment.
This is the parallelized version of tree-puzzle.
Tree-puzzle
Reconstruction of phylogenetic trees by maximum likelihood
Version: 5.2
License: DFSG free
Official Debian package -
TREE-PUZZLE (the new name for PUZZLE) is an interactive console program that implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can also be calculated under the clock-assumption. In addition, TREE-PUZZLE offers a novel method, likelihood mapping, to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment.
Treetool
interactive tool for displaying phylogenetic trees
Version: 2.0.2a
License: non-free
Debian package in non-free -
Treetool is an interactive tool for displaying, editing, and printing phylogenetic trees. The tree is displayed visually on screen, in various formats, and the user is able to modify the format, structure, and characteristics of the tree. Trees may be viewed, compared, formatted for printing, constructed from smaller trees, etc.
Development of this software stopped in 1995.
Treeviewx
Displays and prints phylogenetic trees
Version: 0.5.1
License: DFSG free
Official Debian package -
TreeView X is an open source and multi-platform program to display phylogenetic trees. It can read and display NEXUS and Newick format tree files (such as those output by PAUP*, ClustalX, TREE-PUZZLE, and other programs). It can order the branches of the trees and export the trees in SVG format.
The program was written by Rod Page (r.page@bio.gla.ac.uk) using the wxWidgets C++ library. It was published in Computer Applications in the Biosciences. 1996 12: 357-358.
Wise
comparison of biopolymers, commonly DNA and protein sequences
Version: 2.4.1