Summary
Next generation sequencing
Debian Med bioinformatics applications usable in Next Generation Sequencing
It aims at getting packages that specialize in the alignment of
sequences produced by next generation sequencing.
The list to the right includes various software projects which are of some interest to the Debian Med Project. Currently, only a few of them are available as Debian packages. It is our goal, however, to include all software in Debian Med which can sensibly add to a high quality Debian Pure Blend.
For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:
If you discover a project which looks like a good candidate for Debian Med
to you, or if you have prepared an unofficial Debian package, please do not hesitate to
send a description of that project to the Debian Med mailing list
|
Debian Med Next generation sequencing packages
Official Debian packages with high relevance
|
Bedtools
suite of utilities for comparing genomic features
|
| Versions of package bedtools |
| Release | Version | Architectures |
| wheezy | 2.16.1-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 2.17.0-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 2.17.0-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package bedtools: |
| field | biology:bioinformatics, biology |
| interface | commandline |
| role | program |
| scope | suite |
| use | filtering, converting, comparing, analysing |
| works-with | biological-sequence |
|
License: DFSG free
|
|
The BEDTools utilities allow one to address common genomics tasks such as
finding feature overlaps and computing coverage. The utilities are largely
based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using
BEDTools, one can develop sophisticated pipelines that answer complicated
research questions by streaming several BEDTools together.
The groupBy utility is distributed in the filo package.
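As a rough illustration of what "finding feature overlaps" means, the following Python sketch reports every overlapping pair between two lists of BED-style intervals. The function name `overlaps` and the sample intervals are invented for this example, and the naive pairwise comparison is only a sketch of the idea, not how BEDTools itself is implemented.

```python
from typing import List, Tuple

# (chromosome, start, end); BED coordinates are 0-based and half-open
Interval = Tuple[str, int, int]

def overlaps(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval, int]]:
    """Return each pair of intervals sharing at least one base, with the overlap length."""
    hits = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            # Length of the shared stretch; zero or negative means no overlap.
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared > 0:
                hits.append(((chrom_a, start_a, end_a), (chrom_b, start_b, end_b), shared))
    return hits

# Hypothetical sample data
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
result = overlaps(genes, peaks)
```

Here only the first gene and the first peak overlap, sharing 50 bases.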
|
|
|
Bowtie
Ultrafast memory-efficient short read aligner
|
| Versions of package bowtie |
| Release | Version | Architectures |
| wheezy | 0.12.7-3 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,powerpc,s390,s390x,sparc |
| jessie | 1.0.0-5 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.0.0-5 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package bowtie: |
| biology | nuceleic-acids |
| field | biology:bioinformatics |
| interface | commandline |
| role | program |
| science | calculation |
| scope | utility |
| use | comparing, analysing |
| works-with | biological-sequence |
|
License: DFSG free
|
|
This package addresses the problem of interpreting the results from the
latest (2010) DNA sequencing technologies. These yield fairly
short stretches of sequence that cannot be interpreted directly. The
challenge for tools like Bowtie is to assign a chromosomal location to the
short stretches of DNA sequenced per run.
Bowtie aligns short DNA sequences (reads) to the human genome at a rate
of over 25 million 35-bp reads per hour. Bowtie indexes the genome with
a Burrows-Wheeler index to keep its memory footprint small: typically
about 2.2 GB for the human genome (2.9 GB for paired-end).
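To give a feel for the transform behind a Burrows-Wheeler index, here is a minimal Python sketch that computes the Burrows-Wheeler transform of a short string by sorting its rotations, and then inverts it. This naive construction is an assumption made for illustration only. It is far too slow for a genome and does not reflect Bowtie's actual index layout. The `$` terminator and the function names are choices made for this example.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for short strings only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed: str, terminator: str = "$") -> str:
    """Invert the transform by repeatedly prepending the column and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    # The original text is the row that ends with the terminator.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]

encoded = bwt("GATTACA")
decoded = inverse_bwt(encoded)
```

The transform is reversible, which is why an index built on it can stand in for the original sequence.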
|
|
|
Bwa
|
| Versions of package bwa |
| Release | Version | Architectures |
| squeeze | 0.5.8c-1 | amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc |
| wheezy | 0.6.2-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 0.6.2-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.6.2-1 | armel,armhf,hurd-i386,i386,ia64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.7.5a-2 | amd64,kfreebsd-amd64 |
| Debtags of package bwa: |
| biology | peptidic, nuceleic-acids |
| field | biology:bioinformatics, biology |
| interface | text-mode, commandline |
| role | program |
| use | comparing, analysing |
|
License: DFSG free
|
|
BWA is a software package for mapping low-divergent sequences against
a large reference genome, such as the human genome. It consists of
three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first
algorithm is designed for Illumina sequence reads up to 100bp, while
the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM
and BWA-SW share similar features such as long-read support and split
alignment, but BWA-MEM, which is the latest, is generally recommended
for high-quality queries as it is faster and more accurate. BWA-MEM
also has better performance than BWA-backtrack for 70-100bp Illumina
reads.
|
|
|
Fastx-toolkit
FASTQ/A short nucleotide reads pre-processing tools
|
| Versions of package fastx-toolkit |
| Release | Version | Architectures |
| wheezy | 0.0.13.2-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 0.0.13.2-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.0.13.2-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package fastx-toolkit: |
| role | program |
|
License: DFSG free
|
|
The FASTX-Toolkit is a collection of command line tools for preprocessing
short nucleotide reads in FASTA and FASTQ formats, usually produced by
Next-Generation sequencing machines. The main processing of such FASTA/FASTQ
files is mapping (aligning) the sequences to reference genomes or other
databases using specialized programs like BWA, Bowtie and many others.
However, it is sometimes more productive to preprocess the FASTA/FASTQ files
before mapping the sequences to the genome—manipulating the sequences to
produce better mapping results. The FASTX-Toolkit tools perform some of these
preprocessing tasks.
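One common preprocessing step is trimming low-quality bases from the 3' end of each read before mapping. The Python sketch below parses four-line FASTQ records and trims them. It assumes Phred+33 quality encoding and a threshold of 20, and the function names are hypothetical. It illustrates the kind of task involved and is not the FASTX-Toolkit's own code.

```python
from typing import Iterator, List, Tuple

def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record at line {i + 1}")
        yield header[1:], seq, qual

def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Hypothetical record: six high-quality bases followed by two poor ones
fastq = ["@read1", "ACGTACGT", "+", "IIIIII##"]
trimmed = [(name, *trim_3prime(seq, qual)) for name, seq, qual in parse_fastq(fastq)]
```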
|
|
|
Filo
FILe and stream Operations
|
| Versions of package filo |
| Release | Version | Architectures |
| wheezy | 1.1+2011020401.2 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 1.1+2011020401.2 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.1+2011020401.2 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
|
License: DFSG free
|
|
The following tools are available as part of the filo package:
groupBy - mimics the "GROUP BY" clause in database systems.
shuffle - randomly permutes the lines of a file.
stats - computes descriptive statistics on a given column of a
tab-delimited file or stream.
Because the names of these tools are too generic, shuffle and stats have been
moved to /usr/lib/filo.
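The "GROUP BY" idea behind groupBy can be sketched in a few lines of Python: rows of a tab-delimited stream are grouped by one column and another column is summed per group. The function name, the 0-based column indices and the sample rows are assumptions made for this example. The real groupBy command line is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable

def group_sum(rows: Iterable[str], group_col: int, value_col: int) -> Dict[str, float]:
    """Sum value_col for each distinct key in group_col (0-based) of tab-delimited rows."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        totals[fields[group_col]] += float(fields[value_col])
    return dict(totals)

# Hypothetical input: chromosome, feature name, score
rows = ["chr1\tgeneA\t10", "chr1\tgeneB\t5", "chr2\tgeneC\t7"]
totals = group_sum(rows, group_col=0, value_col=2)
```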
|
|
|
Last-align
genome-scale comparison of biological sequences
|
| Versions of package last-align |
| Release | Version | Architectures |
| squeeze | 128-1 | amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc |
| wheezy | 199-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 199-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 199-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package last-align: |
| field | biology:bioinformatics, biology |
| role | program |
|
License: DFSG free
|
|
LAST is software for comparing and aligning sequences, typically DNA or
protein sequences. LAST is similar to BLAST, but it copes better with very
large amounts of sequence data. Here are two things LAST is good at:
- Comparing large (e.g. mammalian) genomes.
- Mapping lots of sequence tags onto a genome.
The main technical innovation is that LAST finds initial matches based on
their multiplicity, instead of using a fixed size (e.g. BLAST uses
10-mers). This allows one to map tags to genomes without repeat-masking
and without becoming overwhelmed by repetitive hits. To find these
variable-sized matches, it uses a suffix array (inspired by Vmatch). To
achieve high sensitivity, it uses a discontiguous suffix array, analogous
to spaced seeds.
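One way to read the multiplicity-based matching described for LAST is this: instead of taking a seed of fixed length, lengthen the match until it occurs rarely enough in the reference. The Python sketch below does this with a naive suffix array. It is a simplified illustration under that assumption, with hypothetical function names and a toy sequence, and it is not LAST's actual algorithm or data structure.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions ordered by suffix (short texts only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text: str, sa: List[int], pattern: str) -> int:
    """Count pattern occurrences by binary search over the sorted suffix prefixes."""
    # Prefixes are rebuilt on every call for clarity, not speed.
    prefixes = [text[i:i + len(pattern)] for i in sa]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)

def adaptive_seed(text: str, sa: List[int], query: str, start: int,
                  max_hits: int) -> Optional[Tuple[str, int]]:
    """Shortest match from query[start:] that occurs at least once and at most max_hits times."""
    for length in range(1, len(query) - start + 1):
        seed = query[start:start + length]
        hits = count_occurrences(text, sa, seed)
        if hits == 0:
            return None  # the query diverges before the match becomes rare enough
        if hits <= max_hits:
            return seed, hits
    return None

genome = "ACGTACGTTACGA"  # toy reference
sa = build_suffix_array(genome)
unique_seed = adaptive_seed(genome, sa, "ACGTT", 0, max_hits=1)
looser_seed = adaptive_seed(genome, sa, "ACGTT", 0, max_hits=2)
```

With a stricter multiplicity limit the seed grows longer, so repetitive regions automatically demand longer matches.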
|
|
|
Maq
maps short fixed-length polymorphic DNA sequence reads to reference sequences
|
| Versions of package maq |
| Release | Version | Architectures |
| squeeze | 0.7.1-3 | amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc |
| wheezy | 0.7.1-5 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 0.7.1-5 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.7.1-5 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package maq: |
| biology | nuceleic-acids |
| field | biology:bioinformatics, biology |
| interface | commandline |
| role | program |
| scope | utility |
| use | searching, comparing, analysing |
| works-with-format | plaintext |
|
License: DFSG free
|
|
Maq (short for Mapping and Assembly with Quality) builds mapping assemblies
from short reads generated by the next-generation sequencing machines. It was
particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has a
preliminary functionality to handle ABI SOLiD data. Maq was previously known as
mapass2.
Development of Maq stopped in 2008. Its successors are BWA and SAMtools.
|
|
|
Mira-assembler
Whole Genome Shotgun and EST Sequence Assembler
|
| Versions of package mira-assembler |
| Release | Version | Architectures |
| wheezy | 3.4.0.1-3 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 3.4.0.1-3 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 3.9.17-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package mira-assembler: |
| role | program |
|
License: DFSG free
|
|
The mira genome fragment assembler is a specialised assembler for
sequencing projects classified as 'hard' due to high number of similar
repeats. For expressed sequence tag (EST) transcripts, miraEST is
specialised in reconstructing pristine mRNA transcripts while
detecting and classifying single nucleotide polymorphisms (SNPs)
occurring in different variations thereof.
The assembler is routinely used for such various tasks as mutation
detection in different cell types, similarity analysis of transcripts
between organisms, and pristine assembly of sequences from various
sources for oligo design in clinical microarray experiments.
The package provides the following executables:
Binaries provided:
- mira: for assembly of genome sequences
- miramem: estimating memory needed to assemble projects. Realised through
link to mira.
- convert_project: for converting project file types into other types
- caf2fasta, caf2gbf, caf2text, caf2html, gbf2caf and gbf2fasta are some
frequently used file converters (realised through links to convert_project)
- scftool: set of tools useful when working with SCF trace files
- fastatool: set of tools useful when working with FASTA trace files
Scripts provided:
- fasta2frag: fragmenting sequences into smaller, overlapping
subsequences. Useful for simulating shotgun sequences. Can create
subsequences in both directions (default) and also paired-end sequences.
- fastaselect: given a FASTA file (and possibly a FASTA quality file) and
a file with names of reads, selects the sequences from the input FASTA (and
quality file) and writes them to an output FASTA
- fastqselect: like fastaselect, only for FASTQ
- fixACE4consed: Consed has a bug which incapacitates it from reading
consensus tags in ACE files written by the MIRA assembler (and possibly
other programs). This script massages an ACE file so that consed can read
the consensus tags.
|
|
|
Mothur
sequence analysis suite for research on microbiota
|
| Versions of package mothur |
| Release | Version | Architectures |
| wheezy | 1.24.1-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 1.24.1-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.24.1-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| upstream | 1.31.2 |
| Debtags of package mothur: |
| role | program |
|
License: DFSG free
|
|
Mothur seeks to develop a single piece of open-source, expandable software
to fill the bioinformatics needs of the microbial ecology community. It has
incorporated the functionality of dotur, sons, treeclimber, s-libshuff,
unifrac, and much more. In addition to improving the flexibility of these
algorithms, a number of other features including calculators and
visualization tools were added.
Please cite:
Patrick D Schloss, Sarah L Westcott, Thomas Ryabin, Justine R Hall, Martin Hartmann, Emily B Hollister, Ryan A Lesniewski, Brian B Oakley, Donovan H Parks, Courtney J Robinson, Jason W Sahl, Blaz Stres, Gerhard G Thallinger, David J Van Horn and Carolyn F Weber:
Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities.
(PubMed)
Appl Environ Microbiol
75(23):7537-7541
(2009)
|
|
|
Picard-tools
Command line tools to manipulate SAM and BAM files
|
| Versions of package picard-tools |
| Release | Version | Architectures |
| squeeze | 1.27-1 | all |
| wheezy | 1.46-1 | all |
| jessie | 1.90-2 | all |
| sid | 1.90-2 | all |
| upstream | 1.92 |
|
License: DFSG free
|
|
SAM (Sequence Alignment/Map) format is a generic format for storing
large nucleotide sequence alignments. Picard Tools includes these
utilities to manipulate SAM and BAM files:
BamToBfq IlluminaBasecallsToSam
BuildBamIndex MarkDuplicates
CalculateHsMetrics MeanQualityByCycle
CleanSam MergeBamAlignment
CollectAlignmentSummaryMetrics MergeSamFiles
CollectGcBiasMetrics NormalizeFasta
CollectInsertSizeMetrics QualityScoreDistribution
CollectRnaSeqMetrics ReplaceSamHeader
CompareSAMs RevertSam
CreateSequenceDictionary SamFormatConverter
ExtractIlluminaBarcodes SamToFastq
EstimateLibraryComplexity SortSam
FastqToSam ValidateSamFile
FixMateInformation ViewSam
|
|
|
Qiime
Quantitative Insights Into Microbial Ecology
|
| Versions of package qiime |
| Release | Version | Architectures |
| wheezy | 1.4.0-2 | amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 1.4.0-2 | amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.5.0+dfsg-1 | armel,hurd-i386,mipsel |
| sid | 1.7.0+dfsg-1 | amd64,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,powerpc,s390,s390x,sparc |
| Debtags of package qiime: |
| role | program |
|
License: DFSG free
|
|
QIIME (canonically pronounced ‘Chime’) is a pipeline for performing
microbial community analysis that integrates many third party tools which
have become standard in the field. A standard QIIME analysis begins with
sequence data from one or more sequencing platforms, including
- Sanger,
- Roche/454, and
- Illumina GAIIx.
With all the underlying tools installed,
of which not all are yet available in Debian (or any other Linux
distribution), QIIME can perform
- library de-multiplexing and quality filtering;
- denoising with PyroNoise;
- OTU and representative set picking with uclust, cdhit, mothur, BLAST,
or other tools;
- taxonomy assignment with BLAST or the RDP classifier;
- sequence alignment with PyNAST, muscle, infernal, or other tools;
- phylogeny reconstruction with FastTree, raxml, clearcut, or other tools;
- alpha diversity and rarefaction, including visualization of results,
using over 20 metrics including Phylogenetic Diversity, chao1, and
observed species;
- beta diversity and rarefaction, including visualization of results,
using over 25 metrics including weighted and unweighted UniFrac,
Euclidean distance, and Bray-Curtis;
- summarization and visualization of taxonomic composition of samples
using pie charts and histograms
and many other features.
QIIME includes parallelization capabilities for many of the
computationally intensive steps. By default, these are configured to
utilize a multi-core environment, and are easily configured to run in
a cluster environment. QIIME is built in Python using the open-source
PyCogent toolkit. It makes extensive use of unit tests, and is highly
modular to facilitate custom analyses.
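Two of the diversity measures named in the feature list can be written down compactly. The Python sketch below computes observed species (an alpha diversity measure) and the Bray-Curtis dissimilarity (a beta diversity measure) from per-sample count vectors. These are the textbook formulas under assumed function names and made-up counts, not code taken from QIIME.

```python
from typing import Sequence

def observed_species(counts: Sequence[int]) -> int:
    """Alpha diversity: number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)

def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Beta diversity: sum of |a_i - b_i| divided by the sum of all counts."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

# Hypothetical counts for three taxa in two samples
sample_a = [6, 0, 4]
sample_b = [2, 4, 4]
richness_a = observed_species(sample_a)
dissimilarity = bray_curtis(sample_a, sample_b)
```

A Bray-Curtis value of 0 means identical samples and 1 means no shared taxa.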
Please cite:
J Gregory Caporaso, Justin Kuczynski, Stombaugh Jesse, Bittinger Kyle, Bushman Frederic D, Costello Elizabeth K, Fierer Noah, Pena Antonio Gonzalez, Goodrich Julia K, Gordon Jeffrey I, Huttley Gavin A, Kelley Scott T, Knights Dan, Koenig Jeremy E, Ley Ruth E, Lozupone Catherine A, McDonald Daniel, Muegge Brian D, Pirrung Meg, Reeder Jens, Sevinsky Joel R, Turnbaugh Peter J, Walters William A, Widmann Jeremy, Yatsunenko Tanya, Zaneveld Jesse and Knight Rob:
QIIME allows analysis of high-throughput community sequencing data.
(PubMed)
Nature Methods
7:335 - 336
(2010)
|
|
|
R-bioc-edger
Empirical analysis of digital gene expression data in R
|
| Versions of package r-bioc-edger |
| Release | Version | Architectures |
| wheezy | 2.6.1~dfsg-1 | all |
| jessie | 2.6.1~dfsg-1 | all |
| sid | 3.2.3~dfsg-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package r-bioc-edger: |
| field | biology:bioinformatics, biology |
| interface | commandline |
| role | shared-lib, program, plugin |
| scope | utility |
| use | comparing, calculating, analysing |
|
License: DFSG free
|
|
Bioconductor package for differential expression analysis of whole
transcriptome sequencing (RNA-seq) and digital gene expression
profiles with biological replication. It uses empirical Bayes
estimation and exact tests based on the negative binomial
distribution. It is also useful for differential signal analysis with
other types of genome-scale count data.
|
|
|
R-bioc-hilbertvis
GNU R package to visualise long vector data
|
| Versions of package r-bioc-hilbertvis |
| Release | Version | Architectures |
| squeeze | 1.5.0-2 | amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc |
| wheezy | 1.14.0-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 1.18.0-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.18.0-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package r-bioc-hilbertvis: |
| biology | nuceleic-acids |
| field | biology:bioinformatics, biology |
| use | analysing |
|
License: DFSG free
|
|
This tool allows one to display very long data vectors in a space-efficient
manner, by organising them along a 2D Hilbert curve. The user can then
visually judge the large scale structure and distribution of features
simultaneously with the rough shape and intensity of individual features.
In bioinformatics, a typical use case is ChIP-Chip and ChIP-Seq,
or basically all the kinds of genomic data, that are conventionally
displayed as quantitative track ("wiggle data") in genome browsers such
as those provided by Ensembl or UCSC.
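The layout idea can be sketched with the standard conversion from a position along a Hilbert curve to a 2D coordinate: neighbouring vector elements land in neighbouring grid cells. The Python code below is a generic illustration with hypothetical function names, not the package's own R implementation.

```python
from typing import List, Sequence, Tuple

def d2xy(side: int, d: int) -> Tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on a side x side grid (side a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before swapping axes.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_image(values: Sequence[float], order: int) -> List[List[float]]:
    """Place a 1D vector on a 2^order x 2^order grid along the Hilbert curve."""
    side = 2 ** order
    if len(values) > side * side:
        raise ValueError("vector too long for this curve order")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(values):
        x, y = d2xy(side, d)
        grid[y][x] = value
    return grid

image = hilbert_image([1, 2, 3, 4], order=1)
```

Because consecutive indices always map to adjacent cells, a long run of high values in the vector shows up as a compact patch in the image.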
|
|
|
Samtools
processing sequence alignments in SAM and BAM formats
|
| Versions of package samtools |
| Release | Version | Architectures |
| squeeze | 0.1.8-1 | amd64,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390 |
| wheezy | 0.1.18-1 | amd64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390 |
| jessie | 0.1.19-1 | amd64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x |
| sid | 0.1.19-1 | amd64,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x |
| Debtags of package samtools: |
| field | biology |
| interface | commandline |
| network | client |
| role | program |
| scope | utility |
| uitoolkit | ncurses |
| use | filtering, calculating, analysing |
| works-with | biological-sequence |
|
License: DFSG free
|
|
Samtools is a set of utilities that manipulate nucleotide sequence alignments
in the binary BAM format. It imports from and exports to the ASCII SAM
(Sequence Alignment/Map) format, does sorting, merging and indexing, and allows
reads in any region to be retrieved swiftly. It is designed to work on a stream,
and is able to open a BAM (not SAM) file on a remote FTP or HTTP server.
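For readers unfamiliar with the text SAM format, the Python sketch below parses the first six of the eleven mandatory tab-separated fields of an alignment line and checks whether a read starts inside a region. It handles plain-text SAM only, not BAM, and the class, function names and sample line are hypothetical. It is not part of Samtools.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description

def parse_sam_line(line: str) -> SamRecord:
    """Parse one SAM alignment line (header lines starting with '@' are rejected)."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])

def starts_in_region(rec: SamRecord, chrom: str, start: int, end: int) -> bool:
    """True if the read is mapped and its leftmost position lies in [start, end]."""
    is_unmapped = bool(rec.flag & 0x4)
    return not is_unmapped and rec.rname == chrom and start <= rec.pos <= end

# Hypothetical alignment line
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(line)
inside = starts_in_region(record, "chr1", 50, 150)
```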
|
|
|
Sra-toolkit
utilities for the NCBI Sequence Read Archive
|
| Versions of package sra-toolkit |
| Release | Version | Architectures |
| wheezy | 2.1.7a-1 | amd64,i386,kfreebsd-amd64,kfreebsd-i386 |
| jessie | 2.1.7a-1 | amd64,i386,kfreebsd-amd64,kfreebsd-i386 |
| sid | 2.1.7a-1 | amd64,i386,kfreebsd-amd64,kfreebsd-i386 |
| upstream | 2.3.2-4 |
|
License: DFSG free
|
|
Tools for reading the SRA archive, generally by converting individual runs
into some commonly used format such as fastq.
The textual dumpers sra-dump and vdb-dump are provided in this release as an
aid in visual inspection. It is likely that their actual output formatting
will be changed in the near future to a stricter, more formalized
representation. PLEASE DO NOT RELY UPON THE OUTPUT FORMAT SEEN IN THIS
RELEASE.
The "help" information will be improved in near-future releases, and the
tool options will become standardized across the set. Additional
documentation is also available on the NCBI web site.
Tool options may change in the next release. Version 1 tool options will
remain supported wherever possible in order to preserve operation of any
existing scripts.
Please cite:
Rasko Leinonen, Ruth Akhtar, Ewan Birney, James Bonfield, Lawrence Bower, Matt Corbett, Ying Cheng, Fehmi Demiralp, Nadeem Faruque, Neil Goodgame, Richard Gibson, Gemma Hoad, Christopher Hunter, Mikyung Jang, Steven Leonard, Quan Lin, Rodrigo Lopez, Michael Maguire, Hamish McWilliam, Sheila Plaister, Rajesh Radhakrishnan, Siamak Sobhany, Guy Slater, Petra Ten Hoopen, Franck Valentin, Robert Vaughan, Vadim Zalunin, Daniel Zerbino and Guy Cochrane:
Improvements to services at the European Nucleotide Archive.
(PubMed,eprint)
Nucleic Acids Research
38(Database issue):D39-45
(2010)
|
|
|
Ssake
genomics application for assembling millions of very short DNA sequences
|
| Versions of package ssake |
| Release | Version | Architectures |
| squeeze | 3.5-1 | all |
| wheezy | 3.8-2 | all |
| jessie | 3.8-2 | all |
| sid | 3.8-2 | all |
| Debtags of package ssake: |
| biology | nuceleic-acids |
| field | biology |
| interface | shell |
| role | program |
| scope | utility |
| use | analysing |
|
License: DFSG free
|
|
SSAKE (Short Sequence Assembly by K-mer search and 3' read Extension) is a
genomics application for aggressively assembling millions of short
nucleotide sequences by progressively searching for perfect 3'-most k-mers
using a DNA prefix tree. SSAKE is designed to help leverage the information
from short sequence reads by stringently clustering them into contigs that
can be used to characterize novel sequencing targets.
|
|
|
Tabix
generic indexer for TAB-delimited genome position files
|
| Versions of package tabix |
| Release | Version | Architectures |
| wheezy | 0.2.6-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 0.2.6-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.2.6-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package tabix: |
| field | biology |
| interface | commandline |
| network | client |
| role | program |
| scope | utility |
| use | filtering, compressing |
| works-with | text |
|
License: DFSG free
|
|
Tabix indexes files where some columns indicate sequence coordinates: name
(usually a chromosome), start and stop. The input data file must be
position sorted and compressed by bgzip (provided in this package), which
has a gzip-like interface. After indexing, tabix is able to quickly
retrieve data lines by chromosomal coordinates. Fast data retrieval also
works over a network if a URI is given as the file name.
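Why position sorting enables fast lookups can be shown with a small in-memory Python sketch: once records are sorted by start coordinate, a binary search finds the records starting inside a queried range without scanning the whole file. This is only an illustration with hypothetical names and data. It does not reproduce tabix's on-disk index or its handling of bgzip-compressed blocks.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, int, str]  # (name, start, stop, payload)

def build_index(records: List[Record]) -> Dict[str, Tuple[List[int], List[Record]]]:
    """Group position-sorted records by sequence name and keep their start coordinates."""
    by_name: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_name[rec[0]].append(rec)
    index = {}
    for name, recs in by_name.items():
        starts = [rec[1] for rec in recs]
        if starts != sorted(starts):
            raise ValueError(f"records for {name} are not position sorted")
        index[name] = (starts, recs)
    return index

def fetch(index: Dict[str, Tuple[List[int], List[Record]]],
          name: str, start: int, stop: int) -> List[Record]:
    """Return records on `name` whose start coordinate lies in [start, stop]."""
    if name not in index:
        return []
    starts, recs = index[name]
    return recs[bisect_left(starts, start):bisect_right(starts, stop)]

# Hypothetical records
records = [("chr1", 10, 20, "a"), ("chr1", 50, 60, "b"),
           ("chr1", 90, 95, "c"), ("chr2", 5, 8, "d")]
index = build_index(records)
found = fetch(index, "chr1", 40, 90)
```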
|
|
|
Tophat
fast splice junction mapper for RNA-Seq reads
|
| Versions of package tophat |
| Release | Version | Architectures |
| jessie | 2.0.8-1 | amd64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x |
| sid | 2.0.8-1 | i386 |
| sid | 2.0.8b-1 | amd64,armhf,hurd-i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x |
|
License: DFSG free
|
|
TopHat aligns RNA-Seq reads to mammalian-sized genomes using the ultra
high-throughput short read aligner Bowtie, and then analyzes the
mapping results to identify splice junctions between exons.
TopHat is a collaborative effort between the University of Maryland
Center for Bioinformatics and Computational Biology and the
University of California, Berkeley Departments of Mathematics and
Molecular and Cell Biology.
The package is enhanced by the following packages:
cufflinks
|
|
|
Uc-echo
error correction algorithm designed for short-reads from NGS
|
| Versions of package uc-echo |
| Release | Version | Architectures |
| jessie | 1.12-1 | amd64,i386,kfreebsd-amd64,powerpc,s390,s390x,sparc |
| sid | 1.12-1 | amd64,i386,kfreebsd-amd64,powerpc,s390,s390x,sparc |
|
License: DFSG free
|
|
ECHO is an error correction algorithm designed for short-reads
from next-generation sequencing platforms such as Illumina's
Genome Analyzer II. The algorithm uses a Bayesian framework to
improve the quality of the reads in a given data set by employing
maximum a posteriori estimation.
|
|
|
Vcftools
Collection of tools to work with VCF files
|
| Versions of package vcftools |
| Release | Version | Architectures |
| wheezy | 0.1.9-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 0.1.10+dfsg-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 0.1.10+dfsg-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| Debtags of package vcftools: |
| role | program |
|
License: DFSG free
|
|
VCFtools is a program package designed for working with VCF files, such as
those generated by the 1000 Genomes Project. The aim of VCFtools is to
provide methods for working with VCF files: validating, merging, comparing
and calculating some basic population genetic statistics.
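As an example of a basic population genetic statistic, the Python sketch below computes the non-reference allele frequency at one site from the genotype (GT) fields of a VCF data line. The function name and the sample line are invented for illustration, all alternate alleles are pooled together, and this is not VCFtools code.

```python
import re

def alt_allele_frequency(vcf_line: str) -> float:
    """Fraction of called alleles that are non-reference at one VCF site."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns follow it.
    gt_index = fields[8].split(":").index("GT")
    alt = total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        for allele in re.split(r"[/|]", genotype):
            if allele == ".":
                continue  # missing call
            total += 1
            if allele != "0":
                alt += 1
    if total == 0:
        raise ValueError("no called genotypes at this site")
    return alt / total

# Hypothetical site with four samples, one of them uncalled
line = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1|1\t0/0\t./."
frequency = alt_allele_frequency(line)
```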
Please cite:
Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean and Richard Durbin:
The variant call format and VCFtools.
(PubMed,eprint)
Bioinformatics
27(15):2156-8
(2011)
|
|
|
Velvet
Nucleic acid sequence assembler for very short reads
|
| Versions of package velvet |
| Release | Version | Architectures |
| squeeze | 1.0.02~nozlibcopy-1 | amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc |
| wheezy | 1.2.03~nozlibcopy-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| jessie | 1.2.03~nozlibcopy-1 | amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| sid | 1.2.03~nozlibcopy-1 | amd64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc |
| upstream | 1.2.10 |
| Debtags of package velvet: |
| biology | nuceleic-acids |
| field | biology:bioinformatics, biology |
| interface | commandline |
| role | program |
| use | analysing |
|
License: DFSG free
|
|
Velvet is a de novo genomic assembler specially designed for short read
sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and
Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near
Cambridge, in the United Kingdom.
Velvet currently takes in short read sequences, removes errors then produces
high quality unique contigs. It then uses paired read information, if
available, to retrieve the repeated areas between contigs.
|
|
Debian packages in contrib or non-free
|
Cufflinks
Transcript assembly, differential expression and regulation for RNA-Seq
|
| Versions of package cufflinks |
| Release | Version | Architectures |
| wheezy | 1.3.0-2 (non-free) | amd64 |
| jessie | 2.1.1-2 (non-free) | amd64 |
| sid | 2.1.1-2 (non-free) | amd64 |
| Debtags of package cufflinks: |
| field | biology |
| interface | commandline |
| role | program |
| scope | utility |
| use | analysing |
| works-with | biological-sequence |
|
License: non-free
|
|
Cufflinks assembles transcripts, estimates their abundances, and tests for
differential expression and regulation in RNA-Seq samples. It accepts aligned
RNA-Seq reads and assembles the alignments into a parsimonious set of
transcripts. Cufflinks then estimates the relative abundances of these
transcripts based on how many reads support each one.
|
|
Packaging has started and developers might try the packaging code in VCS
Kissplice
Detection of various kinds of polymorphisms in RNA-seq data.
|
License: CeCILL
Debian package not available
Version: 1.8.1-1
|
|
KisSplice is a piece of software that enables the analysis of RNA-seq data
with or without a reference genome. It is an exact local transcriptome
assembler that allows one to identify SNPs, indels and alternative splicing
events. It can deal with an arbitrary number of biological conditions, and
will quantify each variant in each condition.
It has been tested on Illumina datasets of up to 1G reads.
Its memory consumption is around 5 GB for 100M reads.
|
|
Mosaik-aligner
reference-guided aligner for next-generation sequencing
|
License: MIT
Debian package not available
Version: 1.1.0021-1
|
|
MosaikBuild converts various sequence formats into Mosaik’s native read
format. MosaikAligner pairwise aligns each read to a specified series of
reference sequences. MosaikSort resolves paired-end reads and sorts the
alignments by the reference sequence coordinates. Finally, MosaikText
converts alignments to different text-based formats.
At this time, the workflow consists of supplying sequences in FASTA,
FASTQ, Illumina Bustard & Gerald, or SRF file formats and producing
results in the BLAT axt, the BAM/SAM, the UCSC Genome Browser bed, or
the Illumina ELAND formats.
|
No known packages available
|
Annovar
annotate genetic variants detected from diverse genomes
|
License: Open Source for non-profit
Debian package not available
|
|
ANNOVAR is an efficient software tool to utilize up-to-date information
to functionally annotate genetic variants detected from diverse genomes
(including human genome hg18, hg19, as well as mouse, worm, fly, yeast and
many others). Given a list of variants with chromosome, start position, end
position, reference nucleotide and observed nucleotides, ANNOVAR can perform:
1. Gene-based annotation: identify whether SNPs or CNVs cause protein coding
changes and the amino acids that are affected. Users can flexibly use RefSeq
genes, UCSC genes, ENSEMBL genes, GENCODE genes, or many other gene definition
systems.
2. Region-based annotations: identify variants in specific genomic regions,
for example, conserved regions among 44 species, predicted transcription
factor binding sites, segmental duplication regions, GWAS hits, database
of genomic variants, DNAse I hypersensitivity sites, ENCODE
H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks, or many
other annotations on genomic intervals.
3. Filter-based annotation: identify variants that are reported in dbSNP,
or identify the subset of common SNPs (MAF>1%) in the 1000 Genome Project,
or identify subset of non-synonymous SNPs with SIFT score>0.05, or many
other annotations on specific mutations.
4. Other functionalities: Retrieve the nucleotide sequence in any
user-specific genomic positions in batch, identify a candidate gene list
for Mendelian diseases from exome data, identify a list of SNPs from
1000 Genomes that are in strong LD with a GWAS hit, and many other
creative utilities.
On a modern desktop computer (3GHz Intel Xeon CPU, 8 GB memory), for
4.7 million variants, ANNOVAR requires ~4 minutes to perform
gene-based functional annotation, or ~15 minutes to perform a stepwise
"variants reduction" procedure, making it practical to handle hundreds
of human genomes in a day.
|
|
Forge
genome assembler for mixed read types
|
License: Apache 2.0
Debian package not available
|
|
Forge Genome Assembler is a parallel, MPI based genome assembler for
mixed read types.
Forge is a classic "Overlap layout consensus" genome assembler written
by Darren Platt and Dirk Evers. Implemented in C++ and using the
parallel MPI library, it runs on one or more machines in a network and
can scale to very large numbers of reads provided there is enough
collective memory on the machines used. It generates a full consensus
alignment of all reads and can handle mixtures of Sanger, 454 and Illumina
reads. There is some support for SOLiD color space, and it includes built-in
tools for vector trimming and contamination screening.
Forge was originally developed at Exelixis, which has kindly agreed to
place the software, which underwent much subsequent development outside
Exelixis, into the public domain. Forge works with most of the
common MPI implementations.
|
|