Last data update: 2014.03.03

R: Finds Synteny in a Sequence Database
FindSynteny    R Documentation

Finds Synteny in a Sequence Database

Description

Finds syntenic blocks between groups of sequences in a database.

Usage

FindSynteny(dbFile,
            tblName = "Seqs",
            identifier = "",
            useFrames = FALSE,
            alphabet = c("MF", "ILV", "A", "C", "WYQHP", "G", "TSN", "RK", "DE"),
            geneticCode = GENETIC_CODE,
            sepCost = -0.01,
            gapCost = -0.2,
            shiftCost = -20,
            codingCost = -3,
            maxSep = 5000,
            maxGap = 5000,
            minScore = 200,
            dropScore = -100,
            maskRepeats = TRUE,
            storage = 0.5,
            processors = 1,
            verbose = TRUE)

Arguments

dbFile

A SQLite connection object or a character string specifying the path to the database file.

tblName

Character string specifying the table where the sequences are located.

identifier

Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected.

useFrames

Logical specifying whether to use 6-frame amino acid translations to help find more distant hits. If FALSE (the default), the search is faster but less sensitive to distant homology.

alphabet

Character vector of amino acid groupings used to reduce the 20 standard amino acids into smaller groups. Alphabet reduction helps to find more distant homologies between sequences. A non-reduced amino acid alphabet can be used by setting alphabet equal to AA_STANDARD.

geneticCode

Either a character vector giving the genetic code to use in translation, or a list containing one genetic code for each identifier. If a list is provided then it must be named by the corresponding identifiers in the database.

sepCost

Cost per nucleotide separation between hits to apply when chaining hits into blocks.

gapCost

Cost for gaps between hits to apply when chaining hits into blocks.

shiftCost

Cost for shifting between different reading frames when chaining reduced amino acid hits into blocks.

codingCost

Cost for switching between coding and non-coding hits when chaining hits into blocks.

maxSep

Maximal separation (in nucleotides) between hits in the same block.

maxGap

The maximum number of gaps between hits in the same block.

minScore

The minimum score required for a chain of hits to become a block.

dropScore

The change from maximal score required to stop extending blocks.

maskRepeats

Logical specifying whether to “soft” mask repeats when searching for hits.

storage

Excess gigabytes available to store objects so that they do not need to be recomputed in later steps. This should be a number between zero and a (modest) fraction of the available system memory. Note that more than storage gigabytes of memory may be required, but objects beyond this limit will not be stored for later reuse.

processors

The number of processors to use, or NULL to automatically detect and use all available processors.

verbose

Logical indicating whether to display progress.
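
The alphabet argument can be easier to picture with a small illustration. The following is a minimal base R sketch of what alphabet reduction means, using the default groupings from Usage; it is not DECIPHER's internal code, and the helper name reduce_peptide and the two example peptides are hypothetical.

```r
# Default reduced alphabet from the Usage section
alphabet <- c("MF", "ILV", "A", "C", "WYQHP", "G", "TSN", "RK", "DE")

# Build a lookup table: amino acid letter -> index of its group
letters_by_group <- strsplit(alphabet, "")
lookup <- setNames(rep(seq_along(alphabet), lengths(letters_by_group)),
                   unlist(letters_by_group))

# Hypothetical helper: convert a peptide string into group indices
reduce_peptide <- function(peptide) {
  unname(lookup[strsplit(peptide, "")[[1]]])
}

# Two different peptides (made-up examples) that become identical
# once each residue is replaced by its group
reduce_peptide("MKVLD")
reduce_peptide("FRILE")
identical(reduce_peptide("MKVLD"), reduce_peptide("FRILE"))
```

Under this reduction the two peptides share the same group sequence, which is why a k-mer match in the reduced alphabet can detect similarity that an exact amino acid match would miss.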

Details

Long nucleotide sequences, such as genomes, are often not collinear, or may be composed of many smaller segments (e.g., contigs). FindSynteny searches for “hits” between sequences that can be chained into collinear “blocks” of synteny. Hits are defined as k-mer exact nucleotide matches or k-mer matches in a reduced amino acid alphabet (if useFrames is TRUE). Hits are chained into blocks as long as they are: (1) within the same sequence, (2) within maxSep and maxGap distance, and (3) help maintain the score above minScore. Blocks are extended from their first and last hit until their score drops below dropScore from the maximum that was reached. This process results in a set of hits and blocks stored in an object of class “Synteny”.
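
The stopping rule for block extension can be sketched in a few lines of base R. This is one reading of the description above, assuming extension stops at the first step where the running score falls more than |dropScore| below the best score reached so far, and that the block is then trimmed back to that best position. The function name extend_until_drop and the per-step score changes are hypothetical and are not taken from DECIPHER.

```r
# Hypothetical helper: number of extension steps retained.
# 'deltas' are illustrative per-step score changes, not DECIPHER output.
extend_until_drop <- function(deltas, dropScore = -100) {
  running <- cumsum(deltas)                 # score after each step
  best <- cummax(c(0, running))[-1]         # best score so far (start = 0)
  stopped <- which(running - best < dropScore)
  # last step examined before the drop criterion was met
  last <- if (length(stopped) > 0) stopped[1] - 1L else length(deltas)
  kept <- c(0, running[seq_len(last)])      # position 0 = no extension
  which.max(kept) - 1L                      # trim back to the best position
}

deltas <- c(30, 40, -50, 60, -120, -70, 500)
extend_until_drop(deltas, dropScore = -100)  # stops before the late gain
extend_until_drop(deltas, dropScore = -200)  # tolerates the dip
```

With the stricter threshold the extension ends at the fourth step, so the large gain at the end is never reached; with the more permissive threshold the extension survives the dip and continues to the final step. This illustrates the trade-off a more negative dropScore makes between longer blocks and more time spent extending through poorly matching regions.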

Value

An object of class “Synteny”.

Author(s)

Erik Wright DECIPHER@cae.wisc.edu

See Also

AlignSynteny, Synteny-class

Examples

db <- system.file("extdata", "Influenza.sqlite", package="DECIPHER")
synteny <- FindSynteny(db, useFrames=TRUE, minScore=50)
synteny
pairs(synteny) # scatterplot matrix

Results


> library(DECIPHER)
> db <- system.file("extdata", "Influenza.sqlite", package="DECIPHER")
> synteny <- FindSynteny(db, useFrames=TRUE, minScore=50)
  |======================================================================| 100%

Time difference of 1.12 secs

> synteny
         H9N2     H5N1     H2N2     H7N9     H1N1
H9N2   8 seqs 67% hits 76% hits 65% hits 70% hits
H5N1 8 blocks   8 seqs 68% hits 65% hits 78% hits
H2N2 8 blocks 8 blocks   8 seqs 73% hits 68% hits
H7N9 8 blocks 8 blocks 8 blocks   8 seqs 71% hits
H1N1 8 blocks 8 blocks 8 blocks 8 blocks   8 seqs
> pairs(synteny) # scatterplot matrix