Last data update: 2014.03.03

readKallisto    R Documentation

Input kallisto or kallisto bootstrap results.

Description

readKallisto inputs several kallisto output files into a single SummarizedExperiment instance, with rows corresponding to transcripts, columns to samples, and assays holding the estimated abundances. readKallistoBootstrap inputs the kallisto bootstrap replicates of a single sample into a transcript x bootstrap matrix of abundance estimates.
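To make the row and column layout concrete, here is a minimal base-R sketch of combining per-sample abundance.tsv files into a transcript x sample matrix. It is not the package's implementation: read_abundance_matrix() and make_sample() are hypothetical helpers, the two samples are toy data written to tempdir(), and the columns (target_id, length, eff_length, est_counts, tpm) are assumed to follow kallisto's standard abundance.tsv layout. Columns are named after each sample's directory, which matches the colnames "kallisto" seen in Results.

```r
## Hypothetical helper: combine one column ('what') of several kallisto
## abundance.tsv files into a transcript x sample matrix.
read_abundance_matrix <- function(files, what = "est_counts") {
    ## read each sample's tab-delimited table
    tables <- lapply(files, read.delim, stringsAsFactors = FALSE)
    ## every sample must report the same transcripts in the same order
    ids <- tables[[1]]$target_id
    same_ids <- vapply(tables, function(t) identical(t$target_id, ids),
                       logical(1))
    stopifnot(all(same_ids))
    values <- vapply(tables, function(t) as.numeric(t[[what]]),
                     numeric(length(ids)))
    ## matrix() also covers the single-transcript case, where vapply
    ## returns a plain vector; columns are named after sample directories
    matrix(values, nrow = length(ids),
           dimnames = list(ids, basename(dirname(files))))
}

## Hypothetical helper: write a toy abundance.tsv into its own directory
make_sample <- function(name, counts, eff_length = c(850, 1850, 350)) {
    sample_dir <- file.path(tempdir(), name)
    dir.create(sample_dir, showWarnings = FALSE)
    rate <- counts / eff_length            # reads per effective base
    df <- data.frame(target_id = c("tx1", "tx2", "tx3"),
                     length = c(1000L, 2000L, 500L),
                     eff_length = eff_length,
                     est_counts = counts,
                     tpm = rate / sum(rate) * 1e6)
    path <- file.path(sample_dir, "abundance.tsv")
    write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
    path
}

files <- c(make_sample("sampleA", c(10, 0, 5)),
           make_sample("sampleB", c(3, 7, 0)))
counts <- read_abundance_matrix(files)
counts
```

The same helper can pull any other column, for example read_abundance_matrix(files, "tpm").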

Usage

readKallisto(files,
    json = file.path(dirname(files), "run_info.json"), 
    h5 = any(grepl("\\.h5$", files)), what = KALLISTO_ASSAYS,
    as = c("SummarizedExperiment", "list", "matrix"))

readKallistoBootstrap(file, i, j)
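The defaults in Usage are ordinary R expressions evaluated from files. The sketch below evaluates them for two hypothetical paths; nothing is read from disk. pick_as() is a hypothetical stand-in that assumes the usual match.arg() idiom for as, consistent with SummarizedExperiment being the documented default.

```r
## hypothetical paths; only string manipulation happens here
files <- c("/data/sampleA/abundance.h5", "/data/sampleB/abundance.h5")

## default 'json': one run_info.json beside each abundance file
json <- file.path(dirname(files), "run_info.json")

## default 'h5': TRUE when any path ends in '.h5'
## (inside an R string the regex escape needs a double backslash)
h5 <- any(grepl("\\.h5$", files))

## assumed idiom for 'as': the first choice is the default
pick_as <- function(as = c("SummarizedExperiment", "list", "matrix"))
    match.arg(as)

json            # "/data/sampleA/run_info.json" "/data/sampleB/run_info.json"
h5              # TRUE
pick_as()       # "SummarizedExperiment"
pick_as("mat")  # "matrix" (partial matching)
```

Because the h5 default uses any(), a mixture of .tsv and .h5 paths also yields TRUE.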

Arguments

files

character() paths to kallisto ‘abundance.tsv’ output files, or to ‘abundance.h5’ files when reading HDF5 input. Files are assumed to follow kallisto's output layout: each sample in its own directory, containing abundance.tsv, run_info.json, and possibly abundance.h5.

json

character() vector of the same length as files specifying the location of the JSON files produced by kallisto, which contain information on the run. The default assumes that each JSON file is in the same directory as the corresponding abundance file.

h5

logical(1) indicating whether files are HDF5 (abundance.h5) outputs produced by kallisto, which also contain the bootstrap estimates, rather than tab-delimited abundance.tsv files. The default is TRUE when any element of files ends in .h5; reading HDF5 input requires the rhdf5 package.

what

character() vector of kallisto per-sample outputs to be input. See KALLISTO_ASSAYS for available values.

as

character(1) specifying the output format. See Value for additional detail.

file

character(1) path to a single HDF5 output file.

i, j

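integer() vectors of row (i, transcript) and column (j, bootstrap) indices to input. Indices are 1-based positions, so j=1 selects the bootstrap that kallisto labels bs0. When omitted, all transcripts or bootstraps are input.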
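The Results output for i=c(ridx, rev(ridx)) suggests that i and j behave like ordinary R matrix subscripts: repeated and reordered rows are returned as given, and column positions are 1-based while kallisto's bootstrap labels start at bs0. A sketch of those semantics on a toy matrix of simulated values (no kallisto file is read, and xx here is only a stand-in for a readKallistoBootstrap() result):

```r
## toy transcript x bootstrap matrix with simulated values;
## kallisto names bootstraps from zero (bs0, bs1, ...)
set.seed(1)
n_boot <- 10
xx <- matrix(rpois(4 * n_boot, 100), nrow = 4,
             dimnames = list(paste0("tx", 1:4),
                             paste0("bs", seq_len(n_boot) - 1)))

i <- c(2, 4, 4, 2)    # rows may repeat or be reordered
j <- c(1:2, 9:10)     # 1-based positions: bs0, bs1, bs8, bs9
sub <- xx[i, j, drop = FALSE]
sub
```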

Value

A SummarizedExperiment, list, or matrix, depending on the value of argument as; by default a SummarizedExperiment. With as="SummarizedExperiment", rowData(se) contains the length of each transcript; colData(se) includes summary information on each sample, including the number of targets and bootstraps, the kallisto and index versions, the start time, and the command line used to create the file. assays() contains one or more transcript x sample matrices of parameters estimated by kallisto (see KALLISTO_ASSAYS).

With as="list", the return value contains information similar to the SummarizedExperiment, with row, column, and assay data as separate list elements; the row and column annotations are not coordinated with the assays in an integrated data container. as="matrix" returns the specified assay as a plain R matrix.
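Because as="list" does not coordinate annotations with assays, keeping them aligned is up to the caller; with a SummarizedExperiment, se[keep, ] subsets rowData and every assay together. A sketch of the manual bookkeeping, using a hypothetical single-sample list whose element names follow the str() output shown in Results (toy values, and only two of the six colData columns):

```r
## hypothetical list shaped like readKallisto(files, as="list")
res <- list(
    colData = data.frame(n_targets = 3L, n_bootstraps = 0L,
                         row.names = "sampleA"),
    rowData = data.frame(length = c(1000L, 2000L, 500L)),
    est_counts = matrix(c(10, 0, 5), ncol = 1,
                        dimnames = list(c("tx1", "tx2", "tx3"), "sampleA"))
)

## nothing in a plain list keeps the pieces aligned, so check first
stopifnot(nrow(res$rowData) == nrow(res$est_counts),
          nrow(res$colData) == ncol(res$est_counts))

## drop transcripts with zero counts, subsetting each piece separately
keep <- res$est_counts[, 1] > 0
counts_kept <- res$est_counts[keep, , drop = FALSE]
length_kept <- res$rowData$length[keep]
```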

Author(s)

Martin Morgan martin.morgan@roswellpark.org

References

http://pachterlab.github.io/kallisto: kallisto, software for quantifying transcript abundance.

Examples

outputs <- system.file(package="SummarizedExperiment", "extdata",
    "kallisto")
files <- dir(outputs, pattern="abundance\\.tsv$", full.names=TRUE, recursive=TRUE)
stopifnot(all(file.exists(files)))

## default: input 'est_counts'
(se <- readKallisto(files, as="SummarizedExperiment"))
str(readKallisto(files, as="list"))
str(readKallisto(files, as="matrix"))

## available assays
KALLISTO_ASSAYS
## one or more assays
readKallisto(files, what=c("tpm", "eff_length"))

## alternatively: read hdf5 files
files <- sub(".tsv", ".h5", files, fixed=TRUE)
readKallisto(files)

## input all bootstraps
xx <- readKallistoBootstrap(files[1])
ridx <- head(which(rowSums(xx) != 0), 3)
cidx <- c(1:5, 96:100)
xx[ridx, cidx]

## selective input of rows (transcripts) and/or bootstraps
readKallistoBootstrap(files[1], i=c(ridx, rev(ridx)), j=cidx)
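A bootstrap matrix is typically reduced to per-transcript summaries of estimation uncertainty. The sketch below is not part of the package API; it computes the bootstrap mean, standard deviation, and coefficient of variation from the values printed in Results for xx[ridx, cidx] (3 transcripts, 10 of the 100 bootstraps). With real data the same steps would apply to all columns of readKallistoBootstrap(files[1]).

```r
## values copied from the xx[ridx, cidx] output shown in Results
xx_sub <- rbind(
    uc001vvl.4 = c(2267.7185, 2339.5309, 2224.4502, 2222.3822, 2379.8936,
                   2245.8783, 2293.7795, 2178.0311, 2070.2899, 2275.7140),
    uc021rnj.1 = c(551.4016, 380.3495, 483.0673, 407.5056, 322.3347,
                   453.9748, 352.9838, 530.6826, 418.8782, 427.4371),
    uc001vxb.1 = c(269.3232, 243.3441, 251.5847, 250.6204, 237.7920,
                   236.1460, 249.9636, 247.1526, 271.6795, 249.3429))
colnames(xx_sub) <- paste0("bs", c(0:4, 95:99))

## per-transcript summaries across bootstrap replicates
boot_mean <- rowMeans(xx_sub)
boot_sd <- apply(xx_sub, 1, sd)
boot_cv <- boot_sd / boot_mean    # coefficient of variation
round(cbind(mean = boot_mean, sd = boot_sd, cv = boot_cv), 3)
```

Among these three, uc021rnj.1 has the largest coefficient of variation, so its abundance estimate is the least stable across the bootstraps shown.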

Results


> library(SummarizedExperiment)

> ### Name: readKallisto
> ### Title: Input kallisto or kallisto bootstrap results.
> ### Aliases: readKallisto readKallistoBootstrap KALLISTO_ASSAYS
> ### Keywords: file
> 
> ### ** Examples
> 
> outputs <- system.file(package="SummarizedExperiment", "extdata",
+     "kallisto")
> files <- dir(outputs, pattern="abundance.tsv", full=TRUE, recursive=TRUE)
> stopifnot(all(file.exists(files)))
> 
> ## default: input 'est_counts'
> (se <- readKallisto(files, as="SummarizedExperiment"))
Loading required namespace: jsonlite
class: SummarizedExperiment 
dim: 2858 1 
metadata(0):
assays(1): est_counts
rownames(2858): uc010tkp.2 uc001vuz.1 ... uc001ysz.3 uc001yta.1
rowData names(1): length
colnames(1): kallisto
colData names(6): n_targets n_bootstraps ... start_time call
> str(readKallisto(files, as="list"))
List of 3
 $ colData   :'data.frame':	1 obs. of  6 variables:
  ..$ n_targets       : int 2858
  ..$ n_bootstraps    : int 100
  ..$ kallisto_version: chr "0.42.2"
  ..$ index_version   : int 10
  ..$ start_time      : chr "Wed Jun 10 15:03:26 2015"
  ..$ call            : chr "../kallisto/build/src/kallisto quant -i hg19chr14transcripts.idx -o output -b 100 --single -l 150 /dev/fd/63"
 $ rowData   :'data.frame':	2858 obs. of  1 variable:
  ..$ length: int [1:2858] 981 31578 31578 31578 41 30 32 28 37 37 ...
 $ est_counts: num [1:2858, 1] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2858] "uc010tkp.2" "uc001vuz.1" "uc001vva.1" "uc010ahc.1" ...
  .. ..$ : chr "kallisto"
> str(readKallisto(files, as="matrix"))
 num [1:2858, 1] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2858] "uc010tkp.2" "uc001vuz.1" "uc001vva.1" "uc010ahc.1" ...
  ..$ : chr "kallisto"
> 
> ## available assays
> KALLISTO_ASSAYS
[1] "est_counts" "tpm"        "eff_length"
> ## one or more assays
> readKallisto(files, what=c("tpm", "eff_length"))
class: SummarizedExperiment 
dim: 2858 1 
metadata(0):
assays(2): tpm eff_length
rownames(2858): uc010tkp.2 uc001vuz.1 ... uc001ysz.3 uc001yta.1
rowData names(1): length
colnames(1): kallisto
colData names(6): n_targets n_bootstraps ... start_time call
> 
> ## alternatively: read hdf5 files
> files <- sub(".tsv", ".h5", files, fixed=TRUE)
> readKallisto(files)
Loading required namespace: rhdf5
class: SummarizedExperiment 
dim: 2858 1 
metadata(0):
assays(1): est_counts
rownames(2858): uc010tkp.2 uc001vuz.1 ... uc001ysz.3 uc001yta.1
rowData names(1): length
colnames(1): kallisto
colData names(6): n_targets n_bootstraps ... start_time call
> 
> ## input all bootstraps
> xx <- readKallistoBootstrap(files[1])
> ridx <- head(which(rowSums(xx) != 0), 3)
> cidx <- c(1:5, 96:100)
> xx[ridx, cidx]
                 bs0       bs1       bs2       bs3       bs4      bs95
uc001vvl.4 2267.7185 2339.5309 2224.4502 2222.3822 2379.8936 2245.8783
uc021rnj.1  551.4016  380.3495  483.0673  407.5056  322.3347  453.9748
uc001vxb.1  269.3232  243.3441  251.5847  250.6204  237.7920  236.1460
                bs96      bs97      bs98      bs99
uc001vvl.4 2293.7795 2178.0311 2070.2899 2275.7140
uc021rnj.1  352.9838  530.6826  418.8782  427.4371
uc001vxb.1  249.9636  247.1526  271.6795  249.3429
> 
> ## selective input of rows (transcripts) and/or bootstraps
> readKallistoBootstrap(files[1], i=c(ridx, rev(ridx)), j=cidx)
                 bs0       bs1       bs2       bs3       bs4      bs95
uc001vvl.4 2267.7185 2339.5309 2224.4502 2222.3822 2379.8936 2245.8783
uc021rnj.1  551.4016  380.3495  483.0673  407.5056  322.3347  453.9748
uc001vxb.1  269.3232  243.3441  251.5847  250.6204  237.7920  236.1460
uc001vxb.1  269.3232  243.3441  251.5847  250.6204  237.7920  236.1460
uc021rnj.1  551.4016  380.3495  483.0673  407.5056  322.3347  453.9748
uc001vvl.4 2267.7185 2339.5309 2224.4502 2222.3822 2379.8936 2245.8783
                bs96      bs97      bs98      bs99
uc001vvl.4 2293.7795 2178.0311 2070.2899 2275.7140
uc021rnj.1  352.9838  530.6826  418.8782  427.4371
uc001vxb.1  249.9636  247.1526  271.6795  249.3429
uc001vxb.1  249.9636  247.1526  271.6795  249.3429
uc021rnj.1  352.9838  530.6826  418.8782  427.4371
uc001vvl.4 2293.7795 2178.0311 2070.2899 2275.7140