Last data update: 2014.03.03

GFFFile-class

R Documentation

GFFFile objects

Description

These functions support the import and export of the GFF format, of which there are three versions and several flavors.

Usage

## S4 method for signature 'GFFFile,ANY,ANY'
import(con, format, text,
           version = c("", "1", "2", "3"),
           genome = NA, colnames = NULL, which = NULL,
           feature.type = NULL, sequenceRegionsAsSeqinfo = FALSE)
import.gff(con, ...)
import.gff1(con, ...)
import.gff2(con, ...)
import.gff3(con, ...)

## S4 method for signature 'ANY,GFFFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,GFFFile,ANY'
export(object, con, format,
                   version = c("1", "2", "3"),
                   source = "rtracklayer", append = FALSE, index = FALSE)
## S4 method for signature 'GenomicRangesList,GFFFile,ANY'
export(object, con, format, ...)
export.gff(object, con, ...)
export.gff1(object, con, ...)
export.gff2(object, con, ...)
export.gff3(object, con, ...)

Arguments

`con`	A path, URL, connection or `GFFFile` object. For the functions ending in `.gff`, `.gff1`, etc., the file format is indicated by the function name. For the base `export` and `import` functions, the format must be indicated another way. If `con` is a path, URL or connection, either the file extension or the `format` argument needs to be one of “gff”, “gff1”, “gff2”, “gff3”, “gvf”, or “gtf”. Compressed files (“gz”, “bz2” and “xz”) are handled transparently.
`object`	The object to export; it should be a `GRanges` or something coercible to a `GRanges`. If the object has a method for `asGFF`, it is called prior to coercion. This makes it possible to export a `GRangesList` or `TxDb` in a way that preserves the hierarchical structure. For exporting multiple tracks, in the UCSC track line metaformat, pass a `GenomicRangesList`, or something coercible to one.
`format`	If not missing, should be one of “gff”, “gff1”, “gff2”, “gff3”, “gvf”, or “gtf”.
`version`	If the format is given as “gff”, i.e., it does not specify a version, then this should indicate the GFF version as one of “”, “1”, “2” or “3”. The empty string is valid for import only: the version is then taken from the `gff-version` directive in the file, or assumed to be “1” if there is no such directive.
`text`	If `con` is missing, a character vector to use as the input.
`genome`	The identifier of a genome, or `NA` if unknown. Typically, this is a UCSC identifier like “hg19”. An attempt will be made to derive the `seqinfo` on the return value using either an installed BSgenome package or UCSC, if network access is available.
`colnames`	A character vector naming the columns to parse. These should name either fixed fields, like `source` or `type`, or, for GFF2 and GFF3, any attribute.
`which`	A `GRanges` or other range-based object supported by `findOverlaps`. Only the intervals in the file overlapping the given ranges are returned. This is much more efficient when the file is indexed with the tabix utility.
`feature.type`	`NULL` (the default) or a character vector of valid feature types. If not `NULL`, then only the features of the specified type(s) are imported.
`sequenceRegionsAsSeqinfo`	If `TRUE`, attempt to infer the `Seqinfo` (`seqlevels` and `seqlengths`) from the “##sequence-region” directives as specified by GFF3.
`source`	The value for the source column in GFF. This is typically the name of the package or algorithm that generated the feature.
`index`	If `TRUE`, automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports a single track in a file.
`append`	If `TRUE`, and `con` points to a file path, the data is appended to the file. Obviously, if `con` is a connection, the data is always appended.
`...`	Arguments to pass down to other methods. For import, the flow eventually reaches the `GFFFile` method on `import`. When `trackLine` is `TRUE` or the target format is BED15, the arguments are passed through `export.ucsc`, so track line parameters are supported.
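
As an illustration of how the filtering arguments combine, the following sketch imports only part of the test file shipped with rtracklayer. The variable names `exons`, `region` and `in_region` are chosen here for illustration and are not part of the package.

```r
library(rtracklayer)

# Path to the GFF3 test file that ships with rtracklayer.
test_gff3 <- system.file("tests", "genes.gff3", package = "rtracklayer")

# Keep only exon records and parse just the 'type' and 'Parent' columns.
exons <- import(test_gff3, feature.type = "exon",
                colnames = c("type", "Parent"))

# Restrict the import to features overlapping a region of interest.
region <- GRanges("chr10:90000-93000")
in_region <- import(test_gff3, which = region)
```

Both filters reduce what is returned; only `which` can additionally exploit a tabix index when one is available.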

Details

The Generic Feature Format (GFF) is a tab-separated table of intervals. There are three different versions of GFF, and they all have the same number of columns. In GFF1, the last column is a grouping factor, whereas in the later versions the last column holds application-specific attributes, with some conventions defined for those commonly used. This attribute support facilitates specifying extensions to the format. These include GTF (Gene Transfer Format, an extension of GFF2) and GVF (Genome Variation Format, an extension of GFF3). The rtracklayer package recognizes the “gtf” and “gvf” extensions and parses the extra attributes into columns of the result; however, it does not perform any extension-specific processing. Both GFF1 and GFF2 have been proclaimed obsolete; however, the UCSC Genome Browser only supports GFF1 (and GTF), and GFF2 is still in broad use.
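
To make the column layout concrete, here is a minimal base R sketch that splits one GFF3 data line by hand. The line itself is made up for illustration, and this is not how rtracklayer parses files internally; it only shows the nine tab-separated columns and the tag=value structure of the last one.

```r
# One GFF3 data line: nine tab-separated columns (values are made up).
gff3_line <- paste("chr1", "demo", "exon", "100", "200", ".", "+", ".",
                   "ID=exon1;Parent=tx1", sep = "\t")

# Column names as defined by the GFF3 specification.
gff_columns <- c("seqid", "source", "type", "start", "end",
                 "score", "strand", "phase", "attributes")

fields <- strsplit(gff3_line, "\t", fixed = TRUE)[[1]]
names(fields) <- gff_columns

# In GFF3 the ninth column holds semicolon-separated tag=value pairs.
pairs <- strsplit(fields[["attributes"]], ";", fixed = TRUE)[[1]]
attrs <- sub("^[^=]*=", "", pairs)
names(attrs) <- sub("=.*$", "", pairs)

fields
attrs
```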

GFF is distinguished from the simpler BED format by its flexible attribute support and its hierarchical structure, as specified by the group column in GFF1 (only one level of grouping) and the Parent attribute in GFF3. GFF2 does not specify a convention for representing hierarchies, although its GTF extension provides this for gene structures. The combination of support for hierarchical data and arbitrary descriptive attributes makes GFF(3) the preferred format for representing gene models.

Although GFF features a score column, large quantitative data belong in a format like BigWig and alignments from high-throughput experiments belong in BAM. For variants, the VCF format (supported by the VariantAnnotation package) seems to be more widely adopted than the GVF extension.

A note on the UCSC track line metaformat: track lines are a means for passing hints to visualization tools like the UCSC Genome Browser and the Integrated Genome Browser (IGB), and they allow multiple tracks to be concatenated in the same file. Since GFF is not a UCSC format, it is not common to annotate GFF data with track lines, but rtracklayer still supports it. To export or import GFF data in the track line format, call export.ucsc or import.ucsc.

The following is the mapping of GFF elements to a GRanges object. NA values are allowed only where indicated. These appear as a “.” in the file. GFF requires that all columns are included, so export generates defaults for missing columns.

seqid, start, end: the ranges component.
source: character vector in the source column; defaults to “rtracklayer” on export.
type: character vector in the type column; defaults to “sequence_feature” in the output, i.e., SO:0000110.
score: numeric vector (NA's allowed) in the score column, accessible via the score accessor; defaults to NA upon export.
strand: strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA upon export.
phase: integer vector, either 0, 1 or 2 (NA's allowed); defaults to NA upon export.
group: a factor (GFF1 only); defaults to the seqid (e.g., chromosome) on export.
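
The defaults listed above can be observed with a small round trip. The sketch below exports a `GRanges` that has no metadata columns and reads it back; the coordinates and the names `gr`, `out` and `back` are assumptions made for this example.

```r
library(rtracklayer)

# Two ranges with no metadata columns at all (coordinates are made up).
gr <- GRanges("chr1", IRanges(start = c(100, 500), width = 50),
              strand = c("+", "-"))

# The ".gff3" extension selects the GFF3 format on export.
out <- tempfile(fileext = ".gff3")
export(gr, out)

# Inspect the raw file: header directives plus one line per range.
writeLines(readLines(out))

# Round trip: the defaulted columns come back as metadata columns.
back <- import(out)
mcols(back)[, c("source", "type")]
```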

In GFF versions 2 and 3, attributes map to arbitrary columns in the result. In GFF3, some attributes (Parent, Alias, Note, DBxref and Ontology_term) can have multiple, comma-separated values; these columns are thus always CharacterList objects.
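
A short sketch of the multi-valued case: the GFF3 lines below are invented for illustration and written to a temporary file, so that the exon's two parents can be inspected after import.

```r
library(rtracklayer)

# A tiny, made-up GFF3 file: one exon shared by two transcripts.
gff3_lines <- c(
  "##gff-version 3",
  paste("chr1", "demo", "mRNA", "100", "900", ".", "+", ".",
        "ID=t1", sep = "\t"),
  paste("chr1", "demo", "mRNA", "100", "700", ".", "+", ".",
        "ID=t2", sep = "\t"),
  paste("chr1", "demo", "exon", "100", "200", ".", "+", ".",
        "ID=e1;Parent=t1,t2", sep = "\t")
)
tmp <- tempfile(fileext = ".gff3")
writeLines(gff3_lines, tmp)

gr <- import(tmp)

# 'Parent' is list-like: each element holds zero or more parent IDs.
gr$Parent
gr$Parent[[3]]
```

Because the column is list-like, a feature with several parents keeps them as separate values rather than as one comma-joined string.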

Value

A GRanges with the metadata columns described in the details.

GFFFile objects

The GFFFile class extends RTLFile and is a formal representation of a resource in the GFF format. To cast a path, URL or connection to a GFFFile, pass it to the GFFFile constructor. The GFF1File, GFF2File, GFF3File, GVFFile and GTFFile classes all extend GFFFile and indicate a particular version of the format.

GFFFile has the following utility method:

genome: Gets the genome identifier from the “genome-build” header directive.

Author(s)

Michael Lawrence

References

GFF1, GFF2: http://www.sanger.ac.uk/resources/software/gff/spec.html
GFF3: http://www.sequenceontology.org/gff3.shtml
GVF: http://www.sequenceontology.org/resources/gvf.html
GTF: http://mblab.wustl.edu/GTF22.html

Examples

  test_path <- system.file("tests", package = "rtracklayer")
  test_gff3 <- file.path(test_path, "genes.gff3")

  ## basic import
  test <- import(test_gff3)
  test

  ## import.gff functions
  import.gff(test_gff3)
  import.gff3(test_gff3)

  ## GFFFile derivatives
  test_gff_file <- GFF3File(test_gff3)
  import(test_gff_file)
  test_gff_file <- GFFFile(test_gff3)
  import(test_gff_file)
  test_gff_file <- GFFFile(test_gff3, version = "3")
  import(test_gff_file)

  ## from connection
  test_gff_con <- file(test_gff3)
  test <- import(test_gff_con, format = "gff")
  close(test_gff_con)

  ## various arguments
  import(test_gff3, genome = "hg19")
  import(test_gff3, colnames = character())
  import(test_gff3, colnames = c("type", "geneName"))

  ## 'which'
  which <- GRanges("chr10:90000-93000")
  import(test_gff3, which = which)

## Not run: 
  ## 'append'
  test_gff3_out <- file.path(tempdir(), "genes.gff3")

  export(test[seqnames(test) == "chr10"], test_gff3_out)
  export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE)
  import(test_gff3_out)
  
  ## 'index'
  export(test, test_gff3_out, index = TRUE)
  test_bed_gz <- paste(test_gff3_out, ".gz", sep = "")
  import(test_bed_gz, which = which)

## End(Not run)

Results


> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> ### Name: GFFFile-class
> ### Title: GFFFile objects
> ### Aliases: class:GFFFile class:GFF1File class:GFF2File class:GFF3File
> ###   class:GVFFile class:GTFFile GFFFile-class GFF1File-class
> ###   GFF2File-class GFF3File-class GVFFile-class GTFFile-class GFFFile
> ###   GFF1File GFF2File GFF3File GVFFile GTFFile
> ###   import,GFFFile,ANY,ANY-method import.gff import.gff1 import.gff2
> ###   import.gff3 import.gff,ANY-method import.gff1,ANY-method
> ###   import.gff2,ANY-method import.gff3,ANY-method
> ###   export,ANY,GFFFile,ANY-method export,GenomicRanges,GFFFile,ANY-method
> ###   export,GenomicRangesList,GFFFile,ANY-method
> ###   export,GRangesList,GFFFile,ANY-method export.gff
> ###   export.gff,ANY-method export.gff1 export.gff1,ANY-method export.gff2
> ###   export.gff2,ANY-method export.gff3 export.gff3,ANY-method
> ###   genome,GFFFile-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
>   test_path <- system.file("tests", package = "rtracklayer")
>   test_gff3 <- file.path(test_path, "genes.gff3")
> 
>   ## basic import
>   test <- import(test_gff3)
>   test
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
>   ## import.gff functions
>   import.gff(test_gff3)
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
>   import.gff3(test_gff3)
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
>   ## GFFFile derivatives
>   test_gff_file <- GFF3File(test_gff3)
>   import(test_gff_file)
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
>   test_gff_file <- GFFFile(test_gff3)
>   import(test_gff_file)
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
>   test_gff_file <- GFFFile(test_gff3, version = "3")
>   import(test_gff_file)
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
>   ## from connection
>   test_gff_con <- file(test_gff3)
>   test <- import(test_gff_con, format = "gff")
Warning in readGFF(filepath, version = version, filter = filter) :
  connection is not positioned at the start of the file, rewinding it
>   close(test_gff_con)
> 
>   ## various arguments
>   import(test_gff3, genome = "hg19")
GRanges object with 31 ranges and 10 metadata columns:
       seqnames         ranges strand |      source     type     score
          <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric>
   [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5
   [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>
   [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>
   [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>
   [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>
   ...      ...            ...    ... .         ...      ...       ...
  [27]    chr12 [89675, 89827]      + | rtracklayer      CDS      <NA>
  [28]    chr12 [90587, 90655]      + | rtracklayer     exon      <NA>
  [29]    chr12 [90587, 90655]      + | rtracklayer      CDS      <NA>
  [30]    chr12 [90796, 91263]      + | rtracklayer     exon      <NA>
  [31]    chr12 [90796, 91263]      * | rtracklayer      CDS      <NA>
           phase            ID        Name        geneName           Alias
       <integer>   <character> <character>     <character> <CharacterList>
   [1]      <NA> GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8
   [2]      <NA>           873       TUBB8            <NA>                
   [3]      <NA>           872       TUBB8            <NA>                
   [4]      <NA>          <NA>        <NA>            <NA>                
   [5]      <NA>          <NA>        <NA>            <NA>                
   ...       ...           ...         ...             ...             ...
  [27]      <NA>          <NA>        <NA>            <NA>                
  [28]      <NA>          <NA>        <NA>            <NA>                
  [29]      <NA>          <NA>        <NA>            <NA>                
  [30]      <NA>          <NA>        <NA>            <NA>                
  [31]      <NA>          <NA>        <NA>            <NA>                
            genome          Parent
       <character> <CharacterList>
   [1]        hg19                
   [2]        <NA>   GeneID:347688
   [3]        <NA>   GeneID:347688
   [4]        <NA>         872,873
   [5]        <NA>         872,873
   ...         ...             ...
  [27]        <NA>            4644
  [28]        <NA>            4644
  [29]        <NA>            4644
  [30]        <NA>            4644
  [31]        <NA>            4644
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
>   import(test_gff3, colnames = character())
GRanges object with 31 ranges and 0 metadata columns:
       seqnames         ranges strand
          <Rle>      <IRanges>  <Rle>
   [1]    chr10 [92828, 95504]      -
   [2]    chr10 [92828, 95178]      -
   [3]    chr10 [92828, 95504]      -
   [4]    chr10 [92828, 94054]      -
   [5]    chr10 [92997, 94054]      -
   ...      ...            ...    ...
  [27]    chr12 [89675, 89827]      +
  [28]    chr12 [90587, 90655]      +
  [29]    chr12 [90587, 90655]      +
  [30]    chr12 [90796, 91263]      +
  [31]    chr12 [90796, 91263]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
>   import(test_gff3, colnames = c("type", "geneName"))
GRanges object with 31 ranges and 2 metadata columns:
       seqnames         ranges strand |     type        geneName
          <Rle>      <IRanges>  <Rle> | <factor>     <character>
   [1]    chr10 [92828, 95504]      - |     gene tubulin, beta 8
   [2]    chr10 [92828, 95178]      - |     mRNA            <NA>
   [3]    chr10 [92828, 95504]      - |     mRNA            <NA>
   [4]    chr10 [92828, 94054]      - |     exon            <NA>
   [5]    chr10 [92997, 94054]      - |      CDS            <NA>
   ...      ...            ...    ... .      ...             ...
  [27]    chr12 [89675, 89827]      + |      CDS            <NA>
  [28]    chr12 [90587, 90655]      + |     exon            <NA>
  [29]    chr12 [90587, 90655]      + |      CDS            <NA>
  [30]    chr12 [90796, 91263]      + |     exon            <NA>
  [31]    chr12 [90796, 91263]      * |      CDS            <NA>
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
>   ## 'which'
>   which <- GRanges("chr10:90000-93000")
>   import(test_gff3, which = which)
GRanges object with 5 ranges and 10 metadata columns:
      seqnames         ranges strand |      source     type     score     phase
         <Rle>      <IRanges>  <Rle> |    <factor> <factor> <numeric> <integer>
  [1]    chr10 [92828, 95504]      - | rtracklayer     gene         5      <NA>
  [2]    chr10 [92828, 95178]      - | rtracklayer     mRNA      <NA>      <NA>
  [3]    chr10 [92828, 95504]      - | rtracklayer     mRNA      <NA>      <NA>
  [4]    chr10 [92828, 94054]      - | rtracklayer     exon      <NA>      <NA>
  [5]    chr10 [92997, 94054]      - | rtracklayer      CDS      <NA>      <NA>
                 ID        Name        geneName           Alias      genome
        <character> <character>     <character> <CharacterList> <character>
  [1] GeneID:347688       TUBB8 tubulin, beta 8  FLJ40100,TUBB8        hg19
  [2]           873       TUBB8            <NA>                        <NA>
  [3]           872       TUBB8            <NA>                        <NA>
  [4]          <NA>        <NA>            <NA>                        <NA>
  [5]          <NA>        <NA>            <NA>                        <NA>
               Parent
      <CharacterList>
  [1]                
  [2]   GeneID:347688
  [3]   GeneID:347688
  [4]         872,873
  [5]         872,873
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> 
> ## Not run: 
> ##D   ## 'append'
> ##D   test_gff3_out <- file.path(tempdir(), "genes.gff3")
> ##D 
> ##D   export(test[seqnames(test) == "chr10"], test_gff3_out)
> ##D   export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE)
> ##D   import(test_gff3_out)
> ##D   
> ##D   ## 'index'
> ##D   export(test, test_gff3_out, index = TRUE)
> ##D   test_bed_gz <- paste(test_gff3_out, ".gz", sep = "")
> ##D   import(test_bed_gz, which = which)
> ## End(Not run)
> 