R: Find closest refseq gene


This function finds the RefSeq transcript(s) nearest to a specified point in the genome. Note that it is limited to the RefSeq transcripts in the refflat table for the given genome, which is where this function gets its information.


findClosestGene(chrom, pos, genome = "hg17", position = "txStart")



chrom: Usually specified like 'chr1', 'chr2', etc.


pos: A position in base pairs in the genome.


genome: Something like 'hg16', 'hg17', 'mm6', etc.


position: The location to measure distance from: one of 'txStart', 'txEnd', 'cdsStart', 'cdsEnd'.


The first time the function is run, it checks whether the refflat table for the given genome is present in the package environment. If not, it downloads the table to the /tmp directory and gunzips it (using getRefflat). The table is then stored so that future calls do not need to download it again.
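
The check-then-store behaviour described above can be illustrated with a small sketch. This is not the package's actual code: the environment `.refflat_cache`, the function `get_cached_table`, and the `loader` argument (a stand-in for the download-and-gunzip step) are all hypothetical names introduced for illustration.

```r
# Hypothetical cache: one environment holding a table per genome name.
.refflat_cache <- new.env()

# `loader` stands in for the download-and-gunzip step; it is only called
# when no table for `genome` has been stored yet.
get_cached_table <- function(genome, loader) {
  if (!exists(genome, envir = .refflat_cache, inherits = FALSE)) {
    assign(genome, loader(genome), envir = .refflat_cache)
  }
  get(genome, envir = .refflat_cache, inherits = FALSE)
}

# Toy loader that counts its own calls instead of touching the network.
calls <- 0
toy_loader <- function(genome) {
  calls <<- calls + 1
  data.frame(geneName = "AGL", txStart = 100027660)
}

t1 <- get_cached_table("hg17", toy_loader)  # first call: loader runs
t2 <- get_cached_table("hg17", toy_loader)  # second call: served from cache
```

After both calls the loader has run only once, which is the point of storing the table.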


A data frame with the gene name, RefSeq id(s), txStart, txEnd, cdsStart, cdsEnd, exon count, and distance. Note that distance is measured as position - pos, so negative values mean that the point in the gene is to the left of the point specified in the function call (with the p-telomere on the left). In the example output, txStart 100027660 and pos 100000000 give a distance of 27660.
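
As a rough illustration of the signed distance and the nearest-transcript selection, here is a minimal sketch on a toy table. It is an assumption about the logic, not the function's real implementation: `closest_by` is a hypothetical helper, the "GENE_X" row is made up, and only the AGL coordinates come from the example output. The sign convention (gene coordinate minus `pos`) is chosen to match that output, where txStart 100027660 and pos 100000000 give a Distance of 27660.

```r
# Toy table: the AGL row uses values from the example output;
# "GENE_X" is a made-up row added only for contrast.
refflat <- data.frame(
  geneName = c("AGL", "GENE_X"),
  chrom    = c("chr1", "chr1"),
  txStart  = c(100027660, 99900000),
  txEnd    = c(100101600, 99950000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: signed distance from `pos` to the chosen column,
# keeping every row that ties for the smallest absolute distance.
closest_by <- function(tab, chrom, pos, position = "txStart") {
  sub <- tab[tab$chrom == chrom, , drop = FALSE]
  sub$Distance <- sub[[position]] - pos
  sub[abs(sub$Distance) == min(abs(sub$Distance)), , drop = FALSE]
}

hit <- closest_by(refflat, "chr1", 100000000)
```

Keeping all tied rows mirrors the fact that more than one transcript can share the same start site.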


The function may return more than one transcript, as several transcripts may have the same start site.


Sean Davis




> findClosestGene('chr1',100000000,'hg17')
trying URL ''
Content type 'application/x-gzip' length 3423284 bytes (3.3 MB)
downloaded 3.3 MB

     geneName      name chrom strand   txStart     txEnd  cdsStart    cdsEnd
2388      AGL NM_000642  chr1      + 100027660 100101600 100028619 100099228
     exonCount Distance
2388        34    27660
