Last data update: 2014.03.03

get.test

R Documentation

Randomly Divide Data into Training and Test Sets

Description

Uses random selection to split a data set into training and test data sets.

Usage

get.test(proportion.test, qdatafn = NULL, seed = NULL, folder=NULL, 
qdata.trainfn = paste(strsplit(qdatafn, split = ".csv")[[1]], "_train.csv", sep = ""), 
qdata.testfn = paste(strsplit(qdatafn, split = ".csv")[[1]], "_test.csv", sep = ""))

Arguments

`proportion.test`	Number. The proportion of the training data that will be randomly extracted for use as a test set. Value between 0 and 1.
`qdatafn`	String. The name (basename or full path) of the data file to be split into training and test data. The data should include both response and predictor variables. The file must be a comma-delimited file (`*.csv`) with column headings, and the predictor names in the file must match the raster layer files if predictions will be applied (`predict = TRUE`). If `NULL` (the default), a GUI prompts the user to browse to the data file.
`seed`	Integer. The number used to initialize the random number generator that selects rows for the test data set. To reproduce the same split (and therefore the same model) later, use the same seed. If `seed = NULL` (the default), a new seed is generated each time.
`folder`	String. The folder used for all output from predictions and/or maps. Do not add a trailing slash to the path string. If `folder = NULL` (the default), a GUI prompts the user to browse to a folder. To use the working directory, specify `folder = getwd()`.
`qdata.trainfn`	String. The name of the output file for the training data. By default, `_train` is appended to the name of `qdatafn` (before the `.csv` extension).
`qdata.testfn`	String. The name of the output file for the test data. By default, `_test` is appended to the name of `qdatafn` (before the `.csv` extension).
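
The default output names shown under Usage are built by stripping `.csv` from `qdatafn` and appending a suffix. The snippet below replays those two expressions on a hypothetical file name (`mydata.csv` is an assumption made for illustration, not a file shipped with the package):

```r
# Hypothetical input file name, used only for illustration
qdatafn <- "mydata.csv"

# Same expressions as the argument defaults shown under Usage
qdata.trainfn <- paste(strsplit(qdatafn, split = ".csv")[[1]], "_train.csv", sep = "")
qdata.testfn  <- paste(strsplit(qdatafn, split = ".csv")[[1]], "_test.csv", sep = "")

print(qdata.trainfn)  # "mydata_train.csv"
print(qdata.testfn)   # "mydata_test.csv"
```

Because the defaults are derived from `qdatafn`, passing a full path as `qdatafn` yields default output names that carry the same path.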

Details

This function should be run once, before starting the analysis, to create the training and test sets. The step is unnecessary if the cross-validation option will be used with Random Forest (RF) or Stochastic Gradient Boosting (SGB) models, or if the out-of-bag (OOB) option will be used with RF models.

Value

Outputs a training data file and a test data file. Unless `qdata.trainfn` or `qdata.testfn` is specified, the output will be located in the same folder as the original data file (`qdatafn`). Each output file has the same columns as the original data, and together the two files contain the same rows.
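
As a rough illustration of what a seeded random split involves, the base R sketch below divides a small toy data frame into training and test rows. It is not the `get.test` implementation: the toy data, the use of `round()` to size the test set, and the variable names `test.rows` and `train.rows` are all assumptions made for this example.

```r
# Toy data frame standing in for the contents of qdatafn (hypothetical data)
qdata <- data.frame(id = 1:10, response = seq(0.5, 5, by = 0.5))

proportion.test <- 0.2
seed <- 42

# Fix the random number generator so the same rows are drawn on every run
set.seed(seed)

# Number of test rows; rounding is an assumption of this sketch
n.test <- round(nrow(qdata) * proportion.test)

# Draw the test rows without replacement; all remaining rows are training rows
test.rows  <- sample(seq_len(nrow(qdata)), size = n.test)
train.rows <- setdiff(seq_len(nrow(qdata)), test.rows)

qdata.test  <- qdata[test.rows, , drop = FALSE]
qdata.train <- qdata[train.rows, , drop = FALSE]

# Each original row ends up in exactly one of the two sets
nrow(qdata.test)   # 2
nrow(qdata.train)  # 8
```

Calling `set.seed()` with the same value before `sample()` selects the same rows again, which is the reproducibility that the `seed` argument is described as providing.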

Author(s)

Elizabeth Freeman

Examples


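library(ModelMap)

# Locate the example training data shipped with ModelMap
qdatafn <- system.file("extdata", "helpexamples", "DATATRAIN.csv", package = "ModelMap")

# Read the example data into a data frame
qdata <- read.table(file = qdatafn, sep = ",", header = TRUE, check.names = FALSE)

# Hold out 20% of the rows as a test set; the fixed seed makes the split reproducible
get.test(proportion.test = 0.2,
         qdatafn = qdatafn,
         seed = 42,
         folder = getwd(),
         qdata.trainfn = "example.train.csv",
         qdata.testfn = "example.test.csv")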

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ModelMap)
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Loading required package: raster
Loading required package: sp
Loading required package: rgdal
rgdal: version: 1.1-10, (SVN revision 622)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 1.11.3, released 2015/09/16
 Path to GDAL shared files: /usr/share/gdal/1.11
 Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.2-3 
> ### Name: get.test
> ### Title: Randomly Divide Data into Training and Test Sets
> ### Aliases: get.test
> ### Keywords: models
> 
> ### ** Examples
> 
> 
> qdatafn<-system.file("extdata", "helpexamples","DATATRAIN.csv", package = "ModelMap")
> 
> qdata<-read.table(file=qdatafn,sep=",",header=TRUE,check.names=FALSE)
> 
> get.test(	proportion.test=0.2,
+ 		qdatafn=qdatafn,
+ 		seed=42,
+ 		folder=getwd(),
+ 		qdata.trainfn="example.train.csv",
+ 		qdata.testfn="example.test.csv")