Last data update: 2014.03.03

R: Map Making
model.mapmakeR Documentation

Map Making

Description

Applies models to either ERDAS Imagine image (.img) files or ESRI Grids of predictors to create detailed prediction surfaces. It will handle large predictor files for map making, by reading in the .img files in rows, and output to the .img file the prediction for each data row, before reading the next row of data.

Usage

model.mapmake(model.obj= NULL, folder = NULL, MODELfn = NULL, 
rastLUTfn = NULL, na.action = NULL, na.value=-9999, keep.predictor.brick=FALSE, 
map.sd = FALSE, OUTPUTfn = NULL, quantiles=NULL, n.trees = NULL)

Arguments

model.obj

R model object. The model object to use for prediction. The model object must be of type "RF" (random forest), "QRF" (quantile random forest), "CF" (conditional forest), or "SGB" (stochastic gradient boosting).

folder

String. The folder used for all output from predictions and/or maps. Do not add ending slash to path string. If folder = NULL (default), a GUI interface prompts user to browse to a folder. To use the working directory, specify folder = getwd().

MODELfn

String. The file name to use to save the generated model object. If MODELfn = NULL (the default), a default name is generated by pasting model.type_response.type_response.name. If the other output filenames are left unspecified, MODELfn will be used as the basic name to generate other output filenames. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder.

rastLUTfn

String. The file name (full path or base name with path specified by folder) of a .csv file for a rastLUT. Alternatively, a dataframe containing the same information. The rastLUT must include 3 columns: (1) the full path and name of the raster file; (2) the shortname of each predictor / raster layer (band); (3) the layer (band) number. The shortname (column 2) must match the names predList, the predictor column names in training/test data set (qdata.trainfn and qdata.testfn, and the predictor names in model.obj.

Example of comma-delimited file:

C:/button_test/tc99_2727subset.img, tc99_2727subsetb1, 1
C:/button_test/tc99_2727subset.img, tc99_2727subsetb2, 2
C:/button_test/tc99_2727subset.img, tc99_2727subsetb3, 3
na.action

String. Model validation. Specifies the action to take if there are NA values in the prediction data or if there is a level or class of a categorical predictor variable in the validation test set or the production (mapping) data set, but not in the training data set. Currently, the only supported option for map making is na.action = "na.omit" (the default) where any data point or pixel with any new levels for any of the factored predictors is returned as NA (na.value, defaults to -9999).

na.value

Number. Value that indicates NA in the predictor rasters. Defaults to -9999. Note: all predictor rasters must use the same value for NA.

keep.predictor.brick

Logical. Map Production. If TRUE then the raster brick containing the predictors from the model object is saved as a native raster package format file. If FALSE a temporary brick is created and then deleted at the end of map production.

map.sd

Logical. Map Production. If map.sd = TRUE, maps of mean, standard deviation, and coefficient of variation of the predictions from all the trees are generated for each pixel. If map.sd = FALSE (the default), only the predicted probability map will be built.

This option is only available if the model.type = "RF" and the response.type = "continuous". Note: This option requires much more available memory.

The names of the additional maps default to:

folder/model.type_response.type_response.name_mean.img
folder/model.type_response.type_response.name_stdev.img
folder/model.type_response.type_response.name_coefv.img
OUTPUTfn

String. Map Production. Filename of output file for map production. The filename can be the full path, or it can be the simple basename, in which case the output will be to the folder specified by folder. If OUTPUTfn = NULL (the default), a name is created by pasting modelfn and "_map".

The raster package uses the filename extension to determine the output format of the map object. See the raster package function writeFormats for a list of valid write formats. Because of this, only use the "." in output filenames if it is indicating a valid file extension.

If the output filename does not include an extension, the default extension of ".img" will be added.

For continuous random forest models with map.sd = TRUE then OUTPUTfn is also used to create output file names for maps of the mean, standard deviation and coeficient of variation of all the trees predictions for each pixel.

quantiles

Numeric Vector. QRF models. The quantiles to predict. A numeric vector with values between zero and one. The quantile map output will be a multilayer raster with one layer for each quantile. If model.obj was created with the ModelMap package, there will also be a single layer outputraster giving the ordinary RF mean predicted values.

n.trees

Integer. SGB models. The number of stochastic gradient boosting trees for an SGB model. If n.trees=NULL (the default) the model creation code will increase the number of trees 100 at a time until OOB error rate stops improving. The gbm function gbm.perf() will be used to select from the total calculated trees, the best number of trees for model predictions, with argument method="OOB". The gbm package warns that OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive.

Details

model.mapmake() can be run in a traditional R command mode, where all arguments are specified in the function call. However it can also be used in a full push button mode, where you type in the simple command model.mapmake(), and GUI pop up windows will ask questions about the type of model, the file locations of the data, etc...

When running model.mapmake() on non-Windows platforms, file names and folders need to be specified in the argument list, but other pushbutton selections are handled by the select.list() function, which is platform independent.

The R package raster is used to read spatial rasters into R. The data for production mapping should be in the form of pixel-based raster layers representing the predictors in the model. If there is more than one predictor or raster layer, the layers must all have the same number of columns and rows. The layers must also have the same extent, projection, and pixel size, for effective model development and accuracy. The raster package function compareRaster() is used to check predictor layers for consistency.

The layers must also be in (single or multi-band) raster data formats that can be read by package raster, for example ESRI Grid or ERDAS Imagine image files. The predictor layers must have continuous or categorical data values. See writeRaster for a list of available formats.

To improve processing speed, the raster package is used to create a raster brick object with a layer for each predictor in the model. By default, this brick is a temporary file that is automatically deleated as soon as the map is completed. If keep.predictor.brick=TRUE, the predictor brick with be saved as a native raster package file, with a file name created by appending '_brick' to the OUTPUTfn. Warning: these bricks can be quite large, as they contain all the predictor data for every pixel in the map.

When creating maps of non-rectangular study regions there may be large portions of the rectangle where you have no predictors, and are uninterested in making predictions. The suggested value for the pixels outside the study area is -9999. These pixels will be ignored in the predictions, thus saving computing time.

The function model.mapmake() outputs an rater file of map information suitable to be imported into a GIS. Maps can also be imported back into R using the function raster() from the raster package. The file extension of OUTPUTfn determines the write format. If OUTPUTfn does not include a file extension, output will default to an ERDAS Imagine image file with extension ".img"

For Binary response models the output is in the form of predicted probability of presence for each pixel. For Continuous response models the output is the predicted value for each pixel. For Categorical response models the map output depends on the category labels. If the categorical response variable is numeric, the map output will use the original numeric categories. If the categories are non-numeric (for example, character strings), map output is in the form of integer class codes for each pixel, coded for each level of the factored response, and a CSV file containing a look up table is also generated to associate the integer codes with the original values of the response categories.

The first predictor from predList is used to determine projection of output Imagine Image file.

Value

The model.mapmake() function does not return a value, instead it writes a raster file of map information (suitable for importing into a GIS) to the specified folder. The output raster is saved in the format specifed by the file extension of OUTPUTfn

The model.mapmake() function also writes a text file listing the projections of all predictor rasters.

For categorical response models, a csv file map key is written giving the integer code associated with each response category.

If keep.predictor.brick = TRUE then a raster brick of all the predictor rasters from the model is also saved to the specified folder. If keep.predictor.brick = FALSE (the default) then the predictor brick is written to a temprary file, and deleted. Warning: the predictor bricks can be quite large, and saving them can require quite a bit of memory.

Note

If model.mapmake() is interupted it may leave orphan .gri and .grd files in your temporary directory. The raster package functions showTmpFiles and removeTmpFiles can be used to locate and remove these files, or they can be deleated manually from the temporary directory.

Author(s)

Elizabeth Freeman and Tracey Frescino

References

Breiman, L. (2001) Random Forests. Machine Learning, 45:5-32.

Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine. Ann. Stat., 29(5):1189-1232.

Friedman, J.H. (2002). Stochastic gradient boosting. Comput. Stat. Data An., 38(4):367-378.

Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.

Ridgeway, G., (1999). The state of boosting. Comp. Sci. Stat. 31:172-181

Simpson, E. H. (1949). Measurement of diversity. Nature.

See Also

get.test, model.build, model.diagnostics, compareRaster, writeRaster

Examples


###########################################################################
############################# Run this set up code: #######################
###########################################################################

# set seed:
seed=38

# Define training and test files:

qdata.trainfn = system.file("extdata", "helpexamples","DATATRAIN.csv", package = "ModelMap")

# Define folder for all output:
folder=getwd()	


#identifier for individual training and test data points

unique.rowname="ID"


###########################################################################
######################## Define the model: ################################
###########################################################################


########## Continuous Response, Continuous Predictors ############

#file name to store model:
MODELfn="RF_Bio_TC"				

#predictors:
predList=c("TCB","TCG","TCW")	

#define which predictors are categorical:
predFactor=FALSE	

# Response name and type:
response.name="BIO"
response.type="continuous"



###########################################################################
########################### build model: ##################################
###########################################################################


### create model ###

model.obj = model.build( model.type="RF",
                       qdata.trainfn=qdata.trainfn,
                       folder=folder,		
                       unique.rowname=unique.rowname,		
                       MODELfn=MODELfn,
                       predList=predList,
                       predFactor=predFactor,
                       response.name=response.name,
                       response.type=response.type,
                       seed=seed,
                       na.action="na.roughfix"
)



###########################################################################
############ Then Run this code to predict map pixels #####################
###########################################################################


### Create a the filename (including path) for the rast Look up Tables ###


rastLUTfn.2001 <- system.file(	"extdata",
				"helpexamples",
				"LUT_2001.csv",
				package="ModelMap")
                         
                 



### Load rast LUT table, and add path to the predictor raster filenames in column 1 ###

rastLUT.2001 <- read.table(rastLUTfn.2001,header=FALSE,sep=",",stringsAsFactors=FALSE)

for(i in 1:nrow(rastLUT.2001)){
	rastLUT.2001[i,1] <- system.file("extdata",
					"helpexamples",
					rastLUT.2001[i,1],
					package="ModelMap")
}                                 


### Define filename for map  output ###

OUTPUTfn.2001 <- "RF_BIO_TCandNLCD_01.img"
OUTPUTfn.2001 <- paste(folder,OUTPUTfn.2001,sep="/")


### Create image files of predicted map data ###

model.mapmake( model.obj=model.obj,
               folder=folder,		
               rastLUTfn=rastLUT.2001,
           # Mapping arguments						
               OUTPUTfn=OUTPUTfn.2001
               )


###########################################################################
################ run this code to create maps in R ########################
###########################################################################

### Define Color Ramp ###

l <- seq(100,0,length.out=101)
c <- seq(0,100,length.out=101)
col.ramp <- hcl(h = 120, c = c, l = l)


### read in map data ###

mapgrid.2001 <- raster(OUTPUTfn.2001)

#mapgrid.2001 <- setMinMax(mapgrid.2001)


### create map ###

dev.new(width = 5, height = 5)
opar <- par(mar=c(3,3,2,1),oma=c(0,0,3,4),xpd=NA)

zlim <- c(0,max(maxValue(mapgrid.2001)))
legend.label<-rev(pretty(zlim,n=5))
legend.colors<-col.ramp[trunc((legend.label/max(legend.label))*100)+1]

image(  mapgrid.2001, col = col.ramp, zlim=zlim, asp=1, bty="n", 
        xaxt="n", yaxt="n", main="", xlab="", ylab="")
mtext("2001 Imagery",side=3,line=1,cex=1.2)

legend(	x=xmax(mapgrid.2001),y=ymax(mapgrid.2001),
	legend=legend.label,
	fill=legend.colors,
	bty="n",
	cex=1.2
)
mtext("Predictions",side=3,line=1,cex=1.5,outer=TRUE)
par(opar)



Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ModelMap)
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Loading required package: raster
Loading required package: sp
Loading required package: rgdal
rgdal: version: 1.1-10, (SVN revision 622)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 1.11.3, released 2015/09/16
 Path to GDAL shared files: /usr/share/gdal/1.11
 Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.2-3 
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ModelMap/model.mapmake.Rd_%03d_medium.png", width=480, height=480)
> ### Name: model.mapmake
> ### Title: Map Making
> ### Aliases: model.mapmake
> ### Keywords: models
> 
> ### ** Examples
> 
> 
> ###########################################################################
> ############################# Run this set up code: #######################
> ###########################################################################
> 
> # set seed:
> seed=38
> 
> # Define training and test files:
> 
> qdata.trainfn = system.file("extdata", "helpexamples","DATATRAIN.csv", package = "ModelMap")
> 
> # Define folder for all output:
> folder=getwd()	
> 
> 
> #identifier for individual training and test data points
> 
> unique.rowname="ID"
> 
> 
> ###########################################################################
> ######################## Define the model: ################################
> ###########################################################################
> 
> 
> ########## Continuous Response, Continuous Predictors ############
> 
> #file name to store model:
> MODELfn="RF_Bio_TC"				
> 
> #predictors:
> predList=c("TCB","TCG","TCW")	
> 
> #define which predictors are categorical:
> predFactor=FALSE	
> 
> # Response name and type:
> response.name="BIO"
> response.type="continuous"
> 
> 
> 
> ###########################################################################
> ########################### build model: ##################################
> ###########################################################################
> 
> 
> ### create model ###
> 
> model.obj = model.build( model.type="RF",
+                        qdata.trainfn=qdata.trainfn,
+                        folder=folder,		
+                        unique.rowname=unique.rowname,		
+                        MODELfn=MODELfn,
+                        predList=predList,
+                        predFactor=predFactor,
+                        response.name=response.name,
+                        response.type=response.type,
+                        seed=seed,
+                        na.action="na.roughfix"
+ )
mtry = 1  OOB error = 1957.564 
Searching left ...
Searching right ...
mtry = 2 	OOB error = 2007.104 
-0.02530731 0.05 
> 
> 
> 
> ###########################################################################
> ############ Then Run this code to predict map pixels #####################
> ###########################################################################
> 
> 
> ### Create a the filename (including path) for the rast Look up Tables ###
> 
> 
> rastLUTfn.2001 <- system.file(	"extdata",
+ 				"helpexamples",
+ 				"LUT_2001.csv",
+ 				package="ModelMap")
>                          
>                  
> 
> 
> 
> ### Load rast LUT table, and add path to the predictor raster filenames in column 1 ###
> 
> rastLUT.2001 <- read.table(rastLUTfn.2001,header=FALSE,sep=",",stringsAsFactors=FALSE)
> 
> for(i in 1:nrow(rastLUT.2001)){
+ 	rastLUT.2001[i,1] <- system.file("extdata",
+ 					"helpexamples",
+ 					rastLUT.2001[i,1],
+ 					package="ModelMap")
+ }                                 
> 
> 
> ### Define filename for map  output ###
> 
> OUTPUTfn.2001 <- "RF_BIO_TCandNLCD_01.img"
> OUTPUTfn.2001 <- paste(folder,OUTPUTfn.2001,sep="/")
> 
> 
> ### Create image files of predicted map data ###
> 
> model.mapmake( model.obj=model.obj,
+                folder=folder,		
+                rastLUTfn=rastLUT.2001,
+            # Mapping arguments						
+                OUTPUTfn=OUTPUTfn.2001
+                )
[1] "model.NA.ACTION: NULL"
[1] "Using default 'na.action' of "na.omit""
[1] "starting production prediction"
[1] "predFactor: "
[1] "all predictor layer rasters match"
[1] "brick done"
[1] "write start /home/ddbj/DataUpdator-rgm3/target/RF_BIO_TCandNLCD_01.img"
[1] "starting row by row predictions"
[1] "     rows = 1"
[1] "     rows = 2"
[1] "     rows = 3"
[1] "     rows = 4"
[1] "     rows = 5"
[1] "     rows = 6"
[1] "     rows = 7"
[1] "     rows = 8"
[1] "     rows = 9"
[1] "     rows = 10"
[1] "     rows = 11"
[1] "     rows = 12"
[1] "     rows = 13"
[1] "     rows = 14"
[1] "     rows = 15"
[1] "     rows = 16"
[1] "     rows = 17"
[1] "     rows = 18"
[1] "     rows = 19"
[1] "     rows = 20"
[1] "     rows = 21"
[1] "     rows = 22"
[1] "     rows = 23"
[1] "     rows = 24"
[1] "     rows = 25"
[1] "     rows = 26"
[1] "     rows = 27"
[1] "     rows = 28"
[1] "     rows = 29"
[1] "     rows = 30"
[1] "     rows = 31"
[1] "     rows = 32"
[1] "     rows = 33"
[1] "     rows = 34"
[1] "     rows = 35"
[1] "     rows = 36"
[1] "     rows = 37"
[1] "     rows = 38"
[1] "     rows = 39"
[1] "     rows = 40"
[1] "     rows = 41"
[1] "     rows = 42"
[1] "     rows = 43"
[1] "     rows = 44"
[1] "     rows = 45"
[1] "     rows = 46"
[1] "     rows = 47"
[1] "     rows = 48"
[1] "     rows = 49"
[1] "     rows = 50"
[1] "     rows = 51"
[1] "     rows = 52"
[1] "     rows = 53"
[1] "     rows = 54"
[1] "     rows = 55"
[1] "     rows = 56"
[1] "     rows = 57"
[1] "     rows = 58"
[1] "     rows = 59"
[1] "     rows = 60"
[1] "ARGfn: /home/ddbj/DataUpdator-rgm3/target/RF_BIO_TCandNLCD_01_mapmake_arguments.txt"
> 
> 
> ###########################################################################
> ################ run this code to create maps in R ########################
> ###########################################################################
> 
> ### Define Color Ramp ###
> 
> l <- seq(100,0,length.out=101)
> c <- seq(0,100,length.out=101)
> col.ramp <- hcl(h = 120, c = c, l = l)
> 
> 
> ### read in map data ###
> 
> mapgrid.2001 <- raster(OUTPUTfn.2001)
> 
> #mapgrid.2001 <- setMinMax(mapgrid.2001)
> 
> 
> ### create map ###
> 
> dev.new(width = 5, height = 5)
Error in dev.new(width = 5, height = 5) : 
  no suitable unused file name for pdf()
Execution halted