Last data update: 2014.03.03

R: Identify and correct variable formats
format_corrector    R Documentation

Identify and correct variable formats

Description

The function loops over the variables and, for each one, compares its values with those expected for the standard R formats. This lets it correct problems such as missing values coded with non-standard symbols (for example ".") or dates stored as character strings. It also reports, for each variable, the most appropriate SPSS format.
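As an illustration of the kind of per-variable check described above, here is a minimal R sketch. It is not the ImportExport implementation: the helper fix_numeric_columns() and its na_codes default are hypothetical, and the sketch only covers the numeric case. A character or factor column is converted to numeric after recoding missing-value markers such as "." to NA, but only if no other value is lost in the coercion.

```r
# Hypothetical helper (not part of ImportExport): convert character/factor
# columns to numeric when every non-missing value parses as a number.
# The set of missing-value codes is an assumption for illustration.
fix_numeric_columns <- function(df, na_codes = c("", ".", "NA")) {
  for (v in names(df)) {
    x <- df[[v]]
    if (is.factor(x)) x <- as.character(x)   # work on the labels, not the codes
    if (!is.character(x)) next               # leave other column types alone
    x[trimws(x) %in% na_codes] <- NA         # recode missing-value markers
    num <- suppressWarnings(as.numeric(x))
    # convert only if no non-missing value was lost in the coercion
    if (all(is.na(num) == is.na(x))) df[[v]] <- num
  }
  df
}

a <- c(1, NA, 3, 5, ".")
b <- c("19/11/2006", "05/10/2011", "09/02/1906", "22/01/1956", "10/10/2010")
x <- data.frame(a, b, stringsAsFactors = TRUE)
y <- fix_numeric_columns(x)
sapply(y, class)
```

With the a and b vectors from the Examples section, a becomes numeric (with "." turned into NA), while b is left as it was because its values are not numbers; a separate date check would be needed for it.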

Usage

format_corrector(table,identif=NULL,force=FALSE,rate.miss.date=0.5)

Arguments

table

The data frame to correct.

identif

The name of the identification variable in the data frame. It is used to list the individuals for which problems occurred while the function was running.

force

If TRUE, run format_corrector even when the "fixed.formats" attribute is TRUE.

rate.miss.date

The maximum rate of missing date fields the function should accept. In any case, the function reports which fields have been lost.
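The rate.miss.date and identif arguments suggest a check along the following lines. This sketch is hypothetical: accept_date_conversion() is not part of the package, it works on base R Date values rather than the chron classes the function returns, and it assumes the missing rate is the share of non-empty input strings that fail to parse.

```r
# Hypothetical helper (not part of ImportExport): decide whether a date
# conversion is acceptable, in the spirit of rate.miss.date and identif.
# Assumption: the missing rate is the share of non-empty input strings
# that fail to parse.
accept_date_conversion <- function(raw, parsed, rate.miss.date = 0.5, ids = NULL) {
  present <- !is.na(raw) & nzchar(raw)   # inputs that held something
  lost <- present & is.na(parsed)        # ...but did not survive parsing
  rate <- if (any(present)) sum(lost) / sum(present) else 0
  if (any(lost)) {
    # report positions, or identifiers when they are supplied
    who <- if (is.null(ids)) which(lost) else ids[lost]
    message("Date fields lost for: ", paste(who, collapse = ", "))
  }
  rate <= rate.miss.date
}

raw <- c("19/11/2006", "05/10/2011", "not a date", "22/01/1956")
parsed <- as.Date(raw, format = "%d/%m/%Y")
accept_date_conversion(raw, parsed, ids = 101:104)
```

Here one of four non-empty values (25%) is lost, which is within the default threshold of 0.5, so the conversion is accepted and the lost identifier is reported.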

Details

If a date variable does not already have chron format, it must satisfy the following conditions; otherwise the function leaves it as a character variable:
- the date separator must be one of "-", "/" or ".";
- the hour separator must be ":".
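A minimal base R sketch of parsing dates under these rules follows. It is an illustration, not the package's parser: parse_flexible_dates() is hypothetical, it assumes day-month-year order with a four-digit year (as in the example data), it returns POSIXct values in UTC instead of chron objects, and it gives NA for anything that does not match.

```r
# Hypothetical helper (not the ImportExport parser): parse text dates whose
# date part uses "-", "/" or "." and whose optional time part uses ":".
# Assumptions: day-month-year order, four-digit year, result in UTC.
parse_flexible_dates <- function(x) {
  x <- trimws(as.character(x))
  # accept only the documented separators; anything else stays NA
  valid <- grepl("^\\d{1,2}[-/.]\\d{1,2}[-/.]\\d{4}( \\d{1,2}:\\d{2}(:\\d{2})?)?$", x)
  # normalise the date separators so one set of formats covers all three
  norm <- gsub("[/.]", "-", x)
  out <- .POSIXct(rep(NA_real_, length(x)), tz = "UTC")
  # try the most specific format first; strptime ignores trailing text,
  # so the date-only format must come last
  for (fmt in c("%d-%m-%Y %H:%M:%S", "%d-%m-%Y %H:%M", "%d-%m-%Y")) {
    todo <- valid & is.na(out)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(norm[todo], format = fmt, tz = "UTC")
  }
  out
}

d <- parse_flexible_dates(c("19/11/2006", "05-10-2011 14:30", "09.02.1906", "2006/11/19", "."))
format(d, "%d/%m/%Y %H:%M")
```

The fourth value is rejected because its year comes first, and "." is rejected outright; both stay NA, which is what a missing-rate check would then count.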

Value

A data frame containing the corrected variables.

Note

This function is not fully optimized, so it may run into problems when correcting very large data frames.

See Also

spss_export

Examples

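library(ImportExport)
a <- c(1, NA, 3, 5, ".")   # numbers mixed with NA and "."
b <- c("19/11/2006", "05/10/2011", "09/02/1906", "22/01/1956", "10/10/2010")   # dates stored as text
c <- 101:105
x <- data.frame(a, b, c, stringsAsFactors = TRUE)   # R >= 4.0.0 defaults to FALSE; TRUE matches the run below
sapply(x, class)
x_corr <- format_corrector(x)
sapply(x_corr, class)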

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)


> library(ImportExport)
Loading required package: xlsx
Loading required package: rJava
Loading required package: xlsxjars
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: 'gdata'

The following object is masked from 'package:stats':

    nobs

The following object is masked from 'package:utils':

    object.size

The following object is masked from 'package:base':

    startsWith

Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following object is masked from 'package:gdata':

    combine

The following objects are masked from 'package:base':

    format.pval, round.POSIXt, trunc.POSIXt, units

Loading required package: chron
Loading required package: RODBC
> require(ImportExport)
> a<-c(1,NA,3,5,".")
> b<-c("19/11/2006","05/10/2011","09/02/1906","22/01/1956","10/10/2010")
> c<-101:105
> x<-data.frame(a,b,c)
> sapply(x,class)
        a         b         c 
 "factor"  "factor" "integer" 
> x_corr<-format_corrector(x)


-----Fixing variable ' a '---------

   The following SPSS format has been assigned: F2.0      


-----Fixing variable ' b '---------

   The following SPSS format has been assigned: DATE11      


-----Fixing variable ' c '---------

   The following SPSS format has been assigned: F4.0      
> sapply(x_corr,class)
$a
[1] "numeric"

$b
[1] "dates" "times"

$c
[1] "numeric"

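The SPSS formats printed above (F2.0 for a, F4.0 for c, and DATE11, SPSS's dd-mmm-yyyy date format, for b) are consistent with a simple width rule for numbers. The sketch below is a guess at such a rule, not the package's code: spss_numeric_format() is hypothetical and assumes the width counts the integer digits plus one column for a sign, plus a decimal point and d digits for non-integer values. It reproduces both numeric formats from the output above.

```r
# Hypothetical helper (not the ImportExport rule): suggest an SPSS "Fw.d"
# display format for a numeric vector. Assumptions: w counts the integer
# digits plus one column for a sign, and, for non-integer values, a decimal
# point and d decimal digits.
spss_numeric_format <- function(x, max_decimals = 6) {
  x <- x[!is.na(x)]
  if (!length(x)) return("F8.2")   # SPSS's default numeric format
  int_digits <- max(nchar(format(trunc(abs(x)), scientific = FALSE)))
  # smallest number of decimals that represents every value (up to a tolerance)
  decimals <- 0
  while (decimals < max_decimals &&
         any(abs(x * 10^decimals - round(x * 10^decimals)) > 1e-8)) {
    decimals <- decimals + 1
  }
  width <- int_digits + 1 + if (decimals > 0) decimals + 1 else 0
  sprintf("F%d.%d", width, decimals)
}

spss_numeric_format(c(1, NA, 3, 5, NA))   # variable a after correction
spss_numeric_format(101:105)              # variable c
spss_numeric_format(c(1.25, -10.5))       # non-integer values
```

Reserving a sign column even for non-negative data is a conservative choice; it matches the example output but may not match every variable the real function sees.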