A character matrix or data.frame with the
same number of rows as data. The best partial
match is sought in Names. The algorithm
stops when a unique match is found; any remaining
columns of x are then ignored. Any
nicknames are ignored for the first column
but not for subsequent columns.
A character vector whose length matches the number
of rows of data. This will be replaced by
parseName(x).
data
a character matrix or a data.frame. If
surname and givenName are character vectors of
names, their length must match the number of rows of
data.
Names
One of the following in which matches for x will
be sought:
A character vector or matrix or a data.frame
for which NROW(Names) == nrow(data).
Something to select columns of data to
produce a character vector or matrix or
data.frame via data[, Names]. In this
case, accents will be stripped using
subNonStandardNames.
nicknames
a character matrix with two columns, each row giving a pair
of names like "Pete" and "Peter" that should be regarded as
equivalent if no exact match is found.
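The two-column layout of nicknames can be illustrated with a small base-R sketch. The helper nickEquivalents below is hypothetical and is not part of Ecfun; it only shows one plausible way to look up the names that such a matrix declares equivalent, and it assumes exact string comparison.

```r
# Hypothetical helper, NOT an Ecfun function: return every name that a
# two-column nickname matrix treats as equivalent to `name`
# (including `name` itself), using exact comparison.
nickEquivalents <- function(name, nicknames) {
  hit1 <- nicknames[, 1] == name   # `name` appears in column 1
  hit2 <- nicknames[, 2] == name   # `name` appears in column 2
  unique(c(name, nicknames[hit1, 2], nicknames[hit2, 1]))
}

# Each row is one pair of equivalent names
nickNames <- matrix(c("Pete",    "Peter",
                      "William", "Bill"), ncol = 2, byrow = TRUE)

billSet  <- nickEquivalents("Bill", nickNames)   # pair found in row 2
oscarSet <- nickEquivalents("Oscar", nickNames)  # no pair: only itself
```

Because either column may hold the name being looked up, the sketch checks both columns rather than treating one as the "formal" name.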
...
optional arguments passed to subNonStandardNames
x1
a character vector of names to match against name.
NOTE: matchName calls subNonStandardNames,
but matchName1 does not. Thus, x1 is
assumed NOT to contain characters outside standard
English.
name
A character vector or matrix for which NROW(name)
== nrow(data).
NOTE: matchName calls subNonStandardNames,
but matchName1 does not. Thus, name is
assumed NOT to contain characters outside standard
English.
namesNotFound
character vector passed to subNonStandardNames and used
to compute any "namesNotFound" attribute of the object returned
by parseName.
1.3. For any component i of x1 with multiple rows,
let x1i <- matchName1(x[i, 2], x1[[i]], Name[-1],
nicknames=nicknames, ...). If nrow(x1i)>0,
x1[[i]] <- x1i; else leave unchanged.
2.6. let jd = the subset of names that match xj or
subNonStandardNames(xj) or nicknames of xj; xlist[j] <- jd.
2.7. return xlist
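Steps 2.6 and 2.7 can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the Ecfun implementation: the function name matchSketch is hypothetical, matching is exact rather than partial, and no accent substitution via subNonStandardNames is attempted.

```r
# Hypothetical sketch, NOT Ecfun::matchName1: for each element of x1,
# record the row numbers of `name` that equal it or one of its nicknames.
matchSketch <- function(x1, name, nicknames = NULL) {
  name <- as.matrix(name)
  xlist <- vector("list", length(x1))   # unmatched entries stay NULL
  names(xlist) <- x1
  for (j in seq_along(x1)) {
    candidates <- x1[j]
    if (!is.null(nicknames)) {
      # add names paired with x1[j] in either nickname column
      candidates <- unique(c(candidates,
                             nicknames[nicknames[, 1] == x1[j], 2],
                             nicknames[nicknames[, 2] == x1[j], 1]))
    }
    # a row matches if any of its columns equals a candidate
    hit <- apply(name, 1, function(row) any(row %in% candidates))
    jd <- which(hit)
    if (length(jd) > 0) xlist[[j]] <- jd
  }
  xlist
}

surname <- c("Feld", "Cardenas", "Cardenas", "Smith", "Young")
res <- matchSketch(c("Cardenas", "Zebra", "Young"), surname)

given <- c("Bill", "Don")
nick  <- matrix(c("William", "Bill"), 1)
resNick <- matchSketch("William", given, nicknames = nick)
```

Entries with no match are left as NULL rather than assigned, because assigning NULL with `[[<-` would delete the list element and shorten the result.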
Value
matchName returns a list of the same length as x,
each of whose components is an object obtained as a subset of rows
of data or NULL if no acceptable matches are found.
The list may have an attribute "namesNotFound" as determined per
the argument of that name.
matchName1 returns a list of vectors of integers for
subsets of data matching x1.
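Since unmatched entries are NULL, a returned list can be split into matched and unmatched names with base R alone. The object res below is a hand-built stand-in with the structure described above, not actual output from matchName.

```r
# Hand-built stand-in for a matchName() result (an assumption made for
# illustration only): one matched entry, one unmatched entry.
Data1 <- matrix(c("Feld",  "Don",  "789",
                  "Young", "Bill", "369"), ncol = 3, byrow = TRUE)
res <- list("Young, William X." = Data1[2, , drop = FALSE],
            "Zebra" = NULL)
attr(res, "namesNotFound") <- "Zebra"

# Flag the components for which no acceptable match was found
isMissing <- vapply(res, is.null, logical(1))
unmatched <- names(res)[isMissing]
found     <- res[!isMissing]

# The attribute, when present, can be read directly
notFound <- attr(res, "namesNotFound")
```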
Author(s)
Spencer Graves
See Also
parseName, subNonStandardNames
Examples
##
## 1. Names to match exercising many possible combinations
## of surname with 0, 1, >1 matches possibly after
## replacing with subNonStandardNames
## combined with possibly multiple givenName combinations
## with 0, 1, >1 matches possibly requiring replacing with
## subNonStandardNames or nicknames
##
# NOTE: "_" could also be "e" with an accent;
# not included with this documentation, because
# non-English characters generate warnings in standard tests.
Names2mtch <- c("Andr_ Bruce C_rdenas", "Dolores Ella Feinstein",
                "George Homer", "Inez Jane Kappa", "Luke Michael Noel",
                "Oscar Papa", "Quincy Ra_l Stevens",
                "Thomas U. Vel_zquez", "William X. Young",
                "Zebra")
##
## 2. Data = matrix(..., byrow=TRUE) to exercise
## the combinations from 1
##
Data1 <- matrix(c("Feld", "Don", "789",
                  "C_rdenas", "Don", "456",
                  "C_rdenas", "Andre B.", "123",
                  "Smith", "George", "aaa",
                  "Young", "Bill", "369"),
                ncol=3, byrow=TRUE)
Data1. <- subNonStandardNames(Data1)
##
## 3. matchName1
##
parceNm1 <- parseName(Names2mtch)
match1.1 <- matchName1(parceNm1[, 'surname'], Data1.)
# check
match1.1s <- vector('list', 10)
match1.1s[[1]] <- 2:3
match1.1s[[9]] <- 5
names(match1.1s) <- parceNm1[, 'surname']
all.equal(match1.1, match1.1s)
##
## 4. matchName1 with name = multiple columns
##
match1.2 <- matchName1(c('Cardenas', 'Don'), Data1.,
                       name=Data1.[, 1:2])
# check
match1.2a <- list(Cardenas=2:3, Don=1:2)
all.equal(match1.2, match1.2a)
##
## 5. matchName
##
nickNames <- matrix(c("William", "Bill"), 1, byrow=TRUE)
match1 <- matchName(Names2mtch, Data1, nicknames=nickNames)
# check
match1a <- list("Cardenas, Andre Bruce"=Data1[3,, drop=FALSE],
                "Feinstein, Dolores Ella"=NULL,
                "Homer, George"=NULL, "Kappa, Inez Jane"=NULL,
                "Noel, Luke Michael"=NULL, "Papa, Oscar"=NULL,
                "Stevens, Quincy Raul"=NULL,
                "Velazquez, Thomas U."=NULL,
                "Young, William X."=Data1[5,, drop=FALSE],
                "Zebra"=NULL)
all.equal(match1, match1a)
##
## 6. namesNotFound
##
tstNotFound <- matchName('xx_x', Data1)
# check
tstNF <- list('xx_x'=NULL)
attr(tstNF, 'namesNotFound') <- 'xx_x'
all.equal(tstNotFound, tstNF)
##
## 7. matchName(NULL) to simplify use
##
mtchNULL <- matchName(NULL, Data1)
all.equal(mtchNULL, NULL)