Dataframe of picked LC-MS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as peaklist.
isotopes
Dataframe of isotopes, such as isotopes.
elements
FALSE, or a vector of the chemical elements in the repeating units of the homologue series, e.g. c("C","H") for alkane chains. Used to restrict the search.
use_C
For elements: should the ratio of these elements to C atoms be taken into account? Used to restrict the search.
minmz
Defines the lower limit of the m/z window to search homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].
maxmz
Defines the upper limit of the m/z window to search homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].
minrt
Defines the lower limit of the retention time (RT) window to look for other homologue peaks, relative to the RT of the peak being searched from, i.e., RT-minrt.
maxrt
Defines the upper limit of the RT window to look for other homologue peaks, relative to the RT of the peak being searched from, i.e., RT+maxrt.
ppm
Should mztol be set in ppm (TRUE) or in absolute m/z [u] (FALSE)?
mztol
m/z tolerance setting: +/- value by which the m/z of a peak may vary from its expected value. Given in ppm if ppm=TRUE (see above), otherwise, if ppm=FALSE, in absolute m/z [u].
rttol
Retention time (RT) tolerance by which the RT differences of adjacent peak pairs in a homologue series are allowed to vary. Units as given in column 3 of the peaklist argument, e.g. [min].
minlength
Minimum number of peaks in a homologue series.
mzfilter
Numeric vector of repeating-unit m/z differences to filter homologue series for, within the tolerance set by mztol. Mind the charge z: for multiply charged ions, the m/z difference is the unit mass divided by z.
vec_size
Vector size. Ignore unless a relevant error message is printed (then try to increase size).
mat_size
Matrix size for recombining, multiple of input tuples. Ignore unless a relevant error message is printed (then try to increase size).
R2
FALSE or 0<numeric<=1. Coefficient of determination for cubic smoothing spline fits of m/z versus retention time; homologue series with lower R2 are rejected. See smooth.spline.
spar
Smoothing parameter, typically (but not necessarily) in (0,1]. See smooth.spline.
plotit
Logical FALSE or 0<integer<5. Intermediate plots of nearest neighbour paths, spline fits of individual homologue series >=minlength, clustered HS pairs, etc.
deb
Debug returns, ignore.
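The following Python sketch illustrates how the window and tolerance arguments could act together when screening candidate partners of one peak. It is a sketch under stated assumptions, not the homol.search implementation: the m/z window is assumed to be [m/z + minmz, m/z + maxmz], the RT window follows the RT-minrt / RT+maxrt description of minrt and maxrt, and the helper names abs_tol, candidates and passes_mzfilter are hypothetical.

```python
# Hypothetical sketch, not the homol.search source code.
# Assumed windows: m/z in [mz0 + minmz, mz0 + maxmz], RT in [rt0 - minrt, rt0 + maxrt].


def abs_tol(mz, mztol, ppm=True):
    """Absolute m/z tolerance [u]; with ppm=True, mztol is parts per million of mz."""
    return mz * mztol * 1e-6 if ppm else mztol


def candidates(peaks, i, minmz, maxmz, minrt, maxrt):
    """Indices of peaks inside the m/z and RT windows of peak i.

    peaks: list of (mz, intensity, rt) tuples, mirroring the three peaklist columns.
    """
    mz0, _, rt0 = peaks[i]
    hits = []
    for j, (mz, _, rt) in enumerate(peaks):
        if j == i:
            continue
        in_mz = mz0 + minmz <= mz <= mz0 + maxmz
        in_rt = rt0 - minrt <= rt <= rt0 + maxrt
        if in_mz and in_rt:
            hits.append(j)
    return hits


def passes_mzfilter(delta_mz, mzfilter, tol):
    """True if an observed m/z difference matches any mzfilter value within tol."""
    return any(abs(delta_mz - target) <= tol for target in mzfilter)


peaks = [
    (300.0, 1e5, 10.0),
    (314.0157, 8e4, 10.8),
    (328.0313, 6e4, 11.5),
    (500.0, 1e4, 3.0),
]
tol = abs_tol(300.0, 3.5)                      # 3.5 ppm at m/z 300 -> 0.00105 u
print(candidates(peaks, 0, 5, 120, 2, 2))      # [1, 2]
print(passes_mzfilter(314.0157 - 300.0, [14.01565], tol))      # True  (CH2, z = 1)
print(passes_mzfilter(314.0157 - 300.0, [14.01565 / 2], tol))  # False (CH2, z = 2)
```

Because mzfilter values are m/z differences, a CH2 unit (14.01565 u) on doubly charged ions has to be entered as 14.01565/2.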
Details
A dynamic programming approach is used to extract series of peaks that differ by constant m/z units and show smooth changes in retention time, within the bounds of feasible mass defect changes.
First, a nearest neighbour path through a kd-tree representation of the data is used to extract all feasible peak triplets.
These triplets are then combined into all plausible n-tuples in n-3 steps. At each such step, each newly formed n-tuple is checked for smooth changes of RT with increasing m/z of the homologues, using cubic splines and an R2-based threshold on the model fit.
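A brute-force Python sketch of the two stages described above: collecting peak triplets with equal m/z steps and consistent RT steps, then extending them by one peak per step into longer n-tuples. It swaps the nearest neighbour path through a kd-tree for an exhaustive O(n^3) scan and leaves out the spline/R2 check of each new n-tuple; the function names and the use of rttol on adjacent RT steps are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical sketch: exhaustive triplet scan instead of a kd-tree nearest
# neighbour path; the spline/R2 check of each new n-tuple is omitted.


def triplets(peaks, tol, rttol):
    """(a, b, c) peak indices with increasing m/z, equal m/z steps within tol
    and RT steps that differ by at most rttol."""
    order = sorted(range(len(peaks)), key=lambda k: peaks[k][0])
    found = []
    for a, b, c in combinations(order, 3):  # keeps the m/z order of `order`
        d1 = peaks[b][0] - peaks[a][0]
        d2 = peaks[c][0] - peaks[b][0]
        r1 = peaks[b][2] - peaks[a][2]
        r2 = peaks[c][2] - peaks[b][2]
        if d1 > 0 and abs(d1 - d2) <= tol and abs(r1 - r2) <= rttol:
            found.append((a, b, c))
    return found


def extend(tuples, trips):
    """One combination step: append c to a tuple ending in (a, b) if (a, b, c) is a triplet."""
    by_head = {}
    for a, b, c in trips:
        by_head.setdefault((a, b), []).append(c)
    return [t + (c,) for t in tuples for c in by_head.get(t[-2:], [])]


def all_series(trips, minlength):
    """Extend triplets step by step (n - 3 steps for an n-tuple); keep those >= minlength."""
    current, series = list(trips), []
    while current:
        series.extend(t for t in current if len(t) >= minlength)
        current = extend(current, trips)
    return series


# Four CH2 homologues (+14.01565 u, +0.8 min each) and one unrelated peak.
peaks = [
    (300.0, 1e5, 10.0),
    (314.01565, 9e4, 10.8),
    (328.0313, 8e4, 11.6),
    (342.04695, 7e4, 12.4),
    (350.0, 5e4, 5.0),
]
trips = triplets(peaks, tol=0.001, rttol=0.5)
print(trips)                           # [(0, 1, 2), (1, 2, 3)]
print(all_series(trips, minlength=4))  # [(0, 1, 2, 3)]
```

The sketch keeps every qualifying tuple, including shorter sub-series of longer ones; it makes no attempt to merge or cluster them.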
Value
List of type homol with 6 entries
homol[[1]]
Homologue Series. Dataframe with peaks (mass, intensity, rt, peak ID) and their homologue series relations (to ID, m/z increment, RT increment) within different homologue series (HS IDs, series level). The last column, HS cluster, states the HS clusters to which a peak was assigned via its HS.
homol[[2]]
Parameters. Parameters used.
homol[[3]]
Peaks in homologue series. Dataframe listing all peaks (peak IDs) per homologue series (HS IDs), the underlying mean m/z & RT increments
(m/z increments, RT increments) and the minimum and maximum RT changes between individual peaks of the series.
homol[[4]]
m/z restrictions used. See function argument mzfilter.
homol[[5]]
Peaks per level. List of peak IDs per level in the individual series.
homol[[6]]
Ignore. List with superjacent HS IDs per group - for setdeb=c(3,...)
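As an illustration of the increments summarized in homol[[3]], the Python sketch below computes the mean m/z and RT increments and the smallest and largest RT change for one series. Taking the RT changes between m/z-adjacent peaks, and the function name series_summary, are assumptions.

```python
# Hypothetical helper; RT changes are taken between m/z-adjacent peaks (assumption).


def series_summary(peaks, ids):
    """Mean m/z and RT increments, and min/max RT change, for one homologue series.

    peaks: list of (mz, intensity, rt); ids: peak indices belonging to the series.
    """
    ids = sorted(ids, key=lambda k: peaks[k][0])
    dmz = [peaks[b][0] - peaks[a][0] for a, b in zip(ids, ids[1:])]
    drt = [peaks[b][2] - peaks[a][2] for a, b in zip(ids, ids[1:])]
    return {
        "mz_increment": sum(dmz) / len(dmz),
        "rt_increment": sum(drt) / len(drt),
        "min_rt_change": min(drt),
        "max_rt_change": max(drt),
    }


peaks = [
    (300.0, 1e5, 10.0),
    (314.01565, 9e4, 10.8),
    (328.0313, 8e4, 11.5),
    (342.04695, 7e4, 12.4),
]
print(series_summary(peaks, [3, 0, 2, 1]))
# mz_increment ~ 14.01565, rt_increment ~ 0.8, min_rt_change ~ 0.7, max_rt_change ~ 0.9
```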
Warning
The rttol argument of homol.search must not be confused with that of pattern.search or pattern.search2.
Note
The arguments isotopes and elements are needed to limit the number of intermediate m/z differences to screen, based on feasible changes in mass defect.
Likewise, intermediate numbers are limited by the m/z and retention time windows defined by minmz/maxmz and minrt/maxrt/rttol, respectively.
The latter are always set relative to the individual RT and m/z values of the peaks to be searched from.
Overall, these parameters must be chosen carefully to avoid a combinatorial explosion of triplet m/z differences, leading to slow computation, memory problems or senseless results.
Values for spar and R2 have to be adjusted for different chromatographic settings; the smoothing spline fits are used to eliminate homologue series candidates with erratic RT-behaviour.
Spline fits of series with at least minlength peaks can be viewed with plotit=2.
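A minimal Python sketch of R2-based rejection. homol.search fits cubic smoothing splines (see smooth.spline, controlled by spar); to stay dependency-free, this version substitutes an ordinary least-squares line, so it is stricter than a spline for curved but smooth RT trends. The function names are hypothetical.

```python
# Hypothetical sketch: a least-squares line stands in for R's smooth.spline,
# so spar has no counterpart here.


def linear_fit(x, y):
    """Fitted values of an ordinary least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [my + slope * (xi - mx) for xi in x]


def r_squared(y, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # Constant RT: the flat fit is exact, so report a perfect fit.
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot


def keep_series(mz, rt, R2):
    """Keep a candidate series unless its RT-versus-m/z fit has R^2 below R2.

    R2=False switches the check off, mirroring the 'FALSE or 0<numeric<=1' option.
    """
    if R2 is False:
        return True
    return r_squared(rt, linear_fit(mz, rt)) >= R2


mz = [300.0, 314.01565, 328.0313, 342.04695, 356.0626]
smooth = [10.0, 10.8, 11.6, 12.4, 13.2]
erratic = [10.0, 12.5, 10.2, 13.0, 10.1]
print(keep_series(mz, smooth, 0.98))    # True
print(keep_series(mz, erratic, 0.98))   # False (R^2 is about 0.006)
print(keep_series(mz, erratic, False))  # True
```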
Peak IDs refer to the order in which peaks are provided. Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks and homologue series clusters. Yet other IDs exist for the individual components (see the note section of combine).
Here, homologue series group IDs are given in homol[[1]], homol[[3]] and homol[[6]] of the function output, with each homologue series forming one group of interrelated peaks.