Software for the analysis of data acquired in observer performance studies conducted using ROC, FROC or ROI multiple reader multiple case (MRMC) data collection paradigms. It is an R implementation of current JAFROC analysis (http://www.devchakraborty.com). Enhancements include a choice between the DBM and OR significance testing methods, both with Hillis improvements, a choice of several figures of merit, plotting of empirical operating characteristics, sample size estimation tools, and generation of formatted output.
treatment/modality: used interchangeably, for example, CT images vs. MRI images of the same patients
reader/observer: used interchangeably, also radiologist
image/case: used interchangeably; a case can consist of several images of the same patient in the same modality
MRMC: multiple reader multiple case (each reader interprets each case in each modality, i.e. fully crossed study design)
DBM: Dorfman-Berbaum-Metz (a significance testing method for detecting a treatment effect in MRMC studies)
DBMH: Hillis modification of DBM
OR: Obuchowski-Rockette (a significance testing method for detecting a treatment effect in MRMC studies)
ORH: Hillis modification of OR
ROC: receiver operating characteristic (a data collection paradigm where each image yields a single rating)
mark: the location of a suspected diseased region
rating: the level of confidence that the image or the location is diseased; higher numbers indicate increasing confidence in the presence of disease
FROC: free-response ROC (a data collection paradigm where each image yields a random number, 0, 1, 2,..., of mark-rating pairs)
AFROC: alternative FROC
JAFROC: jackknife AFROC: an integrated software suite for analyzing observer performance data
ROI: region-of-interest (each case is divided into a fixed number of regions and the reader rates each region)
FOM: figure of merit or quantitative measure of performance
FP: false positive
TP: true positive
FPF: number of FPs divided by number of non-diseased cases
TPF: number of TPs divided by number of diseased cases
SP: specificity, same as 1-FPF
SE: sensitivity, same as TPF
ROC operating characteristic: plot of TPF (ordinate) vs. FPF
AUC: trapezoidal area under the ROC curve as estimated by the Wilcoxon statistic
NL: non-lesion localization, of which FP is a special case, i.e., a mark that does not correctly locate any existing localized lesion(s)
LL: lesion localization, of which TP is a special case, i.e., a mark that correctly locates an existing localized lesion
LLF: number of LLs divided by the total number of lesions
NLF: number of NLs divided by the total number of cases
FROC curve: plot of LLF (ordinate) vs. NLF
AFROC curve: plot of LLF (ordinate) vs. FPF, where FPF is inferred using highest rating of NL marks on non-diseased cases only
AFROC1 curve: plot of LLF (ordinate) vs. FPF1, where FPF1 is inferred using highest rating of NL marks on all cases
JAFROC FOM: trapezoidal area under AFROC curve
JAFROC1 FOM: trapezoidal area under AFROC1 curve
alpha/α: The significance level of the test of the null hypothesis of no treatment effect
p-value: the probability, under the null hypothesis, that the observed treatment effects, or larger, could occur by chance
NH: The null hypothesis that all treatment effects are zero; rejected if the p-value is smaller than α
RRRC: Analysis that treats both readers and cases as random factors
RRFC: Analysis that treats readers as random and cases as fixed factors
FRRC: Analysis that treats readers as fixed and cases as random factors
ddf: Denominator degrees of freedom of the appropriate F-test; the numerator df (ndf) is always the number of treatments minus one
CI: The 1-α confidence interval for the stated statistic
I: total number of modalities, indexed by i; I must be at least 2 to perform null hypothesis testing
J: total number of readers, indexed by j
K1: total number of non-diseased cases, indexed by k1
K2: total number of diseased cases, indexed by k2
K: total number of cases, K = K1 + K2, indexed by k
maxNL: maximum number of NL marks per case in dataset
maxLL: maximum number of lesions per case in dataset
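The ROC definitions above (FPF, TPF, the ROC operating characteristic and AUC) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not part of the package; the function names `operating_point`, `wilcoxon_auc` and `trapezoidal_auc` are hypothetical. It assumes a case is called positive when its rating is greater than or equal to the threshold, and it shows that the Wilcoxon statistic equals the trapezoidal area under the empirical ROC curve.

```python
def operating_point(non_diseased, diseased, threshold):
    """Return (FPF, TPF) when ratings >= threshold are called positive (assumed convention)."""
    fpf = sum(r >= threshold for r in non_diseased) / len(non_diseased)
    tpf = sum(r >= threshold for r in diseased) / len(diseased)
    return fpf, tpf


def wilcoxon_auc(non_diseased, diseased):
    """Wilcoxon statistic: a diseased rating above a non-diseased one scores 1, a tie 0.5."""
    total = 0.0
    for x in non_diseased:
        for y in diseased:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(non_diseased) * len(diseased))


def trapezoidal_auc(non_diseased, diseased):
    """Trapezoidal area under the empirical ROC curve (TPF vs. FPF)."""
    thresholds = sorted(set(non_diseased) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    points += [operating_point(non_diseased, diseased, t) for t in thresholds]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up ratings for four non-diseased and four diseased cases.
non_diseased = [1, 2, 2, 3]
diseased = [2, 3, 4, 5]
print(operating_point(non_diseased, diseased, 3))
print(wilcoxon_auc(non_diseased, diseased))
print(trapezoidal_auc(non_diseased, diseased))
```

For these made-up ratings both area estimates agree, which is the sense in which the AUC entry describes the trapezoidal area as estimated by the Wilcoxon statistic.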
Dataset
Dataset, an R object, can be created by the user or read from an external data file. Note: the word "dataset" used in this package always represents an R object with the following structure.
Data structure
The dataset is an R list containing 9 elements. Note: -Inf is assigned to any missing or unavailable element, e.g., an unmarked true lesion.
NL: a floating-point array with a dimension of c(I, J, K, maxNL) that contains the ratings of NL marks for specified modality, reader and case. For ROC datasets FP ratings are assigned to NL with maxNL = 1, i.e., the last index is set to 1.
LL: a floating-point array with a dimension of c(I, J, K2, maxLL) that contains the ratings of all LL marks for specified modality, reader and case. For ROC datasets TP ratings are assigned to LL with maxLL = 1.
lesionNum: an integer vector with a length of K2, whose elements indicate the number of lesions in each diseased case.
lesionID: an integer array with a dimension of c(K2, maxLL). Note that ratings of lesions in LL must appear in the same sequence as lesionID for that case. For example, if the lesionID field for the first diseased case is c(4, 2, 3, 1), i.e., there are 4 lesions on this case labeled 4, 2, 3 and 1, the ratings in LL for this case must appear in the same sequence, with the first rating corresponding to the lesion labeled 4, the second corresponding to the lesion labeled 2, etc.
lesionWeight: a floating-point array with a dimension of c(K2, maxLL), representing the relative importance of detecting each lesion. For each case, the weights must sum to unity. If zero is assigned to all elements of this array, then the software assigns equal weighting, e.g., c(0.5, 0.5) to an image with two lesions.
maxNL: the maximum number of NL marks per case over the entire dataset.
dataType: a string variable: "ROC", "ROI" or "FROC".
modalityID: a string vector of length I, which labels the modalities in the dataset.
readerID: a string vector of length J, which contains the ID of each reader. Note that the order of elements in modalityID and readerID must match that in NL and LL. For example, NL[1, 2, , ] indicates the ratings of the reader with the second ID in readerID using the modality with the first ID in modalityID.
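The lesionWeight rule described above can be sketched as follows. This is an illustrative Python sketch, not the package's implementation; the function name `effective_weights` is hypothetical, and it assumes that only the first lesionNum entries of each row are meaningful, with any padding ignored.

```python
def effective_weights(lesion_weight, lesion_num, tol=1e-9):
    """Return per-case lesion weights following the rule stated for lesionWeight.

    lesion_weight: K2 rows, each with maxLL entries (padding is ignored, an assumption).
    lesion_num:    number of lesions in each diseased case.
    """
    used = [row[:n] for row, n in zip(lesion_weight, lesion_num)]

    # All weights zero: fall back to equal weighting within each case.
    if all(w == 0 for row in used for w in row):
        return [[1.0 / n] * n for n in lesion_num]

    # Otherwise the weights of every case must sum to unity.
    for index, row in enumerate(used):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"weights of diseased case {index + 1} do not sum to 1")
    return used


# Two diseased cases: the first has two lesions, the second has one.
print(effective_weights([[0.0, 0.0], [0.0, 0.0]], [2, 1]))
print(effective_weights([[0.7, 0.3], [1.0, 0.0]], [2, 1]))
```

The first call reproduces the c(0.5, 0.5) example for a case with two lesions; the second passes user-supplied weights through unchanged after checking that they sum to unity.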
Data file format
The package reads JAFROC, MRMC (ROC data only) and iMRMC (ROC data only) data files. The data can be imported by using the function ReadDataFile.
JAFROC data file format
The JAFROC data file is an Excel file containing three worksheets (*.xls and *.xlsx are supported): (1) the Truth worksheet, (2) the TP or lesion localization worksheet and (3) the FP or non-lesion localization worksheet. Except for the Truth worksheet, where each case must occur at least once, the number of rows in the other worksheets is variable.
Truth worksheet consists of
CaseID, an integer field uniquely labeling the cases (images). It must occur at least once for each case, and since a case may have multiple lesions, it can occur multiple times, once for each lesion.
LesionID, an integer field uniquely labeling the lesions in each case. This field is zero for non-diseased cases.
Weight, a floating-point field, which is the relative importance of detecting each lesion. This field is zero for non-diseased cases and for equally weighted lesions; otherwise the weights must sum to unity for each case. Unless a weighted figure of merit is selected, this field is irrelevant.
TP worksheet consists of
ReaderID, a string field uniquely labeling the readers (radiologists).
ModalityID, a string field uniquely labeling the modalities.
CaseID, see Truth worksheet. A non-diseased case in this field will generate an error.
LesionID, see Truth worksheet. An entry in this field that does not appear in the Truth worksheet will generate an error. It is the user's responsibility to ensure that the entries in the Truth and TP worksheets correspond to the same physical lesions.
TP_Rating, a positive floating-point field denoting the rating assigned to a particular lesion-localization mark, with higher numbers representing greater confidence that the location is actually a lesion.
FP worksheet consists of
ReaderID, see TP worksheet.
ModalityID, see TP worksheet.
CaseID, see TP worksheet.
FP_Rating, a positive floating-point field denoting the rating assigned to a particular non-lesion-localization mark, with higher numbers representing greater confidence that the location is actually a lesion.
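The consistency rules stated for the three worksheets can be sketched as a small checker. This is a hypothetical Python illustration that works on rows already read into tuples; it does not read Excel files and is not the package's ReadDataFile logic. The function name `check_marks` and the tuple layouts are assumptions made for the example.

```python
def check_marks(truth, tp, fp):
    """Return a list of error messages for TP and FP rows that conflict with Truth.

    truth rows: (CaseID, LesionID, Weight)
    tp rows:    (ReaderID, ModalityID, CaseID, LesionID, TP_Rating)
    fp rows:    (ReaderID, ModalityID, CaseID, FP_Rating)
    """
    # Map each case to its set of lesion IDs; LesionID 0 marks a non-diseased case.
    lesions = {}
    for case_id, lesion_id, _weight in truth:
        lesions.setdefault(case_id, set())
        if lesion_id != 0:
            lesions[case_id].add(lesion_id)

    errors = []
    for reader, modality, case_id, lesion_id, rating in tp:
        if case_id not in lesions:
            errors.append(f"TP: case {case_id} is not in the Truth worksheet")
        elif not lesions[case_id]:
            errors.append(f"TP: case {case_id} is non-diseased")
        elif lesion_id not in lesions[case_id]:
            errors.append(f"TP: lesion {lesion_id} of case {case_id} is not in Truth")
        if rating <= 0:
            errors.append(f"TP: rating {rating} is not positive")

    for reader, modality, case_id, rating in fp:
        if case_id not in lesions:
            errors.append(f"FP: case {case_id} is not in the Truth worksheet")
        if rating <= 0:
            errors.append(f"FP: rating {rating} is not positive")
    return errors


truth = [(1, 0, 0.0), (2, 1, 0.0), (3, 1, 0.6), (3, 2, 0.4)]
tp = [("R1", "CT", 2, 1, 4.0), ("R1", "CT", 1, 1, 3.0), ("R1", "CT", 3, 5, 2.0)]
fp = [("R1", "CT", 1, 2.0), ("R1", "CT", 9, 1.0)]
for message in check_marks(truth, tp, fp):
    print(message)
```

A check like this cannot confirm that the Truth and TP worksheets refer to the same physical lesions; as noted above, that remains the user's responsibility.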
LABMRMC data format
The file must be saved as a plain text file with the *.lrc extension. All items in the file are separated by one or more blank spaces. The file consists of the following parts:
1. The first line is a free-text description of the file.
2. The second line is the name or ID of the first reader.
3. The third line has the names or IDs of all the modalities. Each name or ID must be enclosed in double quotes (" ").
4. The fourth line must have a letter (l or s) or word (large or small) for each modality, indicating whether larger or smaller ratings represent stronger confidence in the presence of disease.
5. The following lines contain the ratings in all modalities, separated by spaces or tabs, of the non-diseased cases, one case per line. The cases must appear in the same order for all readers. Missing values are not allowed.
6. After the last non-diseased case, insert a line containing the asterisk (*) symbol.
7. Repeat steps 5 and 6 for the diseased cases.
8. Repeat steps 2, 5, 6 and 7 for the remaining readers.
9. The last line of the data file must be a pound symbol (#).
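The file layout described above can be made concrete with a minimal reader. This is an illustrative Python sketch under stated assumptions, not the package's ReadDataFile: the function name `parse_lrc` is hypothetical, blank lines are ignored, reader names are taken as the whole line, and no validation of the number of ratings per line is attempted.

```python
import shlex


def parse_lrc(text):
    """Parse text laid out as described above into a plain dictionary."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    description = lines[0]
    modalities = shlex.split(lines[2])   # names enclosed in double quotes
    directions = lines[3].split()        # l/large or s/small, one per modality

    readers = {}
    i = 1
    first_reader = True
    while lines[i] != "#":
        reader = lines[i]
        i += 1
        if first_reader:
            i += 2                       # skip the modality and direction lines
            first_reader = False
        blocks = []
        for _ in range(2):               # non-diseased block, then diseased block
            rows = []
            while lines[i] != "*":
                rows.append([float(item) for item in lines[i].split()])
                i += 1
            i += 1                       # step over the asterisk line
            blocks.append(rows)
        readers[reader] = {"non_diseased": blocks[0], "diseased": blocks[1]}

    return {
        "description": description,
        "modalities": modalities,
        "directions": directions,
        "readers": readers,
    }


sample = """Example study
reader_1
"CT" "MRI"
l l
1 2
2 1
*
4 5
3 5
*
reader_2
2 2
1 3
*
5 4
4 4
*
#
"""
parsed = parse_lrc(sample)
print(parsed["modalities"])
print(parsed["readers"]["reader_2"]["diseased"])
```

Each row of ratings holds one value per modality for a single case, so the made-up sample describes two readers, two modalities, two non-diseased cases and two diseased cases.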
DBMHAnalysis: Performs Dorfman-Berbaum-Metz analysis with Hillis improvements for the specified dataset.
EmpiricalOpCharac: Plot empirical curves for specified modalities and readers in the dataset.
FigureOfMerit: Calculate the figure of merit for each reader using each modality.
FROC2HrROC: Convert an FROC dataset to a highest rating inferred ROC dataset.
ORHAnalysis: Performs Obuchowski-Rockette analysis with Hillis improvements for the specified dataset.
OutputReport: Save the results of the analysis to a text file.
PowerGivenJK: Calculate the statistical power with the given number of readers, number of cases and DBM or OR variance components.
PowerTable: Calculate required sample size for the specified dataset with given significance level, effect size and desired power.
ReadDataFile: Read the dataset to be analyzed from a data file.
SampleSizeGivenJ: Calculate required number of cases with the given number of readers and DBM variance components.
SaveDataFile: Save data file in specified format.
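The idea behind a highest rating inferred ROC dataset, as produced by FROC2HrROC, can be sketched for a single modality and reader. This Python sketch is an illustration of the concept only and is not the package's code; the function names are hypothetical, and it assumes that the first K1 rows of the NL ratings belong to the non-diseased cases and the remaining K2 rows to the diseased cases, in the same order as the LL rows.

```python
NEG_INF = float("-inf")


def highest_rating(nl_ratings, ll_ratings=()):
    """Highest rating among all marks on a case; -inf if the case has no marks."""
    return max(list(nl_ratings) + list(ll_ratings), default=NEG_INF)


def froc_to_hr_roc(nl, ll):
    """Infer one ROC rating per case from FROC marks.

    nl: K rows of NL ratings (assumed order: non-diseased cases first, then diseased).
    ll: K2 rows of LL ratings, one row per diseased case.
    Returns (fp_ratings, tp_ratings) with lengths K1 and K2.
    """
    k1 = len(nl) - len(ll)
    fp_ratings = [highest_rating(row) for row in nl[:k1]]
    tp_ratings = [highest_rating(nl_row, ll_row)
                  for nl_row, ll_row in zip(nl[k1:], ll)]
    return fp_ratings, tp_ratings


# Two non-diseased cases followed by two diseased cases; -inf marks an empty slot.
nl = [[1.0, NEG_INF], [NEG_INF, NEG_INF], [3.0, 2.0], [NEG_INF, NEG_INF]]
ll = [[2.5, NEG_INF], [NEG_INF, NEG_INF]]
print(froc_to_hr_roc(nl, ll))
```

A case with no marks keeps the rating -inf, consistent with the convention that -Inf denotes a missing element in the dataset.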
Author(s)
Xuetong Zhai, Dev Chakraborty
Maintainer: Xuetong Zhai <xuetong.zhai@gmail.com>
References
Basics of ROC
Metz, C. E. (1978). Basic principles of ROC analysis. In Seminars in nuclear medicine (Vol. 8, pp. 283–298). Elsevier.
Metz, C. E. (1986). ROC Methodology in Radiologic Imaging. Investigative Radiology, 21(9), 720.
Metz, C. E. (1989). Some practical issues of experimental design and data analysis in radiological ROC studies. Investigative Radiology, 24(3), 234.
Metz, C. E. (2008). ROC analysis in medical imaging: a tutorial review of the literature. Radiological Physics and Technology, 1(1), 2–12.
Wagner, R. F., Beiden, S. V, Campbell, G., Metz, C. E., & Sacks, W. M. (2002). Assessment of medical imaging and computer-assist systems: lessons from recent experience. Academic Radiology, 9(11), 1264–77.
Wagner, R. F., Metz, C. E., & Campbell, G. (2007). Assessment of medical imaging systems and computer aids: a tutorial review. Academic Radiology, 14(6), 723–48.
DBM/OR methods and extensions
Dorfman, D. D., Berbaum, K. S., & Metz, C. E. (1992). Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology, 27(9), 723.
Obuchowski, N. A., & Rockette, H. E. (1994). Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Communications in Statistics-Simulation and Computation, 24(2), 285–308.
Hillis, S. L., Berbaum, K. S., & Metz, C. E. (2008). Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis. Academic Radiology, 15(5), 647–61.
Hillis, S. L., Obuchowski, N. A., & Berbaum, K. S. (2011). Power Estimation for Multireader ROC Methods: An Updated and Unified Approach. Acad Radiol, 18, 129–142.
Hillis, S. L. (2007). A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine, 26(3), 596–619.
FROC paradigm
Chakraborty, D. P., & Berbaum, K. S. (2004). Observer studies involving detection and localization: modeling, analysis, and validation. Medical Physics, 31(8), 1–18.
Chakraborty, D. P. (2006). A search model and figure of merit for observer data acquired according to the free-response paradigm. Physics in Medicine and Biology, 51(14), 3449–62.
Chakraborty, D. P. (2006). ROC curves predicted by a model of visual search. Physics in Medicine and Biology, 51(14), 3463–82.
Chakraborty, D. P. (2011). New Developments in Observer Performance Methodology in Medical Imaging. Seminars in Nuclear Medicine, 41(6), 401–418.
Chakraborty, D. P. (2013). A Brief History of Free-Response Receiver Operating Characteristic Paradigm Data Analysis. Academic Radiology, 20(7), 915–919.
Chakraborty, D. P., & Yoon, H.-J. (2008). Operating characteristics predicted by models for diagnostic tasks involving lesion localization. Medical Physics, 35(2), 435.
ROI paradigm
Obuchowski, N. A., Lieber, M. L., & Powell, K. A. (2000). Data analysis for detection and localization of multiple abnormalities with application to mammography. Academic Radiology, 7(7), 553–4; discussion 554–6.