Last data update: 2014.03.03

PhDPublications    R Documentation

Doctoral Publications

Description

Cross-section data on the scientific productivity of PhD students in biochemistry.

Usage

data("PhDPublications")

Format

A data frame containing 915 observations on 6 variables.

articles

Number of articles published during last 3 years of PhD.

gender

Factor indicating gender (levels: male, female).

married

Factor indicating whether the PhD student is married (levels: no, yes).

kids

Number of children less than 6 years old.

prestige

Prestige of the graduate program.

mentor

Number of articles published by student's mentor.
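The format described above can be checked directly in an R session. The following is a minimal sketch, assuming the AER package (which the results transcript loads before accessing the data) is installed; it only inspects the data frame and does not alter it.

```r
## Assumption: the AER package, which provides PhDPublications, is installed.
library("AER")
data("PhDPublications")

## Number of observations and variables
dim(PhDPublications)

## Variable types: gender and married are factors, the others are numeric
str(PhDPublications)

## Factor levels as stored in the data
levels(PhDPublications$gender)
levels(PhDPublications$married)
```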

Source

Online complements to Long (1997).

http://www.indiana.edu/~jslsoc/research_rm4cldvs.htm

References

Long, J.S. (1990). The Origin of Sex Differences in Science. Social Forces, 68, 1297–1315.

Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.

Examples

## from Long (1997)
data("PhDPublications")

## Table 8.1, p. 227
summary(PhDPublications)

## Figure 8.2, p. 220
plot(0:10, dpois(0:10, mean(PhDPublications$articles)), type = "b", col = 2,
  xlab = "Number of articles", ylab = "Probability")
lines(0:10, prop.table(table(PhDPublications$articles))[1:11], type = "b")
legend("topright", c("observed", "predicted"), col = 1:2, lty = rep(1, 2), bty = "n")

## Table 8.2, p. 228
fm_lrm <- lm(log(articles + 0.5) ~ ., data = PhDPublications)
summary(fm_lrm)
-2 * logLik(fm_lrm)
fm_prm <- glm(articles ~ ., data = PhDPublications, family = poisson)
library("MASS")
fm_nbrm <- glm.nb(articles ~ ., data = PhDPublications)

## Table 8.3, p. 246
library("pscl")
fm_zip <- zeroinfl(articles ~ . | ., data = PhDPublications)
fm_zinb <- zeroinfl(articles ~ . | ., data = PhDPublications, dist = "negbin")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(AER)
Loading required package: car
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: survival
> ### Name: PhDPublications
> ### Title: Doctoral Publications
> ### Aliases: PhDPublications
> ### Keywords: datasets
> 
> ### ** Examples
> 
> ## from Long (1997)
> data("PhDPublications")
> 
> ## Table 8.1, p. 227
> summary(PhDPublications)
    articles         gender    married        kids           prestige    
 Min.   : 0.000   male  :494   no :309   Min.   :0.0000   Min.   :0.755  
 1st Qu.: 0.000   female:421   yes:606   1st Qu.:0.0000   1st Qu.:2.260  
 Median : 1.000                          Median :0.0000   Median :3.150  
 Mean   : 1.693                          Mean   :0.4951   Mean   :3.103  
 3rd Qu.: 2.000                          3rd Qu.:1.0000   3rd Qu.:3.920  
 Max.   :19.000                          Max.   :3.0000   Max.   :4.620  
     mentor      
 Min.   : 0.000  
 1st Qu.: 3.000  
 Median : 6.000  
 Mean   : 8.767  
 3rd Qu.:12.000  
 Max.   :77.000  
> 
> ## Figure 8.2, p. 220
> plot(0:10, dpois(0:10, mean(PhDPublications$articles)), type = "b", col = 2,
+   xlab = "Number of articles", ylab = "Probability")
> lines(0:10, prop.table(table(PhDPublications$articles))[1:11], type = "b")
> legend("topright", c("observed", "predicted"), col = 1:2, lty = rep(1, 2), bty = "n")
> 
> ## Table 8.2, p. 228
> fm_lrm <- lm(log(articles + 0.5) ~ ., data = PhDPublications)
> summary(fm_lrm)

Call:
lm(formula = log(articles + 0.5) ~ ., data = PhDPublications)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.87006 -0.87012  0.07973  0.63630  2.17374 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.177774   0.107725   1.650   0.0992 .  
genderfemale -0.134567   0.057298  -2.349   0.0191 *  
marriedyes    0.132826   0.065027   2.043   0.0414 *  
kids         -0.133148   0.040655  -3.275   0.0011 ** 
prestige      0.025502   0.028469   0.896   0.3706    
mentor        0.025421   0.002954   8.607   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8146 on 909 degrees of freedom
Multiple R-squared:  0.1008,	Adjusted R-squared:  0.09582 
F-statistic: 20.37 on 5 and 909 DF,  p-value: < 2.2e-16

> -2 * logLik(fm_lrm)
'log Lik.' 2215.323 (df=7)
> fm_prm <- glm(articles ~ ., data = PhDPublications, family = poisson)
> library("MASS")
> fm_nbrm <- glm.nb(articles ~ ., data = PhDPublications)
> 
> ## Table 8.3, p. 246
> library("pscl")
Loading required package: lattice
Classes and Methods for R developed in the

Political Science Computational Laboratory

Department of Political Science

Stanford University

Simon Jackman

hurdle and zeroinfl functions by Achim Zeileis

> fm_zip <- zeroinfl(articles ~ . | ., data = PhDPublications)
> fm_zinb <- zeroinfl(articles ~ . | ., data = PhDPublications, dist = "negbin")
> 