Asymptotically Unbiased: Survival Analysis

Building off the Weibull survival simulation, and a quick snippet of code to create a watermark background, we can create a quick report with SAS ODS. If you want to play along at home, just add the following code to the survival simulation: it will seed the random number generator so you should get the exact same data:

set.seed(1)

Now, let's start up a report with SAS 9.2 using ODS: we can load in the data using IMPORT, and start playing around with LIFETEST, LIFEREG, and PHREG. This won't be extensive or exhaustive- just an excuse to show off a new trick. If you're new to survival analysis in SAS, or survival analysis in general, there's plenty of material over at UCLA's Institute for Digital Research and Education- possibly one of the best stat resources out there.

/* This will keep the watermark properly aligned*/
options papersize = (8.5in 11in);

proc import
    datafile = "C:\Data\Didactic\Data\Survival sim 2013-03-20.csv"
    out = surv dbms = csv replace;
run;

/* create our template using the results */
proc template;
    define style watermark;
    parent = styles.printer;
    style header from header / background=_undef_;
    style body from document / background=_undef_
        backgroundimage = "C:\Data\Didactic\Results\2013-03-20\wm - CONFIDENTIAL 03-19-2013 300 dpi.png";
    end;
run;

/* begin the PDF report */
ods pdf file = "C:\Data\Didactic\Results\2013-03-20\Survival Sim - 2013-03-05.pdf"
    style = watermark;
ods noproctitle;
ods escapechar="^";
ods pdf text = "^S={just = center font_weight = bold font_style = italic
    font_size = 18pt}Survival Simulated Data";

/* plot survival, log survival, and log-log survival */
ods proclabel = "Survival - X1";
title "Survival Function - Stratified by X1";
proc lifetest data = surv plots = (s, ls, lls) graphics notable;
    time time_to_event*event(0);
    strata x1;
run;

ods proclabel = "Survival - X2";
title "Survival Function - Stratified by X2";
proc lifetest data = surv plots = (s, ls, lls) graphics notable;
    time time_to_event*event(0);
    strata x2;
run;

ods proclabel = "Survival - X3";
title "Survival Function - Stratified by X3";
proc lifetest data = surv plots = (s, ls, lls) graphics notable;
    time time_to_event*event(0);
    strata x3;
run;

ods proclabel = "Weibull GLM";
title "Parametric Regression - Weibull";
proc lifereg data = surv;
    class x1 x2 x3;
    model time_to_event*event(0) = x1 x2 x3 x4 x5 x6
        x7 x8 x9 x10 x11 / dist = weibull;
run;

ods proclabel = "Cox PH Model";
title "Nonparametric Regression - Cox Proportional Hazards";
proc phreg data = surv;
    class x1 x2 x3;
    model time_to_event*event(0) = x1 x2 x3
        x4 x5 x6 x7 x8 x9 x10 x11;
run;
ods pdf close;

The results aren't too shabby, and should give you something like this:

Another day, we can delve deeper into the dataset in SAS, or maybe boot up the Survival package in R.

This was a quick Saturday afternoon project- I wanted to write the guts of a program to simulate survival data. While there's plenty of survival datasets around to play with, I wanted to make something that could eventually be used to explore models which I don't get a chance to use that often, such as competing risks, discrete-time survival, and joint longitudinal-survival models. This will get us started in the right direction, and utilize some code I already had kicking around. First, let's get started by clearing the workspace, and setting up a directory for the resulting dataset.

rm(list = ls(all = T))
working.dir <- "c:/Data/Didactic/Data"

if (file.exists(working.dir)) {
    setwd(file.path(working.dir))
} else {
    dir.create(working.dir, recursive = T, showWarnings = F)
    setwd(working.dir)
}

Now let's get to some actual fun stuff. First, a quick function that gives us a positive definite correlation matrix, so we can simulate some covariates from the multivariate normal distribution. We use the QR decomposition to get an orthogonal basis, then fills in eigenvalues, and converts the resulting covariance matrix to a correlation matrix.

generate.rho <- function (p = 10) {
    # Generate Orthogonal Columns
    E <- qr.Q(qr(matrix(runif(p), p, p)))

    # Generate Eigenvalues from Gamma Distribution
    evalues <- runif(p)

    # Scale eigenvalues to give a sum of 1
    evalues <- evalues/sum(evalues)
    L <- diag(evalues)

    rho <- cov2cor(E %*% diag(evalues) %*% E)
}

Let's sample some covariates - these will play into our censoring and event processes. Since I am sampling my covariates from a multivariate normal distribution with zero mean and correlation matrix, all the covariates will have marginal distributions that are normal zero mean and unit variance. We can create categorical covariates by using the cut function, and control the proportions in each category using the quantiles of the standard normal distribution.

# Number of Observations
n.obs <- 500

# Time at which all observations will be censored
censoring.time <- 10

# Number of Covariates - Generate correlation matrices for covariates
n.categorical <- 4

n.cens.cov   <- 3;
n.event.cov <- 3;
n.common.cov <- 2;
n.null.cov   <- 3;
n.cov <- n.cens.cov + n.event.cov + n.common.cov + n.null.cov

rho <- generate.rho(n.cov)

# Sample covariates & coefficients, assign sign at random
library(MASS) # for mvrnorm function
X <- matrix(mvrnorm(n.obs, matrix(0, n.cov), rho), n.obs)

# Create Categorical Covariates - Tertiles based on Z scores
X[,1:n.categorical] <- apply(X[,1:n.categorical], 2, cut, breaks = c(-Inf, -.43, .43, Inf), labels = F)

So now that we have our covariates, let's randomly assign each of our covariates to the censoring process, the event process, both processes, or being unrelated to the processes. Once we have our covariates, we sample some regression parameters, and create our linear predictors.

# Switch up the order, so we don't know which variables are related to the
# censoring and event processes
var.type <- sample(c(rep("cens", n.cens.cov), rep("event", n.event.cov),
    rep("common", n.common.cov), rep("null", n.null.cov)))
X.cens   <- X[, which(var.type == "cens")]
X.event <- X[, which(var.type == "event")]
X.common <- X[, which(var.type == "common")]
X.null   <- X[, which(var.type == "null")]

B.cens   <- matrix(rbeta(n.cens.cov, 8, 4)*sign(rnorm(n.cens.cov)), n.cens.cov)
B.event <- matrix(rbeta(n.event.cov, 20, 2)*sign(rnorm(n.event.cov)),    n.event.cov)
B.common <- matrix(rbeta(n.common.cov, 6, 4)*sign(rnorm(n.common.cov)),    n.common.cov)

Eta.cens <- cbind(X.cens, X.common) %*% rbind(B.cens, B.common)
Eta.event <- cbind(X.event, X.common) %*% rbind(B.event, B.common)

The last thing we need to do is sample our censoring and event times: in our current example, we only have two competing processes, censoring and one event. This could easily be generalized to several competing events.

Our event times will be sampled from a Weibull distribution. In the Weibull model, the shape parameter is fixed, and the linear predictor enters into the scale parameter. The date we generate can be adapted to both parametric and non-parametric survival models.

Once we have our events and times, we gather all our variables into a data frame, and write our results into a data frame, put it into a .CSV, and we're ready to go.

# Set scale and shape parameters for distribution
event.shape <- 2
cens.shape <- 2

event.scale <- .9*censoring.time
cens.scale <- .9*censoring.time

# Parameterize scale with linear predictor
event.time <- rweibull(n.obs, event.shape, scale = event.scale + Eta.event)
cens.time <- rweibull(n.obs, event.shape, scale = cens.scale + Eta.cens)

time_to_event <- apply(cbind(event.time, cens.time), 1, min)
censored <- matrix(0, n.obs)
censored[which(time_to_event == cens.time)] <- 1
censored[which(event.time >= 10)] <- 1
time_to_event[which(time_to_event > censoring.time)] <- 10
event = 1 - censored

surv.data <- data.frame(censored, event, time_to_event, X)

write.csv(surv.data, paste0("Survival sim ", format(Sys.Date(), "%Y-%m-%d"), ".csv"),
    row.names = F)

This will definitely need some tweaks, but it has some potential. With a little modification, we could introduce other factors like left truncation, discrete time scale, and so on. This was a spur-of-the-moment project, so give it a try, tweak it, and let me know the results.

Asymptotically Unbiased

Wednesday, March 20, 2013

SAS: Adding Watermarks to ODS Report - LIFETEST, LIFEREG, and PHREG

Sunday, March 17, 2013

R: Time To Event Simulation - Weibull Model