Cox Regression

Proportional Hazards Model

Proposed by Cox (1972, JRSS-B), primarily to model the relationship between the hazard function and covariates. The most cited paper in statistics (about 41,000 citations as of April 2016), and one of the most cited in science.

Several extensions exist for more complex data structures, e.g., clustered failure time data, recurrent event data, etc.

※ Data Structure

Observed data: $(X_i, \Delta_i)$, $i = 1, \ldots, n$, where $X_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$.

In addition, $Z_i$ = covariate vector (possibly time-dependent).

Cox PH Model

Semiparametric model:

$$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta' Z)$$

  • $\exp(\beta' Z)$: parametric assumption on covariate effects
  • multiplicative model
  • $\beta$: $p \times 1$ vector of regression coefficients
  • $\lambda_0(t)$: nonparametric baseline hazard; infinite-dimensional
  • shape of hazard function is unspecified

Due to the nonparametric component, standard maximum likelihood theory does not apply.

Let $\beta_j$ be the $j$-th element of $\beta$.

  • $\beta_j$ = difference in log hazards per unit increase in $Z_j$

  • $e^{\beta_j}$ = ratio of hazards; assumed constant for all $t$

  • $\lambda_0(t)$: baseline hazard; common to all subjects, $\lambda_0(t) = \lambda(t \mid Z = 0)$

The hazard ratio, $e^{\beta_j}$, is sometimes referred to as a relative risk.

  • risk = probability, not a rate
  • hazard is a rate, not a probability
  • in ratio of hazards, time dimension cancels out

Direction of effect: $\beta_j > 0$ means the hazard increases with $Z_j$; $\beta_j < 0$ means it decreases.

Magnitude of effect is easy to interpret w.r.t. the hazard ratio $e^{\beta_j}$.

Cumulative hazard function: $\Lambda(t \mid Z) = \Lambda_0(t)\, e^{\beta' Z}$, where $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$.

Survival function: $S(t \mid Z) = \exp\{-\Lambda_0(t)\, e^{\beta' Z}\} = S_0(t)^{\exp(\beta' Z)}$.

By fitting a Cox model, one can readily interpret the multiplicative effect on the hazard:

  • ex) randomized trial: treatment ($Z = 1$) versus placebo ($Z = 0$); $e^{\beta} = 1.5$
  • the hazard for treated patients is 50% higher than that of the controls
  • irrespective of $t$

Nevertheless, $\lambda_0(t)$ is required in order to determine $Z$'s effect on the survival function, e.g., $S(t \mid Z = 1) = S_0(t)^{1.5}$.
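The two points above can be sketched numerically. A minimal example, assuming a hypothetical exponential baseline with rate 0.1 (the baseline, coefficient, and time points are illustrative choices, not from the notes):

```python
import math

BETA = math.log(1.5)  # hypothetical coefficient: e^beta = 1.5, as in the trial example

def s0(t):
    """Hypothetical baseline survival: exponential with rate 0.1."""
    return math.exp(-0.1 * t)

def survival(t, z):
    """Cox model survival: S(t | Z) = S0(t)^exp(beta * Z)."""
    return s0(t) ** math.exp(BETA * z)

def hazard(t, z):
    """Hazard under the exponential baseline: 0.1 * exp(beta * z)."""
    return 0.1 * math.exp(BETA * z)

# The hazard ratio is 1.5 at every t (proportional hazards), but computing
# S(t | Z = 1) = S0(t)^1.5 requires knowing the baseline.
for t in (1.0, 5.0, 10.0):
    assert abs(hazard(t, 1) / hazard(t, 0) - 1.5) < 1e-12
print(survival(5.0, 0), survival(5.0, 1))
```

The loop shows that the hazard ratio does not depend on $t$, while the last line shows survival probabilities that do depend on the (here assumed) baseline.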

Cox Model: Independent Censoring

The independent censoring assumption is less stringent than in nonparametric estimation.

The assumption is often written as: $T \perp C \mid Z$.

※ Note: $C$ is allowed to depend on $Z$

Semiparametric PH Model: General

  • General expression for the multiplicative proportional hazards model:

$$\lambda(t \mid Z) = \lambda_0(t)\, g(\beta' Z)$$

$g(\cdot)$ is a specified link function, with $g(\cdot) \ge 0$ and $g(0) = 1$; in the special case $g(x) = e^x$, we recover the Cox model.

  • Other choices for the link function are possible (e.g., Self & Prentice, 1983).

※ Notes:

  • not all choices of $g$ lead to a clear interpretation of $\beta$
  • certain choices of $g$ lead to numerical issues; e.g., the likelihood is flat, local maxima, etc.
  • the general link $g$ has received little attention in the literature

Multiplicative Model

The Cox model is a multiplicative model, i.e., covariates are assumed to affect survival by multiplying the baseline hazard.

  • Additive models have also been proposed

Proportional Hazards Regression and Multiplicative Intensity Model

  • Recall the counting process / martingale representation:
  1. intensity $l(t)$; the integrated form is the cumulative intensity $L(t) = \int_0^t l(u)\,du$.
  • Multiplicative Intensity Model: $l(t) = Y(t)\, \lambda_0(t)\, e^{\beta' Z(t)}$
  • Counting process: $N(t)$ = number of events of a specified type that have occurred by time $t$

    • $N(t)$ may take more than one jump
    • multiple infections, repeated breakdowns, hospital admissions
  • At-risk process: $Y(t)$ = left-continuous process; $Y(t) = 1$ if failure can be observed at time $t$, otherwise $Y(t) = 0$.

    • $Y(t)$ can be used to represent situations in which a subject enters and exits risk sets several times
    • $Y(t)$ may be 1 even after an observed failure
  • Covariate process: $Z(t)$ = (bounded) predictable process

    • time-dependent treatment, risk factors
    • model checking and relaxing PH assumption
  • Baseline hazard function: $\lambda_0(t)$ = an arbitrary deterministic function

  • Filtration: $\mathcal F_t = \sigma\{N(u), Y(u), Z(u) : 0 \le u \le t\}$

  • Martingale: $M(t) = N(t) - \int_0^t l(u)\,du$

  • Intensity function: $E\{dN(t) \mid \mathcal F_{t-}\} = l(t)\,dt$

  • Data: $n$ independent observations on $(N_i(\cdot), Y_i(\cdot), Z_i(\cdot))$, $i = 1, \ldots, n$
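To make the observed-data structure concrete, here is a minimal simulation of $(X_i, \Delta_i, Z_i)$ triples from a Cox model with a constant baseline hazard (all rates and the coefficient are hypothetical choices for illustration):

```python
import math
import random

# Hypothetical setup: baseline hazard 0.2 (constant), beta = 0.7,
# independent censoring C ~ Exp(0.1).
LAMBDA0, BETA, CENS_RATE = 0.2, 0.7, 0.1

def simulate(n, seed=1):
    """Return (X_i, Delta_i, Z_i) triples: T ~ Exp(lambda0 * exp(beta * Z)),
    C ~ Exp(cens_rate), X = min(T, C), Delta = I(T <= C)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.randint(0, 1)                               # binary covariate
        t = rng.expovariate(LAMBDA0 * math.exp(BETA * z))   # event time
        c = rng.expovariate(CENS_RATE)                      # censoring time
        data.append((min(t, c), int(t <= c), z))            # (X, Delta, Z)
    return data

data = simulate(2000)
# Subjects with Z = 1 have a higher hazard, so their observed times are
# stochastically shorter than those with Z = 0.
```

The censoring mechanism here depends on neither $T$ nor $Z$, so independent censoring holds by construction.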

Likelihood; conditional, marginal and partial likelihoods

  • $X$: vector of observations; density $f_X(x; \theta)$

  • $\theta = (\varphi, \eta)$: vector parameter

  • $\varphi$: parameter of interest; $\eta$: nuisance parameter

  • likelihood: $L(\theta) = f_X(x; \theta)$

    • $\eta$ may be infinite-dimensional
    • if, writing $X = (V, W)$, the conditional density of $V$ given $W$ does not involve $\eta$: use it (conditional likelihood)
    • if the marginal density of $W$ does not involve $\eta$: use it (marginal likelihood)

If a factor of the full likelihood is free of $\eta$: use it (partial likelihood)

Partial & Marginal Likelihoods

Focus on the Proportional Hazards Model: $\lambda(t \mid Z) = \lambda_0(t)\, e^{\beta' Z}$, with $n$ independent triplets $(X_i, \Delta_i, Z_i)$.

Above, $\lambda_0(\cdot)$ is unspecified.

  • Partial Likelihood: assume no ties (absolutely continuous failure distribution)

Suppose there are $L$ observed failures, at times $\tau_1 < \tau_2 < \cdots < \tau_L$ (set $\tau_0 = 0$ and $\tau_{L+1} = \infty$).


Let $(i)$ be the label of the individual failing at $\tau_i$, $i = 1, \ldots, L$.

Covariates for failures: $Z_{(1)}, \ldots, Z_{(L)}$. (Hereafter, condition on the covariates.)

Censoring times in $[\tau_i, \tau_{i+1})$: recorded together with the covariates of the censored items, i.e., each label identifies an item censored in that interval.


The data can be divided into sets describing, for each $i$: (a) which individual in the risk set fails at $\tau_i$, and (b) the censoring times and labels in $[\tau_i, \tau_{i+1})$.


GOAL: Build a likelihood on a subset of the full data set

  • carrying most of the information about $\beta$
  • carrying no information on the nuisance parameter $\lambda_0(\cdot)$

PROPOSAL: Generate the likelihood of the failure labels $(1), \ldots, (L)$, conditional on the risk sets and failure times.

JUSTIFICATION (WHY?):

  • The timing of events can be explained by the nuisance parameter $\lambda_0(\cdot)$.
  • Censoring times and labels can be ignored if we assume non-informative censorship (independent censoring).

So this is a partial likelihood in the sense that it is only part of the likelihood of the observed data.

If censoring is independent and there are no ties, the $i$-th partial likelihood factor is the conditional probability that individual $(i)$ fails at $\tau_i$, given the risk set at $\tau_i$ and given that one event occurs at $\tau_i$.

Denote $R(\tau_i)$ as the risk set at $\tau_i$. Then, by the assumption of independent censoring,

$$P\{(i) \text{ fails at } \tau_i \mid R(\tau_i), \text{ one failure at } \tau_i\} = \frac{e^{\beta' Z_{(i)}}}{\sum_{j \in R(\tau_i)} e^{\beta' Z_j}}$$

Thus, the Partial Likelihood is

$$L(\beta) = \prod_{i=1}^{L} \frac{e^{\beta' Z_{(i)}}}{\sum_{j \in R(\tau_i)} e^{\beta' Z_j}}$$

Note: with $\lambda_0(\cdot)$ unspecified and censoring noninformative, the discarded factors contain little or no information about $\beta$.

  • Counting process notation: $L(\beta) = \prod_{i=1}^{n} \prod_{t \ge 0} \left[ \frac{Y_i(t)\, e^{\beta' Z_i(t)}}{\sum_j Y_j(t)\, e^{\beta' Z_j(t)}} \right]^{dN_i(t)}$
  • Maximum partial likelihood estimator (MPLE): $\hat\beta$ (computed using the Newton-Raphson (NR) algorithm)

    • Specifically, the log partial likelihood is
$$l(\beta) = \sum_{i=1}^{n} \int_0^\infty \left[ \beta' Z_i(t) - \log \left\{ \sum_j Y_j(t)\, e^{\beta' Z_j(t)} \right\} \right] dN_i(t)$$
    • The score vector, $U(\beta)$, can be obtained by differentiating $l(\beta)$ w.r.t. $\beta$:
$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty \left\{ Z_i(t) - \bar Z(\beta, t) \right\} dN_i(t)$$
    • where $\bar Z(\beta, t) = \dfrac{\sum_j Y_j(t)\, e^{\beta' Z_j(t)} Z_j(t)}{\sum_j Y_j(t)\, e^{\beta' Z_j(t)}}$ is a weighted mean of $Z$ over those observations still at risk at time $t$.

    • The information matrix, $I(\beta) = -\partial^2 l(\beta) / \partial\beta\, \partial\beta'$, is
$$I(\beta) = \sum_{i=1}^{n} \int_0^\infty V(\beta, t)\, dN_i(t)$$
    • where $V(\beta, t) = \dfrac{\sum_j Y_j(t)\, e^{\beta' Z_j(t)} \{Z_j(t) - \bar Z(\beta, t)\}\{Z_j(t) - \bar Z(\beta, t)\}'}{\sum_j Y_j(t)\, e^{\beta' Z_j(t)}}$ is the weighted variance of $Z$ at time $t$.

Then, the MPLE, $\hat\beta$, is found by solving the partial likelihood equation: $U(\beta) = 0$.

Under some regularity conditions, $\hat\beta$ is consistent and asymptotically normally distributed with mean $\beta$ and variance $I(\beta)^{-1}$ (will be shown later).

The NR algorithm solves the partial likelihood equation by computing $\beta^{(k+1)} = \beta^{(k)} + I(\beta^{(k)})^{-1} U(\beta^{(k)})$ iteratively until convergence (requires an initial value $\beta^{(0)}$).

※ Note:

  1. (incredibly) robust algorithm!
  2. $\beta^{(0)} = 0$ usually works.

Cox Proportional Hazards Model

Cox model: $\lambda(t \mid Z) = \lambda_0(t)\, e^{\beta' Z}$

※ Note:

  • $e^{\beta'(Z_1 - Z_0)}$ is the relative risk of the hazard of death comparing covariate values $Z_1$ to $Z_0$

Interpreting Cox Model Coefficients: $\beta_j$ is the log RR (hazard ratio) for a unit change in $Z_j$, given all other covariates remain constant.

The RR comparing two sets of values for the covariates, $Z^*$ vs. $Z$:

$$\text{RR} = \frac{\lambda(t \mid Z^*)}{\lambda(t \mid Z)} = e^{\beta'(Z^* - Z)}$$
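This RR is a one-line computation. A sketch with hypothetical coefficients for two covariates (the names and values are illustrative, not from the notes):

```python
import math

def relative_risk(beta, z_star, z):
    """RR = exp(beta' (z* - z)) under the Cox model."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, z_star, z)))

# Hypothetical coefficients for (age, treatment):
beta = [0.03, -0.5]
# RR for a treated 65-year-old vs. an untreated 60-year-old:
rr = relative_risk(beta, [65, 1], [60, 0])  # exp(0.03 * 5 - 0.5)
```

A unit change in a single covariate reduces to $e^{\beta_j}$, matching the coefficient interpretation above.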


Comparison of Nested Models

  • Full model: $\lambda(t \mid Z) = \lambda_0(t) \exp(\beta_1' Z_1 + \beta_2' Z_2)$

To test: $H_0: \beta_2 = 0$

  • Reduced model: $\lambda(t \mid Z) = \lambda_0(t) \exp(\beta_1' Z_1)$

Use the partial likelihood ratio statistic, $X^2_{Cox} = 2\{l(\hat\beta_{\text{full}}) - l(\hat\beta_{\text{reduced}})\}$.

Under $H_0$ (reduced model), and when $n$ is large:

$$X^2_{Cox} \sim \chi^2_{k-p}, \qquad k - p = \text{number of parameters set to 0 by } H_0$$
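A small numeric sketch of the test with hypothetical log partial likelihood values and one parameter under test (for df = 1 the chi-square tail probability follows from the standard-normal identity $\chi^2_1 = Z^2$, computable with the stdlib):

```python
import math

def lrt_pvalue_df1(x2):
    """P(chi^2_1 > x2), using chi^2_1 = Z^2 so the tail is erfc(sqrt(x2/2))."""
    return math.erfc(math.sqrt(x2 / 2.0))

# Hypothetical log partial likelihoods from nested fits (full model has one
# extra parameter):
ll_full, ll_reduced = -180.2, -183.4
x2 = 2 * (ll_full - ll_reduced)   # partial likelihood ratio statistic
p = lrt_pvalue_df1(x2)            # compare to the chi^2_1 reference
```

Here $X^2 = 6.4$, so $H_0$ would be rejected at the usual 5% level; for general df one would use a chi-square survival function (e.g., `scipy.stats.chi2.sf`).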


Stratification

Two Ways to Stratify. Suppose a confounder has 3 levels on which we would like to stratify when comparing the two treatment groups. How?


  • Which Way to Stratify?
  1. Under the dummy variable stratification model, the proportional stratum-to-stratum hazards assumption may not be correct. If not, the confounder may be inadequately controlled.
  2. The proportionality assumption can be checked using time-dependent covariates.
  3. True stratification is a more thorough adjustment, as long as observations within each level are homogeneous. If the confounder can be measured continuously and the strata were formed by grouping values of it, better control might be achieved with continuous (possibly time-dependent) covariate adjustment.
  4. If the confounder is controlled using true stratification, there is no way to estimate one summary relative risk comparing two levels of the exposure. However, we can estimate the effect within each stratum, and thus estimate a RR function.
  5. True stratification generally requires more data to obtain the same precision in coefficient estimates.



Test statistics

The standard asymptotic likelihood inference tests (Wald, score, and likelihood ratio (LR)) can still be applied to the Cox partial likelihood.


Their finite-sample properties may differ; in general, the LRT is the most reliable and the Wald test the least.


When $p = 1$ and the single covariate is categorical, the score test is identical to the log-rank test.


Handling ties

Real data sets often contain tied event times.

  • When do we have ties?
  1. Continuous event times are grouped into intervals.
  2. Event time scale is discrete.

Four commonly used ways of handling ties: 1) Breslow approximation, 2) Efron approximation, 3) Exact partial likelihood, and 4) Averaged likelihood.

When the underlying time is continuous but ties are generated due to grouping, the contribution to the partial likelihood of the tied events at a given time depends on the unobserved order of failures, and must be approximated.

Two commonly used methods are

  1. Breslow approximation
  2. Efron approximation

Example: Assume 5 subjects are at risk of dying at time $\tau$ and two die at the same time (because of the grouping of time). If the time data had been more precise, the first two terms in the likelihood would be either

$$\frac{r_1}{r_1 + r_2 + r_3 + r_4 + r_5} \cdot \frac{r_2}{r_2 + r_3 + r_4 + r_5} \quad \text{or} \quad \frac{r_2}{r_1 + r_2 + r_3 + r_4 + r_5} \cdot \frac{r_1}{r_1 + r_3 + r_4 + r_5}$$

where $r_j = e^{\beta' Z_j}$.
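The two approximations can be compared numerically on this example (the risk scores $r_j$ below are hypothetical values chosen for illustration):

```python
import math

def breslow_terms(r, tied, risk):
    """Breslow: reuse the full risk-set denominator for every tied event."""
    denom = sum(r[j] for j in risk)
    prod = 1.0
    for j in tied:
        prod *= r[j] / denom
    return prod

def efron_terms(r, tied, risk):
    """Efron: step the denominator down by the average tied contribution."""
    denom = sum(r[j] for j in risk)
    tied_sum = sum(r[j] for j in tied)
    d = len(tied)
    prod = 1.0
    for k, j in enumerate(tied):
        prod *= r[j] / (denom - (k / d) * tied_sum)
    return prod

# Hypothetical risk scores r_j = exp(beta' Z_j) for the 5 subjects at risk;
# subjects 0 and 1 die at the same (grouped) time.
r = [1.2, 0.8, 1.0, 1.5, 0.5]
tied, risk = [0, 1], [0, 1, 2, 3, 4]

# Exact average over the two possible orderings of the tied deaths:
S = sum(r)
exact = 0.5 * (r[0] / S * r[1] / (S - r[0]) + r[1] / S * r[0] / (S - r[1]))
```

On these numbers the Efron product is much closer to the exact ordering-averaged term than the Breslow product, which is why Efron is usually the preferred default when ties are common.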

30.png