Multivariate Normal (wk2)

Overview

let rvec .

then follows MVN.

Here, if , then .


notation:

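A random vector (rvec) $y$ has a Multivariate Normal Distribution if $a'y$ has a Univariate Normal Distribution for every possible set of values for the elements in $a$.

pdf for $y \sim N_p(\mu, \Sigma)$: $f(y) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2} (y-\mu)' \Sigma^{-1} (y-\mu)\}$.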
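As a minimal sketch, the $N_p(\mu, \Sigma)$ density $(2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\}$ can be evaluated with NumPy as follows. The function name `mvn_pdf` is a hypothetical helper, not part of the notes, and the sketch assumes $\Sigma$ is positive definite.

```python
import numpy as np

def mvn_pdf(y, mu, sigma):
    """Density of N_p(mu, sigma) at one point y (sigma assumed positive definite)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = y - mu
    # Quadratic form (y - mu)' sigma^{-1} (y - mu), via a linear solve
    # instead of an explicit inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    # log|sigma| computed stably
    _, logdet = np.linalg.slogdet(sigma)
    log_pdf = -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)
    return float(np.exp(log_pdf))
```

For example, with $p = 2$, $\mu = 0$ and $\Sigma = I$, the density at the origin is $1/(2\pi)$.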


Ellipsoid:

  • path of $y$ values yielding a constant height for the density,
    i.e., all $y$ s.t. $(y-\mu)' \Sigma^{-1} (y-\mu) = c^2$.

Standard Normal Distribution:

  • ,
    where satisfies .

Properties of $\Sigma$:

  1. symmetric Matrix
  2. positive definite Matrix
  3. .

※ If $\Sigma$ is symmetric and positive definite, then $\Sigma = LL'$, where $L$ is a lower triangular Matrix. This is called the Cholesky Decomposition of $\Sigma$.
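A short sketch of the Cholesky factor in use: if $z \sim N_p(0, I)$ and $\Sigma = LL'$, then $\mu + Lz$ has mean $\mu$ and covariance $\Sigma$. The values of `mu` and `sigma` below are made up for illustration.

```python
import numpy as np

sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])             # illustrative positive definite matrix
mu = np.array([1.0, -2.0])                 # illustrative mean vector

L = np.linalg.cholesky(sigma)              # lower triangular, sigma = L @ L.T

rng = np.random.default_rng(0)
z = rng.standard_normal(size=(50_000, 2))  # rows are N_2(0, I) draws
y = mu + z @ L.T                           # rows are N_2(mu, sigma) draws

sample_cov = np.cov(y, rowvar=False)       # should be close to sigma
```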


  1. if is non-singular Matrix
  2. is symmetric, n.n.d.

For the above , the following are equivalent (TFAE).

  1. .

Spectral Decomposition

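If $\Sigma$ is symmetric, non-singular, then $\Sigma = P \Lambda P' = \sum_{i=1}^{p} \lambda_i e_i e_i'$, where $\lambda_i$ are the ev ($\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$), and $e_i$ are the evec ($P = [e_1, \dots, e_p]$, $P'P = I$). This is called the Spectral Decomposition of $\Sigma$.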

In this case, .

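Center & Axes of the ellipsoids $(y-\mu)' \Sigma^{-1} (y-\mu) = c^2$ of $N_p(\mu, \Sigma)$:

  • center: $\mu$
  • axes: $\pm c \sqrt{\lambda_i}\, e_i$, $i = 1, \dots, p$, where $(\lambda_i, e_i)$ are the ev/evec pairs of $\Sigma$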
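The following sketch computes the spectral decomposition of a covariance matrix and the half-axis vectors $c\sqrt{\lambda_i}\, e_i$ of the contour $(y-\mu)'\Sigma^{-1}(y-\mu) = c^2$. The matrix `sigma` and the constant `c` are made-up values.

```python
import numpy as np

sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])   # illustrative covariance matrix
c = 2.0                          # illustrative contour constant

# eigh is intended for symmetric matrices; eigenvalues are returned in ascending order
lam, P = np.linalg.eigh(sigma)   # columns of P are orthonormal eigenvectors

# Spectral decomposition: sigma = sum_i lam_i * e_i e_i'
rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(len(lam)))

# Half-axis vectors of the ellipse, measured from the center mu
half_axes = [c * np.sqrt(lam[i]) * P[:, i] for i in range(len(lam))]
```

Each half-axis endpoint lies on the contour, since $e_i' \Sigma^{-1} e_i = 1/\lambda_i$.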


Square root Matrix:

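Let $A$ be a symmetric non-negative definite Matrix with spectral decomposition $A = P \Lambda P'$. The square root matrix of $A$ is defined as $A^{1/2} = P \Lambda^{1/2} P' = \sum_{i} \sqrt{\lambda_i}\, e_i e_i'$, where $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p})$.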



Negative Square Root Matrix:

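Let $A$ be of full rank with all of its ev positive, in addition to symmetry. Then $A^{-1/2} = P \Lambda^{-1/2} P'$, where $\Lambda^{-1/2} = \mathrm{diag}(1/\sqrt{\lambda_1}, \dots, 1/\sqrt{\lambda_p})$.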
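A minimal sketch of both square root matrices, built from the eigenvalues and eigenvectors of a symmetric positive definite matrix. The matrix `A` is a made-up example.

```python
import numpy as np

A = np.array([[4.0, 1.2],
              [1.2, 1.0]])                            # illustrative symmetric p.d. matrix

lam, P = np.linalg.eigh(A)                            # A = P diag(lam) P'

A_half = P @ np.diag(np.sqrt(lam)) @ P.T              # square root matrix
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T    # negative square root matrix
```

Multiplying `A_half` by itself recovers `A`, and `A_neg_half @ A_half` is the identity.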



Generalized Inverse:

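Let $A$ be a symmetric non-negative definite Matrix ($p \times p$). If $\mathrm{rank}(A) = r < p$, i.e., not full rank, then the Moore-Penrose generalized inverse of $A$ is given by

$A^{+} = \sum_{i=1}^{r} \lambda_i^{-1} e_i e_i'$,

where $\lambda_1, \dots, \lambda_r$ are the positive ev of $A$ and $e_1, \dots, e_r$ are the corresponding evec.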
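As a sketch, the Moore-Penrose inverse of a symmetric non-negative definite matrix can be assembled from the eigenpairs with positive eigenvalues and compared against `np.linalg.pinv`. The rank-1 matrix and the tolerance `tol` below are assumptions for illustration.

```python
import numpy as np

# Illustrative rank-1 symmetric n.n.d. matrix (not full rank)
v = np.array([1.0, 2.0, 2.0])
A = np.outer(v, v)

lam, P = np.linalg.eigh(A)
tol = 1e-10 * lam.max()                    # assumed cutoff for treating an eigenvalue as zero
keep = lam > tol

# Sum of (1 / lam_i) e_i e_i' over the positive eigenvalues only
A_plus = (P[:, keep] / lam[keep]) @ P[:, keep].T
```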



Marginal Distribution:

Properties of MVN

  1. Linear combinations of the components of $y$ are normally distributed.
  2. Any subset of the components of $y$ has an MVN distribution.
  3. The conditional distributions of the components of $y$ are MVN:

That is, only the dimension changes.

If $y \sim N_p(\mu, \Sigma)$, and cvec (constant vector) $a$, then $a'y \sim N(a'\mu, a'\Sigma a)$.

  1. If we partition $y$, $\mu$, $\Sigma$ as follows

Let $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N_p(\mu, \Sigma)$ with

distribution

if , then .

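If $y \sim N_p(\mu, \Sigma)$ with $|\Sigma| > 0$, then $(y-\mu)' \Sigma^{-1} (y-\mu) \sim \chi^2_p$.

The $N_p(\mu, \Sigma)$ distribution assigns probability $1-\alpha$ to the solid ellipsoid $\{y : (y-\mu)' \Sigma^{-1} (y-\mu) \le \chi^2_p(\alpha)\}$, where $\chi^2_p(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi^2_p$ distribution.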
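A small simulation can illustrate the probability content of the solid ellipsoid. This sketch uses $p = 2$, where the $\chi^2_2$ cdf is $1 - e^{-x/2}$, so the upper $(100\alpha)$th percentile is $-2\ln\alpha$ and no statistical library is needed. The parameter values are made up.

```python
import math
import numpy as np

mu = np.array([1.0, -2.0])                 # illustrative parameters
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
alpha = 0.05

rng = np.random.default_rng(1)
y = rng.multivariate_normal(mu, sigma, size=20_000)

diff = y - mu
# Row-wise quadratic form (y - mu)' sigma^{-1} (y - mu)
d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)

# Upper 100*alpha-th percentile of chi-square with 2 df (closed form for p = 2 only)
cutoff = -2.0 * math.log(alpha)

coverage = float(np.mean(d2 <= cutoff))    # should be close to 1 - alpha
```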

Linear Combination of Random Vectors

Multivariate Normal Likelihood

Sampling Distribution of $\bar{x}$ and $S$

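let rvec $x_1, \dots, x_n \overset{iid}{\sim} N_p(\mu, \Sigma)$.

$(n-1) S \sim W_p(n-1, \Sigma)$, i.e., Wishart with df $= n-1$.

  • $S$ is a random Matrix (rM); the Wishart is a distribution of an rM.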
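One way to see the Wishart result numerically is to simulate many normal samples and average $(n-1)S$. A Wishart matrix with $n-1$ degrees of freedom and scale $\Sigma$ has mean $(n-1)\Sigma$, so the average should be close to that. The values of `sigma`, `n`, and `reps` are illustrative.

```python
import numpy as np

sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # illustrative covariance matrix
n, reps = 10, 4000                         # illustrative sample size and replications

rng = np.random.default_rng(2)
total = np.zeros((2, 2))
for _ in range(reps):
    x = rng.multivariate_normal(np.zeros(2), sigma, size=n)
    S = np.cov(x, rowvar=False)            # sample covariance, divisor n - 1
    total += (n - 1) * S                   # one draw of the random matrix (n - 1) S

mean_W = total / reps                      # Monte Carlo estimate of E[(n - 1) S]
```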

.

Wishart Distribution

The Wishart is the MV generalization of the $\chi^2$ distribution, i.e., for $p = 1$, $\sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1)S^2 \sim \sigma^2 \chi^2_{n-1}$.

for let rvec ,

if , and

if , then

if , , where gamma function.



MV t-Distribution

※ univariate t-Distribution: $t = Z / \sqrt{V/\nu} \sim t_\nu$, where $Z \sim N(0, 1)$, and $V \sim \chi^2_\nu$ is independent of $Z$.

let , and , and .

assume rvec , * Note that each .

Here, the joint distribution of is called the MV t-distribution, with and matrix parameter .

Denote this distribution by .



Dirichlet Distribution

※ The Dirichlet is the MV generalization of the beta distribution.

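let $x = (x_1, \dots, x_k)' \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_k)$

  • parameters: $\alpha_1, \dots, \alpha_k > 0$
  • pdf: $f(x) = \dfrac{\Gamma(\sum_{i=1}^{k} \alpha_i)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}$, for $x_i > 0$, $\sum_{i=1}^{k} x_i = 1$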




CLT

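let $x_1, \dots, x_n$ be iid rvec with mean $\mu$ and finite covariance $\Sigma$

. then $\sqrt{n}\,(\bar{x} - \mu) \overset{d}{\to} N_p(0, \Sigma)$ as $n \to \infty$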




Assessing Normality

1. Univariate Marginal Distribution
a. Q-Q Plot

※ Sample quantile vs. quantile of N distribution

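let order statistics, or sample quantiles, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.

the proportion of the sample below $x_{(j)}$ is approximated by $(j - \tfrac{1}{2})/n$.

the quantiles $q_{(j)}$ for std. N are defined as $P(Z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \tfrac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz = (j - \tfrac{1}{2})/n$.

if the data arise from a N population, then $x_{(j)} \approx \sigma q_{(j)} + \mu$.

Hence, the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related.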

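Procedure:

  1. get the order statistics $x_{(1)}, \dots, x_{(n)}$ from the original obs.
  2. calculate probability values $(j - \tfrac{1}{2})/n$, $j = 1, \dots, n$
  3. calculate standard normal quantiles $q_{(1)}, \dots, q_{(n)}$
  4. plot the pairs of observations $(q_{(1)}, x_{(1)}), \dots, (q_{(n)}, x_{(n)})$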
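The four steps can be sketched with the Python standard library alone. The helper names `qq_pairs` and `qq_correlation` and the data values are hypothetical. The second function returns the correlation coefficient of the plotted pairs, which can be used to judge straightness.

```python
from statistics import NormalDist, mean

def qq_pairs(obs):
    """Return the (q_(j), x_(j)) pairs for a normal Q-Q plot."""
    x_sorted = sorted(obs)                              # step 1: order statistics
    n = len(x_sorted)
    probs = [(j - 0.5) / n for j in range(1, n + 1)]    # step 2: probability values
    q = [NormalDist().inv_cdf(p) for p in probs]        # step 3: std normal quantiles
    return list(zip(q, x_sorted))                       # step 4: pairs to plot

def qq_correlation(pairs):
    """Correlation coefficient of the Q-Q pairs."""
    q = [a for a, _ in pairs]
    x = [b for _, b in pairs]
    qbar, xbar = mean(q), mean(x)
    sxq = sum((a - qbar) * (b - xbar) for a, b in pairs)
    sqq = sum((a - qbar) ** 2 for a in q)
    sxx = sum((b - xbar) ** 2 for b in x)
    return sxq / (sqq * sxx) ** 0.5

pairs = qq_pairs([2.1, -0.4, 0.9, 1.5, 0.2, 3.0, -1.1, 0.6])  # made-up data
r = qq_correlation(pairs)
```

The pairs can then be passed to any plotting tool, with the quantiles on the horizontal axis.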

Checking the straightness of Q-Q plot:

  • using the corr coef $r$ of the pairs $(q_{(j)}, x_{(j)})$
  • Hypothesis testing: $H_0: \rho = 0$, $T=\tfrac {r\sqrt{n-2}}{\sqrt{1-r^2}} \overset {H_0}{\sim} t_{n-2}$


b. others
    1. Shapiro-Wilk Test:

Test of the correlation coefficient b/w $x_{(j)}$ and $r_{(j)}$. $r_{(j)}$ is a function of the expected values of the standard normal order statistics, and their covariances.


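    2. Kolmogorov-Smirnov Test

Compare cdf’s: $D = \sup_x |F_n(x) - F_0(x)|$,

where $F_0$ is the cdf of the hypothesized normal distribution and $F_n$ is the empirical cdf, $F_n(x) = \#\{x_j \le x\}/n$.

If the data arise from a normal population, the differences are small.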
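A minimal sketch of the largest difference between the empirical cdf and a fully specified normal cdf, using only the standard library. The function name `ks_statistic` is hypothetical, and `mu` and `sd` are assumed to be given in advance. If they are instead estimated from the same data, the usual Kolmogorov-Smirnov critical values no longer apply directly.

```python
from statistics import NormalDist

def ks_statistic(obs, mu, sd):
    """Largest gap between the empirical cdf and the N(mu, sd^2) cdf."""
    x_sorted = sorted(obs)
    n = len(x_sorted)
    dist = NormalDist(mu, sd)
    d = 0.0
    for j, x in enumerate(x_sorted, start=1):
        f = dist.cdf(x)
        # The empirical cdf jumps from (j - 1)/n to j/n at x_(j),
        # so the gap is checked on both sides of the jump.
        d = max(d, j / n - f, f - (j - 1) / n)
    return d
```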


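    3. Skewness Test

skewness $\sqrt{b_1} = \dfrac{\sqrt{n}\, \sum_{j=1}^{n} (x_j - \bar{x})^3}{\left[\sum_{j=1}^{n} (x_j - \bar{x})^2\right]^{3/2}}$

When the population is normal, the skewness is 0.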


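    4. Kurtosis Test:

kurtosis $b_2 = \dfrac{n \sum_{j=1}^{n} (x_j - \bar{x})^4}{\left[\sum_{j=1}^{n} (x_j - \bar{x})^2\right]^{2}}$

When the population is normal, the kurtosis is 3.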
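Sample skewness and kurtosis can be sketched from the central moments $m_k = \tfrac{1}{n}\sum_j (x_j - \bar{x})^k$, as $m_3 / m_2^{3/2}$ and $m_4 / m_2^2$. This moment-based form with divisor $n$ is one common convention and is assumed here. The function name is hypothetical.

```python
def sample_skewness_kurtosis(obs):
    """Moment-based sample skewness and kurtosis (divisor n)."""
    n = len(obs)
    xbar = sum(obs) / n
    m2 = sum((x - xbar) ** 2 for x in obs) / n   # second central moment
    m3 = sum((x - xbar) ** 3 for x in obs) / n   # third central moment
    m4 = sum((x - xbar) ** 4 for x in obs) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2                      # equals 3 for a normal population
    return skewness, kurtosis

skew, kurt = sample_skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the symmetric sample above the skewness is 0 and the kurtosis is 1.7, well below the normal value of 3, as expected for evenly spaced points.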


    5. Lin and Mudholkar (1980):

where is the sample of pair with .

if the data arise from a normal population, .

2. Bivariate Normality

※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.

    1. Scatter Plot

The contours of a bivariate normal density are ellipses, so the pattern of the scatter plot should be nearly elliptical.


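    2. Squared Generalized Distances

$d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x})$, $j = 1, \dots, n$.

That is, for bivariate cases, the Squared Generalized Distances $d_j^2$ should behave approximately like $\chi^2_2$ rv's.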


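    3. Chi2 Plot (Gamma Plot)

$d_j^2$ should behave like $\chi^2_p$ rv's.

  1. order the squared distances: $d_{(1)}^2 \le d_{(2)}^2 \le \dots \le d_{(n)}^2$
  2. calculate the probability values $(j - \tfrac{1}{2})/n$, $j = 1, \dots, n$
  3. calculate the quantiles $q_{c,p}((j - \tfrac{1}{2})/n)$ of the $\chi^2_p$ distribution
  4. plot the pairs $(q_{c,p}((j - \tfrac{1}{2})/n),\ d_{(j)}^2)$, where $q_{c,p}((j - \tfrac{1}{2})/n) = \chi^2_p((n - j + \tfrac{1}{2})/n)$ in upper-percentile notation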

The plot should resemble a straight line through the origin having slope 1.
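The chi-square plot coordinates can be sketched for the bivariate case as below. For $p = 2$ the $\chi^2_2$ quantile has the closed form $-2\ln(1 - \text{prob})$, which avoids a statistical library. For general $p$ a chi-square quantile function would be needed instead. The data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=40)  # made-up data
n, p = x.shape

xbar = x.mean(axis=0)
S = np.cov(x, rowvar=False)                      # sample covariance, divisor n - 1
diff = x - xbar
# Squared generalized distances d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar)
d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)

d2_sorted = np.sort(d2)                          # step 1: ordered distances
probs = (np.arange(1, n + 1) - 0.5) / n          # step 2: probability values
q = -2.0 * np.log(1.0 - probs)                   # step 3: chi-square(2) quantiles (p = 2 only)

pairs = np.column_stack([q, d2_sorted])          # step 4: points to plot
```

A useful sanity check is that the squared distances always sum to $(n-1)p$, whatever the data.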



3. Multivariate Normality

Practically, it is usually sufficient to investigate the univariate and bivariate distributions.

The chi-square plot is still useful. When the parent population is multivariate normal, and both $n$ and $n - p$ are greater than 25 or 30, each of the squared generalized distances $d_1^2, \dots, d_n^2$ should behave like a $\chi^2_p$ rv.




Power Transformation

Examine Q-Q plot to see whether the normal assumption is satisfactory after power transformation.



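Box-Cox power transformation: $x^{(\lambda)} = (x^{\lambda} - 1)/\lambda$ for $\lambda \ne 0$, and $x^{(\lambda)} = \ln x$ for $\lambda = 0$ ($x > 0$).

Here, find the $\lambda$ that maximizes

$\ell(\lambda) = -\dfrac{n}{2} \ln\left[ \dfrac{1}{n} \sum_{j=1}^{n} \left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2 \right] + (\lambda - 1) \sum_{j=1}^{n} \ln x_j$,

where $\overline{x^{(\lambda)}} = \dfrac{1}{n} \sum_{j=1}^{n} x_j^{(\lambda)}$.

The maximizing $\hat{\lambda}$ gives the transformed values closest to a normal distribution, but they are not guaranteed to follow a normal distribution.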

  • Transformation (Box-Cox) usually improves the approximation to normality.
  • Trial-and-error calculations may be necessary to find the $\lambda$ that maximizes the log-likelihood $\ell(\lambda)$.
  • Usually, vary $\lambda$ from -1 to 1 in increments of 0.1.
  • Examine Q-Q plot after the Box-Cox transformation.
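The trial-and-error search can be sketched as a grid search over $\lambda \in \{-1, -0.9, \dots, 1\}$, maximizing the Box-Cox log-likelihood $-\tfrac{n}{2}\ln[\tfrac{1}{n}\sum_j (x_j^{(\lambda)} - \overline{x^{(\lambda)}})^2] + (\lambda - 1)\sum_j \ln x_j$. The function names and the data are hypothetical. The data are constructed so that their logs are exact normal scores, so a $\lambda$ near 0 (the log transformation) should be selected.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def box_cox_loglik(obs, lam):
    """Log-likelihood criterion to be maximized over lam."""
    n = len(obs)
    t = [box_cox(x, lam) for x in obs]
    tbar = sum(t) / n
    var = sum((v - tbar) ** 2 for v in t) / n          # divisor n, as in the criterion
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in obs)

def best_lambda(obs):
    """Grid search over lam = -1.0, -0.9, ..., 1.0."""
    grid = [k / 10 for k in range(-10, 11)]
    return max(grid, key=lambda lam: box_cox_loglik(obs, lam))

# Made-up positive data whose logarithms are exact normal scores
n = 25
obs = [math.exp(NormalDist().inv_cdf((j - 0.5) / n)) for j in range(1, n + 1)]
lam_hat = best_lambda(obs)
```

After choosing `lam_hat`, the transformed values `box_cox(x, lam_hat)` would be examined with a Q-Q plot.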
nqplot, contour plot, cqplot, cqplot and box-cox plot