Multivariate Normal (wk2)
Overview
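let rvec $Z = (Z_1, \dots, Z_p)'$ with $Z_i \overset{iid}{\sim} N(0, 1)$.
then $X = AZ + \mu$ follows MVN.
here, if $\Sigma = AA'$, then $E(X) = \mu$ and $Cov(X) = \Sigma$.
notation: $X \sim N_p(\mu, \Sigma)$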
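rvec $X$ has a Multivariate Normal Distribution if $a'X$ has a Univariate Normal Distribution for every possible set of values for the elements in $a$.
pdf for $X \sim N_p(\mu, \Sigma)$: $f(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\}$.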
Ellipsoid:
- path of $x$ values yielding a constant height for the density,
i.e., all $x$ s.t. $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$.
Standard Normal Distribution:
- $Z = A^{-1}(X - \mu) \sim N_p(0, I)$,
where $A$ satisfies $\Sigma = AA'$.
Property of $\Sigma$:
- symmetric Matrix
- positive definite Matrix
- .
※ if $\Sigma$ is symmetric and positive definite, then $\Sigma = LL'$, where $L$ is a lower triangular Matrix. This is called the Cholesky Decomposition of $\Sigma$.
- if is non-singular Matrix
- $\Sigma$ is symmetric, n.n.d. (non-negative definite)
For $\Sigma$ as above, the following are equivalent (TFAE).
- .
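The Cholesky Decomposition noted above can be checked numerically. This is a minimal sketch, assuming NumPy is available; the matrix values are made up for illustration and are not from the notes.

```python
import numpy as np

# Illustrative covariance matrix (assumed values, symmetric positive definite).
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 2.0, 0.5],
                  [0.6, 0.5, 1.0]])

# np.linalg.cholesky returns the lower triangular factor L with sigma = L L'.
L = np.linalg.cholesky(sigma)

# Check the two defining properties of the factor.
is_lower = np.allclose(L, np.tril(L))
reconstructs = np.allclose(L @ L.T, sigma)
print(L)
print(is_lower, reconstructs)
```

If the matrix is not positive definite, `np.linalg.cholesky` raises `LinAlgError`, which is one quick way to test positive definiteness.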
Spectral Decomposition
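if $\Sigma$ is symmetric, non-singular, then $\Sigma = \sum_{i=1}^p \lambda_i e_i e_i' = P \Lambda P'$, where $\lambda_i$ are ev ($\lambda_1 \ge \dots \ge \lambda_p$), and $e_i$ are evec ($e_i'e_i = 1$, $e_i'e_j = 0$ for $i \ne j$). This is called the Spectral Decomposition of $\Sigma$.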
In this case, $\Sigma^{-1} = \sum_{i=1}^p \lambda_i^{-1} e_i e_i'$.
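A small numerical check of the Spectral Decomposition may help. This is a sketch assuming NumPy; the 2x2 matrix is an illustrative choice, and `eigh` returns the ev in ascending order, which differs from the descending convention often used in textbooks.

```python
import numpy as np

# Illustrative symmetric, non-singular matrix (assumed values).
sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# eigh is intended for symmetric matrices: ascending eigenvalues,
# orthonormal eigenvectors stored as the columns of P.
eigvals, P = np.linalg.eigh(sigma)

# Rebuild sigma as the sum of lambda_i * e_i e_i'.
rebuilt = sum(lam * np.outer(e, e) for lam, e in zip(eigvals, P.T))

# The inverse uses the reciprocal eigenvalues with the same eigenvectors.
inverse = sum((1.0 / lam) * np.outer(e, e) for lam, e in zip(eigvals, P.T))

print(eigvals)
print(np.allclose(rebuilt, sigma), np.allclose(inverse, np.linalg.inv(sigma)))
```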
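Center & Axis of ellipsoids of $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$:
- center: $\mu$
- axis: $\pm c\sqrt{\lambda_i}\, e_i$, where $\Sigma e_i = \lambda_i e_i$, $i = 1, \dots, p$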
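Square root Matrix:
let $\Sigma$ be a symmetric non-negative definite Matrix. the square root matrix of $\Sigma$ is defined as $\Sigma^{1/2} = \sum_{i=1}^p \sqrt{\lambda_i}\, e_i e_i' = P \Lambda^{1/2} P'$, where $\Lambda^{1/2} = diag(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_p})$ and $P = [e_1, \dots, e_p]$.
Negative Square Root Matrix:
Let $\Sigma$ be of full rank and all of its ev be positive, in addition to symmetry. $\Sigma^{-1/2} = \sum_{i=1}^p \lambda_i^{-1/2} e_i e_i' = P \Lambda^{-1/2} P'$, where $\Lambda^{-1/2} = diag(\lambda_1^{-1/2}, \dots, \lambda_p^{-1/2})$.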
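Both square root matrices can be built from the eigendecomposition. The following is a minimal sketch assuming NumPy; `sqrt_matrices` is a hypothetical helper name and the matrix values are illustrative.

```python
import numpy as np

def sqrt_matrices(sigma):
    """Return (square root, negative square root) of a symmetric p.d. matrix."""
    eigvals, P = np.linalg.eigh(sigma)
    root = P @ np.diag(np.sqrt(eigvals)) @ P.T
    neg_root = P @ np.diag(1.0 / np.sqrt(eigvals)) @ P.T
    return root, neg_root

# Illustrative symmetric positive definite matrix (assumed values).
sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
root, neg_root = sqrt_matrices(sigma)

print(np.allclose(root @ root, sigma))                        # root squared gives sigma
print(np.allclose(neg_root @ neg_root, np.linalg.inv(sigma)))  # neg_root squared gives the inverse
```

Note that the square root matrix here is symmetric, unlike the lower triangular Cholesky factor, although both reproduce the original matrix when multiplied out.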
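Generalized Inverse:
let $A$ be a symmetric non-negative definite Matrix ($p \times p$). if $rank(A) = r < p$, i.e., not full rank, then the Moore-Penrose generalized inverse of $A$ is given by
$A^{+} = \sum_{i=1}^{r} \lambda_i^{-1} e_i e_i'$,
where $\lambda_1 \ge \dots \ge \lambda_r > 0$ are the nonzero ev of $A$ and $e_i$ are the corresponding evec.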
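For a symmetric non-negative definite matrix, the Moore-Penrose inverse can be formed from the nonzero ev only. This is a sketch assuming NumPy; `pinv_symmetric` and the tolerance `tol` are hypothetical choices for illustration, and the result is compared with NumPy's own `np.linalg.pinv`.

```python
import numpy as np

def pinv_symmetric(a, tol=1e-10):
    """Moore-Penrose inverse of a symmetric n.n.d. matrix via its eigenpairs."""
    eigvals, P = np.linalg.eigh(a)
    keep = eigvals > tol                 # drop the (numerically) zero eigenvalues
    # Scale each kept eigenvector by 1/lambda_i, then sum the outer products.
    return (P[:, keep] / eigvals[keep]) @ P[:, keep].T

# Rank-1 example (assumed values): not invertible in the usual sense.
a = np.array([[1.0, 1.0],
              [1.0, 1.0]])
a_plus = pinv_symmetric(a)

print(a_plus)
print(np.allclose(a_plus, np.linalg.pinv(a)))
```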
Marginal Distribution:
Properties of MVN
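- linear combinations of the components of $X$ are normally distributed.
- any subset of the components of $X$ has an MVN distribution.
- conditional distributions of the components of $X$ are MVN.
That is, only the dimension changes.
if $X \sim N_p(\mu, \Sigma)$, and cvec $a$ is $p \times 1$, then $a'X \sim N(a'\mu, a'\Sigma a)$.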
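- If we partition $y$, $\mu$, $\Sigma$ as follows:
Let $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N_p(\mu, \Sigma)$ with $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$; then the marginal distribution is $y_1 \sim N(\mu_1, \Sigma_{11})$, and the conditional
distribution is $y_1 \mid y_2 \sim N(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$.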
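The conditional mean and covariance of a partitioned MVN vector follow the standard formulas $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2)$ and $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Below is a minimal sketch assuming NumPy; `conditional_mvn`, the split index `k`, and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np

def conditional_mvn(mu, sigma, k, y2):
    """Mean and covariance of y1 | y2, where y1 is the first k components."""
    mu1, mu2 = mu[:k], mu[k:]
    s11, s12 = sigma[:k, :k], sigma[:k, k:]
    s21, s22 = sigma[k:, :k], sigma[k:, k:]
    # Solve linear systems with s22 instead of forming its inverse explicitly.
    cond_mean = mu1 + s12 @ np.linalg.solve(s22, y2 - mu2)
    cond_cov = s11 - s12 @ np.linalg.solve(s22, s21)
    return cond_mean, cond_cov

# Illustrative bivariate example (assumed values).
mu = np.array([1.0, 2.0])
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
cond_mean, cond_cov = conditional_mvn(mu, sigma, k=1, y2=np.array([3.0]))

print(cond_mean)  # 1 + 1.2 * (3 - 2) = 2.2
print(cond_cov)   # 4 - 1.2 * 1.2 = 2.56
```

The conditional covariance does not depend on the observed value of $y_2$, only on the partition of $\Sigma$.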
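if $X \sim N_p(\mu, \Sigma)$ with $|\Sigma| > 0$, then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2_p$.
if $X \sim N_p(\mu, \Sigma)$ with $|\Sigma| > 0$, then
the $N_p(\mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\{x : (x - \mu)'\Sigma^{-1}(x - \mu) \le \chi^2_p(\alpha)\}$, where $\chi^2_p(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi^2_p$ distribution.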
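The ellipsoid probability statement can be checked by simulation. This sketch assumes NumPy and restricts itself to $p = 2$, where the upper chi-square percentile has the closed form $-2\ln\alpha$; the mean vector, covariance matrix, seed, and sample size are all illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
alpha = 0.05

# For p = 2 the chi-square cdf is 1 - exp(-q / 2), so the upper
# (100 * alpha)th percentile is -2 * ln(alpha).
cutoff = -2.0 * math.log(alpha)

# Simulate from N_2(mu, sigma) and compute the squared generalized distances.
x = rng.multivariate_normal(mu, sigma, size=100_000)
diff = x - mu
d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)

# Fraction of draws falling inside the solid ellipsoid.
coverage = np.mean(d2 <= cutoff)
print(cutoff, coverage)  # coverage should be close to 1 - alpha
```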
Linear Combination of Random Vectors
Multivariate Normal Likelihood
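Sampling Distribution of $\bar{X}$ and $S$
let rvec $X_1, \dots, X_n \overset{iid}{\sim} N_p(\mu, \Sigma)$.
$(n-1)S \sim W_p(n-1, \Sigma)$, i.e., Wishart with df $= n-1$.
- $S$ is a random Matrix (rM), e.g., Wishart is a distribution of a rM.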
.
Wishart Distribution
※ MV generalization of the scaled $\chi^2$, i.e., in the univariate case $\sum (x_i - \bar{x})^2 = (n-1)S^2 \sim \sigma^2 \chi^2_{n-1}$.
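let rvec $Z_1, \dots, Z_m \overset{iid}{\sim} N_p(0, \Sigma)$; then $W = \sum_{j=1}^m Z_j Z_j' \sim W_p(m, \Sigma)$.
if $W_1 \sim W_p(m_1, \Sigma)$, and $W_2 \sim W_p(m_2, \Sigma)$ are independent, then $W_1 + W_2 \sim W_p(m_1 + m_2, \Sigma)$.
if $W \sim W_p(m, \Sigma)$, then $CWC' \sim W_q(m, C\Sigma C')$ for a constant $q \times p$ matrix $C$.
if $m \ge p$, the pdf is $f(W) = \dfrac{|W|^{(m-p-1)/2} \exp\{-\tfrac{1}{2} tr(\Sigma^{-1}W)\}}{2^{mp/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{m/2} \prod_{i=1}^p \Gamma(\tfrac{1}{2}(m+1-i))}$ for positive definite $W$, where $\Gamma(\cdot)$ is the gamma function.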
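One way to get a feel for the Wishart distribution is to simulate it. This sketch assumes the usual construction of a Wishart matrix as a sum of outer products of $m$ iid $N_p(0, \Sigma)$ vectors and checks the known mean $E(W) = m\Sigma$; it uses NumPy, and the values of `sigma`, `m`, `reps`, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
m = 5            # degrees of freedom (assumed value)
reps = 20_000    # number of simulated Wishart matrices

# z has shape (reps, m, p): reps independent sets of m draws from N_p(0, sigma).
z = rng.multivariate_normal(np.zeros(2), sigma, size=(reps, m))

# Each W is the sum over the m draws of the outer products z_j z_j'.
w = np.einsum("rmi,rmj->rij", z, z)

mean_w = w.mean(axis=0)
print(mean_w)      # should be close to m * sigma
print(m * sigma)
```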
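MV t-Distribution
※ univariate t-Distribution: $T = Z / \sqrt{V/\nu} \sim t_\nu$, where $Z \sim N(0, 1)$, and $V \sim \chi^2_\nu$ are independent.
let $Z \sim N_p(0, \Sigma)$, and $V \sim \chi^2_\nu$, and $Z \perp V$.
assume rvec $T = Z / \sqrt{V/\nu}$. * Note that each $T_i / \sqrt{\sigma_{ii}} \sim t_\nu$.
here, the joint distribution of $T_1, \dots, T_p$ is called the MV t-distribution, with df $\nu$ and matrix parameter $\Sigma$.
***denote this distribution by $t_p(\nu, \Sigma)$***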
Dirichlet Distribution
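※ The Dirichlet is the MV generalization of the Beta distribution.
let $X = (X_1, \dots, X_K) \sim Dir(\alpha)$, with $X_i > 0$ and $\sum_{i=1}^K X_i = 1$.
- parameters: $\alpha = (\alpha_1, \dots, \alpha_K)$, $\alpha_i > 0$
- pdf: $f(x) = \dfrac{\Gamma(\sum_{i=1}^K \alpha_i)}{\prod_{i=1}^K \Gamma(\alpha_i)} \prod_{i=1}^K x_i^{\alpha_i - 1}$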
CLT
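let $X_1, \dots, X_n$ be iid rvec with mean $\mu$ and finite covariance $\Sigma$.
then $\sqrt{n}(\bar{X} - \mu) \overset{d}{\to} N_p(0, \Sigma)$ as $n \to \infty$.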
Assessing Normality
1. Univariate Marginal Distribution
a. Q-Q Plot
※ Sample quantile vs. quantile of N distribution
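let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the order statistics, or sample quantiles.
the proportion of the sample below $x_{(j)}$ is approximated by $p_{(j)} = (j - \tfrac{1}{2})/n$.
the quantiles $q_{(j)}$ for std. N are defined as $P(Z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \tfrac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz = p_{(j)}$.
if the data arise from a N population, then $x_{(j)} \approx \sigma q_{(j)} + \mu$.
Equivalently, the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related.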
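Procedure:
- get the order statistics $x_{(1)}, \dots, x_{(n)}$ from the original obs.
- calculate the probability values $p_{(j)} = (j - \tfrac{1}{2})/n$
- calculate the standard normal quantiles $q_{(1)}, \dots, q_{(n)}$
- plot the pairs of observations $(q_{(1)}, x_{(1)}), \dots, (q_{(n)}, x_{(n)})$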
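The steps above can be sketched with the Python standard library alone. `qq_pairs` is a hypothetical helper name and the observations are illustrative values; the probability level $(j - \tfrac{1}{2})/n$ is the usual continuity-corrected choice and is an assumption here.

```python
from statistics import NormalDist

def qq_pairs(obs):
    """Return the Q-Q plot coordinates (q_(j), x_(j)) for a sample."""
    x_sorted = sorted(obs)                    # order statistics x_(1) <= ... <= x_(n)
    n = len(x_sorted)
    std_normal = NormalDist()                 # N(0, 1)
    pairs = []
    for j, x_j in enumerate(x_sorted, start=1):
        p_j = (j - 0.5) / n                   # probability value for the j-th point
        q_j = std_normal.inv_cdf(p_j)         # standard normal quantile
        pairs.append((q_j, x_j))
    return pairs

# Illustrative observations (assumed values).
obs = [-1.00, -0.10, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30]
pairs = qq_pairs(obs)
for q_j, x_j in pairs:
    print(f"{q_j:7.3f} {x_j:6.2f}")
```

Passing the pairs to any plotting tool gives the Q-Q plot; points close to a straight line are consistent with normality.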
Checking the straightness of Q-Q plot:
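- using corr coef $r = \dfrac{\sum_j (x_{(j)} - \bar{x})(q_{(j)} - \bar{q})}{\sqrt{\sum_j (x_{(j)} - \bar{x})^2}\sqrt{\sum_j (q_{(j)} - \bar{q})^2}}$
- Hypothesis testing: $H_0: \rho = 0$, $T=\tfrac {r\sqrt{n-2}}{\sqrt{1-r^2}} \overset {H_0}{\sim} t_{n-2}$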
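A minimal sketch of the straightness check, using only the standard library. It computes the correlation between the sample quantiles and the standard normal quantiles, then the statistic $T$ from the notes; `qq_correlation` is a hypothetical name, the data are illustrative, and the probability levels $(j - \tfrac{1}{2})/n$ are an assumption.

```python
import math
from statistics import NormalDist

def qq_correlation(obs):
    """Correlation r of the Q-Q pairs and the statistic T = r*sqrt(n-2)/sqrt(1-r^2)."""
    x = sorted(obs)
    n = len(x)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    x_bar = sum(x) / n
    q_bar = sum(q) / n
    s_xq = sum((a - x_bar) * (b - q_bar) for a, b in zip(x, q))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_qq = sum((b - q_bar) ** 2 for b in q)
    r = s_xq / math.sqrt(s_xx * s_qq)
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return r, t

# Illustrative observations (assumed values).
obs = [-1.00, -0.10, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30]
r, t = qq_correlation(obs)
print(r, t)
```

A value of `r` close to 1 indicates a nearly straight Q-Q plot.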
b. others
- Shapiro-Wilk Test:
Test of the correlation coefficient b/w the order statistics $x_{(j)}$ and coefficients $a_j$. $a_j$ is a function of the expected values of standard normal order statistics, and their covariances.
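- Kolmogorov-Smirnov Test
Compare cdf's: $D = \sup_x |F_n(x) - F_0(x)|$,
where cdf $F_0(x) = \Phi((x - \mu)/\sigma)$, empirical cdf $F_n(x) = \tfrac{1}{n}\#\{x_i \le x\}$.
If the data arise from a normal population, the differences are small.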
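- Skewness Test
skewness $\sqrt{b_1} = \dfrac{\tfrac{1}{n}\sum (x_i - \bar{x})^3}{[\tfrac{1}{n}\sum (x_i - \bar{x})^2]^{3/2}}$
When the population is normal, the skewness is 0.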
- Kurtosis Test:
kurtosis $b_2 = \dfrac{\tfrac{1}{n}\sum (x_i - \bar{x})^4}{[\tfrac{1}{n}\sum (x_i - \bar{x})^2]^2}$
When the population is normal, the kurtosis is 3.
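- Lin and Mudholkar (1980):
$Z = \tanh^{-1}(r) = \tfrac{1}{2}\ln\tfrac{1+r}{1-r}$,
where $r$ is the sample correlation of the $n$ pairs $(x_i, y_i)$ with $y_i = \tfrac{1}{n}[\sum_{j \ne i} x_j^2 - \tfrac{1}{n-1}(\sum_{j \ne i} x_j)^2]^{1/3}$.
if the data arise from a normal population, $Z \overset{approx}{\sim} N(0, 3/n)$.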
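Several of the statistics listed above are easy to compute directly. The sketch below uses only the standard library and covers the sample skewness, the sample kurtosis (both with divisor $n$, which is an assumption about the intended definition), and the Kolmogorov-Smirnov distance against a normal cdf whose mean and standard deviation are estimated from the data (also an assumption). The function names and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def skewness_kurtosis(obs):
    """Moment-based sample skewness and kurtosis (divisor n)."""
    n = len(obs)
    x_bar = mean(obs)
    m2 = sum((x - x_bar) ** 2 for x in obs) / n
    m3 = sum((x - x_bar) ** 3 for x in obs) / n
    m4 = sum((x - x_bar) ** 4 for x in obs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def ks_statistic(obs):
    """Largest gap between the empirical cdf and a fitted normal cdf."""
    x = sorted(obs)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, x_i in enumerate(x, start=1):
        f = fitted.cdf(x_i)
        # The empirical cdf jumps from (i - 1)/n to i/n at x_(i); check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Illustrative observations (assumed values).
obs = [-1.00, -0.10, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30]
skew, kurt = skewness_kurtosis(obs)
d = ks_statistic(obs)
print(skew, kurt, d)
```

Because the normal parameters are estimated from the same sample, the usual Kolmogorov-Smirnov critical values do not apply directly; the statistic is shown here only as a descriptive measure.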
2. Bivariate Normality
※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.
- Scatter Plot
the contours of bivariate normal density are ellipses. The pattern of the scatter plot must be near elliptical.
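- Squared Generalized Distances
※ $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2_p$.
this means, for bivariate cases, the Squared Generalized Distances $d_j^2 = (x_j - \bar{x})'S^{-1}(x_j - \bar{x}) \overset{approx}{\sim} \chi^2_2$.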
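- Chi2 Plot (Gamma Plot)
$d_j^2$ should behave like a $\chi^2_2$ rv.
- order the squared distances $d_{(1)}^2 \le d_{(2)}^2 \le \dots \le d_{(n)}^2$
- calculate the probability values $p_{(j)} = (j - \tfrac{1}{2})/n$, $j = 1, \dots, n$
- calculate the quantiles $q_{(j)}$ of the $\chi^2_2$ distribution
- plot the pairs $(q_{(j)}, d_{(j)}^2)$, where $q_{(j)}$ is the $100(j - \tfrac{1}{2})/n$ percentile of $\chi^2_2$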
The plot should resemble a straight line through the origin having slope 1.
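The chi-square plot coordinates for the bivariate case can be sketched as follows. This assumes NumPy and uses the closed-form $\chi^2_2$ quantile $-2\ln(1 - p)$, so it is valid only for $p = 2$; `chi2_plot_pairs` is a hypothetical name and the simulated data are purely illustrative.

```python
import numpy as np

def chi2_plot_pairs(data):
    """Return (q, d2_sorted) for bivariate data; valid for p = 2 only."""
    n, p = data.shape
    if p != 2:
        raise ValueError("the closed-form quantile used here requires p = 2")
    x_bar = data.mean(axis=0)
    s = np.cov(data, rowvar=False)            # sample covariance, divisor n - 1
    diff = data - x_bar
    # Squared generalized distance of each observation from the sample mean.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(s), diff)
    d2_sorted = np.sort(d2)
    probs = (np.arange(1, n + 1) - 0.5) / n
    # chi-square(2) cdf is 1 - exp(-q / 2), so the quantile is -2 ln(1 - prob).
    q = -2.0 * np.log(1.0 - probs)
    return q, d2_sorted

# Simulated bivariate normal sample (assumed parameters and seed).
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=200)
q, d2 = chi2_plot_pairs(data)
print(np.corrcoef(q, d2)[0, 1])
```

Plotting `d2` against `q` gives the chi-square plot. For general $p$, the quantiles would have to come from a chi-square quantile function with $p$ df rather than the closed form used here.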
3. Multivariate Normality
Practically, it is usually sufficient to investigate the univariate and bivariate distributions.
The Chi-square plot is still useful. When the parent population is multivariate normal, and both $n$ and $n - p$ are greater than 25 or 30, each squared generalized distance $d_j^2$ should behave like a $\chi^2_p$ rv.
Power Transformation
Examine Q-Q plot to see whether the normal assumption is satisfactory after power transformation.
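Box-Cox Power Transformation: $x^{(\lambda)} = (x^\lambda - 1)/\lambda$ for $\lambda \ne 0$, and $x^{(\lambda)} = \ln x$ for $\lambda = 0$ ($x > 0$).
here, find the $\lambda$ that maximizes $\ell(\lambda) = -\tfrac{n}{2}\ln[\tfrac{1}{n}\sum_{j=1}^n (x_j^{(\lambda)} - \overline{x^{(\lambda)}})^2] + (\lambda - 1)\sum_{j=1}^n \ln x_j$,
where $\overline{x^{(\lambda)}} = \tfrac{1}{n}\sum_{j=1}^n x_j^{(\lambda)}$.
The maximizer $\hat{\lambda}$ gives the most feasible values for a normal distribution, but the transformed data are not guaranteed to follow a normal distribution.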
- Transformation (Box-Cox) usually improves the approximation to normality.
- Trial-and-error calculations may be necessary to find the $\lambda$ that maximizes the log-likelihood $\ell(\lambda)$.
- Usually, change $\lambda$ values from -1 to 1 with increment 0.1.
- Examine Q-Q plot after the Box-Cox transformation.
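The grid search described above can be sketched with the standard library. This assumes the usual Box-Cox profile log-likelihood $\ell(\lambda) = -\tfrac{n}{2}\ln[\tfrac{1}{n}\sum (x_j^{(\lambda)} - \overline{x^{(\lambda)}})^2] + (\lambda - 1)\sum \ln x_j$ and strictly positive observations; the function names and the data are hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def log_likelihood(obs, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(obs)
    t = [box_cox(x, lam) for x in obs]
    t_bar = sum(t) / n
    var = sum((v - t_bar) ** 2 for v in t) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in obs)

def best_lambda(obs):
    """Search lambda over -1.0, -0.9, ..., 1.0 and return the maximizer."""
    grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
    return max(grid, key=lambda lam: log_likelihood(obs, lam))

# Illustrative positive data whose logs are symmetric (assumed values),
# so the log transform (lambda = 0) should be selected.
obs = [math.exp(z) for z in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
lam_hat = best_lambda(obs)
transformed = [box_cox(x, lam_hat) for x in obs]
print(lam_hat)
```

After the transformation, the Q-Q plot of `transformed` should still be examined, since the maximizing value only gives the best candidate within the power family.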