PrincipalProjections Class
Namespace: Novacta.Analytics
The PrincipalProjections type exposes the following members.
Properties
Name | Description
---|---
ActiveCloud | Gets the active cloud of this instance.
Contributions | Gets the relative contributions of the projected points to the variances of the principal variables.
Coordinates | Gets the principal coordinates of the projected points.
Correlations | Gets the correlations among the active variables and the standardized principal variables.
Directions | Gets the coordinates of the principal directions w.r.t. the basis of the ActiveCloud.
NumberOfDirections | Gets the number of principal directions.
RegressionCoefficients | Gets the coefficients of the regression of each active variable on the standardized principal variables.
RepresentationQualities | Gets the point representation qualities on each principal direction.
Variances | Gets the variances of the principal variables.
Methods
Name | Description
---|---
CorrelateSupplementaryVariables(DoubleMatrix) | Gets the correlations of each specified supplementary variable with the standardized principal variables.
CorrelateSupplementaryVariables(ReadOnlyDoubleMatrix) | Gets the correlations of each specified supplementary variable with the standardized principal variables.
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetType | Gets the Type of the current instance. (Inherited from Object.)
LocateSupplementaryPoints(DoubleMatrix) | Gets the principal coordinates of the specified supplementary points given their active coordinates.
LocateSupplementaryPoints(ReadOnlyDoubleMatrix) | Gets the principal coordinates of the specified supplementary points given their active coordinates.
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
RegressSupplementaryVariables(DoubleMatrix) | Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
RegressSupplementaryVariables(ReadOnlyDoubleMatrix) | Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
ToString | Returns a string that represents the current object. (Inherited from Object.)
Multidimensional weighted points and their statistics
Let us consider $n$ points $x_1, \dots, x_n$ in $\mathbb{R}^K$.
Their statistical characteristics are usually analyzed
by imposing a weighting scheme on them, that is, a
relative weight $w_i$ (possibly elementary, i.e. $w_i = 1/n$) is assigned to
point $x_i$, and controls how the point contributes to the
overall statistics, provided that the weights sum
up to $1$. Such a set of weighted points can thus
be expressed by means of the pair $S = (X, w)$,
referred to as a weighted multidimensional structure,
where $X$ is the matrix whose $i$-th
row is $x_i^T$ (the transpose of $x_i$),
and $w$ is the weighting scheme expressed as a sequence:
$$w = (w_1, \dots, w_n).$$
Given a basis $\mathcal{A}$ of $\mathbb{R}^K$, a
structure $S$ can be represented by a
cloud, say $C_{S,\mathcal{A}}$, which
can be formally defined as the
triplet $(\mathcal{A}, X_{\mathcal{A}}, w)$, where
$X_{\mathcal{A}}$ is the coordinates matrix w.r.t. $\mathcal{A}$ of the points
in $S$, i.e., its $i$-th row,
$x_{i,\mathcal{A}}^T$, stands for the coordinates
of point $x_i$.
The basic statistics of can be defined
as follows. Its
mean point, the vector ,
whose coordinates are given by
,
is
Furthermore, the variance of ,
say , is defined
as follows:
where
is the distance induced by basis via
its norm
and scalar product
where and
The columns of are referred to as
the active variables
observed at the points w.r.t. basis
. The covariance matrix of such variables
can be defined
as follows,
where
Notice how the variance of can be characterized
in terms of the covariances
w.r.t. ,
with being the trace operator.
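As a minimal sketch of these statistics, the following Python code computes the weighted mean, the variance, and the covariance matrix of a small set of points. It assumes the standard basis with the usual Euclidean scalar product, so that points and coordinates coincide and the variance reduces to the plain trace of the covariance matrix. The function names and the sample data are illustrative and are not part of Novacta.Analytics.

```python
def weighted_mean(points, weights):
    """Return the weighted mean point of the given coordinate rows."""
    dim = len(points[0])
    return [sum(w * p[k] for p, w in zip(points, weights)) for k in range(dim)]


def weighted_covariance(points, weights):
    """Return the weighted covariance matrix of the coordinate columns."""
    mean = weighted_mean(points, weights)
    dim = len(mean)
    return [
        [
            sum(w * (p[j] - mean[j]) * (p[k] - mean[k])
                for p, w in zip(points, weights))
            for k in range(dim)
        ]
        for j in range(dim)
    ]


def weighted_variance(points, weights):
    """Return the weighted sum of squared Euclidean distances from the mean."""
    mean = weighted_mean(points, weights)
    return sum(
        w * sum((x - m) ** 2 for x, m in zip(p, mean))
        for p, w in zip(points, weights)
    )


# Illustrative data: four points in the plane with elementary weights.
points = [[3.0, 1.0], [-1.0, 1.0], [1.0, 2.0], [1.0, 0.0]]
weights = [0.25, 0.25, 0.25, 0.25]

mean = weighted_mean(points, weights)
cov = weighted_covariance(points, weights)
variance = weighted_variance(points, weights)
# Under the standard basis the variance equals the trace of the covariance.
trace = sum(cov[k][k] for k in range(len(cov)))
```

With a non-standard basis, the squared distances would have to be computed through the scalar product induced by that basis, which this sketch does not cover.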
Principal projections
A set of weighted points in a vector space can be represented by different clouds, simply by selecting different bases for that space, but the basic statistics of the structure, its mean and variance, are the same irrespective of the cloud selected to analyze it. In contrast, the covariances are not coordinate-free, i.e. they depend on the basis chosen to represent the points. As a consequence, a question arises about which basis to use when observing the data and analyzing their variability. Is it possible to select a basis that improves our understanding of the variance of the structure, clarifying how it can be explained given the available evidence?
A possible approach is to seek a basis such that the variables in the corresponding cloud become uncorrelated, thus simplifying the interpretation of the variance. In addition, if the structure is high-dimensional, another useful goal is to approximate the cloud in a lower dimensional space, defined so as to minimize the loss of information due to representing the structure in a reduced space. Such tasks can be approached by projecting the points of the structure along its principal directions[1].
When the points are projected along a direction in the space, the collection of points resulting from the projection defines a new projected structure, whose variance can contribute to the explanation of the variance of the original structure. The first principal direction is thus selected so that the variance of its projected structure is maximal. Associated with the first projected structure is a residual structure, whose points have a variance equal to the difference between the variance of the original structure and that of the first projected one, i.e. it measures the variability not yet explained by the first principal direction. If this measure is not considered sufficiently low, one determines the second principal direction by maximizing the variance of the corresponding projected structure under the constraint that the new direction is orthogonal to the first. From the second residual structure, one proceeds in the same way to determine a third principal direction, perpendicular to the previous ones, and so on, until enough principal directions have been determined to guarantee the required explanation of the variance.
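The sequential search described above can be sketched as follows. The code assumes the standard basis, so that each principal direction is a unit eigenvector of the covariance matrix, and it uses power iteration with deflation, a simple technique chosen here for illustration (it is not claimed to be the method used by the library). The covariance matrix is that of the four points (2, 0), (-2, 0), (0, 1), (0, -1) with elementary weights; all names are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def leading_direction(cov, iterations=200):
    """Approximate the unit vector that maximizes the projected variance."""
    dim = len(cov)
    # Arbitrary unit starting vector; power iteration would fail if it were
    # exactly orthogonal to the leading eigenvector.
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        v = mat_vec(cov, u)
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # no variance left to explain
            break
        u = [x / norm for x in v]
    variance = sum(x * y for x, y in zip(u, mat_vec(cov, u)))
    return u, variance


def principal_directions(cov, count):
    """Find `count` directions one at a time, deflating after each."""
    residual = [row[:] for row in cov]
    results = []
    for _ in range(count):
        u, variance = leading_direction(residual)
        results.append((u, variance))
        # Covariance of the residual structure: remove the explained part.
        residual = [
            [residual[j][k] - variance * u[j] * u[k] for k in range(len(u))]
            for j in range(len(u))
        ]
    return results


cov = [[2.0, 0.0], [0.0, 0.5]]
directions = principal_directions(cov, 2)
```

Each deflation step subtracts the variance explained by the direction just found, which mirrors the passage from a structure to its residual structure.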
Instantiation
The cloud initially used to represent the structure is referred to as the active cloud. Given a Cloud instance representing the active cloud, a corresponding PrincipalProjections object can be obtained by calling its GetPrincipalProjections method.
Projecting points onto lower dimensional spaces characterizes several multidimensional statistical methods, such as Principal Component Analysis, Correspondence Analysis, and Multiple Correspondence Analysis. The classes implementing these analyses provide methods to create the required PrincipalProjections instances, and properties to access them.
Once created, the PrincipalProjections instance gives access to the initial cloud via its ActiveCloud property.
The identification of the principal directions
of is provided by the eigendecomposition
of matrix
.
In fact,
let be the matrix
whose columns are the
normalized eigenvectors of w.r.t.
basis ,
that is satisfies the conditions
and
where is the diagonal matrix whose entries
are the corresponding
eigenvalues of in decreasing order.
Then the
principal directions of
satisfy
The rows of can be interpreted
as the coordinates w.r.t. basis
of the basis, say ,
whose matrix is given by
and the points in admit the following
coordinates matrix w.r.t. :
This argument suggests that a principal cloud of
can be defined as the cloud
.
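The link between principal directions and eigenvectors can be made explicit with a short derivation. The notation is an assumption of this sketch: $Q$ denotes the symmetric positive definite matrix that represents, in the coordinates of the active basis, the scalar product induced by that basis; $\Sigma$ is the covariance matrix of the active variables; and $u$ holds the coordinates of a candidate direction of unit norm. Since the coordinate of a centered point $c_i$ along $u$ is $c_i^T Q u$, the variance of the structure projected along $u$ is $u^T Q \Sigma Q u$, and maximizing it gives:

```latex
\begin{aligned}
&\max_{u}\; u^{T} Q \Sigma Q u
  \quad \text{subject to} \quad u^{T} Q u = 1, \\
&\nabla_{u}\left[ u^{T} Q \Sigma Q u
  - \lambda \left( u^{T} Q u - 1 \right) \right] = 0
  \;\Longrightarrow\; Q \Sigma Q u = \lambda Q u
  \;\Longrightarrow\; \Sigma Q u = \lambda u, \\
&u^{T} Q \Sigma Q u = \lambda\, u^{T} Q u = \lambda .
\end{aligned}
```

Under these assumptions, each stationary direction is an eigenvector of $\Sigma Q$, normalized w.r.t. the scalar product, and its eigenvalue equals the variance of the corresponding projected structure. The first principal direction is thus associated with the largest eigenvalue, the second with the next one, and so on.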
Approximations in lower dimensional spaces
The dimension of the active cloud is equal to that of the corresponding principal one. However, often not all the principal variables are taken into account: by keeping only the first few, a dimensionality reduction can be achieved while preserving, as far as possible, the original variance of the points.
A PrincipalProjections instance reports information about such variables. First of all, the number of retained principal variables is returned by property NumberOfDirections. Additional insights are exposed as follows.
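To illustrate the trade-off, the sketch below computes, for a list of principal variances sorted in decreasing order, the share of the total variance retained by the first directions, and picks the smallest number of directions that reaches a target share. The selection rule, the threshold, the sample variances, and the function names are assumptions made for illustration; the library applies its own criterion to decide how many directions it reports.

```python
def retained_shares(variances):
    """Return the cumulative share of variance kept by the first directions."""
    total = sum(variances)
    shares = []
    running = 0.0
    for value in variances:
        running += value
        shares.append(running / total)
    return shares


def directions_needed(variances, target_share):
    """Return the smallest number of leading directions reaching the target."""
    for count, share in enumerate(retained_shares(variances), start=1):
        if share >= target_share:
            return count
    return len(variances)


variances = [2.0, 0.5, 0.25, 0.25]  # illustrative principal variances
shares = retained_shares(variances)
needed = directions_needed(variances, 0.9)
```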
Variance breakdowns
The covariance matrix of a principal cloud is
characterized as follows:
hence the variance of can be
factorized as follows:
A finer factorization of is
obtained by taking
into account the specific contributions of each point in
to the overall variance.
Remember that the active and principal
clouds represent the same weighted structure , hence
the distances among
cloud points and the corresponding means are also preserved.
Thus one has
The quantity
can thus be interpreted as the amount of
the -th principal variable's variance
due to the -th point of the cloud, since
Such values can also be exploited to supply aids to the
interpretation of the principal cloud. In particular,
the relative contribution of the -th point
to the variance of the
-th principal variable is defined as the quantity
which is the generic entry of the
matrix returned by property Contributions,
while the quality of representation of the -th
point of
on the -th principal direction as
where is the angle between the
vectors and
. You can inspect the squared cosines
by getting property RepresentationQualities.
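The two interpretation aids described above can be sketched as follows, under definitions that are common in geometric data analysis: the relative contribution of point $i$ to principal variable $j$ is taken as $w_i y_{ij}^2 / \lambda_j$, and the representation quality as $y_{ij}^2$ divided by the squared distance of the point from the mean. Here $y_{ij}$ are centered principal coordinates and $\lambda_j$ the principal variances; these formulas, the function names, and the data are assumptions of the sketch, and the library may scale its results differently (for instance as percentages).

```python
def principal_variances(coords, weights):
    """Weighted variances of the (centered) principal variables."""
    dim = len(coords[0])
    return [sum(w * y[j] ** 2 for y, w in zip(coords, weights))
            for j in range(dim)]


def contributions(coords, weights):
    """Share of each principal variance that is due to each point."""
    variances = principal_variances(coords, weights)
    return [[w * y[j] ** 2 / variances[j] for j in range(len(y))]
            for y, w in zip(coords, weights)]


def representation_qualities(coords):
    """Squared cosine between each centered point and each direction.

    Assumes all principal coordinates are supplied and that no point
    coincides with the mean (which would give a zero distance).
    """
    result = []
    for y in coords:
        squared_distance = sum(v ** 2 for v in y)
        result.append([v ** 2 / squared_distance for v in y])
    return result


# Illustrative centered principal coordinates with elementary weights.
coords = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
weights = [0.25] * 4
contrib = contributions(coords, weights)
quality = representation_qualities(coords)
```

In this sketch the contributions to a given principal variable sum to one over the points, and the qualities of a given point sum to one over the directions when all of them are kept.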
Note |
---|
Directions are added as long as the corresponding projected variance is greater than 1e-6. |
Relationships between active and principal variables
Given an active cloud and a
corresponding principal
cloud , one can regress the
active variables on the first standardized
principal variables
as follows.
Let be the -th
column of , i.e.,
the -th active variable; furthermore, define
as the matrix representing the first
columns of , and
as the sub-matrix of
given by deleting its last
rows and columns.
Since the points are weighted, the regression can be achieved
by applying the principle of Weighted Least Squares to define the
following optimization
problem:
where, being a column vector of ones,
is the matrix of the first standardized
principal coordinates.
Thus, for ,
It can be noted that
hence one can define the following matrix
of regression coefficients:
Matrix can be analyzed by getting property
RegressionCoefficients.
The correlations between the -th active variable
and the
standardized principal variables can thus be obtained as follows:
hence the following matrix of correlations:
which is returned by property Correlations.
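As a sketch of these relationships, the code below regresses active variables on standardized principal variables. It relies on a consequence of the standardization: the standardized principal variables have zero weighted mean, unit weighted variance, and are mutually uncorrelated, so each weighted least squares slope reduces to the weighted covariance between the active variable and the standardized principal variable, and dividing the slope by the standard deviation of the active variable gives their correlation. The standard basis, the function names, and the data (principal coordinates rotated by 45 degrees to obtain active coordinates) are assumptions of the sketch.

```python
import math


def standardize(coords, weights):
    """Divide each centered principal variable by its standard deviation."""
    dim = len(coords[0])
    sd = [math.sqrt(sum(w * y[j] ** 2 for y, w in zip(coords, weights)))
          for j in range(dim)]
    return [[y[j] / sd[j] for j in range(dim)] for y in coords]


def regress_and_correlate(active, standardized, weights):
    """Return (coefficients, correlations), one row per active variable."""
    n_vars = len(active[0])
    n_dirs = len(standardized[0])
    coefficients, correlations = [], []
    for k in range(n_vars):
        mean = sum(w * x[k] for x, w in zip(active, weights))
        centered = [x[k] - mean for x in active]
        sd = math.sqrt(sum(w * c * c for c, w in zip(centered, weights)))
        # Slopes: weighted covariances with the standardized variables.
        row = [
            sum(w * c * z[j]
                for c, z, w in zip(centered, standardized, weights))
            for j in range(n_dirs)
        ]
        coefficients.append(row)
        correlations.append([b / sd for b in row])
    return coefficients, correlations


# Illustrative centered principal coordinates with elementary weights.
principal = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
weights = [0.25] * 4
c = s = math.sqrt(0.5)  # cosine and sine of 45 degrees
active = [[c * y1 - s * y2, s * y1 + c * y2] for y1, y2 in principal]

coefficients, correlations = regress_and_correlate(
    active, standardize(principal, weights), weights)
```

When all principal directions are kept, as in this example, the squared correlations of each active variable sum to one, which is a convenient sanity check.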