Note
Directions are added as long as the corresponding projected variance is greater than 1e-6.
public class PrincipalProjections
Public Class PrincipalProjections
public ref class PrincipalProjections
type PrincipalProjections = class end
Multidimensional weighted points and their statistics
Let us consider $n$ points $P_1, \dots, P_n$ in $\mathbb{R}^m$. Their statistical characteristics are usually analyzed by imposing a weighting scheme on them. That is, a relative weight $w_i$ (possibly elementary, i.e. $w_i = 1/n$) is assigned to point $P_i$ and controls how the point contributes to the overall statistics, provided that the weights sum up to $1$. Such a set of weighted points can thus be expressed by means of the pair
$$\mathcal{S} = (\mathbf{P}, \mathbf{w}),$$
referred to as a weighted multidimensional structure, where $\mathbf{P}$ is the $n \times m$ matrix whose $i$-th row is $P_i^T$ (the transpose of $P_i$), and $\mathbf{w}$ is the weighting scheme expressed as a sequence:
$$\mathbf{w} = (w_1, \dots, w_n), \qquad \sum_{i=1}^{n} w_i = 1.$$
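Given a basis $\mathcal{B} = (b_1, \dots, b_m)$ of $\mathbb{R}^m$, a structure $\mathcal{S}$ can be represented by a cloud, say $\mathcal{C}$, which can be formally defined as the triplet $\mathcal{C} = (\mathbf{X}, \mathbf{w}, \mathcal{B})$, where $\mathbf{X}$ is the coordinates matrix w.r.t. $\mathcal{B}$ of the points in $\mathcal{S}$, i.e., its $i$-th row, $\mathbf{x}_i^T$, stands for the coordinates of point $P_i$.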
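The basic statistics of $\mathcal{S}$ can be defined as follows. Its mean point, the vector $G$, whose coordinates w.r.t. $\mathcal{B}$ are given by $\bar{\mathbf{x}} = \sum_{i=1}^{n} w_i \mathbf{x}_i$, is
$$G = \sum_{i=1}^{n} w_i P_i.$$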
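Furthermore, the variance of $\mathcal{S}$, say $\mathrm{V}_{\mathcal{S}}$, is defined as follows:
$$\mathrm{V}_{\mathcal{S}} = \sum_{i=1}^{n} w_i \, d_{\mathcal{B}}^2(P_i, G),$$
where $d_{\mathcal{B}}(u, v) = \| u - v \|_{\mathcal{B}}$ is the distance induced by basis $\mathcal{B}$ via its norm
$$\| v \|_{\mathcal{B}} = \sqrt{\langle v, v \rangle_{\mathcal{B}}}$$
and scalar product
$$\langle u, v \rangle_{\mathcal{B}} = \sum_{j=1}^{m} u_j v_j,$$
where $u = \sum_{j=1}^{m} u_j b_j$ and $v = \sum_{j=1}^{m} v_j b_j$.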
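The columns of $\mathbf{X}$ are referred to as the active variables observed at the points w.r.t. basis $\mathcal{B}$. The covariance matrix of such variables can be defined as follows,
$$\mathbf{C} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T)^T \mathbf{D}_{\mathbf{w}} (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}^T),$$
where $\mathbf{1}$ is the $n$-dimensional column vector of ones and $\mathbf{D}_{\mathbf{w}} = \mathrm{diag}(w_1, \dots, w_n)$.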
Notice how the variance of $\mathcal{S}$ can be characterized in terms of the covariances w.r.t. $\mathcal{B}$,
$$\mathrm{V}_{\mathcal{S}} = \mathrm{tr}(\mathbf{C}),$$
with $\mathrm{tr}$ being the trace operator.
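These definitions can be checked numerically. The following minimal Python sketch (standard library only) computes the mean point coordinates, the variance and the covariance matrix of a small hypothetical cloud, and compares the variance with the trace of the covariance matrix. It assumes that coordinates are taken w.r.t. a basis whose induced scalar product is the plain dot product of coordinates; the helper names `mean_point`, `variance` and `covariance` are illustrative and not part of the library.

```python
# Sketch under an assumption: the scalar product is the one induced by the
# active basis, so squared distances are sums of squared coordinate differences.

def mean_point(X, w):
    """Coordinates of the mean point G: x_bar = sum_i w_i x_i."""
    return [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(len(X[0]))]


def variance(X, w):
    """V_S = sum_i w_i d^2(P_i, G)."""
    g = mean_point(X, w)
    return sum(wi * sum((a - b) ** 2 for a, b in zip(xi, g)) for wi, xi in zip(w, X))


def covariance(X, w):
    """C = (X - 1 x_bar^T)^T D_w (X - 1 x_bar^T), built entry by entry."""
    g = mean_point(X, w)
    m = len(g)
    return [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
             for b in range(m)] for a in range(m)]


# Hypothetical cloud: four points in R^2 and relative weights summing to 1.
X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]
w = [0.125, 0.25, 0.25, 0.375]

C = covariance(X, w)
trace_C = sum(C[j][j] for j in range(len(C)))
print("mean point coordinates:", mean_point(X, w))
print("variance:", variance(X, w), "trace of C:", trace_C)
```

The weights were chosen as exact binary fractions so that the small example is free of rounding surprises.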
Principal projections
A set of weighted points in a vector space can be represented by different clouds, simply by selecting different bases for that space, but the basic statistics of $\mathcal{S}$, its mean point and its variance, are the same irrespective of the cloud selected to analyze $\mathcal{S}$, provided that the bases are orthonormal with respect to the same scalar product. On the contrary, the covariances are not coordinate-free, i.e. they depend on the basis chosen to represent the points in $\mathcal{S}$.
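To illustrate the difference between coordinate-free statistics and covariances, the sketch below represents the same hypothetical weighted points w.r.t. the standard basis of $\mathbb{R}^2$ and w.r.t. that basis rotated by 30 degrees; both bases are orthonormal for the usual dot product, which is the assumption under which the comparison is meaningful. The variance and the length of the mean point do not change, while the covariance between the two variables does. The helper `stats` is hypothetical.

```python
import math


def stats(X, w):
    """Mean coordinates, variance and covariance matrix of a weighted cloud."""
    m = len(X[0])
    g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(m)]
    C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
          for b in range(m)] for a in range(m)]
    return g, sum(C[j][j] for j in range(m)), C


# Hypothetical data: coordinates w.r.t. the standard basis and relative weights.
X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]
w = [0.125, 0.25, 0.25, 0.375]

# Rotated orthonormal basis: its vectors are the columns of Q.
t = math.radians(30.0)
Q = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
# Coordinates w.r.t. the rotated basis: x' = Q^T x (Q is orthogonal).
X_rot = [[sum(Q[k][j] * xi[k] for k in range(2)) for j in range(2)] for xi in X]

g1, v1, C1 = stats(X, w)
g2, v2, C2 = stats(X_rot, w)
print("variances:", v1, v2)                       # equal up to rounding
print("|G|:", math.hypot(*g1), math.hypot(*g2))   # equal up to rounding
print("covariances:", C1[0][1], C2[0][1])         # different
```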
As a consequence, a question arises about the choice of the basis to apply when observing the data and trying to analyze their variability. Is it possible to select a basis that enhances our comprehension of the variance of $\mathcal{S}$, better figuring out how it can be explained given the available evidence?
A possible approach is that of seeking a basis such that the variables in the corresponding cloud become uncorrelated, thus simplifying the interpretation of $\mathrm{V}_{\mathcal{S}}$. In addition, if $\mathcal{S}$ is a high-dimensional structure, another useful goal would be that of approximating the cloud in a lower dimensional space, defined so as to minimize the loss of information due to representing $\mathcal{S}$ in a reduced space.
Such tasks can be approached by projecting the points in $\mathcal{S}$ along its principal directions[1]. When projecting the points along a direction $u$ in $\mathbb{R}^m$, the collection of points resulting from the projection defines a new projected structure, whose variance can contribute to the explanation of $\mathrm{V}_{\mathcal{S}}$.
The first principal direction of $\mathcal{S}$, say $u_1$, is thus selected so that the variance of its projected structure, say $\lambda_1$, is maximal.
With a first projected structure, there is associated a residual structure, say $\mathcal{R}_1$, whose points have a variance equal to the difference between $\mathrm{V}_{\mathcal{S}}$ and $\lambda_1$, i.e. it represents a measure of the variability not yet explained by the first principal direction.
If such a measure is not considered sufficiently low, one determines the second principal direction of $\mathcal{S}$, say $u_2$, obtained by maximizing the variance of the corresponding projected structure, say $\lambda_2$, under the constraint that $u_2$ is orthogonal to $u_1$.
From the second residual structure, $\mathcal{R}_2$, one proceeds in the same way to determine a third principal direction, perpendicular to the previous ones, and so on, until the determination of, say, $k$ principal directions able to guarantee the required explanation of the variance.
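The sequential procedure can be sketched as follows. Each direction is found by maximizing the projected variance $u^T \mathbf{A} u$ over unit vectors $u$, where $\mathbf{A}$ is the covariance matrix of the current residual structure, and the explained variance is then removed (deflation). Power iteration is used here as the maximizer, a swapped-in technique chosen for brevity and not necessarily the library's algorithm; coordinates are assumed to refer to an orthonormal basis, and all names are hypothetical. The 1e-6 stopping threshold mirrors the Note at the top of this page.

```python
import math


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_direction(A, iterations=1000):
    """Unit vector approximately maximizing u^T A u for a symmetric PSD A.

    Power iteration is started from every standard basis vector and the
    direction with the largest projected variance is kept, so that a start
    orthogonal to the optimal direction cannot mislead the search."""
    m = len(A)
    best, best_value = None, -math.inf
    for j in range(m):
        u = [float(k == j) for k in range(m)]
        for _ in range(iterations):
            v = mat_vec(A, u)
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:          # u lies in the null space of A
                break
            u = [x / norm for x in v]
        value = dot(u, mat_vec(A, u))
        if value > best_value:
            best, best_value = u, value
    return best, best_value


def principal_directions(C, threshold=1e-6):
    """Add directions while the projected variance exceeds the threshold."""
    m = len(C)
    A = [row[:] for row in C]        # covariance of the current residual structure
    directions, variances = [], []
    while len(directions) < m:
        u, lam = top_direction(A)    # lam: variance of the projected structure
        if lam <= threshold:
            break
        directions.append(u)
        variances.append(lam)
        # Residual structure: remove the variance explained along u.
        A = [[A[a][b] - lam * u[a] * u[b] for b in range(m)] for a in range(m)]
    return directions, variances


C = [[2.859375, 3.78125], [3.78125, 5.9375]]   # hypothetical covariance matrix
dirs, lams = principal_directions(C)
print("projected variances:", lams)
print("directions:", dirs)
```

With a rank-one covariance matrix such as `[[1.0, 1.0], [1.0, 1.0]]`, the second projected variance falls below the threshold and only one direction is returned.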
Instantiation
The cloud exploited to initially represent the structure $\mathcal{S}$ is referred to as the active cloud. Given a Cloud instance representing $\mathcal{S}$, a corresponding PrincipalProjections object can be obtained by calling its method GetPrincipalProjections.
Projecting points onto lower dimensional spaces characterizes several multidimensional statistical methods, such as Principal Component Analysis, Correspondence Analysis, and Multiple Correspondence Analysis. The classes implementing these methods provide methods to create the required PrincipalProjections instances, and properties to access them.
Once created, a PrincipalProjections instance gives access to the active cloud via its property ActiveCloud.
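The identification of the principal directions of $\mathcal{S}$ is provided by the eigendecomposition of matrix $\mathbf{C}$. In fact, let $\mathbf{U}$ be the $m \times m$ matrix whose columns are the normalized eigenvectors of $\mathbf{C}$ w.r.t. basis $\mathcal{B}$, that is, $\mathbf{U}$ satisfies the conditions
$$\mathbf{C}\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}$$
and
$$\mathbf{U}^T\mathbf{U} = \mathbf{I},$$
where $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$ is the diagonal matrix whose entries are the corresponding eigenvalues of $\mathbf{C}$ in decreasing order.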
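Then the principal directions $u_1, \dots, u_m$ of $\mathcal{S}$ satisfy
$$u_j = \sum_{h=1}^{m} U_{hj}\, b_h, \qquad j = 1, \dots, m.$$
The rows of $\mathbf{U}^T$ can be interpreted as the coordinates w.r.t. basis $\mathcal{B}$ of the basis, say $\mathcal{U} = (u_1, \dots, u_m)$, whose matrix is given by
$$\mathbf{U}^T\mathbf{B},$$
with $\mathbf{B}$ the matrix whose $h$-th row is $b_h^T$, and the points in $\mathcal{S}$ admit the following coordinates matrix w.r.t. $\mathcal{U}$:
$$\mathbf{Y} = \mathbf{X}\mathbf{U}.$$
This argument suggests that a principal cloud of $\mathcal{S}$ can be defined as the cloud $(\mathbf{Y}, \mathbf{w}, \mathcal{U})$.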
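The eigendecomposition route can be sketched in plain Python. The cyclic Jacobi method below is a standard eigensolver for symmetric matrices, used here only as an assumption since the library's own solver is not documented on this page. The sketch sorts the eigenvalues in decreasing order and then computes the principal coordinates $\mathbf{Y} = \mathbf{X}\mathbf{U}$ of a small hypothetical cloud, assuming an orthonormal active basis.

```python
import math


def jacobi_eigen(A, sweeps=100, tol=1e-22):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric matrix, computed by cyclic Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation chosen so that the rotated A has a zero (p, q) entry.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, U):      # M <- M J (columns p and q)
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):    # A <- J^T A (rows p and q)
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda j: A[j][j], reverse=True)
    return [A[j][j] for j in order], [[U[i][j] for j in order] for i in range(n)]


# Hypothetical active cloud: coordinates and relative weights.
X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]
w = [0.125, 0.25, 0.25, 0.375]
n, m = len(X), len(X[0])
xbar = [sum(w[i] * X[i][j] for i in range(n)) for j in range(m)]
C = [[sum(w[i] * (X[i][a] - xbar[a]) * (X[i][b] - xbar[b]) for i in range(n))
      for b in range(m)] for a in range(m)]

lam, U = jacobi_eigen(C)   # C U = U Lambda and U^T U = I
Y = [[sum(X[i][h] * U[h][j] for h in range(m)) for j in range(m)] for i in range(n)]
print("eigenvalues:", lam)
print("principal coordinates:", Y)
```

Because $\mathbf{U}$ is orthogonal, distances between points are the same in $\mathbf{X}$ and $\mathbf{Y}$; the signs of the eigenvectors, and hence of the principal coordinates, are arbitrary.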
Approximations in lower dimensional spaces
The dimension of the active cloud is equal to that of the corresponding principal one. However, often not all the principal variables are taken into account: by keeping only the first ones, say $k$, with $k \le m$, a dimensionality reduction can be achieved while preserving, as much as possible, the original variance of the points in $\mathcal{S}$.
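A minimal sketch of such a reduction, assuming hypothetical 2-dimensional data and an orthonormal active basis: each point is approximated by the mean point plus its projection on the first $k$ principal directions, and the weighted squared approximation error equals the sum of the discarded eigenvalues of $\mathbf{C}$, i.e. the part of the variance that is lost. The helper `eigen_2x2` is a hypothetical closed-form solver for symmetric $2 \times 2$ matrices.

```python
import math


def eigen_2x2(C):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    half_trace, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam = [half_trace + radius, half_trace - radius]
    cols = []
    for l in lam:
        v0, v1 = b, l - a            # (C - l I) v = 0 holds for v = (b, l - a)
        norm = math.hypot(v0, v1)
        cols.append((v0 / norm, v1 / norm))
    return lam, [[cols[j][i] for j in range(2)] for i in range(2)]


X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]   # hypothetical coordinates
w = [0.125, 0.25, 0.25, 0.375]                         # relative weights, sum 1
g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(2)]
C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
      for b in range(2)] for a in range(2)]
lam, U = eigen_2x2(C)

k = 1  # number of retained principal directions
# Approximate each point by G plus its projection on the first k directions.
approx = []
for xi in X:
    dev = [xi[j] - g[j] for j in range(2)]
    scores = [sum(dev[h] * U[h][j] for h in range(2)) for j in range(k)]
    approx.append([g[h] + sum(scores[j] * U[h][j] for j in range(k)) for h in range(2)])

lost = sum(wi * math.dist(xi, ai) ** 2 for wi, xi, ai in zip(w, X, approx))
print("lost variance:", lost, "discarded eigenvalues:", sum(lam[k:]))
```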
A PrincipalProjections instance reports information about such variables. First of all, $k$ is returned by property NumberOfDirections. Additional insights are exposed as follows.
Variance breakdowns
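The covariance matrix of a principal cloud is characterized as follows:
$$(\mathbf{Y} - \mathbf{1}\bar{\mathbf{y}}^T)^T \mathbf{D}_{\mathbf{w}} (\mathbf{Y} - \mathbf{1}\bar{\mathbf{y}}^T) = \mathbf{U}^T\mathbf{C}\,\mathbf{U} = \boldsymbol{\Lambda},$$
where $\bar{\mathbf{y}} = \mathbf{U}^T\bar{\mathbf{x}}$ contains the coordinates of $G$ w.r.t. $\mathcal{U}$, hence the variance of $\mathcal{S}$ can be factorized as follows:
$$\mathrm{V}_{\mathcal{S}} = \mathrm{tr}(\boldsymbol{\Lambda}) = \sum_{j=1}^{m} \lambda_j.$$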
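The factorization can be verified on a hypothetical 2-dimensional cloud: the covariance matrix of the principal coordinates is diagonal with the eigenvalues on its diagonal, and the shares $\lambda_j / \mathrm{V}_{\mathcal{S}}$ add up to one. This is a sketch assuming an orthonormal active basis; `eigen_2x2` and `shares` are hypothetical names.

```python
import math


def eigen_2x2(C):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    half_trace, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam = [half_trace + radius, half_trace - radius]
    cols = []
    for l in lam:
        v0, v1 = b, l - a            # (C - l I) v = 0 holds for v = (b, l - a)
        norm = math.hypot(v0, v1)
        cols.append((v0 / norm, v1 / norm))
    return lam, [[cols[j][i] for j in range(2)] for i in range(2)]


X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]   # hypothetical coordinates
w = [0.125, 0.25, 0.25, 0.375]                         # relative weights, sum 1
g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(2)]
C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
      for b in range(2)] for a in range(2)]
lam, U = eigen_2x2(C)

# Principal coordinates Y = X U and their covariance matrix.
Y = [[sum(xi[h] * U[h][j] for h in range(2)) for j in range(2)] for xi in X]
ybar = [sum(wi * yi[j] for wi, yi in zip(w, Y)) for j in range(2)]
C_Y = [[sum(wi * (yi[a] - ybar[a]) * (yi[b] - ybar[b]) for wi, yi in zip(w, Y))
        for b in range(2)] for a in range(2)]

V = C[0][0] + C[1][1]                  # V_S = tr(C)
shares = [l / V for l in lam]          # share of V_S explained by each principal variable
print("covariance of Y:", C_Y)         # diagonal up to rounding
print("explained shares:", shares)
```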
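A finer factorization of $\mathrm{V}_{\mathcal{S}}$ is obtained by taking into account the specific contributions of each point in $\mathcal{S}$ to the overall variance. Remember that the active and principal clouds represent the same weighted structure $\mathcal{S}$, hence the distances among cloud points and the corresponding means are also preserved. Thus one has
$$\mathrm{V}_{\mathcal{S}} = \sum_{i=1}^{n} w_i \|\mathbf{x}_i - \bar{\mathbf{x}}\|^2 = \sum_{i=1}^{n} w_i \|\mathbf{y}_i - \bar{\mathbf{y}}\|^2 = \sum_{i=1}^{n}\sum_{j=1}^{m} w_i (y_{ij} - \bar{y}_j)^2,$$
where $\mathbf{y}_i^T$ is the $i$-th row of $\mathbf{Y}$ and $\bar{\mathbf{y}} = \mathbf{U}^T\bar{\mathbf{x}}$ contains the coordinates of $G$ w.r.t. $\mathcal{U}$. The quantity $w_i (y_{ij} - \bar{y}_j)^2$ can thus be interpreted as the amount of the $j$-th principal variable's variance due to the $i$-th point of the cloud, since
$$\lambda_j = \sum_{i=1}^{n} w_i (y_{ij} - \bar{y}_j)^2.$$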
Such values can also be exploited to supply aids to the interpretation of the principal cloud. In particular, the relative contribution of the $i$-th point to the variance of the $j$-th principal variable is defined as the quantity
$$\mathrm{ctr}_{ij} = \frac{w_i (y_{ij} - \bar{y}_j)^2}{\lambda_j},$$
which is the generic entry of the matrix returned by property Contributions, while the quality of representation of the $i$-th point of $\mathcal{S}$ on the $j$-th principal direction is defined as
$$\cos^2 \theta_{ij} = \frac{(y_{ij} - \bar{y}_j)^2}{\|\mathbf{y}_i - \bar{\mathbf{y}}\|^2},$$
where $\theta_{ij}$ is the angle between the vectors $P_i - G$ and $u_j$. You can inspect the squared cosines by getting property RepresentationQualities.
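Both aids to interpretation can be computed directly from the principal coordinates. The sketch below assumes hypothetical 2-dimensional data, an orthonormal active basis and all principal directions retained ($k = m = 2$), so that the contributions to each principal variable sum to one over the points and the squared cosines of each point sum to one over the directions. The names `ctr`, `cos2` and `eigen_2x2` are hypothetical.

```python
import math


def eigen_2x2(C):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    half_trace, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam = [half_trace + radius, half_trace - radius]
    cols = []
    for l in lam:
        v0, v1 = b, l - a            # (C - l I) v = 0 holds for v = (b, l - a)
        norm = math.hypot(v0, v1)
        cols.append((v0 / norm, v1 / norm))
    return lam, [[cols[j][i] for j in range(2)] for i in range(2)]


X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]   # hypothetical coordinates
w = [0.125, 0.25, 0.25, 0.375]                         # relative weights, sum 1
g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(2)]
C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
      for b in range(2)] for a in range(2)]
lam, U = eigen_2x2(C)

Y = [[sum(xi[h] * U[h][j] for h in range(2)) for j in range(2)] for xi in X]
ybar = [sum(wi * yi[j] for wi, yi in zip(w, Y)) for j in range(2)]

# Relative contribution of point i to the variance of principal variable j.
ctr = [[wi * (yi[j] - ybar[j]) ** 2 / lam[j] for j in range(2)] for wi, yi in zip(w, Y)]

# Squared cosine between P_i - G and the j-th principal direction.
cos2 = []
for yi in Y:
    sq = [(yi[j] - ybar[j]) ** 2 for j in range(2)]
    cos2.append([s / sum(sq) for s in sq])

print("contributions:", ctr)
print("representation qualities:", cos2)
```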
Relationships between active and principal variables
Given an active cloud $\mathcal{C} = (\mathbf{X}, \mathbf{w}, \mathcal{B})$ and a corresponding principal cloud $(\mathbf{Y}, \mathbf{w}, \mathcal{U})$, one can regress the active variables on the first $k$ standardized principal variables as follows.
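Let $\mathbf{x}_{\cdot j}$ be the $j$-th column of $\mathbf{X}$, i.e., the $j$-th active variable; furthermore, define $\mathbf{Y}_k = \mathbf{X}\mathbf{U}_k$ as the matrix representing the first $k$ columns of $\mathbf{Y}$, with $\mathbf{U}_k$ the first $k$ columns of $\mathbf{U}$, and $\boldsymbol{\Lambda}_k$ as the sub-matrix of $\boldsymbol{\Lambda}$ given by deleting its last $m - k$ rows and columns. Since the points are weighted, the regression can be achieved by applying the principle of Weighted Least Squares to define the following optimization problem:
$$\min_{\boldsymbol{\beta} \in \mathbb{R}^k} \; (\mathbf{x}_{\cdot j} - \mathbf{Z}_k\boldsymbol{\beta})^T \mathbf{D}_{\mathbf{w}} (\mathbf{x}_{\cdot j} - \mathbf{Z}_k\boldsymbol{\beta}),$$
where, being $\mathbf{1}$ a column vector of $n$ ones,
$$\mathbf{Z}_k = (\mathbf{Y}_k - \mathbf{1}\bar{\mathbf{y}}_k^T)\,\boldsymbol{\Lambda}_k^{-1/2}, \qquad \bar{\mathbf{y}}_k = \mathbf{U}_k^T\bar{\mathbf{x}},$$
is the matrix of the first $k$ standardized principal coordinates.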
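Thus, for $j = 1, \dots, m$, the solution is
$$\hat{\boldsymbol{\beta}}_j = (\mathbf{Z}_k^T \mathbf{D}_{\mathbf{w}} \mathbf{Z}_k)^{-1} \mathbf{Z}_k^T \mathbf{D}_{\mathbf{w}}\, \mathbf{x}_{\cdot j}.$$
It can be noted that
$$\mathbf{Z}_k^T \mathbf{D}_{\mathbf{w}} \mathbf{Z}_k = \boldsymbol{\Lambda}_k^{-1/2}\mathbf{U}_k^T \mathbf{C}\, \mathbf{U}_k \boldsymbol{\Lambda}_k^{-1/2} = \mathbf{I}_k,$$
hence one can define the following matrix of regression coefficients, whose $j$-th row is $\hat{\boldsymbol{\beta}}_j^T$:
$$\mathbf{R} = \mathbf{X}^T \mathbf{D}_{\mathbf{w}} \mathbf{Z}_k = \mathbf{U}_k \boldsymbol{\Lambda}_k^{1/2},$$
where the last equality follows from $\mathbf{1}^T\mathbf{D}_{\mathbf{w}}\mathbf{Z}_k = \mathbf{0}^T$ and $\mathbf{C}\mathbf{U}_k = \mathbf{U}_k\boldsymbol{\Lambda}_k$. Matrix $\mathbf{R}$ can be analyzed by getting property RegressionCoefficients.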
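The regression can be sketched as follows for $k = 1$, assuming hypothetical 2-dimensional data, an orthonormal active basis and a layout in which row $j$ of the coefficient matrix refers to the $j$-th active variable (an assumption of this sketch, not a statement about the matrix returned by the library). Because the standardized principal coordinates satisfy $\mathbf{Z}_k^T\mathbf{D}_{\mathbf{w}}\mathbf{Z}_k = \mathbf{I}_k$, the WLS normal equations reduce to a single weighted cross-product, whose result can be compared with $\mathbf{U}_k\boldsymbol{\Lambda}_k^{1/2}$. The names `eigen_2x2`, `ZtDZ` and `R` are hypothetical.

```python
import math


def eigen_2x2(C):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    half_trace, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam = [half_trace + radius, half_trace - radius]
    cols = []
    for l in lam:
        v0, v1 = b, l - a            # (C - l I) v = 0 holds for v = (b, l - a)
        norm = math.hypot(v0, v1)
        cols.append((v0 / norm, v1 / norm))
    return lam, [[cols[j][i] for j in range(2)] for i in range(2)]


X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]   # hypothetical coordinates
w = [0.125, 0.25, 0.25, 0.375]                         # relative weights, sum 1
g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(2)]
C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
      for b in range(2)] for a in range(2)]
lam, U = eigen_2x2(C)

k = 1  # number of standardized principal variables used as regressors
Y = [[sum(xi[h] * U[h][j] for h in range(2)) for j in range(k)] for xi in X]   # Y_k = X U_k
ybar = [sum(wi * yi[j] for wi, yi in zip(w, Y)) for j in range(k)]
# Standardized principal coordinates: Z_k = (Y_k - 1 ybar_k^T) Lambda_k^{-1/2}.
Z = [[(yi[h] - ybar[h]) / math.sqrt(lam[h]) for h in range(k)] for yi in Y]

# Z_k^T D_w Z_k, expected to be the k x k identity matrix.
ZtDZ = [[sum(wi * zi[a] * zi[b] for wi, zi in zip(w, Z)) for b in range(k)] for a in range(k)]
# WLS coefficients: row j holds the coefficients for active variable j.
R = [[sum(wi * xi[j] * zi[h] for wi, xi, zi in zip(w, X, Z)) for h in range(k)]
     for j in range(2)]
print("Z^T D Z:", ZtDZ)
print("regression coefficients:", R)
```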
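The correlations between the $j$-th active variable and the standardized principal variables can thus be obtained as follows:
$$\rho_{jh} = \frac{\hat{\beta}_{jh}}{\sqrt{c_{jj}}}, \qquad h = 1, \dots, k,$$
where $c_{jj}$ is the $j$-th diagonal entry of $\mathbf{C}$ (each standardized principal variable has unit variance), hence the following matrix of correlations:
$$\mathbf{K} = \mathrm{diag}(c_{11}, \dots, c_{mm})^{-1/2}\, \mathbf{R},$$
which is returned by property Correlations.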
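A sketch of the correlation computation, under the same kind of assumptions (hypothetical 2-dimensional data, orthonormal active basis, row $j$ referring to the $j$-th active variable) and with $k = m = 2$: dividing the regression coefficients of each active variable by its standard deviation reproduces the weighted Pearson correlations computed directly, and, since all directions are kept, the squared correlations of each active variable sum to one. The names `weighted_corr`, `K` and `direct` are hypothetical.

```python
import math


def eigen_2x2(C):
    """Eigenvalues (decreasing) and unit eigenvectors (columns of U) of a
    symmetric 2x2 matrix whose off-diagonal entry is nonzero."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    half_trace, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam = [half_trace + radius, half_trace - radius]
    cols = []
    for l in lam:
        v0, v1 = b, l - a            # (C - l I) v = 0 holds for v = (b, l - a)
        norm = math.hypot(v0, v1)
        cols.append((v0 / norm, v1 / norm))
    return lam, [[cols[j][i] for j in range(2)] for i in range(2)]


def weighted_corr(a, b, w):
    """Weighted Pearson correlation between two variables."""
    ma = sum(wi * ai for wi, ai in zip(w, a))
    mb = sum(wi * bi for wi, bi in zip(w, b))
    cov = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b))
    va = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a))
    vb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b))
    return cov / math.sqrt(va * vb)


X = [[1.0, 2.0], [3.0, 3.0], [4.0, 7.0], [6.0, 8.0]]   # hypothetical coordinates
w = [0.125, 0.25, 0.25, 0.375]                         # relative weights, sum 1
g = [sum(wi * xi[j] for wi, xi in zip(w, X)) for j in range(2)]
C = [[sum(wi * (xi[a] - g[a]) * (xi[b] - g[b]) for wi, xi in zip(w, X))
      for b in range(2)] for a in range(2)]
lam, U = eigen_2x2(C)

k = 2
Y = [[sum(xi[h] * U[h][j] for h in range(2)) for j in range(k)] for xi in X]
ybar = [sum(wi * yi[j] for wi, yi in zip(w, Y)) for j in range(k)]
Z = [[(yi[h] - ybar[h]) / math.sqrt(lam[h]) for h in range(k)] for yi in Y]
R = [[sum(wi * xi[j] * zi[h] for wi, xi, zi in zip(w, X, Z)) for h in range(k)]
     for j in range(2)]

# Correlations: divide row j of R by the standard deviation of active variable j.
K = [[R[j][h] / math.sqrt(C[j][j]) for h in range(k)] for j in range(2)]
direct = [[weighted_corr([xi[j] for xi in X], [zi[h] for zi in Z], w) for h in range(k)]
          for j in range(2)]
print("correlations:", K)
```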
Properties
Name | Description |
ActiveCloud | Gets the active cloud of this instance. |
Contributions | Gets the relative contributions of the projected points to the variances of the principal variables. |
Coordinates | Gets the principal coordinates of the projected points. |
Correlations | Gets the correlations between the active variables and the standardized principal variables. |
Directions | Gets the coordinates of the principal directions w.r.t. the basis of the ActiveCloud. |
NumberOfDirections | Gets the number of principal directions. |
RegressionCoefficients | Gets the coefficients of the regression of each active variable on the standardized principal variables. |
RepresentationQualities | Gets the point representation qualities on each principal direction. |
Variances | Gets the variances of the principal variables. |
Methods
Name | Description |
CorrelateSupplementaryVariables(DoubleMatrix) | Gets the correlations of each specified supplementary variable with the standardized principal variables. |
CorrelateSupplementaryVariables(ReadOnlyDoubleMatrix) | Gets the correlations of each specified supplementary variable with the standardized principal variables. |
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object) |
GetHashCode | Serves as the default hash function. (Inherited from Object) |
GetType | Gets the Type of the current instance. (Inherited from Object) |
LocateSupplementaryPoints(DoubleMatrix) | Gets the principal coordinates of the specified supplementary points given their active coordinates. |
LocateSupplementaryPoints(ReadOnlyDoubleMatrix) | Gets the principal coordinates of the specified supplementary points given their active coordinates. |
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object) |
RegressSupplementaryVariables(DoubleMatrix) | Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables. |
RegressSupplementaryVariables(ReadOnlyDoubleMatrix) | Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables. |
ToString | Returns a string that represents the current object. (Inherited from Object) |