PrincipalProjections Class

Represents the principal projections of a set of multidimensional, weighted points, and supports their elaboration in terms of directions, coordinates, variances, and point contributions to the overall variability.
Inheritance Hierarchy

Namespace:  Novacta.Analytics
Assembly:  Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.0.0
Syntax
public class PrincipalProjections

The PrincipalProjections type exposes the following members.

Properties
Name (kind): Description
ActiveCloud (public property): Gets the active cloud of this instance.
Contributions (public property): Gets the relative contributions of the projected points to the variances of the principal variables.
Coordinates (public property): Gets the principal coordinates of the projected points.
Correlations (public property): Gets the correlations among the active variables and the standardized principal variables.
Directions (public property): Gets the coordinates of the principal directions w.r.t. the basis of the ActiveCloud.
NumberOfDirections (public property): Gets the number of principal directions.
RegressionCoefficients (public property): Gets the coefficients of the regression of each active variable on the standardized principal variables.
RepresentationQualities (public property): Gets the point representation qualities on each principal direction.
Variances (public property): Gets the variances of the principal variables.
Methods
Name (kind): Description
CorrelateSupplementaryVariables(DoubleMatrix) (public method): Gets the correlations of each specified supplementary variable with the standardized principal variables.
CorrelateSupplementaryVariables(ReadOnlyDoubleMatrix) (public method): Gets the correlations of each specified supplementary variable with the standardized principal variables.
Equals (public method): Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize (protected method): Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetHashCode (public method): Serves as the default hash function. (Inherited from Object.)
GetType (public method): Gets the Type of the current instance. (Inherited from Object.)
LocateSupplementaryPoints(DoubleMatrix) (public method): Gets the principal coordinates of the specified supplementary points given their active coordinates.
LocateSupplementaryPoints(ReadOnlyDoubleMatrix) (public method): Gets the principal coordinates of the specified supplementary points given their active coordinates.
MemberwiseClone (protected method): Creates a shallow copy of the current Object. (Inherited from Object.)
RegressSupplementaryVariables(DoubleMatrix) (public method): Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
RegressSupplementaryVariables(ReadOnlyDoubleMatrix) (public method): Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
ToString (public method): Returns a string that represents the current object. (Inherited from Object.)
Remarks

Multidimensional weighted points and their statistics

Let us consider LaTeX equation points LaTeX equation in LaTeX equation. Their statistical characteristics are usually analyzed by imposing a weighting scheme on them: a relative weight LaTeX equation (possibly elementary, i.e. LaTeX equation) is assigned to point LaTeX equation and controls how the point contributes to the overall statistics, provided that the weights sum up to LaTeX equation. Such a set of weighted points can thus be expressed by means of the pair LaTeX equation, referred to as a weighted multidimensional structure, where

LaTeX equation

is the LaTeX equation matrix whose LaTeX equation-th row is LaTeX equation (the transpose of LaTeX equation), and LaTeX equation is the weighting scheme expressed as a sequence:

LaTeX equation

Given a basis LaTeX equation of LaTeX equation, a structure LaTeX equation can be represented by a cloud, say LaTeX equation, which can be formally defined as the triplet LaTeX equation, where

LaTeX equation

is the coordinates matrix w.r.t. LaTeX equation of the points in LaTeX equation, i.e., its LaTeX equation-th row, LaTeX equation, stands for the coordinates of point LaTeX equation.

The basic statistics of LaTeX equation can be defined as follows. Its mean point, the vector LaTeX equation, whose LaTeX equation coordinates are given by LaTeX equation, is

LaTeX equation

Furthermore, the variance of LaTeX equation, say LaTeX equation, is defined as follows:

LaTeX equation

where

LaTeX equation

is the distance induced by basis LaTeX equation via its norm

LaTeX equation

and scalar product

LaTeX equation

where LaTeX equation and

LaTeX equation

The columns of LaTeX equation are referred to as the active variables observed at the LaTeX equation points w.r.t. basis LaTeX equation. The covariance matrix of such variables can be defined as follows,

LaTeX equation

where

LaTeX equation

Notice how the variance of LaTeX equation can be characterized in terms of the covariances w.r.t. LaTeX equation,

LaTeX equation

with LaTeX equation being the trace operator.
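The statistics above can be illustrated with a minimal sketch. It assumes the standard basis, so that the distance is the ordinary Euclidean one, and it uses plain arrays instead of the library's matrix types. The class name WeightedCloudStatistics and its methods are hypothetical and are not part of Novacta.Analytics.

```csharp
public static class WeightedCloudStatistics
{
    // Mean point: m[k] = sum_i w[i] * x[i, k].
    public static double[] Mean(double[,] x, double[] w)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        var mean = new double[p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < p; k++)
                mean[k] += w[i] * x[i, k];
        return mean;
    }

    // Covariance matrix:
    // s[k, l] = sum_i w[i] * (x[i, k] - m[k]) * (x[i, l] - m[l]).
    public static double[,] Covariance(double[,] x, double[] w)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        double[] m = Mean(x, w);
        var s = new double[p, p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < p; k++)
                for (int l = 0; l < p; l++)
                    s[k, l] += w[i] * (x[i, k] - m[k]) * (x[i, l] - m[l]);
        return s;
    }

    // Variance of the cloud: weighted mean of the squared Euclidean
    // distances of the points from the mean point.
    public static double Variance(double[,] x, double[] w)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        double[] m = Mean(x, w);
        double variance = 0.0;
        for (int i = 0; i < n; i++)
        {
            double squaredDistance = 0.0;
            for (int k = 0; k < p; k++)
                squaredDistance += (x[i, k] - m[k]) * (x[i, k] - m[k]);
            variance += w[i] * squaredDistance;
        }
        return variance;
    }

    // Sum of the diagonal entries of a square matrix.
    public static double Trace(double[,] a)
    {
        double trace = 0.0;
        for (int k = 0; k < a.GetLength(0); k++)
            trace += a[k, k];
        return trace;
    }
}
```

For the four points (0, 0), (2, 0), (0, 2), (2, 2) with equal weights 0.25, both Variance and Trace applied to Covariance give 2, in agreement with the trace characterization of the variance.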

Principal projections

A set of weighted points in a vector space can be represented by different clouds, simply by selecting different bases for that space, but the basic statistics of LaTeX equation, its mean and variance, are the same irrespective of the cloud selected to analyze LaTeX equation. On the contrary, the covariances are not coordinate-free, i.e. they depend on the basis chosen to represent the points in LaTeX equation. As a consequence, a question arises about which basis to choose when observing the data and analyzing their variability. Is it possible to select a basis that improves our understanding of the variance of LaTeX equation, clarifying how it can be explained given the available evidence?

A possible approach is to seek a basis such that the variables in the corresponding cloud become uncorrelated, thus simplifying the interpretation of LaTeX equation. In addition, if LaTeX equation is a high-dimensional structure, another useful goal is to approximate the cloud in a lower dimensional space, defined so as to minimize the loss of information due to a representation of LaTeX equation obtained in a reduced space. Such tasks can be approached by projecting the points in LaTeX equation along its principal directions [1].

When projecting the LaTeX equation points along a direction in LaTeX equation, the collection of points resulting from the projection defines a new projected structure, whose variance can contribute to the explanation of LaTeX equation. The first principal direction of LaTeX equation, say LaTeX equation, is thus selected so that the variance of its projected structure, say LaTeX equation, is maximal. Associated with the first projected structure is a residual structure, say LaTeX equation, whose variance is the difference between LaTeX equation and LaTeX equation, i.e. it measures the LaTeX equation variability not yet explained by the first principal direction. If this measure is not considered sufficiently low, one determines the second principal direction of LaTeX equation, say LaTeX equation, obtained by maximizing the variance of LaTeX equation under the constraint that LaTeX equation is orthogonal to LaTeX equation. From the second residual structure LaTeX equation, one proceeds in the same way to determine a third principal direction, perpendicular to the previous ones, and so on, until the determination of, say, LaTeX equation principal directions able to guarantee the required explanation of the LaTeX equation variance.
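The split between projected and residual variance can be sketched as follows. The example assumes the standard basis and a direction given as a unit vector, so that the projection is the usual orthogonal one. ProjectedVarianceSketch is a hypothetical helper written for illustration only.

```csharp
public static class ProjectedVarianceSketch
{
    // Variance of the points projected along the unit vector u:
    // sum_i w[i] * ((x_i - m) . u)^2, with m the weighted mean point.
    public static double Projected(double[,] x, double[] w, double[] u)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        var m = new double[p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < p; k++)
                m[k] += w[i] * x[i, k];

        double variance = 0.0;
        for (int i = 0; i < n; i++)
        {
            double score = 0.0;
            for (int k = 0; k < p; k++)
                score += (x[i, k] - m[k]) * u[k];
            variance += w[i] * score * score;
        }
        return variance;
    }

    // Total variance, obtained by summing the variances projected
    // along the coordinate axes.
    public static double Total(double[,] x, double[] w)
    {
        int p = x.GetLength(1);
        double total = 0.0;
        for (int k = 0; k < p; k++)
        {
            var axis = new double[p];
            axis[k] = 1.0;
            total += Projected(x, w, axis);
        }
        return total;
    }

    // Variance of the residual structure: what direction u leaves unexplained.
    public static double Residual(double[,] x, double[] w, double[] u)
    {
        return Total(x, w) - Projected(x, w, u);
    }
}
```

For the points (-1, 0), (1, 0), (0, -0.5), (0, 0.5) with equal weights, the total variance is 0.625. The direction (1, 0) explains 0.5 of it and leaves a residual of 0.125, while the direction (0, 1) would explain only 0.125, which is why (1, 0) is preferred as the first direction here.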

Instantiation

The cloud LaTeX equation exploited to initially represent the structure LaTeX equation is referred to as the active cloud. Given a Cloud instance representing LaTeX equation, a corresponding PrincipalProjections object can be obtained by calling its GetPrincipalProjections method.

Projecting points in lower dimensional spaces characterizes several multidimensional statistical methods, such as the Principal Components Analysis, the Correspondence Analysis, and the Multiple Correspondence Analysis. All the corresponding classes provide methods to create the required PrincipalProjections instances, and properties to access them.

Once created, the PrincipalProjections instance gives access to the initial cloud via its ActiveCloud property.
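The following sketch shows how an existing instance might be inspected. It relies only on the member names listed on this page. The property types are not stated here, so values are printed through string interpolation, and the output depends on each type's ToString implementation. PrincipalProjectionsReport is a hypothetical helper, and it is assumed that DoubleMatrix lives in the Novacta.Analytics namespace. The instance passed in would come from a Cloud's GetPrincipalProjections method, as described above.

```csharp
using System;
using Novacta.Analytics;

public static class PrincipalProjectionsReport
{
    // Prints the main results exposed by a PrincipalProjections instance
    // and locates some supplementary points on the principal directions.
    public static void Print(
        PrincipalProjections projections,
        DoubleMatrix supplementaryPoints)
    {
        Console.WriteLine($"Number of directions: {projections.NumberOfDirections}");
        Console.WriteLine($"Variances:\n{projections.Variances}");
        Console.WriteLine($"Directions:\n{projections.Directions}");
        Console.WriteLine($"Coordinates:\n{projections.Coordinates}");
        Console.WriteLine($"Contributions:\n{projections.Contributions}");
        Console.WriteLine($"Representation qualities:\n{projections.RepresentationQualities}");

        // Supplementary points are given by their active coordinates,
        // i.e. w.r.t. the basis of the active cloud.
        var located = projections.LocateSupplementaryPoints(supplementaryPoints);
        Console.WriteLine($"Supplementary principal coordinates:\n{located}");
    }
}
```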

The identification of the principal directions of LaTeX equation is provided by the eigendecomposition of matrix LaTeX equation. In fact, let LaTeX equation be the LaTeX equation matrix whose columns are the normalized eigenvectors of LaTeX equation w.r.t. basis LaTeX equation, that is LaTeX equation satisfies the conditions

LaTeX equation

and

LaTeX equation

where LaTeX equation is the diagonal matrix whose entries are the corresponding eigenvalues of LaTeX equation in decreasing order. Then the principal directions LaTeX equation of LaTeX equation satisfy

LaTeX equation

The rows of LaTeX equation can be interpreted as the coordinates w.r.t. basis LaTeX equation of the basis, say LaTeX equation, whose matrix is given by

LaTeX equation

and the points in LaTeX equation admit the following coordinates matrix w.r.t. LaTeX equation:

LaTeX equation

This argument suggests that a principal cloud of LaTeX equation can be defined as the cloud LaTeX equation.
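As an illustration of how principal directions relate to eigenvectors, the sketch below approximates the dominant eigenpair of a covariance matrix by power iteration and then deflates the matrix to reach the next one. It assumes the standard basis and is not the algorithm used by the library, which is not described on this page. PrincipalDirectionsSketch is a hypothetical name.

```csharp
using System;

public static class PrincipalDirectionsSketch
{
    // Dominant eigenpair of a symmetric positive semidefinite matrix s,
    // approximated by power iteration from a fixed starting vector.
    // The method fails if the starting vector happens to be orthogonal
    // to the dominant eigenvector; a production solver would use a
    // dedicated symmetric eigensolver instead.
    public static (double Value, double[] Vector) DominantEigenpair(
        double[,] s, int iterations = 500)
    {
        int p = s.GetLength(0);
        var v = new double[p];
        for (int k = 0; k < p; k++)
            v[k] = 1.0 / Math.Sqrt(p);

        for (int it = 0; it < iterations; it++)
        {
            double[] next = Multiply(s, v);
            double norm = Math.Sqrt(Dot(next, next));
            if (norm == 0.0)
                break; // v lies in the null space of s
            for (int k = 0; k < p; k++)
                v[k] = next[k] / norm;
        }

        // Rayleigh quotient of the unit vector v: the variance along v.
        return (Dot(v, Multiply(s, v)), v);
    }

    // Deflation: s - value * v * v^T removes the variance already explained
    // along v, so that the next dominant eigenvector is orthogonal to v.
    public static double[,] Deflate(double[,] s, double value, double[] v)
    {
        int p = s.GetLength(0);
        var d = new double[p, p];
        for (int k = 0; k < p; k++)
            for (int l = 0; l < p; l++)
                d[k, l] = s[k, l] - value * v[k] * v[l];
        return d;
    }

    private static double[] Multiply(double[,] a, double[] v)
    {
        int p = v.Length;
        var result = new double[p];
        for (int k = 0; k < p; k++)
            for (int l = 0; l < p; l++)
                result[k] += a[k, l] * v[l];
        return result;
    }

    private static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int k = 0; k < a.Length; k++)
            sum += a[k] * b[k];
        return sum;
    }
}
```

Calling DominantEigenpair on a covariance matrix, then again on the result of Deflate, yields the first two principal variances in decreasing order together with mutually orthogonal directions.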

Approximations in lower dimensional spaces

The dimension of the active cloud is equal to that of the corresponding principal one. However, often not all the LaTeX equation principal variables are taken into account: by keeping only some of the first ones, say LaTeX equation, a dimensionality reduction can be achieved, while simultaneously preserving - as much as possible - the original variance of the points in LaTeX equation.

A PrincipalProjections instance reports information about such LaTeX equation variables. First of all, LaTeX equation is returned by property NumberOfDirections. Additional insights are exposed as follows.

Variance breakdowns

The covariance matrix of a principal cloud is characterized as follows:

LaTeX equation

hence the variance of LaTeX equation can be factorized as follows:

LaTeX equation
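Assuming the factorization above is the usual one, in which the variance of the structure is the sum of the variances of the principal variables, the share of variance retained by the first directions can be computed as in this sketch. ExplainedVarianceSketch is a hypothetical helper. The total variance is passed separately because only some of the principal variances may be available.

```csharp
public static class ExplainedVarianceSketch
{
    // Cumulative share of the total variance explained by the first
    // principal directions, given their variances in decreasing order.
    public static double[] CumulativeShares(double[] variances, double totalVariance)
    {
        var shares = new double[variances.Length];
        double running = 0.0;
        for (int j = 0; j < variances.Length; j++)
        {
            running += variances[j];
            shares[j] = running / totalVariance;
        }
        return shares;
    }
}
```

For example, with a total variance of 5 and the first two principal variances equal to 3 and 1, the first direction retains 60% of the variance and the first two together retain 80%.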

A finer factorization of LaTeX equation is obtained by taking into account the specific contributions of each point in LaTeX equation to the overall variance. Remember that the active and principal clouds represent the same weighted structure LaTeX equation, hence the distances among cloud points and the corresponding means are also preserved. Thus one has

LaTeX equation

The quantity

LaTeX equation

can thus be interpreted as the amount of the LaTeX equation-th principal variable's variance due to the LaTeX equation-th point of the cloud, since

LaTeX equation

Such values can also be exploited to supply aids to the interpretation of the principal cloud. In particular, the relative contribution of the LaTeX equation-th point to the variance of the LaTeX equation-th principal variable is defined as the quantity

LaTeX equation

which is the generic entry of the matrix returned by property Contributions, while the quality of representation of the LaTeX equation-th point of LaTeX equation on the LaTeX equation-th principal direction is defined as

LaTeX equation

where LaTeX equation is the angle between the vectors LaTeX equation and LaTeX equation. You can inspect the squared cosines by getting property RepresentationQualities.
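A minimal sketch of both aids to interpretation follows. It assumes that the principal coordinates are already centered on the mean point and that all principal directions are included, so that the squared distance of a point from the mean is the sum of its squared principal coordinates. If only the first directions are kept, the full squared distance must be supplied instead. PointContributionsSketch is a hypothetical helper working on plain arrays.

```csharp
public static class PointContributionsSketch
{
    // Relative contribution of point i to the variance of principal
    // variable j: w[i] * y[i, j]^2 / variance_j, where
    // variance_j = sum_i w[i] * y[i, j]^2 is assumed to be positive.
    public static double[,] RelativeContributions(double[,] y, double[] w)
    {
        int n = y.GetLength(0), q = y.GetLength(1);
        var contributions = new double[n, q];
        for (int j = 0; j < q; j++)
        {
            double variance = 0.0;
            for (int i = 0; i < n; i++)
                variance += w[i] * y[i, j] * y[i, j];
            for (int i = 0; i < n; i++)
                contributions[i, j] = w[i] * y[i, j] * y[i, j] / variance;
        }
        return contributions;
    }

    // Squared cosine of the angle between point i, seen from the mean
    // point, and principal direction j: y[i, j]^2 / sum_k y[i, k]^2.
    public static double[,] SquaredCosines(double[,] y)
    {
        int n = y.GetLength(0), q = y.GetLength(1);
        var qualities = new double[n, q];
        for (int i = 0; i < n; i++)
        {
            double squaredDistance = 0.0;
            for (int k = 0; k < q; k++)
                squaredDistance += y[i, k] * y[i, k];
            if (squaredDistance == 0.0)
                continue; // the point coincides with the mean point
            for (int j = 0; j < q; j++)
                qualities[i, j] = y[i, j] * y[i, j] / squaredDistance;
        }
        return qualities;
    }
}
```

Each column of the contributions matrix sums to 1, and, under the assumptions above, so does each row of the squared cosines for any point that differs from the mean.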

Note
Directions are added as long as the corresponding projected variance is greater than 1e-6.

Relationships between active and principal variables

Given an active cloud LaTeX equation and a corresponding principal cloud LaTeX equation, one can regress the active variables on the first LaTeX equation standardized principal variables as follows. Let LaTeX equation be the LaTeX equation-th column of LaTeX equation, i.e., the LaTeX equation-th active variable; furthermore, define LaTeX equation as the matrix representing the first LaTeX equation columns of LaTeX equation, and LaTeX equation as the sub-matrix of LaTeX equation given by deleting its last LaTeX equation rows and columns. Since the points are weighted, the regression can be achieved by applying the principle of Weighted Least Squares to define the following optimization problem:

LaTeX equation

where, LaTeX equation being a column vector of LaTeX equation ones,

LaTeX equation

is the matrix of the first LaTeX equation standardized principal coordinates. Thus, for LaTeX equation,

LaTeX equation

It can be noted that

LaTeX equation

hence one can define the following LaTeX equation matrix of regression coefficients:

LaTeX equation

Matrix LaTeX equation can be analyzed by getting property RegressionCoefficients.

The correlations between the LaTeX equation-th active variable and the LaTeX equation standardized principal variables can thus be obtained as follows:

LaTeX equation

hence the following LaTeX equation matrix of correlations:

LaTeX equation

which is returned by property Correlations.
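The relationship between regression coefficients and correlations can be sketched with plain arrays. The standardized principal variables are centered, have unit weighted variance, and are mutually uncorrelated, so the Weighted Least Squares coefficient of an active variable on each of them reduces to their weighted covariance. Dividing that coefficient by the standard deviation of the active variable gives the correlation. VariableRelationsSketch is a hypothetical helper, not a library type.

```csharp
using System;

public static class VariableRelationsSketch
{
    // Weighted covariance between two variables observed at the same points.
    public static double Covariance(double[] a, double[] b, double[] w)
    {
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < w.Length; i++)
        {
            meanA += w[i] * a[i];
            meanB += w[i] * b[i];
        }
        double covariance = 0.0;
        for (int i = 0; i < w.Length; i++)
            covariance += w[i] * (a[i] - meanA) * (b[i] - meanB);
        return covariance;
    }

    // Coefficient of the regression of active variable x on the
    // standardized version of principal variable y: cov(x, y) / sd(y).
    public static double RegressionCoefficient(double[] x, double[] y, double[] w)
    {
        return Covariance(x, y, w) / Math.Sqrt(Covariance(y, y, w));
    }

    // Correlation between x and y (standardizing y does not change it):
    // the regression coefficient divided by sd(x).
    public static double Correlation(double[] x, double[] y, double[] w)
    {
        return RegressionCoefficient(x, y, w) / Math.Sqrt(Covariance(x, x, w));
    }
}
```

For an active variable with values 0, 4, 0, 4 and a principal variable with values -1, 1, -1, 1 under equal weights, the regression coefficient is 2 and the correlation is 1, since the active variable is an exact linear function of the principal one.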

Bibliography
[1] Le Roux, B. and Rouanet, H. (2004). Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Kluwer, Dordrecht.