PrincipalProjections Class

Represents the principal projections of a set of multidimensional, weighted points, and supports their elaboration in terms of directions, coordinates, variances, and point contributions to the overall variability.

Definition

Namespace: Novacta.Analytics
Assembly: Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.1.0+428f3840cfab98dda567bb0ed350b302533e273a

C#

public class PrincipalProjections

VB

Public Class PrincipalProjections

C++

public ref class PrincipalProjections

F#

type PrincipalProjections = class end

Inheritance: Object → PrincipalProjections

Derived: Novacta.Analytics.PrincipalComponents

Remarks

Multidimensional weighted points and their statistics

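Consider $n$ points $x_1, \dots, x_n$ in $\mathbb{R}^K$. Their statistical characteristics are usually analyzed by imposing a weighting scheme on them: a relative weight $w_i$ (possibly elementary, i.e. $w_i = 1/n$) is assigned to point $x_i$ and controls how the point contributes to the overall statistics, provided that the weights sum up to $1$. Such a set of weighted points can thus be expressed by means of the pair $S = (X, w)$, referred to as a weighted multidimensional structure, where

$$X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$$

is the matrix whose $i$-th row is $x_i^T$ (the transpose of $x_i$), and $w$ is the weighting scheme expressed as a sequence:

$$w = (w_1, \dots, w_n).$$

Given a basis $\mathcal{A} = (a_1, \dots, a_K)$ of $\mathbb{R}^K$, a structure $S$ can be represented by a cloud, say $C_{\mathcal{A}}$, which can be formally defined as the triplet $(\mathcal{A}, X_{\mathcal{A}}, w)$, where

$$X_{\mathcal{A}} = \begin{pmatrix} x_{1,\mathcal{A}}^T \\ \vdots \\ x_{n,\mathcal{A}}^T \end{pmatrix}$$

is the coordinates matrix w.r.t. $\mathcal{A}$ of the points in $S$, i.e., its $i$-th row, $x_{i,\mathcal{A}}^T$, stands for the coordinates of point $x_i$.

The basic statistics of $S$ can be defined as follows. Its mean point, the vector $m$, whose coordinates w.r.t. $\mathcal{A}$ are given by $m_{\mathcal{A}}$, is

$$m = \sum_{i=1}^{n} w_i\, x_i, \qquad m_{\mathcal{A}} = \sum_{i=1}^{n} w_i\, x_{i,\mathcal{A}}.$$

Furthermore, the variance of $S$, say $\mathrm{Var}(S)$, is defined as follows:

$$\mathrm{Var}(S) = \sum_{i=1}^{n} w_i\, d_{\mathcal{A}}^2(x_i, m),$$

where

$$d_{\mathcal{A}}(x, y) = \lVert x - y \rVert_{\mathcal{A}}$$

is the distance induced by basis $\mathcal{A}$ via its norm

$$\lVert x \rVert_{\mathcal{A}} = \sqrt{\langle x, x \rangle_{\mathcal{A}}}$$

and scalar product

$$\langle x, y \rangle_{\mathcal{A}} = x_{\mathcal{A}}^T\, Q_{\mathcal{A}}\, y_{\mathcal{A}},$$

where $x_{\mathcal{A}}$ and $y_{\mathcal{A}}$ are the coordinates of $x$ and $y$ w.r.t. $\mathcal{A}$, and

$$Q_{\mathcal{A}} = A^T A, \qquad A = (a_1 \; \cdots \; a_K).$$

The columns of $X_{\mathcal{A}}$ are referred to as the active variables observed at the $n$ points w.r.t. basis $\mathcal{A}$. The covariance matrix of such variables can be defined as follows,

$$\mathrm{Cov}(C_{\mathcal{A}}) = X_{\mathcal{A}}^T\, W\, X_{\mathcal{A}} - m_{\mathcal{A}}\, m_{\mathcal{A}}^T,$$

where

$$W = \mathrm{diag}(w_1, \dots, w_n).$$

Notice how the variance of $S$ can be characterized in terms of the covariances w.r.t. $\mathcal{A}$,

$$\mathrm{Var}(S) = \mathrm{tr}\bigl(\mathrm{Cov}(C_{\mathcal{A}})\, Q_{\mathcal{A}}\bigr),$$

with $\mathrm{tr}$ being the trace operator.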
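The statistics above can be checked numerically. The following is a minimal sketch in plain C# (top-level statements, C# 9 or later) that does not use the Novacta.Analytics API: the helper functions, the four data points, and the non-orthogonal basis are all hypothetical and chosen only for illustration. It computes the variance once as a weighted sum of squared distances and once as the trace of the covariance matrix times the Gram matrix of the basis.

```csharp
using System;

// Coordinates of the mean point: sum_i w_i * x_i.
static double[] Mean(double[][] x, double[] w)
{
    var m = new double[x[0].Length];
    for (int i = 0; i < x.Length; i++)
        for (int j = 0; j < m.Length; j++)
            m[j] += w[i] * x[i][j];
    return m;
}

// Gram matrix Q = A^T A of a basis matrix A (basis vectors stored as columns).
static double[][] Gram(double[][] a)
{
    int k = a[0].Length;
    var q = new double[k][];
    for (int r = 0; r < k; r++)
    {
        q[r] = new double[k];
        for (int c = 0; c < k; c++)
            for (int t = 0; t < a.Length; t++)
                q[r][c] += a[t][r] * a[t][c];
    }
    return q;
}

// Covariance matrix of the active variables: sum_i w_i (x_i - m)(x_i - m)^T.
static double[][] Covariance(double[][] x, double[] w)
{
    var m = Mean(x, w);
    var v = new double[m.Length][];
    for (int r = 0; r < m.Length; r++)
    {
        v[r] = new double[m.Length];
        for (int c = 0; c < m.Length; c++)
            for (int i = 0; i < x.Length; i++)
                v[r][c] += w[i] * (x[i][r] - m[r]) * (x[i][c] - m[c]);
    }
    return v;
}

// Variance: sum_i w_i * (x_i - m)^T Q (x_i - m).
static double Variance(double[][] x, double[] w, double[][] q)
{
    var m = Mean(x, w);
    double sum = 0.0;
    for (int i = 0; i < x.Length; i++)
        for (int r = 0; r < m.Length; r++)
            for (int c = 0; c < m.Length; c++)
                sum += w[i] * (x[i][r] - m[r]) * q[r][c] * (x[i][c] - m[c]);
    return sum;
}

// Trace of the matrix product V * Q.
static double TraceOfProduct(double[][] v, double[][] q)
{
    double trace = 0.0;
    for (int r = 0; r < v.Length; r++)
        for (int c = 0; c < v.Length; c++)
            trace += v[r][c] * q[c][r];
    return trace;
}

// Hypothetical data: four equally weighted points, given by their coordinates.
double[][] points =
{
    new[] { 0.0, 0.0 }, new[] { 2.0, 0.0 }, new[] { 0.0, 2.0 }, new[] { 2.0, 2.0 }
};
double[] weights = { 0.25, 0.25, 0.25, 0.25 };
// Hypothetical non-orthogonal basis: a_1 = (1, 0), a_2 = (1, 1), stored as columns.
double[][] basisMatrix = { new[] { 1.0, 1.0 }, new[] { 0.0, 1.0 } };

double[][] gram = Gram(basisMatrix);
double totalVariance = Variance(points, weights, gram);
double traceValue = TraceOfProduct(Covariance(points, weights), gram);
Console.WriteLine($"Var(S) = {totalVariance}, tr(Cov Q) = {traceValue}");
```

With these inputs both quantities equal 3, which illustrates the trace characterization of the variance for a basis that is not orthonormal.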

Principal projections

A set $S$ of weighted points in a vector space can be represented by different clouds, simply by selecting different bases for that space, but the basic statistics of $S$, its mean and variance, are the same irrespective of the cloud selected to analyze $S$. On the contrary, the covariances are not coordinate-free, i.e. they depend on the basis chosen to represent the points in $S$. As a consequence, a question arises about the choice of the basis to apply when observing the data and trying to analyze their variability. Is it possible to select a basis that enhances our comprehension of the variance of $S$, making clearer how it can be explained given the available evidence?

A possible approach is to seek a basis such that the variables in the corresponding cloud become uncorrelated, which simplifies the interpretation of $\mathrm{Var}(S)$. In addition, if $S$ is a high-dimensional structure, another useful goal is to approximate the cloud in a lower dimensional space, defined so as to minimize the loss of information due to representing $S$ in a reduced space. Such tasks can be approached by projecting the points in $S$ along its principal directions [1].

When projecting the $n$ points along a direction in $\mathbb{R}^K$, the collection of points resulting from the projection defines a new projected structure, whose variance can contribute to the explanation of $\mathrm{Var}(S)$. The first principal direction of $S$, say $\alpha_1$, is thus selected so that the variance of its projected structure, say $S_1$, is maximal. With a first projected structure, there is associated a residual structure, say $R_1$, whose points have a variance that is the difference between $\mathrm{Var}(S)$ and $\mathrm{Var}(S_1)$, i.e. it represents a measure of the variability not yet explained by the first principal direction. If such a measure is not considered sufficiently low, one determines the second principal direction of $S$, say $\alpha_2$, obtained by maximizing the variance of its projected structure $S_2$ under the constraint that $\alpha_2$ is orthogonal to $\alpha_1$. From the second residual structure $R_2$, one proceeds in the same way to determine a third principal direction, perpendicular to the previous ones, and so on, until the determination of, say, $L$ principal directions able to guarantee the required explanation of the variance.

Instantiation

The cloud $C_{\mathcal{A}}$ exploited to initially represent the structure $S$ is referred to as the active cloud. Given a Cloud instance representing $C_{\mathcal{A}}$, a corresponding PrincipalProjections object can be obtained by calling its GetPrincipalProjections method.

Projecting points in lower dimensional spaces characterizes several statistical multidimensional methods, such as Principal Component Analysis, Correspondence Analysis, and Multiple Correspondence Analysis. The classes implementing these methods provide methods to create the required PrincipalProjections instances, and properties to access them.

Once created, the PrincipalProjections instance gives access to the initial cloud via its property ActiveCloud.

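The identification of the principal directions of $S$ is provided by the eigendecomposition of matrix $\mathrm{Cov}(C_{\mathcal{A}})\, Q_{\mathcal{A}}$, the product of the covariance matrix of the active variables and the matrix $Q_{\mathcal{A}}$ that defines the scalar product induced by basis $\mathcal{A}$. In fact, let $U_{\mathcal{A}}$ be the matrix whose columns are the normalized eigenvectors of $\mathrm{Cov}(C_{\mathcal{A}})\, Q_{\mathcal{A}}$ w.r.t. basis $\mathcal{A}$, that is, $U_{\mathcal{A}}$ satisfies the conditions

$$\mathrm{Cov}(C_{\mathcal{A}})\, Q_{\mathcal{A}}\, U_{\mathcal{A}} = U_{\mathcal{A}}\, \Lambda$$

and

$$U_{\mathcal{A}}^T\, Q_{\mathcal{A}}\, U_{\mathcal{A}} = I_K,$$

where $\Lambda$ is the diagonal matrix whose entries are the corresponding eigenvalues of $\mathrm{Cov}(C_{\mathcal{A}})\, Q_{\mathcal{A}}$ in decreasing order. Then the principal directions of $S$ satisfy

$$\alpha_j = A\, u_{j,\mathcal{A}}, \qquad j = 1, \dots, K,$$

where $u_{j,\mathcal{A}}$ is the $j$-th column of $U_{\mathcal{A}}$ and $A$ is the matrix whose columns are the vectors of basis $\mathcal{A}$.

The rows of $U_{\mathcal{A}}^T$ can be interpreted as the coordinates w.r.t. basis $\mathcal{A}$ of the basis, say $\mathcal{P}$, whose matrix is given by

$$P = A\, U_{\mathcal{A}},$$

and the points in $S$ admit the following coordinates matrix w.r.t. $\mathcal{P}$:

$$X_{\mathcal{P}} = X_{\mathcal{A}}\, Q_{\mathcal{A}}\, U_{\mathcal{A}}.$$

This argument suggests that a principal cloud of $S$ can be defined as the cloud $C_{\mathcal{P}} = (\mathcal{P}, X_{\mathcal{P}}, w)$.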
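To make the eigendecomposition step concrete, here is a minimal sketch in plain C# (top-level statements, C# 9 or later) for the special case of two active variables and the standard basis, so that the matrix defining the scalar product is the identity. It relies on the closed-form eigendecomposition of a symmetric 2x2 matrix. This is an illustration under those assumptions, not the algorithm used by the library, and the `Decompose` helper and the data are hypothetical.

```csharp
using System;

// Eigendecomposition of the symmetric 2x2 matrix [[a, b], [b, c]].
// lambda holds the eigenvalues in decreasing order; the columns of u are unit eigenvectors.
static void Decompose(double a, double b, double c, out double[] lambda, out double[,] u)
{
    double center = 0.5 * (a + c);
    double radius = Math.Sqrt(0.25 * (a - c) * (a - c) + b * b);
    lambda = new[] { center + radius, center - radius };

    // Eigenvector associated with the largest eigenvalue.
    double u1, u2;
    if (b != 0.0) { u1 = lambda[0] - c; u2 = b; }
    else if (a >= c) { u1 = 1.0; u2 = 0.0; }
    else { u1 = 0.0; u2 = 1.0; }
    double norm = Math.Sqrt(u1 * u1 + u2 * u2);
    u1 /= norm;
    u2 /= norm;

    // The second eigenvector is orthogonal to the first one.
    u = new double[,] { { u1, -u2 }, { u2, u1 } };
}

// Hypothetical data: four equally weighted points w.r.t. the standard basis.
double[][] points =
{
    new[] { 0.0, 0.0 }, new[] { 1.0, 2.0 }, new[] { 2.0, 1.0 }, new[] { 3.0, 3.0 }
};
double[] weights = { 0.25, 0.25, 0.25, 0.25 };

// Weighted means and covariances of the two active variables.
double m1 = 0.0, m2 = 0.0;
for (int i = 0; i < points.Length; i++)
{
    m1 += weights[i] * points[i][0];
    m2 += weights[i] * points[i][1];
}
double v11 = 0.0, v12 = 0.0, v22 = 0.0;
for (int i = 0; i < points.Length; i++)
{
    double d1 = points[i][0] - m1, d2 = points[i][1] - m2;
    v11 += weights[i] * d1 * d1;
    v12 += weights[i] * d1 * d2;
    v22 += weights[i] * d2 * d2;
}

Decompose(v11, v12, v22, out double[] eigenvalues, out double[,] directions);

// Coordinates of the points on the first principal direction (identity scalar product).
var firstCoordinates = new double[points.Length];
double firstMean = 0.0;
for (int i = 0; i < points.Length; i++)
{
    firstCoordinates[i] = points[i][0] * directions[0, 0] + points[i][1] * directions[1, 0];
    firstMean += weights[i] * firstCoordinates[i];
}

// Their weighted variance should match the largest eigenvalue.
double firstVariance = 0.0;
for (int i = 0; i < points.Length; i++)
    firstVariance += weights[i] * (firstCoordinates[i] - firstMean) * (firstCoordinates[i] - firstMean);

double largest = eigenvalues[0], smallest = eigenvalues[1];
Console.WriteLine($"Eigenvalues: {largest}, {smallest}");
Console.WriteLine($"Variance along the first direction: {firstVariance}");
```

For these points the covariance matrix has diagonal entries 1.25 and off-diagonal entries 1, so the eigenvalues are 2.25 and 0.25 and the first direction lies along the bisector of the two axes.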

Approximations in lower dimensional spaces

The dimension of the active cloud is equal to that of the corresponding principal one. However, often not all the $K$ principal variables are taken into account: by keeping only some of the first ones, say $L \le K$, a dimensionality reduction can be achieved, while simultaneously preserving, as much as possible, the original variance of the points in $S$.

A PrincipalProjections instance reports information about such $L$ variables. First of all, $L$ is returned by property NumberOfDirections. Additional insights are exposed as follows.
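As an aside on how a number of directions might be chosen by an analyst, the sketch below computes the smallest number of leading principal variables whose cumulative share of the total variance reaches a target. This cumulative-share rule is a common heuristic and an assumption of this example; the source does not present it as the criterion applied by the library. The `DirectionsNeeded` helper and the variances are hypothetical, and the code is plain C# with top-level statements (C# 9 or later).

```csharp
using System;

// Smallest number of leading directions whose cumulative share of the
// total variance reaches the given target, a value in (0, 1].
// Assumes the variances are sorted in decreasing order.
static int DirectionsNeeded(double[] variances, double target)
{
    double total = 0.0;
    foreach (double v in variances) total += v;

    double cumulative = 0.0;
    for (int j = 0; j < variances.Length; j++)
    {
        cumulative += variances[j];
        if (cumulative / total >= target) return j + 1;
    }
    return variances.Length;
}

// Hypothetical principal variances, in decreasing order.
double[] principalVariances = { 2.25, 0.5, 0.2, 0.05 };
int needed = DirectionsNeeded(principalVariances, 0.9);
Console.WriteLine($"Directions needed for a 90% share: {needed}");
```

Here the first variable alone accounts for 75% of the total variance and the first two for about 92%, so two directions are enough for a 90% target.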

Variance breakdowns

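The covariance matrix of a principal cloud $C_{\mathcal{P}}$ is characterized as follows:

$$\mathrm{Cov}(C_{\mathcal{P}}) = \Lambda,$$

hence the variance of $S$ can be factorized as follows:

$$\mathrm{Var}(S) = \mathrm{tr}(\Lambda) = \sum_{j=1}^{K} \lambda_j,$$

where $\lambda_j$, the $j$-th diagonal entry of $\Lambda$, is the variance of the $j$-th principal variable.

A finer factorization of $\mathrm{Var}(S)$ is obtained by taking into account the specific contributions of each point in $S$ to the overall variance. Remember that the active and principal clouds represent the same weighted structure $S$, hence the distances between the cloud points and the corresponding means are also preserved. Thus one has

$$\mathrm{Var}(S) = \sum_{i=1}^{n} w_i\, d_{\mathcal{A}}^2(x_i, m) = \sum_{i=1}^{n} w_i\, d_{\mathcal{P}}^2(x_i, m) = \sum_{j=1}^{K} \sum_{i=1}^{n} w_i\, y_{i,j}^2,$$

where $y_{i,j}$ is the $j$-th coordinate of $x_i - m$ w.r.t. the principal basis $\mathcal{P}$, i.e. the centered $j$-th principal coordinate of the $i$-th point.

The quantity

$$w_i\, y_{i,j}^2$$

can thus be interpreted as the amount of the $j$-th principal variable's variance due to the $i$-th point of the cloud, since

$$\lambda_j = \sum_{i=1}^{n} w_i\, y_{i,j}^2.$$

Such values can also be exploited to aid the interpretation of the principal cloud. In particular, the relative contribution of the $i$-th point to the variance of the $j$-th principal variable is defined as the quantity

$$\frac{w_i\, y_{i,j}^2}{\lambda_j},$$

which is the generic entry of the matrix returned by property Contributions, while the quality of representation of the $i$-th point of $S$ on the $j$-th principal direction is defined as

$$\cos^2 \theta_{i,j} = \frac{y_{i,j}^2}{d_{\mathcal{A}}^2(x_i, m)},$$

where $\theta_{i,j}$ is the angle between the vectors $x_i - m$ and $\alpha_j$, the latter being the $j$-th principal direction. You can inspect the squared cosines by getting property RepresentationQualities.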
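The two indicators can be computed directly from centered principal coordinates. The following sketch, in plain C# with top-level statements (C# 9 or later), uses hypothetical coordinates of four equally weighted points on two principal directions. It assumes that all directions are kept, so that the squared distance of a point from the mean equals the sum of its squared centered coordinates, and that no point coincides with the mean. It does not call the library.

```csharp
using System;

// Hypothetical centered principal coordinates: four points, two directions.
double s = Math.Sqrt(2.0);
double[][] centered =
{
    new[] { -3.0 / s, 0.0 },
    new[] { 0.0, 1.0 / s },
    new[] { 0.0, -1.0 / s },
    new[] { 3.0 / s, 0.0 }
};
double[] weights = { 0.25, 0.25, 0.25, 0.25 };
int n = centered.Length;
int numberOfDirections = centered[0].Length;

// Variance of each principal variable: sum_i w_i * y_ij^2.
var variances = new double[numberOfDirections];
for (int i = 0; i < n; i++)
    for (int j = 0; j < numberOfDirections; j++)
        variances[j] += weights[i] * centered[i][j] * centered[i][j];

var contributions = new double[n][];
var qualities = new double[n][];
for (int i = 0; i < n; i++)
{
    // Squared distance of point i from the mean point (all directions kept).
    double squaredDistance = 0.0;
    for (int j = 0; j < numberOfDirections; j++)
        squaredDistance += centered[i][j] * centered[i][j];

    contributions[i] = new double[numberOfDirections];
    qualities[i] = new double[numberOfDirections];
    for (int j = 0; j < numberOfDirections; j++)
    {
        double squared = centered[i][j] * centered[i][j];
        // Relative contribution of point i to the variance of variable j.
        contributions[i][j] = weights[i] * squared / variances[j];
        // Squared cosine of the angle between the centered point and direction j.
        qualities[i][j] = squared / squaredDistance;
    }
}

double firstContribution = contributions[0][0];
double firstQuality = qualities[0][0];
Console.WriteLine($"Contribution of point 1 to direction 1: {firstContribution}");
Console.WriteLine($"Quality of point 1 on direction 1: {firstQuality}");
```

By construction the contributions in each column sum to one, and the squared cosines in each row sum to one when every direction is retained.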

Note

Directions are added as long as the corresponding projected variance is greater than 1e-6.
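One way to read this rule is sketched below in plain C# (top-level statements, C# 9 or later). The `CountRetainedDirections` helper and the variances are hypothetical, and the sketch assumes that the projected variances are examined in decreasing order. The library's internal implementation is not shown in this page and may differ.

```csharp
using System;

// Counts the leading directions whose projected variance exceeds the threshold.
static int CountRetainedDirections(double[] variances, double threshold)
{
    int count = 0;
    while (count < variances.Length && variances[count] > threshold)
        count++;
    return count;
}

// Hypothetical projected variances, in decreasing order.
double[] projectedVariances = { 2.25, 0.25, 3e-9, 0.0 };
int retained = CountRetainedDirections(projectedVariances, 1e-6);
Console.WriteLine($"Retained directions: {retained}");
```

With these values only the first two directions pass the 1e-6 threshold.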

Relationships between active and principal variables

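Given an active cloud $C_{\mathcal{A}}$ and a corresponding principal cloud $C_{\mathcal{P}}$, one can regress the active variables on the first $L$ standardized principal variables as follows. Let $x_{\mathcal{A},k}$ be the $k$-th column of $X_{\mathcal{A}}$, i.e., the $k$-th active variable; furthermore, define $X_{\mathcal{P},L}$ as the matrix representing the first $L$ columns of $X_{\mathcal{P}}$, and $\Lambda_L$ as the sub-matrix of $\Lambda$ given by deleting its last $K - L$ rows and columns. Since the points are weighted, the regression can be achieved by applying the principle of Weighted Least Squares to define the following optimization problem:

$$\min_{b_k \in \mathbb{R}^L} \; \bigl(x_{\mathcal{A},k} - Y^{*}_{L}\, b_k\bigr)^T\, W\, \bigl(x_{\mathcal{A},k} - Y^{*}_{L}\, b_k\bigr),$$

where $W = \mathrm{diag}(w_1, \dots, w_n)$ and, $1_n$ being a column vector of $n$ ones and $m_{\mathcal{P},L}$ the weighted mean of the rows of $X_{\mathcal{P},L}$,

$$Y^{*}_{L} = \bigl(X_{\mathcal{P},L} - 1_n\, m_{\mathcal{P},L}^T\bigr)\, \Lambda_L^{-1/2}$$

is the matrix of the first $L$ standardized principal coordinates. Thus, for $k = 1, \dots, K$,

$$\hat{b}_k = \bigl(Y^{*T}_{L}\, W\, Y^{*}_{L}\bigr)^{-1}\, Y^{*T}_{L}\, W\, x_{\mathcal{A},k}.$$

It can be noted that

$$Y^{*T}_{L}\, W\, Y^{*}_{L} = I_L,$$

hence one can define the following $K \times L$ matrix of regression coefficients, whose $k$-th row is $\hat{b}_k^T$:

$$B = X_{\mathcal{A}}^T\, W\, Y^{*}_{L}.$$

Matrix $B$ can be analyzed by getting property RegressionCoefficients.

The correlations between the $k$-th active variable and the standardized principal variables can thus be obtained as follows:

$$r_{k,l} = \frac{\hat{b}_{k,l}}{\sigma_k}, \qquad l = 1, \dots, L,$$

where $\sigma_k^2$ is the $k$-th diagonal entry of $\mathrm{Cov}(C_{\mathcal{A}})$, i.e. the variance of the $k$-th active variable, hence the following matrix of correlations:

$$R = \mathrm{diag}(\sigma_1, \dots, \sigma_K)^{-1}\, B,$$

which is returned by property Correlations.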
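The regression and correlation formulas reduce to weighted cross products when the principal variables are standardized. The sketch below, in plain C# with top-level statements (C# 9 or later), illustrates this on hypothetical data: four equally weighted points, two centered active variables, and two standardized principal variables with weighted mean 0 and weighted variance 1. It assumes the standard basis and does not call the library.

```csharp
using System;

double r2 = Math.Sqrt(2.0);
// Hypothetical centered active coordinates (4 points, 2 active variables).
double[][] active =
{
    new[] { -1.5, -1.5 }, new[] { -0.5, 0.5 }, new[] { 0.5, -0.5 }, new[] { 1.5, 1.5 }
};
// Matching standardized principal coordinates (2 principal variables).
double[][] standardized =
{
    new[] { -r2, 0.0 }, new[] { 0.0, r2 }, new[] { 0.0, -r2 }, new[] { r2, 0.0 }
};
double[] weights = { 0.25, 0.25, 0.25, 0.25 };
int n = active.Length;
int activeCount = active[0].Length;
int principalCount = standardized[0].Length;

// Regression coefficients: weighted cross products between active and
// standardized principal variables (one row per active variable).
var coefficients = new double[activeCount, principalCount];
// Weighted variances of the (already centered) active variables.
var activeVariances = new double[activeCount];
for (int i = 0; i < n; i++)
{
    for (int k = 0; k < activeCount; k++)
    {
        activeVariances[k] += weights[i] * active[i][k] * active[i][k];
        for (int l = 0; l < principalCount; l++)
            coefficients[k, l] += weights[i] * active[i][k] * standardized[i][l];
    }
}

// Correlations: each coefficient divided by the standard deviation
// of the corresponding active variable.
var correlations = new double[activeCount, principalCount];
for (int k = 0; k < activeCount; k++)
    for (int l = 0; l < principalCount; l++)
        correlations[k, l] = coefficients[k, l] / Math.Sqrt(activeVariances[k]);

double b11 = coefficients[0, 0], b12 = coefficients[0, 1];
double c11 = correlations[0, 0], c12 = correlations[0, 1];
Console.WriteLine($"Coefficients of variable 1: {b11}, {b12}");
Console.WriteLine($"Correlations of variable 1: {c11}, {c12}");
```

Because both principal variables are kept in this example, the squared correlations of each active variable sum to one.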

Properties

ActiveCloud	Gets the active cloud of this instance.
Contributions	Gets the relative contributions of the projected points to the variances of the principal variables.
Coordinates	Gets the principal coordinates of the projected points.
Correlations	Gets the correlations among the active variables and the standardized principal variables.
Directions	Gets the coordinates of the principal directions w.r.t. the basis of the ActiveCloud.
NumberOfDirections	Gets the number of principal directions.
RegressionCoefficients	Gets the coefficients of the regression of each active variable on the standardized principal variables.
RepresentationQualities	Gets the point representation qualities on each principal direction.
Variances	Gets the variances of the principal variables.

Methods

CorrelateSupplementaryVariables(DoubleMatrix)	Gets the correlations of each specified supplementary variable on the standardized principal variables.
CorrelateSupplementaryVariables(ReadOnlyDoubleMatrix)	Gets the correlations of each specified supplementary variable on the standardized principal variables.
Equals	Determines whether the specified object is equal to the current object. (Inherited from Object)
Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object)
GetHashCode	Serves as the default hash function. (Inherited from Object)
GetType	Gets the Type of the current instance. (Inherited from Object)
LocateSupplementaryPoints(DoubleMatrix)	Gets the principal coordinates of the specified supplementary points given their active coordinates.
LocateSupplementaryPoints(ReadOnlyDoubleMatrix)	Gets the principal coordinates of the specified supplementary points given their active coordinates.
MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object)
RegressSupplementaryVariables(DoubleMatrix)	Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
RegressSupplementaryVariables(ReadOnlyDoubleMatrix)	Gets the coefficients of the regression of each specified supplementary variable on the standardized principal variables.
ToString	Returns a string that represents the current object. (Inherited from Object)

Bibliography

[1] Le Roux, B. and Rouanet, H. (2004). Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Kluwer, Dordrecht.
