Clusters.Explain Method
Namespace: Novacta.Analytics
public static IndexCollection Explain( DoubleMatrix data, IndexPartition<double> partition, int numberOfExplanatoryFeatures )
Exception | Condition |
---|---|
ArgumentNullException | data is null. -or- partition is null. |
ArgumentOutOfRangeException | numberOfExplanatoryFeatures is not positive. |
ArgumentException | numberOfExplanatoryFeatures is not less than the number of columns in data. -or- A part in partition contains a position which is not valid as a row index of data. |
Method Explain(DoubleMatrix, IndexPartition<double>, Int32) selects numberOfExplanatoryFeatures columns (features) of the given data by minimizing the Davies-Bouldin Index of partition, computed on the selected columns only. It returns the column indexes of the selected features.
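For reference, the Davies-Bouldin Index is commonly defined as shown below. This is the textbook formulation: the symbols k (number of parts), C_i (the i-th part), c_i (its centroid), S_i (its scatter) and M_ij (the distance between centroids) are notation introduced here, and the use of Euclidean distances is an assumption rather than something this page confirms about the library.

```latex
\[
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}},
\qquad
S_i = \frac{1}{|C_i|} \sum_{x \in C_i} \lVert x - c_i \rVert,
\qquad
M_{ij} = \lVert c_i - c_j \rVert
\]
```

Under this definition, smaller values correspond to parts that are compact (small S_i) and well separated (large M_ij), which is why the features are chosen by minimizing the index.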
This method uses a default Cross-Entropy context of type CombinationOptimizationContext to identify the optimal features. If different selection criteria are needed, or more control over the parameters of the underlying algorithm is required, a specialized CombinationOptimizationContext can be instantiated and passed to method Optimize of a SystemPerformanceOptimizer object. See the documentation of CombinationOptimizationContext for additional examples.
In the following example, an existing partition of 12 items is explained by selecting 2 of the 7 features available in an artificial data set describing the items under study.

    using System;

    namespace Novacta.Analytics.CodeExamples
    {
        public class ClustersExplainExample0
        {
            public void Main()
            {
                // Set the number of items and features under study.
                const int numberOfItems = 12;
                int numberOfFeatures = 7;

                // Define a partition that must be explained.
                // Three parts (clusters) are included,
                // containing, respectively, items 0 to 3,
                // 4 to 7, and 8 to 11.
                var partition = IndexPartition.Create(
                    new double[numberOfItems]
                        { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2 });

                // Create a matrix that will represent
                // an artificial data set,
                // having 12 items (rows) and 7 features (columns).
                // This will store the observations which
                // explanation will be based on.
                var data = DoubleMatrix.Dense(
                    numberOfRows: numberOfItems,
                    numberOfColumns: numberOfFeatures);

                // The first 5 features are built to be almost
                // surely non informative, since they result
                // as samples drawn from a same distribution.
                var g = new GaussianDistribution(mu: 0, sigma: .01);
                for (int j = 0; j < 5; j++)
                {
                    data[":", j] = g.Sample(sampleSize: numberOfItems);
                }

                // Features 5 to 6 are instead built to be informative,
                // since they are sampled from different distributions
                // while filling rows whose indexes are in different parts
                // of the partition to be explained.
                var partIdentifiers = partition.Identifiers;
                double mu = 1.0;
                for (int i = 0; i < partIdentifiers.Count; i++)
                {
                    var part = partition[partIdentifiers[i]];
                    int partSize = part.Count;
                    g.Mu = mu;
                    data[part, 5] = g.Sample(sampleSize: partSize);
                    mu += 2.0;
                    g.Mu = mu;
                    data[part, 6] = g.Sample(sampleSize: partSize);
                    mu += 2.0;
                }

                Console.WriteLine("The data set:");
                Console.WriteLine(data);

                // Define how many features must be selected
                // for explanation.
                int numberOfExplanatoryFeatures = 2;

                // Select the best features.
                IndexCollection optimalExplanatoryFeatureIndexes =
                    Clusters.Explain(
                        data,
                        partition,
                        numberOfExplanatoryFeatures);

                // Show the results.
                Console.WriteLine();
                Console.WriteLine(
                    "The {0} features best explaining the given partition have column indexes:",
                    numberOfExplanatoryFeatures);
                Console.WriteLine(optimalExplanatoryFeatureIndexes);

                Console.WriteLine();
                Console.WriteLine("The Davies-Bouldin Index for the selected features:");
                var dbi = IndexPartition.DaviesBouldinIndex(
                    data[":", optimalExplanatoryFeatureIndexes],
                    partition);
                Console.WriteLine(dbi);
            }
        }
    }

    // Executing method Main() produces the following output:
    //
    // The data set:
    // 0.00443412891 0.00269053161 0.00413587912 -0.00765022961 -0.00516230961 1.00663787 3.01053155
    // -0.00206677161 0.0208840727 -0.00323082941 -0.00939014629 0.00144991289 0.999318094 3.01264231
    // 0.0115714825 0.00980880513 0.00490173372 0.00327885751 0.0157818959 0.990821676 3.01207396
    // -0.0156854205 -0.00757566326 -0.00972832587 -0.00217925897 0.0107421304 0.992541729 2.99695621
    // 0.0022067431 -0.00321077809 -0.00611898592 0.00720305793 0.0128767272 4.99440474 6.99892958
    // -0.00637438188 0.00505242911 -0.0040927039 0.00210944391 -0.0152463979 4.9974367 6.99460151
    // -0.00662648185 -0.0149292848 0.00236975765 0.0103282087 -0.0108846478 4.99249371 6.98860335
    // -0.0219354054 0.012282089 0.01095691 -0.0108910034 0.00275269084 5.02268395 6.99732006
    // -0.00172760645 0.000890969089 -0.0121749937 -0.0060896535 -0.0125774475 9.00956698 10.9938497
    // 0.0157657881 0.00840849213 0.00295384061 -0.00358519597 0.00447359706 8.98856241 11.0013196
    // 0.0129253424 -0.000948574239 0.00235032211 -0.0135124598 -0.023309088 9.00738398 10.9891406
    // -0.00848136345 -0.00459883434 -0.0148632861 0.0223964956 -0.00259506386 9.00721897 11.005672
    //
    //
    //
    // The 2 features best explaining the given partition have column indexes:
    // 5, 6
    //
    // The Davies-Bouldin Index for the selected features:
    // 0.003700083150008252
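
To make the selection criterion more concrete, the following library-independent sketch computes a textbook Davies-Bouldin Index on a toy data set, using Euclidean distances and the mean distance to the centroid as the scatter of a part. The class DaviesBouldinSketch and its members are hypothetical names introduced here for illustration. The formulation is an assumption and is not taken from the Novacta.Analytics implementation, so its values need not match those returned by IndexPartition.DaviesBouldinIndex.

```csharp
using System;
using System.Linq;

public static class DaviesBouldinSketch
{
    // Textbook Davies-Bouldin Index (assumed formulation, Euclidean distances).
    // rows: one array of feature values per item.
    // labels: the identifier of the part each item belongs to.
    public static double Compute(double[][] rows, int[] labels)
    {
        int[] ids = labels.Distinct().OrderBy(id => id).ToArray();
        int k = ids.Length;
        int dim = rows[0].Length;

        var centroids = new double[k][];
        var scatters = new double[k];

        for (int c = 0; c < k; c++)
        {
            // Collect the items assigned to the current part.
            double[][] members = Enumerable.Range(0, rows.Length)
                .Where(r => labels[r] == ids[c])
                .Select(r => rows[r])
                .ToArray();

            // Centroid: feature-wise mean of the part.
            double[] centroid = Enumerable.Range(0, dim)
                .Select(d => members.Average(m => m[d]))
                .ToArray();
            centroids[c] = centroid;

            // Scatter: mean distance of the members from the centroid.
            scatters[c] = members.Average(m => Distance(m, centroid));
        }

        // Average, over the parts, of the worst similarity ratio.
        double sum = 0.0;
        for (int i = 0; i < k; i++)
        {
            double worst = double.NegativeInfinity;
            for (int j = 0; j < k; j++)
            {
                if (j == i) continue;
                double ratio = (scatters[i] + scatters[j])
                    / Distance(centroids[i], centroids[j]);
                worst = Math.Max(worst, ratio);
            }
            sum += worst;
        }
        return sum / k;
    }

    private static double Distance(double[] a, double[] b)
    {
        double s = 0.0;
        for (int d = 0; d < a.Length; d++)
        {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.Sqrt(s);
    }

    public static void Main()
    {
        // Two parts of two items each, described by a single feature.
        int[] labels = { 0, 0, 1, 1 };

        // The feature separates the parts well: index is 3/11, about 0.27.
        double[][] informative =
            { new[] { 0.0 }, new[] { 2.0 }, new[] { 10.0 }, new[] { 14.0 } };

        // The feature mixes the parts: index is 10.
        double[][] uninformative =
            { new[] { 0.0 }, new[] { 10.0 }, new[] { 1.0 }, new[] { 11.0 } };

        Console.WriteLine(Compute(informative, labels));
        Console.WriteLine(Compute(uninformative, labels));
    }
}
```

The feature that separates the two parts yields a much smaller index than the one that mixes them, which illustrates why a search minimizing the index tends to prefer features such as columns 5 and 6 in the example above.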