
CombinationOptimizationContext Class

Represents a Cross-Entropy context supporting the optimization of objective functions whose arguments are the combinations of the specified size available from a given set of items.
Inheritance Hierarchy

System.Object
  Novacta.Analytics.Advanced.CrossEntropyContext
    Novacta.Analytics.Advanced.SystemPerformanceOptimizationContext
      Novacta.Analytics.Advanced.CombinationOptimizationContext

Namespace:  Novacta.Analytics.Advanced
Assembly:  Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.0.0
Syntax
public sealed class CombinationOptimizationContext : SystemPerformanceOptimizationContext

The CombinationOptimizationContext type exposes the following members.

Constructors
Name / Description

Public method: CombinationOptimizationContext
Initializes a new instance of the CombinationOptimizationContext class, aimed at optimizing the specified objective function under the given optimization goal, range of iterations, and probability smoothing coefficient.
Properties
Name / Description

Public property: CombinationDimension
Gets the dimension of a combination represented by a system's state when a CrossEntropyProgram executes in this context.
Protected property: EliteSampleDefinition
Gets the elite sample definition for this context.
(Inherited from SystemPerformanceOptimizationContext.)
Public property: InitialParameter
Gets the parameter initially exploited to sample from the state-space of the system defined by this context.
(Inherited from CrossEntropyContext.)
Public property: MaximumNumberOfIterations
Gets the maximum number of iterations allowed by this context.
(Inherited from SystemPerformanceOptimizationContext.)
Public property: MinimumNumberOfIterations
Gets the minimum number of iterations required by this context.
(Inherited from SystemPerformanceOptimizationContext.)
Public property: OptimizationGoal
Gets a constant specifying whether the performance function in this context must be minimized or maximized.
(Inherited from SystemPerformanceOptimizationContext.)
Public property: ProbabilitySmoothingCoefficient
Gets the coefficient that defines the smoothing scheme for the probabilities of the Cross-Entropy parameters exploited by this context.
Public property: StateDimension
Gets or sets the dimension of a vector representing a system's state when a CrossEntropyProgram executes in this context.
(Inherited from CrossEntropyContext.)
Public property: TraceExecution
Gets or sets a value indicating whether the execution of this context must be traced.
(Inherited from CrossEntropyContext.)
Methods
Name / Description

Public method: Equals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Protected method: Finalize
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
Public method: GetHashCode
Serves as the default hash function.
(Inherited from Object.)
Protected method: GetOptimalState
Gets the argument that optimizes the objective function in this context, according to the specified Cross-Entropy sampling parameter.
(Overrides SystemPerformanceOptimizationContext.GetOptimalState(DoubleMatrix).)
Public method: GetType
Gets the Type of the current instance.
(Inherited from Object.)
Protected method: MemberwiseClone
Creates a shallow copy of the current Object.
(Inherited from Object.)
Protected method: OnExecutedIteration (code example)
Called after completion of each iteration of a CrossEntropyProgram executing in this context.
(Overrides SystemPerformanceOptimizationContext.OnExecutedIteration(Int32, DoubleMatrix, LinkedList<Double>, LinkedList<DoubleMatrix>).)
Protected method: PartialSample
Draws the specified subset of a sample from a distribution characterized by the given parameter, using the stated random number generator. Used when executing the sampling step of a CrossEntropyProgram running in this context.
(Overrides CrossEntropyContext.PartialSample(Double, Tuple<Int32, Int32>, RandomNumberGenerator, DoubleMatrix, Int32).)
Protected method: Performance
Computes the objective function at a specified argument as the performance defined in this context.
(Overrides CrossEntropyContext.Performance(DoubleMatrix).)
Protected method: SmoothParameter
Provides the smoothing of the updated sampling parameter of a SystemPerformanceOptimizer executing in this context.
(Overrides SystemPerformanceOptimizationContext.SmoothParameter(LinkedList<DoubleMatrix>).)
Protected method: StopAtIntermediateIteration
Specifies conditions under which a SystemPerformanceOptimizer executing in this context should be considered as terminated after completing an intermediate iteration.
(Overrides SystemPerformanceOptimizationContext.StopAtIntermediateIteration(Int32, LinkedList<Double>, LinkedList<DoubleMatrix>).)
Protected method: StopExecution
Specifies conditions under which a CrossEntropyProgram executing in this context should be considered as terminated.
(Inherited from SystemPerformanceOptimizationContext.)
Public method: ToString
Returns a string that represents the current object.
(Inherited from Object.)
Protected method: UpdateLevel
Updates the performance level for the current iteration of a CrossEntropyProgram executing in this context and determines the corresponding elite sample.
(Inherited from SystemPerformanceOptimizationContext.)
Protected method: UpdateParameter
Updates the sampling parameter attending the generation of the sample in the next iteration of a CrossEntropyProgram executing in this context.
(Overrides CrossEntropyContext.UpdateParameter(LinkedList<DoubleMatrix>, DoubleMatrix).)
Remarks

Class CombinationOptimizationContext derives from SystemPerformanceOptimizationContext, and defines a Cross-Entropy context able to solve combinatorial optimization problems regarding the selection of fixed-size subsets from a collection of items.

Class SystemPerformanceOptimizationContext thoroughly defines a system whose performance must be optimized. Class CombinationOptimizationContext specializes that system by assuming that its performance, say $H$, is defined on the set of combinations, having the specified size, available from a given collection of items.

The system's state-space $\mathcal{X}$, i.e. the domain of $H$, can thus be represented as the Cartesian product of $n$ copies of the set $\{0, 1\}$, where $n$ is the number of available items. An argument $x = (x_0, \dots, x_{n-1})$ defines a combination by signaling that the $j$-th item is included in the combination if $x_j = 1$, otherwise setting $x_j = 0$.

If the combinations under study have fixed size equal to $k$, then each argument will have exactly $k$ nonzero entries.
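For instance, the encoding just described can be sketched as follows (an illustrative Python snippet, independent of the library; the helper name combination_to_state is hypothetical):

```python
def combination_to_state(selected, n):
    """Binary representation of a combination: x[j] = 1 iff item j is selected."""
    return [1 if j in selected else 0 for j in range(n)]

# The combination {5, 6} of size k = 2 from n = 7 items:
combination_to_state({5, 6}, 7)  # → [0, 0, 0, 0, 0, 1, 1]
```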

A Cross-Entropy optimizer is designed to identify the optimal arguments at which the performance function of a complex system reaches its minimum or maximum value. To get the optimal state, the system's state-space $\mathcal{X}$, i.e. the domain of $H$, is traversed iteratively by sampling, at each iteration, from a specific density function, member of a parametric family

$$ \{ f_v(x) \mid x \in \mathcal{X},\; v \in \mathcal{V} \}, $$

where $x$ is a possible argument of $H$, and $\mathcal{V}$ is the set of allowable values for parameter $v$. The parameter exploited at a given iteration $t$ is referred to as the reference parameter of such iteration and indicated as $w_t$. A minimum number of iterations, say $m$, must be executed, while a number of them up to a maximum, say $M$, is allowed.

Implementing a context for optimizing on combinations

The Cross-Entropy method provides an iterative multi-step procedure. In the context of combinatorial optimization, at each iteration $t$ a sampling step is executed in order to generate diverse candidate arguments of the objective function, sampled from a distribution characterized by the reference parameter of the iteration, say $w_{t-1}$. Such a sample is thus exploited in the updating step, in which a new reference parameter $w_t$ is identified to modify the distribution from which the samples will be obtained in the next iteration: such modification is executed in order to improve the probability of sampling relevant arguments, i.e. those arguments corresponding to the function values of interest (see the documentation of class CrossEntropyProgram for a thorough discussion of the Cross-Entropy method).

When the Cross-Entropy method is applied in an optimization context, a final optimizing step is executed, in which the argument corresponding to the searched extremum is effectively identified.

These steps have been implemented as follows.

Sampling step

In a CombinationOptimizationContext, the parametric family $\{ f_v \}$ is outlined as follows. Each component $x_j$ of an argument $x$ of $H$ is attached to an independent Bernoulli distribution having parameter $p_j$, and $f_v$ is defined as the joint distribution of the corresponding Bernoulli trials, conditional on having exactly $k$ successes. The Cross-Entropy sampling parameter $v$ can thus be represented as the $1 \times n$ vector $(p_0, \dots, p_{n-1})$.

The parametric space $\mathcal{V}$ should include a parameter under which all possible states have a real chance of being selected: this parameter is specified as the initial reference parameter $w_0$. A CombinationOptimizationContext defines $w_0$ as a constant vector whose entries are all equal to $0.5$.
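As an illustration, the conditional Bernoulli scheme described above can be sketched via plain rejection sampling (an illustrative Python snippet, not the library's actual sampling algorithm, which may be more efficient; sample_combination is a hypothetical helper):

```python
import random

def sample_combination(p, k, rng=random.Random(0)):
    """Draw a 0/1 vector distributed as independent Bernoulli(p[j]) trials,
    conditional on observing exactly k successes (rejection sampling)."""
    while True:
        x = [1 if rng.random() < pj else 0 for pj in p]
        if sum(x) == k:
            return x

w0 = [0.5] * 7                    # initial reference parameter: all entries 0.5
x = sample_combination(w0, k=2)   # a candidate combination of size 2
```

Rejection sampling is adequate for moderate values of $n$ and $k$, since under $w_0$ the acceptance probability is the binomial probability of exactly $k$ successes in $n$ trials.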

Updating step

At iteration $t$, let us represent the sample drawn as $X_{t,0}, \dots, X_{t,N-1}$, where $N$ is the Cross-Entropy sample size, and the $i$-th sample point is the sequence $X_{t,i} = (X_{t,i,0}, \dots, X_{t,i,n-1})$. The parameter's updating formula is, for $j = 0, \dots, n-1$,

$$ w_{t,j} = \frac{\sum_{i=0}^{N-1} I_{A_t}(X_{t,i}) \, X_{t,i,j}}{\sum_{i=0}^{N-1} I_{A_t}(X_{t,i})}, $$

where $A_t$ is the elite sample in this context, i.e. the set of sample points having the lowest performances observed during the $t$-th iteration, if minimizing, the highest ones, otherwise, while $I_{A_t}$ is its indicator function.
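Since the denominator counts the elite points, the update reduces to averaging each component over the elite sample, as in this illustrative sketch (update_parameter is a hypothetical name, not a library API):

```python
def update_parameter(elite_sample):
    """CE update: w[j] is the mean of component j over the elite sample,
    i.e. the fraction of elite points that include item j."""
    size = len(elite_sample)
    n = len(elite_sample[0])
    return [sum(x[j] for x in elite_sample) / size for j in range(n)]

# Two elite points, both including item 0:
update_parameter([[1, 0, 1], [1, 1, 0]])  # → [1.0, 0.5, 0.5]
```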

Applying a smoothing scheme to updated parameters

In a CombinationOptimizationContext, the sampling parameter is smoothed by applying the following formula (see Rubinstein and Kroese, Remark 5.2, p. 189 [1]):

$$ w_t \leftarrow \alpha \, w_t + (1 - \alpha) \, w_{t-1}, $$

where $0 < \alpha < 1$ is the probability smoothing coefficient.
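The smoothing scheme is thus a convex combination of the updated and the previous parameter (an illustrative sketch; smooth_parameter is a hypothetical name):

```python
def smooth_parameter(w_updated, w_previous, alpha):
    """Smoothed parameter: alpha * w_t + (1 - alpha) * w_{t-1}."""
    return [alpha * u + (1 - alpha) * p
            for u, p in zip(w_updated, w_previous)]

smooth_parameter([1.0, 0.0], [0.5, 0.5], alpha=0.8)
```

Smoothing keeps the probabilities away from the degenerate values $0$ and $1$ for a few iterations, reducing the risk of prematurely freezing the sampling distribution.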

Optimizing step

The optimizing step is executed after the underlying Cross-Entropy program has converged. In a specified context, it is expected that, given a reference parameter $w$, a corresponding reasonable value could be guessed for the optimizing argument of $H$, say $g(w)$, with $g$ a function from $\mathcal{V}$ to $\mathcal{X}$. Function $g$ is defined by overriding method GetOptimalState(DoubleMatrix), which should return $g(w)$ given a specific reference parameter $w$.

Given the optimal parameter (the parameter corresponding to the last iteration, say $T$, executed by the algorithm before stopping),

$$ w_T = (w_{T,0}, \dots, w_{T,n-1}), $$

the argument at which the searched extremum is considered as reached according to the Cross-Entropy method will be returned as follows. The probabilities $w_{T,0}, \dots, w_{T,n-1}$ are sorted in increasing order, say obtaining the following ordering:

$$ w_{T,(0)} \leq \dots \leq w_{T,(n-1)}, $$

and the corresponding sequence of indexes $j_0, \dots, j_{n-1}$, such that

$$ w_{T,j_l} = w_{T,(l)}, \qquad l = 0, \dots, n-1, $$

is taken into account by defining the set $S_T = \{ j_{n-k}, \dots, j_{n-1} \}$ and returning

$$ g(w_T) = (x_{T,0}, \dots, x_{T,n-1}), $$

where $x_{T,j}$ is one if $j \in S_T$; zero otherwise. This is equivalent to including in the optimal combination those items having the $k$ greatest probabilities in parameter $w_T$.
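The optimizing step therefore amounts to a top-$k$ selection on the optimal parameter, as in this sketch (optimal_state here is an illustrative stand-in for the behavior described above, not the library's GetOptimalState implementation):

```python
def optimal_state(w, k):
    """Select the k items with the largest probabilities in w."""
    top = set(sorted(range(len(w)), key=lambda j: w[j])[-k:])
    return [1 if j in top else 0 for j in range(len(w))]

optimal_state([0.1, 0.9, 0.2, 0.95], 2)  # → [0, 1, 0, 1]: items 1 and 3 selected
```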

Stopping criterion

A CombinationOptimizationContext never stops before executing MinimumNumberOfIterations iterations, and always stops once the number of executed iterations reaches MaximumNumberOfIterations.

For intermediate iterations, method StopAtIntermediateIteration(Int32, LinkedList<Double>, LinkedList<DoubleMatrix>) is called to check if a Cross-Entropy program executing in this context should stop or not.

In a CombinationOptimizationContext, the method analyzes the currently updated reference parameter, say $w_t$, as follows. Let $S_t$ be the set of indexes of the $k$ largest entries of $w_t$. If condition

$$ S_t = S_{t-1} = \dots = S_{t-m+1} $$

can be verified, the method returns true; otherwise false is returned. Equivalently, the algorithm converges if the indexes of the largest $k$ probabilities coincide $m$ times in a row of iterations.
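Under this reading of the criterion, the check can be sketched as follows (an illustrative Python snippet; the window length and helper names are assumptions, not the library's actual implementation):

```python
def top_k_indexes(w, k):
    """Indexes of the k largest entries of w."""
    return set(sorted(range(len(w)), key=lambda j: w[j])[-k:])

def should_stop(parameters, k, window):
    """True if the index sets of the k largest probabilities coincided
    in the last `window` updated reference parameters."""
    if len(parameters) < window:
        return False
    sets = [top_k_indexes(w, k) for w in parameters[-window:]]
    return all(s == sets[0] for s in sets)

history = [[0.5, 0.5, 0.5, 0.5],
           [0.2, 0.8, 0.1, 0.9],
           [0.3, 0.7, 0.2, 0.95]]
should_stop(history, k=2, window=2)   # the last two top-2 index sets coincide
```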

Instantiating a context for optimizing on combinations

At instantiation, the constructor of a CombinationOptimizationContext object will receive information about the optimization under study by means of parameters representing the objective function $H$, the combination constants $n$ and $k$, the extremes of the allowed range of iterations, $m$ and $M$, and a constant stating whether the optimization goal is a maximization or a minimization. In addition, the smoothing coefficient $\alpha$ is also passed to the constructor.

After construction, $m$ and $M$ can be inspected, respectively, via properties MinimumNumberOfIterations and MaximumNumberOfIterations. The smoothing coefficient $\alpha$ is also available via property ProbabilitySmoothingCoefficient. Combination constants $n$ and $k$ are returned by StateDimension and CombinationDimension, respectively. In addition, property OptimizationGoal signals that the performance function must be maximized if it evaluates to the constant Maximization, or that a minimization is requested if it evaluates to the constant Minimization.

To evaluate the objective function $H$ at a specific argument, one can call method Performance(DoubleMatrix), passing the argument as a parameter. It is expected that the objective function will accept a row vector having binary entries as a valid representation of an argument.

Examples

In the following example, an existing partition of twelve items is explained by selecting two features out of the seven ones available in an artificial data set regarding the items under study.

The selection criterion is defined as the maximization of the Dunn Index in the domain of selectable sub data sets.

Selecting features from a data set to explain a given partition by Dunn Index maximization.
using System;
using Novacta.Analytics.Advanced;

namespace Novacta.Analytics.CodeExamples.Advanced
{
    public class CombinationOptimizationContextExample0  
    {
        public void Main()
        {
            // Set the number of items and features under study.
            const int numberOfItems = 12;
            int numberOfFeatures = 7;

            // Define a partition that must be explained.
            // Three parts (clusters) are included,
            // containing, respectively, items 0 to 3,
            // 4 to 7, and 8 to 11.
            var partition = IndexPartition.Create(
                new double[numberOfItems]
                    { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2 });

            // Create a matrix that will represent
            // an artificial data set,
            // having 12 items (rows) and 7 features (columns).
            // This will store the observations on which
            // the explanation will be based.
            var data = DoubleMatrix.Dense(
                numberOfRows: numberOfItems,
                numberOfColumns: numberOfFeatures);

            // The first 5 features are built to be almost
            // surely non-informative, since they are all
            // samples drawn from the same distribution.
            var g = new GaussianDistribution(mu: 0, sigma: .01);
            for (int j = 0; j < 5; j++)
            {
                data[":", j] = g.Sample(sampleSize: numberOfItems);
            }

            // Features 5 to 6 are instead built to be informative,
            // since they are sampled from different distributions
            // while filling rows whose indexes are in different parts
            // of the partition to be explained.
            var partIdentifiers = partition.Identifiers;
            double mu = 1.0;
            for (int i = 0; i < partIdentifiers.Count; i++)
            {
                var part = partition[partIdentifiers[i]];
                int partSize = part.Count;
                g.Mu = mu;
                data[part, 5] = g.Sample(sampleSize: partSize);
                mu += 2.0;
                g.Mu = mu;
                data[part, 6] = g.Sample(sampleSize: partSize);
                mu += 2.0;
            }

            Console.WriteLine("The data set:");
            Console.WriteLine(data);

            // Define the selection problem as
            // the maximization of the Dunn Index.
            double objectiveFunction(DoubleMatrix x)
            {
                // An argument x has entries equal to one,
                // signaling that the corresponding features 
                // are selected at x. Otherwise, the entries
                // are zero.
                IndexCollection selected = x.FindNonzero();

                double performance =
                    IndexPartition.DunnIndex(
                        data: data[":", selected],
                        partition: partition);

                return performance;
            }

            var optimizationGoal = OptimizationGoal.Maximization;

            // Define how many features must be selected
            // for explanation.
            int numberOfExplanatoryFeatures = 2;

            // Create the required context.
            var context = new CombinationOptimizationContext(
                objectiveFunction: objectiveFunction,
                stateDimension: numberOfFeatures,
                combinationDimension: numberOfExplanatoryFeatures,
                probabilitySmoothingCoefficient: .8,
                optimizationGoal: optimizationGoal,
                minimumNumberOfIterations: 3,
                maximumNumberOfIterations: 1000);

            // Create the optimizer.
            var optimizer = new SystemPerformanceOptimizer()
            {
                PerformanceEvaluationParallelOptions = { MaxDegreeOfParallelism = -1 },
                SampleGenerationParallelOptions = { MaxDegreeOfParallelism = -1 }
            };

            // Set optimization parameters.
            double rarity = 0.01;
            int sampleSize = 1000;

            // Solve the problem.
            var results = optimizer.Optimize(
                context,
                rarity,
                sampleSize);

            IndexCollection optimalExplanatoryFeatureIndexes =
                results.OptimalState.FindNonzero();

            // Show the results.
            Console.WriteLine(
                "The Cross-Entropy optimizer has converged: {0}.",
                results.HasConverged);

            Console.WriteLine();
            Console.WriteLine("Initial guess parameter:");
            Console.WriteLine(context.InitialParameter);

            Console.WriteLine();
            Console.WriteLine("The maximizer of the performance is:");
            Console.WriteLine(results.OptimalState);

            Console.WriteLine();
            Console.WriteLine(
                "The {0} features best explaining the given partition have column indexes:",
                numberOfExplanatoryFeatures);
            Console.WriteLine(optimalExplanatoryFeatureIndexes);

            Console.WriteLine();
            Console.WriteLine("The maximum performance is:");
            Console.WriteLine(results.OptimalPerformance);

            Console.WriteLine();
            Console.WriteLine("This is the Dunn Index for the selected features:");
            var di = IndexPartition.DunnIndex(
                data[":", optimalExplanatoryFeatureIndexes],
                partition);
            Console.WriteLine(di);
        }
    }
}

// Executing method Main() produces the following output:
// 
// The data set:
// 0.00443412891    0.00269053161    0.00413587912    -0.00765022961   -0.00516230961   1.00663787       3.01053155       
// -0.00206677161   0.0208840727     -0.00323082941   -0.00939014629   0.00144991289    0.999318094      3.01264231       
// 0.0115714825     0.00980880513    0.00490173372    0.00327885751    0.0157818959     0.990821676      3.01207396       
// -0.0156854205    -0.00757566326   -0.00972832587   -0.00217925897   0.0107421304     0.992541729      2.99695621       
// 0.0022067431     -0.00321077809   -0.00611898592   0.00720305793    0.0128767272     4.99440474       6.99892958       
// -0.00637438188   0.00505242911    -0.0040927039    0.00210944391    -0.0152463979    4.9974367        6.99460151       
// -0.00662648185   -0.0149292848    0.00236975765    0.0103282087     -0.0108846478    4.99249371       6.98860335       
// -0.0219354054    0.012282089      0.01095691       -0.0108910034    0.00275269084    5.02268395       6.99732006       
// -0.00172760645   0.000890969089   -0.0121749937    -0.0060896535    -0.0125774475    9.00956698       10.9938497       
// 0.0157657881     0.00840849213    0.00295384061    -0.00358519597   0.00447359706    8.98856241       11.0013196       
// 0.0129253424     -0.000948574239  0.00235032211    -0.0135124598    -0.023309088     9.00738398       10.9891406       
// -0.00848136345   -0.00459883434   -0.0148632861    0.0223964956     -0.00259506386   9.00721897       11.005672        
// 
// 
// The Cross-Entropy optimizer has converged: True.
// 
// Initial guess parameter:
// 0.5              0.5              0.5              0.5              0.5              0.5              0.5              
// 
// 
// 
// The maximizer of the performance is:
// 0                0                0                0                0                1                1                
// 
// 
// 
// The 2 features best explaining the given partition have column indexes:
// 5, 6
// 
// The maximum performance is:
// 179.2086694228848
// 
// This is the Dunn Index for the selected features:
// 179.2086694228848

Bibliography
[1] Rubinstein, R.Y. and Kroese, D.P. (2004), The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, Springer, New York.