CategoricalDataSet Class

Represents a collection of observations for a set of categorical variables.

Definition

Namespace: Novacta.Analytics
Assembly: Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.1.0+428f3840cfab98dda567bb0ed350b302533e273a
C#
public class CategoricalDataSet : IReadOnlyTabularCollection<Category, CategoricalDataSet>
Inheritance
Object → CategoricalDataSet
Implements
IReadOnlyTabularCollection<Category, CategoricalDataSet>

Remarks

A dataset consists of a set of categorical variables, whose list is returned by property Variables, and a matrix, returned by property Data, that contains the data observed for those variables at a given collection of individuals. Each matrix column is associated with one of the categorical variables under study, while the rows of the matrix are associated with the individuals.
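To make this layout concrete, the following sketch builds a matrix of category codes from raw labels, with one row per individual and one column per variable. It is a plain C# illustration that does not call the library: the class CategoryCodingSketch, the method EncodeLabels, and the rule of assigning codes in order of first appearance are assumptions made for the example, and the codes actually stored in Data may be assigned differently.

```csharp
using System;
using System.Collections.Generic;

public static class CategoryCodingSketch
{
    // Converts labels to numeric codes, column by column.
    // Assumption of this sketch: within each column, codes are
    // 0, 1, 2, ... in order of first appearance of each label.
    public static double[,] EncodeLabels(
        string[][] rows,
        out List<Dictionary<string, double>> codebooks)
    {
        int numberOfRows = rows.Length;
        int numberOfColumns = numberOfRows == 0 ? 0 : rows[0].Length;

        // One label-to-code dictionary per variable (matrix column).
        codebooks = new List<Dictionary<string, double>>();
        for (int j = 0; j < numberOfColumns; j++)
        {
            codebooks.Add(new Dictionary<string, double>());
        }

        // Rows are individuals, columns are variables.
        var codes = new double[numberOfRows, numberOfColumns];
        for (int i = 0; i < numberOfRows; i++)
        {
            for (int j = 0; j < numberOfColumns; j++)
            {
                Dictionary<string, double> codebook = codebooks[j];
                if (!codebook.TryGetValue(rows[i][j], out double code))
                {
                    code = codebook.Count;
                    codebook.Add(rows[i][j], code);
                }
                codes[i, j] = code;
            }
        }

        return codes;
    }
}
```

In this sketch the returned array plays the role of the data matrix, while the per-column dictionaries stand in for the category lists of the variables.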

Instantiation

New instances of the CategoricalDataSet class can be initialized from previously encoded data, through method FromEncodedData(List<CategoricalVariable>, DoubleMatrix), or by encoding a data source; see, for example, Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider).

The source can contain information about categorical or numerical variables observed at a given instance. Encoding methods take numerical variables into account by delegating their discretization to special categorizers. If needed, categorizers can be identified by splitting the range of the numerical data into multiple intervals so as to minimize the intra-interval heterogeneity of a given target; see, for example, CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider).
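The idea of choosing intervals that minimize the heterogeneity of a target can be illustrated with a deliberately simplified sketch. The code below does not use the library and is not its algorithm: it searches for a single cut point, whereas the method above defines multiple intervals. The names EntropySplitSketch and BestCutPoint, the use of base-2 Shannon entropy, and the midpoint convention for the cut are assumptions of the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EntropySplitSketch
{
    // Shannon entropy (base 2) of the target labels in one interval.
    private static double Entropy(IEnumerable<string> labels)
    {
        List<string> list = labels.ToList();
        if (list.Count == 0)
        {
            return 0.0;
        }

        return list
            .GroupBy(label => label)
            .Select(group => (double)group.Count() / list.Count)
            .Sum(p => -p * Math.Log(p, 2.0));
    }

    // Returns the single cut point that minimizes the size-weighted
    // entropy of the target in the two resulting intervals, or NaN
    // if all values are equal and no cut exists.
    public static double BestCutPoint(double[] values, string[] targets)
    {
        if (values.Length != targets.Length)
        {
            throw new ArgumentException("Values and targets must have the same length.");
        }

        // Indexes of the observations, sorted by numerical value.
        int[] order = Enumerable.Range(0, values.Length)
            .OrderBy(i => values[i])
            .ToArray();

        double bestCut = double.NaN;
        double bestEntropy = double.PositiveInfinity;

        for (int k = 1; k < order.Length; k++)
        {
            double left = values[order[k - 1]];
            double right = values[order[k]];
            if (left == right)
            {
                continue; // equal values cannot be separated by a cut
            }

            double leftEntropy = Entropy(order.Take(k).Select(i => targets[i]));
            double rightEntropy = Entropy(order.Skip(k).Select(i => targets[i]));
            double weighted =
                (k * leftEntropy + (order.Length - k) * rightEntropy) / order.Length;

            if (weighted < bestEntropy)
            {
                bestEntropy = weighted;
                bestCut = (left + right) / 2.0;
            }
        }

        return bestCut;
    }
}
```

A multi-interval discretization could, for instance, apply such a search recursively to each interval until a stopping rule is met; whether the library proceeds this way is not stated here.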

An encoding operation can be reverted through method Decode.
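As a hedged illustration of encoding a data source and then reverting the operation, the sketch below reads a small in-memory, comma-delimited text. The call follows the Encode(TextReader, Char, IndexCollection, Boolean) overload listed under Methods. The sample data, the helper names, the use of IndexCollection.Range to select columns 0 and 1, and the reading of the Boolean argument as "the first line contains column headers" are assumptions of this sketch and should be checked against the library reference.

```csharp
using System;
using System.IO;
using Novacta.Analytics;

public static class EncodeDecodeSketch
{
    // Encodes two categorical variables observed at five individuals.
    public static CategoricalDataSet EncodeSample()
    {
        string source = string.Join("\n", new[]
        {
            "COLOR,SHAPE",
            "Red,Circle",
            "Green,Square",
            "Red,Square",
            "Black,Circle",
            "Black,Circle"
        });

        using (var reader = new StringReader(source))
        {
            return CategoricalDataSet.Encode(
                reader,
                ',',                          // column delimiter
                IndexCollection.Range(0, 1),  // assumed: extract columns 0 and 1
                true);                        // assumed: first line holds headers
        }
    }

    // Writes the size of the data set and reverts the encoding.
    public static void Describe(CategoricalDataSet dataset)
    {
        Console.WriteLine(
            "Individuals: " + dataset.NumberOfRows +
            ", variables: " + dataset.NumberOfColumns);

        // The decoded representation holds the original category labels;
        // its exact type is left to the compiler here.
        var decoded = dataset.Decode();
    }
}
```

Calling EncodeDecodeSketch.Describe(EncodeDecodeSketch.EncodeSample()) would exercise both steps.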

Parts of the CategoricalDataSet can be selected through indexers; see, for example, Item[IndexCollection, IndexCollection].

Disjunctive forms

Data about a categorical variable can be represented in disjunctive form by splitting the information for the variable into as many binary variables as the variable has categories. The disjunctive representation of a CategoricalDataSet is returned by Disjoin. The disjunctive form of supplementary data, i.e., data containing information about the same variables observed at different individuals, can be obtained through method Disjoin(DoubleMatrix).
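The disjunctive form of a single variable can be sketched in plain C# as an indicator matrix with one column per category. The code does not call the library: the class DisjunctiveFormSketch, its Disjoin helper, and the convention that indicator columns follow the order of the supplied category codes are assumptions of the example.

```csharp
using System;

public static class DisjunctiveFormSketch
{
    // codes: the category codes observed for one variable, one per individual.
    // categoryCodes: the distinct codes of the variable; the k-th indicator
    // column corresponds to categoryCodes[k] (an assumption of this sketch).
    public static double[,] Disjoin(double[] codes, double[] categoryCodes)
    {
        var indicators = new double[codes.Length, categoryCodes.Length];

        for (int i = 0; i < codes.Length; i++)
        {
            int position = Array.IndexOf(categoryCodes, codes[i]);
            if (position < 0)
            {
                throw new ArgumentException(
                    "Code " + codes[i] + " is not a category of the variable.",
                    nameof(codes));
            }

            // Exactly one binary variable is set for each individual.
            indicators[i, position] = 1.0;
        }

        return indicators;
    }
}
```

For a data set with several variables, the per-variable indicator blocks would be placed side by side, so the number of columns equals the total number of categories.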

Serialization

Categorical data sets can be represented as JSON strings, see JsonSerialization.

Properties

Data Gets the matrix of category codes in the CategoricalDataSet.
Item[IndexCollection, IndexCollection] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[IndexCollection, Int32] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[IndexCollection, String] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[Int32, IndexCollection] Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[Int32, Int32] Gets the information in the CategoricalDataSet corresponding to the specified individual and variable.
Item[Int32, String] Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[String, IndexCollection] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[String, Int32] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[String, String] Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Name Gets or sets the name of the CategoricalDataSet.
NumberOfColumns Gets the number of columns of this instance.
NumberOfRows Gets the number of rows of this instance.
Variables Gets the list of variables in the CategoricalDataSet.

Methods

CategorizeByEntropyMinimization(String, Char, IndexCollection, Boolean, Int32, IFormatProvider) Discretizes numerical data from the stream underlying the specified file by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider) Discretizes numerical data from the stream underlying the specified text reader by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
Decode Decodes the CategoricalDataSet.
Disjoin Disjoins the data of the CategoricalDataSet.
Disjoin(DoubleMatrix) Disjoins supplementary data.
Encode(String, Char, IndexCollection, Boolean) Encodes categorical data from the specified file.
Encode(TextReader, Char, IndexCollection, Boolean) Encodes categorical data from the stream underlying the specified text reader.
Encode(String, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) Encodes categorical or numerical data from the given file applying specific numerical data categorizers.
Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) Encodes categorical or numerical data from the stream underlying the specified text reader applying specific numerical data categorizers.
Equals Determines whether the specified object is equal to the current object.
(Inherited from Object)
Finalize Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object)
FromEncodedData Initializes a new instance of the CategoricalDataSet class from previously encoded data.
GetContingencyTable Gets the contingency table representing the joint absolute frequency distribution of the specified categorical variables.
GetHashCode Serves as the default hash function.
(Inherited from Object)
GetType Gets the Type of the current instance.
(Inherited from Object)
MemberwiseClone Creates a shallow copy of the current Object.
(Inherited from Object)
ToString Returns a string that represents the current object.
(Inherited from Object)
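
The joint absolute frequency distribution mentioned for GetContingencyTable can be pictured as a cross-tabulation of two variables. The following sketch counts, for each pair of categories, the individuals at which the pair was observed. It is independent of the library: the class ContingencySketch, the method CrossTabulate, and the assumption that categories are coded as consecutive integers starting from zero are made up for the example and say nothing about the type or layout of the value returned by GetContingencyTable.

```csharp
using System;

public static class ContingencySketch
{
    // first, second: category codes of two variables observed at the same
    // individuals, assumed here to be integers in 0..(number of categories - 1).
    public static int[,] CrossTabulate(
        int[] first,
        int[] second,
        int firstCategories,
        int secondCategories)
    {
        if (first.Length != second.Length)
        {
            throw new ArgumentException(
                "Both variables must be observed at the same individuals.");
        }

        // table[r, c] counts individuals with category r of the first
        // variable and category c of the second one.
        var table = new int[firstCategories, secondCategories];
        for (int i = 0; i < first.Length; i++)
        {
            table[first[i], second[i]]++;
        }

        return table;
    }
}
```

The entries of such a table sum to the number of individuals, which gives a quick consistency check.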

See Also