CategoricalDataSet Class
Namespace: Novacta.Analytics
The CategoricalDataSet type exposes the following members.
Properties
Name | Description
---|---
Data | Gets the matrix of category codes in the CategoricalDataSet.
Item[IndexCollection, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[IndexCollection, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[IndexCollection, String] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[Int32, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[Int32, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variable.
Item[Int32, String] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[String, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[String, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[String, String] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Name | Gets or sets the name of the CategoricalDataSet.
NumberOfColumns | Gets the number of columns of this instance.
NumberOfRows | Gets the number of rows of this instance.
Variables | Gets the list of variables in the CategoricalDataSet.
Methods
Name | Description
---|---
CategorizeByEntropyMinimization(String, Char, IndexCollection, Boolean, Int32, IFormatProvider) | Discretizes numerical data from the stream underlying the specified file by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider) | Discretizes numerical data from the stream underlying the specified text reader by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
Decode | Decodes the CategoricalDataSet.
Disjoin | Disjoins the data of the CategoricalDataSet.
Disjoin(DoubleMatrix) | Disjoins supplementary data.
Encode(String, Char, IndexCollection, Boolean) | Encodes categorical data from the specified file.
Encode(TextReader, Char, IndexCollection, Boolean) | Encodes categorical data from the stream underlying the specified text reader.
Encode(String, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) | Encodes categorical or numerical data from the given file applying specific data categorizers.
Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) | Encodes categorical or numerical data from the stream underlying the specified text reader applying specific data categorizers.
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
FromEncodedData | Initializes a new instance of the CategoricalDataSet class from previously encoded data.
GetContingencyTable | Gets the contingency table representing the joint absolute frequency distribution of the specified categorical variables.
GetHashCode | Serves as the default hash function. (Inherited from Object.)
GetType | Gets the Type of the current instance. (Inherited from Object.)
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.)
ToString | Returns a string that represents the current object. (Inherited from Object.)
A data set is composed of a list of categorical variables, returned by property Variables, and a matrix, returned by property Data, which contains the data observed for those variables at a given collection of individuals. Each matrix column is associated with one of the categorical variables under study, while each row is associated with one of the individuals.
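The relationship between the variables and the matrix of category codes can be pictured with a small, language-neutral sketch. It is written in Python and does not use the Novacta.Analytics API: the functions `encode` and `decode` are hypothetical, and the rule that codes are assigned in order of first appearance, starting from 0, is an assumption made only for illustration.

```python
def encode(rows):
    """Encode a table of category labels into integer codes, column by column.

    Returns (variables, data): variables[j] maps each label of column j to
    its code, and data[i][j] is the code observed for individual i.
    """
    n_cols = len(rows[0])
    variables = [{} for _ in range(n_cols)]
    data = []
    for row in rows:
        coded = []
        for j, label in enumerate(row):
            codes = variables[j]
            # Assumed convention: codes follow the order of first appearance.
            if label not in codes:
                codes[label] = len(codes)
            coded.append(codes[label])
        data.append(coded)
    return variables, data


def decode(variables, data):
    """Map a matrix of codes back to the original category labels."""
    inverse = [{code: label for label, code in v.items()} for v in variables]
    return [[inverse[j][code] for j, code in enumerate(row)] for row in data]


# Three individuals (rows) observed on two categorical variables (columns).
rows = [
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
]
variables, data = encode(rows)
```

Here `data` has one row per individual and one column per variable, and `decode(variables, data)` recovers the original labels.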
Instantiation
New instances of the CategoricalDataSet class can be initialized from previously encoded data, through method FromEncodedData(List<CategoricalVariable>, DoubleMatrix), or by encoding a data source; see, for example, Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider).
The source can contain information about categorical or numerical variables observed at a given instance. Encoding methods handle numerical variables by delegating their discretization to special categorizers. If needed, categorizers can be identified by splitting the range of the numerical data into multiple intervals so as to minimize the intra-interval heterogeneity of the given target; see, for example, CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider).
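The idea of choosing intervals by entropy minimization can be sketched as follows. This is a simplified Python illustration, not the library's implementation: it finds a single cut point only, whereas the method described above identifies multiple intervals, and the helper names `entropy` and `best_cut` are hypothetical. The cut is placed where the size-weighted entropy of the target labels in the two resulting intervals is lowest.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, targets):
    """Return (cut, score) for the cut point with the lowest weighted entropy.

    Candidate cuts are the midpoints between consecutive distinct values.
    Returns None if all values are equal.
    """
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary can be placed between equal values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
targets = ["a", "a", "a", "b", "b", "b"]
cut, score = best_cut(values, targets)
```

In this example the target is perfectly separated by a cut between 3.0 and 10.0, so both intervals are homogeneous and the weighted entropy is zero. Applying the same search recursively within each interval, with a suitable stopping rule, would produce multiple intervals.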
An encoding operation can be reverted by calling method Decode.
Parts of the CategoricalDataSet can be selected through indexers; see, for example, Item[IndexCollection, IndexCollection].
Disjunctive forms
Data about a categorical variable can be represented in disjunctive form by splitting the information for the variable into as many binary variables as the variable has categories. The disjunctive representation of a CategoricalDataSet is returned by Disjoin. The disjunctive form of supplementary data, i.e. data containing information about the same variables observed at different individuals, can be obtained by method Disjoin(DoubleMatrix).
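A disjunctive representation can be illustrated with a short Python sketch. It is not the Novacta.Analytics implementation: the function `disjoin` below is hypothetical, and it assumes that the categories of each variable are coded as consecutive integers starting from 0, which may differ from the codes used by the library.

```python
def disjoin(data, n_categories):
    """Return the disjunctive (indicator) form of a matrix of category codes.

    data[i][j] is the code of variable j for individual i, assumed to lie in
    range(n_categories[j]). Each variable is expanded into n_categories[j]
    binary columns, exactly one of which is 1 in every row.
    """
    result = []
    for row in data:
        indicators = []
        for code, k in zip(row, n_categories):
            block = [0] * k
            block[code] = 1
            indicators.extend(block)
        result.append(indicators)
    return result


# Four individuals; the first variable has 3 categories, the second has 2.
data = [[0, 0], [1, 1], [0, 1], [2, 0]]
disjunctive = disjoin(data, [3, 2])

# Supplementary individuals are expanded with the same category counts.
supplementary = disjoin([[2, 1]], [3, 2])
```

Every row of the result sums to the number of variables, since each variable contributes exactly one indicator equal to 1.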
Serialization
Categorical data sets can be represented as JSON strings, see JsonSerialization.