public class CategoricalDataSet : IReadOnlyTabularCollection<Category, CategoricalDataSet>
Public Class CategoricalDataSet
    Implements IReadOnlyTabularCollection(Of Category, CategoricalDataSet)
public ref class CategoricalDataSet : IReadOnlyTabularCollection<Category^, CategoricalDataSet^>
type CategoricalDataSet =
    class
        interface IReadOnlyTabularCollection<Category, CategoricalDataSet>
    end
A dataset consists of a set of categorical variables, whose list is returned by property Variables, and a matrix, returned by property Data, containing the data observed for those variables on a given collection of individuals. Each matrix column is associated with one of the categorical variables under study, while each row is associated with one of the individuals.
Instantiation
New instances of the CategoricalDataSet class can be initialized from previously encoded data, through method FromEncodedData(List<CategoricalVariable>, DoubleMatrix), or by encoding a data source; see, for example, Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider).
The source can contain observations of both categorical and numerical variables. Encoding methods handle numerical variables by delegating their discretization to special categorizers. If needed, a categorizer can be obtained by splitting the range of the numerical data into multiple intervals so as to minimize the intra-interval heterogeneity of a given target variable; see, for example, CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider).
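The idea of choosing interval boundaries that minimize the heterogeneity of a target within each interval can be illustrated with a single binary split. The following is a minimal sketch in plain C#, not the library's implementation: the class `EntropySplitSketch` and its methods `Entropy` and `BestCutPoint` are hypothetical names introduced here, and the search is limited to one cut point that minimizes the size-weighted entropy of the target labels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not part of Novacta.Analytics.
public static class EntropySplitSketch
{
    // Shannon entropy (in bits) of a collection of target labels.
    public static double Entropy(IEnumerable<string> labels)
    {
        var list = labels.ToList();
        if (list.Count == 0) return 0.0;
        return list
            .GroupBy(label => label)
            .Select(group => (double)group.Count() / list.Count)
            .Sum(p => -p * Math.Log2(p));
    }

    // Returns the cut point, taken midway between two consecutive distinct
    // values, that minimizes the size-weighted entropy of the two intervals.
    public static double BestCutPoint(double[] values, string[] targets)
    {
        var sorted = values
            .Zip(targets, (v, t) => (Value: v, Target: t))
            .OrderBy(pair => pair.Value)
            .ToArray();

        double bestCut = double.NaN;
        double bestEntropy = double.PositiveInfinity;

        for (int i = 1; i < sorted.Length; i++)
        {
            // A cut is only meaningful between two different values.
            if (sorted[i].Value == sorted[i - 1].Value) continue;

            var left = sorted.Take(i).Select(pair => pair.Target);
            var right = sorted.Skip(i).Select(pair => pair.Target);
            double weighted =
                (i * Entropy(left) + (sorted.Length - i) * Entropy(right))
                / sorted.Length;

            if (weighted < bestEntropy)
            {
                bestEntropy = weighted;
                bestCut = (sorted[i].Value + sorted[i - 1].Value) / 2.0;
            }
        }
        return bestCut;
    }

    public static void Main()
    {
        double[] values = { -3.3, -2.2, -1.1, 0.0, 4.4 };
        string[] targets = { "A", "A", "A", "B", "B" };
        Console.WriteLine(BestCutPoint(values, targets));
    }
}
```

With these sample data, the selected cut falls between -1.1 and 0.0, since that split leaves each interval with a single target category. A full categorizer would apply such a search repeatedly to obtain more than two intervals.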
An encoding operation can be reverted by calling method Decode.
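An encode and decode round trip might look as follows. This is a sketch under stated assumptions rather than an authoritative sample: it assumes the Novacta.Analytics package is referenced, that `IndexCollection.Range(0, 1)` selects columns 0 and 1, that `Encode` is a static method returning a `CategoricalDataSet`, and that `Decode` returns one string array per individual. None of these details is stated on this page, and the class name `EncodeDecodeSketch` is hypothetical.

```csharp
using System;
using System.IO;
using Novacta.Analytics;

public static class EncodeDecodeSketch
{
    public static CategoricalDataSet EncodeSample()
    {
        // Comma-delimited source whose first line holds the column headers.
        string source = string.Join(
            Environment.NewLine,
            "COLOR,SIZE",
            "Red,Small",
            "Green,Large",
            "Red,Large",
            "Black,Small");

        using var reader = new StringReader(source);

        // Assumption: Range(0, 1) refers to the first two columns.
        IndexCollection extractedColumns = IndexCollection.Range(0, 1);

        // Overload Encode(TextReader, Char, IndexCollection, Boolean).
        return CategoricalDataSet.Encode(
            reader,
            ',',
            extractedColumns,
            true); // the first line contains column headers
    }

    public static void Main()
    {
        CategoricalDataSet dataset = EncodeSample();

        Console.WriteLine(dataset.NumberOfRows);
        Console.WriteLine(dataset.NumberOfColumns);

        // Assumption: Decode yields the original labels, row by row.
        string[][] decoded = dataset.Decode();
        foreach (string[] row in decoded)
        {
            Console.WriteLine(string.Join(",", row));
        }
    }
}
```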
Parts of the CategoricalDataSet can be selected through indexers; see, for example, Item[IndexCollection, IndexCollection].
Disjunctive forms
Data about a categorical variable can be represented in disjunctive form by splitting the information for the variable into as many binary variables as the variable has categories. The disjunctive representation of a CategoricalDataSet is returned by Disjoin. Supplementary data, i.e., data about the same variables observed on different individuals, can be represented in disjunctive form by method Disjoin(DoubleMatrix).
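To make the disjunctive representation concrete, the sketch below expands one column of category codes into 0/1 indicator columns, one per category. It is written in plain C# and does not call the library: the class `DisjunctiveFormSketch` and its `Disjoin` method are hypothetical names, and the category codes 0, 1, 2 are sample values chosen for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical illustration; not part of Novacta.Analytics.
public static class DisjunctiveFormSketch
{
    // Expands a column of category codes into one 0/1 indicator column
    // per category; categoryCodes lists the codes in column order.
    public static double[,] Disjoin(double[] observedCodes, double[] categoryCodes)
    {
        var indicators = new double[observedCodes.Length, categoryCodes.Length];
        for (int i = 0; i < observedCodes.Length; i++)
        {
            int j = Array.IndexOf(categoryCodes, observedCodes[i]);
            if (j < 0)
            {
                throw new ArgumentException(
                    $"Code {observedCodes[i]} is not a known category.");
            }
            indicators[i, j] = 1.0;
        }
        return indicators;
    }

    public static void Main()
    {
        // A variable with three categories coded 0, 1, 2,
        // observed on four individuals.
        double[] categoryCodes = { 0.0, 1.0, 2.0 };
        double[] observedCodes = { 2.0, 0.0, 1.0, 2.0 };

        double[,] indicators = Disjoin(observedCodes, categoryCodes);
        for (int i = 0; i < indicators.GetLength(0); i++)
        {
            var row = Enumerable.Range(0, indicators.GetLength(1))
                .Select(j => indicators[i, j]);
            Console.WriteLine(string.Join(" ", row));
        }
    }
}
```

Each row of the result contains exactly one 1, in the column of the category observed for that individual. Passing the same `categoryCodes` together with codes observed on other individuals mirrors the treatment of supplementary data, which reuses the categories of the existing variables.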
Serialization
Categorical data sets can be represented as JSON strings, see JsonSerialization.
Properties
Data | Gets the matrix of category codes in the CategoricalDataSet. |
Item[IndexCollection, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables. |
Item[IndexCollection, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable. |
Item[IndexCollection, String] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables. |
Item[Int32, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variables. |
Item[Int32, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variable. |
Item[Int32, String] | Gets the information in the CategoricalDataSet corresponding to the specified individual and variables. |
Item[String, IndexCollection] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables. |
Item[String, Int32] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable. |
Item[String, String] | Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables. |
Name | Gets or sets the name of the CategoricalDataSet. |
NumberOfColumns | Gets the number of columns of this instance. |
NumberOfRows | Gets the number of rows of this instance. |
Variables | Gets the list of variables in the CategoricalDataSet. |
Methods
CategorizeByEntropyMinimization(String, Char, IndexCollection, Boolean, Int32, IFormatProvider) | Discretizes numerical data from the stream underlying the specified file by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data. |
CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider) | Discretizes numerical data from the stream underlying the specified text reader by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data. |
Decode | Decodes the CategoricalDataSet. |
Disjoin | Disjoins the data of the CategoricalDataSet. |
Disjoin(DoubleMatrix) | Disjoins supplementary data. |
Encode(String, Char, IndexCollection, Boolean) | Encodes categorical data from the specified file. |
Encode(TextReader, Char, IndexCollection, Boolean) | Encodes categorical data from the stream underlying the specified text reader. |
Encode(String, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) | Encodes categorical or numerical data from the given file, applying specific numerical data categorizers. |
Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) | Encodes categorical or numerical data from the stream underlying the specified text reader, applying specific numerical data categorizers. |
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object) |
FromEncodedData | Initializes a new instance of the CategoricalDataSet class from previously encoded data. |
GetContingencyTable | Gets the contingency table representing the joint absolute frequency distribution of the specified categorical variables. |
GetHashCode | Serves as the default hash function. (Inherited from Object) |
GetType | Gets the Type of the current instance. (Inherited from Object) |
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object) |
ToString | Returns a string that represents the current object. (Inherited from Object) |