CategoricalDataSet Class

Represents a collection of observations for a set of categorical variables.
Inheritance Hierarchy
System.Object
  Novacta.Analytics.CategoricalDataSet

Namespace:  Novacta.Analytics
Assembly:  Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.0.0
Syntax
public class CategoricalDataSet : IReadOnlyTabularCollection<Category, CategoricalDataSet>

The CategoricalDataSet type exposes the following members.

Properties
Name: Description
Data (public property): Gets the matrix of category codes in the CategoricalDataSet.
Item[IndexCollection, IndexCollection] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[IndexCollection, Int32] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[IndexCollection, String] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[Int32, IndexCollection] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[Int32, Int32] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individual and variable.
Item[Int32, String] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individual and variables.
Item[String, IndexCollection] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Item[String, Int32] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variable.
Item[String, String] (public property): Gets the information in the CategoricalDataSet corresponding to the specified individuals and variables.
Name (public property): Gets or sets the name of the CategoricalDataSet.
NumberOfColumns (public property): Gets the number of columns of this instance.
NumberOfRows (public property): Gets the number of rows of this instance.
Variables (public property): Gets the list of variables in the CategoricalDataSet.
Methods
Name: Description
CategorizeByEntropyMinimization(String, Char, IndexCollection, Boolean, Int32, IFormatProvider) (public static method, code example): Discretizes numerical data from the stream underlying the specified file by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider) (public static method, code example): Discretizes numerical data from the stream underlying the specified text reader by defining multiple intervals of the numerical data range. Intervals are identified by minimizing the intra-interval entropy of the specified target data.
Decode (public method): Decodes the CategoricalDataSet.
Disjoin (public method): Disjoins the data of the CategoricalDataSet.
Disjoin(DoubleMatrix) (public method): Disjoins supplementary data.
Encode(String, Char, IndexCollection, Boolean) (public static method, code example): Encodes categorical data from the specified file.
Encode(TextReader, Char, IndexCollection, Boolean) (public static method, code example): Encodes categorical data from the stream underlying the specified text reader.
Encode(String, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) (public static method, code example): Encodes categorical or numerical data from the given file applying specific data categorizers.
Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider) (public static method, code example): Encodes categorical or numerical data from the stream underlying the specified text reader applying specific data categorizers.
Equals (public method): Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize (protected method): Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
FromEncodedData (public static method): Initializes a new instance of the CategoricalDataSet class from previously encoded data.
GetContingencyTable (public method, code example): Gets the contingency table representing the joint absolute frequency distribution of the specified categorical variables.
GetHashCode (public method): Serves as the default hash function. (Inherited from Object.)
GetType (public method): Gets the Type of the current instance. (Inherited from Object.)
MemberwiseClone (protected method): Creates a shallow copy of the current Object. (Inherited from Object.)
ToString (public method): Returns a string that represents the current object. (Inherited from Object.)
Remarks

A data set consists of a set of categorical variables, whose list is returned by the Variables property, and a matrix, returned by the Data property, containing the data observed for those variables on a given collection of individuals. Each matrix column is associated with one of the categorical variables under study, while the rows of the matrix are associated with the individuals.
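As a rough illustration of this layout, the sketch below uses plain Python lists and dictionaries rather than the library's own types. The variable names, labels, and numeric codes are all made up for the example, and the helper `category_of` is hypothetical.

```python
# Illustration of the layout only: these are plain Python structures,
# not Novacta.Analytics types, and all names and codes are hypothetical.

# One entry per categorical variable: a name plus a code -> label mapping.
variables = [
    {"name": "COLOR", "categories": {0.0: "red", 1.0: "blue"}},
    {"name": "SIZE", "categories": {0.0: "small", 1.0: "large"}},
]

# Matrix of category codes: one row per individual, one column per variable.
data = [
    [0.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
]


def category_of(row, column):
    """Return the label observed for individual `row` on variable `column`."""
    code = data[row][column]
    return variables[column]["categories"][code]


number_of_rows = len(data)        # individuals
number_of_columns = len(data[0])  # variables
```

Here `category_of(1, 0)` looks up the code stored in row 1, column 0 and maps it back to the label of the first variable.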

Instantiation

New instances of the CategoricalDataSet class can be initialized from previously encoded data, through method FromEncodedData(List<CategoricalVariable>, DoubleMatrix), or by encoding a data source; see, for example, Encode(TextReader, Char, IndexCollection, Boolean, Dictionary<Int32, Categorizer>, IFormatProvider).

The source can contain information about categorical or numerical variables observed at a given instance. Encoding methods handle numerical variables by delegating their discretization to special categorizers. If needed, categorizers can be identified by splitting the range of the numerical data into multiple intervals in order to minimize the intra-interval heterogeneity of the given target; see, for example, CategorizeByEntropyMinimization(TextReader, Char, IndexCollection, Boolean, Int32, IFormatProvider).
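To make the idea of entropy-based discretization concrete, here is a minimal Python sketch that finds a single cut point minimizing the weighted entropy of a target within the two resulting intervals. It is not the library's implementation: the function names `entropy` and `best_cut` are hypothetical, and the actual procedure behind CategorizeByEntropyMinimization may differ, for instance by splitting recursively and applying a stopping rule.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of target labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, targets):
    """Return (cut, score) for the binary split with the lowest weighted
    intra-interval entropy of the targets, or None if no split exists."""
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    best = None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between equal values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if best is None or score < best[1]:
            best = ((low + high) / 2, score)  # midpoint as candidate cut
    return best


# Made-up data: low values go with target "A", high values with "B".
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
targets = ["A", "A", "A", "B", "B", "B"]
cut, score = best_cut(values, targets)
```

With this data the chosen cut falls between 3.0 and 10.0, where both intervals are pure with respect to the target.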

An encoding operation can be reverted by calling method Decode.
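The relationship between encoding and decoding can be sketched in a few lines of Python. This is a conceptual illustration, not the library's Encode or Decode: the functions below are hypothetical, and assigning codes in order of first appearance is an assumption made only for the sketch.

```python
def encode(rows):
    """Replace labels with numeric codes, column by column.

    Codes are assigned in order of first appearance (an assumption of this
    sketch). Returns the per-column codebooks and the coded rows.
    """
    codebooks = [{} for _ in range(len(rows[0]))]
    coded = []
    for row in rows:
        coded_row = []
        for j, label in enumerate(row):
            book = codebooks[j]
            if label not in book:
                book[label] = float(len(book))
            coded_row.append(book[label])
        coded.append(coded_row)
    return codebooks, coded


def decode(codebooks, coded):
    """Map numeric codes back to their labels using the codebooks."""
    inverse = [{code: label for label, code in book.items()} for book in codebooks]
    return [[inverse[j][code] for j, code in enumerate(row)] for row in coded]


# Made-up observations: one row per individual, one column per variable.
labels = [["red", "small"], ["blue", "large"], ["red", "large"]]
codebooks, coded = encode(labels)
restored = decode(codebooks, coded)
```

Decoding with the same codebooks restores the original labels, which is the round trip that Decode provides for an encoded data set.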

Parts of the CategoricalDataSet can be selected through indexers; see, for example, Item[IndexCollection, IndexCollection].

Disjunctive forms

Data about a categorical variable can be represented in disjunctive form by splitting the information for the variable into as many binary variables as the variable has categories. The disjunctive representation of a CategoricalDataSet is returned by Disjoin. Supplementary data, i.e. data containing information about the same variables observed at different individuals, can be disjoined by method Disjoin(DoubleMatrix).
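The following Python sketch shows what a disjunctive representation looks like for a small coded matrix. It is not the library's Disjoin: the function `disjoin` and its `category_counts` parameter are hypothetical, and the sketch assumes that the categories of each variable are coded 0, 1, ..., k-1.

```python
def disjoin(data, category_counts):
    """Expand each category code into a block of 0/1 indicators.

    data: rows of category codes, one column per variable; codes are assumed
          to run from 0 to k-1 for a variable with k categories.
    category_counts: number of categories of each variable, in column order.
    """
    result = []
    for row in data:
        indicators = []
        for code, k in zip(row, category_counts):
            block = [0.0] * k
            block[int(code)] = 1.0  # mark the observed category
            indicators.extend(block)
        result.append(indicators)
    return result


# Two variables with 2 and 3 categories, observed at two individuals.
data = [[0.0, 2.0], [1.0, 0.0]]
disjunctive = disjoin(data, [2, 3])

# Supplementary data: the same variables observed at a different individual.
supplementary = disjoin([[1.0, 1.0]], [2, 3])
```

Each row of the result has one column per category, with exactly one indicator set to 1 for each variable. Supplementary rows are expanded with the same category layout, so their columns line up with those of the original data.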

Serialization

Categorical data sets can be represented as JSON strings; see JsonSerialization.

See Also