CategoricalDataSet.Encode(TextReader, Char, IndexCollection, Boolean) Method

Encodes categorical data from the stream underlying the specified text reader.

Definition

Namespace: Novacta.Analytics
Assembly: Novacta.Analytics (in Novacta.Analytics.dll) Version: 2.1.0+428f3840cfab98dda567bb0ed350b302533e273a
C#
public static CategoricalDataSet Encode(
	TextReader reader,
	char columnDelimiter,
	IndexCollection extractedColumns,
	bool firstLineContainsVariableNames
)

Parameters

reader  TextReader
The reader having access to the data stream.
columnDelimiter  Char
The delimiter used to separate columns in data lines.
extractedColumns  IndexCollection
The zero-based indexes of the columns from which data are to be extracted.
firstLineContainsVariableNames  Boolean
If set to true, signals that the first line contains variable names.

Return Value

CategoricalDataSet
The dataset containing information about the streamed data.

Remarks

Each line from the stream is interpreted as the information about variables observed at a given instance. A line is split into tokens, each corresponding to a (zero-based) column, which in turn stores the data of a given variable. Columns are assumed to be separated from each other by the character passed as columnDelimiter. Data from a variable are extracted only if the corresponding column index is in the collection extractedColumns.

Data are encoded applying the InvariantCulture.
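
The column selection described in these remarks can be pictured with a small sketch in plain C#. It does not use Novacta.Analytics and is not the library's implementation: the class ColumnSelectionSketch and the helper SelectTokens are hypothetical names introduced only for illustration, and throwing InvalidDataException for a short line is an assumption made to mirror the documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ColumnSelectionSketch
{
    // Hypothetical helper (not part of Novacta.Analytics): splits one line
    // into tokens and keeps only those at the requested zero-based columns.
    public static string[] SelectTokens(
        string line, char columnDelimiter, int[] extractedColumns)
    {
        // Each token corresponds to one zero-based column.
        string[] tokens = line.Split(columnDelimiter);

        var selected = new List<string>();
        foreach (int column in extractedColumns)
        {
            // Assumption: a line with too few columns is treated as invalid data.
            if (column < 0 || column >= tokens.Length)
            {
                throw new InvalidDataException(
                    $"The line has no column at index {column}.");
            }
            selected.Add(tokens[column]);
        }
        return selected.ToArray();
    }

    public static void Main()
    {
        // Three columns in the line, but only columns 0 and 1 are extracted.
        string[] picked = SelectTokens("Red,Negative,42", ',', new[] { 0, 1 });
        Console.WriteLine(string.Join("|", picked)); // prints Red|Negative
    }
}
```

Here the third column is ignored because its index is not among the extracted ones, which is the role played by extractedColumns in the method documented on this page.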

Example

In the following example, a data stream is read to encode a categorical dataset. The stream contains data corresponding to two categorical variables.

Encoding a categorical dataset from a stream containing categorical data
using System;
using System.IO;

namespace Novacta.Analytics.CodeExamples
{
    public class CategoricalEncodeExample2  
    {
        public void Main()
        {
            // Create a data stream.
            string[] data = [
            "COLOR,NUMBER",
            "Red,Negative",
            "Green,Zero",
            "Red,Negative",
            "Black,Negative",
            "Black,Positive" ];

            MemoryStream stream = new();
            StreamWriter writer = new(stream);
            for (int i = 0; i < data.Length; i++) {
                writer.WriteLine(data[i]);
            }
            // Flush once so that all lines reach the underlying stream.
            writer.Flush();
            stream.Position = 0;

            // Encode the categorical data set.
            StreamReader streamReader = new(stream);
            char columnDelimiter = ',';
            IndexCollection extractedColumns = IndexCollection.Range(0, 1);
            bool firstLineContainsColumnHeaders = true;
            CategoricalDataSet dataset = CategoricalDataSet.Encode(
                streamReader,
                columnDelimiter,
                extractedColumns,
                firstLineContainsColumnHeaders);

            // Decode and show the data set. 
            Console.WriteLine("Decoded data set:");
            Console.WriteLine();
            var decodedDataSet = dataset.Decode();
            int numberOfInstances = dataset.Data.NumberOfRows;
            int numberOfVariables = dataset.Data.NumberOfColumns;

            foreach (var variable in dataset.Variables) {
                Console.Write(variable.Name + ",");
            }
            Console.WriteLine();

            for (int i = 0; i < numberOfInstances; i++) {
                for (int j = 0; j < numberOfVariables; j++) {
                    Console.Write(decodedDataSet[i][j] + ",");
                }
                Console.WriteLine();
            }
        }
    }
}

// Executing method Main() produces the following output:
// 
// Decoded data set:
// 
// COLOR,NUMBER,
// Red,Negative,
// Green,Zero,
// Red,Negative,
// Black,Negative,
// Black,Positive,

Exceptions

ArgumentNullException reader is null.
-or-
extractedColumns is null.
InvalidDataException The stream accessed by reader contains no data rows.
-or-
At least one row does not contain enough data for a column specified by extractedColumns. This can happen if columns are missing, or if strings representing variable names or category labels (that is, tokens extracted from the stream) are null or consist only of white-space characters. In some cases, the InnerException property is set to provide further details about the error.
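
The following is a minimal sketch of how a caller might guard against the InvalidDataException case listed above. It assumes a reference to Novacta.Analytics and uses only members that appear on this page; the class name EncodeErrorHandlingSketch is hypothetical. The input holds a header line and no data rows, so, going by the conditions documented here, the call is expected to throw. The exact message text is not specified on this page, so none is shown.

```csharp
using System;
using System.IO;
using Novacta.Analytics;

public static class EncodeErrorHandlingSketch
{
    public static void Main()
    {
        // A header line only: the stream contains no data rows.
        using TextReader reader = new StringReader("COLOR,NUMBER");

        try
        {
            CategoricalDataSet dataset = CategoricalDataSet.Encode(
                reader,
                ',',
                IndexCollection.Range(0, 1),
                true);

            // Reached only if encoding succeeds.
            Console.WriteLine(dataset.Data.NumberOfRows);
        }
        catch (InvalidDataException e)
        {
            Console.WriteLine("Encoding failed: " + e.Message);

            // Further details may be available, but only in some cases.
            if (e.InnerException is not null)
            {
                Console.WriteLine("Details: " + e.InnerException.Message);
            }
        }
    }
}
```

A StringReader is used here because the method accepts any TextReader, so a stream is not required for small in-memory inputs.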

See Also