Dataset Requirements
Dataset Requirements for Sensor Data
The requirements below are provided for training and test datasets. Meeting these requirements will guarantee successful model training and prediction on the new data. We recommend using a text editor like Notepad++ to check the datasets.
Requirements for training datasets
A dataset must be a CSV file using UTF-8 or ISO-8859-1 encoding.
A dataset must have a minimum of 2 columns, 50 rows, and a header (51 rows in total). The first row in the dataset must contain the column names.
The file name must not contain the following characters: !/[+!@#$%^&*,. ?":{}\\/|<>()[]]
All feature values in the dataset must be numeric. For the classification task type, the class labels in the target variable should start from 0.
A dataset must not contain any empty values or placeholders for missing values such as “NA” or “NAN”.
A comma, semicolon, pipe, caret, or tab must be used as a separator. CRLF or LF should be used as the end-of-line character. The separator and end-of-line character should be consistent inside the dataset.
All column names (values in the CSV file header) must be unique and must contain only letters (a-z, A-Z), numbers (0-9), hyphens (-), or underscores (_).
For the classification task type, a training dataset must have a minimum of 2 classes of target variable with at least 10 samples provided for each class.
Currently, Neuton supports only the EN-US locale for numbers, so:
You must use a dot as the decimal separator and remove the spaces and commas typically used to separate every third digit in numeric fields. For example, "20,000.00" should be replaced with "20000.00".
If a numeric column contains a number combined with its unit, keep only the number in the column. For example, "$20,000.00" should be replaced with "20000.00".
Date/time columns must use epoch time or a relative date format. For example, “10/18/2017” should be represented as “1508284800”.
Test datasets must use the same timestamp format as the training dataset.
End-of-line symbols must be excluded from the field values.
For sensor data from gyroscopes, accelerometers, magnetometers, electromyography (EMG) sensors, and other similar devices, when creating models with Digital Signal Preprocessing, each row of the dataset should contain the device readings for one unit of time, with a label as the target. Do not shuffle the signal labels or encode the signal before model creation.
For example, if the window is 8, then the dataset can be organized as follows:
Training Dataset
Row numbers (line index values) must not be included in the training dataset.
Data can be represented using the following types: INT8, INT16, or FLOAT32.
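Several of the training requirements above can be checked locally before upload. The following is a minimal sketch, not a Neuton tool: the function name `check_training_csv`, the assumption that the target column is called `target`, and the set of NA-like placeholders in `EMPTY_MARKERS` are all hypothetical choices for illustration. The sketch also does not check the file name, encoding, data types, or end-of-line characters.

```python
import csv
import io
import re

# Placeholders treated as missing values; this set is an assumption, extend it as needed.
EMPTY_MARKERS = {"", "na", "nan", "null", "none"}


def check_training_csv(text, target="target", delimiter=",", classification=True):
    """Return a list of problems found in a training CSV (empty list = none found).

    Hypothetical helper: `target` is the assumed name of the label column.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Shape: at least 2 columns and 50 data rows below the header.
    if len(header) < 2:
        problems.append("fewer than 2 columns")
    if len(data) < 50:
        problems.append(f"only {len(data)} data rows (minimum is 50)")

    # Header: unique names made only of letters, digits, hyphens, and underscores.
    if len(set(header)) != len(header):
        problems.append("column names are not unique")
    for name in header:
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            problems.append(f"invalid column name: {name!r}")

    # Values: no empty or NA-like cells, and every value must parse as a number.
    class_counts = {}
    for row_no, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(f"data row {row_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for name, value in zip(header, row):
            if value.strip().lower() in EMPTY_MARKERS:
                problems.append(f"data row {row_no}: empty value in {name!r}")
                continue
            try:
                number = float(value)
            except ValueError:
                problems.append(f"data row {row_no}: non-numeric value {value!r} in {name!r}")
                continue
            if classification and name == target:
                class_counts[number] = class_counts.get(number, 0) + 1

    # Classification target: labels start at 0, at least 2 classes, at least 10 samples each.
    if classification and target not in header:
        problems.append(f"target column {target!r} not found")
    elif classification and class_counts:
        if min(class_counts) != 0:
            problems.append("target labels do not start at 0")
        if len(class_counts) < 2:
            problems.append("fewer than 2 target classes")
        for label, count in sorted(class_counts.items()):
            if count < 10:
                problems.append(f"class {label:g} has only {count} samples (minimum is 10)")
    return problems
```

For a semicolon, pipe, caret, or tab separated file, pass the matching `delimiter` (for example `delimiter="\t"`). An empty result only means none of these particular checks failed.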
List of requirements for test datasets (or new data)
When transferring data for inference on the device, the values must always be in the same order as in the training dataset.
The data type should match the data type of the training dataset.
A dataset must be a CSV file using UTF-8 or ISO-8859-1 encoding.
The first row in the dataset must contain the column names, and a comma, semicolon, pipe, caret, or tab must be used as a separator. CRLF or LF should be used as the end-of-line character.
The test dataset must have the same file structure with the same requirements for the feature values as the training dataset. The order of fields must be the same as in the training dataset.
End-of-line symbols must be excluded from the field values.
Test Dataset
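Because the test dataset must keep the same columns in the same order as the training dataset, comparing the two headers is a quick sanity check. The following is a minimal sketch with a hypothetical `compare_headers` helper. The optional `drop` parameter is an assumption: use it for columns such as the target that may legitimately be absent from new data.

```python
import csv
import io


def compare_headers(train_text, test_text, delimiter=",", drop=()):
    """Return a list of differences between the training and test CSV headers.

    Hypothetical helper; columns listed in `drop` are ignored in both headers.
    """
    train = [c for c in next(csv.reader(io.StringIO(train_text), delimiter=delimiter), []) if c not in drop]
    test = [c for c in next(csv.reader(io.StringIO(test_text), delimiter=delimiter), []) if c not in drop]
    if train == test:
        return []
    problems = []
    missing = [c for c in train if c not in test]
    extra = [c for c in test if c not in train]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra:
        # Same set of columns, but the order differs from the training dataset.
        problems.append(f"column order differs: expected {train}, got {test}")
    return problems
```

This sketch checks only the header. Matching data types between the two files still has to be confirmed separately.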
How to identify and change file encoding
The Neuton Platform supports UTF-8 and ISO-8859-1 encoding for CSV files. Please check your file encoding and convert it to one of the supported options.
To check the current encoding of a file, open the file in the text editor of your choice (for example Notepad++). You will find the file encoding specified in the bottom right corner of the window.
If your file encoding differs from the options supported, you’ll need to convert it to UTF-8 and save the file. To change the encoding, select the Encoding menu and click “Convert to UTF-8”. When the conversion is complete, you will see “UTF-8” specified in the bottom right corner. Save the file to use on the Neuton platform.
Encoding
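If a text editor is not at hand, the same conversion can be scripted. The following is a minimal sketch using only the Python standard library. The helper name `ensure_utf8` and the default `source_encoding` are assumptions. Any byte sequence decodes as ISO-8859-1, so this sketch cannot detect the original encoding. If the file was written in another encoding (for example, cp1252), pass that name instead.

```python
from pathlib import Path


def ensure_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite a file as UTF-8 if it does not already decode as UTF-8.

    Returns True if the file was converted, False if it was already valid UTF-8.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8; leave the file untouched
    except UnicodeDecodeError:
        pass
    # Decode with the assumed source encoding and write the bytes back as UTF-8.
    # Working in bytes keeps the original CRLF/LF line endings unchanged.
    text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Converting a file that is already ISO-8859-1 is optional, because both encodings are supported. It is, however, harmless.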
Requirements for Audio Files
Raw audio files should be in WAV format (with the .wav or .wave extension).
All audio files should be placed in a single folder.
The folder must contain subfolders, each holding samples of a single class (label).
The name of each subfolder is used as the class label.
Folder Example
Audio Files
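The folder layout above maps each file to a label through its subfolder name. The following is a minimal sketch that lists the samples and labels a folder would produce, which can help catch misplaced files before upload. The helper name `collect_audio_samples` is hypothetical, and matching the extension case-insensitively is an assumption. Files placed directly in the root folder are skipped, as are files with other extensions and anything nested deeper than one subfolder level.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".wave"}


def collect_audio_samples(root):
    """Return (file_path, label) pairs, using each class subfolder's name as the label.

    Hypothetical helper: only files directly inside a first-level subfolder are included.
    """
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for audio in sorted(class_dir.iterdir()):
            # Keep only WAV files; extension case is ignored (an assumption).
            if audio.is_file() and audio.suffix.lower() in AUDIO_SUFFIXES:
                samples.append((audio, class_dir.name))
    return samples
```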
Dataset Requirements for Tabular Data
The requirements below are provided for training and test datasets. Meeting these requirements will guarantee successful model training and prediction on the new data. We recommend using a text editor like Notepad++ to check the datasets.
Requirements for training datasets
A dataset must be a CSV file using UTF-8 or ISO-8859-1 encoding.
A dataset must have a minimum of 2 columns, 50 rows, and a header row.
The file name must not contain the following characters: !/[+!@#$%^&*,. ?":{}\\/|<>()[]]
All feature values in the dataset must be numeric. For the classification task type, the target column may also contain string values; for the regression task type, the target variable must contain only numeric values.
A dataset must not contain any empty values or placeholders for missing values such as “NA” or “NAN”.
The first row in the dataset must contain the column names. A comma, semicolon, pipe, caret, or tab must be used as a separator. CRLF or LF should be used as the end-of-line character. The separator and end-of-line character should be consistent inside the dataset.
All column names (values in the CSV file header) must be unique and must contain only letters (a-z, A-Z), numbers (0-9), hyphens (-), or underscores (_).
For the classification task type, a training dataset must have a minimum of 2 classes of target variable with at least 10 samples provided for each class.
Currently, Neuton supports only the EN-US locale for numbers, so:
You must use a dot as the decimal separator and remove the spaces and commas typically used to separate every third digit in numeric fields. For example, "20,000.00" should be replaced with "20000.00".
If a numeric column contains a number combined with its unit, keep only the number in the column. For example, "$20,000.00" should be replaced with "20000.00".
Date/time columns must use epoch time or a relative date format. For example, “10/18/2017” should be represented as “1508284800”.
Test datasets must use the same timestamp format as the training dataset.
End-of-line symbols must be excluded from the field values.
Training Dataset
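The number and date rules above can be applied with a small cleanup step before upload. The following is a minimal sketch covering only the examples given in the text. The helper names are hypothetical. It assumes values already use a dot as the decimal separator, and it interprets dates as midnight UTC, which reproduces the “10/18/2017” → “1508284800” example. Negative amounts written with the sign after a currency symbol, and other locales, are not handled.

```python
import re
from datetime import datetime, timezone

# A plain decimal number, optionally with a sign and an exponent.
NUMBER = re.compile(r"-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def normalize_number(value):
    """Turn strings like "20,000.00" or "$20,000.00" into "20000.00" (hypothetical helper)."""
    # Drop thousands separators (commas and spaces), then keep only the numeric part,
    # which discards currency symbols and unit suffixes.
    match = NUMBER.search(value.replace(",", "").replace(" ", ""))
    if match is None:
        raise ValueError(f"no number found in {value!r}")
    return match.group()


def date_to_epoch(value, fmt="%m/%d/%Y"):
    """Convert a date string to epoch seconds, interpreting it as midnight UTC."""
    dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

Whichever format is chosen (the `fmt` pattern and the UTC assumption), apply the same conversion to the test data so the timestamps stay consistent with the training dataset.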
List of requirements for test datasets (or new data)
A dataset must be a CSV file using UTF-8 or ISO-8859-1 encoding.
For prediction on the platform (using the web interface), the first row in the dataset must contain the column names. A comma, semicolon, pipe, caret, or tab must be used as a separator, and CRLF or LF should be used as the end-of-line character. When transferring data for inference on the device, the values must always be in the same order as in the training dataset, excluding the target.
The test dataset must have the same file structure with the same requirements for the feature values as the training dataset.
The order of fields must be the same as in the training dataset. For example, if the columns in the training dataset are ordered ‘B, C, A’, the input data for prediction must follow the same order: the value for feature B first, then the value for feature C, and then the value for feature A.
End-of-line symbols must be excluded from the field values.
Test Dataset
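The ‘B, C, A’ ordering rule can be enforced when building input for on-device inference. The following is a minimal sketch with a hypothetical `order_features` helper. It assumes each sample arrives as a mapping from column name to value and that the target column is named `target`. How the resulting list is transferred to the device is outside the scope of this sketch.

```python
def order_features(sample, training_columns, target="target"):
    """Return feature values in training-column order, leaving out the target.

    Hypothetical helper: `sample` maps column names to values.
    """
    features = [c for c in training_columns if c != target]
    missing = [c for c in features if c not in sample]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [sample[c] for c in features]
```

For a training dataset with columns B, C, A, and target, the sample `{"A": 3.0, "B": 1.0, "C": 2.0}` is sent as `[1.0, 2.0, 3.0]`.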
How to identify and change file encoding
The Neuton Platform supports UTF-8 and ISO-8859-1 encoding for CSV files. Please check your file encoding and convert it to one of the supported options.
To check the current encoding of a file, open the file in the text editor of your choice (for example Notepad++). You will find the file encoding specified in the bottom right corner of the window.
If your file encoding differs from the options supported, you’ll need to convert it to UTF-8 and save the file. To change the encoding, select the Encoding menu and click “Convert to UTF-8”. When the conversion is complete, you will see “UTF-8” specified in the bottom right corner. Save the file to use on the Neuton platform.
Encoding