Maximum Training Duration (hours)
For optimal use of the infrastructure, you can limit the model training time. Upon reaching the time limit during training, the platform will stop the training and save the best model.
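The stop-and-keep-the-best behavior described above can be illustrated with a minimal sketch. This is not the platform's implementation: `train_with_time_limit`, `train_step`, `max_steps`, and the injectable `clock` are hypothetical names introduced only for this example, and a higher score is assumed to be better.

```python
import time


def train_with_time_limit(train_step, max_hours, max_steps=1000, clock=time.monotonic):
    """Run training steps until the time budget or step cap is reached.

    train_step(step) is a hypothetical callable that returns
    (model, validation_score); a higher score is assumed to be better.
    """
    deadline = clock() + max_hours * 3600  # convert hours to seconds
    best_model, best_score = None, float("-inf")
    for step in range(max_steps):
        if clock() >= deadline:
            break  # budget exhausted: stop and keep the best model so far
        model, score = train_step(step)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Toy usage: each "step" instantly returns a labelled model and a fixed score.
toy_scores = [0.2, 0.7, 0.5, 0.9]
model, score = train_with_time_limit(
    lambda step: (f"model-{step}", toy_scores[step]),
    max_hours=0.01,
    max_steps=len(toy_scores),
)
```

The clock is passed in as a parameter so the time limit can be exercised in tests without actually waiting.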
Maximum Number of Coefficients
Neuton models are very small by default, but you may still limit the number of coefficients if necessary. By default, the number of coefficients is unlimited.
Perform Data Preprocessing
This parameter enables automatic data preprocessing (enabled by default). With this option enabled, a user may upload datasets containing different data types, such as textual data, categorical variables in string representation, floats, integers, and dates.
The automated data preprocessing stage includes:
automatic statistical analysis of data;
rule induction from data;
dataset cleaning;
missing value imputation using tree-based machine learning models;
removal of duplicates and outliers;
variable transformation;
data sampling;
binomial and multinomial data correlation tests;
data scaling and normalization;
text fields detection.
Together, these actions convert the data into a format suitable for machine learning and strengthen the signal in the data that relates to the target variable.
The automatic data preprocessing pipeline incorporates internally developed methods as well as some open source libraries.
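To make a few of the listed steps concrete, here is a minimal sketch of duplicate removal, missing value imputation, and scaling on purely numeric rows. It is an illustration, not Neuton's pipeline: the function name `preprocess` is hypothetical, and imputation uses a simple column median rather than the tree-based models the platform uses.

```python
from statistics import median


def preprocess(rows):
    """Clean and scale numeric rows given as tuples of floats or None.

    Assumes at least one row and at least one observed value per column.
    """
    # 1. Remove exact duplicate rows, keeping first-occurrence order.
    unique = list(dict.fromkeys(rows))

    # 2. Work column by column.
    n_cols = len(unique[0])
    columns = [[row[i] for row in unique] for i in range(n_cols)]

    processed = []
    for col in columns:
        # Impute missing values with the column median
        # (a stand-in for tree-based imputation).
        observed = [v for v in col if v is not None]
        fill = median(observed)
        filled = [fill if v is None else v for v in col]

        # Min-max scale to [0, 1]; a constant column maps to 0.0.
        lo, hi = min(filled), max(filled)
        span = (hi - lo) or 1.0
        processed.append([(v - lo) / span for v in filled])

    # 3. Reassemble rows from the processed columns.
    return [tuple(col[i] for col in processed) for i in range(len(unique))]


raw = [(1.0, 10.0), (1.0, 10.0), (3.0, None), (5.0, 30.0)]
clean = preprocess(raw)
```

Here the duplicated first row is dropped, the missing value in the second column is filled with the median of the observed values, and both columns are rescaled to the range 0 to 1.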
Perform Feature Engineering
This parameter enables actions to extract additional information and to create new variables from the existing data (disabled by default).
Feature engineering is an essential part of any machine learning application. The automated feature engineering stage includes:
analysis of binomial and multinomial features correlations with the target variable;
creation of feature interactions based on linear models;
creation of feature interactions based on tree-based models;
search, examination, and removal of mutually correlated features;
feature importance ranking;
TF-IDF token generation for detected text fields.
All feature evaluation experiments are carried out in cross-validation mode, to confirm each feature's significance for the final model, and in multithreaded mode, to reduce processing time.
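One of the steps listed above, the removal of mutually correlated features, can be sketched as follows. This is a simplified illustration under stated assumptions: `pearson` and `drop_correlated` are hypothetical helper names, the 0.95 threshold is arbitrary, Pearson correlation is used as the only measure, and no cross-validation is performed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant sequence has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def drop_correlated(features, target, threshold=0.95):
    """Return the names of the features to keep.

    features maps a feature name to its list of values. Among features
    that are highly correlated with each other, the one most correlated
    with the target is kept.
    """
    # Rank features by absolute correlation with the target, strongest first.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep a feature only if it is not too similar to one already kept.
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.5],  # almost a multiple of "a"
    "c": [4, 1, 3, 2],
}
kept = drop_correlated(features, target=[1, 2, 3, 4])
```

In this toy data, "b" carries nearly the same information as "a", so only one of the two survives, while the weakly related "c" is retained.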
Disabling Data Preprocessing / Enabling Feature Engineering
If a user has already preprocessed the data manually and does not want any third-party interference in the data, the user can upload the prepared dataset for modeling and disable data preprocessing. If a user would like to build a model using additionally extracted features, the user can enable feature engineering.
Time Series
Time series problems arise from time dependencies in the data. For example, the data may consist of daily sales values for the past year, and you want to predict sales for the upcoming month. Neuton supports the following time formats: relative, unix-time, and timestamps in the en-US locale.
A problem should be treated as a time series if it meets all of the following conditions:
the dataset includes a date/time column;
the target variable is dependent on the date/time;
you want to predict an event in the future (in a timeframe following the train set time range).
If the above conditions are met, you should enable Time Series on the platform.
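The third condition, predicting in a timeframe that follows the training range, can be illustrated with a time-ordered split. This is only a sketch for intuition: `time_ordered_split` and the row layout are hypothetical and do not describe how the platform partitions data.

```python
from datetime import datetime


def time_ordered_split(rows, time_key, cutoff):
    """Split rows so that every test row is strictly later than the cutoff."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    train = [row for row in ordered if row[time_key] <= cutoff]
    test = [row for row in ordered if row[time_key] > cutoff]
    return train, test


sales = [
    {"date": datetime(2023, 1, 3), "sales": 130},
    {"date": datetime(2023, 1, 1), "sales": 120},
    {"date": datetime(2023, 1, 2), "sales": 125},
    {"date": datetime(2023, 1, 4), "sales": 140},
]
train, test = time_ordered_split(sales, "date", cutoff=datetime(2023, 1, 2))
```

Unlike a random split, no row in `test` precedes any row in `train`, which mirrors the situation of forecasting values that lie after the training data.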
If the time series option is enabled, the following additional settings must also be defined:
select date/time column: the column containing a date and/or time;
select the period units (minutes/hours/days/etc.) for the prediction timeframe;
specify the prediction timeframe.
For example, to predict sales for the next month, select months as the period units and type “1” in the “How many periods to predict” field.
If a time gap is expected between the training data (statistics) and the prediction window, select the desired values for the following gap settings:
select the period units for the gap between training and prediction;
specify the gap between training and prediction in the “After how many periods to predict” field.
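How the gap and the prediction timeframe combine can be sketched with simple date arithmetic. The exact boundaries the platform uses are not documented here, so this is an assumption: the window is taken to start at the last training timestamp plus the gap and to last for the requested number of periods. `prediction_window` and `UNIT` are hypothetical names, and only fixed-length units are covered (calendar months are omitted because their length varies).

```python
from datetime import datetime, timedelta

# Fixed-length period units only; months vary in length and are not handled.
UNIT = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}


def prediction_window(last_train_time, horizon, horizon_unit, gap=0, gap_unit="days"):
    """Return (start, end) of the prediction window.

    horizon: value of "How many periods to predict".
    gap: value of "After how many periods to predict".
    """
    start = last_train_time + gap * UNIT[gap_unit]
    end = start + horizon * UNIT[horizon_unit]
    return start, end


# Training data ends on 31 January; predict 7 days, starting 2 days later.
start, end = prediction_window(
    datetime(2023, 1, 31), horizon=7, horizon_unit="days", gap=2, gap_unit="days"
)
```

With a gap of zero, the window begins immediately after the last training timestamp.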
Model Settings
After the training parameters are defined in either mode (Base or Advanced), click “Start training” to start the training process.