Support Library
Explainability Office
The Explainability Office is a set of Neuton's explainability features: Exploratory Data Analysis (EDA), Model Quality Index, Model Quality Diagram, Feature Importance Matrix (FIM), Model Interpreter, Confidence Interval, and Confusion Matrix.
Exploratory Data Analysis (EDA)
EDA is the first explainability feature in the Neuton model lifecycle. This tool automates the initial analysis of the training dataset and its relation to the target variable. The EDA report is generated during model training. Because the feature space can be wide, up to 20 features with the highest statistical significance are selected for EDA based on machine learning modeling. The EDA tool is available on the “My Solution” page and on the “Training” tab, via the Analytics Tools button.
The final model might indicate a slightly different set of features with the highest importance because of the large number of iterations performed during training. This can result in discrepancies between the features selected for EDA and the final Feature Importance Matrix.
In the “Exploratory Data Analysis” tool, you can find charts with the information described below in each of the following sections:
Dataset overview
This section displays brief statistics for your training dataset: problem type, dataset dimension, missing values, and number of records.
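For reference, a minimal sketch of how such overview statistics can be computed with pandas is shown below. This is an illustration only, not Neuton's implementation; the file name, target column, and the problem-type heuristic are placeholders.

```python
import pandas as pd

# Placeholder file and column names; not Neuton's implementation.
df = pd.read_csv("training.csv")
target = "target"

n_records, n_columns = df.shape            # dataset dimension
n_missing = int(df.isna().sum().sum())     # total number of missing values

# Crude heuristic for illustration: few unique target values -> classification.
problem_type = "classification" if df[target].nunique() <= 10 else "regression"

print(f"Problem type:   {problem_type}")
print(f"Dimension:      {n_records} rows x {n_columns} columns")
print(f"Missing values: {n_missing}")
```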
Continuous data distribution and relation to the target variable
Visualization of each continuous variable yields two plots:
Variable density distribution chart
A density plot visualizes the distribution of values across all rows in the dataset. This chart is a variation of a histogram that uses kernel smoothing, producing a smoother distribution that suppresses noise. The peaks of a density plot show where values are concentrated over the interval (a minimal plotting sketch follows this subsection).
Feature relation to the target variable (different for regression and classification task types)
This chart is presented in one of two formats, depending on task type: a line chart showing how the continuous variable changes with the continuous target variable (regression task type), or a histogram showing the mean value of the continuous variable for each class of the target variable (classification task type).
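The density chart described above is essentially a kernel density estimate. A minimal sketch, assuming a placeholder dataset and column name, could look like this (not the platform's own plotting code):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

df = pd.read_csv("training.csv")                 # placeholder dataset
values = df["some_continuous_feature"].dropna()  # placeholder column name

# Kernel density estimate: a smoothed alternative to a histogram.
kde = gaussian_kde(values)
grid = np.linspace(values.min(), values.max(), 200)

plt.plot(grid, kde(grid))
plt.xlabel("some_continuous_feature")
plt.ylabel("density")
plt.title("Variable density distribution")
plt.show()
```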
Discrete data distribution and relation to the target variable
Visualization of each categorical variable yields two plots:
Histogram displaying feature categories count
Feature categories relation to the target variable (different for regression and classification task types)
This chart is presented in one of two formats, depending on task type: a histogram displaying the mean target variable for each of the feature categories (regression task type), or a histogram displaying the number of each of the target classes in each of the feature classes (classification task type).
Feature correlations
Visualization of the correlations in the data yields two plots:
Heatmap displaying the pairwise correlation of the 10 most important variables with each other and with the target variable (the 10 most important features are selected based on their correlation with the target variable);
Horizontal histogram displaying pairs of independent variables with high mutual correlation. Pairs are included if their mutual correlation exceeds 0.7 (see the sketch after this list).
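The underlying computations can be reproduced with pandas. The sketch below is illustrative only (Neuton may use a different correlation measure); the file and target-column names are placeholders.

```python
import pandas as pd

df = pd.read_csv("training.csv")      # placeholder dataset
target = "target"                     # placeholder target column

corr = df.corr(numeric_only=True)     # pairwise (Pearson) correlation matrix

# The 10 features most correlated with the target (by absolute value),
# which would feed the heatmap together with the target column.
top10 = corr[target].drop(target).abs().sort_values(ascending=False).head(10).index.tolist()
heatmap_data = corr.loc[top10 + [target], top10 + [target]]

# Independent-variable pairs whose mutual correlation exceeds 0.7.
features = [c for c in corr.columns if c != target]
high_pairs = [(a, b, round(corr.loc[a, b], 3))
              for i, a in enumerate(features)
              for b in features[i + 1:]
              if abs(corr.loc[a, b]) > 0.7]
print(high_pairs)
```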
Target variable distribution
Visualization of the target variable statistics is presented in one of the two formats:
Violin plot displaying the distribution, median, and outliers in the target variable (regression task type);
Histogram/count plot displaying the number and percentage of each of the target classes throughout the whole dataset (classification task type).
Outliers
Visualization of the outliers in the data is presented in one of the two plots:
Scatter plot displaying the variable distribution in relation to the target variable (regression task type);
Box plot displaying the variable distribution (quartiles and median) and the outliers (classification task type).
Outliers are marked purple according to the plot legends (a minimal box-plot sketch follows).
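For the classification case, a per-class box plot of this kind can be sketched as follows. Column names are placeholders, and the purple fliers simply mimic the colouring described above; this is not Neuton's own plotting code.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("training.csv")             # placeholder dataset
feature, target = "some_feature", "target"   # placeholder column names

# One box per target class; points beyond the whiskers are the outliers.
labels, groups = [], []
for key, group in df.groupby(target):
    labels.append(str(key))
    groups.append(group[feature].dropna().values)

plt.boxplot(groups, labels=labels,
            flierprops={"markerfacecolor": "purple", "markeredgecolor": "purple"})
plt.xlabel(target)
plt.ylabel(feature)
plt.title(f"{feature} distribution and outliers by class")
plt.show()
```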
Time Dependencies
A time dependency plot is created if a date-time column is present in the data. Visualization of time dependency yields three plots, each displaying a line chart of the target variable changes over time.
Missing Values
Visualization of the missing values in the data yields two histograms in which each data feature is displayed as an equal-width bar, with missing values indicated at the corresponding data indexes.
The “Missing Values Map Overall Dataset” plot displays all the data feature bars without feature names and with missing values percentage indication. The purpose of this plot is to give an overall visual representation of the missing values in the data.
The “Missing Values Map and percentage” plot displays only the columns that contain missing values, together with feature names, the percentage of missing values, and the locations of the missing values in the dataset.
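The per-column missing-value percentages behind these maps are straightforward to reproduce; a minimal pandas sketch (placeholder file name, not the platform's implementation) is shown below.

```python
import pandas as pd

df = pd.read_csv("training.csv")   # placeholder dataset

# Percentage of missing values per column (the numbers shown on the maps).
missing_pct = df.isna().mean().mul(100).round(2)

# Only the columns that actually contain missing values, highest first.
print(missing_pct[missing_pct > 0].sort_values(ascending=False))

# Boolean matrix of missing-value locations (row index vs. column) -- the raw
# material behind a map-style visualization.
missing_map = df.isna()
```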
Analytics Tools. Exploratory Data Analysis.
Exploratory Data Analysis Overview
Model Quality Index
The Model Quality Index and Model Quality Diagram help users understand model quality during the training process and the quality of the trained model. The Model Quality Index determines the quality of the model based on the metric indicator values.
Model Quality Index Value Range: 1 - 100%
Minimum quality: 1%
Maximum quality: 100%
The Model Quality Index is calculated based on the training or the validation dataset (if applicable). The correlation between the Training Model Quality Index and the acceptable model predictive power depends significantly on the problem being solved by the model.
Thus, for tasks that do not require high model predictive power, the acceptable range of the Model Quality Index values is 75-100%, while for high-precision tasks it is 99-100%.
The Model Quality Index is an aggregated value of the metric indicator values.
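Neuton does not publish the exact aggregation formula. As a purely hypothetical illustration of how metric values could be aggregated into a 1-100% index, one might write:

```python
def model_quality_index(normalized_metrics: dict) -> float:
    """Assumed aggregation: mean of metrics already scaled to [0, 1] (1 = best),
    rescaled to the documented 1-100% range. Not Neuton's actual formula."""
    mean_quality = sum(normalized_metrics.values()) / len(normalized_metrics)
    return max(1.0, min(100.0, mean_quality * 100.0))

print(model_quality_index({"AUC": 0.93, "Accuracy": 0.88, "F1": 0.85}))  # ~88.7
```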
Model Quality Diagram
The Model Quality Diagram simplifies the process of evaluating the model quality. The values of all metrics are evaluated on a scale from 0 to 1, where 1 is the most accurate model.
Model Quality Diagram
It also allows users to understand metric balance: the closer the displayed figure is to the shape of a regular polygon, the better balanced the metric indicator values are.
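A diagram of this kind is a radar (spider) chart of metrics scaled to [0, 1]. The sketch below uses placeholder metric names and values and is only an approximation of the platform's chart:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder metric names and values, all scaled to [0, 1].
metrics = {"Accuracy": 0.91, "Precision": 0.88, "Recall": 0.86, "F1": 0.87, "AUC": 0.93}

labels = list(metrics)
values = list(metrics.values())
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()

# Close the polygon by repeating the first point.
values += values[:1]
angles += angles[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 1)
ax.set_title("Model Quality Diagram (radar chart)")
plt.show()
```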
Feature Importance Matrix (FIM)
After the model has been trained, the platform displays a chart with the 10 features that had the most significant impact on the model prediction of the target variable. The chart can be found by clicking the Analytics Tools button on the My Solution page and inside each solution in the Training and Prediction tabs.
The Feature Importance Matrix helps to evaluate the impact of features on the model.

If a feature does not affect the model at all (normalized importance value = 0), you can exclude this variable when building the model and making predictions (see the sketch after the list of control options below).
Feature Importance Matrix
FIM has the following control options:
Top 10/Bottom 10
This control option allows you to show the 10 most important or least important features.
Original features / Original + features after feature engineering
This option allows you to control which features to display: only the original features from the training dataset, or the original features together with the features created during feature engineering.
Select features
The “Select Features” button allows you to select features of interest.
Export FIM
The “Export FIM” button allows you to export the FIM in JSON format.
Classes (for binary and multiclass classification task types only)
These check boxes allow you to control what classes to display on a bar chart.
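The typical workflow around the FIM can be sketched as follows. The importance values, column names, and the sum-to-one normalization are assumptions for illustration, not Neuton's exact procedure:

```python
import pandas as pd

# Hypothetical feature-importance values, e.g. loaded from an exported FIM JSON.
importance = pd.Series({"age": 0.34, "income": 0.28, "tenure": 0.21,
                        "region": 0.17, "promo_code": 0.0})

# Normalize so the values sum to 1 (the exact normalization is an assumption).
normalized = importance / importance.sum()

top_10 = normalized.nlargest(10)       # "Top 10" view
bottom_10 = normalized.nsmallest(10)   # "Bottom 10" view

# Features with zero normalized importance are candidates for exclusion.
print("Candidates for exclusion:", normalized[normalized == 0].index.tolist())
```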
Model Interpreter (for disabled DSP)
When you are viewing prediction results in the web interface, you can click any row to activate the Model Interpreter. Prediction must be enabled for the Model Interpreter to work correctly.
To help you comprehend the decision-making process and identify internal patterns, the platform simulates the output of the model across the entire variety of input variables and presents the result in the form of comprehensible slices of multidimensional space. It also ranks these slices by influence for each specific prediction.
You can also specify a threshold value for the prediction result and see the corresponding values of the original features. For numeric features, you can see the Feature Influence indicator in the right section of the page. This indicator shows how the prediction is affected when the feature value is increased or decreased.
Default feature values will correspond to the actual input features used for the prediction. You can then change one feature value at a time to see how the prediction would have been affected.
Model Interpreter
Model Interpreter Controls
The Model Interpreter offers the following controls:
“Exit” icon
Use this button to return to the full list of prediction results.
Predicted Result
Shows the prediction result for the selected values of the original features (with the predicted class probability for the classification task type).
Threshold Value Slider
Using the slider (or typing a value manually), you can specify the threshold value of the target variable, and Neuton will highlight the corresponding values of the original features for which the prediction result is above or below the threshold. For classification tasks, the “Minimal probability” of the predicted class is used as the threshold. This way you can quickly see which feature values you need as input to predict above the selected threshold.
List of features for analysis
Features are ordered according to their rank in the Feature Importance Matrix.
Features List
The categorical features are represented as bar charts. On the horizontal axis, you can select any feature value (available feature values derive from the training dataset feature variations). This means that if your test dataset feature has additional categories that the model hasn’t seen, those categories will not be available in the Model Interpreter. For each categorical feature, you can see the predicted target value for all the feature categories. For the classification task type, you can see the predicted class along with the probability.
The numeric features are represented as area charts. The feature values on the horizontal axis are ordered from min to max, and the entire available feature range (from min to max) is divided into 100 sections. For each numeric feature value, you can see the corresponding prediction value (a minimal sketch of this sweep follows below).
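Conceptually, this is a one-dimensional sweep of a single feature while all other inputs stay fixed. The sketch below is an assumption about how such a sweep could be implemented; `model`, its `predict()` method, and the column names are placeholders rather than Neuton APIs:

```python
import numpy as np
import pandas as pd

def sweep_feature(model, row: pd.Series, feature: str, vmin: float, vmax: float):
    """Evaluate the model over 100 values of one numeric feature while all other
    inputs stay fixed at the values of `row`. `model` is any object with a
    scikit-learn-style predict() method; names here are placeholders."""
    grid = np.linspace(vmin, vmax, 100)          # min-to-max range, 100 sections
    inputs = pd.DataFrame([row.to_dict()] * len(grid))
    inputs[feature] = grid
    return grid, model.predict(inputs)           # prediction for each grid value
```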
For all features, hovering over any feature value shows the corresponding prediction details. Also, for any feature, if you left-click, drag the pointer to the desired feature value, and release the mouse button, the prediction result is recalculated with the new feature input. You can click the Return button to return to the original feature value.
Feature influence
For the numeric features, you can see the feature influence on the prediction result. A positive feature influence value means the predicted target value will increase when the feature value is increased; a negative feature influence value means the predicted target value will decrease when the feature value is increased.
Confidence Interval (for regression tasks only)
The Confidence Interval shows in what range the predicted value can change and with what probability.
Confidence Interval
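Neuton does not document how its Confidence Interval is computed. As one common, hedged illustration, a prediction interval can be derived from validation residual quantiles; the values below are placeholders:

```python
import numpy as np

def confidence_interval(prediction: float, residuals, confidence: float = 0.95):
    """Residual-quantile prediction interval around a single prediction
    (an assumed method, not Neuton's documented approach)."""
    alpha = (1.0 - confidence) / 2.0
    lo, hi = np.quantile(residuals, [alpha, 1.0 - alpha])
    return prediction + lo, prediction + hi

# residuals = y_true - y_pred on validation data (placeholder values)
print(confidence_interval(100.0, [-3.1, -1.4, -0.2, 0.5, 1.1, 2.8], 0.9))
```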
Confusion Matrix (for classification tasks only)
The Confusion Matrix shows the number of correct and incorrect predictions based on the validation data.
Confusion Matrix
The matrix becomes available only at the end of training or if the user stops training when the model is consistent. You can find it under the list of metrics on the Training and Prediction page in each solution with a classification problem.
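For reference, the same kind of matrix can be reproduced offline from validation labels and predictions; the labels below are placeholder values:

```python
from sklearn.metrics import confusion_matrix

# Validation-set labels vs. model predictions (placeholder values).
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Rows = actual classes, columns = predicted classes.
print(confusion_matrix(y_true, y_pred))
# [[2 1]
#  [1 4]]
```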