Neuton

Explainability Office®


When building machine learning solutions, evaluating the quality of a model and its
predictions is just as important as being able to interpret them.

Hence we created the Explainability Office, a unique set of tools that allow users to evaluate model quality at every stage, identify the logic behind the model analysis, and therefore also understand why certain predictions have been made.


Interpretation

To comprehend the decision-making process and identify internal patterns, we simulate the output
of the model across the entire variety of input variables, and present the result in the form of
comprehensible slices of multidimensional space. At the same time, we rank these slices by
influence for each specific prediction.
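
One way to picture this kind of simulation is a one-dimensional slice: vary a single input over a grid while holding the others at a baseline, then rank the features by how far the output moves. The sketch below is an illustration under those assumptions, not the platform's implementation; `toy_model`, `response_slice`, `rank_slices`, and the feature names are hypothetical.

```python
def response_slice(model, baseline, feature, grid):
    """Vary one feature over `grid`, keeping all other inputs at `baseline`."""
    points = []
    for value in grid:
        row = dict(baseline)      # copy so the baseline is never mutated
        row[feature] = value
        points.append((value, model(row)))
    return points


def rank_slices(model, baseline, grids):
    """Rank features by the output range observed along each slice."""
    spans = {}
    for feature, grid in grids.items():
        outputs = [y for _, y in response_slice(model, baseline, feature, grid)]
        spans[feature] = max(outputs) - min(outputs)
    return sorted(spans.items(), key=lambda item: item[1], reverse=True)


def toy_model(row):
    # Hypothetical stand-in for a trained model.
    return 3.0 * row["age"] + 0.5 * row["income"] ** 2


baseline = {"age": 1.0, "income": 2.0}
grids = {"age": [0.0, 1.0, 2.0], "income": [0.0, 2.0, 4.0]}
ranking = rank_slices(toy_model, baseline, grids)
```

Because the ranking depends on the chosen baseline row, it is specific to one prediction, which matches the per-prediction ranking described above.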

Model Interpreter

The Model Interpreter is a tool that lets you see the logic, direction, and effects of
changes in individual variables in the model. It also shows the importance of these
variables in relation to the target variable.

Feature Importance Matrix (FIM)

After the model has been trained, the platform displays a chart with the 10 features that had
the most significant impact on the model's predictive power. You can also select any other
features to see their importance. FIM has two modes, displaying either only the original
features or the features after feature engineering. For classification tasks you can see the
feature importance for every class.
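
The page does not say how importance is computed. As an illustration only, the sketch below uses permutation importance, a common model-agnostic technique that is swapped in here as an assumption: shuffle one feature, measure how much the error grows, and keep the top `k` features. All names (`permutation_importance`, `top_features`, `toy_model`, `x1`, `x2`) are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, features, seed=0):
    """Error increase after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base_error = mse(model, rows, targets)
    scores = {}
    for feature in features:
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        scores[feature] = mse(model, permuted, targets) - base_error
    return scores


def top_features(scores, k=10):
    """Names of the k features with the largest importance score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def toy_model(row):
    # Hypothetical model that ignores x2 entirely.
    return 5 * row["x1"] + 0 * row["x2"]


rows = [{"x1": i, "x2": (i * i) % 7} for i in range(20)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets, ["x1", "x2"])
best = top_features(scores, k=1)
```

A feature the model never uses, such as `x2` here, gets a score of zero, since shuffling it cannot change any prediction.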

Quality

Evaluate model quality at every stage:

Model Lifecycle: Data

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is a tool that automates graphical data analysis and highlights
the most important statistics in the context of a single variable, overall data,
interconnections, and in relation to the target variable in a training dataset. Given the
potentially wide feature space, up to 20 of the most important features with the highest
statistical significance are selected for EDA based on machine learning modeling.


Model Lifecycle: Training

Model Quality Diagram

The Model Quality Diagram simplifies the process of evaluating model quality and lets
users view the model from the perspective of several metrics simultaneously in a single
graphical view. We offer an extensive list of quality metrics that are popular in the
data science community.

Model Lifecycle: Prediction

Besides well-known indicators for evaluation of model quality (e.g. probability and credibility
interval), we also calculate a set of additional indicators (row-level explainability):

Confidence Interval

For regression problems, the Confidence Interval shows the range within which the predicted
value may vary, and with what probability.
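
The page does not describe how the interval is derived. A minimal sketch of one simple possibility, assuming residuals (actual minus predicted) from a holdout set are available, is to shift the prediction by empirical residual quantiles. The function name `residual_interval` and the sample numbers are hypothetical.

```python
import math


def residual_interval(prediction, residuals, coverage=0.9):
    """Interval around `prediction` built from empirical residual quantiles.

    Assumes future errors resemble the holdout residuals; this is an
    illustrative heuristic, not the platform's documented method.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    lower = ordered[math.floor(tail * (n - 1))]          # widen downward
    upper = ordered[math.ceil((1.0 - tail) * (n - 1))]   # widen upward
    return prediction + lower, prediction + upper


holdout_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
narrow = residual_interval(10.0, holdout_residuals, coverage=0.5)
wide = residual_interval(10.0, holdout_residuals, coverage=0.9)
```

Asking for higher coverage widens the interval, which is the trade-off between range and probability that the indicator reports.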

Model-to-Data Relevance Indicator

The Model-to-Data Relevance Indicator calculates the statistical differences between the data
uploaded for predictions and the data used for model training. Significant differences in the
data may indicate metric decay (degradation of model prediction quality).
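
The exact statistic behind the indicator is not given here. As a hedged illustration of comparing training data with prediction data, the sketch below computes a two-sample Kolmogorov-Smirnov statistic per feature: 0 means the empirical distributions coincide and 1 means they do not overlap. `ks_statistic`, `drift_report`, and the column names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train, new):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(train), sorted(new)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)   # share of training values <= x
        cdf_b = bisect_right(b, x) / len(b)   # share of new values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_report(train_columns, new_columns):
    """Per-feature KS statistic between training and prediction data."""
    return {
        name: ks_statistic(train_columns[name], new_columns[name])
        for name in train_columns
    }


train = {"temp": [20.0, 21.0, 22.0, 23.0], "load": [1.0, 2.0, 3.0, 4.0]}
new = {"temp": [20.0, 21.0, 22.0, 23.0], "load": [11.0, 12.0, 13.0, 14.0]}
report = drift_report(train, new)
```

Tracking such a report over successive prediction batches would give a history of drift, in the spirit of the historical indicator described further down.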

Model Lifecycle: Application

Historical Model-to-Data Relevance Indicator

Historical Model-to-Data Relevance is a strong signal that a model should be retrained. This
indicator is also available for downloadable models, which makes it possible to manage the
model lifecycle even outside the platform.

COMING SOON

Validate Model on New Data

Validate Model on New Data shows model metrics on new data to help determine whether the model
should be retrained to reflect the statistical changes and dependencies in new data. It also
shows metrics in multidimensional space (Model Quality Diagram).

EDA

The «Exploratory Data Analysis» tool presents its information as charts, grouped into
the following sections:

Dataset overview

This section displays brief data statistics of your training dataset and provides the
following information: problem type, dataset dimension and number of missing values
recorded.

Continuous data distribution and relation to the target variable

Visualization of each continuous variable yields two plots:

Variable density distribution chart

A Density plot visualizing the distribution of data across all rows in the dataset.

This chart is a variation of a Histogram that uses kernel smoothing to plot values, allowing for smoother distributions by smoothing out the noise. The peaks of a Density plot help display where values are concentrated over the interval.
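
To make the kernel smoothing concrete, here is a minimal Gaussian kernel density estimate in plain Python. It is a sketch only: the fixed `bandwidth`, the function name `kde`, and the sample values are assumptions, and the platform's own smoothing settings are not documented here.

```python
import math


def kde(samples, bandwidth):
    """Return a function estimating the density at x with Gaussian kernels."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes a small bell curve centred on itself.
        return scale * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [0.9, 1.0, 1.1, 1.2, 5.0]
density = kde(samples, bandwidth=0.5)
peak, valley = density(1.0), density(3.0)
```

The estimate is highest near 1.0, where four of the five samples cluster, and much lower around 3.0, where there is no data. A larger bandwidth gives a smoother curve at the cost of blurring such peaks.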


Feature relation to the target variable (different for regression and classification task
types)

This chart is presented in one of two formats, depending on task type: a line chart showing how the continuous variable changes as the continuous target variable changes (regression task type), or a histogram showing the mean value of the continuous variable for each class of the target variable (classification task type).


Discrete data distribution and relation to the target variable

Visualization of each categorical variable yields two plots:

Histogram displaying feature categories count

Feature categories relation to the target variable (different for regression and
classification task types)

This chart is presented in one of two formats, depending on task type: a histogram displaying the mean target variable for each feature category (regression task type), or a histogram displaying the count of each target class within each feature category (classification task type).
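
The two aggregations behind these charts can be sketched as follows. This is an illustration with hypothetical function names and toy data, showing only the numbers that would be plotted, not the charts themselves.

```python
from collections import Counter, defaultdict


def mean_target_by_category(categories, targets):
    """Regression case: mean target value for each feature category."""
    sums, counts = defaultdict(float), Counter()
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in counts}


def class_counts_by_category(categories, classes):
    """Classification case: count of each target class per feature category."""
    table = defaultdict(Counter)
    for category, label in zip(categories, classes):
        table[category][label] += 1
    return {category: dict(counter) for category, counter in table.items()}


categories = ["a", "b", "a", "b"]
means = mean_target_by_category(categories, [1.0, 2.0, 3.0, 6.0])
counts = class_counts_by_category(categories, ["x", "y", "x", "x"])
```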


Feature correlations

Visualization of the correlations in the data yields two plots:

Target variable distribution

Visualization of the target variable statistics is presented in one of two formats:

Outliers Visualization

Time Dependencies

A Time dependency plot is created if a date-time column is present in the data.
Visualization of time dependency yields three plots, each displaying a line chart of how the
target variable changes over time. The charts differ in their level of data
aggregation:

Missing Values Visualization

The **«Missing Values Map Overall Dataset»** plot displays a bar for every feature in
the data, without feature names, together with a missing-values percentage indicator. The
purpose of this plot is to give an overall visual representation of the missing values in
the data.

The **«Missing Values Map and percentage»** plot displays only the columns that
contain missing values, showing their feature names, the missing-values percentage, and
the corresponding locations of the missing values in the dataset.
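
The information behind the second plot can be sketched as a per-column summary. This is a minimal illustration, assuming rows are dictionaries and a missing value is represented by `None`; `missing_summary` and the sample columns are hypothetical.

```python
def missing_summary(rows, columns):
    """Percentage and row positions of missing values, for affected columns only."""
    summary = {}
    for column in columns:
        positions = [i for i, row in enumerate(rows) if row.get(column) is None]
        if positions:  # columns without missing values are left out
            summary[column] = {
                "percent": 100.0 * len(positions) / len(rows),
                "rows": positions,
            }
    return summary


rows = [
    {"age": 31, "city": "Oslo"},
    {"age": None, "city": "Rome"},
    {"age": 45, "city": "Lima"},
    {"age": None, "city": "Kyiv"},
]
summary = missing_summary(rows, ["age", "city"])
```

Here only `age` appears in the result, with half of its values missing at row positions 1 and 3.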
