Predictive Model Monitoring with IBM watsonx.governance
Co-Authors: Nishant Kumar and Varsha
As generative AI and large language models (LLMs) capture the spotlight, traditional predictive models continue to be indispensable in solving critical business challenges across industries. While LLMs are making waves in natural language processing and other advanced applications, predictive models remain the backbone of decision-making in sectors like finance, healthcare, retail, and manufacturing. These models drive everything from predictive analytics and customer insights to risk assessment and supply chain optimization, highlighting the ongoing need for robust monitoring and governance solutions.
IBM watsonx.governance offers a comprehensive suite designed to govern predictive models effectively throughout their lifecycle, ensuring they meet the highest standards of accuracy, transparency, and compliance.
In this article, we will see how IBM AutoAI, along with watsonx.governance, can be leveraged to build, monitor, and govern a predictive machine learning model effectively.
Core Components of watsonx.governance
IBM watsonx.governance is built around three key components that collectively provide a powerful framework for managing predictive models:
1. AI Factsheets:
- Lifecycle Tracking: AI Factsheets offer comprehensive documentation for each model, capturing every stage of its lifecycle, from development through deployment. This tracking includes detailed metadata about the model and its underlying data, ensuring transparency and traceability.
- Ongoing Stakeholder Alignment: Factsheets are continually updated to reflect changes in model performance and operational status. This real-time updating keeps stakeholders informed and facilitates effective communication across teams, ensuring that all relevant parties are aligned with the model’s current state and performance.
2. Watson OpenScale:
- Metrics and Validation: Watson OpenScale provides continuous evaluation of predictive models through a set of rigorous metrics. These include accuracy, precision, recall, fairness, and drift, allowing businesses to maintain high standards in model performance.
- Explainability: A standout feature of OpenScale is its ability to offer model explainability. For instance, it can show how changing specific input variables would lead to different outcomes, which is vital for understanding and justifying the decisions made by the model.
- Monitoring: OpenScale continuously monitors models in real time, alerting stakeholders to any performance issues, such as model drift, that could affect the reliability of predictions.
- Model Health Metrics: OpenScale tracks critical health metrics such as throughput, which measures the number of scoring requests and transaction records processed per second, and latency, which tracks the time taken to process these requests and records. These metrics ensure the model operates efficiently and responds promptly.
3. OpenPages:
- Risk and Compliance: OpenPages is designed to address the complex landscape of AI regulation and model risk management. It helps organizations align their predictive models with regulatory requirements, providing tools for risk assessment and compliance management.
- Dashboard and Workflow Management: It also offers powerful dashboards for tracking model performance and data quality. Additionally, it includes workflow management features that enforce compliance with specified conditions, ensuring that all necessary steps are followed before a model is deployed.
- Issue Management: OpenPages enables businesses to assign, track, and resolve issues related to model governance. This ensures that any problems are promptly addressed, maintaining the integrity and reliability of predictive models.
A Business Use Case: Credit Risk Assessment
Consider a financial institution that uses predictive models to assess the credit risk associated with loan applications. This process is crucial as it helps determine whether applicants are likely to repay their loans, thereby impacting the institution’s financial stability. For example, when a bank receives a loan application, it must decide whether to approve or reject the loan based on the applicant’s profile. This decision carries two types of risks:
- If the applicant is a good credit risk, meaning they are likely to repay the loan, rejecting the application could lead to a missed business opportunity for the bank.
- If the applicant is a bad credit risk, meaning they are unlikely to repay the loan, approving the application could result in financial loss for the bank.
These models use various attributes such as credit history, loan duration, loan amount, and other financial indicators to classify applicants as either good or bad credit risks. By effectively managing this model, the institution can ensure fair and accurate credit evaluations, which is essential for sound financial operations.
Practical Implementation
For credit risk classification, we will start by leveraging IBM AutoAI to streamline the model-building and training process. AutoAI simplifies this process by automating data preparation, feature engineering, model selection, and hyperparameter tuning. Using our credit risk dataset, which includes attributes such as credit history, loan duration, and loan amount, AutoAI will preprocess the data, train various models, and select the best-performing one based on accuracy and other performance metrics. This automation ensures the creation of a robust and well-tuned model for predicting credit risk, with the added benefit of seamless integration with IBM Cloud for easy deployment and scalability.
Once the model is trained, we will proceed with its deployment and ongoing governance using the following suite of tools:
- AI Factsheets: After deploying the model, we will create a Factsheet to document its purpose, data sources, and performance metrics. This documentation provides transparency and helps stakeholders understand the model’s functionality throughout its lifecycle.
- OpenScale: For continuous monitoring, we will use OpenScale to track the model’s performance in real-time. OpenScale will monitor key metrics such as accuracy, fairness, drift, and model health metrics like latency and throughput, ensuring the model remains reliable and unbiased while operating efficiently.
- OpenPages: OpenPages will manage compliance and risk with dynamic dashboards for real-time insights and customizable analytics. It features embedded workflows for task automation and risk management, allowing for detailed root cause analysis and streamlined compliance processes.
Create a Model Inventory
The watsonx.governance solution allows organizations to group and track their models based on use cases, that is, the business problems the models are attempting to solve. Each use case has its own lifecycle information for candidate models in the development, testing, and production phases. These use cases are collected as assets in a Model Inventory.
- Log in to Cloud Pak for Data using the Console Route URL, Username, and Password information.
- From the home screen, click the hamburger menu in the upper left. Under the AI governance section, click on AI use cases.
- Click the gear icon next to the AI use cases to open the Manage menu.
- On the Manage screen, click on Inventories. Then click New Inventory and fill in the inventory information.
- Under General settings, click the toggle button for Governance console (IBM OpenPages) integration to activate it. This enables the watsonx.governance console (OpenPages) integration with Factsheets and the watsonx.governance monitoring service (OpenScale).
Create a Model Use Case
- Now under watsonx.governance console (OpenPages), click the hamburger menu in the upper left. Under the Inventory section, click on Use Cases.
- Click the New button and fill out the mandatory fields to create a new model use case. A model use case is meant to track and capture information about a collection of models that will be built to serve a particular purpose. It should be created whenever there is a business need requiring one or more models to be built. Model records can then be added as children of the use case.
Train the Model using AutoAI
In this section, we will train a model with AutoAI, which is IBM’s rapid model prototyping service. This service can quickly generate predictive machine learning models from tabular data and save the output as either a Jupyter notebook or a ready-to-deploy model.
- From the home screen, click the hamburger menu on the upper left. Under the Projects section, click on New project.
- Under the Assets tab, click on New asset. Select AutoAI under the Automated builders tool type to create and run an AutoAI experiment based on the German credit risk dataset. Define details such as the name, description, and runtime configuration, then click the Create button.
- Upload your dataset to the platform. Choose the target variable (label) that you want to predict.
- Define constraints like maximum training time, available resources (e.g., CPU/GPU), and other parameters.
- Optionally, you can tweak advanced settings like the number of folds for cross-validation, metric to optimize, data splitting methods, and more.
- Initiate the AutoAI experiment. The tool will automatically preprocess the data, select features, and train multiple models. AutoAI will test different algorithms, tune hyperparameters, and rank models based on performance.
- Once the experiment completes, review the top-performing model.
- Click Save As next to the pipeline with the best-performing model on the chosen metric. In the pop-up window, select Model as the asset type to save the model and click the Create button.
- Locate the saved AutoAI model in the list of assets and click on it to open the model information screen. Take a moment to review the details provided, noting that you can export this information as a PDF report by clicking the “Export report” link. The metadata displayed includes the creation date of the model, the creator’s identity, the prediction type, the algorithm used, and details about the training dataset. Scroll down to the “Training metrics” section to see the initial quality metrics generated by AutoAI during model creation. The model’s input schema is also included.
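The same experiment can also be driven programmatically. Below is a minimal sketch using the ibm-watson-machine-learning Python SDK; the credentials, project ID, and asset ID are placeholders, and parameter names can vary slightly between SDK releases, so treat it as a starting point rather than a drop-in script.

```python
from ibm_watson_machine_learning.experiment import AutoAI
from ibm_watson_machine_learning.helpers import DataConnection

# Placeholder credentials. On Cloud Pak for Data, use username/password and the
# platform version instead of an IBM Cloud API key.
wml_credentials = {
    "url": "https://<your-cpd-or-cloud-url>",
    "apikey": "<api-key>",
}

experiment = AutoAI(wml_credentials, project_id="<project-id>")

# Define a binary classification experiment on the credit risk dataset.
pipeline_optimizer = experiment.optimizer(
    name="Credit risk AutoAI experiment",
    prediction_type=AutoAI.PredictionType.BINARY,
    prediction_column="Risk",              # target label in the German credit risk data
    scoring=AutoAI.Metrics.ROC_AUC_SCORE,  # metric used to rank candidate pipelines
)

# Point the optimizer at the uploaded data asset and run the experiment.
training_data = DataConnection(data_asset_id="<data-asset-id>")
pipeline_optimizer.fit(training_data_reference=[training_data], background_mode=False)

# Review the ranked pipelines and retrieve the best one as a scikit-learn object.
print(pipeline_optimizer.summary())
best_pipeline = pipeline_optimizer.get_pipeline(astype=AutoAI.PipelineTypes.SKLEARN)
```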
Create a Deployment Space
- Under the Deployments section, click the New deployment space button to create a deployment space. A deployment space is an object in Cloud Pak for Data and watsonx that contains deployable assets, deployments, deployment jobs, associated input and output data, and the associated environments.
- Once the space has been created, you will be able to deploy models to it as REST endpoints and begin monitoring them in the monitoring service.
Deploy the Model
- Under Projects > Assets, you will find the AutoAI experiment and the saved notebook and model.
- For the saved model, click the three dots > Promote to space. Select the target space and click the Promote button to promote the asset to the specified deployment space.
- Go to the hamburger menu in the top left-hand corner, click the Deployments section, and select the deployment space to which the trained model was promoted.
- Navigate to the Assets tab, and find the Model asset. Click on the three dots on the right and select Deploy. The deployed model in the specified deployment space is now ready for evaluation.
- When the deployment is finished, the Status in the displayed table will change to Deployed. Click on the name of the deployment.
- Note that the API Reference tab provides details such as direct URLs to the model and code snippets in various programming languages for application developers to include the model in their apps; a minimal Python scoring sketch follows below.
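As a rough sketch of what such an application call looks like, the snippet below sends a scoring request to the deployment's REST endpoint. The URL, credentials, and field values are placeholders; copy the exact scoring endpoint and input schema from the API Reference tab. In a real request, the fields list must include every feature column in the model's input schema (it is truncated here for brevity).

```python
import requests

CPD_URL = "https://<your-cpd-route>"       # placeholder Cloud Pak for Data route
USERNAME = "<username>"
PASSWORD = "<password>"
DEPLOYMENT_ID = "<deployment-id>"

# Obtain a bearer token. On Cloud Pak for Data this is the /icp4d-api/v1/authorize
# endpoint; on IBM Cloud you would use the IAM token service instead.
auth = requests.post(
    f"{CPD_URL}/icp4d-api/v1/authorize",
    json={"username": USERNAME, "password": PASSWORD},
    verify=False,  # only for clusters with self-signed certificates
)
token = auth.json()["token"]

# Scoring payload: field names must match the model's input schema (truncated here).
payload = {
    "input_data": [{
        "fields": ["CheckingStatus", "LoanDuration", "CreditHistory", "LoanAmount", "Age"],
        "values": [["0_to_200", 31, "credits_paid_to_date", 1889, 32]],
    }]
}

scoring_url = f"{CPD_URL}/ml/v4/deployments/{DEPLOYMENT_ID}/predictions?version=2021-05-01"
response = requests.post(scoring_url, json=payload,
                         headers={"Authorization": f"Bearer {token}"}, verify=False)
print(response.json())  # predicted class and probabilities
```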
Also, under the AI Factsheet section, you can track all the metadata about the model and its underlying data, ensuring transparency and traceability. Factsheets keep the metadata and monitoring data associated with that particular model and centralize the information, allowing different stakeholders to collaborate through one common platform, track metadata in an organized and intuitive way, and accelerate the model development lifecycle.
Track the Model
- Under the AI Factsheet tab, click on Track in AI use case.
- Specify the AI use case created in OpenPages, define the approach and version, then review and click Track asset to start tracking the model.
Configure the Deployment Space for Monitoring
Before evaluating the model, we need to configure the deployment space for monitoring in OpenScale. In this section, we will evaluate the deployed model against different metrics such as drift, quality, and fairness.
- On the OpenScale (watsonx.governance monitoring service) dashboard, click on the name of your deployment.
- Now we need to configure the monitor settings. Select Actions > Configure monitors to start configuring.
- First, we need to provide information about the training data and the deployed model output to prepare watsonx for monitoring and for providing explanations of model transactions. We can set up the model evaluation details either manually, by uploading the training data in the UI, or by running a notebook that generates a configuration file to upload.
- Under Model info > Model details, click the edit icon on the Configuration package tile. Then upload or connect to training data that is stored in a database or cloud storage to configure model evaluations.
- Select the feature columns and label column. Then select the model output column name. Review the model summary and click Finish to complete setup.
Configure Fairness and Quality
To monitor fairness, you need to identify favorable and unfavorable outcomes, as well as monitored and reference groups.
- In our particular example, “No Risk” represents the positive or desirable outcome, while “Risk” is the unfavorable outcome, as it signifies that the individual is considered a credit risk. Use the checkboxes to mark “Risk” as Unfavorable and “No Risk” as Favorable. Click on the Next button.
- The Sample size screen opens. Enter a value, say 100, in the Minimum sample size field. This sets the minimum number of records that must be collected before fairness evaluations are calculated. Click on the Next button. The Metrics screen opens.
- Multiple metrics are available for measuring fairness. Two of them (Disparate impact and Statistical parity difference) can be calculated at runtime strictly from the data being submitted to the model. We will keep the default monitored metric, Disparate impact, and its default threshold value of 80% (see the worked example after this list).
- Click on the Next button. The Select the fields to monitor screen opens. We can either select the fields ourselves or use the features that Watson OpenScale recommends monitoring for fairness based on its analysis of the training data.
- Click on the Save button to save your fairness configuration.
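To make the metric concrete, here is a small, self-contained sketch of how disparate impact and statistical parity difference can be computed from scored records. The column names and group values are illustrative; once the monitor is configured, OpenScale performs this calculation (plus additional perturbation-based analysis) for you.

```python
import pandas as pd

# Illustrative scored transactions: the monitored attribute (e.g. Sex) and the
# model's predicted label for each loan application.
scored = pd.DataFrame({
    "Sex":        ["female", "male", "female", "male", "male", "female"],
    "prediction": ["No Risk", "No Risk", "Risk", "No Risk", "No Risk", "Risk"],
})

FAVORABLE = "No Risk"
monitored = scored[scored["Sex"] == "female"]   # monitored group
reference = scored[scored["Sex"] == "male"]     # reference group

monitored_rate = (monitored["prediction"] == FAVORABLE).mean()
reference_rate = (reference["prediction"] == FAVORABLE).mean()

disparate_impact = monitored_rate / reference_rate
statistical_parity_diff = monitored_rate - reference_rate

# A disparate impact below the 0.8 (80%) threshold would flag a potential bias issue.
print(f"Disparate impact: {disparate_impact:.2f}")
print(f"Statistical parity difference: {statistical_parity_diff:.2f}")
```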
Next, we will configure the quality monitors. The Quality monitor evaluates how well your model predicts accurate outcomes. It identifies when model quality declines, so you can retrain your model appropriately.
- Under Quality > Quality thresholds, click on the pencil icon to set the quality thresholds for the different metrics available.
- Enter 100 in the Minimum sample size field, then click the Save button to save your configuration.
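Conceptually, the quality evaluation amounts to computing standard classification metrics on the labelled feedback data you supply. The snippet below uses scikit-learn on a few made-up records purely to illustrate what is being measured; it is not OpenScale's internal implementation.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up feedback records: ground-truth labels vs. the model's predictions.
y_true = ["Risk", "No Risk", "No Risk", "Risk", "No Risk", "No Risk"]
y_pred = ["Risk", "No Risk", "Risk",    "Risk", "No Risk", "No Risk"]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, pos_label="Risk"))
print("Recall   :", recall_score(y_true, y_pred, pos_label="Risk"))
print("F1 score :", f1_score(y_true, y_pred, pos_label="Risk"))
```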
Configure Explainability
Explainability identifies the factors that influence a model outcome. In the Explainability section, click the edit button in the Explanation method tile to configure explainability. There are two types of explanations:
Local: Factors that influence the model outcome for a specific transaction. There are two methods available.
Global: Holistic factors that influence model outcomes in general. There is one method available.
Watsonx.governance offers two different algorithms to explain predictions: LIME (Local Interpretable Model-Agnostic explanations), and SHAP (SHapley Additive exPlanations). Global explanations use the SHAP algorithm while local explanations can use SHAP or LIME (enhanced).
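To illustrate what a SHAP-style explanation looks like outside the platform, here is a small sketch using the open-source shap library on a stand-in scikit-learn classifier trained on synthetic data. This is not the managed explainability service, only a demonstration of local and global feature contributions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model on synthetic data; the principle is the same for the credit risk pipeline.
X, y = make_classification(n_samples=500, n_features=6, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X, y)

explainer = shap.TreeExplainer(model)

# Local explanation: per-feature contributions for a single transaction.
print(explainer.shap_values(X[:1]))

# Global explanation: aggregate contributions over many transactions,
# e.g. visualized with shap.summary_plot(explainer.shap_values(X), X).
```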
Configure Drift
Drift refers to the decline in model performance due to changes in data or shifts in the relationships between inputs and outputs.
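One common way to quantify input drift is the population stability index (PSI), which compares a feature's distribution at training time with its distribution in recent payload data. The sketch below (with made-up loan amount distributions) only illustrates the idea; OpenScale's Drift v2 monitor uses its own methodology and is configured through the UI as described next.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time feature distribution and recent production data."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    expected_pct = np.histogram(expected, bins=cuts)[0] / len(expected)
    actual_pct = np.histogram(np.clip(actual, cuts[0], cuts[-1]), bins=cuts)[0] / len(actual)
    # Small epsilon avoids division by zero and log(0) for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train_loan_amount = rng.normal(5000, 1500, 2000)   # distribution seen at training time
recent_loan_amount = rng.normal(6200, 1500, 500)   # shifted distribution in production
print(population_stability_index(train_loan_amount, recent_loan_amount))  # larger values suggest drift
```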
- To configure drift, go to the Evaluations section on the left, and select Drift v2. Click on the edit icon in the Compute the drift archive tile. Since you uploaded the training data while setting up the monitors, Watson OpenScale can now compute the necessary statistics to measure drift. Click Next.
- Keep the default drift thresholds as they are and click Next again. You’ll be taken to the Important features screen.
- When building your model in AutoAI, you identified the features that had the most significant impact on the model’s output. Find those features in the list and check the boxes next to them to mark them as important.
- After selecting all the important features, click Next to proceed. Leave the Minimum sample size at its default setting, then click Save. Watson OpenScale will begin training the drift model in the background, which may take up to five minutes. Once completed, the monitors will be fully configured, and you can evaluate the model.
Evaluate the AutoAI Model
- Click on the Dashboard link in the upper left corner of the screen to return to the Insights dashboard.
- Under Actions, click on the Evaluate now option. The Import test data panel opens.
- Upload the payload and the feedback data and click on the Evaluate now button to begin the model evaluation (a sketch of the feedback file format follows this list).
- When the evaluation has finished, take a moment to review the results.
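As a rough sketch of what the feedback file looks like, each row follows the model's training schema: the same feature columns plus the ground-truth label. The column names and values below are illustrative placeholders from the German credit risk sample data, with most feature columns omitted for brevity.

```python
import pandas as pd

# Illustrative feedback rows: feature columns (truncated) plus the ground-truth "Risk" label.
feedback = pd.DataFrame([
    {"CheckingStatus": "0_to_200",    "LoanDuration": 31, "LoanAmount": 1889, "Risk": "No Risk"},
    {"CheckingStatus": "less_0",      "LoanDuration": 18, "LoanAmount": 462,  "Risk": "No Risk"},
    {"CheckingStatus": "no_checking", "LoanDuration": 28, "LoanAmount": 3693, "Risk": "Risk"},
])
feedback.to_csv("credit_risk_feedback.csv", index=False)  # upload via the Import test data panel
```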
We can also visualize model transactions according to predictions, feature values, and model confidence.
Conclusion
As AI continues to evolve, traditional predictive models remain a cornerstone of business decision-making. However, their growing complexity and the increased regulatory scrutiny demand a more sophisticated approach to governance. IBM watsonx.governance offers a comprehensive solution to these challenges, providing the tools needed to monitor, manage, and govern predictive models effectively. By leveraging watsonx.governance, organizations can ensure that their models not only deliver accurate predictions but also maintain the trust and compliance required in today’s business environment.