
Boost AI Fairness and Explainability with Amazon SageMaker Clarify

From hiring decisions to loan approvals and even healthcare recommendations, machine learning (ML) impacts our lives daily. Fairness and explainability are crucial in this context.

Fairness means the data is balanced and model predictions treat groups equitably. Checking for fairness ensures that negative outcomes do not fall disproportionately on any one group, such as an age bracket or a gender. Explainability means understanding which features have the most influence on a prediction.

Fair and explainable models have several benefits.

  • They enable the creation of transparent AI systems that will speed adoption and meet compliance objectives, especially in highly regulated environments like financial services, human resources, healthcare, and automated transportation.
  • Fair and explainable models also increase confidence in predictions, require less time for debugging, and give better insight into predictions.

Amazon SageMaker Clarify is an AWS service that detects data bias and explains model predictions. SageMaker Clarify can be used with tabular, natural language processing (NLP), and computer vision models.

HOW TO GET STARTED WITH SAGEMAKER CLARIFY

AWS provides many SageMaker Clarify sample notebooks on GitHub. The notebooks assume that you have an AWS account and have enough familiarity with SageMaker Studio to clone a Git repo and run a Jupyter notebook. If any of these workflows are not familiar, follow these setup steps to be able to run the Clarify sample notebooks. Having some exposure to SHapley Additive exPlanations (SHAP) will make interpreting the Clarify results easier.

The notebooks include all the standard machine learning pipeline steps: data acquisition, exploratory data analysis, data preparation, model training, and deploying a SageMaker endpoint. They differ in their Clarify job configurations and in the fairness and explainability results.

Now, let’s run through three sample notebooks.

FAIRNESS AND EXPLAINABILITY WITH TABULAR DATA

The notebook with tabular data uses the Adult dataset from the UC Irvine Machine Learning Repository, which has 14 features like age, education, occupation, race, and sex. The notebook uses an XGBoost model to predict whether income exceeds $50,000 annually.

The notebook ran for approximately 30 minutes on an ml.t3.medium instance with the Data Science 3.0 kernel. One cell in the notebook errored, but the error message clearly described how to fix the issue.

To measure fairness, a SageMaker Clarify job calculates pre-training and post-training bias metrics for the sensitive feature ‘Sex’ to see whether the model favors males or females. The metrics are viewable inside SageMaker Studio and as HTML, .ipynb, JSON, or PDF files stored in S3. Figure 1 is a sampling of the output: a bar chart showing the distribution of the labels (1 means income greater than $50,000) plus a table with the 9 fairness metrics for Females (Sex=0).

Figure 1: Label distribution bar chart (1 means income greater than $50,000) and the table of 9 fairness metrics for Females (Sex=0).
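
For reference, the bias analysis in the notebook boils down to a few Clarify configuration objects and a single run_bias call, which computes both the pre-training and the post-training metrics. The sketch below follows that general pattern; the role, S3 paths, model name, DataFrame, and instance types are placeholders to adapt to your own environment, not the notebook's exact values.

from sagemaker import clarify

# Processor that runs the Clarify analysis job (placeholder role and instance type).
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
)

# Where the training data lives and where the bias report should be written.
bias_data_config = clarify.DataConfig(
    s3_data_input_path=train_data_uri,        # placeholder S3 URI
    s3_output_path=bias_report_output_path,   # placeholder S3 URI
    label="Target",                           # column holding the income label
    headers=training_df.columns.to_list(),
    dataset_type="text/csv",
)

# The deployed XGBoost model that Clarify queries for the post-training metrics.
model_config = clarify.ModelConfig(
    model_name=model_name,                    # placeholder model name
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
    content_type="text/csv",
)

# Declare the sensitive feature ('Sex') and which label value counts as positive.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],            # 1 = income greater than $50,000
    facet_name="Sex",
    facet_values_or_threshold=[0],            # 0 = Female, the facet being checked
)

clarify_processor.run_bias(
    data_config=bias_data_config,
    bias_config=bias_config,
    model_config=model_config,
    pre_training_methods="all",
    post_training_methods="all",
)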

There are also Clarify results for the post-training metrics. Figure 2 shows another part of the output: a table of the 13 post-training metrics.

Figure 2: Table of the 13 post-training metrics.

Some skill is needed to interpret the results and decide which metrics are the best fairness indicators for this project. For a deeper dive, see the article Learn How Amazon SageMaker Clarify Detects Bias and Amazon’s whitepaper on AI Fairness and Explainability.
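
As a rough illustration of what two of the pre-training metrics measure (the formulas follow Clarify's standard definitions; the code is just a sketch, not taken from the notebook): Class Imbalance (CI) reports how lopsided the facet counts are, and Difference in Proportions of Labels (DPL) reports how much more often one facet receives the positive label.

def class_imbalance(n_a, n_d):
    # CI = (n_a - n_d) / (n_a + n_d), ranging from -1 to +1.
    # n_a = number of records in the advantaged facet (e.g., Sex=1),
    # n_d = number of records in the disadvantaged facet (e.g., Sex=0).
    # Values near +1 mean the advantaged facet dominates the dataset.
    return (n_a - n_d) / (n_a + n_d)

def diff_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    # DPL = q_a - q_d, where q_a and q_d are the fractions of each facet
    # with the positive label (income greater than $50,000 in this notebook).
    return pos_a / n_a - pos_d / n_d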

For explainability, the notebook shows how to configure a second Clarify job to uncover which features contribute most to the model predictions. Like the fairness metrics, the results can be viewed in SageMaker Studio or from files stored in S3. The results are reported as SHAP values, listing the top 10 features (out of the 14 in the dataset) with the greatest attribution. For this notebook, “Capital Gain” and “Country” are the top two features contributing to the model predictions, as seen in Figure 3.

Figure 3: Top 10 features ranked by SHAP value, with “Capital Gain” and “Country” contributing most.
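
The explainability job reuses the same processor and model configuration as the bias job; the main addition is a SHAPConfig. Again, a rough sketch with placeholder names and illustrative values rather than the notebook's exact settings:

# Baseline record(s) against which feature contributions are measured
# (placeholder: a representative row from the training data).
shap_config = clarify.SHAPConfig(
    baseline=[baseline_row],
    num_samples=100,             # synthetic samples generated per record
    agg_method="mean_abs",       # aggregate local SHAP values by mean absolute value
    save_local_shap_values=True,
)

explainability_data_config = clarify.DataConfig(
    s3_data_input_path=train_data_uri,
    s3_output_path=explainability_output_path,   # placeholder S3 URI
    label="Target",
    headers=training_df.columns.to_list(),
    dataset_type="text/csv",
)

clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
)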

EXPLAINABILITY WITH IMAGE DATA

The notebook with image data uses images from the Caltech-256 dataset to demonstrate image classification. This notebook ran for approximately 30 minutes on an ml.t3.medium instance with the Data Science 3.0 kernel. No code changes were required.

For explainability, the notebook configures and runs a Clarify job and writes the results to S3 as a PDF file. The results show three visuals for each image in the test set: the original image with its predicted classification, the image segments, and a heatmap indicating which segments increase and which decrease the overall confidence score for that image.
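
Under the hood, the image job pairs a SHAPConfig with an ImageConfig that controls how each image is split into segments (superpixels) before per-segment SHAP values are computed. The sketch below shows the general shape of that configuration; the parameter values and the data/model config names are illustrative placeholders, not the notebook's exact settings.

# Controls how images are segmented before per-segment SHAP values are computed.
image_config = clarify.ImageConfig(
    model_type="IMAGE_CLASSIFICATION",   # this notebook classifies images rather than detecting objects
    num_segments=20,                     # roughly how many superpixel segments per image
    segment_compactness=5,               # higher values produce squarer, more uniform segments
)

shap_config = clarify.SHAPConfig(
    num_samples=500,                     # masked-segment samples generated per image
    image_config=image_config,
)

clarify_processor.run_explainability(
    data_config=image_data_config,       # DataConfig pointing at the test images in S3 (placeholder)
    model_config=image_model_config,     # ModelConfig for the deployed image classifier (placeholder)
    explainability_config=shap_config,
)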

Figure 4 shows an image that was classified as a gorilla. The heatmap on the far right has some pink and red areas, which are the segments that decrease the confidence score. Figure 5 shows the results for an image classified as a pyramid. The heatmap on the right has fewer white/pink segments and no red segments.

This type of image explainability has helped uncover computer vision projects gone wrong, like the model that classified animals in snow as wolves and the model that learned to associate rulers in an image with skin cancer.

TEXTUAL DATA EXAMPLE

The notebook with textual data uses a collection of women’s clothing reviews (courtesy of Kaggle) to do sentiment analysis. Of the three notebooks, this one had the most errors and needed the most corrections. An ml.t3.medium instance could be used, but the ‘PyTorch 1.10 Python 3.8 CPU Optimized’ kernel was required. The pip installs needed corrections, and the training instance size had to be changed to ‘ml.p3.2xlarge’ because an ‘ml.g4dn.xlarge’ instance triggered a ResourceLimitExceeded error. As instance resource limits vary from one account to another, your instance type options may differ.

These are the installs needed:

!pip install "datasets[s3]==1.6.2"

!pip install captum --upgrade --quiet

!pip install --upgrade boto3

The Clarify job uses a TextConfig to specify the granularity of the explanations (sentence, paragraph, or token). After the Clarify job runs, the results are visualized to show which parts of each review contributed to its label.
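
A rough sketch of that configuration (the baseline token, sample count, and data/model config names are illustrative placeholders):

# Explain at the sentence level; granularity can also be "token" or "paragraph".
text_config = clarify.TextConfig(
    granularity="sentence",
    language="english",
)

shap_config = clarify.SHAPConfig(
    baseline=[["<UNK>"]],             # placeholder text substituted for masked-out units
    num_samples=500,
    agg_method="mean_abs",
    text_config=text_config,
)

clarify_processor.run_explainability(
    data_config=text_data_config,     # DataConfig for the reviews dataset (placeholder)
    model_config=text_model_config,   # ModelConfig for the deployed sentiment model (placeholder)
    explainability_config=shap_config,
)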

Figure 6 shows the results from the sentence configuration. Each sentence is highlighted to show whether it is negative, neutral, or positive.

Figure 6: Each sentence highlighted to show whether it is negative, neutral, or positive.

Figure 7 shows the results for the token configuration. Each word is highlighted to show whether it is negative, neutral, or positive.

Figure 7: Each word highlighted to show whether it is negative, neutral, or positive.

Again, some skill is required to dissect and understand the explainability report. In Figure 7, for example, the word “dress” on the first line is highlighted in medium green, indicating that it is mostly positive. Seeing that makes me wonder whether the dataset was skewed toward positive dress reviews. Would adding some negative reviews of dresses make the data more balanced?
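
One way to sanity-check that suspicion is to compare the label distribution of reviews that mention “dress” with the rest of the dataset. A minimal sketch, assuming the reviews are loaded into a pandas DataFrame with hypothetical 'Review Text' and 'Sentiment' columns (the actual file and column names in the Kaggle dataset may differ):

import pandas as pd

# Hypothetical file and column names; adjust to the Kaggle dataset's actual schema.
df = pd.read_csv("womens_clothing_reviews.csv")

mentions_dress = df["Review Text"].str.contains("dress", case=False, na=False)

# Compare the share of each sentiment label for dress vs. non-dress reviews.
print(df[mentions_dress]["Sentiment"].value_counts(normalize=True))
print(df[~mentions_dress]["Sentiment"].value_counts(normalize=True))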

CONCLUSION

SageMaker Clarify offers solid features to improve fairness and explainability. Interpreting the results takes some skill and may require collaboration across the entire data science team. AWS provides some great sample notebooks for getting started and other resources for deeper dives. I hope this helps you on your journey of creating fair and explainable AI.

REFERENCES

  1. Amazon SageMaker Clarify
  2. Learn How Amazon SageMaker Clarify Detects Bias