Case Study

Numida Reduces Fraud and Accelerates Underwriting with ML on AWS

At a glance

Numida partnered with New Math Data to build a machine learning pipeline that detects internal and external fraud in digital loan applications. Built on AWS-native services including Amazon SageMaker, AWS Lambda, and AWS Step Functions, the solution scores applications in real time and retrains weekly. It has shortened iteration cycles, improved risk assessment, and laid the foundation for scalable fraud prevention across new financial products.

Industry

Fintech

Use Case

Fraud detection

Solution implemented

Designed and deployed an AWS-native machine learning pipeline using Amazon SageMaker and AWS Step Functions
Built a custom container to retrieve training data from Amazon RDS, run training jobs on EC2, and store models and artifacts in Amazon S3
Selected XGBoost for its accuracy on structured data, resilience to missing values, and interpretable feature importance outputs
Automated on-demand inference with AWS Lambda for both individual and batch loan scoring
Integrated prediction outputs directly into RDS and underwriting workflows for real-time decision-making
Delivered the infrastructure as code within Numida’s Terraform modules, with documentation to support long-term reuse

The value equation

F1 score of 0.98 on the internal fraud test set (80/20 train/test split, 300k+ records)
On-demand scoring at the point of submission using real-time Lambda inference
Data management and model iteration cycles reduced from days to hours
Infrastructure adopted as the foundation for future machine learning workflows
Strong alignment with the AWS ecosystem for scalability and cost efficiency

Company Snapshot

Numida empowers Africa’s overlooked micro- and small businesses (MSBs) with digital financial services tailored for growth.

Location

Africa

Numida Builds Real-Time Fraud Detection to Support Scalable Credit Decisions

As one of Africa’s leading digital lenders to micro- and small businesses (MSBs), Numida needed a way to identify both internal and external fraud before it impacted revenue or risk scores.

Internal fraud, in which staff improperly approve loans to boost their commissions or in collusion with applicants, called for a sensitive, high-precision approach. External fraud, in which applicants borrow with no intention of repaying, posed a risk that would grow as Numida expanded its loan portfolio.

Working with New Math Data, Numida launched a proof of concept to move from manual fraud analysis to a production-grade, automated machine learning pipeline that retrains weekly and delivers scores in real time.

The pipeline not only flags suspicious activity at the point of submission but also establishes a robust foundation for model evolution as fraud patterns change.

Problem

Numida’s fraud detection was historically ad hoc. Loan records lived in Amazon RDS, but manual analysis and slow iteration made it hard to respond to evolving fraud techniques, especially as new financial products came online.

To prepare for scale, Numida needed a solution that could:

Support both internal and external fraud detection use cases
Handle highly imbalanced datasets (5% fraud / 95% non-fraud)
Deliver explainable predictions that underwriters could trust
Integrate seamlessly into underwriting workflows with low latency
Automate model retraining and support infrastructure reuse
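Handling a 5% / 95% class split usually starts with two steps: a stratified train/test split, so both sets keep the same fraud rate, and a positive-class weight for XGBoost. The case study does not say which imbalance techniques Numida used, so the following is a minimal sketch under that assumption. The function names, the `is_fraud` label key, and the toy data are hypothetical; `scale_pos_weight` and the `aucpr` metric are real XGBoost parameters, and XGBoost's documentation suggests count(negative) / count(positive) as a starting value for the weight.

```python
import random


def stratified_split(records, label_key="is_fraud", test_frac=0.2, seed=42):
    """Split records into train/test sets, preserving the fraud rate in each.

    Hypothetical helper: label_key and the record shape are assumptions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in (1, 0):
        # Shuffle each class separately, then cut the same fraction from each.
        group = [r for r in records if r[label_key] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


def scale_pos_weight(records, label_key="is_fraud"):
    """Ratio of negatives to positives, XGBoost's suggested starting weight."""
    pos = sum(1 for r in records if r[label_key] == 1)
    if pos == 0:
        raise ValueError("no positive (fraud) examples in records")
    return (len(records) - pos) / pos


if __name__ == "__main__":
    # Toy data at the case study's 5% fraud rate (1,000 rows, not real data).
    data = [{"is_fraud": 1 if i < 50 else 0} for i in range(1000)]
    train, test = stratified_split(data)
    params = {
        "objective": "binary:logistic",
        "eval_metric": "aucpr",  # precision-recall AUC suits rare positives
        "scale_pos_weight": scale_pos_weight(train),  # 760 / 40 = 19.0 here
    }
    print(len(train), len(test), params)
```

Splitting each class separately guarantees the 20% test set contains its fair share of the rare fraud cases, rather than leaving that to chance.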

Solution

Building an AWS-Native ML Pipeline for Real-Time Fraud Scoring

New Math Data designed and implemented a machine learning pipeline tailored to fraud detection on structured data, using AWS-native services.

Training Workflow

An AWS Step Functions workflow launches a SageMaker training job that runs a custom container. The container pulls feature data from Amazon RDS, trains an XGBoost model on EC2, and saves the model artifacts to Amazon S3.
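A Step Functions workflow of this shape can be expressed as an Amazon States Language definition with a single task that uses the `sagemaker:createTrainingJob.sync` service integration, which waits for the training job to finish. This is a minimal sketch, not Numida's actual definition: the image URI, role ARN, bucket, instance type, and the `training_job_name` input field are hypothetical placeholders. Because the custom container queries RDS itself, no `InputDataConfig` channel is declared; in a real deployment the job would likely also need a `VpcConfig` with subnets and security groups that can reach the database.

```python
import json

# Hypothetical placeholders: substitute the real ECR image, IAM role, and bucket.
TRAINING_IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/fraud-xgb-train:latest"
SAGEMAKER_ROLE_ARN = "arn:aws:iam::123456789012:role/fraud-training-role"
MODEL_OUTPUT_PATH = "s3://example-fraud-models/xgboost/"


def build_training_state_machine(instance_type="ml.m5.xlarge"):
    """Return an ASL definition with one SageMaker training step (sketch)."""
    return {
        "Comment": "Fraud-model retraining (sketch)",
        "StartAt": "TrainFraudModel",
        "States": {
            "TrainFraudModel": {
                "Type": "Task",
                # .sync makes Step Functions wait until the training job completes.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    # Job names must be unique, so the caller passes one per run.
                    "TrainingJobName.$": "$.training_job_name",
                    "AlgorithmSpecification": {
                        "TrainingImage": TRAINING_IMAGE,
                        "TrainingInputMode": "File",
                    },
                    "RoleArn": SAGEMAKER_ROLE_ARN,
                    # No InputDataConfig: the custom container reads from RDS itself.
                    "OutputDataConfig": {"S3OutputPath": MODEL_OUTPUT_PATH},
                    "ResourceConfig": {
                        "InstanceCount": 1,
                        "InstanceType": instance_type,
                        "VolumeSizeInGB": 30,
                    },
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be supplied as the state machine definition.
    print(json.dumps(build_training_state_machine(), indent=2))
```

A weekly schedule (for example, an Amazon EventBridge rule with `rate(7 days)`) could then start an execution with a fresh `training_job_name` in its input.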

Model Choice & Evaluation

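XGBoost was selected for its strong performance on structured datasets and its support for feature importance scoring, which helps explain predictions to underwriters. Because only 5% of records are labeled fraudulent, accuracy would be a misleading measure: a model that labels every application as legitimate would already be 95% accurate. The team therefore evaluated the model with the F1 score, the harmonic mean of precision and recall. The ~300,000 records were split 80/20 into training and test sets, and the model achieved an F1 score of 0.98 on the test set.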
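The argument for F1 over accuracy can be checked with a few lines of arithmetic. The sketch below computes accuracy, precision, recall, and F1 = 2 · precision · recall / (precision + recall) from scratch; the function name and the 100-row toy test set are illustrative assumptions, not Numida's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy test set: 100 applications, 5 fraudulent (the case study's 5% rate).
    y_true = [1] * 5 + [0] * 95
    always_legit = [0] * 100  # predicts "not fraud" for everyone
    # Catches 4 of the 5 frauds and raises 1 false alarm.
    decent_model = [1] * 4 + [0] + [1] + [0] * 94
    print(binary_metrics(y_true, always_legit))  # accuracy 0.95, f1 0.0
    print(binary_metrics(y_true, decent_model))  # accuracy 0.98, f1 about 0.8
```

The useless baseline scores 95% accuracy but an F1 of zero, while the second model's F1 of about 0.8 reflects both the missed fraud case and the false alarm.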

On-Demand Inference

An AWS Lambda function scores incoming applications in real time, calling the trained model, writing results back to RDS, and logging outputs for further analysis.
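The scoring step can be sketched as a Lambda-style handler that accepts either one application or a batch, scores it, writes the results back to the loan database, and emits a structured log line. This is a hypothetical sketch: the case study does not say whether the Lambda calls a SageMaker endpoint or loads the model directly, so scoring is injected as `score_fn` (in production it might wrap the `sagemaker-runtime` `invoke_endpoint` call). The `make_handler` factory, the `fraud_scores` table, the event shape, and `FRAUD_THRESHOLD` are all assumptions, and `sqlite3` stands in for the RDS connection so the example runs locally.

```python
import json
import logging
import sqlite3  # stand-in for an RDS driver in this local sketch

logger = logging.getLogger()
logger.setLevel(logging.INFO)

FRAUD_THRESHOLD = 0.5  # hypothetical cut-off; would be tuned on validation data


def make_handler(score_fn, conn):
    """Build a Lambda-style handler around a scorer and a DB-API connection.

    score_fn: takes a list of feature dicts, returns a list of fraud probabilities.
    conn:     DB-API connection to the loan database (hypothetical schema).
    """
    def handler(event, context=None):
        # Accept a batch {"applications": [...]} or a single application.
        apps = event["applications"] if "applications" in event else [event]
        scores = score_fn([a["features"] for a in apps])
        results = [
            {"loan_id": a["loan_id"], "score": s, "flagged": s >= FRAUD_THRESHOLD}
            for a, s in zip(apps, scores)
        ]
        # Write scores back in one transaction; "?" is sqlite3's placeholder style.
        with conn:
            conn.executemany(
                "INSERT INTO fraud_scores (loan_id, score, flagged) VALUES (?, ?, ?)",
                [(r["loan_id"], r["score"], int(r["flagged"])) for r in results],
            )
        # Structured log line; on Lambda this goes to CloudWatch Logs.
        logger.info(json.dumps({"scored": len(results),
                                "flagged": sum(r["flagged"] for r in results)}))
        return {"statusCode": 200, "body": json.dumps(results)}

    return handler


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fraud_scores (loan_id TEXT, score REAL, flagged INTEGER)")
    # Toy scorer for local testing only: large loans look risky.
    fake_score = lambda rows: [0.9 if r["amount"] > 1000 else 0.1 for r in rows]
    handler = make_handler(fake_score, db)
    print(handler({"applications": [
        {"loan_id": "L1", "features": {"amount": 5000}},
        {"loan_id": "L2", "features": {"amount": 200}},
    ]}))
```

Injecting the scorer and connection keeps the handler testable without AWS credentials; the deployed version would construct them once at module load so warm invocations reuse them.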

Scalable Infrastructure

The entire infrastructure was provisioned using Terraform and handed over with clear documentation to enable reuse across future ML workflows.

Tagged Data Engineering, Fintech, MLOps