Numida Reduces Fraud and Accelerates Underwriting with ML on AWS
At a glance
Numida partnered with New Math Data to build a machine learning pipeline that detects internal and external fraud in digital loan applications. Using AWS-native services including Amazon SageMaker, AWS Lambda, and Step Functions, the solution delivers real-time scoring, retrains weekly, and enables rapid iteration, improving risk assessment and laying the foundation for scalable fraud prevention across new financial products.
Industry
Use Case
- Machine Learning Infrastructure
- Data Engineering & Feature Pipelines
- ML Evaluation & Inference Automation
Solution implemented
- Designed and deployed an AWS-native machine learning pipeline using Amazon SageMaker and AWS Step Functions
- Built a custom container to retrieve training data from Amazon RDS, execute training jobs on EC2, and store models and artifacts in S3
- Selected XGBoost for its accuracy on structured data, resilience to missing values, and interpretable feature importance outputs
- Automated on-demand inference with AWS Lambda for both individual and batch loan scoring
- Integrated prediction outputs directly into RDS and underwriting workflows for real-time decision-making
- Provided infrastructure as code and documentation to support long-term reuse via Numida’s Terraform modules
The value equation
- F1 score of 0.98 on internal fraud test set (80/20 split, 300k+ records)
- On-demand scoring at the point of submission using real-time Lambda inference
- Data management and model iteration cycles reduced from days to hours
- Infrastructure adopted as the foundation for future machine learning workflows
- Strong alignment with AWS ecosystem for scalability and cost efficiency
Company Snapshot
Numida empowers Africa’s overlooked micro- and small businesses (MSBs) with digital financial services tailored for growth.
Location
Africa
Numida Builds Real-Time Fraud Detection to Support Scalable Credit Decisions
As one of Africa’s leading digital lenders to micro- and small businesses (MSBs), Numida needed a way to identify both internal and external fraud before it impacted revenue or risk scores.
Internal fraud, where staff might improperly approve loans to boost their commission or collude with applicants, required a sensitive, high-precision approach. External fraud, where applicants apply with no intent to repay, posed a growing risk as Numida expanded its loan portfolio.
Working with New Math Data, Numida launched a proof of concept to move from manual fraud analysis to a production-grade, automated machine learning pipeline that retrains weekly and delivers scores in real time.
The pipeline not only flags suspicious activity at the point of submission but also establishes a robust foundation for model evolution as fraud patterns change.
Problem
Numida’s fraud detection was historically ad hoc. Loan records were stored in Amazon RDS, but manual analysis and slow iterations made it hard to respond to evolving fraud techniques, especially as new financial products came online.
To prepare for scale, Numida needed a solution that could:
- Support both internal and external fraud detection use cases
- Handle highly imbalanced datasets (5% fraud / 95% non-fraud)
- Deliver explainable predictions that could be trusted by underwriters
- Integrate seamlessly into underwriting workflows with low latency
- Automate model retraining and support infrastructure reuse
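To make the imbalance requirement concrete, the arithmetic behind a 5% / 95% split can be sketched as follows. The record count and fraud rate come from this case study; the `class_balance` helper is hypothetical, and the case study does not say whether XGBoost's `scale_pos_weight` parameter was actually used. It is shown only as one common way to compensate for a rare positive class.

```python
def class_balance(n_records: int, fraud_rate: float) -> dict:
    """Summarize an imbalanced dataset (hypothetical helper for illustration)."""
    n_fraud = round(n_records * fraud_rate)
    n_non_fraud = n_records - n_fraud
    return {
        "fraud": n_fraud,
        "non_fraud": n_non_fraud,
        # Accuracy of a useless model that labels every loan "non-fraud".
        "majority_baseline_accuracy": n_non_fraud / n_records,
        # Common heuristic for XGBoost's scale_pos_weight: negatives / positives.
        # Assumption: the source does not state that this setting was used.
        "scale_pos_weight": n_non_fraud / n_fraud,
    }


balance = class_balance(n_records=300_000, fraud_rate=0.05)
print(balance)
```

A model that never flags fraud would still be 95% accurate on this data, which is why accuracy alone is a poor success measure here.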
Solution
Building an AWS-Native ML Pipeline for Real-Time Fraud Scoring
New Math Data designed and implemented a machine learning pipeline tailored for structured data fraud detection using AWS-native services.
Training Workflow
Triggered by AWS Step Functions, a custom SageMaker training container pulls feature data from Amazon RDS, trains an XGBoost model on EC2, and saves results to S3.
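A minimal sketch of what such a workflow definition might look like is shown below, built as a Python dictionary in Amazon States Language. It uses the Step Functions service integration for SageMaker training jobs. The image URI, role ARN, bucket path, instance type, volume size, and timeout are all placeholder assumptions, not values from Numida's deployment. No input data channel is declared because, as described above, the container pulls its own data from RDS.

```python
import json


def build_training_state_machine(image_uri: str, role_arn: str,
                                 output_s3_uri: str,
                                 instance_type: str = "ml.m5.xlarge") -> dict:
    """Return a one-step state machine that runs a SageMaker training job."""
    return {
        "Comment": "Retrain the fraud model (illustrative sketch)",
        "StartAt": "TrainFraudModel",
        "States": {
            "TrainFraudModel": {
                "Type": "Task",
                # ".sync" makes Step Functions wait for the job to finish.
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    # Job names must be unique, so take one from the input.
                    "TrainingJobName.$": "$.training_job_name",
                    "AlgorithmSpecification": {
                        "TrainingImage": image_uri,  # custom container
                        "TrainingInputMode": "File",
                    },
                    "RoleArn": role_arn,
                    "OutputDataConfig": {"S3OutputPath": output_s3_uri},
                    "ResourceConfig": {
                        "InstanceCount": 1,
                        "InstanceType": instance_type,
                        "VolumeSizeInGB": 30,
                    },
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                "End": True,
            }
        },
    }


# Placeholder identifiers, for illustration only.
definition = build_training_state_machine(
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/fraud-train:latest",
    role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
    output_s3_uri="s3://example-bucket/fraud-models/",
)
print(json.dumps(definition, indent=2))
```

The weekly retraining cadence could then come from a scheduled rule that starts this state machine, although the case study does not describe the scheduling mechanism.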
Model Choice & Evaluation
XGBoost was selected for its strong performance on structured datasets and its support for feature importance scoring. Because fraud data is heavily imbalanced, the F1 score (the harmonic mean of precision and recall) was used to evaluate success. The dataset of ~300,000 records, only 5% of which were labeled as fraudulent, was divided with an 80/20 split, and the model achieved an F1 score of 0.98 on the test set.
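The sketch below shows how F1 is derived from confusion-matrix counts and why it is more informative than accuracy on data like this. The counts are hypothetical, chosen only to illustrate the formula on a test set of 60,000 records with 3,000 fraud cases (20% of 300,000 at a 5% fraud rate). They are not Numida's actual confusion matrix.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A model that never flags fraud: 95% accurate, yet it catches nothing.
baseline = precision_recall_f1(tp=0, fp=0, fn=3_000)

# Hypothetical counts for a strong model (illustration only).
example = precision_recall_f1(tp=2_940, fp=60, fn=60)

print("always non-fraud:", baseline)
print("hypothetical model:", example)
```

With these illustrative counts, precision and recall are both 0.98, so F1 is also about 0.98, while the always-non-fraud baseline scores an F1 of 0 despite its high accuracy.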
On-Demand Inference
An AWS Lambda function scores incoming applications in real time, calling the trained model, writing results back to RDS, and logging outputs for further analysis.
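A minimal sketch of that handler logic follows. The event shape, field names, and 0.5 threshold are assumptions made for illustration. The model call and the RDS write are passed in as plain functions (`predict_fn`, `save_fn`) because the case study does not specify how the Lambda function reaches the model or the database, and this keeps the sketch runnable without AWS credentials.

```python
import json
from typing import Callable, Mapping

FRAUD_THRESHOLD = 0.5  # assumption: the real cutoff is not given in the source


def score_application(application: Mapping, predict_fn: Callable,
                      save_fn: Callable,
                      threshold: float = FRAUD_THRESHOLD) -> dict:
    """Score one loan application, persist the result, and log it."""
    score = float(predict_fn(application["features"]))
    result = {
        "application_id": application["application_id"],
        "fraud_score": score,
        "flagged": score >= threshold,
    }
    save_fn(result)             # stand-in for the write back to RDS
    print(json.dumps(result))   # Lambda forwards stdout to CloudWatch Logs
    return result


def make_handler(predict_fn: Callable, save_fn: Callable) -> Callable:
    """Build a Lambda-style handler for single or batch scoring."""
    def handler(event, context):
        # Batch events carry an "applications" list; otherwise score the event.
        applications = event.get("applications") or [event]
        results = [score_application(a, predict_fn, save_fn)
                   for a in applications]
        return {"statusCode": 200, "body": json.dumps(results)}
    return handler


# Usage with stand-ins for the model and the database.
saved = []
handler = make_handler(
    predict_fn=lambda features: 0.9 if features["amount"] > 1000 else 0.1,
    save_fn=saved.append,
)
response = handler(
    {"application_id": "demo-1", "features": {"amount": 2500}}, None
)
```

Injecting the two dependencies also makes the scoring path easy to unit test before the real model and database clients are wired in.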
Scalable Infrastructure
The entire infrastructure was provisioned using Terraform and handed over with clear documentation to enable reuse across future ML workflows.