Case Study

Renewable Energy Retailer Cuts ML Model Training from 18 Hours to 20 Minutes with Databricks on AWS

At a glance

A US renewable energy retailer’s data science team was blocked by 18-hour model training times and dataset scale limits on their existing single-instance infrastructure, preventing them from iterating on critical pricing and forecasting models needed to support customer growth.

NMD deployed a Databricks lakehouse architecture on AWS with distributed Spark compute in 8 weeks, cutting model training time by 98% (from 18 hours to 20 minutes) and eliminating dataset processing limits. This enabled 54x faster iteration on revenue-critical models while reducing compute costs by 75% per model run.

The solution is now in production across all environments (dev, staging, prod), providing a scalable MLOps foundation for unlimited customer growth without platform re-architecture.

Industry

Renewable Energy

Use Case

ML model training and MLOps at scale

Solution implemented

Databricks Lakehouse on AWS

The value equation

54x faster model training at 75% lower compute cost per run

Company Snapshot

A US-based renewable energy retailer providing clean wind, solar, and hydro power to residential and commercial customers with predictable pricing models. The company leverages advanced data science and ML for dynamic pricing optimization, demand forecasting, and customer acquisition modeling to compete effectively in the clean energy marketplace.

Location

United States

Customer Situation

A US renewable energy retailer was experiencing critical bottlenecks in their data science operations. Model training times had ballooned to 18 hours per run, blocking their data science team from iterating on pricing optimization and demand forecasting models that directly impact revenue. Their existing infrastructure—running training jobs on large single EC2 instances alongside a sophisticated data platform of Snowflake, dbt, Airflow, and Kubernetes—had reached hard dataset scale limits that threatened to block customer growth.

In the competitive renewable energy retail market, an estimated 60-70% of providers struggle with similar data science scaling challenges as they grow beyond initial customer bases. The inability to quickly iterate on pricing models creates competitive disadvantage, while dataset processing limits force expensive platform migrations or cap growth potential.

Additionally, the team’s self-managed MLflow server lacked the reliability and unified experience needed for enterprise MLOps, creating reproducibility and portability challenges across their tiered development environments (dev, staging, prod).

NMD Solution

NMD reviewed the customer’s infrastructure and identified that the root technical problem was architectural: running ML training on single large EC2 instances created an unscalable bottleneck. As datasets grew with customer acquisition, training times increased linearly, and there was no path to horizontal scaling without complete re-architecture.

Solving this required migrating to a distributed compute paradigm using Databricks’ managed Spark clusters on AWS, which could process datasets in parallel and scale elastically based on workload demands. Additionally, migrating the existing MLflow server to Databricks’ managed service would provide a unified platform for model development, experiment tracking, management, and deployment.
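The shift from vertical to horizontal scaling can be illustrated with a toy cost model (the throughput, worker counts, and overhead constants below are illustrative assumptions, not measured values from this engagement):

```python
# Toy model of why single-instance training hits a wall: on one
# machine, run time grows linearly with data volume, while a
# distributed cluster divides the work across workers at the cost
# of a fixed coordination overhead. All constants are illustrative.

def single_node_minutes(rows: int, rows_per_minute: int = 1_000_000) -> float:
    """Run time when one instance must process every row itself."""
    return rows / rows_per_minute

def distributed_minutes(rows: int, workers: int,
                        rows_per_minute: int = 1_000_000,
                        overhead_minutes: float = 5.0) -> float:
    """Run time when rows are partitioned across `workers` executors."""
    return rows / (workers * rows_per_minute) + overhead_minutes

rows = 1_080_000_000  # a dataset that takes ~18 h on one node in this model
print(single_node_minutes(rows))               # 1080.0 minutes (~18 h)
print(distributed_minutes(rows, workers=72))   # 20.0 minutes

# The key property: doubling the data AND the workers keeps run time
# flat, whereas the single node simply takes twice as long.
print(distributed_minutes(rows * 2, workers=144))  # 20.0 minutes
```

In this model, adding workers gives a path to scale that a single instance, however large, cannot match once the dataset outgrows it.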

The solution required custom VPC networking to integrate Databricks with the customer’s existing RDS instance, ensuring optimal security and performance within their existing AWS infrastructure.

What We Delivered

Within 8 weeks, NMD deployed the Databricks Lakehouse on AWS solution across all environments, achieving a 98% reduction in model training time (from 18 hours to 20 minutes) and enabling 54x faster iteration cycles on pricing and forecasting models. This breakthrough eliminated the experimentation bottleneck that was blocking competitive model improvements.

The distributed Spark architecture removed all dataset scale limits, unlocking unlimited customer growth without requiring future platform re-architecture—eliminating a major strategic risk. The 75% reduction in compute costs per model run (achieved through shorter job durations) made ML experimentation economically scalable, allowing the data science team to run more experiments without budget constraints.
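The headline figures can be checked back-of-envelope. The training times come from the case study; the instance counts and hourly rates below are illustrative assumptions, not the customer's actual pricing:

```python
# Back-of-envelope check of the reported figures. Only the two
# training times are from the case study; all pricing is assumed.

before_minutes = 18 * 60   # 18-hour run on a single large EC2 instance
after_minutes = 20         # distributed run on a Databricks cluster

speedup = before_minutes / after_minutes          # 54.0
reduction = 1 - after_minutes / before_minutes    # ~0.981

print(f"{speedup:.0f}x faster iteration")         # 54x faster iteration
print(f"{reduction:.1%} shorter training runs")   # 98.1% shorter training runs

# Cost per run: a cluster of cheaper nodes finishing in minutes can
# undercut one large instance running for hours. Assumed rates:
single_instance_cost = 8.00 * 18            # $8/h x 18 h  = $144
cluster_cost = 27 * 4.00 * (20 / 60)        # 27 nodes x $4/h x 1/3 h = $36

cost_saving = 1 - cluster_cost / single_instance_cost
print(f"{cost_saving:.0%} lower compute cost per run")  # 75% lower
```

Note that the cost saving does not follow automatically from the speedup: with perfectly linear scaling at equal per-node rates, a 54x faster run on 54 nodes would cost the same. The saving comes from finishing sooner on cheaper capacity than one oversized instance.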

The migration from self-managed MLflow to Databricks’ managed MLflow service provided a unified experience for model development, experiment tracking, model management, and deployment across all environments, with S3 utilized for model artifact storage. This improved MLOps reliability, reproducibility, and portability between dev, staging, and prod environments.

The solution integrated seamlessly with existing tools (Snowflake, Airflow, GitHub) via Terraform-managed infrastructure and custom VPC networking that connected Databricks to existing RDS databases using existing security groups and subnets. SSO integration provided secure, streamlined access for the data science and engineering teams.
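The networking integration described above might look roughly like the following Terraform fragment. This is a sketch assuming the Databricks Terraform provider's customer-managed VPC resources; the variable names, ports, and security-group wiring are placeholders, not the customer's actual configuration:

```hcl
# Register the customer-managed VPC with the Databricks account so
# clusters launch inside the existing network (placeholder values).
resource "databricks_mws_networks" "this" {
  account_id         = var.databricks_account_id
  network_name       = "ml-platform-network"
  vpc_id             = var.existing_vpc_id
  subnet_ids         = var.existing_private_subnet_ids
  security_group_ids = [aws_security_group.databricks.id]
}

# Allow Databricks cluster nodes to reach the existing RDS instance.
# Port 5432 assumes PostgreSQL; adjust for the actual engine.
resource "aws_security_group_rule" "databricks_to_rds" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = var.rds_security_group_id
  source_security_group_id = aws_security_group.databricks.id
}
```

Reusing the existing security groups and subnets in this way keeps database traffic inside the VPC rather than routing it over the public internet.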

Ready to Transform Your Energy & Utilities Business?

See how New Math Data can transform your utility business with AWS-powered innovation.