Case Study

Renewable Energy Retailer Cuts ML Model Training from 18 Hours to 20 Minutes with Databricks on AWS

At a glance

A US renewable energy retailer’s data science team was blocked by 18-hour model training times and dataset scale limits on their existing single-instance infrastructure, preventing them from iterating on critical pricing and forecasting models needed to support customer growth.

NMD deployed a Databricks lakehouse architecture on AWS with distributed Spark compute in 8 weeks, cutting model training time by 98% (from 18 hours to 20 minutes) and eliminating dataset processing limits. This enabled 54x faster iteration on revenue-critical models while reducing compute costs by 75% per model run.

The solution is now in production across all environments (dev, staging, prod), providing a scalable MLOps foundation for unlimited customer growth without platform re-architecture.

Industry

Renewable Energy

Use Case

ML model training and MLOps at scale

Solution implemented

Databricks Lakehouse on AWS

The value equation

54x faster model training at 75% lower compute cost per run

Company Snapshot

A US-based renewable energy retailer providing clean wind, solar, and hydro power to residential and commercial customers with predictable pricing models. The company leverages advanced data science and ML for dynamic pricing optimization, demand forecasting, and customer acquisition modeling to compete effectively in the clean energy marketplace.

Location

United States

Customer Situation

A US renewable energy retailer was experiencing critical bottlenecks in their data science operations. Model training times had ballooned to 18 hours per run, blocking their data science team from iterating on pricing optimization and demand forecasting models that directly impact revenue. Their existing infrastructure—running training jobs on large single EC2 instances alongside a sophisticated data platform of Snowflake, dbt, Airflow, and Kubernetes—had reached hard dataset scale limits that threatened to block customer growth.

In the competitive renewable energy retail market, an estimated 60-70% of providers struggle with similar data science scaling challenges as they grow beyond initial customer bases. The inability to quickly iterate on pricing models creates competitive disadvantage, while dataset processing limits force expensive platform migrations or cap growth potential.

Additionally, the team’s self-managed MLflow server lacked the reliability and unified experience needed for enterprise MLOps, creating reproducibility and portability challenges across their tiered development environments (dev, staging, prod).

NMD Solution

NMD reviewed the customer’s infrastructure and identified that the root technical problem was architectural: running ML training on single large EC2 instances created an unscalable bottleneck. As datasets grew with customer acquisition, training times increased linearly, and there was no path to horizontal scaling without complete re-architecture.

Solving this required migrating to a distributed compute paradigm using Databricks’ managed Spark clusters on AWS, which could process datasets in parallel and scale elastically based on workload demands. Additionally, migrating the existing MLflow server to Databricks’ managed service would provide a unified platform for model development, experiment tracking, management, and deployment.
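The shift from vertical to horizontal scaling can be illustrated with a toy cost model (the throughput, worker counts, and overhead constants below are illustrative assumptions, not measured values from this engagement):

```python
# Toy model of why single-instance training hits a wall: on one
# machine, run time grows linearly with data volume, while a
# distributed cluster divides the work across workers at the cost
# of a fixed coordination overhead. All constants are illustrative.

def single_node_minutes(rows: int, rows_per_minute: int = 1_000_000) -> float:
    """Run time when one instance must process every row itself."""
    return rows / rows_per_minute

def distributed_minutes(rows: int, workers: int,
                        rows_per_minute: int = 1_000_000,
                        overhead_minutes: float = 5.0) -> float:
    """Run time when rows are partitioned across `workers` executors."""
    return rows / (workers * rows_per_minute) + overhead_minutes

rows = 1_080_000_000  # a dataset that takes ~18 h on one node in this model
print(single_node_minutes(rows))               # 1080.0 minutes (~18 h)
print(distributed_minutes(rows, workers=72))   # 20.0 minutes

# The key property: doubling the data AND the workers keeps run time
# flat, whereas the single node simply takes twice as long.
print(distributed_minutes(rows * 2, workers=144))  # 20.0 minutes
```

In this model, adding workers gives a path to scale that a single instance, however large, cannot match once the dataset outgrows it.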

The solution required custom VPC networking to integrate Databricks with the customer’s existing RDS instance, ensuring optimal security and performance within their existing AWS infrastructure.

What We Delivered

Within 8 weeks, NMD deployed the Databricks Lakehouse on AWS solution across all environments, achieving a 98% reduction in model training time (from 18 hours to 20 minutes) and enabling 54x faster iteration cycles on pricing and forecasting models. This breakthrough eliminated the experimentation bottleneck that was blocking competitive model improvements.

The distributed Spark architecture removed all dataset scale limits, unlocking unlimited customer growth without requiring future platform re-architecture—eliminating a major strategic risk. The 75% reduction in compute costs per model run (achieved through shorter job durations) made ML experimentation economically scalable, allowing the data science team to run more experiments without budget constraints.
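The headline figures can be checked back-of-envelope. The training times come from the case study; the instance counts and hourly rates below are illustrative assumptions, not the customer's actual pricing:

```python
# Back-of-envelope check of the reported figures. Only the two
# training times are from the case study; all pricing is assumed.

before_minutes = 18 * 60   # 18-hour run on a single large EC2 instance
after_minutes = 20         # distributed run on a Databricks cluster

speedup = before_minutes / after_minutes          # 54.0
reduction = 1 - after_minutes / before_minutes    # ~0.981

print(f"{speedup:.0f}x faster iteration")         # 54x faster iteration
print(f"{reduction:.1%} shorter training runs")   # 98.1% shorter training runs

# Cost per run: a cluster of cheaper nodes finishing in minutes can
# undercut one large instance running for hours. Assumed rates:
single_instance_cost = 8.00 * 18            # $8/h x 18 h  = $144
cluster_cost = 27 * 4.00 * (20 / 60)        # 27 nodes x $4/h x 1/3 h = $36

cost_saving = 1 - cluster_cost / single_instance_cost
print(f"{cost_saving:.0%} lower compute cost per run")  # 75% lower
```

Note that the cost saving does not follow automatically from the speedup: with perfectly linear scaling at equal per-node rates, a 54x faster run on 54 nodes would cost the same. The saving comes from finishing sooner on cheaper capacity than one oversized instance.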

The migration from self-managed MLflow to Databricks’ managed MLflow service provided a unified experience for model development, experiment tracking, model management, and deployment across all environments, with S3 utilized for model artifact storage. This improved MLOps reliability, reproducibility, and portability between dev, staging, and prod environments.

The solution integrated seamlessly with existing tools (Snowflake, Airflow, GitHub) via Terraform-managed infrastructure and custom VPC networking that connected Databricks to existing RDS databases using existing security groups and subnets. SSO integration provided secure, streamlined access for the data science and engineering teams.
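The networking integration described above might look roughly like the following Terraform fragment. This is a sketch assuming the Databricks Terraform provider's customer-managed VPC resources; the variable names, ports, and security-group wiring are placeholders, not the customer's actual configuration:

```hcl
# Register the customer-managed VPC with the Databricks account so
# clusters launch inside the existing network (placeholder values).
resource "databricks_mws_networks" "this" {
  account_id         = var.databricks_account_id
  network_name       = "ml-platform-network"
  vpc_id             = var.existing_vpc_id
  subnet_ids         = var.existing_private_subnet_ids
  security_group_ids = [aws_security_group.databricks.id]
}

# Allow Databricks cluster nodes to reach the existing RDS instance.
# Port 5432 assumes PostgreSQL; adjust for the actual engine.
resource "aws_security_group_rule" "databricks_to_rds" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = var.rds_security_group_id
  source_security_group_id = aws_security_group.databricks.id
}
```

Reusing the existing security groups and subnets in this way keeps database traffic inside the VPC rather than routing it over the public internet.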

Ready to Transform Your Energy & Utilities Business?

See how New Math Data can transform your utility business with AWS-powered innovation.