Contacts

1815 W 14th St, Houston, TX 77008

info@newmathdata.com

281-817-6190

General

Accurate by Design: Advanced Data Quality on AWS

Introduction: Data pipelines in AWS orchestrate the movement and transformation of data across various AWS services. The core objective of these pipelines is to enable efficient data processing, analysis, and storage, ensuring that data is available where and when it is needed. Maintaining high data quality throughout this process is critical; it ensures reliability, accuracy, […]

General

The Problem of Overfitting in Machine Learning

By Lena Qian. Introduction: Machine learning stands as a pivotal element in contemporary data science, fundamentally altering the landscape of predictive analytics and decision-making across various domains. Despite its widespread adoption, a significant impediment persists in the form of overfitting, wherein machine learning models have high accuracy with training data but fail when presented with […]

General

The Art of Collaboration in Distributed Teams

Introduction: In today’s rapidly evolving technological landscape, cloud data engineering projects are at the forefront of innovation, driving businesses towards unprecedented levels of efficiency, scalability, and data-driven decision-making. Central to the success of these projects is the concept of distributed teams – groups of individuals who work together from various geographical locations, leveraging the power […]

General

Advanced Unit Testing in AWS

Leveraging Moto and Pytest. Introduction: In the world of AWS development, ensuring the reliability, efficiency, and correctness of your cloud-based applications is paramount. As cloud solutions grow increasingly complex, so too does the challenge of effectively testing these systems. Traditional testing methods often fall short in the face of AWS’s vast and intricately interconnected services. […]

General

Reliability by design: Implementing Test Driven Development Strategies in Python Data Engineering

Introduction: In the rapidly evolving field of data engineering, maintaining high-quality, reliable, and efficient data pipelines is crucial for businesses to make informed decisions and stay competitive. One methodology that has been instrumental in achieving these objectives is Test-Driven Development (TDD). At its core, TDD involves a simple, yet powerful cycle: write a failing test […]

General

LLMs and chatbots: a brief update

Generally and historically, data engineering, analytics, and science efforts focused on progressing from data to knowledge/wisdom. The emergence of LLMs allows for the decomposition of wisdom/knowledge back down to data. This can enable novel discovery, integrate with information systems, and drive automated processes. GenAI Categories. Generation: Use Bedrock models to create code, text, or images […]

General

Agile ‘thin slice’ technique: Explained

Introduction: In today’s fast-paced development environment, the Agile methodology stands out for its emphasis on delivering functional features to users as early as possible. This approach challenges traditional, lengthy development cycles by advocating for the incremental release of a product’s most essential functionalities. By prioritizing early delivery, Agile aims to provide immediate value to users, […]