Clean Code’s Hidden Impact: Unraveling the Python Performance Paradox

Python Performance: Issue 1 – The Polymorphism Rule

Welcome to Python Performance

Welcome to the Python Performance blog series. In this series, I will explore various performance topics in Python, with the aim of building a list of heuristics that help developers write more performant Python code before they ever reach for the tried and true timeit module or a code profiler. I hope you find this series informative, and if there are future topics you would like me to explore, please reach out and let me know. We will start with a miniseries that follows up on a broader discussion of the Clean Code methodology’s impact on software performance.

Clean Code and Performance

Quick Background on Clean Code

Clean Code was published by Robert Martin, aka Uncle Bob, in 2009. The purpose of the book was to outline what defined “professional” code, or how to code with “craftsmanship”. These are broad terms, but they basically mean how to write code that you’re proud of, that won’t cause an absurd WTF per minute count in code reviews, and that reduces the need for a “great rewrite”, where a team opts to rewrite an entire code base. The business arguments for Clean Code include reduced time to deliver new features and bug fixes, as well as reduced maintenance costs.

Clean Code vs Performance

In February 2023, Casey Muratori published the video ‘”Clean” Code, Horrible Performance’ and its accompanying blog post. While people were discussing the impact of Clean Code on performance long before this, Casey’s video set off a fresh wave of discussion on the issue. Casey lays out the case that the rules of Clean Code set out by Robert Martin are terrible for performance. He states that there are only five rules of Clean Code that impact the structure of your code. They are, with the specific Clean Code heuristic in parentheses:

  • Prefer polymorphism to if-else statements (G23)
  • Code should not know about the internals of objects it’s working with (G14)
  • Functions should be small
  • Functions should do one thing (G30)
  • Don’t Repeat Yourself, aka the DRY principle (G5)

Interestingly, there is no specific heuristic that says functions should be small, although the book’s discussion and Uncle Bob’s talks do say this explicitly. Specifically, he states that functions should not exceed 20 lines and should rarely even be that long.

Prefer Polymorphism to If-Else

In this article we will focus on the first rule Casey identified: preferring polymorphism to if-else statements. This rule states that rather than having a function that branches on the type of its argument, the functionality should rely on a common method shared by all valid argument types, implemented via inheritance. Uncle Bob justifies this with a general distaste for switch statements, because they increase the depth and complexity of a function. On a deeper reading, he does acknowledge there are situations where this rule should be broken (it’s a heuristic, after all), but not for performance reasons.

The Control Case

All code for this discussion can be found on GitHub under the cc_polymorphism folder. A complete tabulation of the results and comparisons can be viewed in the file final_results.txt.

In Chapter 6 of Clean Code, the rule is demonstrated by comparing procedural and object-oriented implementations of calculating the area of a Shape. In Python, we can implement the object-oriented approach with an abstract base class, or ABC. This might be done more pythonically with a Protocol, but I felt the ABC was truer to both the Clean Code implementation and Casey’s. Some rough tests indicate that the choice between the two has no meaningful impact on performance.
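
For the curious, here is a minimal sketch of what the Protocol alternative might look like (ShapeProtocol is a hypothetical name for illustration; it is not part of the benchmarked code):

from typing import Protocol


class ShapeProtocol(Protocol):
    # Structural typing: any class with a matching area() method
    # conforms, no inheritance required.
    def area(self) -> float:
        ...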

Here are our shape classes:

import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        pass


class ControlSquare(Shape):
    def __init__(self, width: float):
        self.width = width

    def area(self) -> float:
        return self.width * self.width


class ControlRectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class ControlTriangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height / 2


class ControlCircle(Shape):
    def __init__(self, width: float):
        self.width = width

    def area(self) -> float:
        return math.pi * self.width / 2 * self.width / 2

You’ll notice I used a rather strange implementation of the circle, working from the diameter instead of the radius. This is for two reasons. First, going through Casey’s examples, I found it strange that width for most shapes was represented as the distance between two points on the perimeter of the shape, but not for circles; using the diameter corrects for that. Note the two views agree: with width w as the diameter, pi * (w/2) * (w/2) = pi * w^2 / 4, which is the familiar pi * r^2 with r = w/2. Second, it will serve for demonstration purposes later on in this article. If you’re interested in the mathematics side of this different view on circles, see here.

Finally, we come to measuring performance. Our methodology will be as follows:

  • I will run our tests on a MacBook Air with an Apple M2 processor and 8 GB of RAM.
  • I will disable wifi and Bluetooth, close all applications, and run the tests from a terminal to try to minimize any other processes that might have an impact on performance.
  • I will generate a list of 1,000 instances of each kind of shape, including a list of randomly assorted shapes. We will use a function, calculate_total_area, which will calculate the total area of each list of shapes.
  • I will use Python’s timeit module to time our objective function for each list 1,000,000 times to further reduce the impact of any background processes. We will also use the gc.enable() setup option to more properly mimic a production setting (I didn’t see a huge difference with this enabled or not). A sketch of this harness follows the list.
  • Finally, we’ll report our results.
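
To make the methodology concrete, here is a minimal sketch of what the harness might look like. The function calculate_total_area matches the description above, but the shape dimensions, list construction, and reporting details are my own assumptions; the exact harness lives in the GitHub repo.

import random
import timeit

# Build 1,000 instances of one shape, plus a randomly assorted list
# (dimensions here are assumed for illustration).
squares = [ControlSquare(2.0) for _ in range(1_000)]
assorted = [
    random.choice([
        ControlSquare(2.0),
        ControlRectangle(2.0, 3.0),
        ControlTriangle(2.0, 3.0),
        ControlCircle(2.0),
    ])
    for _ in range(1_000)
]


def calculate_total_area(shapes: list[Shape]) -> float:
    # Sum the areas through the common Shape interface.
    return sum(shape.area() for shape in shapes)


# Time 1,000,000 runs with garbage collection enabled (timeit
# disables gc by default), then report the average time per run.
total = timeit.timeit(
    lambda: calculate_total_area(squares),
    setup="gc.enable()",
    number=1_000_000,
)
print(f"avg_time: {total / 1_000_000:.8f}")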

For our control setup, we get the following results (avg_time is reported in seconds).

================================  ==========
shape                               avg_time
================================  ==========
Control Square                    0.00003593
Control Rectangle                 0.00003582
Control Triangle                  0.00004523
Control Circle                    0.00006176
Control Assorted                  0.00005991
================================  ==========

Immediately we see some interesting discrepancies. There is a significant slowdown associated with the operations for Triangles and Circles, and surprisingly, the assorted shapes performed quite poorly as well. We will come back to this and other general optimizations below. But first …

What If We Just Didn’t Polymorphism?

Casey and Uncle Bob compared their versions of the above code to a more procedural implementation where the area is calculated in a generic function using either an if-else statement (Uncle Bob) or a switch statement (Casey). Python does not have a proper switch statement (the new match statement is not a switch, and if you try to use it as one, it performs about the same as if/else statements), so we’ll use if/else statements. Our shapes now look like:

from enum import Enum

ShapeType = Enum('ShapeType', ['SQUARE', 'RECTANGLE', 'TRIANGLE', 'CIRCLE'])


class ShapeUnion:
    def __init__(self, shape_type: ShapeType, width: float, height: float = 0.0):
        self.type = shape_type
        self.width = width
        self.height = height

And our area and objective function look like:

from math import pi


def get_area_if_else(shape: ShapeUnion):
    if shape.type == ShapeType.SQUARE:
        return shape.width * shape.width
    if shape.type == ShapeType.RECTANGLE:
        return shape.width * shape.height
    if shape.type == ShapeType.TRIANGLE:
        return shape.width * shape.height / 2
    if shape.type == ShapeType.CIRCLE:
        return pi * shape.width / 2 * shape.width / 2


def calculate_total_area_procedural(shapes: list[ShapeUnion]):
    accumulator = 0
    for shape in shapes:
        accumulator += get_area_if_else(shape)
    return accumulator
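
As an aside, here is roughly what the match version mentioned above might look like (a sketch for illustration; it was not part of the benchmark). CPython evaluates value patterns like these with sequential equality checks, which is why it performs about the same as the if/else chain:

def get_area_match(shape: ShapeUnion):
    # Each dotted name is a value pattern, compared by equality in
    # order, so this is effectively the same work as get_area_if_else.
    match shape.type:
        case ShapeType.SQUARE:
            return shape.width * shape.width
        case ShapeType.RECTANGLE:
            return shape.width * shape.height
        case ShapeType.TRIANGLE:
            return shape.width * shape.height / 2
        case ShapeType.CIRCLE:
            return pi * shape.width / 2 * shape.width / 2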

Benchmarking the if/else configuration gives us the following results:

================================  ==========  ================
shape                               avg_time    %_diff_control
================================  ==========  ================
Procedural Square                 0.00008110      125.69949173
Procedural Rectangle              0.00017852      398.33595347
Procedural Triangle               0.00023916      428.77530013
Procedural Circle                 0.00030695      396.97474725
Procedural Assorted               0.00020989      250.32118351
================================  ==========  ================

This is a surprising result. Casey saw a 1.5x improvement in performance by using a generic shape type and a universal area function, but we saw a performance loss of between 1.25x and over 4x! Clearly, the “old fashioned” way, as Casey described it, is not the way to go in Python.

But We Want To Go Faster Not Slower

So if the silver bullets for lower-level languages actually incur a penalty for Python, what options do we have for improving our code performance?

Division is Hard

Recall from our control run that we saw a significant drop in performance for both Triangles and Circles. The common element in both of these classes is that they use division in their area calculation. As any good math student knows, dividing by a number is the same as multiplying by its reciprocal, so what happens if we use this fact and multiply by a float constant instead? Here are our new Triangle and Circle classes:

class NoDivisionTriangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return 0.5 * self.width * self.height


class NoDivisionCircle(Shape):
    def __init__(self, width: float):
        self.width = width

    def area(self) -> float:
        return math.pi * 0.5 * self.width * 0.5 * self.width
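
If you want to verify the raw cost gap yourself, a standalone micro-benchmark along these lines will do it (illustrative only; it is not part of the repo, and absolute numbers vary by machine and Python version):

import timeit

# Compare dividing by 2 with multiplying by 0.5 in a tight loop.
div_time = timeit.timeit("x / 2", setup="x = 123.456", number=10_000_000)
mul_time = timeit.timeit("x * 0.5", setup="x = 123.456", number=10_000_000)
print(f"division:       {div_time:.3f} s")
print(f"multiplication: {mul_time:.3f} s")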

Rerunning with these changes yields the following (excluding squares and rectangles, since they were not changed).

================================  ==========  ================
shape                               avg_time    %_diff_control
================================  ==========  ================
No Division Triangle              0.00003981      -11.98874995
No Division Circle                0.00005412      -12.37280964
No Division Assorted              0.00005718       -4.55477939
================================  ==========  ================

We see that this simple switch from division to multiplication yielded performance improvements of about 12% for the triangle and circle shapes and about 5% for the assorted shapes. However, circles and the assorted shapes still lag significantly behind. Let’s see what we can do to fix that.

Don’t Do the Same Calculation Again and Again

If we look at the formula we’ve used for the area of a circle (again, I acknowledge we took a non-traditional approach), we see that we keep multiplying by pi, 0.5, and 0.5 for every calculation. What if we hoisted the multiplication of these constants outside of the actual area calculation? Our circle class could become:

CIRCLE_CONSTANT = math.pi * 0.5 * 0.5


class PrecomputedConstCircle(Shape):
    def __init__(self, width: float):
        self.width = width

    def area(self) -> float:
        return CIRCLE_CONSTANT * self.width * self.width

And with this change, we end up with the following results, again truncated to only impacted values.

================================  ==========  ================
shape                               avg_time    %_diff_control
================================  ==========  ================
Precomputed Constants Circle      0.00004052      -34.39372981
Precomputed Constants Assorted    0.00005343      -10.82610396
================================  ==========  ================

This gives our circles a nearly 34% performance gain over our control implementation and an additional 25% gain over our no division implementation. This translated to an 11% boost for the assorted shapes. A couple of notes: first, in each test we use the most performant version of all shapes in the assorted shapes test, so here we are using the no division version of triangles. Second, where we define the circle constant has an impact on the performance of the calculation: making the circle constant a class attribute rather than a module-level global was 17% slower, though still better than the no division version (a sketch of that variant follows).
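
For reference, here is roughly what that class-attribute variant might look like (ClassConstCircle is a hypothetical name for illustration; the repo’s naming may differ):

class ClassConstCircle(Shape):
    # The constant lives on the class, so every area() call resolves it
    # through the instance's attribute lookup chain instead of a single
    # module-global lookup, which the measurements above show is slower.
    CIRCLE_CONSTANT = math.pi * 0.5 * 0.5

    def __init__(self, width: float):
        self.width = width

    def area(self) -> float:
        return self.CIRCLE_CONSTANT * self.width * self.width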

Issue Summary

Clean Code gave developers in the early 2000s a series of guidelines for writing code that is more maintainable over time and less prone to full rewrites. However, these guidelines can lead to serious performance issues when applied in lower-level languages. In contrast, the contrary strategies that produced better performance in low-level languages resulted in a catastrophic performance loss in Python. Even so, we found ways to significantly increase the performance of our Python code.

At the end of this issue, we have our first rules of thumb for Python performance:

  • Functions that rely on a fixed interface rather than branching calculations based on type are significantly more performant.
  • Division is an expensive operation; if you are going to divide by the same number repeatedly, it is better to multiply by that number’s reciprocal as a float.
  • When a calculation involves multiple constants, it is better to precompute their product into a single constant.

Next Issue

In the next issue, we’ll continue this line of reasoning to improve the performance of our shapes and explore the Clean Code rule “Code should not know about the internals of objects it’s working with”, that is, Feature Envy.