Green with Envy: Improving Python Performance with a Sprinkling of Feature Envy
Python Performance: Issue 2 – Feature Envy
Previous Issue Recap
In the previous issue we discussed the differences between the “Clean Code” version of calculating the cumulative area of a collection of shapes and “the old fashioned way”. Robert Martin, aka “Uncle Bob”, advocates for a “clean” polymorphic approach to the problem, where each shape exposes an area method, while Casey Muratori advocates for a “messy” switch statement approach to the problem, where a single function calculates each shape’s area based on its type. While Corey saw a 1.5x increase in performance comparing his approach in C++ to Uncle Bob’s approach, when we implemented the same pattern in Python we saw as much as a 4x decrease in performance! From there we explored a few smaller refactors of our shape classes to gain an improvement of up to 22% for our Circle classes specifically.
We finished issue 1 with the following definitions for our shapes. All code for this article can be found here under the cc_feature_envy folder.
class ControlSquare(Shape):
def __init__(self, width: float):
self.width = width
def area(self) -> float:
return self.width * self.width
class ControlRectangle(Shape):
def __init__(self, width: float, height: float):
self.width = width
self.height = height
def area(self) -> float:
return self.width * self.height
class NoDivisionTriangle(Shape):
def __init__(self, width: float, height: float):
self.width = width
self.height = height
def area(self) -> float:
return 0.5 * self.width * self.height
CIRCLE_CONSTANT = math.pi * 0.5 * 0.5
class PrecomputedConstCircle(Shape):
def __init__(self, width: float):
self.width = width
def area(self) -> float:
return CIRCLE_CONSTANT * self.width * self.width
As a quick reminder, we’re measuring the performance of these shapes when we sum the areas of 1,000 instances of each shape as well as 1,000 instances of random shapes.
Precompute Precompute Precompute
Before we get into measuring the performance impact of Feature Envy, let’s do some additional optimization of our shapes.
The first thing we’ll do is to learn from our last optimization in issue 1 (precompute constants), and take that a step further. Notice that once a shape’s widths have been set, the area is also already determined, because from a mathematical perspective, the area is a property of a 2D shape, not a derived value. So let’s hoist the area calculation into our constructors and fetch the value with a good old fashioned getter and check out the results. Our rectangle becomes the following, with similar changes implemented for the other shapes:
class GetterRectangle:
def __init__(self, width: float, height: float):
self.width = width
self.height = height
self._area = width * height
def area(self):
return self._area
Following an identical test setup as in the previous article, and comparing the performance to our original control implementation, we get the following results. Average time is always reported in seconds.
=============================== ========== ================ ===============
shape avg_time %_diff_control %_diff_precom
=============================== ========== ================ ===============
With Getters Square 0.00002929 -18.64875174 -18.26506910
With Getters Rectangle 0.00002950 -18.71327582 -18.56827372
With Getters Triangle 0.00002941 -36.52720589 -27.44246013
With Getters Circle 0.00002936 -52.73549787 -28.26446534
With Getters Assorted 0.00003912 -34.68020217 -27.12306413
Now that’s some real performance gains! We’re looking at anywhere from an 18% to 52% gain in performance from our original control and up to a 28% increase compared to our previous best. In addition, this is a universal speed up rather than optimizing individual shapes, so we’re really making some progress here.
An obvious criticism of this approach is that we’ve simply moved where the calculation is happening, and so we haven’t really improved anything. So let’s compare instantiation times.
==================== ================= ==================
shape avg_time %_diff_control
==================== ================= ==================
Control Square 0.000000000000328 0.000000000000000
Control Rectangle 0.000000000000429 0.000000000000000
Control Triangle 0.000000000000430 0.000000000000000
Control Circle 0.000000000000323 0.000000000000000
==================== ================= ==================
Getter Square 0.000000000000331 1.008145268830665
Getter Rectangle 0.000000000000446 3.933901076486294
Getter Triangle 0.000000000000446 3.820502828760949
Getter Circle 0.000000000000337 4.406072897531537
We definitely see a slow down in instantiation, but as a percentage it is noticeably smaller than the performance gains we saw by making this switch. In terms of raw numbers, multiplying this average time by 1,000 shows us that instantiating 1,000 instances of each shape is still something like 5 orders of magnitude faster than the time to sum the areas of 1,000 instances, so clearly the calculation of the area itself is not the only slow down we’re encountering in our test runs.
Another critique of this implementation is that it would be more Pythonic to make use of a Python property for accessing and managing the area attribute, but as you can see, this is actually quite a bit slower than the traditional getter for the individual shapes. Full code can be found in the GitHub repo.
=============================== ================
shape %_diff_getters
=============================== ================
With Properties Square 23.01360448
With Properties Rectangle 22.78751002
With Properties Triangle 22.54248087
With Properties Circle 22.62815483
With Properties Assorted -2.04247743
It is interesting that this approach is slightly faster for the assorted shapes. I don’t have a good explanation as to why.
Use The Language Features
One last minor tweak we can make before looking at feature envy, is to take advantage of a language feature in Python called slots. It’s a simple change that generally improves attribute access performance. For example, our Square class now becomes:
class SlotsSquare(Shape):
__slots__ = ('width', '_area')
def __init__(self, width: float):
self.width = width
self._area = width * width
def area(self):
return self._area
Applying this change to all the shapes, we get the following results.
=============================== ================
shape %_diff_getters
=============================== ================
With Slots Square -0.70821804
With Slots Rectangle -0.41752610
With Slots Triangle -0.33059178
With Slots Circle -0.13885755
With Slots Assorted 0.52279667
While this is certainly the smallest performance gain we’ve seen, every little bit counts. And again, we see a reverse in performance for the assorted category, defying expectations a bit. I will add that in repeated runs of these tests, we see the most variability in performance gains with slots, probably because the gains are so minimal to begin with.
Feature Envy
Briefly, feature envy is when a function or method interacts directly with the data (the attributes) of another object. Avoiding feature envy is one of the dictates of Clean Code. Violating this rule is fairly common in Python, and allows us to remove our getter in favor of directly accessing the area property. Our Triangle class becomes
class AttrTriangle:
__slots__ = ['width', 'height', 'area']
def __init__(self, width: float, height: float):
self.width = width
self.height = height
self.area = 0.5 * width * height
The other shapes are adjusted similarly. We also have to modify our cumulative area function in this case, since we no longer have an area method.
def calculate_total_area_attr(shapes: list[AttrShape]) -> float:
accumulator = 0
for shape in shapes:
accumulator += shape.area
return accumulator
With these changes in place we get the following results.
=============================== ========== ================ ==============
shape avg_time %_diff_control %_diff_slots
=============================== ========== ================ ==============
With Attributes Square 0.00001219 -66.14600013 -58.08857163
With Attributes Rectangle 0.00001289 -64.48095088 -56.12079024
With Attributes Triangle 0.00001287 -72.23214680 -56.10724829
With Attributes Circle 0.00001211 -80.50210113 -58.68990481
With Attributes Assorted 0.00002098 -64.96452968 -46.64212107
And that’s some serious improvement. We’re now looking at code that runs up to 80% faster than our original implementation and less than half the time of our previous best! Clearly with Python, a little feature envy can go a long way.
Back to Corey
At this point we’ve diverged from the path of exploration that Corey pursued because of the significant differences in performance. However, it’s worth it to come back to the source of inspiration for this series and compare again. Implementing Corey’s code utilizing feature envy in Python gives us something like the following.
class ShapeType(Enum):
SQUARE = auto()
RECTANGLE = auto()
TRIANGLE = auto()
CIRCLE = auto()
SHAPE_CONSTANTS_TBL = {
ShapeType.SQUARE: 1,
ShapeType.RECTANGLE: 1,
ShapeType.TRIANGLE: 0.5,
ShapeType.CIRCLE: CIRCLE_CONSTANT
}
class ShapeUnion:
def __init__(self, shape_type: ShapeType, width: float, height: float):
self.type = shape_type
self.width = width
self.height = height
def get_area_with_lookup(shapes: list[ShapeUnion]):
accumulator = 0
for shape in shapes:
accumulator = SHAPE_CONSTANTS_TBL[shape.type] * shape.width * shape.height
return accumulator
This method again uses a generic shape class, but now uses a lookup table to fetch a scalar constant for each shape type and calculate the area in a uniform way. We see the following results.
=============================== ========== ================ ==============
shape avg_time %_diff_control %_diff_attrs
=============================== ========== ================ ==============
Corey'S Way Square 0.00007156 98.77406263 487.15089322
Corey'S Way Rectangle 0.00007285 100.73815234 465.15632411
Corey'S Way Triangle 0.00006625 42.96332130 414.85190548
Corey'S Way Circle 0.00006500 4.64394914 436.69346545
Corey'S Way Assorted 0.00008091 35.11444398 285.65043577
Given the results we saw in issue 1, it’s really no surprise that we are still seeing some pretty atrocious performance here. It is worth noting though, that this performance is better than the original procedural implementation where Corey used a switch statement instead of Polymorphism. Additionally, the performance doesn’t get significantly better if we implement the lessons we’ve discussed in this series like precomputing the area in the object instantiation.
Issue Summary
So far we’ve seen that Clean Code is a mixed bag for performance in a Python program. Sticking to Clean Code’s preference for polymorphism was a winning proposition, but violating the rule against feature envy was unquestionably the way to go as well. On our way to reaching these conclusions, we added the following rules to our list of Python performance optimizations.
- Model your data properly, calculating what is in fact an attribute repeatedly in a method leads to serious performance hits down the road.
- Make use of the language features. By making our shape attributes slots, we were able improve our code performance.
- Favor direct attribute access whenever possible. Even a simple getter carries significant overhead.
Next Issue
In our next issue we’ll take a look at the three Clean Code rules Corey pulled out that focus on functions instead of classes. Namely, functions should be small, functions should do only one thing, and don’t repeat yourself.