I have built a data pipeline in Python (3.12). It typically processes hundreds of thousands of files and runs for several days. A particular pipeline function calls external web APIs that sometimes fail. I expect some exceptions, and I want to record them. However, if the exception frequency is above a threshold, I do want to abort immediately.
I implemented an exception class that I use to decorate the function above.
Implementation:
class ExceptionMonitor: """ Monitor the number of exceptions raised and the number of calls made to a function. """ def __init__(self, error_rate_threshold: float, min_calls: int): """ :param error_rate_threshold: The error rate threshold above which an exception is raised. :param min_calls: The minimum number of calls before the error rate is calculated. """ self.error_rate_threshold = error_rate_threshold self.min_calls = min_calls self.nb_exceptions = 0 self.nb_calls = 0 self.error_rate = 0 def _save_exception(self): with open("exceptions.log", "a") as f: f.write(f"{datetime.now()}\n") f.write(traceback.format_exc()) f.write("\n\n") def __call__(self, func): @functools.wraps(func) def wrapper(*args, **kwargs): try: return func(*args, **kwargs) except Exception: self.nb_exceptions += 1 if self.nb_calls >= self.min_calls: self.error_rate = self.nb_exceptions / self.nb_calls if self.error_rate > self.error_rate_threshold: raise self._save_exception() finally: self.nb_calls += 1 return wrapperUsage:
@ExceptionMonitor(error_rate_threshold=0.03, min_calls=100)def some_func(): call_web_api()Is this code idiomatic?
- \$\begingroup\$What does
nbstand for innb_callsornb_exceptions? Notebook?\$\endgroup\$muru– muru2025-04-05 12:31:57 +00:00CommentedApr 5 at 12:31
4 Answers4
Your code looks fine to me.
For some nitpicks:
- pass the exception to
_save_exceptionto ensure you're interacting with the correct error, and - pass the path to the log file to the class, or change to a
logging.Logger, to make logging to a different location easier to change.
It's likely to be irrelevant for most parameters, but I think thatself.nb_calls += 1 belongs in thetry block before calling the function rather than afinally block after.
To take a small example, consider@ExceptionMonitor(error_rate_threshold=0.5, min_calls=2)and function results like[success, success, exception]. I would argue that this does not clear the 50% threshold, however the code as written would say that it does.
This would also address@toolic's note about division by zero.
Since the code does a division calculation:
self.error_rate = self.nb_exceptions / self.nb_callsit would be good to add a check that the divisor (nb_calls) is not 0. It is initialized to 0 by__init__:
self.nb_calls = 0You might want to check thatmin_calls is greater than 0 as well, unless you want to support that case. If you do support 0, it might be worth explicitly documenting that.
- \$\begingroup\$Checking
nb_callsis unnecessary as long asmin_calls > 0, there is already a>=check. It would be better to only checkmin_callsin constructor IMO.\$\endgroup\$STerliakov– STerliakov2025-04-04 13:12:49 +00:00CommentedApr 4 at 13:12
One issue that I identified later:
The error rate calculation is cumulative, so error bursts will be missed. For example, if the error threshold is 0.05 (5%) andmin_calls is 10 and there are 8 consecutive exceptions from the 20th to the 27th call, the cumulative error rate will be 8/27 which is around 3%. Although technically correct, it's more useful to calculate the error rate within a sliding window, like this:
class ExceptionMonitor: """Monitor the number of exceptions raised and the number of calls made to a function.""" def __init__( self, error_rate_threshold: float, min_calls: int, window_sz: int = 100 ): """ :param error_rate_threshold: The error rate threshold in [0,1), above which an exception is raised. :param min_calls: The minimum number of calls before the error rate is calculated. :param window_sz: The size of the sliding window for calculating exception rate. """ self.error_rate_threshold = error_rate_threshold self.min_calls = min_calls self.exceptions_log = "exceptions.log" if os.path.exists(self.exceptions_log): os.remove(self.exceptions_log) if window_sz < min_calls: raise ValueError( f"Window size {window_sz} must be greater than or equal to min_calls {min_calls}." ) self.window_sz = window_sz self.recent_outcomes = deque(maxlen=window_sz) def _error_rate_exceeded(self): if len(self.recent_outcomes) < self.min_calls: return False return ( sum(self.recent_outcomes) / len(self.recent_outcomes) ) > self.error_rate_threshold def _save_exception(self): with open("exceptions.log", "a") as f: f.write(f"{datetime.now()}\n") f.write(traceback.format_exc()) f.write("\n\n") def __call__(self, func): @functools.wraps(func) def wrapper(*args, **kwargs): try: result = func(*args, **kwargs) self.recent_outcomes.append(False) return result except Exception: self.recent_outcomes.append(True) if self._error_rate_exceeded(): raise self._save_exception() return wrapper- \$\begingroup\$Nice self-answer +1\$\endgroup\$2025-04-12 10:32:29 +00:00CommentedApr 12 at 10:32
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

