Fix 'different_runtime_limit' race conditions

This commit fixes 3 related issues with the way runtime_limit was administered,
which could lead to race conditions (and hence the wrong runtime_limit
applying at some point in time). Post-fix, the following holds:

1. We use thread_locals to store this info, since there are at least 2 sources of
    threaded code that touch this (snappea's workers and the django debugserver)

2. We distinguish between the "from connection settings" timeout and the
    "temporarily overridden" ones, since we cannot assume
    connection-initialization happens first (as per the comment in base.py)

3. We store runtime-limits per alias ('using'). Needed for [2] (each connection
    may have a different moment-of-initialization, clobbering CM-set values from
    the other connection) and also needed once you realize there may be
    different defaults for the timeouts.
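The three points above can be sketched as follows. This is a minimal, hypothetical illustration of the bookkeeping described (thread-local storage, per-alias dicts, and a default/override distinction that survives late connection-initialization); the names are made up for this sketch and are not the actual implementation.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # per-thread, see point [1]


def _limits():
    # One dict per thread: alias -> {"default": ..., "override": ...}
    if not hasattr(_local, "limits"):
        _local.limits = {}
    return _local.limits


def set_default_runtime_limit(seconds, using="default"):
    # Called from connection-initialization; this may run *after* a context
    # manager has already set an override, so it must not clobber it ([2]).
    _limits().setdefault(using, {})["default"] = seconds


def get_runtime_limit(using="default"):
    entry = _limits().get(using, {})
    # An explicit override wins; otherwise fall back to the connection default.
    return entry.get("override", entry.get("default"))


@contextmanager
def different_runtime_limit(seconds, using="default"):
    # Per-alias ([3]): overriding one connection's limit leaves others alone.
    entry = _limits().setdefault(using, {})
    had_override = "override" in entry
    previous = entry.get("override")
    entry["override"] = seconds
    try:
        yield
    finally:
        # Unwind to the previous state so nested use works correctly.
        if had_override:
            entry["override"] = previous
        else:
            del entry["override"]
```

Note in particular that with this shape, a connection initializing *inside* an active `different_runtime_limit` block only fills in the "default" slot and does not reset the override.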

General context: I've recently started using the 'different runtime'
helper quite a bit more, and across connections (snappea!), which created more
and more doubts about whether it actually works as advertised.

Thoughts on "using" being required: I used to think "you can reason about a
global timeout value, and the current transaction makes clear what you're
actually doing", but as per the notes above that doesn't really work.

Thoughts on reproducing:
A few thoughts/notes on reproducing problems with race conditions. Basic note:
that's always hairy. So in the end I settled on a solution that's hopefully
easy to reason about, even if it's verbose.

When I started work on this commit, I focused on thread-safety; "proving the
problem" consisted of F5/^R on a web page with 2 context managers with different
timeouts, hoping to show that the stack unrolling didn't work properly.
However, during those "tests" I noticed quite a few resets-to-5s (coming from
the connection defaults), which prompted fix [2] from above.
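For the thread-safety half of the problem, the failure mode is easy to demonstrate in isolation. The following is a standalone illustration (not the project's code) of why a plain module-level variable races between threads while a thread-local does not; the barrier forces both writes to land before either thread reads back.

```python
import threading

_global_limit = None          # shared: last writer wins
_tl = threading.local()       # isolated per thread
observed_global = []
observed_local = []
barrier = threading.Barrier(2)


def worker(my_limit):
    global _global_limit
    _global_limit = my_limit  # both threads write the shared variable...
    _tl.limit = my_limit      # ...and their own thread-local slot
    barrier.wait()            # ensure both writes happen before either read
    observed_global.append((my_limit, _global_limit))
    observed_local.append((my_limit, _tl.limit))


threads = [threading.Thread(target=worker, args=(x,)) for x in (0.1, 5.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Thread-local reads always match what that thread set...
assert all(mine == seen for mine, seen in observed_local)
# ...while exactly one thread reads back the *other* thread's global value.
assert sum(1 for mine, seen in observed_global if mine != seen) == 1
```

The real-world version of this (requests racing snappea workers) is much hairier to trigger on demand, hence the verbose-but-deterministic approach above.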
Klaas van Schelven
2025-04-22 22:08:53 +02:00
parent 5c0e45a16d
commit 53d4be8183
5 changed files with 65 additions and 18 deletions


@@ -107,7 +107,7 @@ class Stats:
         with immediate_atomic(using="snappea"):  # explicit is better than impl.; and we combine read/write here
             # having stats is great, but I don't want to hog task-processing too long (which would happen
             # precisely when the backlog grows large)
-            with different_runtime_limit(0.1):
+            with different_runtime_limit(0.1, using="snappea"):
                 try:
                     task_counts = Task.objects.values("task_name").annotate(count=Count("task_name"))
                 except OperationalError as e: