40 Commits

Author SHA1 Message Date
Klaas van Schelven
354af7ea0a Fix issues as reported by bandit or mark as nosec
Nothing worrying, but good to have checked this regardless
and important to have a green pipeline.

Fix #175
2025-07-30 12:16:40 +02:00
Klaas van Schelven
ad2aa08e0a Rename local var. for understanding 2025-07-03 11:44:01 +02:00
Klaas van Schelven
a097e25310 issue.stored_event_count: consequences for 'irrelevance'
document & assert
2025-03-31 15:25:59 +02:00
Klaas van Schelven
9b1911aded Fix issue.stored_event_count for eviction/retention 2025-03-31 14:51:58 +02:00
Klaas van Schelven
3c35ea5398 Eviction: note on newest-first behavior 2025-03-19 16:00:38 +01:00
Klaas van Schelven
6948f3a2e1 Dead code removal
'allowed as pass-in' but in fact we always pass-in
2025-03-19 15:53:51 +01:00
Klaas van Schelven
72ab0c68ef Log message: no need to mention 'include_never_evict'
because when you reach that point, it's always True
2025-03-19 15:43:05 +01:00
Klaas van Schelven
a2f3ad900b eviction-target not reached handling changes
this error has shown up for one of our users; I can't reproduce yet, but I can
make it better:

* log-don't-crash: not worth failing for this (drops the event, and also
  rolls back the transaction such that nothing is achieved regarding eviction)
* provide more info on-error (various counts)

NB: I've also changed the < into a <=, and combined it with a check on "loop
not done". I _think_ they are functionally equivalent, and that the new version
is simply more clear as well as slightly more efficient.

In my understanding: the old version simply looped one more time before giving
up (because it was < it needed one more iteration, and because there was no
explicit check on 'loop done' that inefficiency was needed in the old formulation).
I say "I think" because I don't have a test specific to the edge-case.
2025-03-19 15:32:39 +01:00
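A minimal sketch of the "log-don't-crash" idea from commit a2f3ad900b, not the project's actual code: when the eviction target is not reached, the counts are logged and the caller carries on instead of raising. The function name, the logger name, and the parameter names are all hypothetical.

```python
import logging

logger = logging.getLogger("retention")  # hypothetical logger name


def report_eviction_result(stored_before, evicted, target):
    """Return True if the target was reached; otherwise log the counts and return False.

    Nothing is raised, so the surrounding transaction is not rolled back
    and the incoming event is not dropped.
    """
    remaining = stored_before - evicted
    if remaining <= target:
        return True
    logger.error(
        "eviction target not reached: stored_before=%d evicted=%d remaining=%d target=%d",
        stored_before, evicted, remaining, target,
    )
    return False
```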
Klaas van Schelven
1d0c0c65ff Retention tests/clarification: filter_for_work 2025-03-19 14:33:00 +01:00
Klaas van Schelven
d3c6627556 Add a more complicated case to the retention tests
this one tests at least multiple epochs and irrelevances
2025-03-19 14:18:28 +01:00
Klaas van Schelven
1b7865d3b9 Eviction: Tests and rewrite-for-understanding of epoch_bounds_with_irrelevance 2025-03-19 11:56:55 +01:00
Klaas van Schelven
f548eab778 Merge branch 'main' into tag-search 2025-03-10 09:09:40 +01:00
Klaas van Schelven
0ade3c0f86 Add a comment about DB-CASCADE 2025-03-07 16:35:36 +01:00
Klaas van Schelven
96e07c4dc3 Tags: delete EventTag when Events are evicted
and document related things
2025-03-07 13:50:10 +01:00
Klaas van Schelven
97f03a8951 Rewrite 'eviction_target' comment 2025-03-06 14:03:51 +01:00
Klaas van Schelven
14bc3688c7 retention: deletion counts, more defensive idiom
the dict as returned by Django won't contain 'events.Event' if none are deleted;
no observed bug for this line, but good measure to fix it anyway
2025-02-17 21:36:08 +01:00
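The defensive idiom from commit 14bc3688c7 might look roughly like this. Django's `QuerySet.delete()` returns a `(total, per_model_counts)` tuple; the sketch works on a plain tuple of that shape so it runs without Django. The helper name is hypothetical, and `tags.EventTag` in the usage note is an assumed label.

```python
def deleted_event_count(delete_result):
    """Extract the number of deleted events from a Django-style delete() result.

    delete_result is a (total, {model_label: count}) tuple. The label
    'events.Event' is absent when no events were deleted, so .get() with a
    default of 0 is used rather than indexing.
    """
    _total, per_model = delete_result
    return per_model.get("events.Event", 0)
```

With this helper, `deleted_event_count((0, {}))` yields 0 instead of raising `KeyError`, and `deleted_event_count((3, {"events.Event": 2, "tags.EventTag": 1}))` yields 2.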
Klaas van Schelven
5559fba754 Introduce FileEventStorage
An (optional) way to store the `event_data` (full event as JSON)
outside the DB. This is expected to be useful for larger setups,
because it gives you:

* A more portable database (e.g. backups); depending on event size,
  the impact on your DB is ~50x.
* Less worries about hitting "physical" limits (e.g. disk size, max
  file size) for your DB.

Presumably (more testing will happen going forwards) it will:

* Speed up migrations (especially on sqlite, which does full table
  copies)
* Speed up event ingestion(?)

Further improvements in this commit:

* `delete_with_limit` was removed; this removes one tie-in to MySQL/Sqlite
    (See #21 for this bullet)
2025-02-12 17:11:24 +01:00
Klaas van Schelven
9f61602fc1 Retention, internal: make max_event_count non-optional
It was optional in anticipation of other methods of eviction, but YAGNI,
and the idea of evicting in batches of 500 is baked in quite hard (for
good reasons).
2025-02-12 09:00:12 +01:00
Klaas van Schelven
615d2da4c8 Cache stored_event_count (on Issue and Project)
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.

Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)

Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".

This commit also optimizes the .count() on the issue-detail event list (via
Paginator).

This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
2025-02-06 16:24:25 +01:00
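A rough illustration of the caching idea in commit 615d2da4c8: keep a counter on the object and adjust it on each digest and eviction, instead of running a COUNT query per event. This is a sketch under assumptions, not the project's model code; the class and the method names are hypothetical, and only the field name `stored_event_count` comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class CachedCounts:
    """Stand-in for an Issue or Project row carrying a cached counter."""
    stored_event_count: int = 0

    def on_event_stored(self):
        # One more event is stored; no COUNT(*) query is needed.
        self.stored_event_count += 1

    def on_events_evicted(self, deleted):
        # Subtract however many rows the eviction reported as deleted.
        self.stored_event_count -= deleted
```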
Klaas van Schelven
c56611bc82 Note that MySQL supports DELETE w/ LIMIT too 2024-10-09 09:58:47 +02:00
Klaas van Schelven
3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven
c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
of doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven
14302783aa Eviction: Age based irrelevance with a base of 4 2024-07-02 09:03:17 +02:00
Klaas van Schelven
b145ef6631 Eviction: 500 per-eviction is a hard-limit; even for lowered max-events 2024-07-01 15:02:39 +02:00
Klaas van Schelven
ec01a64651 Evictions: delete_with_limit (don't overshoot) 2024-07-01 14:07:10 +02:00
Klaas van Schelven
c5df10e9cf Stress test: ability to use multiple dsns (projects) 2024-06-27 09:52:10 +02:00
Klaas van Schelven
e9ed7835c1 eviction_target bugfix; delete 'never_evict' if nothing else remains 2024-06-26 11:06:04 +02:00
Klaas van Schelven
833ebfe9ac Move code around & document it 2024-06-26 10:11:59 +02:00
Klaas van Schelven
f45995ce19 Eviction lowered target: no more lowering than 500 (for large quota) 2024-06-26 09:53:16 +02:00
Klaas van Schelven
9a96ab767a retention insights: don't ignore never_evict=True 2024-06-26 09:38:34 +02:00
Klaas van Schelven
653739a8f6 Eviction: use deletion counts to keep track of the work
This saves a query in the (small) loop (namely: selection counts of remaining items)

It also allows us to stop sooner (evict less).
2024-06-26 09:26:34 +02:00
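The approach described in commit 653739a8f6 could be sketched as follows: the loop sums the counts returned by each delete instead of re-querying how many items remain, and it stops as soon as the target is met. All names here are hypothetical, and `delete_batch` stands in for whatever performs the actual deletion.

```python
def evict_until(stored_count, target, delete_batch, max_rounds):
    """Evict in rounds until stored_count - evicted <= target.

    delete_batch(round_no) performs one deletion round and returns how many
    rows it deleted. That return value is the only bookkeeping; no extra
    count query runs inside the loop.
    """
    evicted = 0
    for round_no in range(max_rounds):
        if stored_count - evicted <= target:
            break  # target reached: stop sooner, evict less
        evicted += delete_batch(round_no)
    return evicted
```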
Klaas van Schelven
fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven
2bdc357a87 Eviction: logging 2024-06-24 13:58:45 +02:00
Klaas van Schelven
69a40480fd Retention/eviction: more small fixes/cleanup 2024-06-24 11:48:21 +02:00
Klaas van Schelven
bdc6193214 Add tool to generate insight in retention (and fix bugs that that insight revealed) 2024-06-24 10:59:04 +02:00
Klaas van Schelven
63afba020a Eviction: 95% 'lowered target' 2024-06-24 09:24:03 +02:00
Klaas van Schelven
82b229613b Fix: store generator in list b/c repeated evaluation 2024-06-24 09:12:07 +02:00
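The class of bug behind commit 82b229613b can be reproduced in a few lines: a generator is exhausted after its first pass, so a second pass silently sees nothing. The function name `epochs` and its values are made up for the example.

```python
def epochs():
    # A generator function: each returned generator can be consumed only once.
    yield from (1, 2, 3)


gen = epochs()
first_pass = sum(gen)    # consumes the generator
second_pass = sum(gen)   # already exhausted, so this sums nothing

stored = list(epochs())  # materialise once, then iterate as often as needed
first_list_pass = sum(stored)
second_list_pass = sum(stored)
```

Here `second_pass` is 0 while both list passes give 6, which is why storing the generator's output in a list fixes repeated evaluation.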
Klaas van Schelven
5e2cc0575f Retention, small fixes (from Friday) 2024-06-23 22:20:18 +02:00
Klaas van Schelven
ea6aa9bbca Retention/quotas: something that 'seems to work' (doesn't immediately crash) 2024-06-21 11:50:13 +02:00
Klaas van Schelven
c2b821589d Retention, WIP (yesterday) 2024-06-21 09:28:04 +02:00