98 Commits

Author SHA1 Message Date
Klaas van Schelven
308034aadd Issue-delete from the UI (in the list-view)
See #50
2025-07-04 21:25:57 +02:00
Klaas van Schelven
28b2ce0eaf Various models: .project SET_NULL => DO_NOTHING
Like e45c61d6f0, but for .project.

I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Project-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for project deletion (which has deletes for many
of the altered models), the following 12 queries are no longer done:

```
SELECT "projects_project"."id", [..many fields..] FROM "projects_project" WHERE "projects_project"."id" = 1
DELETE FROM "projects_projectmembership" WHERE "projects_projectmembership"."project_id" IN (1)
DELETE FROM "alerts_messagingserviceconfig" WHERE "alerts_messagingserviceconfig"."project_id" IN (1)
UPDATE "releases_release" SET "project_id" = NULL WHERE "releases_release"."project_id" IN (1)
UPDATE "issues_issue" SET "project_id" = NULL WHERE "issues_issue"."project_id" IN (1)
UPDATE "issues_grouping" SET "project_id" = NULL WHERE "issues_grouping"."project_id" IN (1)
UPDATE "events_event" SET "project_id" = NULL WHERE "events_event"."project_id" IN (1)
UPDATE "tags_tagkey" SET "project_id" = NULL WHERE "tags_tagkey"."project_id" IN (1)
UPDATE "tags_tagvalue" SET "project_id" = NULL WHERE "tags_tagvalue"."project_id" IN (1)
UPDATE "tags_eventtag" SET "project_id" = NULL WHERE "tags_eventtag"."project_id" IN (1)
UPDATE "tags_issuetag" SET "project_id" = NULL WHERE "tags_issuetag"."project_id" IN (1)
```
2025-07-03 21:49:49 +02:00
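The "deps-first, properly batched" deletion that 28b2ce0eaf relies on can be sketched with the standard library's `sqlite3`. This is a minimal illustration, not the project's implementation: the two-table schema, the batch size, and the function name `delete_project_deps_first` are assumptions. It shows that once dependents are removed first, the parent delete needs no `SET NULL` pass, while a parent-first delete is rejected by the database constraint.

```python
import sqlite3


def delete_project_deps_first(conn, project_id, batch_size=2):
    """Delete dependents in small batches, then the project row itself."""
    # Hypothetical dependency order: rows that reference the project go first.
    for table in ("event", "issue"):
        while True:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE id IN ("
                f"SELECT id FROM {table} WHERE project_id = ? LIMIT ?)",
                (project_id, batch_size),
            )
            conn.commit()  # one short transaction per batch
            if cur.rowcount < batch_size:
                break
    conn.execute("DELETE FROM project WHERE id = ?", (project_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE issue (id INTEGER PRIMARY KEY,
                        project_id INTEGER NOT NULL REFERENCES project(id));
    CREATE TABLE event (id INTEGER PRIMARY KEY,
                        project_id INTEGER NOT NULL REFERENCES project(id));
""")
conn.execute("INSERT INTO project (id) VALUES (1)")
conn.executemany("INSERT INTO issue (project_id) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO event (project_id) VALUES (?)", [(1,)] * 5)
conn.commit()

# Deleting the parent first is rejected by the database-level constraint.
try:
    conn.execute("DELETE FROM project WHERE id = 1")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True
conn.rollback()

delete_project_deps_first(conn, 1)
remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("project", "issue", "event")
}
```

Committing per batch keeps each transaction short, which is the usual reason for batching large deletes.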
Klaas van Schelven
6b9e4d8011 Project.delete_deferred(): first version (WIP)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.

* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Projects which are delete-in-progress from the UI

* lack of DRY
* some unnecessary work (needed in the Issue-context, but not here)
  is still being done.

See #50
2025-07-03 21:01:28 +02:00
Klaas van Schelven
2baa4446fd Issue deletion: event pre-deletes (storage, project-counts) 2025-07-03 13:18:28 +02:00
Klaas van Schelven
e45c61d6f0 Various models: .issue and .grouping; SET_NULL => DO_NOTHING
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Issue-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for issue deletion (which has deletes for many
of the altered models), the following 8 queries are no longer done:

```
SELECT "issues_grouping"."id", [..many fields..] FROM "issues_grouping" WHERE "issues_grouping"."id" IN (1)
UPDATE "events_event" SET "grouping_id" = NULL WHERE "events_event"."grouping_id" IN (1)

[.. a few moments later..]

SELECT "issues_issue"."id", [..many fields..] FROM "issues_issue" WHERE "issues_issue"."id" = 'uuid'
UPDATE "issues_grouping" SET "issue_id" = NULL WHERE "issues_grouping"."issue_id" IN ('uuid')
UPDATE "issues_turningpoint" SET "issue_id" = NULL WHERE "issues_turningpoint"."issue_id" IN ('uuid')
UPDATE "events_event" SET "issue_id" = NULL WHERE "events_event"."issue_id" IN ('uuid')
UPDATE "tags_eventtag" SET "issue_id" = NULL WHERE "tags_eventtag"."issue_id" IN ('uuid')
UPDATE "tags_issuetag" SET "issue_id" = NULL WHERE "tags_issuetag"."issue_id" IN ('uuid')
```

(breaks the tests b/c of constraints and not always using factories; will fix next)
2025-07-03 11:33:58 +02:00
Klaas van Schelven
e5dbeae514 Issue.delete_deferred(): first version (WIP)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.

* no tests yet.
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Issues which are delete-in-progress from the UI
* file storage not yet cleaned up
* project issue counts not yet updated
* dangling tag values: no cleanup mechanism yet.

See #50
2025-06-27 12:52:59 +02:00
Klaas van Schelven
aad0f624f9 Fix: issue-list indexes must have project first
because we always filter by project before ordering;

the now-removed first_seen index was simply unused
2025-05-06 22:19:31 +02:00
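The reasoning in aad0f624f9 (filter column first, ordering column second) can be checked against SQLite's query planner. The sketch below is an assumption-laden illustration: the table layout, the `last_seen` ordering column, and the index name are made up for the example and are not taken from the actual models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issue (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        last_seen TEXT NOT NULL
    );
    -- project first (equality filter), ordering column second
    CREATE INDEX issue_project_last_seen ON issue (project_id, last_seen);
""")


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan = query_plan(
    conn,
    "SELECT id FROM issue WHERE project_id = ? ORDER BY last_seen DESC LIMIT 25",
    (1,),
)
print(plan)
```

If the index serves both the filter and the ordering, the plan names the index and contains no separate sorting step (reported by SQLite as a temporary B-tree).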
Klaas van Schelven
49e6700d4a Grouping.grouping_key: hash it for the index 2025-05-06 11:32:19 +02:00
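A minimal sketch of the idea in 49e6700d4a: index a fixed-length digest of the grouping key rather than the arbitrarily long key itself. The choice of SHA-256 and the function name `grouping_key_hash` are assumptions for illustration; the commit does not say which hash is used.

```python
import hashlib


def grouping_key_hash(grouping_key: str) -> str:
    """Return a fixed-length (64 hex characters) digest suitable for indexing."""
    return hashlib.sha256(grouping_key.encode("utf-8")).hexdigest()


short = grouping_key_hash("ValueError: invalid literal")
long_ = grouping_key_hash("x" * 10_000)  # very long keys hash to the same length
```

Lookups would then filter on the digest column (together with the project), at the cost of computing the digest once per lookup.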
Klaas van Schelven
392f5a30be Add index for Grouping.grouping_key (and project) 2025-05-05 22:45:33 +02:00
Klaas van Schelven
524f5ea45e Issue Tag display: for low event-counts, show more tags
and for high event-counts, display a warning about what is hidden
2025-03-31 09:56:31 +02:00
Klaas van Schelven
cd7f3978cf Improve tag-overview performance
* denormalize IssueTag.key; this allows for key to be used in an index
  (issue, key, count).

* rewrite to grouping-first, per-key-query-second. i.e. reverts part of
  bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
  guessing correctly, and we don't dynamically determine that yet. Which
  means that (in the single query version) if you'd have a per-event value for
  some tag, you could end up iterating over as many values as there are events,
  which won't work.

* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.

* further denormalization (of key__key, of value__str) actually turns out to not
  be required for both the grouping and individual queries to be fast.

Performance tests, as always, against sqlite3.

--

Roads not taken/background

* This commit removes a future TODO that "A point _could_ be made for
  ['issue', '?value?' 'count']". I tried both versions of that index
  (against the group-then-query version, the only one which I trust)
  but without denormalization of key, I could not get it to be fast.

* I thought about a hybrid approach (for those keys with low counts of values
  do the single-query thing) but as it stands the extra complexity isn't worth
  it.

---
on the 1.2M events, 3 (user defined) tags / event test env this
basically lowers the time from "seconds" to "milliseconds".
2025-03-12 14:14:05 +01:00
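The "grouping-first, per-key-query-second" shape described in cd7f3978cf can be sketched as follows. This is a simplified illustration against `sqlite3`, not the code from the commit: the column names (`tag_key`, `tag_value`, `event_count`), the function `tag_overview`, and the sample data are assumptions. The point is that each per-key query is bounded by `LIMIT`, so a key with one value per event cannot force an iteration over all values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issuetag (
        issue_id INTEGER NOT NULL,
        tag_key TEXT NOT NULL,       -- denormalized copy of the key
        tag_value TEXT NOT NULL,
        event_count INTEGER NOT NULL
    );
    CREATE INDEX issuetag_issue_key_count
        ON issuetag (issue_id, tag_key, event_count);
""")
conn.executemany(
    "INSERT INTO issuetag VALUES (1, ?, ?, ?)",
    [("browser", "chrome", 10), ("browser", "firefox", 5),
     ("browser", "safari", 2), ("browser", "edge", 1)]
    # a "mostly unique" key: one value per event
    + [("user_id", f"user-{i}", 1) for i in range(1000)],
)


def tag_overview(conn, issue_id, top_n=3):
    """First find the keys for the issue, then run one bounded query per key."""
    keys = [row[0] for row in conn.execute(
        "SELECT DISTINCT tag_key FROM issuetag WHERE issue_id = ? ORDER BY tag_key",
        (issue_id,),
    )]
    overview = {}
    for key in keys:
        overview[key] = conn.execute(
            "SELECT tag_value, event_count FROM issuetag "
            "WHERE issue_id = ? AND tag_key = ? "
            "ORDER BY event_count DESC LIMIT ?",
            (issue_id, key, top_n),
        ).fetchall()
    return overview


overview = tag_overview(conn, 1)
```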
Klaas van Schelven
3ee6f29f9c tags: fix the indexes
this is the part I was able to do with careful reading (and rerunning the
tests); actual performance implications will be checked based on this
2025-03-07 20:59:21 +01:00
Klaas van Schelven
f76d3f4f40 Merge branch 'main' into tag-search 2025-03-05 16:05:17 +01:00
Klaas van Schelven
381a5caae4 Issue.calculated_* fields: fix lengths
as in a717dd7374, but for Issue as well as Event.
The need for this was exposed by running the testsuite
against mysql; this commit fixes the tests.
2025-03-05 11:14:19 +01:00
Klaas van Schelven
0de8261440 Restore mostly_unique filter
botched in bbfee84c6a
2025-03-03 15:58:20 +01:00
Klaas van Schelven
bbfee84c6a issue tags: single query rather than one-per-tag 2025-03-03 13:42:18 +01:00
Klaas van Schelven
e6bc660731 Add note about per-key tag pages 2025-03-03 13:25:41 +01:00
Klaas van Schelven
1ae5bb3fd1 Tags: no cutoff when there are many
this idea was superseded by doing it explicitly in 00c49443eb
2025-03-03 13:23:52 +01:00
Klaas van Schelven
5930740e0b Tags: as a separate tab 2025-03-03 12:56:20 +01:00
Klaas van Schelven
124f90b403 'Issue Tags' box: show on all issue-related pages
now that it's no longer tied to the event...
2025-03-03 11:00:11 +01:00
Klaas van Schelven
00c49443eb Add 'mostly_unique' property to tags 2025-03-03 10:52:28 +01:00
Klaas van Schelven
7a30de3840 Issue Tags: select_related 2025-02-28 10:09:53 +01:00
Klaas van Schelven
60e25dac42 Issue Tags display: 'Other', sorting (WIP) 2025-02-28 09:48:01 +01:00
Klaas van Schelven
2c444c6e80 Display Issue (not event) tags in the RHS detail; WIP 2025-02-28 09:33:58 +01:00
Klaas van Schelven
10f8e10607 DB indexes for the issue-list (including filters)
simply by reasoning about what they should be; no performance testing (on the issue-list
and on the event-ingestion) was done for these
2025-02-18 10:32:06 +01:00
Klaas van Schelven
615d2da4c8 Cache stored_event_count (on Issue and Project)
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.

Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)

Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".

This commit also optimizes the .count() on the issue-detail event list (via
Paginator).

This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
2025-02-06 16:24:25 +01:00
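The caching approach in 615d2da4c8 amounts to keeping a denormalized counter in step with the rows it counts. A minimal sketch, assuming a two-table schema and the hypothetical helpers `store_event` and `evict_events`; the real code updates both Issue and Project, which is omitted here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issue (
        id INTEGER PRIMARY KEY,
        stored_event_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER NOT NULL);
    INSERT INTO issue (id) VALUES (1);
""")


def store_event(conn, issue_id):
    with conn:  # one transaction: the row and its cached count change together
        conn.execute("INSERT INTO event (issue_id) VALUES (?)", (issue_id,))
        conn.execute(
            "UPDATE issue SET stored_event_count = stored_event_count + 1 "
            "WHERE id = ?",
            (issue_id,),
        )


def evict_events(conn, issue_id, n):
    with conn:
        cur = conn.execute(
            "DELETE FROM event WHERE id IN ("
            "SELECT id FROM event WHERE issue_id = ? ORDER BY id LIMIT ?)",
            (issue_id, n),
        )
        # Decrement by the number of rows actually removed.
        conn.execute(
            "UPDATE issue SET stored_event_count = stored_event_count - ? "
            "WHERE id = ?",
            (cur.rowcount, issue_id),
        )


for _ in range(5):
    store_event(conn, 1)
evict_events(conn, 1, 2)

cached = conn.execute(
    "SELECT stored_event_count FROM issue WHERE id = 1").fetchone()[0]
counted = conn.execute(
    "SELECT COUNT(*) FROM event WHERE issue_id = 1").fetchone()[0]
```

Reading `cached` is a single-row lookup, whereas `counted` has to visit every matching row or index entry, which is the cost the commit message measured.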
Klaas van Schelven
c42aa9118a Describe role of Grouping and how it relates to Issue
third time's a charm (5e5b53abed, 48307daa0f)
2025-01-31 15:25:18 +01:00
Klaas van Schelven
6497f482ae Correctly order Turningpoints (as per comment) 2024-12-16 22:04:03 +01:00
Klaas van Schelven
68f2e714d5 Fix resolve-from-list on MySQL
Mysteriously, "Truncated incorrect DOUBLE value". But we have no Double fields.
Answer: adding a value to a field (with "+") tries to convert to Double first
on MySQL. Using Concat solves it.

Showed up in all paths except "resolved by next".

Fix #14
2024-11-22 17:32:20 +01:00
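The underlying surprise in 68f2e714d5 (in SQL, `+` is arithmetic, so text operands get converted to numbers) can be reproduced in a small way with `sqlite3`. Note the assumption: this runs on SQLite, not MySQL, so the symptom differs. SQLite silently yields a number where MySQL reported "Truncated incorrect DOUBLE value". The string values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "+" is numeric addition: non-numeric text is converted to 0 first.
plus_result = conn.execute("SELECT 'resolved' + ' by next'").fetchone()[0]

# "||" is SQLite's string concatenation operator.
concat_result = conn.execute("SELECT 'resolved' || ' by next'").fetchone()[0]
```

An explicit concatenation function or operator avoids the implicit numeric conversion, which is what switching to Concat achieves in the commit.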
Klaas van Schelven
db486adb35 Rewrite comments on 'reopen' and 'issue_is_regression' 2024-09-17 23:01:41 +02:00
Klaas van Schelven
eb08bd562c When there's no (meaningful) release info, don't display it 2024-09-12 13:58:36 +02:00
Klaas van Schelven
e59fd3a225 Implement 'occurs_in_last_release' 2024-09-12 09:49:22 +02:00
Klaas van Schelven
3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven
717a632b7d check_for_thresholds refactoring: 'metadata' is superfluous
because it was basically the input-tuple (in a different format)
2024-07-18 09:43:37 +02:00
Klaas van Schelven
65ea181f37 vbc-unmute: reduce calls to the expensive check
as done in the previous commit for project quota
2024-07-17 15:33:15 +02:00
Klaas van Schelven
c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
of doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven
5ce840f62f Move period_utils to separate file 2024-07-15 14:38:35 +02:00
Klaas van Schelven
93365f4c8d Period-counting using SQL instead of custom-made (PoC)
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:

* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on new events. not correct).

I thought about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.

There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
2024-07-15 14:28:13 +02:00
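The semantic change noted in 93365f4c8d (rolling rather than calendar periods) can be illustrated with a small SQL count. This is a sketch under assumptions: the `event` table, the `digested_at` column stored as a Unix timestamp, and the helper `count_since` are chosen for the example only.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, digested_at REAL NOT NULL)")


def count_since(conn, since):
    """Count events digested at or after `since` (an aware datetime)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM event WHERE digested_at >= ?",
        (since.timestamp(),),
    ).fetchone()
    return row[0]


now = datetime(2024, 7, 15, 0, 5, tzinfo=timezone.utc)
for digested_at in (
    datetime(2024, 7, 14, 23, 30, tzinfo=timezone.utc),  # previous calendar hour
    datetime(2024, 7, 15, 0, 1, tzinfo=timezone.utc),    # current calendar hour
):
    conn.execute(
        "INSERT INTO event (digested_at) VALUES (?)", (digested_at.timestamp(),))

# Rolling period: "the last hour", counted back from now.
rolling_count = count_since(conn, now - timedelta(hours=1))

# Calendar period: "this hour", counted from the top of the hour.
calendar_count = count_since(conn, now.replace(minute=0, second=0, microsecond=0))
```

Because the count is derived from stored rows on every call, there is no in-memory state to reload after a restart, which is the failure mode the commit message describes.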
Klaas van Schelven
edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quota and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
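The design in edff0e219c, passing the thresholds on each call to `inc` instead of registering listeners, can be sketched in plain Python. Everything here is a hypothetical simplification: the class `RollingCounter`, the `(period, limit, payload)` tuple shape, and the in-memory list of timestamps are assumptions and do not mirror the actual `PeriodCounter`.

```python
from datetime import datetime, timedelta


class RollingCounter:
    """Counts events; threshold checks are supplied by the caller per call."""

    def __init__(self):
        self._timestamps = []

    def inc(self, now, thresholds):
        """Record one event and return the payloads of all thresholds met.

        thresholds: iterable of (period: timedelta, limit: int, payload).
        """
        self._timestamps.append(now)
        met = []
        for period, limit, payload in thresholds:
            count = sum(1 for ts in self._timestamps if ts > now - period)
            if count >= limit:
                met.append(payload)
        return met


counter = RollingCounter()
start = datetime(2024, 7, 9, 9, 0, 0)

# The caller inventories the checks right before each call; no listener state
# lives inside the counter, so it cannot go stale in another process.
thresholds = [(timedelta(minutes=1), 3, "unmute issue 1")]

results = [
    counter.inc(start + timedelta(seconds=10 * i), thresholds)
    for i in range(3)
]
```

Since the counter holds no registered callbacks, whichever process calls `inc` decides what to check, which matches the multi-process concern raised in the commit message.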
Klaas van Schelven
fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven
5e2cc0575f Retention, small fixes (from Friday) 2024-06-23 22:20:18 +02:00
Klaas van Schelven
cef1127e48 Make user-model swappable
I may just need this later, and doing it this late was already painful enough.
2024-05-29 10:22:57 +02:00
Klaas van Schelven
41a4913299 Implement SNAPPEA_TASK_ALWAYS_EAGER 2024-04-19 21:41:42 +02:00
Klaas van Schelven
c50780ab4e Use atomic transactions in views 2024-04-18 13:15:46 +02:00
Klaas van Schelven
d75bede5dd Show current status for issues 2024-04-16 21:54:36 +02:00
Klaas van Schelven
d89e3d4dd5 Add 'next-materialized historic annotation 2024-04-16 09:31:12 +02:00
Klaas van Schelven
875f306079 Reduce queries of 'history' view
* select_related for users (which are displayed in many locations)
* use 'xxx_id' if that's all you need
2024-04-15 15:06:27 +02:00
Klaas van Schelven
8e44f7f68e Unmute reason: show in email alert 2024-04-15 10:17:18 +02:00
Klaas van Schelven
ad93e22fff Fix the double-creating of TurningPoints for time-based-unmute 2024-04-15 09:55:22 +02:00
Klaas van Schelven
490899975b Add tests for TurningPoint creation
this also proves one existing bug: the double-creating of TurningPoints
for time-based-unmute
2024-04-15 09:51:30 +02:00