
59 Commits

Author SHA1 Message Date
Klaas van Schelven
628f7bde6e Comment about TagValue counts
See #272
2025-11-17 14:47:28 +01:00
Klaas van Schelven
60bbf8c606 send_json/stress_test utils: Prettier tag-sending, pt.2 2025-11-15 15:44:19 +01:00
Klaas van Schelven
16eccea851 Fix null constraint failure when remote_addr is None and user is '{{auto}}'
Fix #229
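A minimal sketch of the intended behavior. It assumes the `'{{auto}}'` placeholder sits in the event's top-level `user` data, and that the tag is simply left out when no remote address is known. `user_ip_tags` and the tag key `user.ip_address` are hypothetical names, not the project's code:

```python
from typing import Dict, Optional

AUTO = "{{auto}}"


def user_ip_tags(event_data: dict, remote_addr: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: derive an IP tag from the event's top-level 'user' data."""
    user = event_data.get("user") or {}
    ip = user.get("ip_address")
    if ip == AUTO:
        # '{{auto}}' means "use the address the request came from" (see #165).
        ip = remote_addr
    if ip is None:
        # No address known (e.g. remote_addr is None): emit no tag rather than a NULL value.
        return {}
    return {"user.ip_address": ip}
```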
2025-09-23 10:14:28 +02:00
Klaas van Schelven
144e570db6 MySQL: do not save 2 queries in store_tags
it doesn't support the relevant machinery
2025-09-04 15:20:23 +02:00
Klaas van Schelven
e0cb4b6369 Save another query in store_tags
(analogous to the parent commit)

made possible by Django 5.2
2025-09-04 14:10:52 +02:00
Klaas van Schelven
9a13bdb83d Save a query in store_tags
made possible by Django 5.2
2025-09-04 13:52:08 +02:00
Klaas van Schelven
13dbc4dd29 Use remote_addr for '{{auto}}' ip_addr tags
See #165
2025-07-28 11:12:53 +02:00
Klaas van Schelven
770ccb1622 Fixed command's 'running in background' output
'Oskar' on Discord pointed out that 2 distinct commands had
identical output, which was confusing
2025-07-14 16:12:25 +02:00
Klaas van Schelven
1965b0f8c2 vacuum_eventless_issuetags: tune batch-size
See #134
2025-07-08 16:16:54 +02:00
Klaas van Schelven
674d84909f TagValue pruning (for vacuum_eventless_issuetags) 2025-07-08 15:59:55 +02:00
Klaas van Schelven
9741844821 vacuum_eventless_issuetags: tests (and minor fix)
See #134
2025-07-08 15:32:43 +02:00
Klaas van Schelven
4dd525d0d0 Missing import 2025-07-08 15:29:14 +02:00
Klaas van Schelven
d62e53fdf8 store_tags: support 'very many' (~500) tags 2025-07-08 15:21:26 +02:00
Klaas van Schelven
a247528baa TagKey __str__ 2025-07-08 15:06:46 +02:00
Klaas van Schelven
dc25e044f0 Add store_tags test for 2 separate Issues
(there were some doubts whether this works; this test takes
those doubts away)
2025-07-08 15:06:20 +02:00
Klaas van Schelven
7f416ac920 vacuum_eventless_issuetags command
In the light of the discussion on #134, this implements the "clean up later"
solution: a vacuum task that deletes IssueTags no longer referenced by any
EventTag on the same Issue.

This doesn't prevent stale IssueTags from being created but ensures they are
eventually removed, enabling follow-up cleanup (e.g. of TagValues).

Performance-wise, this is a relatively safe path forward; it can run off-hours
or not at all, depending on preferences. Semantically it's the least clear:
whether an Issue appears to be tagged may now depend on whether vacuum has run.

No tests yet; no immediate TagValue cleanup.
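A minimal sketch of the "clean up later" vacuum against an in-memory SQLite database. The table and column names (`issuetag`, `eventtag`, `issue_id`, `value_id`) are simplified assumptions, not Bugsink's actual schema. The `batch_size` parameter stands in for the batch size tuned in 1965b0f8c2:

```python
import sqlite3


def vacuum_eventless_issuetags(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Delete IssueTags that no EventTag on the same Issue refers to; return the number deleted."""
    total = 0
    while True:
        # One bounded DELETE per batch, each in its own short transaction.
        cur = conn.execute(
            """
            DELETE FROM issuetag WHERE id IN (
                SELECT it.id FROM issuetag it
                WHERE NOT EXISTS (
                    SELECT 1 FROM eventtag et
                    WHERE et.issue_id = it.issue_id AND et.value_id = it.value_id
                )
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issuetag (id INTEGER PRIMARY KEY, issue_id INTEGER, value_id INTEGER);
CREATE TABLE eventtag (id INTEGER PRIMARY KEY, issue_id INTEGER, value_id INTEGER);
INSERT INTO issuetag (issue_id, value_id) VALUES (1, 10), (1, 11), (2, 10);
INSERT INTO eventtag (issue_id, value_id) VALUES (1, 10), (2, 10);
""")
deleted = vacuum_eventless_issuetags(conn, batch_size=1)  # removes the stale (1, 11)
remaining = sorted(conn.execute("SELECT issue_id, value_id FROM issuetag"))
```

Each batch commits on its own, so in this sketch the vacuum can be interrupted and simply rerun later.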
2025-07-08 13:21:57 +02:00
Klaas van Schelven
28b2ce0eaf Various models: .project SET_NULL => DO_NOTHING
Like e45c61d6f0, but for .project.

I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Project-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for project deletion (which has deletes for many
of the altered models), the following 12 queries are no longer done:

```
SELECT "projects_project"."id", [..many fields..] FROM "projects_project" WHERE "projects_project"."id" = 1
DELETE FROM "projects_projectmembership" WHERE "projects_projectmembership"."project_id" IN (1)
DELETE FROM "alerts_messagingserviceconfig" WHERE "alerts_messagingserviceconfig"."project_id" IN (1)
UPDATE "releases_release" SET "project_id" = NULL WHERE "releases_release"."project_id" IN (1)
UPDATE "issues_issue" SET "project_id" = NULL WHERE "issues_issue"."project_id" IN (1)
UPDATE "issues_grouping" SET "project_id" = NULL WHERE "issues_grouping"."project_id" IN (1)
UPDATE "events_event" SET "project_id" = NULL WHERE "events_event"."project_id" IN (1)
UPDATE "tags_tagkey" SET "project_id" = NULL WHERE "tags_tagkey"."project_id" IN (1)
UPDATE "tags_tagvalue" SET "project_id" = NULL WHERE "tags_tagvalue"."project_id" IN (1)
UPDATE "tags_eventtag" SET "project_id" = NULL WHERE "tags_eventtag"."project_id" IN (1)
UPDATE "tags_issuetag" SET "project_id" = NULL WHERE "tags_issuetag"."project_id" IN (1)
```
2025-07-03 21:49:49 +02:00
Klaas van Schelven
3b3ce782c5 Fix the tests for prev. commit
tests were broken b/c not respecting constraints / not properly using the factories.
2025-07-03 11:33:58 +02:00
Klaas van Schelven
e45c61d6f0 Various models: .issue and .grouping; SET_NULL => DO_NOTHING
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Issue-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for issue deletion (which has deletes for many
of the altered models), the following 8 queries are no longer done:

```
SELECT "issues_grouping"."id", [..many fields..] FROM "issues_grouping" WHERE "issues_grouping"."id" IN (1)
UPDATE "events_event" SET "grouping_id" = NULL WHERE "events_event"."grouping_id" IN (1)

[.. a few moments later..]

SELECT "issues_issue"."id", [..many fields..] FROM "issues_issue" WHERE "issues_issue"."id" = 'uuid'
UPDATE "issues_grouping" SET "issue_id" = NULL WHERE "issues_grouping"."issue_id" IN ('uuid')
UPDATE "issues_turningpoint" SET "issue_id" = NULL WHERE "issues_turningpoint"."issue_id" IN ('uuid')
UPDATE "events_event" SET "issue_id" = NULL WHERE "events_event"."issue_id" IN ('uuid')
UPDATE "tags_eventtag" SET "issue_id" = NULL WHERE "tags_eventtag"."issue_id" IN ('uuid')
UPDATE "tags_issuetag" SET "issue_id" = NULL WHERE "tags_issuetag"."issue_id" IN ('uuid')
```

(breaks the tests b/c of constraints and not always using factories; will fix next)
2025-07-03 11:33:58 +02:00
Klaas van Schelven
e58be0018f Tag models: no CASCADE
CASCADE was defined for keys & values, but in practice those are never directly
deleted except in the very case in which it has been established that they are
'orphaned', i.e. no longer being referred to. That's exactly the case in which
CASCADE is superfluous.

As a result, in the test for issue deletion (which contains a prune of
tagvalue), the following 3 queries are no longer done:

```
SELECT "tags_tagvalue"."id", "tags_tagvalue"."project_id", "tags_tagvalue"."key_id", "tags_tagvalue"."value" FROM "tags_tagvalue" WHERE "tags_tagvalue"."id" IN (1)
DELETE FROM "tags_eventtag" WHERE "tags_eventtag"."value_id" IN (1)
DELETE FROM "tags_issuetag" WHERE "tags_issuetag"."value_id" IN (1)
```
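Because keys and values are only deleted once they are known to be orphaned, the prune itself can be a plain DELETE guarded by NOT EXISTS. A sketch with simplified, assumed table names (`tagvalue`, `eventtag`, `issuetag`). SQLite's foreign-key enforcement is switched on to show that the prune succeeds without any CASCADE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tagvalue (id INTEGER PRIMARY KEY, value TEXT);
-- No ON DELETE CASCADE: values are only deleted once nothing refers to them.
CREATE TABLE eventtag (id INTEGER PRIMARY KEY, value_id INTEGER REFERENCES tagvalue(id));
CREATE TABLE issuetag (id INTEGER PRIMARY KEY, value_id INTEGER REFERENCES tagvalue(id));
INSERT INTO tagvalue (id, value) VALUES (1, 'linux'), (2, 'windows'), (3, 'macos');
INSERT INTO eventtag (value_id) VALUES (1);
INSERT INTO issuetag (value_id) VALUES (1), (2);
""")


def prune_orphaned_tagvalues(conn: sqlite3.Connection) -> int:
    """Delete TagValues referenced by neither an EventTag nor an IssueTag."""
    cur = conn.execute("""
        DELETE FROM tagvalue
        WHERE NOT EXISTS (SELECT 1 FROM eventtag WHERE eventtag.value_id = tagvalue.id)
          AND NOT EXISTS (SELECT 1 FROM issuetag WHERE issuetag.value_id = tagvalue.id)
    """)
    conn.commit()
    return cur.rowcount


pruned = prune_orphaned_tagvalues(conn)  # only 'macos' is orphaned
values = [v for (v,) in conn.execute("SELECT value FROM tagvalue ORDER BY id")]
```

The guard already ensures that no rows refer to the deleted values. A CASCADE here would only add lookups like the three queries listed above.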
2025-07-03 11:33:58 +02:00
Klaas van Schelven
ee9add5e5f Vacuum Tags command
See #135
2025-07-02 21:43:51 +02:00
Klaas van Schelven
38397bf2f2 Remove superfluous comment 2025-06-27 12:57:24 +02:00
Klaas van Schelven
e5dbeae514 Issue.delete_deferred(): first version (WIP)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.

* no tests yet.
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Issues which are delete-in-progress from the UI
* file storage not yet cleaned up
* project issue counts not yet updated
* dangling tag values: no cleanup mechanism yet.

See #50
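The deps-first, batch-wise idea can be sketched as follows. This is not snappea and not the actual `delete_deferred()`. The table names, the hard-coded dependency order, and running every batch in one loop (instead of as delayed tasks) are simplifying assumptions:

```python
import sqlite3

# Children before parents, so no foreign key ever points at a deleted row (assumed order).
DEPENDENCY_ORDER = [("eventtag", "issue_id"), ("event", "issue_id"), ("issue", "id")]


def delete_issue_batched(conn: sqlite3.Connection, issue_id: int, batch_size: int = 2) -> int:
    """Delete an issue and its dependents in small batches; return the number of batches."""
    batches = 0
    for table, column in DEPENDENCY_ORDER:
        while True:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE id IN "
                f"(SELECT id FROM {table} WHERE {column} = ? LIMIT ?)",
                (issue_id, batch_size),
            )
            conn.commit()  # in a delayed setup, each batch could be its own task
            if cur.rowcount == 0:
                break
            batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE issue (id INTEGER PRIMARY KEY);
CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER REFERENCES issue(id));
CREATE TABLE eventtag (id INTEGER PRIMARY KEY,
                       issue_id INTEGER REFERENCES issue(id),
                       event_id INTEGER REFERENCES event(id));
INSERT INTO issue (id) VALUES (1), (2);
INSERT INTO event (id, issue_id) VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO eventtag (issue_id, event_id) VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")
batches = delete_issue_batched(conn, issue_id=1)
left = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        for t in ("issue", "event", "eventtag")}
```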
2025-06-27 12:52:59 +02:00
Klaas van Schelven
4ca15c7159 fix make_consistent on mysql
Problem: on mysql `make_consistent` cannot always clean up `Event`s, because
`EventTag` objects still point to them, leading to an `IntegrityError`.
The problem does not happen for `sqlite`, because sqlite does FK-checks
on-commit, and the offending `EventTag` objects are "eventually cleaned up" (in
the same transaction, in make_consistent).

This is the "mostly works" solution, for the scenario we've encountered.
Namely: remove EventTags which have no issue before removing Events. This works
in practice because of the way the Events-to-clean-up were created in the UI:
by removal of some Issue in the admin, triggering a `SET_NULL`
on the `issue_id`. Removal of an Issue implies an analogous `SET_NULL` on the
`EventTag`'s `issue_id`, and by removing those `EventTag`s before proceeding
with the `Event`s, you avoid triggering the FK constraint.

We don't want to fully reimplement `CASCADE` (as in Django) here, and the
values of `on_delete` are "Design Decision Needed" and non-homogeneous anyway,
and we might soon implement proper deletions (see #50), so the "mostly
works" solution will have to do for now.

Fixes #132
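The ordering problem can be reproduced in SQLite by turning on foreign-key enforcement with immediate (non-deferred) constraints, which roughly approximates MySQL's behavior here. The schema and data are simplified assumptions: an Issue was removed and `SET_NULL` left both an `event` and its `eventtag` with `issue_id = NULL`:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # immediate FK checks, as on MySQL
    conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER);
    CREATE TABLE eventtag (id INTEGER PRIMARY KEY, issue_id INTEGER,
                           event_id INTEGER REFERENCES event(id));
    -- State after an Issue was removed with SET_NULL on both tables:
    INSERT INTO event (id, issue_id) VALUES (1, NULL);
    INSERT INTO eventtag (issue_id, event_id) VALUES (NULL, 1);
    """)
    return conn


def cleanup(conn: sqlite3.Connection, eventtags_first: bool) -> bool:
    """Remove issue-less rows; return False if a foreign-key check fails."""
    statements = [
        "DELETE FROM eventtag WHERE issue_id IS NULL",
        "DELETE FROM event WHERE issue_id IS NULL",
    ]
    if not eventtags_first:
        statements.reverse()
    try:
        for sql in statements:
            conn.execute(sql)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


naive_ok = cleanup(make_db(), eventtags_first=False)  # Events first: FK violation
fixed_ok = cleanup(make_db(), eventtags_first=True)   # EventTags first: succeeds
```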
2025-06-26 10:56:31 +02:00
Klaas van Schelven
70e0f147b5 Tags in event_data can be lists; deal with that
Fix #130
2025-06-26 09:06:21 +02:00
Klaas van Schelven
abaa1d9b2f Don't crash on non-str tag-values
Fixes #76
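Both fixes (#76 and #130) amount to normalizing the tags before storing them. A minimal sketch assumes two things: tags arrive either as a mapping or as a list of `[key, value]` pairs, and non-str values are stringified. `normalize_tags` is a hypothetical name, and skipping `None` values is an assumption of this sketch:

```python
from typing import Any, Dict


def normalize_tags(raw: Any) -> Dict[str, str]:
    """Turn a dict, or a list of [key, value] pairs, into a {str: str} mapping."""
    if raw is None:
        return {}
    if isinstance(raw, dict):
        pairs = raw.items()
    else:
        # e.g. [["os", "linux"], ["count", 3]]; malformed entries are ignored
        pairs = (tuple(pair) for pair in raw if len(pair) == 2)
    result = {}
    for key, value in pairs:
        if value is None:
            continue  # assumption: nothing meaningful to store
        result[str(key)] = value if isinstance(value, str) else str(value)
    return result
```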
2025-04-06 15:00:54 +02:00
Klaas van Schelven
5d4271e350 Add user.etc tags in deduce_tags 2025-04-05 08:14:11 +02:00
Klaas van Schelven
2d51426618 Fix user tag deduction
although it looks (in the UI) like user info is a context, it's really
just a top-level attribute in the event-data
2025-03-31 09:42:29 +02:00
Klaas van Schelven
cda7e454c9 init_tags command: avoid unbounded WAL growth 2025-03-13 13:20:35 +01:00
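In SQLite's WAL mode the automatic checkpoint only runs when a transaction commits, so one long write transaction makes the `-wal` file grow with it. A common remedy is to commit once per batch so that checkpoints get a chance to run. The sketch below shows that approach; the per-event work is a stand-in, not what `init_tags` actually computes:

```python
import os
import sqlite3
import tempfile


def process_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Write derived rows for every event, committing once per batch; return batch count."""
    last_id = 0
    batches = 0
    while True:
        # Keyset pagination over event ids.
        rows = conn.execute(
            "SELECT id FROM event WHERE id > ? ORDER BY id LIMIT ?", (last_id, batch_size)
        ).fetchall()
        if not rows:
            return batches
        conn.executemany(
            "INSERT INTO eventtag (event_id, value) VALUES (?, ?)",
            [(event_id, f"tag-{event_id}") for (event_id,) in rows],  # stand-in work
        )
        conn.commit()  # short transactions let the auto-checkpoint keep the WAL bounded
        last_id = rows[-1][0]
        batches += 1


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.sqlite3"))
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY);
    CREATE TABLE eventtag (id INTEGER PRIMARY KEY, event_id INTEGER, value TEXT);
    """)
    conn.executemany("INSERT INTO event (id) VALUES (?)", [(i,) for i in range(1, 2501)])
    conn.commit()
    batches = process_in_batches(conn, batch_size=1000)
    tagged = conn.execute("SELECT COUNT(*) FROM eventtag").fetchone()[0]
    conn.close()
```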
Klaas van Schelven
1d8d6f1ac6 'flatten' migrations for tags
unreleased migrations: preference to flatten those;
happens to also fix mysql tests (for which the data migration failed)
2025-03-13 09:23:25 +01:00
Klaas van Schelven
ba5c291f57 Search performance: use Event.issue when searching
In b031792784 using Event.issue was made conditional (if we already filter
by Tag, the tag encodes that info already, and it was assumed adding the
WHERE elsewhere would confuse the query optimizer).

As per that commit's message, the measurements that led me to that decision
were probably wrong. I now simply think: the more places you narrow your
search, the easier your DB will have it.

Measurements confirm this for all cases where it still matters (on the order
of 20-30%; the present fix is on the now-less-visited path).
2025-03-12 21:36:51 +01:00
Klaas van Schelven
1eea9268a5 Optimization: Search on EventTag without involving Event if possible
When searching by tag, there is no need to join with Event; especially when
just counting results or determining first/last digest_order (for navigation).

(For the above "no need" to be actually true, digest_order was denormalized
into EventTag).

The above is implemented in `search_events_optimized`.

Further improvements:

* the bounds of `digest_order` are fetched only once; for first/last this info
  is reused.

* explicitly pass `event_qs_count` to the templates

* non-event pages used to calculate a "last event" to generate a tab with a
  correct event.id; since we already have the "last" idiom, we now use that.
  This also makes clear that the "none" idiom was never needed, so we remove it again.

Results:

Locally (60K event DB, 30K events on largest issue) my testbatch now
runs in 25% of time (overall).

* The effect on the AND-ing is in fact very large (13% runtime remaining)
* The event details page is not noticeably improved.
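The following sketch searches on EventTag alone, with `digest_order` denormalized onto it. The table and index layout are simplified assumptions. The count and the first/last bounds come from `eventtag` without a join to `event`, and each bound is a single indexed `ORDER BY ... LIMIT 1` lookup, fetched once and then reused:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eventtag (id INTEGER PRIMARY KEY, issue_id INTEGER,
                       value_id INTEGER, digest_order INTEGER);
CREATE INDEX eventtag_search ON eventtag (issue_id, value_id, digest_order);
INSERT INTO eventtag (issue_id, value_id, digest_order) VALUES
  (1, 10, 1), (1, 11, 2), (1, 10, 3), (1, 10, 5), (2, 10, 1);
""")


def search_bounds(conn: sqlite3.Connection, issue_id: int, value_id: int):
    """Return (count, first, last) digest_order for this tag value on this issue."""
    where = "WHERE issue_id = ? AND value_id = ?"
    params = (issue_id, value_id)
    count = conn.execute(f"SELECT COUNT(*) FROM eventtag {where}", params).fetchone()[0]
    if count == 0:
        return 0, None, None
    first = conn.execute(
        f"SELECT digest_order FROM eventtag {where} ORDER BY digest_order ASC LIMIT 1", params
    ).fetchone()[0]
    last = conn.execute(
        f"SELECT digest_order FROM eventtag {where} ORDER BY digest_order DESC LIMIT 1", params
    ).fetchone()[0]
    return count, first, last


count, first, last = search_bounds(conn, 1, 10)
```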
2025-03-12 20:38:07 +01:00
Klaas van Schelven
cd7f3978cf Improve tag-overview performance
* denormalize IssueTag.key; this allows for key to be used in an index
  (issue, key, count).

* rewrite to grouping-first, per-key-query-second. i.e. reverts part of
  bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
  guessing correctly, and we don't dynamically determine that yet. Which
  means that (in the single query version) if you'd have a per-event value for
  some tag, you could end up iterating over as many values as there are events,
  which won't work.

* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.

* further denormalization (of key__key, of value__str) actually turns out not
  to be required for the grouping query or the individual queries to be fast.
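The grouping-first, per-key-query-second shape can be sketched in SQLite. The `issuetag` table below is a simplified assumption, with the key denormalized onto it and an index on `(issue_id, key, count)`. The first query finds the keys; each per-key query then reads only the top few values, so a key with one distinct value per event stays bounded by the `LIMIT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issuetag (id INTEGER PRIMARY KEY, issue_id INTEGER,
                       key TEXT, value TEXT, count INTEGER);
CREATE INDEX issuetag_issue_key_count ON issuetag (issue_id, key, count);
INSERT INTO issuetag (issue_id, key, value, count) VALUES
  (1, 'os', 'linux', 7), (1, 'os', 'windows', 2), (1, 'os', 'macos', 1),
  (1, 'browser', 'firefox', 4), (1, 'browser', 'chrome', 6),
  (2, 'os', 'linux', 9);
""")


def tag_overview(conn: sqlite3.Connection, issue_id: int, per_key: int = 2) -> dict:
    """Return {key: [(value, count), ...]} with the top `per_key` values per key."""
    keys = [k for (k,) in conn.execute(
        "SELECT key FROM issuetag WHERE issue_id = ? GROUP BY key ORDER BY key", (issue_id,))]
    overview = {}
    for key in keys:
        # One small, index-friendly query per key.
        overview[key] = conn.execute(
            "SELECT value, count FROM issuetag WHERE issue_id = ? AND key = ? "
            "ORDER BY count DESC LIMIT ?", (issue_id, key, per_key)).fetchall()
    return overview


overview = tag_overview(conn, 1)
```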

Performance tests, as always, against sqlite3.

--

Roads not taken/background

* This commit removes a future TODO that "A point _could_ be made for
  ['issue', '?value?' 'count']", I tried both versions of that index
  (against the group-then-query version, the only one which I trust)
  but without denormalization of key, I could not get it to be fast.

* I thought about a hybrid approach (for those keys with low counts of values
  do the single-query thing) but as it stands the extra complexity isn't worth
  it.

---
On the test env with 1.2M events and 3 (user-defined) tags per event, this
basically lowers the time from "seconds" to "milliseconds".
2025-03-12 14:14:05 +01:00
Klaas van Schelven
b031792784 Event (tag) search: performance improvement
Done by denormalizing EventTag.issue, and adding that into an index. Targets:

* get-event-within-query (when it's 'last' or 'first')
* .count (of search query results)
* min/max (for the first/prev/next/last buttons)

(The min/max query's performance significantly improved by the addition of
the index, but was also rewritten into a simple SELECT rather than MIN/MAX).

When this code was written, I thought I had spectacularly improved performance.
I now believe this was based on an error in my measurements, but that this
still represents (mostly) an improvement, so I'll let it stand and will take
it from here in subsequent commits.
2025-03-12 14:11:43 +01:00
Klaas van Schelven
14b99c3880 assertEquals -> assertEqual (Python 3.12)
<<insert remarks about fashion police>>

yes this isn't the first time
2025-03-10 15:45:12 +01:00
Klaas van Schelven
3ee6f29f9c tags: fix the indexes
this is the part I was able to do with careful reading (and rerunning the
tests); actual performance implications will be checked based on this
2025-03-07 20:59:21 +01:00
Klaas van Schelven
f8113916dd Fix the tests
literally: the tests were always broken; in 39bddb14b7 I never
ran the tests before committing
2025-03-07 20:46:11 +01:00
Klaas van Schelven
0ade3c0f86 Add a comment about DB-CASCADE 2025-03-07 16:35:36 +01:00
Klaas van Schelven
96e07c4dc3 Tags: delete EventTag when Events are evicted
and document related things
2025-03-07 13:50:10 +01:00
Klaas van Schelven
994e218e27 Notes on not-implemented tags 2025-03-06 15:58:54 +01:00
Klaas van Schelven
39bddb14b7 handled: searchable as a tag
also: don't display this in the detail view when the value isn't actually
in the data
2025-03-06 15:19:55 +01:00
Klaas van Schelven
0ad87be045 Notes on limitations of TSTTCPW full_text search 2025-03-06 14:38:16 +01:00
Klaas van Schelven
2c9d5c80ed Search: support for quoted values
also adds tests and factors out the query parsing
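A minimal sketch of parsing a search query that contains `key:value` and `key:"quoted value"` terms, with everything else treated as free text. `parse_query` and its exact grammar are hypothetical, not the project's parser:

```python
import re
from typing import Dict, Tuple

# key:"value with spaces"  or  key:value (an unquoted value ends at whitespace)
TAG_TERM = re.compile(r'(\S+?):(?:"([^"]*)"|(\S+))')


def parse_query(q: str) -> Tuple[Dict[str, str], str]:
    """Split a query into ({key: value}, remaining free text)."""
    tags = {}
    for m in TAG_TERM.finditer(q):
        key, quoted, plain = m.group(1), m.group(2), m.group(3)
        tags[key] = quoted if quoted is not None else plain
    # Whatever is not a tag term is free text; collapse leftover whitespace.
    free_text = " ".join(TAG_TERM.sub(" ", q).split())
    return tags, free_text
```

For example, `parse_query('browser:"Mobile Safari" os:linux timeout error')` splits into the two tag filters and the free text `timeout error`.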
2025-03-06 11:23:18 +01:00
Klaas van Schelven
1fa7436b2d Search: factor out commonalities between issue and event search 2025-03-06 09:34:53 +01:00
Klaas van Schelven
fbb021ee2f Testcase for Search 2025-03-06 09:26:43 +01:00
Klaas van Schelven
20a54381dc Refactor: move tags/search stuff to its own module 2025-03-06 09:26:35 +01:00
Klaas van Schelven
3d62fba8e9 Add (some) tests for deduce_tags 2025-03-05 20:21:20 +01:00
Klaas van Schelven
c8ecf508de Tags: on event details page show calculated tags
(not just the explicitly provided ones)
2025-03-03 11:29:07 +01:00
Klaas van Schelven
406472b6d4 os.version as a tag 2025-03-03 10:55:15 +01:00
Klaas van Schelven
33ed3242c2 Fix browser.version tag-deduction 2025-03-03 10:54:17 +01:00