In the light of the discussion on #134, this implements the "clean up later"
solution: a vacuum task that deletes IssueTags no longer referenced by any
EventTag on the same Issue.
This doesn't prevent stale IssueTags from being created but ensures they are
eventually removed, enabling follow-up cleanup (e.g. of TagValues).
Performance-wise, this is a relatively safe path forward; it can run off-hours
or not at all, depending on preferences. Semantically it's the least clear:
whether an Issue appears to be tagged may now depend on whether vacuum has run.
No tests yet; no immediate TagValue cleanup.
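The vacuum's core delete can be sketched against an in-memory sqlite3 database; the table and column names below are assumptions modeled on the queries quoted in later commits, not the actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags_eventtag (id INTEGER PRIMARY KEY, issue_id, value_id);
CREATE TABLE tags_issuetag (id INTEGER PRIMARY KEY, issue_id, value_id);
""")
# Issue 1 still has an EventTag for value 10; its IssueTag for value 20 is stale.
conn.execute("INSERT INTO tags_eventtag (issue_id, value_id) VALUES (1, 10)")
conn.executemany("INSERT INTO tags_issuetag (issue_id, value_id) VALUES (?, ?)",
                 [(1, 10), (1, 20)])

# The vacuum: delete IssueTags with no EventTag for the same (issue, value).
conn.execute("""
DELETE FROM tags_issuetag WHERE NOT EXISTS (
    SELECT 1 FROM tags_eventtag
    WHERE tags_eventtag.issue_id = tags_issuetag.issue_id
      AND tags_eventtag.value_id = tags_issuetag.value_id)
""")
remaining = conn.execute(
    "SELECT issue_id, value_id FROM tags_issuetag").fetchall()
print(remaining)  # only the still-referenced (1, 10) survives
```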
Like e45c61d6f0, but for .project.
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only true to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.
Now that we have explicit Project-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).
As a result, in the test for project deletion (which has deletes for many
of the altered models), the following 12 queries are no longer done:
```
SELECT "projects_project"."id", [..many fields..] FROM "projects_project" WHERE "projects_project"."id" = 1
DELETE FROM "projects_projectmembership" WHERE "projects_projectmembership"."project_id" IN (1)
DELETE FROM "alerts_messagingserviceconfig" WHERE "alerts_messagingserviceconfig"."project_id" IN (1)
UPDATE "releases_release" SET "project_id" = NULL WHERE "releases_release"."project_id" IN (1)
UPDATE "issues_issue" SET "project_id" = NULL WHERE "issues_issue"."project_id" IN (1)
UPDATE "issues_grouping" SET "project_id" = NULL WHERE "issues_grouping"."project_id" IN (1)
UPDATE "events_event" SET "project_id" = NULL WHERE "events_event"."project_id" IN (1)
UPDATE "tags_tagkey" SET "project_id" = NULL WHERE "tags_tagkey"."project_id" IN (1)
UPDATE "tags_tagvalue" SET "project_id" = NULL WHERE "tags_tagvalue"."project_id" IN (1)
UPDATE "tags_eventtag" SET "project_id" = NULL WHERE "tags_eventtag"."project_id" IN (1)
UPDATE "tags_issuetag" SET "project_id" = NULL WHERE "tags_issuetag"."project_id" IN (1)
```
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only true to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.
Now that we have explicit Issue-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).
As a result, in the test for issue deletion (which has deletes for many
of the altered models), the following 8 queries are no longer done:
```
SELECT "issues_grouping"."id", [..many fields..] FROM "issues_grouping" WHERE "issues_grouping"."id" IN (1)
UPDATE "events_event" SET "grouping_id" = NULL WHERE "events_event"."grouping_id" IN (1)
[.. a few moments later..]
SELECT "issues_issue"."id", [..many fields..] FROM "issues_issue" WHERE "issues_issue"."id" = 'uuid'
UPDATE "issues_grouping" SET "issue_id" = NULL WHERE "issues_grouping"."issue_id" IN ('uuid')
UPDATE "issues_turningpoint" SET "issue_id" = NULL WHERE "issues_turningpoint"."issue_id" IN ('uuid')
UPDATE "events_event" SET "issue_id" = NULL WHERE "events_event"."issue_id" IN ('uuid')
UPDATE "tags_eventtag" SET "issue_id" = NULL WHERE "tags_eventtag"."issue_id" IN ('uuid')
UPDATE "tags_issuetag" SET "issue_id" = NULL WHERE "tags_issuetag"."issue_id" IN ('uuid')
```
(This breaks the tests, because of constraints and because factories are not always used; will fix next.)
CASCADE was defined for keys & values, but in practice those are never directly
deleted except in the very case in which it has been established that they are
'orphaned', i.e. no longer referred to. That is exactly the case in which
CASCADE is superfluous.
As a result, in the test for issue deletion (which contains a prune of
tagvalue), the following 3 queries are no longer done:
```
SELECT "tags_tagvalue"."id", "tags_tagvalue"."project_id", "tags_tagvalue"."key_id", "tags_tagvalue"."value" FROM "tags_tagvalue" WHERE "tags_tagvalue"."id" IN (1)
DELETE FROM "tags_eventtag" WHERE "tags_eventtag"."value_id" IN (1)
DELETE FROM "tags_issuetag" WHERE "tags_issuetag"."value_id" IN (1)
```
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.
* no tests yet.
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Issues which are delete-in-progress from the UI
* file storage not yet cleaned up
* project issue counts not yet updated
* dangling tag values: no cleanup mechanism yet.
See #50
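The batch-wise, deps-first shape of the deletion can be sketched as below; the table list, batch size and the snappea-style scheduling are simplified away, so this is an illustration under assumed names rather than the actual implementation:

```python
import sqlite3

def delete_issue_deps_first(conn, issue_id, batch_size=2):
    # Dependents are deleted before the Issue itself, in bounded batches,
    # so no single DELETE statement grows with the size of the issue.
    for table in ["tags_eventtag", "tags_issuetag", "events_event"]:
        while True:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE id IN "
                f"(SELECT id FROM {table} WHERE issue_id = ? LIMIT ?)",
                (issue_id, batch_size))
            if cur.rowcount < batch_size:  # short batch: this table is done
                break
    conn.execute("DELETE FROM issues_issue WHERE id = ?", (issue_id,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues_issue (id INTEGER PRIMARY KEY);
CREATE TABLE tags_eventtag (id INTEGER PRIMARY KEY, issue_id);
CREATE TABLE tags_issuetag (id INTEGER PRIMARY KEY, issue_id);
CREATE TABLE events_event (id INTEGER PRIMARY KEY, issue_id);
""")
conn.execute("INSERT INTO issues_issue (id) VALUES (1)")
conn.executemany("INSERT INTO events_event (issue_id) VALUES (?)", [(1,)] * 5)
conn.executemany("INSERT INTO tags_eventtag (issue_id) VALUES (?)", [(1,)] * 3)

delete_issue_deps_first(conn, 1)
left = conn.execute("SELECT COUNT(*) FROM events_event").fetchone()[0]
print(left)  # 0
```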
Problem: on mysql `make_consistent` cannot always clean up `Event`s, because
`EventTag` objects still point to them, leading to an IntegrityError.
The problem does not occur on `sqlite`, because sqlite does its FK-checks
on-commit, and the offending `EventTag` objects are "eventually cleaned up" (in
the same transaction, in `make_consistent`).
This is the "mostly works" solution, for the scenario we've encountered.
Namely: remove `EventTag`s which have no issue before removing `Event`s. This
works in practice because of how Events-to-clean-up were actually created in
the UI: by removal of some Issue in the admin, which triggers a `SET_NULL` on
the `issue_id`. Removal of an Issue implies an analogous `SET_NULL` on the
`EventTag`'s `issue_id`, and by removing those `EventTag`s before proceeding
with the `Event`s, we avoid triggering the FK constraint.
We don't want to fully reimplement `CASCADE` (as in Django) here, the values of
`on_delete` are "Design Decision Needed" and non-homogeneous anyway, and we
might soon implement proper deletions (see #50) anyway, so the "mostly works"
solution will have to do for now.
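The ordering can be demonstrated against sqlite3 with FK enforcement switched on (which, like mysql, then checks per statement); the minimal schema below is an assumption, not the project's real one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FKs per statement
conn.executescript("""
CREATE TABLE events_event (id INTEGER PRIMARY KEY, issue_id);
CREATE TABLE tags_eventtag (
    id INTEGER PRIMARY KEY, issue_id,
    event_id REFERENCES events_event (id));
""")
# An admin-deleted Issue left issue_id NULL on both tables (SET_NULL).
conn.execute("INSERT INTO events_event (id, issue_id) VALUES (1, NULL)")
conn.execute("INSERT INTO tags_eventtag (issue_id, event_id) VALUES (NULL, 1)")

# Deleting the Event first would hit the FK constraint (the EventTag still
# points to it); removing the issue-less EventTags first avoids that.
conn.execute("DELETE FROM tags_eventtag WHERE issue_id IS NULL")
conn.execute("DELETE FROM events_event WHERE issue_id IS NULL")
print("ok")
```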
Fixes #132
In b031792784, using Event.issue was made conditional (if we already filter
by Tag, the tag encodes that info already, and it was assumed adding the
WHERE elsewhere would confuse the query optimizer).
As per that commit's message, the measurements that led me to that decision
were probably wrong. I now simply think: the more places you narrow your
search, the easier your DB will have it.
Measuring confirms this is indeed so (on the order of 20-30%) for all cases
in which this still matters (the present fix is on the now-less-visited path).
When searching by tag, there is no need to join with Event; especially when
just counting results or determining first/last digest_order (for navigation).
(For the above "no need" to actually hold, digest_order was denormalized
into EventTag.)
The above is implemented in `search_events_optimized`.
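The join-free idea can be sketched against sqlite3; with `digest_order` denormalized onto `EventTag`, both the count and the navigation bounds are answered from that one table (all names here are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags_eventtag (issue_id, value_id, digest_order)")
conn.executemany(
    "INSERT INTO tags_eventtag VALUES (1, 10, ?)", [(3,), (7,), (5,)])

# Counting search results: no join with events_event is needed.
count = conn.execute(
    "SELECT COUNT(*) FROM tags_eventtag "
    "WHERE issue_id = 1 AND value_id = 10").fetchone()[0]

# "last" digest_order for navigation, likewise from EventTag alone.
last = conn.execute(
    "SELECT digest_order FROM tags_eventtag "
    "WHERE issue_id = 1 AND value_id = 10 "
    "ORDER BY digest_order DESC LIMIT 1").fetchone()[0]
print(count, last)  # 3 7
```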
Further improvements:
* the bounds of `digest_order` are fetched only once; for first/last this info
is reused.
* explicitly pass `event_qs_count` to the templates
* non-event pages used to calculate a "last event" to generate a tab with a
  correct event.id; since we now simply have the "last" idiom, we use that
  instead. This also makes clear the "none" idiom was never needed, so we
  remove it again.
Results:
Locally (60K event DB, 30K events on largest issue) my testbatch now
runs in 25% of time (overall).
* The effect on the AND-ing is in fact very large (13% runtime remaining)
* The event details page is not noticeably improved.
* denormalize IssueTag.key; this allows key to be used in an index
(issue, key, count).
* rewrite to grouping-first, per-key-query-second, i.e. revert part of
bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
guessing correctly, and we don't dynamically determine that yet. This
means that (in the single-query version), if you had a per-event value for
some tag, you could end up iterating over as many values as there are events,
which won't work.
* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.
* further denormalization (of key__key, of value__str) actually turns out not
to be required for either the grouping or the individual queries to be fast.
Performance tests, as always, against sqlite3.
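The grouping-first, per-key-second shape, sketched against sqlite3 (the denormalized `key_id` and a `count` column on `IssueTag`, and all names, are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags_issuetag (issue_id, key_id, value_id, count)")
conn.executemany("INSERT INTO tags_issuetag VALUES (?, ?, ?, ?)", [
    (1, 100, 10, 5), (1, 100, 11, 2), (1, 200, 20, 7)])

# Query 1 (the "grouping"): which keys does this issue have at all?
keys = [k for (k,) in conn.execute(
    "SELECT DISTINCT key_id FROM tags_issuetag WHERE issue_id = 1")]

# Query 2..n (per key): top values, servable by an (issue, key, count)
# index; this never iterates over more values than a single key has.
top = {key: conn.execute(
    "SELECT value_id FROM tags_issuetag "
    "WHERE issue_id = 1 AND key_id = ? ORDER BY count DESC LIMIT 1",
    (key,)).fetchone()[0] for key in keys}
print(top)  # {100: 10, 200: 20}
```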
--
Roads not taken/background
* This commit removes a future TODO that "A point _could_ be made for
['issue', '?value?' 'count']", I tried both versions of that index
(against the group-then-query version, the only one which I trust)
but without denormalization of key, I could not get it to be fast.
* I thought about a hybrid approach (for those keys with low counts of values
do the single-query thing) but as it stands the extra complexity isn't worth
it.
---
On the 1.2M-events, 3 (user-defined) tags-per-event test env, this
basically lowers the time from "seconds" to "milliseconds".
Done by denormalizing EventTag.issue, and adding that into an index. Targets:
* get-event-within-query (when it's 'last' or 'first')
* .count (of search query results)
* min/max (for the first/prev/next/last buttons)
(The min/max query's performance significantly improved by the addition of
the index, but was also rewritten into a simple SELECT rather than MIN/MAX).
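The two forms of that query, side by side in a sqlite3 sketch (names assumed); both return the same bound, the commit's point being that the simple SELECT form walks the index directly in the query as actually composed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags_eventtag (issue_id, digest_order)")
conn.execute(
    "CREATE INDEX et_issue_do ON tags_eventtag (issue_id, digest_order)")
conn.executemany(
    "INSERT INTO tags_eventtag VALUES (1, ?)", [(4,), (9,), (2,)])

# The aggregate form...
via_min = conn.execute(
    "SELECT MIN(digest_order) FROM tags_eventtag "
    "WHERE issue_id = 1").fetchone()[0]
# ...and the simple-SELECT form it was rewritten into: read the first
# entry of the (issue_id, digest_order) index and stop.
via_select = conn.execute(
    "SELECT digest_order FROM tags_eventtag WHERE issue_id = 1 "
    "ORDER BY digest_order LIMIT 1").fetchone()[0]
print(via_min, via_select)  # 2 2
```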
When this code was written, I thought I had spectacularly improved performance.
I now believe this was based on an error in my measurements, but that this
still represents (mostly) an improvement, so I'll let it stand and will take
it from here in subsequent commits.
Maybe the passed-in event also avoids this, but the present method will
always do what you expect, and that is obvious upon reading, rather than
requiring the reader to think it through.