Commit Graph

161 Commits

Author SHA1 Message Date
Klaas van Schelven
5f29a20765 send_json: print understandable json errors
at least: this works in the current set of circumstances
2024-09-16 14:20:21 +02:00
Klaas van Schelven
f1b75aab81 api.json.schema: put back in code, make test fail on invalidness and related fixes
This reverts course on 4201fbd778, and restores event.schema.json from that
commit.  In that commit we said: 'this is not used'. Not true: it's used in a
test, though this test used the validity check to silently skip.

In this commit:

1. Do _not_ just silently skip invalid samples. Since we have a way of properly
   validating, let's use that so that we know how useful the samples that we have
   actually are.

2. Deal with "_meta", a field that we sometimes see in the "private samples" (data
   that ultimately comes from running a somewhat recent python-sdk against my
   actual codebase). The need for this was exposed by [1]

3. Add a test for the up-to-date-ness of event.schema.json

4. Remove special-cased attribute-checks in `is_valid`; `send_json` was, at the
   time, an opportunistic way to just get my hands on some sample data. The
   approach to validation reflected that: I just did some tests on the existence
   of certain attributes to determine which json files were even events. But in
   the end I did a full validation using an API schema, which kinda made the
   whole business useless. This commit cleans up the individual checks.
2024-09-16 11:28:05 +02:00
Klaas van Schelven
4201fbd778 remove vendored event.schema.json from codebase, document where it can be found in the future 2024-09-13 17:47:25 +02:00
Klaas van Schelven
497fbacb03 Remove event-samples from codebase
Properly sourced and licensed they are now at https://github.com/bugsink/event-samples
2024-09-13 11:11:12 +02:00
Klaas van Schelven
eec8d51491 Remove various non-TODOs
either already done, or more of a 'this is a way this code could potentially
evolve in the future' (but not a 'we must do this')
2024-09-13 10:05:22 +02:00
Klaas van Schelven
2e70197825 Add check_migrations command
this way, at least in the Docker setup, you'll get a meaningful error when you
try to start up a half-baked server
2024-09-09 15:31:14 +02:00
Klaas van Schelven
6bb853cd22 Comments about sqlite: take mysql into account too 2024-08-28 15:40:06 +02:00
Klaas van Schelven
129a8db421 Fix various flake8 errors 2024-08-21 09:31:05 +02:00
Klaas van Schelven
f7972cbec0 use datetime.timezone.utc
RemovedInDjango50Warning: The django.utils.timezone.utc alias is deprecated. Please update your code to use datetime.timezone.utc instead.
2024-08-21 09:01:20 +02:00
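
A minimal sketch of the replacement named in the warning above, using only the standard library. The variable names are illustrative and not taken from the codebase.

```python
from datetime import datetime, timezone

# Timezone-aware "now" in UTC, using the standard-library constant instead of
# the deprecated django.utils.timezone.utc alias.
now = datetime.now(timezone.utc)

# An explicit aware timestamp (the value itself is just an example).
ts = datetime(2024, 8, 21, 7, 1, 20, tzinfo=timezone.utc)
```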
Klaas van Schelven
1bfac5d8c6 assertEquals -> assertEqual (Python 3.12)
<<insert remarks about fashion police>>
2024-08-21 08:49:49 +02:00
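
For context: the long-deprecated `assertEquals` alias was removed from `unittest` in Python 3.12, so only `assertEqual` remains. A small sketch of the surviving spelling follows; the test class and the `run_example` helper are hypothetical, not from this repository.

```python
import unittest


class ExampleTest(unittest.TestCase):  # hypothetical test case
    def test_sum(self):
        # assertEqual is the supported name; the assertEquals alias is gone in 3.12
        self.assertEqual(1 + 1, 2)


def run_example():
    # Run the test case programmatically and return the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```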
Klaas van Schelven
3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven
717a632b7d check_for_thresholds refactoring: 'metadata' is superfluous
because it was basically the input-tuple (in a different format)
2024-07-18 09:43:37 +02:00
Klaas van Schelven
b211ba4c1e Document possible way forward for counting all ingested events 2024-07-18 09:21:59 +02:00
Klaas van Schelven
f48c48f7e5 Implement 429 for the deprecated 'store' endpoint too 2024-07-18 09:19:37 +02:00
Klaas van Schelven
927587c132 Stress test interruptible, still show results 2024-07-17 17:33:30 +02:00
Klaas van Schelven
ec0877edb7 Document yet another problem with 'real streaming' and Nginx 2024-07-17 17:13:30 +02:00
Klaas van Schelven
65ea181f37 vbc-unmute: reduce calls to the expensive check
as done in the previous commit for project quota
2024-07-17 15:33:15 +02:00
Klaas van Schelven
51a53c09a4 quota: check as little as possible & check-on-digest
Also fix various off-by-one errors with the help of tests
2024-07-17 14:48:19 +02:00
Klaas van Schelven
8849a3e44b Don't write to the DB on-ingest
In the previous commit I put the code for a small performance-experiment.
The results are (very) obvious: don't do this. Response times go through
the roof, and more importantly, the server becomes unreliable. Reason:
time-outs caused by waiting for the write-lock.
2024-07-16 16:39:12 +02:00
Klaas van Schelven
0c964cfcc8 Add project.ingested_event_count (input for performance-experiment) 2024-07-16 15:48:16 +02:00
Klaas van Schelven
c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
of doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven
d56a8663a7 Remove the periodCounter and the PC registry
direct consequence of switching to SQL-based counting
2024-07-16 15:08:05 +02:00
Klaas van Schelven
5ce840f62f Move period_utils to separate file 2024-07-15 14:38:35 +02:00
Klaas van Schelven
93365f4c8d Period-counting using SQL instead of custom-made (PoC)
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:

* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on new events. not correct).

I thought about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.

There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
2024-07-15 14:28:13 +02:00
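
The idea of counting per period in SQL, with a rolling rather than a calendar window, can be sketched roughly as below. This is an illustration under assumptions, not the project's code: the `event` table, its `project_id` and `digested_at` columns, and the function names are all hypothetical, and plain `sqlite3` stands in for the ORM.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def make_db():
    # Hypothetical minimal schema; timestamps are stored as epoch seconds.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event (project_id INTEGER, digested_at REAL)")
    conn.execute("CREATE INDEX event_project_time ON event (project_id, digested_at)")
    return conn


def count_in_window(conn, project_id, now, window):
    """Count events in the rolling window (now - window, now]."""
    since = now - window
    row = conn.execute(
        "SELECT COUNT(*) FROM event"
        " WHERE project_id = ? AND digested_at > ? AND digested_at <= ?",
        (project_id, since.timestamp(), now.timestamp()),
    ).fetchone()
    return row[0]


def demo():
    conn = make_db()
    now = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)
    # Three events: one outside the last hour, two inside it.
    for minutes_ago in (120, 30, 10):
        ts = now - timedelta(minutes=minutes_ago)
        conn.execute("INSERT INTO event VALUES (?, ?)", (1, ts.timestamp()))
    return count_in_window(conn, 1, now, timedelta(hours=1))
```

Because the window is anchored at `now` instead of at the start of a calendar hour, day or month, the count is always over "the last N", which matches the semantic change mentioned in the message.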
Klaas van Schelven
c42c85c050 Quota: only trigger when _over_ quota 2024-07-15 13:40:58 +02:00
Klaas van Schelven
fbee32c79a Remove some 'maybe' comments for 'drop immediately' 2024-07-15 11:02:08 +02:00
Klaas van Schelven
d5bfe70488 Comments on the finer points of quota 2024-07-15 11:00:20 +02:00
Klaas van Schelven
d68aff05ca Quota 2024-07-15 09:37:36 +02:00
Klaas van Schelven
c403d906cd stress-test: report on errors 2024-07-15 09:26:27 +02:00
Klaas van Schelven
49a395fb86 use the envelope_header's DSN if it is available 2024-07-12 10:41:16 +02:00
Klaas van Schelven
b5321f3685 Notes on streaming 2024-07-12 10:12:07 +02:00
Klaas van Schelven
eb01c61947 Fix typo in comment 2024-07-10 14:00:59 +02:00
Klaas van Schelven
90a55e522b Remove 2 'untested behavior' notes
I _think_ I meant that I had never actually seen those code-paths in action
(i.e. the note was not about automated tests but rather any kind of visual
confirmation that it worked) but I have seen that now
2024-07-10 14:00:47 +02:00
Klaas van Schelven
1ed6522126 Clarify why get_pc_registry must be done before event-creation 2024-07-09 13:27:16 +02:00
Klaas van Schelven
eb23d44962 Enforce a single pc_registry for a single ingesting process
Using a pid-file that's implied by the ingestion directory.

We do this in `get_pc_registry`, i.e. on the first request. This means failure is
in the first request on the 2nd process.

Why not on startup? Because we don't have a configtest or generic on-startup location
(yet). Making _that_ could be another source of fragility, and getting e.g. the nr
of processes might be non-trivial / config-dependent.
2024-07-09 13:14:27 +02:00
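
A rough sketch of the pid-file idea, assuming exclusive file creation is used as the guard. The file name, the exception and the function names are hypothetical, and stale pid-files (left behind by a crashed process) are deliberately not handled here.

```python
import os
import tempfile


class AlreadyRunning(Exception):
    """Raised when another process has already claimed the directory."""


def claim_pid_file(ingest_dir):
    # Hypothetical file name; the location is implied by the ingestion directory.
    path = os.path.join(ingest_dir, "pc_registry.pid")
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(path) as f:
            owner = f.read().strip()
        raise AlreadyRunning(f"{path} is already claimed by pid {owner}")
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


def demo():
    with tempfile.TemporaryDirectory() as ingest_dir:
        path = claim_pid_file(ingest_dir)
        with open(path) as f:
            recorded = f.read()
        try:
            claim_pid_file(ingest_dir)  # a second claim stands in for the 2nd process
            second_claim_failed = False
        except AlreadyRunning:
            second_claim_failed = True
    return recorded == str(os.getpid()), second_claim_failed
```

Calling the claim lazily, on first use, mirrors the trade-off described above: the failure surfaces on the first request handled by the second process rather than at startup.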
Klaas van Schelven
edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quota and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
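
To illustrate the shape of "pass the thresholds on each call to `inc`" (as opposed to registering listeners on a shared instance), here is a deliberately simplified sketch. `RollingCounter`, its rolling-window counting, the tuple layout and the `>=` comparison are assumptions made for the example; this is not the project's PeriodCounter.

```python
from collections import deque


class RollingCounter:  # hypothetical stand-in, not the real PeriodCounter
    def __init__(self):
        self._seen = deque()

    def inc(self, now, thresholds):
        """Record one event at time `now` and check the given thresholds.

        thresholds: iterable of (window, limit, on_reached) tuples, built by
        the caller right before this call. No state about listeners is kept.
        """
        self._seen.append(now)
        for window, limit, on_reached in thresholds:
            count = sum(1 for t in self._seen if now - window < t <= now)
            if count >= limit:
                on_reached(count)


def demo():
    counter = RollingCounter()
    fired = []
    for now in (0, 1, 2, 100):
        # The inventory of checks is rebuilt for every call, in the counting process.
        thresholds = [(10, 3, fired.append)]
        counter.inc(now, thresholds)
    return fired
```

Since the counter holds no registered callbacks, nothing needs to be shared between the GUI-side process and the process that does the counting.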
Klaas van Schelven
471b69e956 Stress test: ability to generate random event types 2024-06-27 10:49:25 +02:00
Klaas van Schelven
c5df10e9cf Stress test: ability to use multiple dsns (projects) 2024-06-27 09:52:10 +02:00
Klaas van Schelven
fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven
ea6aa9bbca Retention/quotas: something that 'seems to work' (doesn't immediately crash) 2024-06-21 11:50:13 +02:00
Klaas van Schelven
c2b821589d Retention, WIP (yesterday) 2024-06-21 09:28:04 +02:00
Klaas van Schelven
7cce0c58ab Simplify code
by moving the updating of the denormalized fields up, we can remove an asymmetry
2024-06-20 09:12:24 +02:00
Klaas van Schelven
a1e842fee1 Remove no-longer-true/relevant comment
The shaving-off of queries that's discussed in the comment is no longer
relevant because the associated branch is fully seen as ValidationError
these days.
2024-06-19 16:58:21 +02:00
Klaas van Schelven
8ad6059722 Complete migration reset 2024-06-14 10:29:10 +02:00
Klaas van Schelven
fb66b04be9 Document playground.bugsink.com performance findings 2024-05-23 14:24:04 +02:00
Klaas van Schelven
4647c1b498 Stress test: better stats 2024-05-23 12:36:14 +02:00
Klaas van Schelven
f0c255a346 Force envelope use in stress tests 2024-05-23 11:02:44 +02:00
Klaas van Schelven
23d7f2172d Add showstat (for snappea queue size) 2024-05-23 08:37:48 +02:00
Klaas van Schelven
78a85087a2 'performance' info for event bytes count 2024-05-22 20:43:10 +02:00
Klaas van Schelven
c92e3ec772 Add INGEST_STORE_BASE_DIR 2024-05-20 09:44:38 +02:00