bugsink/api
Klaas van Schelven f1b75aab81 api.json.schema: put back in code, make test fail on invalidness and related fixes
This reverts course on 4201fbd778, and restores event.schema.json from that
commit.  In that commit we said: 'this is not used'. Not true: it's used in a
test, though this test used the validity check to silently skip.

In this commit:

1. Do _not_ just silently skip invalid samples. Since we have a way of properly
   validating, let's use that so that we know how useful the samples that we have
   actually are.

2. Deal with "_meta", a field that we sometimes see in the "private samples" (data
   that ultimately comes from running a somewhat recent python-sdk against my
   actual codebase). The need for this was exposed by [1]

3. Add a test for the up-to-date-ness of event.schema.json

4. Remove special-cased attribute-checks in `is_valid`; `send_json` was, at the
   time, an opportunistic way to just get my hands on some sample data. The
   approach to validation reflected that: I just did some tests on the existence
   of certain attributes to determine which json files were even events. But in
   the end I did a full validation using an API schema, which kinda made the
   whole business useless. This commit cleans up the individual checks.
2024-09-16 11:28:05 +02:00

Findings about event.schema.json

There are 2 locations from which this file can be sourced (a good one and a bad one). The 2 locations have diverged (of course!).

sentry-data-schemas

In the sentry-data-schemas repo:

This is MIT-licenced.

The repo contains a setup.py, but:

  • the result of that didn't make it to pypi
  • the result of that is Python files (mypy) and does not contain the json file.

Sentry (the main repo)

In the sentry repo:

This is not 'real' Open Source.

Notes on divergence:

The main point of divergence (other than just the fact that the laws of nature force code to drift apart) is that Sentry's codebase has, as per the commit that adds it:

added "project_id" field (in the API this would have been added from the URL path)

See also the "caveats" section here:

https://github.com/getsentry/sentry-data-schemas?tab=readme-ov-file#relayeventschemajson

In short: all the more reason to just use the "upstream" API.

Put another way: we act more as the "relay" than as "getsentry/sentry", because we ingest straight in the main process. So we should adhere to the relay's spec.
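
To make the "project_id" point concrete: on the ingest side the project is identified by the URL, not by the payload, so the relay-style schema has no need for that field. A minimal sketch, assuming Sentry-style ingest paths of the form /api/<project_id>/store/ or /api/<project_id>/envelope/ and numeric project ids; the function name and the regex are illustrative and not taken from Bugsink's code:

```python
import re

# Assumption: ingest endpoints look like /api/<project_id>/store/ or
# /api/<project_id>/envelope/, with a numeric project id.
INGEST_PATH = re.compile(r"^/api/(?P<project_id>\d+)/(?:store|envelope)/?$")


def project_id_from_path(path):
    """Return the project id encoded in an ingest URL path (hypothetical helper)."""
    match = INGEST_PATH.match(path)
    if match is None:
        raise ValueError(f"not an ingest path: {path!r}")
    return int(match.group("project_id"))
```

An event body validated against the relay schema would then be paired with the id returned here, rather than carrying its own "project_id".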

Notes on use:

Bugsink, as it stands, doesn't use event.schema.json much.

  • We have --valid-only as a param on send_json, but I appear to have used that only sporadically (back in Nov 2023)
  • We could at some point in the future [offer the option to] throw events through a validator before proceeding with digesting. At that point we'll re-vendor event.schema.json (from the sentry-data-schemas repo)
  • Reading this file is useful, but we can do that straight from the source.
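
A minimal sketch of what such a validate-before-digest step might look like, assuming the third-party jsonschema package and a local copy of event.schema.json. The names `load_schema` and `validate_before_digest`, and the injectable `validate` parameter, are hypothetical and not part of Bugsink:

```python
import json
from pathlib import Path


def load_schema(path):
    """Load a JSON schema (e.g. a vendored event.schema.json) from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def validate_before_digest(event, schema, validate=None):
    """Return `event` unchanged if it passes validation; raise otherwise.

    `validate` is any callable taking `instance=` and `schema=` that raises on
    invalid input. It defaults to `jsonschema.validate` (third-party), imported
    lazily so the dependency is only needed when validation is switched on.
    """
    if validate is None:
        import jsonschema  # assumption: installed via `pip install jsonschema`

        validate = jsonschema.validate
    validate(instance=event, schema=schema)  # raises on an invalid event
    return event
```

With the default validator an invalid event raises jsonschema.ValidationError; whether the caller should reject the event or merely log the failure is a separate decision that this sketch leaves open.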