This reverts course on 4201fbd778, and restores event.schema.json from that
commit. In that commit we said: 'this is not used'. Not true: it's used in a
test, though that test only used the validity check to silently skip invalid
samples.
In this commit:
1. Do _not_ just silently skip invalid samples. Since we have a way of properly
validating, let's use that so that we know how useful the samples that we have
actually are.
2. Deal with "_meta", a field that we sometimes see in the "private samples" (data
that ultimately comes from running a somewhat recent python-sdk against my
actual codebase). The need for this was exposed by [1]
3. Add a test for the up-to-date-ness of event.schema.json
4. Remove special-cased attribute-checks in `is_valid`; `send_json` was, at the
time, an opportunistic way to just get my hands on some sample data. The
approach to validation reflected that: I just did some tests on the existence
of certain attributes to determine which json files were even events. But in
the end I did a full validation using an API schema, which kinda made the
whole business useless. This commit cleans up the individual checks.
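As an illustration of points 1 and 2, a minimal sketch (not the code in this commit): every sample is validated and the failures are reported rather than skipped, and "_meta" is dropped before validating. Dropping "_meta" is just one possible way to deal with it. `strip_meta`, `toy_validate`, `check_samples` and `KNOWN_KEYS` are hypothetical names; `toy_validate` stands in for full validation against event.schema.json, which with the `jsonschema` package would be a call to `jsonschema.validate(instance, schema)`.

```python
# Toy assumption: the validator rejects keys it does not know about.
KNOWN_KEYS = {"event_id", "timestamp"}


def strip_meta(event):
    """Return a shallow copy of the event without a top-level "_meta" key."""
    return {key: value for key, value in event.items() if key != "_meta"}


def toy_validate(event):
    """Hypothetical stand-in for full schema validation; raises ValueError."""
    unknown = sorted(set(event) - KNOWN_KEYS)
    if unknown:
        raise ValueError("unknown keys: " + ", ".join(unknown))
    if not isinstance(event.get("event_id"), str):
        raise ValueError("event_id must be a string")


def check_samples(samples, validate=toy_validate):
    """Validate every sample and report failures instead of skipping them."""
    valid, failures = [], {}
    for name, event in samples.items():
        try:
            validate(strip_meta(event))
        except ValueError as e:
            failures[name] = str(e)
        else:
            valid.append(name)
    return valid, failures


samples = {
    "good.json": {"event_id": "abc123", "_meta": {"note": "sdk annotation"}},
    "bad.json": {"event_id": 42},
}
valid, failures = check_samples(samples)
print(len(valid), "valid,", len(failures), "invalid")
```

Returning the failures (rather than a bare boolean) is what makes it possible to see how useful the sample set actually is.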
Findings about event.schema.json
There are 2 locations from which this file can be sourced (a good and a bad one). The 2 locations have diverged (of course!)
sentry-data-schemas
In the sentry-data-schemas repo:
This is MIT-licenced.
The repo contains a setup.py, but:
- the result of that didn't make it to pypi
- the result of that is Python files (mypy) and does not contain the json file.
Sentry (the main repo)
In the sentry repo:
- https://github.com/getsentry/sentry/blob/master/src/sentry/issues/event.schema.json
- https://github.com/getsentry/sentry/blob/6b96e8f0c484/src/sentry/issues/event.schema.json
This is not 'real' Open Source.
Notes on divergence:
The main point of divergence (other than just the fact that the laws of nature force code to drift apart) is that sentry's codebase has, as per the commit that adds it:
added "project_id" field (in the API this would have been added from the URL path)
See also the "caveats" section here:
https://github.com/getsentry/sentry-data-schemas?tab=readme-ov-file#relayeventschemajson
In short, all the more reason to just use the "upstream" API.
Put another way: we act more as the "relay" than as "getsentry/sentry", because we ingest straight in the main process. So we should adhere to the relay's spec.
Notes on use:
Bugsink, as it stands, doesn't use event.schema.json much.
- We have `--valid-only` as a param on `send_json`, but I appear to have used that only sporadically (back in Nov 2023)
- We could at some point in the future [offer the option to] throw events through a validator before proceeding with digesting. At that point we'll re-vendor event.schema.json (from the sentry-data-schemas repo)
- Reading this file is useful, but we can do that straight from the source.
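
If the file does get re-vendored, an up-to-date-ness check (cf. point 3 above) could be sketched roughly as below. This is an assumption about the approach, not the test in this commit: `canonical_digest` and `is_up_to_date` are hypothetical helpers, and how the upstream text is obtained is deliberately left out.

```python
import hashlib
import json


def canonical_digest(schema_text):
    """Hash a JSON document independently of whitespace and key order."""
    parsed = json.loads(schema_text)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_up_to_date(vendored_text, upstream_text):
    """True when the vendored schema has the same content as the upstream one."""
    return canonical_digest(vendored_text) == canonical_digest(upstream_text)
```

Comparing canonicalized JSON rather than raw bytes means a pure reformatting upstream does not count as drift, while an added field (such as "project_id") does.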