# Scrape test module
The scrape test module is intended to test the implementation of the library at scale by parsing a large number of web pages and checking the quality of the results.
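"Checking the quality of the results" at scale usually comes down to aggregating per-page outcomes. The sketch below is a hypothetical illustration of that idea, not code from this module: `PageResult`, `Summary` and `summarize` are names invented here, and the two signals it counts (whether any tags were extracted, whether the page was considered valid) are simply the ones this README mentions.

```kotlin
// Hypothetical per-page outcome; these types are NOT part of OpenGraphKt.
data class PageResult(val fileName: String, val tagCount: Int, val isValid: Boolean)

data class Summary(val total: Int, val withTags: Int, val valid: Int) {
    // Share of pages considered valid; 0.0 when no pages were checked.
    val validRate: Double
        get() = if (total == 0) 0.0 else valid.toDouble() / total
}

// Collapses individual page results into a few numbers that can be compared across runs.
fun summarize(results: List<PageResult>): Summary = Summary(
    total = results.size,
    withTags = results.count { it.tagCount > 0 },
    valid = results.count { it.isValid },
)
```

Keeping the summary as plain counts makes it easy to compare a run before and after a parser change.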
## Data
At this moment
I'd like a more varied set of data from different types of sources. The current set mostly seems to contain homepages, but more varied data turns out to be surprisingly hard to find.
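One rough way to quantify how homepage-heavy a URL list is: treat a URL whose path is empty or just `/` as a homepage. The helpers below are a hypothetical sketch using only `java.net.URI`; they are not part of this repository, and the "empty or `/` path" rule is an assumption (it ignores query strings and fragments).

```kotlin
import java.net.URI

// Assumed heuristic: a URL with no path beyond the site root counts as a homepage.
fun isHomepage(url: String): Boolean {
    val path = URI(url).path ?: ""
    return path.isEmpty() || path == "/"
}

// Fraction of URLs that point at a site root; 0.0 for an empty list.
fun homepageShare(urls: List<String>): Double =
    if (urls.isEmpty()) 0.0 else urls.count(::isHomepage).toDouble() / urls.size
```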
## Running the tests
For various reasons, I am not uploading the actual contents of the scraped URLs. To run the analysis yourself:
- Run `Scraper.kt` once, which will grab all the web pages and place them in the `data/web` folder.
- Run `ParserTest.kt`, which will run the `Parser` on each of those web pages and check whether the tags can be extracted and whether the page is considered valid.
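The two steps above can be pictured roughly as follows. This is an offline, hypothetical sketch rather than the actual `Scraper.kt` or `ParserTest.kt`: the file-naming scheme and the names `fileNameFor`, `scrapeAll` and `checkAll` are invented for illustration, the `fetch` lambda stands in for whatever HTTP client the scraper really uses, and `isValidPage` stands in for running OpenGraphKt's `Parser` and checking its result.

```kotlin
import java.io.File

// Hypothetical naming scheme: one HTML file per URL, unsafe characters replaced by '_'.
fun fileNameFor(url: String): String =
    url.removePrefix("https://").removePrefix("http://")
        .replace(Regex("[^A-Za-z0-9._-]"), "_") + ".html"

// Step 1 (scraping): save each fetched page into dataDir, e.g. data/web.
// `fetch` is a placeholder for the real HTTP download.
fun scrapeAll(urls: List<String>, dataDir: File, fetch: (String) -> String) {
    dataDir.mkdirs()
    for (url in urls) {
        File(dataDir, fileNameFor(url)).writeText(fetch(url))
    }
}

// Step 2 (parsing): run a validity check over every saved page, keyed by file name.
// `isValidPage` is a placeholder for calling the library's Parser on the HTML.
fun checkAll(dataDir: File, isValidPage: (String) -> Boolean): Map<String, Boolean> =
    dataDir.listFiles()
        ?.filter { it.extension == "html" }
        .orEmpty()
        .sortedBy { it.name }
        .associate { it.name to isValidPage(it.readText()) }
```

Separating the download from the parsing mirrors the README's workflow: the network-bound step runs once, and the parsing step can then be rerun against the same local pages as often as needed.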