* Update dependency com.fleeksoft.ksoup:ksoup-network to v0.2.5 (#40) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Update whole ksoup to 0.2.5 * Update dependency gradle to v8.14.3 (#37) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Update dependency org.junit:junit-bom to v5.14.0 (#36) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Update dependency gradle to v8.14.3 (#43) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Update plugin org.jetbrains.kotlin.jvm to v2.2.20 (#35) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Update ktor from scraper * Update settings * Update settings * Update gradle/actions action to v4.4.4 (#31) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> * Adds claude init * Upgrades to Java 24 --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
3.0 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
OpenGraphKt is a minimalist Kotlin multiplatform library for parsing Open Graph protocol tags from HTML. It wraps Ksoup (a Kotlin port of JSoup) to extract and structure Open Graph metadata.
Current Status: Pre-alpha - Protocol implementation is complete for og: tags, but type system needs refinement.
Project Structure
This is a multi-module Gradle project:
opengraphkt/- Core library module (published to Maven Central asfr.lengrand:opengraphkt)demo/- Local file parsing examplesdemo-remote/- Remote URL parsing examples (see Main.kt for usage)scrape-test/- Testing/scraping utilities
Common Commands
Build and Test
./gradlew build # Build all modules
./gradlew test # Run all tests
./gradlew :opengraphkt:test # Run tests for core library only
Code Coverage
./gradlew koverXmlReport # Generate XML coverage report
./gradlew koverVerify # Verify coverage meets 70% minimum threshold
Publishing
./gradlew publishToMavenLocal # Publish to local Maven repo for testing
Architecture
Core Components
Parser (Parser.kt): Main entry point that accepts multiple input types:
parse(url: URL)- Fetches and parses remote HTMLparse(html: String)- Parses raw HTML stringparse(file: File)- Parses local HTML fileparse(document: Document)- Parses Ksoup Document
The parser extracts meta[property^=og:] tags and builds structured data models.
Data Models (Models.kt): Type-safe representations of Open Graph data:
Data- Main container withisValid()method checking required fields (title, type, image, url)- Base types:
Image,Video,Audio - Content-specific types:
Article,Book,Profile - Music types:
MusicSong,MusicAlbum,MusicPlaylist,MusicRadioStation - Video types:
VideoMovie,VideoEpisode
Key Implementation Details
Tag Grouping: Tags are grouped by namespace (prefix before first colon) to handle structured properties like og:image:width, og:image:height that belong to the preceding og:image tag.
Date Handling: ISO 8601 datetime parsing with fallback for date-only formats (appends T00:00:00Z).
Structured Property Association: Images/Videos/Audio with their metadata (width, height, type, etc.) are associated by parsing sequential tags - each base tag (og:image) is paired with following attribute tags (og:image:width) until the next base tag.
Development Notes
- JVM Toolchain: Java 24 (see
jvmToolchain(24)in build files) - Testing: CI matrix tests on Java 17 and 23 via GitHub Actions
- Dependencies: Core library uses Ksoup (v0.2.5) for HTML parsing and network requests
- Maven Coordinates: Group
fr.lengrand, artifactopengraphkt, currently at0.1.2-SNAPSHOT - Code Coverage: Kover plugin enforces 70% minimum coverage threshold