fleeksoft/ksoup

Ksoup is a Kotlin Multiplatform library for working with real-world HTML and XML. It's a port of the renowned Java library jsoup and offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM and CSS selectors.

Supported platforms: Android, iOS, macOS, tvOS, JVM, Linux, Windows, JS, Wasm

🚨 Deprecation Notice

The following extension libraries are deprecated and will be removed in a future release:

  • ksoup-korlibs (I/O extension)
  • ksoup-network-korlibs (Network extension)
  • ksoup-network-ktor2 (Network extension)

Recommendation:

  • For I/O capabilities: Use the ksoup-kotlinx extension
  • For network capabilities: Use the ksoup-network extension (based on Ktor 3)

Ksoup implements the WHATWG HTML5 specification, parsing HTML to the same DOM as modern browsers do, but with support for Android, JVM, and native platforms.

Features

  • Scrape and parse HTML from a URL, file, or string
  • Find and extract data using DOM traversal or CSS selectors
  • Manipulate HTML elements, attributes, and text
  • Clean user-submitted content against a safe-list to prevent XSS attacks
  • Output tidy HTML

Ksoup is adept at handling all varieties of HTML found in the wild.
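
The selector and manipulation features listed above can be sketched roughly as follows. This is a minimal illustration, not taken from the project: the helper names extractLinks and markLinksNoFollow and the example.org URLs are hypothetical, and the import paths assume that Ksoup mirrors jsoup's package layout under com.fleeksoft.ksoup.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Document

// Hypothetical helper: collects (link text, absolute URL) pairs for every link.
// baseUri is what lets absUrl() resolve relative hrefs.
fun extractLinks(html: String, baseUri: String): List<Pair<String, String>> {
    val doc: Document = Ksoup.parse(html = html, baseUri = baseUri)
    return doc.select("a[href]").map { link ->
        link.text() to link.absUrl("href")
    }
}

// Hypothetical helper: sets rel="nofollow" on every link and returns the body HTML.
fun markLinksNoFollow(html: String): String {
    val doc: Document = Ksoup.parse(html = html)
    doc.select("a[href]").forEach { link -> link.attr("rel", "nofollow") }
    return doc.body().html()
}

fun main() {
    val html = """<p>See <a href="/docs">the docs</a> and <a href="https://example.com/">a site</a>.</p>"""

    extractLinks(html, baseUri = "https://example.org/").forEach { (text, url) ->
        println("$text => $url")
    }
    println(markLinksNoFollow(html))
}
```

Passing a baseUri matters only when the input contains relative links; without it, absUrl has nothing to resolve them against.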

Getting started

Library Structure

Ksoup follows a modular architecture:

  • Core Library (com.fleeksoft.ksoup:ksoup): The main library that provides HTML/XML parsing from strings
  • Optional I/O Extensions: Add capabilities for parsing from files and other sources
  • Optional Network Extensions: Add capabilities for fetching and parsing from URLs

Installation

Include the dependencies in your commonMain source set. The latest version is published on Maven Central.

1. Core Library

Start with the core library. This is all you need if you're only parsing HTML/XML from strings.

    // Required core library
    implementation("com.fleeksoft.ksoup:ksoup:<version>")

2. I/O Extensions (Optional)

Add one of these extensions only if you need to parse HTML/XML from files or other sources.

Choose one of the following I/O libraries:

  1. kotlinx-io (Recommended)

    // Optional: Add this if you need file parsing capabilities
    // Provides Ksoup.parseFile, Ksoup.parseSource & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-kotlinx:<version>")

  2. okio

    // Optional: Add this if you need file parsing capabilities
    // Provides Ksoup.parseFile, Ksoup.parseSource & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-okio:<version>")

  3. korlibs-io (DEPRECATED: Use kotlinx-io instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseFile, Ksoup.parseStream & Other InputStream APIs
    implementation("com.fleeksoft.ksoup:ksoup-korlibs:<version>")

3. Network Extensions (Optional)

Add one of these extensions only if you need to fetch and parse HTML/XML directly from URLs.

Choose one of the following network libraries:

  1. Ktor 3 (Recommended)

    // Optional: Add this if you need to fetch HTML/XML from URLs
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network:<version>")

  2. Ktor 2 (DEPRECATED: Use Ktor 3 instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-ktor2:<version>")

  3. korlibs-io Network (DEPRECATED: Use Ktor 3 instead)

    // Deprecated: Not recommended for new projects
    // Provides Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, Ksoup.parsePostRequest
    implementation("com.fleeksoft.ksoup:ksoup-network-korlibs:<version>")
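
Putting the three installation steps together, a module-level build.gradle.kts might look roughly like the sketch below. It assumes a Kotlin Multiplatform project using the Gradle Kotlin DSL with a Kotlin Gradle plugin recent enough to support the commonMain.dependencies shorthand, and it picks the two recommended extensions; <version> is left as the placeholder used above.

```kotlin
// build.gradle.kts (module level): a sketch, adjust to your project layout
kotlin {
    sourceSets {
        commonMain.dependencies {
            // Required core library: parsing HTML/XML from strings
            implementation("com.fleeksoft.ksoup:ksoup:<version>")

            // Optional I/O extension (kotlinx-io): parsing from files and sources
            implementation("com.fleeksoft.ksoup:ksoup-kotlinx:<version>")

            // Optional network extension (Ktor 3): fetching and parsing from URLs
            implementation("com.fleeksoft.ksoup:ksoup-network:<version>")
        }
    }
}
```

If you only parse strings, the first dependency alone is enough.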

Ksoup supports Charsets

  • Standard charsets are already supported by Ksoup IO. For extended charsets, add com.fleeksoft.charset:charset-ext. For more details, see the Charsets Documentation.

Parsing HTML from a String with Ksoup

For API documentation, you can check the Jsoup documentation. Most of the APIs work without any changes.

    val html = "<html><head><title>One</title></head><body>Two</body></html>"
    val doc: Document = Ksoup.parse(html = html)

    println("title => ${doc.title()}") // One
    println("bodyText => ${doc.body().text()}") // Two

This snippet demonstrates how to use Ksoup.parse for parsing an HTML string and extracting the title and body text.

Fetching and Parsing HTML from a URL using Ksoup

    // Please note that the com.fleeksoft.ksoup:ksoup-network library is required for Ksoup.parseGetRequest.
    val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
    // or, outside a coroutine:
    // val doc: Document = Ksoup.parseGetRequestBlocking(url = "https://en.wikipedia.org/")

    println("title: ${doc.title()}")
    val headlines: Elements = doc.select("#mp-itn b a")

    headlines.forEach { headline: Element ->
        val headlineTitle = headline.attr("title")
        val headlineLink = headline.absUrl("href")

        println("$headlineTitle => $headlineLink")
    }

Parsing XML

    val doc: Document = Ksoup.parse(xml, parser = Parser.xmlParser())
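
A slightly fuller sketch of XML parsing, combining the XML parser with a CSS selector. The bookTitles helper and the catalog document are made up for illustration, and the import paths assume that Ksoup mirrors jsoup's package layout.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.nodes.Document
import com.fleeksoft.ksoup.parser.Parser

// Hypothetical helper: returns the text of every <title> that is a direct child of a <book>.
fun bookTitles(xml: String): List<String> {
    // Parser.xmlParser() parses the input as XML instead of applying HTML5 tree-building rules.
    val doc: Document = Ksoup.parse(html = xml, parser = Parser.xmlParser())
    return doc.select("book > title").map { it.text() }
}

fun main() {
    val xml = """
        <catalog>
            <book id="1"><title>Kotlin</title></book>
            <book id="2"><title>HTML</title></book>
        </catalog>
    """.trimIndent()

    bookTitles(xml).forEach { println(it) }
}
```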

Parsing Metadata from Website

    // Please note that the com.fleeksoft.ksoup:ksoup-network library is required for Ksoup.parseGetRequest.
    val doc: Document = Ksoup.parseGetRequest(url = "https://en.wikipedia.org/") // suspend function
    val metadata: Metadata = Ksoup.parseMetaData(element = doc) // suspend function
    // or
    // val metadata: Metadata = Ksoup.parseMetaData(html = HTML)

    println("title: ${metadata.title}")
    println("description: ${metadata.description}")
    println("ogTitle: ${metadata.ogTitle}")
    println("ogDescription: ${metadata.ogDescription}")
    println("twitterTitle: ${metadata.twitterTitle}")
    println("twitterDescription: ${metadata.twitterDescription}")
    // Check com.fleeksoft.ksoup.model.MetaData for more fields

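In the URL example, Ksoup.parseGetRequest fetches and parses HTML content from Wikipedia, and the selector extracts and prints news headlines and their corresponding links. In the metadata example, Ksoup.parseMetaData reads the page's title, description, Open Graph, and Twitter fields from the parsed document.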

Ksoup Public functions

  • Ksoup.parse(html: String, baseUri: String = ""): Document
  • Ksoup.parse(html: String, parser: Parser, baseUri: String = ""): Document
  • Ksoup.parse(reader: Reader, parser: Parser, baseUri: String = ""): Document
  • Ksoup.clean( bodyHtml: String, safelist: Safelist = Safelist.relaxed(), baseUri: String = "", outputSettings: Document.OutputSettings? = null): String
  • Ksoup.isValid(bodyHtml: String, safelist: Safelist = Safelist.relaxed()): Boolean
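
As a rough sketch of how Ksoup.clean and Ksoup.isValid from the list above fit together when handling user-submitted content: the sanitize helper and the sample input are hypothetical, Safelist.basic() is assumed to be available as in jsoup, and the import paths assume that Ksoup mirrors jsoup's package layout.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import com.fleeksoft.ksoup.safety.Safelist

// Hypothetical helper: strips everything the basic safelist does not allow.
fun sanitize(untrusted: String): String =
    Ksoup.clean(bodyHtml = untrusted, safelist = Safelist.basic())

fun main() {
    val untrusted = """<p>Hi <b>there</b><script>alert('xss')</script></p>"""

    // isValid reports whether the input already conforms to the safelist.
    println("valid: ${Ksoup.isValid(bodyHtml = untrusted, safelist = Safelist.basic())}")

    // clean returns HTML containing only the allowed tags and attributes.
    println(sanitize(untrusted))
}
```

Checking with isValid first is useful when you want to reject bad input outright; clean is the better fit when you want to keep whatever is safe.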

Ksoup I/O Public functions

  • Ksoup.parseInput(input: InputStream, baseUri: String, charsetName: String? = null, parser: Parser = Parser.htmlParser()) from (ksoup-io, ksoup-okio, ksoup-kotlinx, ksoup-korlibs)
  • Ksoup.parseFile from (ksoup-okio, ksoup-kotlinx, ksoup-korlibs)
  • Ksoup.parseSource from (ksoup-okio, ksoup-kotlinx)
  • Ksoup.parseStream from (ksoup-korlibs)

Ksoup Network Public functions

  • Suspend functions
    • Ksoup.parseGetRequest
    • Ksoup.parseSubmitRequest
    • Ksoup.parsePostRequest
  • Blocking functions
    • Ksoup.parseGetRequestBlocking
    • Ksoup.parseSubmitRequestBlocking
    • Ksoup.parsePostRequestBlocking

For further documentation, please check the Jsoup documentation.

Ksoup vs. Jsoup Benchmarks: Parsing & Selecting 448KB HTML File test.tx

[Figure: Ksoup vs Jsoup benchmark chart]

Open source

Ksoup is an open source project, a Kotlin Multiplatform port of jsoup, distributed under the MIT License. The source code of Ksoup is available on GitHub.

Development and Support

For questions about usage and general inquiries, please refer to GitHub Discussions.

If you wish to contribute, please read the Contributing Guidelines.

To report an issue, visit our GitHub issues. Please check for duplicates before submitting a new issue.

License

Ksoup is open source software licensed under the MIT License.

This project is a Kotlin Multiplatform port of Jsoup, created by Jonathan Hedley.
Portions of this library are derived from jsoup and retain their original MIT License,
© 2009–2025 Jonathan Hedley.
