[DRAFT] guide: integrating chDB, Cloud and Scikit learn #4700


Draft
Blargian wants to merge 5 commits into ClickHouse:main from Blargian:chdb_scikit_learn

@Blargian
Member

Summary

Adds a guide detailing how to use ClickHouse Cloud, chDB, and scikit-learn together to train a model and run inference.

Covers:

  • How to query data from Cloud using chDB, with Arrow for efficient transfer
  • How chDB makes it easy to switch back and forth between familiar DataFrames and processing in ClickHouse
  • How to train a binary classifier on a subset of the UK property price dataset (predicting whether a property is a flat or a detached house)
  • How to use that model in chDB to run inference
  • How to use that model with ClickHouse using UDFs
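
As a rough illustration of the last item (this is not the code from the guide itself), an executable UDF is a script that reads rows from standard input in a configured format and writes one result per row to standard output. The sketch below assumes the TabSeparated format and two hypothetical feature columns, `price` and `is_new`. `predict_type` is a placeholder threshold rule standing in for a trained scikit-learn model, and `run_udf` is an illustrative name.

```python
import io


def predict_type(price: float, is_new: int) -> int:
    """Placeholder for a trained model: 1 = detached house, 0 = flat."""
    # Hypothetical rule for illustration only; a real script would load a
    # fitted scikit-learn model and call its predict() method here.
    return 1 if price >= 300_000 and is_new == 0 else 0


def run_udf(stdin, stdout) -> None:
    """Read TabSeparated rows and write exactly one prediction per row."""
    for line in stdin:
        # Assumed column order: price, is_new.
        price_text, is_new_text = line.rstrip("\n").split("\t")
        label = predict_type(float(price_text), int(is_new_text))
        stdout.write(f"{label}\n")
    # Flush so the caller receives every result once the input is consumed.
    stdout.flush()


# Local demonstration with in-memory streams instead of real stdin/stdout.
sample_rows = io.StringIO("450000\t0\n120000\t1\n")
results = io.StringIO()
run_udf(sample_rows, results)
print(results.getvalue(), end="")
```

In a deployed script the demonstration lines would be replaced by a call to `run_udf(sys.stdin, sys.stdout)`, and the function would be registered in the ClickHouse server's executable UDF configuration with matching argument and return types.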

To do:

  • How to run inference in Cloud (executable UDFs not yet GA)

Checklist

@vercel

vercel bot commented Nov 4, 2025

@Blargian is attempting to deploy a commit to the ClickHouse Team on Vercel.

A member of the Team first needs to authorize it.

@vercel

vercel bot commented Nov 5, 2025 (edited)

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs | Ready | Preview | Nov 5, 2025 9:42am |

import confusion_matrix from '@site/static/images/use-cases/AI_ML/Scikit/confusion_matrix.png';

# Classifying UK property types with chDB and scikit-learn

Collaborator

Suggested change
:::note[TL;DR]
This guide demonstrates how chDB complements scikit-learn for ML workflows by building a binary classifier that predicts UK property types. You'll learn how to:
- Use chDB for fast feature engineering on 11.8M records from ClickHouse Cloud
- Build and train a Random Forest classifier achieving ~87% accuracy
- Deploy the model back to ClickHouse via UDFs for real-time inference
The pattern shown here applies to any binary classification problem where you need efficient data preprocessing at scale.
Time required: 45-60 minutes
:::

Reviewers

@dhtclk left review comments

2 participants

@Blargian, @dhtclk
