Vector Search & LLM Integration #5552

Open

ssddanbrown wants to merge 8 commits into development from vectors

Conversation

@ssddanbrown (Member) commented Mar 24, 2025 (edited)

This is a proof of concept for a somewhat-native level of LLM integration.
It manages vector indexing, searching and LLM querying: vectors are stored & queried in the local BookStack database, while an external LLM service (OpenAI, Ollama, etc.) is queried for embeddings and for query execution, using the locally found vector-based results as context.

The result is an LLM-generated response to the user query, with a list of the relevant BookStack content used for the query as reference.
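As a rough illustration of the embedding step described above, here is a minimal sketch of requesting an embedding from an OpenAI-compatible endpoint via Laravel's HTTP client. The config keys, model name and function are assumptions for illustration, not the actual implementation in this PR.

```php
<?php

use Illuminate\Support\Facades\Http;

/**
 * Sketch: fetch an embedding vector for a chunk of page text from an
 * OpenAI-compatible service (OpenAI, Ollama, etc.).
 * Endpoint, model name and config keys are illustrative assumptions.
 */
function fetchEmbedding(string $text): array
{
    $response = Http::withToken(config('services.llm.key'))
        ->post(rtrim(config('services.llm.endpoint'), '/') . '/v1/embeddings', [
            'model' => config('services.llm.embedding_model', 'text-embedding-3-small'),
            'input' => $text,
        ]);

    $response->throw();

    // OpenAI-style responses carry the vector at data[0].embedding.
    return $response->json('data.0.embedding', []);
}
```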

Issues & Questionables

  • MySQL lacks proper support. MySQL 9 supports vector columns and indexes, but you can't query distance in the community versions. MariaDB is good from 11.7.
    • Not sure how I feel about MariaDB-specific features/functionality. The current implementation uses MariaDB-specific functions (see the query sketch after this list).
  • No idea how non-en/mixed-language content works for these systems yet.
  • Do we set official targets of support? How common is OpenAI API compatibility across other providers?
  • Does embedding size vary much across different services/models?
    • Seems to: all-minilm=384, openai:text-embedding-3-small=1536, openai:text-embedding-3-large=3072
    • Inserting 384-length embeddings into a vector(1536) MariaDB column seemed to result in 0 hex values. Seems like the column size has to be correct; can't insert under?
      • Response from Sergei at MariaDB on this:
        • "all vectors in a column are strictly the same size, because the distance between two vectors of different sizes is not defined. if you store 384-dim vectors in vector(1536) you can pad them with zeros and the result will be exactly the same as storing them in vector(384) except that it'll use about 5x more storage and memory"
      • Will need to provide a controls/mechanism to change this.
  • How are these accessed/shown in the UI? An option in the search bar maybe to toggle search type? An action/reference from the normal search page?
  • What needs to be configurable? Looks like a lot of things that could be tweaked, but I don't want to be maintaining loads of specific options. Need to assess what's most important.
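To make the MariaDB-specific point above concrete, here is a minimal sketch of a nearest-neighbour lookup using MariaDB 11.7's VEC_FromText and VEC_DISTANCE_COSINE functions through Laravel's query builder. The table and column names (vectors, embedding) are assumptions for illustration, not necessarily what this PR uses.

```php
<?php

use Illuminate\Support\Facades\DB;

/**
 * Sketch: find the stored vectors closest to a query embedding using
 * MariaDB 11.7+ vector functions. Table/column names are assumed.
 *
 * @param float[] $queryEmbedding
 */
function nearestChunks(array $queryEmbedding, int $limit = 10)
{
    // MariaDB's VEC_FromText() expects a JSON-style array of floats.
    $vectorText = json_encode($queryEmbedding);

    return DB::table('vectors')
        ->select('entity_type', 'entity_id', 'text')
        ->selectRaw('VEC_DISTANCE_COSINE(embedding, VEC_FromText(?)) AS distance', [$vectorText])
        ->orderBy('distance')
        ->limit($limit)
        ->get();
}
```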

Considerations

  • This will really require a queue worker so indexing can be done in the background.
    • This does mean that the search index may be behind the content.
  • Would need migration handling due to limited DB support, so maybe have the migration use widely supported types by default, but add the vector parts via a command? (A rough sketch follows this list.)
  • For a nice UX, we'd probably need to have results queried via JS rather than rendered on page load. Not a big deal, but I generally try to have read/view operations be non-JS friendly.
  • Lots of context-window/vector-size considerations to be made that have not been implemented yet.
  • We'd probably want to cap the context documents (at least those shown to a user) at a certain score to avoid showing completely irrelevant content.
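One hedged way to read the migration idea above: ship a plain, widely supported table in the normal migrations, then let an opt-in artisan command add the vector-specific DDL on databases that support it. A minimal sketch, with assumed table/column names and a simplified MariaDB detection check:

```php
<?php

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

/**
 * Sketch of an opt-in command adding vector-specific parts on MariaDB 11.7+.
 * Table name, column name and dimension handling are illustrative assumptions.
 */
class UpgradeVectorStorage extends Command
{
    protected $signature = 'bookstack:upgrade-vector-storage {--dimensions=1536}';
    protected $description = 'Add a MariaDB vector column & index to the vectors table where supported';

    public function handle(): int
    {
        // Simplified check; a real command would also verify the 11.7+ version.
        $version = DB::selectOne('SELECT VERSION() AS v')->v;
        if (!str_contains(strtolower($version), 'mariadb')) {
            $this->error('Vector storage upgrade currently requires MariaDB 11.7+');
            return self::FAILURE;
        }

        $dimensions = (int) $this->option('dimensions');

        // Raw DDL since the schema builder has no native vector type.
        DB::statement("ALTER TABLE vectors ADD COLUMN embedding VECTOR({$dimensions}) NOT NULL");
        DB::statement('ALTER TABLE vectors ADD VECTOR INDEX embedding_index (embedding)');

        $this->info('Vector column and index added.');
        return self::SUCCESS;
    }
}
```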

Todo

  • Resolve above issues/considerations where needed.
  • Need to ensure vectors are handled on content delete/update.
  • Resolve code TODOs
  • Cover functionality with testing
  • Show warning for LLM/AI-based output (inaccurate)
  • Extract text from views
  • Check against dark mode
  • Check accessibility
  • Add JS server-sent-events lib to readme attribution list?

Implementation Notes

  • In some scenarios, extra web-server config may be needed to prevent server-side buffering from interfering with server-sent-event handling. Not crucial, as it won't break things, but it can delay the response until it has finished on the server, providing a worse experience. For example, for nginx proxies: proxy_buffering off; proxy_cache off;.
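As an aside on the buffering note above, a server-sent-events response can also hint to nginx not to buffer via the X-Accel-Buffering header. A minimal Laravel-flavoured sketch, where the function, event names and payloads are assumptions rather than the PR's actual endpoint:

```php
<?php

use Symfony\Component\HttpFoundation\StreamedResponse;

/**
 * Sketch: stream search results and then the LLM response over one
 * HTTP request as server-sent events. Event names/payloads are assumed.
 */
function streamQueryResponse(array $results, iterable $llmChunks): StreamedResponse
{
    return response()->stream(function () use ($results, $llmChunks) {
        // First event: the vector/search results used for context.
        echo "event: results\n";
        echo 'data: ' . json_encode($results) . "\n\n";
        @ob_flush();
        flush();

        // Subsequent events: LLM output as it arrives.
        foreach ($llmChunks as $chunk) {
            echo "event: message\n";
            echo 'data: ' . json_encode(['text' => $chunk]) . "\n\n";
            @ob_flush();
            flush();
        }
    }, 200, [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // ask nginx not to buffer this response
    ]);
}
```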

@ssddanbrown self-assigned this Mar 24, 2025
@ssddanbrown (Member, Author)

Preview:

bs-llm-demo.mp4

@jrejaud

ChatGPT just opened integrations, would be lit if I could use ChatGPT to search Bookstack :)


@ItsNoted

This is awesome Dan! Would be even more amazing to see integration with Ollama so we can self host everything locally and not have to worry about calling ChatGPT :)


@ssddanbrown (Member, Author)

@ItsNoted Local LLM support would really be an important requirement for an implementation, since I know that matters a lot to our kind of audience. When building this I tested it in bulk using Ollama (since it is fairly OpenAI API compatible) and all seemed fairly functional.

ssddanbrown and others added 5 commits August 19, 2025 11:04
Added a formal object type to carry across vector search results. Added permission application and entity combining with vector search results. Also updated namespace from vectors to queries.
Allowing the vector query results and the LLM response to each come back over the same HTTP request at two different times via a somewhat standard approach. Uses a package for a JS SSE client, since the native browser client does not support SSE over POST, which is probably important for this endpoint as we don't want crawlers or other bots accidentally abusing this.
@ivano-buffa

Hi @ssddanbrown,
Any update on this?

A little suggestion: perhaps CrewAI might help? Less mature than LangGraph but highly powerful.

The idea of indexing your own instance of BookStack to privately "talk" to your own data and "retrieve" what you need will definitely give BookStack the boost that it deserves.

My best.
~Ivano

@ssddanbrown (Member, Author)

@trendpx I spent some time on this last month, but got to a point where I was not happy with the vector search system. It didn't feel like it was working well enough to justify the additional cost in a way which would generally work well out of the box. It can work for sure, but with a lot of environment/model/context-specific tweaking.

I want to revisit this with an alternative & simpler approach, without vectors, to test this kind of flow:

User enters query -> LLM consulted to extract key search terms -> Terms used for BookStack search -> Relevant pages and original query sent to LLM for response.

I'd prefer not to require any specific extra external platforms/libraries where possible.
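A minimal sketch of that alternative flow, under stated assumptions: chatCompletion() is a hypothetical wrapper around an OpenAI-compatible chat endpoint, and searchPages() is a hypothetical stand-in for BookStack's existing search. None of these names, prompts or queries reflect a final implementation.

```php
<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

/**
 * Sketch of the vector-free flow:
 * query -> LLM extracts search terms -> BookStack search -> pages + query -> LLM answer.
 */
function answerQuery(string $userQuery): string
{
    // 1. Ask the LLM for key search terms only.
    $terms = chatCompletion(
        'Extract 3-5 key search terms from the following question. Respond with terms only.',
        $userQuery
    );

    // 2. Run a normal BookStack search with those terms (hypothetical helper below).
    $pages = searchPages($terms, 5);

    // 3. Send the relevant page text plus the original query back to the LLM.
    $context = collect($pages)->pluck('text')->implode("\n\n");

    return chatCompletion(
        "Answer the user's question using only the provided documents.\n\n" . $context,
        $userQuery
    );
}

/** Hypothetical stand-in for BookStack's internal page search. */
function searchPages(string $terms, int $limit = 5): array
{
    // A real implementation would use BookStack's existing search system.
    return DB::table('pages')
        ->where('name', 'like', '%' . $terms . '%')
        ->limit($limit)
        ->get(['name', 'text'])
        ->all();
}

/** Hypothetical wrapper around an OpenAI-compatible /v1/chat/completions endpoint. */
function chatCompletion(string $system, string $user): string
{
    $response = Http::withToken(config('services.llm.key'))
        ->post(rtrim(config('services.llm.endpoint'), '/') . '/v1/chat/completions', [
            'model' => config('services.llm.model', 'gpt-4o-mini'),
            'messages' => [
                ['role' => 'system', 'content' => $system],
                ['role' => 'user', 'content' => $user],
            ],
        ]);

    $response->throw();

    return (string) $response->json('choices.0.message.content', '');
}
```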


@samschultzponsys

This looks incredible. I would love to embed an agent into a site if that's possible. I know that would be out of the scope of BookStack and is more up to OpenWebUI etc.


Reviewers

No reviews

Assignees

@ssddanbrown

Labels

None yet

Milestone

No milestone


6 participants

@ssddanbrown, @jrejaud, @ItsNoted, @ivano-buffa, @samschultzponsys
