Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Websearch feature WIP#76

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
dzirtusss wants to merge1 commit intoRenset:main
base:main
Choose a base branch
Loading
fromdzirtusss:sergey/websearch

Conversation

dzirtusss
Copy link

@dzirtusssdzirtusss commentedJan 31, 2025
edited
Loading

Websearch feature

Make it possible for AI to search the web if needed. As started in#67

Actullally in very ealy frankenstain state, mostly investigations atm.

cedbeu, adammenges, and statico reacted with thumbs up emoji
@dzirtusss
Copy link
Author

dzirtusss commentedJan 31, 2025
edited
Loading

@Renset this works in "some way" now, but creates a ton of furhter questions:

About macai code:

  1. thesendMessage flow is not very suitable for back&forth message sending (as needed for tools). So it looks like injection inside injection atm. Most probably, it will need better abstractions. Some message bus???
  2. GPT actually sends not 1 search request, but may continue a discussion with several requests to tools, I've observed 1,2,3 atm. to refine what it may be thinking. Thus tools implementation, really need to be multi-turn.
  3. closures vs async - this is my first code in swift, and your project, but worth upgrading overall atm
  4. will need "similar" logic for other GPTs on some final steps, at least for Claude.
  5. there should be logic to handle several different tools potentially

About websearch:

  1. search plugin - this is even more complicated itself. Atm, I've made a simple ruby wrapper for faster prototying, and found "free search-api server" for experimenting (but it was hard as most are paid or require registration). Good-enough for testing, but most probably it will need additional configuration for real-life and user api-keys (like most AI chats do, but not all).
  2. the plugin logic as well need to be more complex. I've initially naively thought to just pass-through it, but actually, search-api returns just urls, and plugin most probably will need to scrape say first 3 websearches to extract meaningful content of them. Maybe remove inline assets (js/css/svg/etc) for less tokens.
  3. maybe there can be another tool registered that will fetch the url (by gpt requests), so that gpt might orchestrate what he is interested in searching.

This all make me think of a need for plugins/extensions infrastructure for macai, as this way it will be easire to iterate on tools and/or create user-own tools. E.g. to have a "tools" folder which can be just whatever scripts, written in whatever language (ruby, python, shell, etc for easier iterating), which macai will just refer based on gpt requests.

So... overall, I'm looking how to do websearch search logic working really on some minimals.

To be continued...

image
Renset reacted with thumbs up emojiRenset reacted with eyes emoji

@dzirtusss
Copy link
Author

As well just found an exellent example of how to implement tools/plugins library - on Typing Mind, where user can create whatever he wants - mostly by adding a json and a code (which is a pure js).

Ideally this is something that should be implemented in macai - extendable and simple.

imageimage
Renset and cedbeu reacted with thumbs up emoji

@Renset
Copy link
Owner

@dzirtusss Thanks for sharing your insights and ideas!
Actually, I'll need some time to think about this and probably process it in the 'background thread' of my brain - I've been quite busy over the last week and haven't had enough time to see your changes before.

At first glance I think you are on the right track, but here are my initial thoughts:

  • A plugin system based on JS code looks promising indeed.
  • On the other hand, I always considered macai as asimple tool that is fun and pleasant to use (like good old Cocoa apps), and implementing such a flexible and powerful tool makes 'yet another' Typing Mind - is this good? Honestly, I don't know yet
  • I have some doubts about my Swift skills to implement a plugin system in the 'right' way (i.e. secure, extensible and with good performance), so we'll probably need more effort from you, and probably from someone else with good Swift skills.
  • I'm pretty sure there are some 'basic' features that should be implemented anyway (with or without plugin system): web search, file upload and more code runners for shell, python etc.

@Renset
Copy link
Owner

Renset commentedFeb 20, 2025
edited
Loading

@dzirtusss for web search function, I see that this approach might work well:

  1. Convert user prompt to search query, using prepaired system prompt
  2. Use some free search API (the most questionable part, because we have to ask users obtain their own API Key for most services even in free tier), but I really see no alternative there
  3. Select 1-5 top links (based on settings?)
  4. Integratefirecrawl to convert webpages into markdown documents.
  5. Simulate user message and integrate markdown as a part of request

Also, I believe this workflow can be achieved without implementing complex plugin system

What do you think?

@dzirtusss
Copy link
Author

What you try to do is actually some very simplistic implementation of what "tools" protocol do.

It might work, but I guess it will not be what it is for. The whole idea of the web search is not to search and then pass to the model, but let the model itself decide does it needs to search or not. And what info is missing. (at least all other tools implement this way).

IMO:

  1. I guess you don't need this in general. It is a model that generates the "query", you just need to pass it to search-api as "query text". As I observed, models are rather smart for this. Plus model can generate several queries depending on the results. So... it is like a tool only.
  2. I don't see either. However, google has generous tier for "ordinary" use. Andhttps://wiby.me/json/ is totally free w/o api-key or registration, however, I know nothing about that service. I mean, api keys is a goto.
  3. Maybe. Or do even differently. As I digged a bit deeper of what other progs are doing, they have 2 services - web-search (which is the search itself - e.g. calling google-search-api), and web-browsing. If model is supplied with those 2 tools, I guess it is smart enough to generate calls to web-browsing.
  4. Yes, this I also observed - progs do convert page to markdown (vs sendind raw html). I guess mostly to save on tokens, as IMO model is smart-enough to read html. But as well there can be JS non-server-rendered pages, or bot blocking by User-Agent, or... just to quickly name a few possible obstacles. However even ability to fetch some good-behaving pages might be very helpful already.
  5. It might work.

Actually, I guess, what I'd be implementing:

  1. Add ability to send messages back & forth with the model (the "tools protocol"). This is rudimentary done in my PR in not proper way. This will solve 80% of problems and enable whatever other "tools/plugins" easily.
  2. After p1 writing tool/plugin is trivial, even if you want to stay "in-house" and do it in the app only. As tool by design is just a function, that accepts input as a message, do something with it (search, crawl, whatever) and return output as a message.
  3. Basically you even not need to hide "tool" messages. It may be a feature. More chats now try to show this part of interaction as well, not hide it.

So... I guess you will anyway come to implementing this. Just in one step or with "shortcuts". This is where the direction is going - think "thinking models", RAG, etc. It is model that asks something and gets the answer. And chat UI need to be capable of supporting this.

Sorry if I make your task heavier or something that you tried to "shortcut". I just don't feel it is correct way, sorry.

PS offtop not related to websearch - there is nowhttps://github.com/MachatoApp/machato OSS, you might look for some inspiration there.

Renset reacted with thumbs up emoji

@Renset
Copy link
Owner

Renset commentedFeb 22, 2025
edited
Loading

@dzirtusss

What you try to do is actually some very simplistic implementation of what "tools" protocol do.

I see your point, and thanks for putting so much effort into this and sharing your ideas!

I've been thinking about it, and while I see the potential, I'm still leaning towards keeping macai relatively simple. I want something that's easy to use and maintain, and I'm not sure the tools protocol is the way to goat this point. It feels like it could add a lot of complexity. Also, I don't plan to compete with more complex tools that are now releasing new features every day.

Also, to be honest, I don't have that much free time, and I don't really have the bandwidth to implement and maintain something that extensive right now.

But I'm still interested in exploring a simpler web search implementation, maybe along the lines of what I mentioned earlier with Firecrawl. Also, my vision is that it should be toggleable - let the user control whether LLM should search the web or not.

If you'd like to work on something like that, let me know.

Thanks again for your help!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@dzirtusss@Renset

[8]ページ先頭

©2009-2025 Movatter.jp