move santi blog from hypercloud to postgresml #698
Merged
2 changes: 1 addition & 1 deletion pgml-dashboard/Cargo.lock
2 changes: 2 additions & 0 deletions pgml-dashboard/src/api/docs.rs
75 changes: 75 additions & 0 deletions ...-sdk-build-end-to-end-vector-search-applications-without-openai-and-pinecone.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| --- | ||
| author: Santi Adavani | ||
| description: The PostgresML Python SDK is designed to facilitate the development of end-to-end vector search applications without OpenAI and Pinecone. With this SDK, you can seamlessly manage various database tables related to documents, text chunks, text splitters, LLM (Large Language Model) models, and embeddings. By leveraging the SDK's capabilities, you can efficiently index LLM embeddings using PgVector for fast and accurate queries. | ||
| image: https://postgresml.org/dashboard/static/images/blog/sdk_code.png | ||
| image_alt: "Introducing PostgresML Python SDK: Build End-to-End Vector Search Applications without OpenAI and Pinecone" | ||
| --- | ||
| # Introducing PostgresML Python SDK: Build End-to-End Vector Search Applications without OpenAI and Pinecone | ||
| <div class="d-flex align-items-center mb-4"> | ||
| <img width="54px" height="54px" src="/dashboard/static/images/team/santi.jpg" style="border-radius:50%;" alt="Author" /> | ||
| <div class="ps-3 d-flex justify-content-center flex-column"> | ||
| <p class="m-0">Santi Adavani</p> | ||
| <p class="m-0">June 01, 2023</p> | ||
| </div> | ||
| </div> | ||
| We are excited to introduce a Python SDK for PostgresML that streamlines the development of scalable vector search applications on PostgreSQL databases. Traditionally, building a vector search application requires spinning up an application database, connecting to external OpenAI or HuggingFace REST API services for generating embeddings, and integrating with vector databases like Pinecone for indexing and search. This approach increases infrastructure footprint, maintenance efforts, and query latency. | ||
| With the PostgresML Python SDK, developers now have a unified solution. They can effortlessly manage a single application database where they can handle: document management, embedding generation, indexing, and searching. This eliminates the need for multiple infrastructure components, simplifies maintenance, and reduces query latencies. The SDK offers a comprehensive set of tools for managing database tables related to documents, text chunks, text splitters, LLM models, and embeddings, enabling seamless integration of advanced search functionalities. | ||
| <img src="/dashboard/static/images/blog/sdk_code.png" alt="Sample code to build a vector search application using Python SDK"> | ||
| ## Key Features | ||
| ### Automated Database Management | ||
| The Python SDK automates the management of various database tables, eliminating the complexity of setting up and maintaining the data structure required for vector search applications. With this automated system, you can focus on building robust search functionalities while the SDK handles the underlying database management. | ||
| ### Embedding Generation from Open Source Models | ||
| Leveraging the Python SDK, you gain access to a vast collection of open source models. These models have been trained on extensive datasets and capture the semantic meaning of text. With just a few lines of code, you can generate embeddings using these models, enabling powerful analysis and search capabilities in your application. | ||
| ### Flexible and Scalable Vector Search | ||
| The Python SDK seamlessly integrates with PgVector, a PostgreSQL extension designed for efficient vector-based indexing and querying. By leveraging the power of PgVector, you can perform advanced searches, rank results by relevance, and retrieve accurate and meaningful information from your database. The SDK ensures that your vector search application scales effortlessly to handle increasing amounts of data. | ||
| ## How the Python SDK Works | ||
| The Python SDK simplifies the development of vector search applications by abstracting away the complexities of database management and indexing. Here's an overview of how it works: | ||
| ### Document and Text Chunk Management | ||
| The SDK simplifies the process of upserting documents and generating text chunks by offering a user-friendly interface. It allows you to effortlessly add and configure various text splitters to generate text chunks of different sizes, overlaps, and file formats, such as Python and Markdown. | ||
| ### Open Source Model Integration | ||
| With the SDK, you can seamlessly incorporate a wide range of open source models from HuggingFace into your application. These models capture the semantic meaning of text and enable powerful analysis and search capabilities. Generating high-quality embeddings from these models is a breeze with the Python SDK. | ||
| ### Embedding Indexing | ||
| The Python SDK utilizes the PgVector extension to efficiently index the embeddings generated by the open source models. This indexing process optimizes search performance and allows for fast and accurate retrieval of relevant results, even with large volumes of data. | ||
| ### Querying and Search | ||
| Once the embeddings are indexed, the SDK provides intuitive methods for executing vector-based searches on the documents and text chunks stored in the PostgreSQL database. You can easily execute queries and retrieve search results with precise and relevant information. | ||
| ## Use Cases | ||
| The Python SDK's embedding capabilities find applications in various scenarios, including: | ||
| ### Search | ||
| By comparing embeddings of query strings and documents, you can retrieve search results ranked by their relevance or similarity to the query. This allows users to find the most relevant information quickly and effectively. | ||
| ### Clustering | ||
| Utilizing embeddings, you can group text strings based on their similarity. By measuring the similarity between embeddings, you can identify clusters or groups of text strings that share common characteristics, providing valuable insights for data analysis. | ||
| ### Recommendations | ||
| Embeddings play a crucial role in recommendation systems. By identifying items with related text strings based on their embeddings, you can deliver personalized recommendations to users, enhancing user experience and engagement. | ||
| ### Anomaly Detection | ||
| Anomaly detection involves identifying outliers or anomalies in data. By quantifying the similarity between text strings using embeddings, you can identify anomalies that have little relatedness to the rest of the data, aiding in anomaly detection tasks. | ||
| ### Classification | ||
| Embeddings are valuable in classification tasks, where text strings are classified based on their most similar label. By comparing the embeddings of text strings and labels, you can accurately classify new text strings into predefined categories. | ||
| ## Get Started with the Python SDK | ||
| To get started with the Python SDK for scalable vector search on PostgreSQL, visit our [GitHub repository](https://github.com/postgresml/postgresml/tree/master/pgml-sdks/python/pgml). You'll find comprehensive documentation, code examples, and installation instructions to help you integrate the SDK into your projects seamlessly. | ||
| We're excited to see how the Python SDK transforms your vector search applications, enabling fast, accurate, and scalable search functionalities. Should you have any questions or need assistance, please do not hesitate to reach out to us on [Discord](https://discord.gg/DmyJP3qJ7U) or send an [email](mailto:team@postgresml.org). | ||
| Happy coding and happy searching! | ||
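
The post describes a pipeline of splitting documents into overlapping chunks, embedding each chunk, and ranking chunks by similarity to a query. The sketch below illustrates that flow in plain Python so the idea can be tried without a database. It is not the PostgresML SDK's API: every function name here (`split_text`, `toy_embed`, `cosine_similarity`, `vector_search`) is hypothetical, and the bag-of-words "embedding" is only a stand-in for a real open source model.

```python
import math
from collections import Counter


def split_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into word-based chunks; neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: sparse word counts."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the `top_k` chunks most similar to the query, best first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "PostgreSQL stores documents and embeddings",
        "Bananas are yellow fruit",
        "Vector search ranks documents by similarity",
    ]
    for score, chunk in vector_search("vector search similarity", corpus, top_k=1):
        print(f"{score:.3f}  {chunk}")
```

In the setup the post describes, the embeddings would instead come from a HuggingFace model and the ranking would be done inside PostgreSQL through a PgVector index, so the brute-force scoring loop here is only for illustration.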
Binary file added pgml-dashboard/static/images/blog/sdk_code.png