Monday, September 2, 2024
Building LLM Applications with OpenAI, Vector DBs and LangChain
Introduction: The Modern LLM Application Stack
Large Language Models (LLMs) have revolutionized how we build AI applications. In this guide, we'll explore how to combine OpenAI's powerful models with vector databases and LangChain to create sophisticated AI applications. We'll cover everything from basic setup to advanced patterns for production deployment.
Setting Up the Environment
First, let's install the required dependencies (the last two are used later for rate limiting and monitoring):
pip install openai langchain chromadb python-dotenv ratelimit prometheus-client
Core Components
1. OpenAI Integration
First, let's set up OpenAI integration with proper environment management:
from dotenv import load_dotenv
import os
from openai import OpenAI

# Load OPENAI_API_KEY (and any other settings) from a local .env file
load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_completion(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single-turn chat prompt and return the model's reply."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content
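With that helper in place, a one-off call looks like this (the prompt text is just an illustration):

# Example usage (illustrative prompt)
answer = get_completion("Explain retrieval-augmented generation in one sentence.")
print(answer)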
2. Vector Database Setup
We'll use Chroma as our vector store. Here's how to set it up:
import chromadb

# PersistentClient stores the index on disk in the given directory
# (naming it chroma_client avoids shadowing the OpenAI client above)
chroma_client = chromadb.PersistentClient(path="db")

collection = chroma_client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}  # use cosine distance for similarity search
)
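Before wiring in LangChain, it helps to see the raw collection API. This is a minimal sketch of adding and querying documents directly; the IDs and texts are placeholders, and Chroma embeds them here with its default embedding function rather than OpenAI's:

# Add a couple of placeholder documents
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "LangChain orchestrates calls between LLMs and external data.",
        "Chroma is an open-source embedding database."
    ],
)

# Retrieve the 2 most similar documents to a query string
results = collection.query(query_texts=["What does LangChain do?"], n_results=2)
print(results["documents"])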
3. LangChain Integration
LangChain helps orchestrate the interaction between OpenAI and our vector store:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

embeddings = OpenAIEmbeddings()

# Split long documents into overlapping chunks so each chunk fits comfortably in context
text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

def create_knowledge_base(documents):
    """Chunk the documents, embed them, and persist the vector store to disk."""
    texts = text_splitter.split_documents(documents)
    vectorstore = Chroma.from_documents(
        documents=texts,
        embedding=embeddings,
        persist_directory="db"
    )
    return vectorstore
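As a quick check, the knowledge base can be built from a local text file. Here's a minimal sketch; the file name is a placeholder and the loader choice is just one option:

from langchain.document_loaders import TextLoader

# Load a local file into LangChain Document objects (path is a placeholder)
docs = TextLoader("notes.txt").load()
vectorstore = create_knowledge_base(docs)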
Building the Application
Let's combine these components to create a question-answering system:
def create_qa_chain(vectorstore):
    """Build a RetrievalQA chain that stuffs retrieved chunks into the prompt."""
    qa_chain = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=vectorstore.as_retriever(),
        return_source_documents=True
    )
    return qa_chain

def query_documents(qa_chain, query: str):
    """Run a query and return the answer along with the retrieved source passages."""
    response = qa_chain({"query": query})
    return {
        "answer": response["result"],
        "sources": [doc.page_content for doc in response["source_documents"]]
    }
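Putting the pieces together end to end, reusing the docs loaded above (the question is just an example):

vectorstore = create_knowledge_base(docs)
qa_chain = create_qa_chain(vectorstore)

result = query_documents(qa_chain, "What does the documentation say about rate limits?")
print(result["answer"])
print(result["sources"])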
Production Best Practices
1. Error Handling
Always implement robust error handling:
from typing import Optional, Dict, Any
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def safe_query(qa_chain, query: str) -> Optional[Dict[str, Any]]:
    """Return the query result, or None if anything goes wrong (with the error logged)."""
    try:
        return query_documents(qa_chain, query)
    except Exception as e:
        logger.error(f"Error processing query: {e}")
        return None
2. Rate Limiting and Caching
Implement rate limiting and caching to optimize API usage:
from functools import lru_cache
from ratelimit import limits, sleep_and_retry

ONE_MINUTE = 60
MAX_CALLS_PER_MINUTE = 60

# lru_cache is the outermost decorator so cache hits are returned immediately
# and never count against the rate limit or trigger an API call.
@lru_cache(maxsize=1000)
@sleep_and_retry
@limits(calls=MAX_CALLS_PER_MINUTE, period=ONE_MINUTE)
def cached_completion(prompt: str) -> str:
    return get_completion(prompt)
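The in-process rate limiter only protects against your own request volume; the API can still return transient errors under load. A common complement is retry with exponential backoff, sketched here with the tenacity library (an extra dependency I'm assuming for illustration; it would need pip install tenacity):

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(wait=wait_exponential(multiplier=1, min=2, max=30), stop=stop_after_attempt(5))
def completion_with_backoff(prompt: str) -> str:
    # Retries with exponentially increasing waits if the call raises (e.g. on a rate-limit error)
    return get_completion(prompt)

Note that lru_cache only returns a hit when the exact same prompt string is repeated; semantically similar prompts still trigger fresh API calls.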
3. Environment Configuration
Use environment variables for configuration:
# .env
OPENAI_API_KEY=your-api-key
EMBEDDING_MODEL=text-embedding-ada-002
COMPLETION_MODEL=gpt-3.5-turbo
MAX_TOKENS=500
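A small settings object keeps those values in one place. This is a minimal sketch; the attribute names simply mirror the .env file above:

import os
from dataclasses import dataclass
from dotenv import load_dotenv

load_dotenv()

@dataclass
class Settings:
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002")
    completion_model: str = os.getenv("COMPLETION_MODEL", "gpt-3.5-turbo")
    max_tokens: int = int(os.getenv("MAX_TOKENS", "500"))

settings = Settings()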
Deployment Considerations
- Scalability: Process independent queries concurrently with asyncio:
import asyncio
from typing import List

async def async_process_queries(qa_chain, queries: List[str]):
    # safe_query is synchronous, so run each call in a worker thread and gather the results
    tasks = [asyncio.to_thread(safe_query, qa_chain, query) for query in queries]
    return await asyncio.gather(*tasks)
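Called from synchronous code, assuming a qa_chain from earlier (the queries are placeholders), that looks like:

results = asyncio.run(async_process_queries(qa_chain, [
    "What is covered in chapter 1?",
    "What is covered in chapter 2?",
]))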
- Monitoring: Implement proper logging and monitoring:
import prometheus_client as prom
query_latency = prom.Histogram('query_latency_seconds', 'Time spent processing queries')
query_counter = prom.Counter('queries_total', 'Total number of queries processed')
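To actually record something, wrap the query path and expose the metrics endpoint. A minimal sketch; the port number is arbitrary:

prom.start_http_server(8001)  # expose /metrics for Prometheus to scrape

def monitored_query(qa_chain, query: str):
    query_counter.inc()
    with query_latency.time():  # records elapsed time into the histogram
        return safe_query(qa_chain, query)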
- Cost Management: Track token usage:
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: English text averages roughly 1.3 tokens per whitespace-separated word
    return int(len(text.split()) * 1.3)

def track_usage(prompt: str, response: str):
    prompt_tokens = estimate_tokens(prompt)
    response_tokens = estimate_tokens(response)
    logger.info(f"Usage - Prompt: {prompt_tokens}, Response: {response_tokens}")
Conclusion
Building LLM applications requires careful consideration of various components and their integration. By following these patterns and best practices, you can create robust, production-ready applications that leverage the power of OpenAI's models while maintaining scalability and reliability.
Remember to:
- Always handle API keys securely
- Implement proper error handling and monitoring
- Consider rate limits and costs
- Cache responses when possible
- Use async operations for better performance
The complete code for this tutorial is available on GitHub.