Advanced RAG Techniques

Retrieval-Augmented Generation (RAG) improves AI responses by retrieving relevant information from a knowledge base and supplying it to the model as context. This guide explores advanced techniques for implementing RAG with the Memory API.

Semantic search goes beyond simple keyword matching to find content based on meaning and intent. With the Memory API, it finds memories that are conceptually similar to your query, even if they don't contain the exact same words.

To get the most out of semantic search with the Memory API, consider these optimization techniques:

The way you phrase your search query significantly impacts results:

// Less effective query
const basicQuery = "user preferences";
// More effective query with context and specificity
const enhancedQuery = "User preferences for interface customization and notification settings";

More specific, contextual queries yield better results by providing clearer semantic intent.
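
One way to apply this consistently is to assemble queries from a short topic plus context your application already holds. The sketch below is only an illustration: `buildSearchQuery` and its `subject` and `aspects` options are hypothetical names, not part of the Memory API.

```javascript
// Hypothetical helper (not part of the Memory API): expands a short topic
// into a more specific query using context the application already knows.
function buildSearchQuery(topic, { subject = '', aspects = [] } = {}) {
  const parts = [subject, topic].map(s => s.trim()).filter(Boolean);
  let query = parts.join(' ');
  const cleaned = aspects.map(a => a.trim()).filter(Boolean);
  if (cleaned.length > 0) {
    query += ` for ${cleaned.join(' and ')}`;
  }
  return query;
}

const enhanced = buildSearchQuery('preferences', {
  subject: 'User',
  aspects: ['interface customization', 'notification settings'],
});
console.log(enhanced);
// "User preferences for interface customization and notification settings"
```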

Combine semantic search with metadata filtering for more precise results:

// Semantic search with metadata filtering
const response = await fetch('https://api.example.com/memories/semantic-search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    tenantId: "tenant-123",
    query: "User interface preferences",
    tags: ["preferences", "ui"],
    fromDate: "2023-01-01",
    limit: 5,
    minScore: 0.7
  })
});

The minScore parameter controls the minimum similarity threshold for results:

// Higher threshold for stricter matching
const strictSearch = {
  query: "Dark mode preferences",
  minScore: 0.8 // Only very similar results
};

// Lower threshold for broader matching
const broadSearch = {
  query: "Dark mode preferences",
  minScore: 0.5 // More diverse results
};

Start with a threshold around 0.7 and adjust based on your specific needs. Lower thresholds return more results but may include less relevant matches.
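
The adjustment can also be automated. The following sketch starts at 0.7 and relaxes the threshold step by step until enough results come back. It assumes a `search` function that you supply (for example, a thin wrapper around the semantic search request) which resolves to an array of memories; `searchWithFallback` and its options are hypothetical names chosen for this example.

```javascript
// Sketch: relax minScore step by step until enough results come back.
// `search` is an injected async function (an assumption for this example)
// that takes { query, minScore } and resolves to an array of memories.
async function searchWithFallback(search, query, options = {}) {
  const { start = 0.7, floor = 0.5, step = 0.1, minResults = 3 } = options;
  if (step <= 0) throw new Error('step must be positive');
  let results = [];
  let used = start;
  for (let i = 0; ; i++) {
    // Round to two decimals to avoid floating-point drift between steps
    const threshold = Math.round((start - i * step) * 100) / 100;
    if (threshold < floor) break;
    used = threshold;
    results = await search({ query, minScore: threshold });
    if (results.length >= minResults) break;
  }
  return { results, minScore: used };
}

// In-memory stand-in for the API call, used only to demonstrate the helper.
const fakeMemories = [
  { content: 'Prefers dark mode', score: 0.82 },
  { content: 'Uses high-contrast theme', score: 0.64 },
  { content: 'Disabled email notifications', score: 0.55 },
];
const fakeSearch = async ({ minScore }) =>
  fakeMemories.filter(m => m.score >= minScore);

searchWithFallback(fakeSearch, 'Dark mode preferences').then(({ results, minScore }) => {
  console.log(minScore, results.length); // 0.5 3
});
```

Keeping a floor prevents the fallback from drifting into results that are too loosely related to be useful.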

For complex queries, implement a multi-stage retrieval process:

  1. Initial Broad Search: Retrieve a larger set of potentially relevant memories
  2. Reranking: Apply additional criteria to rank the initial results
  3. Filtering: Remove irrelevant or redundant information

// Example of multi-stage retrieval
async function multiStageRetrieval(query, tenantId) {
  // Stage 1: Initial broad search
  const initialResponse = await fetch('https://api.example.com/memories/semantic-search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      tenantId,
      query,
      minScore: 0.5,
      limit: 20
    })
  });
  if (!initialResponse.ok) {
    throw new Error(`Semantic search failed with status ${initialResponse.status}`);
  }
  const initialResults = await initialResponse.json();

  // Stage 2: Reranking (example: blend relevance with recency)
  // Each memory gets its own recency score so the recency term does not
  // cancel out in the comparison. 1 / (1 + age in days) is one possible
  // normalization: 1 for brand-new memories, approaching 0 as they age.
  const msPerDay = 1000 * 60 * 60 * 24;
  const now = Date.now();
  const combinedScore = (memory) => {
    const ageInDays = Math.max(0, (now - new Date(memory.createdAt).getTime()) / msPerDay);
    const recencyScore = 1 / (1 + ageInDays);
    return (memory.score * 0.7) + (recencyScore * 0.3);
  };
  // Copy before sorting so the parsed response is not mutated
  const reranked = [...initialResults.memories].sort(
    (a, b) => combinedScore(b) - combinedScore(a)
  );

  // Stage 3: Filtering (example: remove duplicates)
  const uniqueContent = new Set();
  const filtered = reranked.filter(memory => {
    const isDuplicate = uniqueContent.has(memory.content);
    uniqueContent.add(memory.content);
    return !isDuplicate;
  });

  return filtered.slice(0, 5); // Return top 5 after processing
}

Combine different search methods for more robust results:

async function hybridSearch(query, entityId, tenantId) {
  // Get entity-related memories
  const entityResponse = await fetch(`https://api.example.com/entities/${entityId}/memories`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  const entityMemories = await entityResponse.json();

  // Get semantically similar memories
  const semanticResponse = await fetch('https://api.example.com/memories/semantic-search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      tenantId,
      query,
      minScore: 0.7
    })
  });
  const semanticMemories = await semanticResponse.json();

  // Combine and deduplicate results (by memory id)
  const allMemories = [...entityMemories.memories, ...semanticMemories.memories];
  const uniqueMemories = Array.from(new Map(allMemories.map(m => [m.id, m])).values());
  return uniqueMemories;
}
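
Concatenating and deduplicating keeps every memory but loses any sense of ranking across the two sources. If you need a single ordered list, one common option is reciprocal rank fusion, sketched below. This is a swapped-in technique, not something the Memory API provides: `fuseByRank` is a hypothetical helper, `k = 60` is just a conventional default, and the sketch assumes each input list is already ordered best-first.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// so items ranked highly in several lists rise to the top.
// Assumes every memory has a unique `id` and each list is sorted best-first.
function fuseByRank(resultLists, k = 60) {
  const fused = new Map(); // id -> { memory, fusedScore }
  for (const list of resultLists) {
    list.forEach((memory, index) => {
      const entry = fused.get(memory.id) ?? { memory, fusedScore: 0 };
      entry.fusedScore += 1 / (k + index + 1); // ranks are 1-based
      fused.set(memory.id, entry);
    });
  }
  return [...fused.values()]
    .sort((a, b) => b.fusedScore - a.fusedScore)
    .map(({ memory, fusedScore }) => ({ ...memory, fusedScore }));
}

// Sample data standing in for the two API responses
const entityList = [{ id: 'm1' }, { id: 'm2' }, { id: 'm3' }];
const semanticList = [{ id: 'm2' }, { id: 'm4' }, { id: 'm1' }];

console.log(fuseByRank([entityList, semanticList]).map(m => m.id));
// [ 'm2', 'm1', 'm4', 'm3' ]
```

Memories that appear in both lists (`m2` and `m1` here) outrank those found by only one method, which is usually the behavior you want from a hybrid search.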

Effective RAG isn’t just about retrieval; it’s also about building coherent context:

Order memories chronologically to maintain narrative coherence:

// Copy before sorting: Array.prototype.sort reorders the array in place
const orderedMemories = [...memories].sort((a, b) =>
  new Date(a.createdAt) - new Date(b.createdAt)
);

Weight memories by relevance to the current query:

const weightedContext = memories.map(memory => ({
  content: memory.content,
  weight: memory.score // Similarity score from semantic search
}));
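
Chronological ordering and relevance weighting can be combined: choose which memories to include by score, then present the chosen ones oldest-first. The sketch below assumes each memory carries the `score` and `createdAt` fields used above; `selectContext` is a hypothetical helper name.

```javascript
// Keep the `limit` highest-scoring memories, then order them oldest-first
// so the resulting context still reads as a coherent timeline.
function selectContext(memories, limit = 3) {
  return [...memories]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .sort((a, b) => new Date(a.createdAt) - new Date(b.createdAt));
}

// Sample data for illustration only
const sampleMemories = [
  { content: 'A', score: 0.9, createdAt: '2023-03-01' },
  { content: 'B', score: 0.6, createdAt: '2023-01-01' },
  { content: 'C', score: 0.8, createdAt: '2023-02-01' },
  { content: 'D', score: 0.75, createdAt: '2023-01-15' },
];

console.log(selectContext(sampleMemories).map(m => m.content));
// [ 'D', 'C', 'A' ]
```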

Build context around specific entities:

async function buildEntityContext(entityId) {
  // Get entity details
  const entityResponse = await fetch(`https://api.example.com/entities/${entityId}`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  const entity = await entityResponse.json();

  // Get related memories
  const memoriesResponse = await fetch(`https://api.example.com/entities/${entityId}/memories`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  const memories = await memoriesResponse.json();

  // Get related entities
  const relationsResponse = await fetch(`https://api.example.com/entities/${entityId}/relationships`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  const relations = await relationsResponse.json();

  // Combine into rich context
  return {
    entity,
    memories: memories.memories,
    relations: relations.relationships
  };
}

When building context, focus on quality over quantity. Too much context can dilute the relevance of your results.
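
A simple way to enforce this is to cap the context at a fixed size and fill it with the most relevant memories first. The sketch below uses character count as a rough stand-in for tokens, which is an assumption; `fitToBudget` and `maxChars` are hypothetical names, and a real implementation would likely count tokens with the tokenizer of the target model.

```javascript
// Greedily fill a character budget with the highest-scoring memories.
// Character length is only an approximation of token usage.
function fitToBudget(memories, maxChars = 500) {
  const selected = [];
  let used = 0;
  const byScore = [...memories].sort((a, b) => b.score - a.score);
  for (const memory of byScore) {
    const cost = memory.content.length;
    if (used + cost > maxChars) continue; // skip items that do not fit
    selected.push(memory);
    used += cost;
  }
  return selected;
}

// Sample data for illustration only
const candidates = [
  { content: 'a'.repeat(300), score: 0.9 },
  { content: 'b'.repeat(300), score: 0.8 },
  { content: 'c'.repeat(100), score: 0.7 },
];

console.log(fitToBudget(candidates).map(m => m.score));
// [ 0.9, 0.7 ]
```

The second memory is skipped because it would exceed the budget, while the shorter third memory still fits.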

Optimize your RAG implementation for better performance:

  1. Cache frequent queries: Store results for common queries to reduce processing time
  2. Batch related requests: Combine multiple related queries into a single request
  3. Progressive loading: Retrieve essential information first, then load additional details as needed
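
As an illustration of the first point, the sketch below wraps a search function with a small in-memory cache that expires entries after a fixed time. `withCache`, `ttlMs`, and `now` are hypothetical names for this example, and the wrapped `search` function is something you would supply yourself. A production setup might use a shared cache instead.

```javascript
// Wraps an async search function with an in-memory cache keyed by its
// parameters. Note: JSON.stringify keys depend on property order.
function withCache(search, { ttlMs = 60000, now = Date.now } = {}) {
  const cache = new Map(); // key -> { expiresAt, promise }
  return function cachedSearch(params) {
    const key = JSON.stringify(params);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.promise;

    // Caching the promise lets concurrent identical queries share one request
    const promise = Promise.resolve().then(() => search(params));
    cache.set(key, { expiresAt: now() + ttlMs, promise });
    // Do not keep failed requests in the cache
    promise.catch(() => {
      const current = cache.get(key);
      if (current && current.promise === promise) cache.delete(key);
    });
    return promise;
  };
}

// Stand-in for a real API call, counting how often it is invoked
let calls = 0;
const countingSearch = async ({ query }) => {
  calls += 1;
  return [`result for ${query}`];
};

const cachedSearch = withCache(countingSearch);
Promise.all([
  cachedSearch({ query: 'dark mode' }),
  cachedSearch({ query: 'dark mode' }),
]).then(() => console.log(calls)); // 1
```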

Follow these best practices as you build and refine your implementation:

  1. Start simple: Begin with basic semantic search before implementing advanced techniques
  2. Test with real queries: Evaluate performance with actual user queries, not just theoretical examples
  3. Iterate based on feedback: Continuously refine your approach based on the quality of results
  4. Balance precision and recall: Adjust thresholds to find the right balance for your use case
  5. Monitor performance: Track key metrics like response time and relevance to identify areas for improvement
The most effective RAG implementations are those that are continuously refined based on real-world usage patterns and feedback.

You’ve completed the Memory API learning path! You now have a comprehensive understanding of:

  1. Creating and managing different types of memories
  2. Retrieving memories efficiently using various methods
  3. Working with entities and their relationships
  4. Implementing advanced RAG techniques for better context

With these skills, you’re well-equipped to build sophisticated AI applications that leverage the power of the Memory API for enhanced context and knowledge management.