Advanced RAG Techniques

Retrieval-Augmented Generation (RAG) improves AI responses by retrieving relevant information from a knowledge base and supplying it to the model as context. This guide explores advanced techniques for implementing RAG with the Memory API.

Understanding Semantic Search

Semantic search goes beyond simple keyword matching to find content based on meaning and intent. With the Memory API’s semantic search, a query returns memories that are conceptually similar to it, even if they don't contain the exact same words.

Optimizing RAG Queries

To get the most out of semantic search with the Memory API, consider these optimization techniques:

1. Query Formulation

The way you phrase your search query significantly impacts results:

// Less effective query
const basicQuery = "user preferences";

// More effective query with context and specificity
const enhancedQuery = "User preferences for interface customization and notification settings";

More specific, contextual queries yield better results by providing clearer semantic intent.

2. Filtering with Metadata

Combine semantic search with metadata filtering for more precise results:

// Semantic search with metadata filtering
const response = await fetch('https://api.example.com/memories/semantic-search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    tenantId: "tenant-123",
    query: "User interface preferences",
    tags: ["preferences", "ui"],
    fromDate: "2023-01-01",
    limit: 5,
    minScore: 0.7
  })
});
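
The request above does not check the HTTP status before the response is used. The following is a minimal sketch of a reusable wrapper, assuming the endpoint answers with JSON of the form { memories: [...] } (the shape the later examples in this guide read) and reports failures through non-2xx status codes. The names semanticSearch and fetchImpl are hypothetical and introduced only for illustration.

```javascript
// Hypothetical helper: wraps the semantic-search call and surfaces HTTP errors.
// Assumes a JSON response body of the form { memories: [...] }.
// `fetchImpl` defaults to the global fetch; pass a stub to test without a network.
async function semanticSearch(params, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl('https://api.example.com/memories/semantic-search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`
    },
    body: JSON.stringify(params)
  });

  // fetch only rejects on network failure, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Semantic search failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.memories ?? [];
}
```

Passing the filters as a plain object keeps the call site short, for example semanticSearch({ tenantId: "tenant-123", query: "User interface preferences", tags: ["ui"], minScore: 0.7 }, apiKey).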

3. Adjusting Similarity Thresholds

The minScore parameter controls the minimum similarity threshold for results:

// Higher threshold for stricter matching
const strictSearch = {
  query: "Dark mode preferences",
  minScore: 0.8  // Only very similar results
};

// Lower threshold for broader matching
const broadSearch = {
  query: "Dark mode preferences",
  minScore: 0.5  // More diverse results
};

Start with a threshold around 0.7 and adjust based on your specific needs. Lower thresholds return more results but may include less relevant matches.
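
One way to apply this advice automatically is to start strict and relax the threshold only when too few results come back. The sketch below is illustrative rather than part of the Memory API: searchWithFallback, its options, and the searchFn parameter (any async function that accepts a request body and resolves to an array of memories) are all hypothetical names.

```javascript
// Hypothetical helper: lowers minScore step by step until enough results come back.
// `searchFn` is any async function that takes a request body ({ query, minScore, ... })
// and resolves to an array of memories.
async function searchWithFallback(searchFn, baseParams, options = {}) {
  const { start = 0.7, floor = 0.5, step = 0.1, minResults = 3 } = options;
  let results = [];
  let usedScore = start;

  // Count whole steps instead of repeatedly subtracting, to avoid float drift
  const steps = Math.round((start - floor) / step);
  for (let i = 0; i <= steps; i++) {
    usedScore = Number((start - i * step).toFixed(4));
    results = await searchFn({ ...baseParams, minScore: usedScore });
    if (results.length >= minResults) break;
  }

  // Report which threshold was finally used so callers can log it
  return { results, minScore: usedScore };
}
```

Logging the returned minScore over time shows how often the strict threshold is enough, which helps when choosing a default for your data.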

Multi-Stage Retrieval

For complex queries, implement a multi-stage retrieval process:

Initial Broad Search: Retrieve a larger set of potentially relevant memories
Reranking: Apply additional criteria to rank the initial results
Filtering: Remove irrelevant or redundant information

// Example of multi-stage retrieval
async function multiStageRetrieval(query, tenantId) {
  // Stage 1: Initial broad search
  const initialResponse = await fetch('https://api.example.com/memories/semantic-search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      tenantId,
      query,
      minScore: 0.5,
      limit: 20
    })
  });

  const initialResults = await initialResponse.json();

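  // Stage 2: Reranking (example: blend relevance with recency)
  const now = Date.now();
  const MS_PER_DAY = 1000 * 60 * 60 * 24;
  const combinedScore = (memory) => {
    // Recency decays from 1 (just created) toward 0 as the memory ages
    const ageDays = Math.max(0, (now - new Date(memory.createdAt).getTime()) / MS_PER_DAY);
    return (memory.score * 0.7) + ((1 / (1 + ageDays)) * 0.3);
  };
  // Copy before sorting so the API response is not mutated
  const reranked = [...initialResults.memories].sort((a, b) => combinedScore(b) - combinedScore(a));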

  // Stage 3: Filtering (example: remove duplicates)
  const uniqueContent = new Set();
  const filtered = reranked.filter(memory => {
    const isDuplicate = uniqueContent.has(memory.content);
    uniqueContent.add(memory.content);
    return !isDuplicate;
  });

  return filtered.slice(0, 5); // Return top 5 after processing
}

Hybrid Search Approaches

Combine different search methods for more robust results:

Entity-Based + Semantic Search

async function hybridSearch(query, entityId, tenantId) {
  // Get entity-related memories
  const entityResponse = await fetch(`https://api.example.com/entities/${entityId}/memories`, {
    headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
  });
  const entityMemories = await entityResponse.json();

  // Get semantically similar memories
  const semanticResponse = await fetch('https://api.example.com/memories/semantic-search', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      tenantId,
      query,
      minScore: 0.7
    })
  });
  const semanticMemories = await semanticResponse.json();

  // Combine and deduplicate results
  const allMemories = [...entityMemories.memories, ...semanticMemories.memories];
  const uniqueMemories = Array.from(new Map(allMemories.map(m => [m.id, m])).values());

  return uniqueMemories;
}
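
The function above merges and deduplicates the two result sets but does not rank across them. One common option is reciprocal rank fusion, a general ranking technique rather than a Memory API feature: each memory earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below assumes each input list is already ordered best-first (whether the entity endpoint returns memories in a meaningful order is an assumption) and that memories carry an id. The function name and the fusedScore field are hypothetical; k = 60 is a conventional default.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per memory,
// where rank is the 1-based position of the memory in that list.
function reciprocalRankFusion(rankedLists, k = 60) {
  const fused = new Map();

  for (const list of rankedLists) {
    list.forEach((memory, index) => {
      const entry = fused.get(memory.id) ?? { memory, fusedScore: 0 };
      entry.fusedScore += 1 / (k + index + 1);
      fused.set(memory.id, entry);
    });
  }

  // Highest fused score first; memories found in several lists rise to the top
  return [...fused.values()]
    .sort((a, b) => b.fusedScore - a.fusedScore)
    .map(entry => ({ ...entry.memory, fusedScore: entry.fusedScore }));
}
```

Because it uses only ranks, this approach avoids comparing similarity scores with results from a source that has no score at all.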

Context Building Strategies

Effective RAG isn’t just about retrieval; it’s also about building coherent context:

1. Chronological Ordering

Order memories chronologically to maintain narrative coherence:

// Copy first: Array.prototype.sort reorders the original array in place
const orderedMemories = [...memories].sort((a, b) =>
  new Date(a.createdAt) - new Date(b.createdAt)
);

2. Relevance Weighting

Weight memories by relevance to the current query:

const weightedContext = memories.map(memory => ({
  content: memory.content,
  weight: memory.score // Similarity score from semantic search
}));

3. Entity-Centric Context

Build context around specific entities:

async function buildEntityContext(entityId) {
  const headers = { 'Authorization': 'Bearer YOUR_API_KEY' };
  const baseUrl = `https://api.example.com/entities/${entityId}`;

  // The three requests are independent, so issue them in parallel
  const [entityResponse, memoriesResponse, relationsResponse] = await Promise.all([
    fetch(baseUrl, { headers }),                    // Entity details
    fetch(`${baseUrl}/memories`, { headers }),      // Related memories
    fetch(`${baseUrl}/relationships`, { headers })  // Related entities
  ]);

  const [entity, memories, relations] = await Promise.all([
    entityResponse.json(),
    memoriesResponse.json(),
    relationsResponse.json()
  ]);

  // Combine into rich context
  return {
    entity,
    memories: memories.memories,
    relations: relations.relationships
  };
}

When building context, focus on quality over quantity. Too much context can dilute the relevance of your results.
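
A simple way to enforce quality over quantity is to cap the context at a fixed budget and fill it with the highest-scoring memories first. The sketch below is illustrative: buildContext and maxChars are hypothetical names, and character count stands in for a token count, which is only a rough approximation. It assumes each memory has a content string and a numeric score.

```javascript
// Hypothetical helper: keeps the highest-scoring memories that fit a character budget.
// Assumes each memory has `content` (string) and `score` (number).
function buildContext(memories, maxChars = 2000) {
  // Copy before sorting so the caller's array is left untouched
  const byRelevance = [...memories].sort((a, b) => b.score - a.score);
  const selected = [];
  let used = 0;

  for (const memory of byRelevance) {
    const cost = memory.content.length + 1; // +1 for the newline separator
    if (used + cost > maxChars) continue;   // too large; a shorter memory may still fit
    selected.push(memory);
    used += cost;
  }

  return selected.map(memory => memory.content).join('\n');
}
```

If narrative order matters more than rank, the selected memories can be re-sorted by createdAt before joining.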

Performance Optimization

Optimize your RAG implementation for better performance:

Cache frequent queries: Store results for common queries to reduce processing time
Batch related requests: Combine multiple related queries into a single request
Progressive loading: Retrieve essential information first, then load additional details as needed
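
As an illustration of the first item, the following is a minimal in-memory cache with a time-to-live. It is a sketch under stated assumptions, not a Memory API feature: QueryCache and its methods are hypothetical names, entries live only in the current process, and the cache key assumes request bodies are built with a consistent property order.

```javascript
// Hypothetical in-memory cache with a time-to-live; `now` is injectable for testing.
class QueryCache {
  constructor(ttlMs = 60_000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Build a key from the request body (tenant, query, filters).
  // Assumes bodies are built with a consistent property order.
  static keyFor(params) {
    return JSON.stringify(params);
  }

  // Returns the cached value, or undefined when missing or expired
  get(params) {
    const key = QueryCache.keyFor(params);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired: drop it so the Map does not grow stale
      return undefined;
    }
    return entry.value;
  }

  set(params, value) {
    this.entries.set(QueryCache.keyFor(params), { value, storedAt: this.now() });
  }
}
```

A typical pattern is to call get with the request body before searching and set with the results afterwards. Keep the time-to-live short if memories change often, since cached results will not reflect new writes.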

Best Practices for RAG Implementation

Start simple: Begin with basic semantic search before implementing advanced techniques
Test with real queries: Evaluate performance with actual user queries, not just theoretical examples
Iterate based on feedback: Continuously refine your approach based on the quality of results
Balance precision and recall: Adjust thresholds to find the right balance for your use case
Monitor performance: Track key metrics like response time and relevance to identify areas for improvement

The most effective RAG implementations are those that are continuously refined based on real-world usage patterns and feedback.
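
To make the precision and recall trade-off measurable, both can be computed for a query once you have a hand-labeled set of memories that should have been returned. The sketch below assumes such labels exist and that memories are identified by id; precisionRecall is a hypothetical helper name.

```javascript
// Precision: share of retrieved memories that are relevant.
// Recall: share of relevant memories that were retrieved.
function precisionRecall(retrievedIds, relevantIds) {
  const relevant = new Set(relevantIds);
  const retrieved = new Set(retrievedIds);

  let hits = 0;
  for (const id of retrieved) {
    if (relevant.has(id)) hits++;
  }

  return {
    // Define both as 0 when the denominator is empty to avoid dividing by zero
    precision: retrieved.size === 0 ? 0 : hits / retrieved.size,
    recall: relevant.size === 0 ? 0 : hits / relevant.size
  };
}
```

Running this over the same labeled queries at several minScore values shows how raising the threshold tends to trade recall for precision on your own data.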

Congratulations!

You’ve completed the Memory API learning path! You now have a comprehensive understanding of:

Creating and managing different types of memories
Retrieving memories efficiently using various methods
Working with entities and their relationships
Implementing advanced RAG techniques for better context

With these skills, you’re well-equipped to build sophisticated AI applications that leverage the power of the Memory API for enhanced context and knowledge management.