
This is the third part of a series of posts where I share my learnings from building a GitHub Copilot extension: RAG, bringing your own model, and more.
- First part here
- Second part here
Introduction
Demo Extension in Marketplace
As AI continues to revolutionize software development, GitHub Copilot has emerged as a powerful coding assistant. However, there are scenarios where you might want to enhance Copilot’s capabilities with domain-specific knowledge using Retrieval-Augmented Generation (RAG). In this article, I’ll walk you through creating a custom VS Code extension that integrates with GitHub Copilot while leveraging your own RAG system with Azure OpenAI.
What is Retrieval-Augmented Generation (RAG)?
RAG combines the generative capabilities of large language models with a retrieval system that fetches relevant information from a knowledge base. Instead of relying solely on the model’s pre-trained knowledge, RAG allows the model to reference specific, up-to-date information before generating a response.
Benefits of RAG for developer tools include:
- Domain-specific expertise: Access to specialized documentation or codebase knowledge
- Reduced hallucinations: Ground the AI responses in factual information
- Up-to-date information: Reference the latest documentation that might not be in the model’s training data
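To make the retrieve-then-augment idea concrete, here is a minimal, self-contained sketch in TypeScript. The embedding vectors and the in-memory "knowledge base" are stand-ins; a real system would call an embedding model and a vector database, as we do later in this post:

```typescript
// Minimal RAG retrieval sketch: cosine similarity over a tiny in-memory knowledge base.
interface KnowledgeEntry { content: string; embedding: number[]; }

function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieve the top-k entries most similar to the query embedding.
function retrieve(queryEmbedding: number[], kb: KnowledgeEntry[], k: number): KnowledgeEntry[] {
    return [...kb]
        .sort((x, y) =>
            cosineSimilarity(queryEmbedding, y.embedding) -
            cosineSimilarity(queryEmbedding, x.embedding))
        .slice(0, k);
}

// Augment the user prompt with the retrieved context before calling the LLM.
function buildAugmentedPrompt(query: string, retrieved: KnowledgeEntry[]): string {
    const context = retrieved.map(r => `Knowledge reference: ${r.content}`).join('\n');
    return `${context}\n\nUser question: ${query}`;
}
```

The LLM then answers from the augmented prompt rather than from its training data alone, which is what grounds the response.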
Architecture Overview
Our VS Code extension architecture consists of:
- VS Code Extension Frontend: Handles user interactions and displays AI responses
- Command Processors: Process different types of user queries
- Backend RAG API: A .NET Core API that performs:
  - Vector embedding of user queries
  - Semantic search against knowledge bases
  - Integration with Azure OpenAI for response generation
Implementation Guide
Setting Up the VS Code Extension
First, create a new VS Code extension project:
npm install -g yo generator-code
yo code
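If you plan to expose a chat participant with slash commands (like the /rag-search used later in this post), they are declared in your extension's package.json under the chatParticipants contribution point. A minimal example, with ids and descriptions that are illustrative only:

```json
{
  "contributes": {
    "chatParticipants": [
      {
        "id": "rag-copilot",
        "name": "rag",
        "description": "Ask anything about your codebase or documentation.",
        "commands": [
          {
            "name": "rag-search",
            "description": "Search your knowledge base and answer with RAG"
          }
        ]
      }
    ]
  }
}
```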
Creating the Command Processor
The command processor is responsible for handling user requests, sending them to your RAG backend, and streaming the responses back to the user:
import * as vscode from 'vscode';
import logger from '../common/logger';
import handleError from '../common/errorHandler';
import { Utils, SlimChatMessage, SlimChatRoles } from '../common/utils';
import { IOctolampChatResult } from '../participants/IOctolampChatResult';

const CommandRagProcessor = {
    processCommand: async (
        request: vscode.ChatRequest,
        context: vscode.ChatContext,
        stream: vscode.ChatResponseStream,
        token: vscode.CancellationToken): Promise<IOctolampChatResult> => {
        try {
            const history = Utils.getChatHistory(context);

            // Keep conversation context manageable
            const trimmedHistory = history.slice(-5);
            trimmedHistory.push({
                role: SlimChatRoles.user,
                content: request.prompt
            });

            const payload = {
                id: (request.toolInvocationToken as any).sessionId,
                command: request.command,
                turns: trimmedHistory
            };

            // Call your RAG backend API
            const response = await fetch('https://your-rag-api-url/vscode/chat', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify(payload)
            });

            if (response.ok && response.body != null) {
                // Stream the response back to VS Code,
                // stopping early if the user cancels the request
                const reader = response.body.getReader();
                const decoder = new TextDecoder();
                while (!token.isCancellationRequested) {
                    const { done, value } = await reader.read();
                    if (done) break;
                    const chunk = decoder.decode(value, { stream: true });
                    stream.markdown(chunk);
                }
            }
        } catch (err) {
            handleError(logger, err, stream);
        }
        return { metadata: { command: request.command } };
    }
};

export { CommandRagProcessor };
Building the RAG Backend
The RAG backend is responsible for:
- Receiving queries from the VS Code extension
- Generating embeddings for the query
- Searching a vector database for relevant content
- Augmenting the prompt with retrieved information
- Generating and streaming a response using Azure OpenAI
Here’s a simplified .NET Core API endpoint that handles these responsibilities:
[ApiController]
[Route("vscode/chat")]
public class ChatController : ControllerBase
{
    private readonly EmbeddingService _embeddingService;
    private readonly SearchService _searchService;
    private readonly CopilotService _copilotService;

    public ChatController(
        EmbeddingService embeddingService,
        SearchService searchService,
        CopilotService copilotService)
    {
        _embeddingService = embeddingService;
        _searchService = searchService;
        _copilotService = copilotService;
    }

    [HttpPost]
    public async Task HandleAsync(
        [FromBody] VSCodeConversation conversation,
        CancellationToken cancellationToken)
    {
        if (conversation?.Turns == null || !conversation.Turns.Any())
        {
            Response.StatusCode = 400;
            await Response.WriteAsync("Invalid payload", cancellationToken);
            return;
        }

        // Get the last user message
        var lastTurn = conversation.Turns.Last();

        // Generate embedding for the query
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(
            lastTurn.Content,
            cancellationToken);

        // Search for relevant content
        var contentRecords = await _searchService.SearchAsync(
            queryEmbedding,
            cancellationToken);

        // Augment the conversation with retrieved knowledge
        if (contentRecords?.Any() == true)
        {
            foreach (var item in contentRecords)
            {
                conversation.Turns.Add(new VSCodeChatTurn {
                    Role = "system",
                    Content = $"Knowledge reference: {item.Content}"
                });
            }
        }

        // Generate and stream the response
        await _copilotService.GenerateResponseAsync(
            conversation.ToOpenAICompatible(),
            Response.BodyWriter,
            cancellationToken);
    }
}
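The ToOpenAICompatible call above maps the conversation turns to OpenAI-style chat-completion messages. The backend does this in C#; sketched here in TypeScript for illustration, with the shapes and the ordering choice being assumptions of mine. One reasonable approach is to move the injected knowledge (system) turns to the front so they act as grounding context:

```typescript
// The conversation turns the extension sends, and the OpenAI-style
// message list the backend forwards to Azure OpenAI.
interface ChatTurn { role: 'user' | 'assistant' | 'system'; content: string; }
interface OpenAIMessage { role: string; content: string; }

// Map turns to chat-completion messages, putting system (knowledge)
// turns first so the model sees the grounding context before the dialogue.
function toOpenAICompatible(turns: ChatTurn[]): OpenAIMessage[] {
    const system = turns.filter(t => t.role === 'system');
    const dialogue = turns.filter(t => t.role !== 'system');
    return [...system, ...dialogue].map(t => ({ role: t.role, content: t.content }));
}
```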
Connecting the Extension to the Chat Interface
To integrate your extension with VS Code’s chat interface, you need to register a chat participant:
// Register the chat participant; slash commands like /rag-search are
// declared in package.json under the "chatParticipants" contribution point
const participant = vscode.chat.createChatParticipant(
    'rag-copilot',
    CommandRagProcessor.processCommand
);
participant.iconPath = vscode.Uri.joinPath(context.extensionUri, 'resources', 'icon.png');
context.subscriptions.push(participant);
Setting Up the Search Service
The search service is a crucial component that retrieves relevant information from your knowledge base:
public class SearchService
{
    private readonly IVectorDatabase _vectorDb;

    public SearchService(IVectorDatabase vectorDb)
    {
        _vectorDb = vectorDb;
    }

    public async Task<ContentRecord[]> SearchAsync(
        float[] queryEmbedding,
        CancellationToken cancellationToken)
    {
        // Search the vector database for similar content
        var results = await _vectorDb.SearchAsync(
            queryEmbedding,
            limit: 5,
            scoreThreshold: 0.7f,
            cancellationToken);

        return results;
    }
}
Deploying Your RAG System
To deploy your RAG system:
- Set up an Azure OpenAI resource
- Deploy a vector database (Azure AI Search, formerly Azure Cognitive Search, or a specialized vector DB)
- Deploy your .NET Core backend as an Azure App Service
- Package and publish your VS Code extension
Testing and Usage
Once deployed, users can interact with your RAG-enabled extension:
- Open VS Code
- Access the Chat interface (Ctrl+Shift+P > “Chat”)
- Use your custom command (e.g., /rag-search How do I implement authentication in our API?)
- The extension will send the query to your backend
- The backend will retrieve relevant information and augment the prompt
- Azure OpenAI will generate a contextually relevant response
- The response will stream back to VS Code
Conclusion
By building a custom VS Code extension with RAG capabilities, you can significantly enhance the developer experience with contextually relevant AI assistance. This approach combines the power of GitHub Copilot with your organization’s specific knowledge, making it an invaluable tool for development teams.
The integration between VS Code’s extension API, a custom RAG backend, and Azure OpenAI creates a seamless experience that feels like having an expert colleague who knows your codebase inside out.
As RAG technology continues to evolve, we can expect even more sophisticated integrations that further blur the line between AI assistance and human expertise.