
This is the third part of a series of posts where I share my learnings from building a GitHub Copilot extension: RAG, bringing your own model, and more.
- First part here
- Second part here
Introduction
Demo Extension in Marketplace
As AI continues to revolutionize software development, GitHub Copilot has emerged as a powerful coding assistant. However, there are scenarios where you might want to enhance Copilot’s capabilities with domain-specific knowledge using Retrieval-Augmented Generation (RAG). In this article, I’ll walk you through creating a custom VS Code extension that integrates with GitHub Copilot while leveraging your own RAG system with Azure OpenAI.
What is Retrieval-Augmented Generation (RAG)?
RAG combines the generative capabilities of large language models with a retrieval system that fetches relevant information from a knowledge base. Instead of relying solely on the model’s pre-trained knowledge, RAG allows the model to reference specific, up-to-date information before generating a response.
Benefits of RAG for developer tools include:
- Domain-specific expertise: Access to specialized documentation or codebase knowledge
- Reduced hallucinations: Ground the AI responses in factual information
- Up-to-date information: Reference the latest documentation that might not be in the model’s training data
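To make the retrieve-then-augment idea concrete, here is a minimal, self-contained sketch in TypeScript. The embedding vectors and the in-memory "knowledge base" are stand-ins; a real system would call an embedding model and a vector database, as we do later in this post:

```typescript
// Minimal RAG retrieval sketch: cosine similarity over a tiny in-memory knowledge base.
interface KnowledgeEntry { content: string; embedding: number[]; }

function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieve the top-k entries most similar to the query embedding.
function retrieve(queryEmbedding: number[], kb: KnowledgeEntry[], k: number): KnowledgeEntry[] {
    return [...kb]
        .sort((x, y) =>
            cosineSimilarity(queryEmbedding, y.embedding) -
            cosineSimilarity(queryEmbedding, x.embedding))
        .slice(0, k);
}

// Augment the user prompt with the retrieved context before calling the LLM.
function buildAugmentedPrompt(query: string, retrieved: KnowledgeEntry[]): string {
    const context = retrieved.map(r => `Knowledge reference: ${r.content}`).join('\n');
    return `${context}\n\nUser question: ${query}`;
}
```

The LLM then answers from the augmented prompt rather than from its training data alone, which is what grounds the response.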
Architecture Overview
Our VS Code extension architecture consists of:
- VS Code Extension Frontend: Handles user interactions and displays AI responses
- Command Processors: Process different types of user queries
- Backend RAG API: A .NET Core API that performs:
  - Vector embedding of user queries
  - Semantic search against knowledge bases
  - Integration with Azure OpenAI for response generation
Implementation Guide
Setting Up the VS Code Extension
First, create a new VS Code extension project:
npm install -g yo generator-code
yo code
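If you plan to expose a chat participant with slash commands (like the /rag-search used later in this post), they are declared in your extension's package.json under the chatParticipants contribution point. A minimal example, with ids and descriptions that are illustrative only:

```json
{
  "contributes": {
    "chatParticipants": [
      {
        "id": "rag-copilot",
        "name": "rag",
        "description": "Ask anything about your codebase or documentation.",
        "commands": [
          {
            "name": "rag-search",
            "description": "Search your knowledge base and answer with RAG"
          }
        ]
      }
    ]
  }
}
```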
Creating the Command Processor
The command processor is responsible for handling user requests, sending them to your RAG backend, and streaming the responses back to the user:
import * as vscode from 'vscode';
import logger from '../common/logger';
import handleError from '../common/errorHandler';
import { Utils, SlimChatMessage, SlimChatRoles } from '../common/utils';
import { IOctolampChatResult } from '../participants/IOctolampChatResult';

const CommandRagProcessor = {
    processCommand: async (
        request: vscode.ChatRequest,
        context: vscode.ChatContext,
        stream: vscode.ChatResponseStream,
        token: vscode.CancellationToken): Promise<IOctolampChatResult> => {
        try {
            const history = Utils.getChatHistory(context);

            // Keep conversation context manageable
            const trimmedHistory = history.slice(-5);
            trimmedHistory.push({
                role: SlimChatRoles.user,
                content: request.prompt
            });

            const payload = {
                id: (request.toolInvocationToken as any).sessionId,
                command: request.command,
                turns: trimmedHistory
            };

            // Call your RAG backend API
            const response = await fetch('https://your-rag-api-url/vscode/chat', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify(payload)
            });

            if (response.ok && response.body != null) {
                // Stream the response back to VS Code,
                // stopping early if the user cancels the request
                const reader = response.body.getReader();
                const decoder = new TextDecoder();
                while (!token.isCancellationRequested) {
                    const { done, value } = await reader.read();
                    if (done) break;
                    const chunk = decoder.decode(value, { stream: true });
                    stream.markdown(chunk);
                }
            }
        } catch (err) {
            handleError(logger, err, stream);
        }
        return { metadata: { command: request.command } };
    }
};

export { CommandRagProcessor };
Building the RAG Backend
The RAG backend is responsible for:
- Receiving queries from the VS Code extension
- Generating embeddings for the query
- Searching a vector database for relevant content
- Augmenting the prompt with retrieved information
- Generating and streaming a response using Azure OpenAI
Here’s a simplified .NET Core API endpoint that handles these responsibilities:
[ApiController]
[Route("vscode/chat")]
public class ChatController : ControllerBase
{
    private readonly EmbeddingService _embeddingService;
    private readonly SearchService _searchService;
    private readonly CopilotService _copilotService;

    public ChatController(
        EmbeddingService embeddingService,
        SearchService searchService,
        CopilotService copilotService)
    {
        _embeddingService = embeddingService;
        _searchService = searchService;
        _copilotService = copilotService;
    }

    [HttpPost]
    public async Task HandleAsync(
        [FromBody] VSCodeConversation conversation,
        CancellationToken cancellationToken)
    {
        if (conversation?.Turns == null || !conversation.Turns.Any())
        {
            Response.StatusCode = 400;
            await Response.WriteAsync("Invalid payload", cancellationToken);
            return;
        }

        // Get the last user message
        var lastTurn = conversation.Turns.Last();

        // Generate embedding for the query
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(
            lastTurn.Content,
            cancellationToken);

        // Search for relevant content
        var contentRecords = await _searchService.SearchAsync(
            queryEmbedding,
            cancellationToken);

        // Augment the conversation with retrieved knowledge
        if (contentRecords?.Any() == true)
        {
            foreach (var item in contentRecords)
            {
                conversation.Turns.Add(new VSCodeChatTurn {
                    Role = "system",
                    Content = $"Knowledge reference: {item.Content}"
                });
            }
        }

        // Generate and stream the response
        await _copilotService.GenerateResponseAsync(
            conversation.ToOpenAICompatible(),
            Response.BodyWriter,
            cancellationToken);
    }
}
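The ToOpenAICompatible call above maps the conversation turns to OpenAI-style chat-completion messages. The backend does this in C#; sketched here in TypeScript for illustration, with the shapes and the ordering choice being assumptions of mine. One reasonable approach is to move the injected knowledge (system) turns to the front so they act as grounding context:

```typescript
// The conversation turns the extension sends, and the OpenAI-style
// message list the backend forwards to Azure OpenAI.
interface ChatTurn { role: 'user' | 'assistant' | 'system'; content: string; }
interface OpenAIMessage { role: string; content: string; }

// Map turns to chat-completion messages, putting system (knowledge)
// turns first so the model sees the grounding context before the dialogue.
function toOpenAICompatible(turns: ChatTurn[]): OpenAIMessage[] {
    const system = turns.filter(t => t.role === 'system');
    const dialogue = turns.filter(t => t.role !== 'system');
    return [...system, ...dialogue].map(t => ({ role: t.role, content: t.content }));
}
```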
Connecting the Extension to the Chat Interface
To integrate your extension with VS Code’s chat interface, you need to register a chat participant:
// Register the chat participant; slash commands like /rag-search are
// declared in package.json under the "chatParticipants" contribution point
const participant = vscode.chat.createChatParticipant(
    'rag-copilot',
    CommandRagProcessor.processCommand
);
participant.iconPath = vscode.Uri.joinPath(context.extensionUri, 'resources', 'icon.png');
context.subscriptions.push(participant);
Setting Up the Search Service
The search service is a crucial component that retrieves relevant information from your knowledge base:
public class SearchService
{
    private readonly IVectorDatabase _vectorDb;

    public SearchService(IVectorDatabase vectorDb)
    {
        _vectorDb = vectorDb;
    }

    public async Task<ContentRecord[]> SearchAsync(
        float[] queryEmbedding,
        CancellationToken cancellationToken)
    {
        // Search the vector database for similar content
        var results = await _vectorDb.SearchAsync(
            queryEmbedding,
            limit: 5,
            scoreThreshold: 0.7f,
            cancellationToken);

        return results;
    }
}
Deploying Your RAG System
To deploy your RAG system:
- Set up an Azure OpenAI resource
- Deploy a vector database (Azure AI Search, formerly Azure Cognitive Search, or a specialized vector DB)
- Deploy your .NET Core backend as an Azure App Service
- Package and publish your VS Code extension
Testing and Usage
Once deployed, users can interact with your RAG-enabled extension:
- Open VS Code
- Access the Chat interface (Ctrl+Shift+P > “Chat”)
- Use your custom command (e.g., /rag-search How do I implement authentication in our API?)
- The extension will send the query to your backend
- The backend will retrieve relevant information and augment the prompt
- Azure OpenAI will generate a contextually relevant response
- The response will stream back to VS Code
Conclusion
By building a custom VS Code extension with RAG capabilities, you can significantly enhance the developer experience with contextually relevant AI assistance. This approach combines the power of GitHub Copilot with your organization’s specific knowledge, making it an invaluable tool for development teams.
The integration between VS Code’s extension API, a custom RAG backend, and Azure OpenAI creates a seamless experience that feels like having an expert colleague who knows your codebase inside out.
As RAG technology continues to evolve, we can expect even more sophisticated integrations that further blur the line between AI assistance and human expertise.