
Building a Backstage AI Chat Plugin with Azure AI Foundry Agents

This post shows how to add an AI assistant plugin to Backstage that lets internal developers chat with their own domain data using Azure AI Foundry Agents. We cover agent concepts, minimal API usage, plugin structure (frontend + lightweight backend proxy), security (secrets & identity), and deployment guidance.

1. Why an AI Chat Plugin in Backstage?

Backstage is the natural hub for internal dev workflows (catalog, templates, docs). Embedding an AI agent there:

  • Removes context switching (ask infra / service questions in‑portal)
  • Surfaces curated organizational knowledge (vector / index retrieval)
  • Enables secure, auditable interactions over internal data

2. Azure AI Foundry Agents (Overview)

Azure AI Foundry (the successor to Azure OpenAI Studio, with broader orchestration capabilities) introduces an Agents capability: a managed orchestration layer that couples foundation models (GPT, Phi, etc.) with tools, retrieval (vector / hybrid), and multi‑turn state.

Key ideas:

  • Agent: Stateful chat orchestrator bound to a model + optional knowledge (indexes) + tools.
  • Session / Thread: Multi-turn context container (conversation memory managed service-side).
  • Message: User or assistant turn; retrieval augmentation is applied automatically when configured.
  • Retrieval / Your Data: You attach a data source (e.g., Azure AI Search or Vector Store) to enable RAG.
  • Tool invocation: (Optional) Functions or connectors that the agent can call.

Typical flow:

  1. Create (or reuse) an agent specifying model + retrieval config.
  2. Create a thread (session) for the user.
  3. Append a user message.
  4. Run the agent → poll until completion.
  5. Read assistant messages → stream back to UI.

2.1 Minimal REST Sequence (Illustrative)

POST https://<your-ai-endpoint>/openai/agents?api-version=2024-05-01-preview
Authorization: Bearer <token>
Content-Type: application/json

{ "model": "gpt-4o-mini", "name": "backstage-docs-agent", "instructions": "You answer questions about our internal services succinctly." }

POST https://<endpoint>/openai/agents/<agentId>/threads?api-version=2024-05-01-preview
{}

POST https://<endpoint>/openai/agents/<agentId>/threads/<threadId>/messages?api-version=2024-05-01-preview
{ "role": "user", "content": "How do I deploy the payments service?" }

POST https://<endpoint>/openai/agents/<agentId>/threads/<threadId>/runs?api-version=2024-05-01-preview
{ "type": "auto" }

GET  https://<endpoint>/openai/agents/<agentId>/threads/<threadId>/messages?api-version=2024-05-01-preview

Exact version / schema may evolve—always check the latest Azure AI Foundry REST reference.
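
As a rough illustration, the same sequence can be driven from TypeScript. This is a sketch only: the paths and api-version mirror the illustrative calls above, and the run-status polling endpoint is an assumption; verify everything against the current reference.

const API = 'api-version=2024-05-01-preview';

async function call(base: string, path: string, token: string, body?: unknown, method = 'POST') {
	const resp = await fetch(`${base}${path}?${API}`, {
		method,
		headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
		body: body === undefined ? undefined : JSON.stringify(body),
	});
	if (!resp.ok) throw new Error(`${method} ${path} failed: ${resp.status}`);
	return resp.json();
}

export async function ask(base: string, agentId: string, token: string, question: string) {
	const thread = await call(base, `/openai/agents/${agentId}/threads`, token, {});
	const threadPath = `/openai/agents/${agentId}/threads/${thread.id}`;
	await call(base, `${threadPath}/messages`, token, { role: 'user', content: question });
	let run = await call(base, `${threadPath}/runs`, token, { type: 'auto' });
	// Hypothetical run-status poll; the real API may expose a different path.
	while (run.status === 'queued' || run.status === 'in_progress') {
		await new Promise(resolve => setTimeout(resolve, 1000));
		run = await call(base, `${threadPath}/runs/${run.id}`, token, undefined, 'GET');
	}
	return call(base, `${threadPath}/messages`, token, undefined, 'GET'); // assistant turns
}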

2.2 Authentication

Recommended hierarchy:

  1. Managed Identity (system or user-assigned) when plugin backend runs in Azure (Container Apps / AKS / App Service).
  2. Workload Identity Federation (CI/CD pipelines, GitHub OIDC) for deployments.
  3. Microsoft Entra app registration + client secret only if the above are unavailable (store the secret in Key Vault, inject at runtime).

Use the latest @azure/identity and (when GA) the dedicated Agents SDK; until then, call REST with a token from DefaultAzureCredential.
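
For example, a minimal token acquisition sketch in TypeScript (the scope below is an assumption; check the audience your Foundry resource expects):

import { DefaultAzureCredential } from '@azure/identity';

// Tries managed identity, workload identity, environment variables,
// and developer logins (az CLI, VS Code) in order.
const credential = new DefaultAzureCredential();

export async function getFoundryToken(): Promise<string> {
	// The scope is an assumption; confirm it in the current Azure AI Foundry docs.
	const token = await credential.getToken('https://cognitiveservices.azure.com/.default');
	if (!token) throw new Error('Token acquisition failed');
	return token.token;
}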

3. Target User Flow

  1. Developer opens Backstage → AI Chat tab.
  2. A thread is created (or resumed) per user session.
  3. User types a question; frontend calls Backstage backend endpoint /api/ai-chat/send.
  4. Backend ensures (or lazily creates) agent + thread, sends user message and run request.
  5. Polls or streams assistant response, returns incremental tokens to UI.
  6. UI renders messages (user/assistant), shows typing indicator and sources (citations) if present.

4. Plugin Structure (Existing Folder)

plugins/ai-chat/
	package.json
	src/
		plugin.ts        # Backstage plugin registration
		index.ts         # Export entry
		components/ChatPage/*.tsx  # UI components (MessageList, MessageInput, TypingIndicator)
		components/ChatPage/ChatService.ts  # Frontend service wrapper

A lightweight backend route proxies secure calls to Azure AI Foundry, shielding tokens and enabling managed identity. It could live inside packages/backend/src or ship as a backend plugin; in this repository it is implemented as a separate C# minimal API (section 5).

5. Backend Proxy (C# Minimal API with Streaming – Existing Implementation)

Instead of a Node/Express proxy, this repository uses a .NET 10 minimal API (server/AgentApi) plus a FoundryAgentService that wraps the Azure AI Foundry (Persistent Agents) SDK. This service:

  1. Creates a new thread per incoming chat request.
  2. Posts the user message.
  3. Starts a streaming run (CreateRunStreamingAsync).
  4. Yields incremental MessageContentUpdate tokens (and optional citation annotations) over Server‑Sent Events (SSE).

5.1 Environment Variables

  • AI_FOUNDRY_PROJECT_ENDPOINT: Azure AI Foundry project endpoint (e.g. https://myproj.eastus.projects.azure.ai)
  • AI_FOUNDRY_AGENT_ID: Pre‑created agent ID in Foundry (binds model + retrieval)
  • AI_FOUNDRY_TENANT_ID: Entra tenant for credential resolution
  • AI_FOUNDRY_MANAGED_IDENTITY_CLIENT_ID: Client ID of the user/system-assigned managed identity

Using DefaultAzureCredential lets you seamlessly authenticate with Managed Identity in Container Apps / App Service and fall back locally to developer credentials.

5.2 Minimal API Endpoint (Program.cs excerpt)

app.MapPost("/api/agents/chat", async (
		ChatRequest request,
		FoundryAgentService agentService,
		HttpContext httpContext,
		CancellationToken cancellationToken) =>
{
		if (string.IsNullOrWhiteSpace(request.Message)) {
				httpContext.Response.StatusCode = 400;
				await httpContext.Response.WriteAsJsonAsync(new { error = "Message is required." }, cancellationToken);
				return;
		}

		httpContext.Response.StatusCode = 200;
		httpContext.Response.Headers.CacheControl = "no-cache";
		httpContext.Response.Headers["X-Accel-Buffering"] = "no";
		httpContext.Response.ContentType = "text/event-stream";

		await using var writer = new StreamWriter(httpContext.Response.Body);
		await foreach (var chunk in agentService.StreamAgentResponseAsync(request.Message, cancellationToken)) {
				await writer.WriteAsync(JsonSerializer.Serialize(chunk));
				await writer.FlushAsync();
		}
		await writer.WriteAsync(JsonSerializer.Serialize(new { Completed = true }));
		await writer.FlushAsync();
});

5.3 Agent Streaming Service (FoundryAgentService.cs excerpt)

public async IAsyncEnumerable<object> StreamAgentResponseAsync(
	string userMessage,
	[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
	// Response<PersistentAgentThread> converts implicitly to the thread model.
	PersistentAgentThread thread = await _agentsClient.Threads.CreateThreadAsync(cancellationToken: cancellationToken);
	await _agentsClient.Messages.CreateMessageAsync(thread.Id, MessageRole.User, userMessage, cancellationToken: cancellationToken);

	var streamingResult = _agentsClient.Runs.CreateRunStreamingAsync(thread.Id, _agentId);
	await foreach (StreamingUpdate update in streamingResult.WithCancellation(cancellationToken)) {
		if (update is MessageContentUpdate contentUpdate && !string.IsNullOrEmpty(contentUpdate.Text)) {
			var citations = TryGetPrivateTextContent(contentUpdate); // extracts URI citation annotations via reflection
			yield return new { contentUpdate.Text, Citations = citations };
		}
	}
}

5.4 Citations Extraction

Because the current SDK types keep annotations private, reflection is (temporarily) used to surface citation annotations for the UI. This enables rendering a “Sources” list below each assistant response.
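
As a hypothetical sketch, a small component could render those citations once surfaced (the Citation shape is an assumption; align it with whatever the backend actually emits):

import React from 'react';

// Hypothetical citation shape; align with the backend payload.
type Citation = { title?: string; url: string };

export const Sources = ({ citations }: { citations: Citation[] }) =>
	citations.length === 0 ? null : (
		<ul>
			{citations.map(c => (
				<li key={c.url}>
					<a href={c.url} target="_blank" rel="noreferrer">{c.title ?? c.url}</a>
				</li>
			))}
		</ul>
	);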

5.5 Why This Approach

  • Streaming SSE: Low-latency token display; good UX for longer answers
  • Managed Identity: No secrets at rest; rotates automatically
  • New thread per request: Statelessness keeps the API simple (can evolve to session persistence later)
  • Azure SDK (Persistent Agents): Offloads orchestration/state to the platform

5.6 Frontend Integration Note

Your Backstage frontend streams from /api/agents/chat with a fetch reader (EventSource only supports GET, and this endpoint is a POST), parsing each data: frame as JSON and appending chunks until { "Completed": true } arrives. (The earlier TypeScript polling example is now superseded by this streaming C# backend.)
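
A minimal sketch of that consumer, assuming the data: framing shown in 5.2 (the StreamChunk shape mirrors the anonymous objects the C# service yields):

type StreamChunk = { Text?: string; Citations?: unknown; Completed?: boolean };

export async function streamChat(message: string, onChunk: (chunk: StreamChunk) => void): Promise<void> {
	const resp = await fetch('/api/agents/chat', {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ message }),
	});
	if (!resp.ok || !resp.body) throw new Error('Chat request failed');

	const reader = resp.body.getReader();
	const decoder = new TextDecoder();
	let buffer = '';
	for (;;) {
		const { done, value } = await reader.read();
		if (done) break;
		buffer += decoder.decode(value, { stream: true });
		// SSE frames are separated by a blank line; each carries one "data:" payload.
		const frames = buffer.split('\n\n');
		buffer = frames.pop() ?? '';
		for (const frame of frames) {
			const json = frame.replace(/^data: /, '').trim();
			if (!json) continue;
			const chunk: StreamChunk = JSON.parse(json);
			if (chunk.Completed) return;
			onChunk(chunk);
		}
	}
}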

Production additions: per-user thread continuity, rate limiting, OpenTelemetry spans (wrapping thread creation + run streaming), exponential backoff on transient RequestFailedException, and a circuit breaker around sustained failures.

6. Frontend Chat Page (Snippet Highlights)

Conceptual example from plugins/ai-chat/src/components/ChatPage/ChatService.ts:

// Legacy non-streaming path (superseded by the streaming /api/agents/chat endpoint).
export async function sendMessage(text: string): Promise<string> {
	const resp = await fetch('/api/ai-chat/send', {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ message: text }),
	});
	if (!resp.ok) throw new Error('Send failed');
	const data = await resp.json();
	return data.reply;
}

The UI controller hook (useChatController.ts) maintains an array of { role, content } messages plus optimistic state while awaiting the backend reply; a hypothetical sketch follows below.

Message list & input components handle scroll and enter-to-send; the typing indicator displays while a response is streaming.

7. Configuration & Secrets

Environment variables (backend):

  • AI_FOUNDRY_ENDPOINT: Base endpoint of the Azure AI Foundry resource
  • AI_MODEL: Model deployment name (e.g. gpt-4o-mini)
  • AZURE_CLIENT_ID / TENANT_ID / CLIENT_SECRET: Only if Managed Identity is not used

Deployment (Container Apps) referencing secrets:

az containerapp secret set -n backstage-portal -g $rg `
	--secrets ai-client-secret=<CLIENT_SECRET>

az containerapp update -n backstage-portal -g $rg `
	--set-env-vars AI_FOUNDRY_ENDPOINT=https://my-ai.openai.azure.com AI_MODEL=gpt-4o-mini AZURE_CLIENT_SECRET=secretref:ai-client-secret

Prefer system-assigned managed identity ⇒ no client secret required.
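
For example (the role name below is an assumption; confirm the role your Foundry resource requires):

az containerapp identity assign -n backstage-portal -g $rg --system-assigned

az role assignment create --assignee <principalId> `
	--role "Cognitive Services User" --scope <ai-resource-id>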

8. Security & Compliance Considerations

  • Least privilege: Grant the Entra app (if used) only the Cognitive Services permissions it requires.
  • No secrets in repo: All values injected at deploy time.
  • Data boundary: Confirm indexing uses appropriate classification controls; exclude sensitive PII if policies require.
  • Logging: Strip or hash user identifiers before logging prompts.
  • Rate limiting: Add simple token bucket middleware to avoid burst cost spikes (see the sketch below).
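
A minimal token bucket sketch in TypeScript, suitable if requests transit the Backstage Node backend (the same idea ports to ASP.NET middleware; capacity and refill rate are illustrative):

import { NextFunction, Request, Response } from 'express';

const CAPACITY = 10;       // maximum burst per user
const REFILL_MS = 6_000;   // one token regained every 6 seconds
const buckets = new Map<string, { tokens: number; last: number }>();

export function aiChatRateLimit(req: Request, res: Response, next: NextFunction) {
	const key = req.header('x-user-id') ?? req.ip ?? 'anonymous';
	const now = Date.now();
	const bucket = buckets.get(key) ?? { tokens: CAPACITY, last: now };
	// Refill proportionally to elapsed time, capped at capacity.
	bucket.tokens = Math.min(CAPACITY, bucket.tokens + (now - bucket.last) / REFILL_MS);
	bucket.last = now;
	if (bucket.tokens < 1) {
		res.status(429).json({ error: 'Rate limit exceeded' });
		return;
	}
	bucket.tokens -= 1;
	buckets.set(key, bucket);
	next();
}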

9. Enhancing Retrieval (Your Data)

Attach an Azure AI Search / vector index (outside this snippet) when creating the agent by passing a retrieval configuration. After that, the agent automatically grounds its answers. Cite sources by surfacing metadata from the assistant message payload (check the annotations / citations fields when available) and render links below each answer.

10. Streaming (Optional Upgrade)

The C# backend in section 5 already streams; if you are starting from a polling implementation instead, switch it to Server-Sent Events (SSE):

  1. Backend endpoint sets Content-Type: text/event-stream.
  2. As tokens arrive (if a streaming API variant is available), write data: {"delta":"..."}\n\n.
  3. Frontend accumulates until done event.

11. Testing Strategy

  • Unit: Mock fetch to agent endpoints; verify message assembly (see the sketch after this list).
  • Integration: Ephemeral test thread created & torn down each run.
  • Load: Concurrency test with 20–50 simulated users; measure average round‑trip (<3–5s target for small prompts).
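
For instance, a unit test sketch for the legacy sendMessage helper (assumes Jest; the import path and mock shape are illustrative):

import { sendMessage } from './ChatService';

test('sendMessage posts the user text and returns the reply', async () => {
	global.fetch = jest.fn().mockResolvedValue({
		ok: true,
		json: async () => ({ reply: 'Use the deploy template.' }),
	}) as unknown as typeof fetch;

	const reply = await sendMessage('How do I deploy the payments service?');

	expect(global.fetch).toHaveBeenCalledWith(
		'/api/ai-chat/send',
		expect.objectContaining({ method: 'POST' }),
	);
	expect(reply).toBe('Use the deploy template.');
});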

12. Deployment Notes

When deploying alongside existing Backstage container:

  • Add the new router file to the image (ensure COPY includes packages/backend/src/plugins).
  • Rebuild & push the image → new Container App revision.
  • Validate health & the streaming /api/agents/chat endpoint with a sample request before exposing it in the UI nav.

13. Navigation Integration

Register a route & sidebar item in plugin.ts (simplified):

import { createPlugin, createRoutableExtension } from '@backstage/core-plugin-api';
import { chatPageRouteRef } from './routes';

export const aiChatPlugin = createPlugin({ id: 'ai-chat', routes: { root: chatPageRouteRef } });

export const AiChatPage = aiChatPlugin.provide(
	createRoutableExtension({
		name: 'AiChatPage',
		component: () => import('./components/ChatPage/ChatPage').then(m => m.ChatPage),
		mountPoint: chatPageRouteRef,
	})
);

Add to app sidebar:

// In packages/app/src/components/Root/Root.tsx (or similar)
import { AiChatPage } from '@internal/plugin-ai-chat';
// ... add <SidebarItem icon={ChatIcon} to="/ai-chat" text="AI Chat" />

14. Observability

Log latency + token counts:

const started = Date.now();
// after response
logger.info('aiChat.response', { ms: Date.now()-started, chars: reply.length });

Feed into Log Analytics / Application Insights for usage dashboards.

15. Cost Controls

  • Cap maximum message length.
  • Summarize long threads (periodic summarization then truncate old turns).
  • Reject large binary attachments (if you later extend to file ingestion) before they hit the model.

16. Roadmap Ideas

  • Citation rendering: Show top source documents with relevance scores
  • Multi-agent routing: Specialized agents (security, architecture) chosen by a classifier
  • Function calling: Tools to fetch live deployment status or the on-call roster
  • Feedback signals: Thumbs up/down stored to improve the retrieval set
  • Access control: Filter the retrieval corpus based on the user’s team groups

17. Summary

You integrated a secure, minimal AI chat experience in Backstage using Azure AI Foundry Agents. The backend proxy leverages managed identity (or fallback credentials), orchestrates agent lifecycle, and streams responses to a clean React UI. From here you can attach retrieval, citations, function calling, and enterprise governance—turning Backstage into a central AI enablement surface.
