Architecture · Azure · AzureFunctions · Bicep · FlexConsumption · Go · HTTPS · IAC · Infrastructure As Code · Logging · Logs · Metrics · Observability · OpenTelemetry · OTel · Traces

Observability Unleashed: OpenTelemetry in Azure Functions with Go

In the evolving landscape of serverless computing, observability has become paramount for building reliable, scalable applications. This article explores how to implement OpenTelemetry (OTel) in Azure Functions using the new Flex Consumption plan with Go custom handlers, providing comprehensive telemetry data through Azure Monitor’s Data Collection Endpoint (DCE) and Data Collection Rule (DCR).

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework that provides APIs, libraries, agents, and instrumentation to enable the collection of traces, metrics, and logs from applications. It serves as a vendor-neutral standard for telemetry data collection, allowing developers to instrument their applications once and export data to various observability backends.

Key Components of OpenTelemetry

  • Traces: Represent the journey of a request through your system
  • Metrics: Numerical measurements that provide insights into application performance
  • Logs: Structured or unstructured text records with timestamps
  • Context Propagation: Maintains correlation between distributed components

Why OpenTelemetry for Go Developers on Azure Functions?

Enhanced Observability

OpenTelemetry provides Go developers with powerful tools to understand their application’s behavior in the serverless environment. Unlike traditional monitoring approaches, OTel offers:

  • Distributed Tracing: Track requests across multiple function invocations and dependencies
  • Rich Context: Capture detailed metadata about function executions
  • Performance Insights: Monitor cold starts, execution duration, and resource utilization

Vendor Neutrality

Go developers can instrument their code once and send telemetry to multiple backends without vendor lock-in. This flexibility is particularly valuable when:

  • Migrating between cloud providers
  • Using hybrid cloud architectures
  • Implementing multi-vendor observability strategies

Cost-Effective Monitoring

With Azure’s DCE/DCR architecture, you can control data ingestion costs by:

  • Filtering telemetry data before storage
  • Transforming data to optimize storage requirements
  • Setting retention policies based on data importance

Azure Functions Flex Consumption Plan Benefits

The Flex Consumption plan offers several advantages for Go developers:

  • Fast Cold Starts: Optimized for languages like Go that compile to native binaries
  • Fine-Grained Scaling: Scale down to zero and up to thousands of instances
  • Cost Optimization: Pay only for actual execution time and memory usage
  • Custom Runtime Support: Perfect for Go custom handlers

Architecture Overview

Here is a quick overview of the log flow: the Go custom handler emits telemetry through the OpenTelemetry SDK, a custom exporter posts the records to the Data Collection Endpoint, the Data Collection Rule validates and routes them, and they land in a custom Log Analytics table.

Infrastructure Setup with Bicep

Building a robust OpenTelemetry solution on Azure requires careful infrastructure design. Azure Monitor’s modern data ingestion pipeline consists of several interconnected components that work together to securely collect, transform, and store telemetry data. Let’s explore each component and understand how they form the backbone of our observability solution.

Understanding Azure Monitor Data Collection Architecture

Before diving into the implementation, it’s essential to understand Azure Monitor’s data collection architecture. The modern approach uses a pipeline consisting of Data Collection Endpoints (DCE), Data Collection Rules (DCR), and Log Analytics workspaces. This architecture provides several advantages over legacy direct ingestion methods:

  • Enhanced Security: All communication is authenticated using Azure Active Directory
  • Data Transformation: Ability to filter, enrich, and transform data before storage
  • Cost Control: Granular control over what data gets stored and for how long
  • Scalability: Built to handle high-volume telemetry from distributed applications
  • Compliance: Centralized governance and audit capabilities

Data Collection Endpoint (DCE)

A Data Collection Endpoint serves as the secure HTTPS endpoint where your applications send telemetry data. Think of it as the “front door” to Azure Monitor’s data ingestion pipeline. The DCE handles authentication, validates incoming data against the associated Data Collection Rules, and ensures that only authorized applications can send data to your Log Analytics workspace.

Key characteristics of DCEs:

  • Regional: Deployed in specific Azure regions for optimal performance
  • Secure: Requires Azure AD authentication for all data ingestion
  • Scalable: Automatically handles varying telemetry volumes
  • Monitored: Provides its own metrics and diagnostic capabilities

Here’s how we configure the Data Collection Endpoint in Bicep:

resource dataCollectionEndpoint 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: dceEndpointName
  location: location
  tags: tags
  properties: {
    networkAcls: {
      publicNetworkAccess: 'Enabled'
    }
    description: 'Data Collection Endpoint for OpenTelemetry Azure Function'
  }
}

The DCE configuration above creates a public endpoint that our Azure Function can reach from anywhere on the internet. In production scenarios, you might want to restrict access using private endpoints or virtual network integration for enhanced security.
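
If ingestion must be locked down, the same resource can disable public network access so that only traffic arriving through private endpoints or virtual network integration is accepted. A minimal sketch of that variant:

resource privateDataCollectionEndpoint 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: '${dceEndpointName}-private'
  location: location
  tags: tags
  properties: {
    networkAcls: {
      // Reject ingestion over the public internet; pair with a private endpoint
      publicNetworkAccess: 'Disabled'
    }
    description: 'Private Data Collection Endpoint for OpenTelemetry Azure Function'
  }
}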

Custom Log Analytics Table

Before data can flow into Log Analytics, we need to define the schema for our OpenTelemetry data. Azure Monitor supports custom tables that allow you to define your own data structure optimized for your specific telemetry needs. Custom tables in Log Analytics provide several benefits:

  • Schema Control: Define exactly what fields you need and their data types
  • Performance Optimization: Tailored indexes and partitioning for your query patterns
  • Cost Management: Store only the data you need with appropriate retention policies
  • Query Flexibility: Use KQL (Kusto Query Language) for advanced analytics

Our OpenTelemetry custom table includes essential fields for distributed tracing, metrics, and contextual information:

resource customTable 'Microsoft.OperationalInsights/workspaces/tables@2022-10-01' = {
  parent: logAnalyticsWorkspace
  name: 'OpenTelemetryData_CL'
  properties: {
    totalRetentionInDays: 30
    plan: 'Analytics'
    schema: {
      name: 'OpenTelemetryData_CL'
      columns: [
        {
          name: 'TimeGenerated'
          type: 'datetime'
        }
        {
          name: 'Level'
          type: 'string'
        }
        {
          name: 'Message'
          type: 'string'
        }
        // other properties... omitted for brevity
      ]
    }
  }
}

Each column in our schema serves a specific purpose:

  • TimeGenerated: Timestamp for temporal analysis and time-series queries
  • TraceId/SpanId: Essential for distributed tracing and request correlation
  • OperationName: Identifies the type of operation for filtering and grouping
  • ServiceName/ServiceVersion: Enables service-level analysis and deployment tracking
  • Level: Log level for filtering by severity
  • Message: Human-readable description of the telemetry event
  • StatusCode: HTTP status or operation result for success/failure analysis
  • Duration: Performance metrics for latency analysis
  • Properties: Dynamic field for additional context and custom attributes

The _CL suffix indicates this is a custom log table, distinguishing it from Azure’s built-in tables.
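
For reference, the columns omitted above for brevity can be reconstructed from the field descriptions; the exact types here are assumptions chosen to be consistent with the stream declaration shown in the next section:

        // Remaining columns of OpenTelemetryData_CL (sketch)
        {
          name: 'TraceId'
          type: 'string'
        }
        {
          name: 'SpanId'
          type: 'string'
        }
        {
          name: 'OperationName'
          type: 'string'
        }
        {
          name: 'ServiceName'
          type: 'string'
        }
        {
          name: 'ServiceVersion'
          type: 'string'
        }
        {
          name: 'StatusCode'
          type: 'int'
        }
        {
          name: 'Duration'
          type: 'real'
        }
        {
          name: 'Properties'
          type: 'dynamic'
        }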

Data Collection Rule (DCR)

The Data Collection Rule is the brain of the data ingestion pipeline. It defines how data flows from the Data Collection Endpoint to the final destination in Log Analytics. DCRs provide powerful capabilities for data transformation, filtering, and routing, making them essential for cost-effective and efficient telemetry management.

Key concepts within a DCR:

Stream Declarations: Define the structure of incoming data streams. Each stream represents a specific type of telemetry data (like our OpenTelemetry stream) and must match the data structure your application sends.

Destinations: Specify where the data should be stored. In our case, we’re routing to a Log Analytics workspace, but DCRs can also send data to Azure Storage, Event Hubs, or other destinations.

Data Flows: Define the transformation and routing logic. They connect streams to destinations and can include KQL transformations to filter, enrich, or modify data before storage. Here’s how we configure the Data Collection Rule:

resource dataCollectionRule 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: dcrRuleName
  location: location
  tags: tags
  dependsOn: [customTable]
  properties: {
    dataCollectionEndpointId: dataCollectionEndpoint.id
    streamDeclarations: {
      'Custom-OpenTelemetryStream': {
        columns: [
          {
            name: 'TraceId'
            type: 'string'
          }
          {
            name: 'SpanId'
            type: 'string'
          }
          {
            name: 'OperationName'
            type: 'string'
          }
          // Additional columns match our custom table schema...
        ]
      }
    }
    destinations: {
      logAnalytics: [
        {
          workspaceResourceId: logAnalyticsWorkspaceId
          name: 'workspace'
        }
      ]
    }
    dataFlows: [
      {
        streams: [
          'Custom-OpenTelemetryStream'
        ]
        destinations: [
          'workspace'
        ]
        outputStream: 'Custom-OpenTelemetryData_CL'
        transformKql: 'source'
      }
    ]
  }
}

Let’s break down the key components:

dataCollectionEndpointId: Links this DCR to our specific DCE, establishing the connection between the ingestion endpoint and the processing rules.

streamDeclarations: The Custom-OpenTelemetryStream declaration defines the expected structure of incoming telemetry data. This must exactly match the JSON structure that your Go application sends. Any data that doesn’t conform to this schema will be rejected.

destinations: The logAnalytics section specifies that our data should flow to a Log Analytics workspace. The workspace name is an internal identifier used in the data flows section.

dataFlows: This section connects everything together. It specifies that data from the Custom-OpenTelemetryStream should be routed to the workspace destination and stored in the Custom-OpenTelemetryData_CL table. The transformKql: 'source' indicates no transformation is applied (data passes through as-is), but you could implement complex KQL transformations here for data enrichment or filtering.
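
For example, instead of passing data through unchanged, a transformation that drops debug-level records at ingestion time (to save storage cost) could be declared like this; a sketch assuming the Level column shown earlier:

dataFlows: [
  {
    streams: [
      'Custom-OpenTelemetryStream'
    ]
    destinations: [
      'workspace'
    ]
    outputStream: 'Custom-OpenTelemetryData_CL'
    // Filter out debug-level records before they are stored
    transformKql: 'source | where Level != \'Debug\''
  }
]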

Function App Configuration

With our data collection infrastructure in place, we need to configure the Azure Function to connect to these services. The Function App configuration bridges your Go application with the Azure Monitor pipeline by providing the necessary connection information through environment variables.

module flexFunction 'core/host/function.bicep' = {
  name: 'functionapp'
  scope: rg
  params: {
    location: location
    tags: tags
    planName: planName
    appName: appName
    storageAccountName: storage.outputs.name
    applicationInsightsName: monitoring.outputs.applicationInsightsName
    functionAppRuntime: functionAppRuntime
    functionAppRuntimeVersion: functionAppRuntimeVersion
    dceEndpointUrl: dataCollection.outputs.dataCollectionEndpointUrl
    dcrImmutableId: dataCollection.outputs.dataCollectionRuleImmutableId
  }
}

The critical parameters here are dceEndpointUrl and dcrImmutableId, which are passed as outputs from our data collection module. These values become environment variables in the Function App, allowing your Go code to discover the Azure Monitor endpoints dynamically. This approach ensures that your application code remains environment-agnostic while still connecting to the correct Azure Monitor resources.
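
Inside the referenced function module, those parameters would typically surface as app settings. A sketch of the relevant fragment (the setting names match the environment variables the Go code reads later):

// Fragment of core/host/function.bicep (sketch)
appSettings: [
  {
    name: 'AZURE_DCE_ENDPOINT_URL'
    value: dceEndpointUrl
  }
  {
    name: 'AZURE_DCR_IMMUTABLE_ID'
    value: dcrImmutableId
  }
  {
    name: 'OTEL_SERVICE_NAME'
    value: appName
  }
]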

Role Assignment for Security

Security is paramount when dealing with telemetry data. Azure’s approach uses role-based access control (RBAC) to ensure that only authorized services can send data to your monitoring infrastructure. The Function App needs specific permissions to write data to the Data Collection Endpoint.

module dcrRoleAssignment 'core/monitor/dcr-role-assignment.bicep' = {
  name: 'dcr-role-assignment'
  scope: rg
  params: {
    dataCollectionRuleId: dataCollection.outputs.dataCollectionRuleId
    functionAppPrincipalId: flexFunction.outputs.principalId
  }
}

This role assignment grants the Function App’s managed identity the “Monitoring Metrics Publisher” role on the Data Collection Rule. This role provides the minimum necessary permissions to send telemetry data while maintaining security best practices. The managed identity approach eliminates the need for storing credentials or connection strings, reducing security risks and simplifying operations.
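
The module itself is not shown in the article; here is a sketch of what core/monitor/dcr-role-assignment.bicep could contain, where 3913510d-42f4-4e42-8a64-420c390055eb is the built-in role definition ID for Monitoring Metrics Publisher:

param dataCollectionRuleId string
param functionAppPrincipalId string

// Built-in role: Monitoring Metrics Publisher
var monitoringMetricsPublisherRoleId = subscriptionResourceId(
  'Microsoft.Authorization/roleDefinitions',
  '3913510d-42f4-4e42-8a64-420c390055eb')

resource dcr 'Microsoft.Insights/dataCollectionRules@2022-06-01' existing = {
  name: last(split(dataCollectionRuleId, '/'))
}

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Deterministic name so redeployments are idempotent
  name: guid(dataCollectionRuleId, functionAppPrincipalId, monitoringMetricsPublisherRoleId)
  scope: dcr
  properties: {
    roleDefinitionId: monitoringMetricsPublisherRoleId
    principalId: functionAppPrincipalId
    principalType: 'ServicePrincipal'
  }
}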

Go Implementation: Sending Telemetry via OpenTelemetry

Now that our Azure infrastructure is configured, let’s dive into the Go implementation. The OpenTelemetry integration in Go involves several layers: the OpenTelemetry SDK for instrumentation, a custom Azure Monitor exporter for data transmission, and careful initialization patterns to optimize for serverless performance.

Dependencies and Imports

OpenTelemetry for Go consists of multiple packages, each serving specific purposes in the telemetry pipeline. Understanding these dependencies helps you make informed decisions about which components to include in your application.

module gocustomhandlers

go 1.21

require (
    github.com/Azure/azure-sdk-for-go/sdk/azcore v1.9.2
    github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.5.1
    go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.46.1
    go.opentelemetry.io/otel v1.21.0
    go.opentelemetry.io/otel/metric v1.21.0
    go.opentelemetry.io/otel/sdk v1.21.0
    go.opentelemetry.io/otel/trace v1.21.0
)

Key dependency breakdown:

  • Azure SDK packages: Handle authentication and HTTP communication with Azure services
  • otel core: Provides the main OpenTelemetry APIs for creating traces and metrics
  • otel/sdk: Implements the OpenTelemetry specification for data collection and export
  • otel/contrib/instrumentation: Offers pre-built instrumentation for common libraries like HTTP

Azure Monitor Exporter Implementation

The heart of our integration is a custom Azure Monitor exporter that translates OpenTelemetry data into the format expected by Azure Monitor’s Data Collection Endpoint. This exporter handles authentication, data serialization, and HTTP communication with Azure’s telemetry ingestion APIs.

Unlike standard OpenTelemetry exporters that might send data to OTLP-compatible backends, our Azure Monitor exporter specifically formats data according to Azure’s DCE/DCR requirements. This custom approach gives us full control over the data structure and ensures optimal integration with Azure Monitor’s features.

// AzureMonitorExporter sends telemetry to Azure Monitor DCE
type AzureMonitorExporter struct {
    dceEndpoint string
    dcrRuleId   string
    streamName  string
    credential  azcore.TokenCredential
    httpClient  *http.Client
}

// TelemetryData represents the data structure for Azure Monitor
type TelemetryData struct {
    TimeGenerated  time.Time              `json:"TimeGenerated"`
    TraceId        string                 `json:"TraceId"`
    SpanId         string                 `json:"SpanId"`
    OperationName  string                 `json:"OperationName"`
    ServiceName    string                 `json:"ServiceName"`
    ServiceVersion string                 `json:"ServiceVersion"`
    Level          string                 `json:"Level"`
    Message        string                 `json:"Message"`
    StatusCode     int                    `json:"StatusCode"`
    Duration       float64                `json:"Duration"`
    Properties     map[string]interface{} `json:"Properties"`
}
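
The NewAzureMonitorExporter constructor used during initialization later in the article is straightforward; a minimal sketch:

// NewAzureMonitorExporter wires up the exporter with a reusable HTTP client.
func NewAzureMonitorExporter(dceEndpoint, dcrRuleId, streamName string,
    credential azcore.TokenCredential) *AzureMonitorExporter {
    return &AzureMonitorExporter{
        dceEndpoint: dceEndpoint,
        dcrRuleId:   dcrRuleId,
        streamName:  streamName,
        credential:  credential,
        httpClient:  &http.Client{Timeout: 30 * time.Second},
    }
}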

The exporter sends telemetry to Azure Monitor as follows:

func (e *AzureMonitorExporter) SendTelemetry(ctx context.Context, data []TelemetryData) error {
    // Get access token using managed identity
    token, err := e.credential.GetToken(ctx, policy.TokenRequestOptions{
        Scopes: []string{"https://monitor.azure.com/.default"},
    })
    if err != nil {
        return fmt.Errorf("failed to get access token: %w", err)
    }
    // Prepare the request
    jsonData, err := json.Marshal(data)
    if err != nil {
        return fmt.Errorf("failed to marshal telemetry data: %w", err)
    }
    url := fmt.Sprintf("%s/dataCollectionRules/%s/streams/%s?api-version=2023-01-01", 
                       e.dceEndpoint, e.dcrRuleId, e.streamName)

    req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewBuffer(jsonData))
    if err != nil {
        return fmt.Errorf("failed to create request: %w", err)
    }
    req.Header.Set("Authorization", "Bearer "+token.Token)
    req.Header.Set("Content-Type", "application/json")

    resp, err := e.httpClient.Do(req)
    if err != nil {
        return fmt.Errorf("failed to send request: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusNoContent && resp.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
    }
    return nil
}

The TelemetryData structure represents how our telemetry information is serialized for Azure Monitor. Each field maps directly to a column in our custom Log Analytics table, ensuring data consistency and enabling powerful querying capabilities.

The SendTelemetry method implements the complete workflow for transmitting data to Azure:

  1. Authentication: Uses the managed identity to obtain an access token for Azure Monitor
  2. Serialization: Converts the telemetry data to JSON format
  3. HTTP Communication: Sends the data to the specific DCE endpoint and DCR stream
  4. Error Handling: Provides detailed error information for troubleshooting

The URL construction follows Azure’s DCE API specification, targeting the specific Data Collection Rule and stream we defined in our Bicep templates. The API version parameter ensures compatibility with Azure Monitor’s data ingestion service.

OpenTelemetry Initialization

Proper initialization is crucial for OpenTelemetry performance, especially in serverless environments where cold starts can impact user experience. Our initialization strategy balances comprehensive instrumentation with startup performance through lazy loading and efficient resource management.

func initTelemetry(ctx context.Context) (func(context.Context) error, error) {
    // Get configuration from environment
    serviceName := os.Getenv("OTEL_SERVICE_NAME")
    if serviceName == "" {
        serviceName = "go-custom-handler"
    }

    serviceVersion := os.Getenv("OTEL_SERVICE_VERSION")
    if serviceVersion == "" {
        serviceVersion = "1.0.0"
    }

    dceEndpoint := os.Getenv("AZURE_DCE_ENDPOINT_URL")
    dcrImmutableId := os.Getenv("AZURE_DCR_IMMUTABLE_ID")

    // Create resource with service information
    res, err := resource.New(
        ctx,
        resource.WithAttributes(
            semconv.ServiceNameKey.String(serviceName),
            semconv.ServiceVersionKey.String(serviceVersion),
            attribute.String("cloud.provider", "azure"),
            attribute.String("azure.function.name", serviceName),
        ),
        resource.WithSchemaURL(semconv.SchemaURL),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to create resource: %w", err)
    }

    // Initialize Azure credential and exporter
    if dceEndpoint != "" && dcrImmutableId != "" {
        cred, err := azidentity.NewDefaultAzureCredential(nil)
        if err != nil {
            log.Printf("Failed to create Azure credential: %v", err)
        } else {
            azureExporter = NewAzureMonitorExporter(dceEndpoint, dcrImmutableId, 
                                                  "Custom-OpenTelemetryStream", cred)
        }
    }

    // Set up trace provider
    tracerProvider := sdktrace.NewTracerProvider(
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.AlwaysSample()),
    )

    // Set up metric provider
    meterProvider := sdkmetric.NewMeterProvider(
        sdkmetric.WithResource(res),
    )

    // Set global providers
    otel.SetTracerProvider(tracerProvider)
    otel.SetMeterProvider(meterProvider)
    otel.SetTextMapPropagator(propagation.TraceContext{})

    // Initialize global telemetry components
    tracer = otel.Tracer(serviceName, trace.WithInstrumentationVersion(serviceVersion))
    meter = otel.Meter(serviceName, metric.WithInstrumentationVersion(serviceVersion))

    // Create metrics
    counter, err = meter.Int64Counter(
        "http_requests_total",
        metric.WithDescription("Total number of HTTP requests"),
    )
    if err != nil {
        return nil, fmt.Errorf("failed to create counter: %w", err)
    }

    return func(ctx context.Context) error {
        if err := tracerProvider.Shutdown(ctx); err != nil {
            return fmt.Errorf("failed to shutdown tracer provider: %w", err)
        }
        if err := meterProvider.Shutdown(ctx); err != nil {
            return fmt.Errorf("failed to shutdown meter provider: %w", err)
        }
        return nil
    }, nil
}

The initialization function demonstrates several important patterns:

Environment-based Configuration: Using environment variables allows the same code to work across development, staging, and production environments without modification. The fallback values ensure the application remains functional even when some configuration is missing.

Resource Attributes: The resource.New() call creates a resource that identifies your service within the broader telemetry ecosystem. These attributes appear in every telemetry record, enabling filtering and grouping in your monitoring dashboards. The semantic conventions from OpenTelemetry ensure consistency across different services and teams.

Provider Setup: The trace and metric providers are the core components that collect and manage telemetry data. The AlwaysSample() configuration ensures all traces are collected, which is appropriate for most Azure Functions scenarios due to their typically low volume and high business value.

Global Registration: Setting the global providers makes them available throughout your application without requiring dependency injection. This simplifies instrumentation code and follows OpenTelemetry best practices.

Graceful Shutdown: The returned shutdown function ensures telemetry data is properly flushed before the application terminates, preventing data loss in serverless scenarios where processes may be quickly terminated.
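
Putting these pieces together, a minimal main function for the custom handler could look like the sketch below. The /simpleHttpTrigger route and the local fallback port are assumptions; Azure Functions custom handlers receive their listen port through the FUNCTIONS_CUSTOMHANDLER_PORT environment variable.

func main() {
    port := os.Getenv("FUNCTIONS_CUSTOMHANDLER_PORT")
    if port == "" {
        port = "8080" // local development fallback (assumption)
    }

    mux := http.NewServeMux()
    mux.HandleFunc("/simpleHttpTrigger", simpleHttpTriggerHandler) // hypothetical route

    // Best-effort flush of telemetry when the server stops.
    defer func() {
        if otelShutdown != nil {
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            defer cancel()
            if err := otelShutdown(ctx); err != nil {
                log.Printf("Telemetry shutdown error: %v", err)
            }
        }
    }()

    log.Printf("Custom handler listening on port %s", port)
    if err := http.ListenAndServe(":"+port, mux); err != nil {
        log.Printf("Server stopped: %v", err)
    }
}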

HTTP Handler with OpenTelemetry Instrumentation

The HTTP handler showcases practical OpenTelemetry instrumentation in an Azure Functions context. This implementation demonstrates how to create spans, collect metrics, and maintain distributed tracing context while handling the unique characteristics of serverless environments.

func simpleHttpTriggerHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    
    // Initialize OTEL if not already done (lazy initialization)
    if !otelInitialized {
        shutdown, err := initTelemetry(ctx)
        if err != nil {
            log.Printf("Failed to initialize OTEL: %v", err)
        } else {
            otelShutdown = shutdown
            otelInitialized = true
        }
    }

    var span trace.Span
    if otelInitialized && tracer != nil {
        ctx, span = tracer.Start(ctx, "http_request",
            trace.WithAttributes(
                semconv.HTTPMethodKey.String(r.Method),
                semconv.HTTPURLKey.String(r.URL.String()),
                semconv.HTTPUserAgentKey.String(r.Header.Get("User-Agent")),
            ),
        )
        defer span.End()
    }

    startTime := time.Now()
    invocationId := r.Header.Get("X-Azure-Functions-InvocationId")

    // Add custom attributes to span
    if span != nil {
        span.SetAttributes(
            attribute.String("azure.function.invocation_id", invocationId),
            attribute.String("http.remote_addr", r.RemoteAddr),
        )
    }

    // Increment request counter
    if counter != nil {
        counter.Add(ctx, 1, metric.WithAttributes(
            attribute.String("method", r.Method),
            attribute.String("endpoint", "/hello"),
        ))
    }

    // Process request and create response
    httpResponse := HttpResponse{
        StatusCode: 200,
        Body:       `{"hello":"world","message":"API executed successfully"}`,
        Headers: map[string]interface{}{
            "Content-Type": "application/json",
        },
    }

    response := InvokeResponse{
        Outputs:     map[string]interface{}{"res": httpResponse},
        Logs:        nil,
        ReturnValue: nil,
    }

    // Set response headers and write response
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)

    if span != nil {
        span.SetAttributes(attribute.Int("http.status_code", 200))
    }

    if err := json.NewEncoder(w).Encode(response); err != nil {
        log.Printf("Failed to write output: %v", err)
        if span != nil {
            span.SetStatus(codes.Error, "Failed to encode response")
        }
        return
    }

    duration := time.Since(startTime)

    // Send telemetry to Azure Monitor
    properties := map[string]interface{}{
        "invocation_id": invocationId,
        "user_agent":    r.Header.Get("User-Agent"),
        "remote_addr":   r.RemoteAddr,
        "method":        r.Method,
        "url":           r.URL.String(),
    }

    if span != nil {
        sendOtelToAzure(ctx, span.SpanContext(), "http_request", 
                       "HTTP request processed successfully", 200, duration, properties)
    }
}
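
The HttpResponse and InvokeResponse types used above model the payload that the Functions host expects back from a custom handler. A minimal sketch of the shapes assumed here:

// HttpResponse maps to the HTTP output binding of the function.
type HttpResponse struct {
    StatusCode int                    `json:"statusCode"`
    Body       string                 `json:"body"`
    Headers    map[string]interface{} `json:"headers"`
}

// InvokeResponse is the envelope the Functions host reads back
// from the custom handler.
type InvokeResponse struct {
    Outputs     map[string]interface{} `json:"Outputs"`
    Logs        []string               `json:"Logs"`
    ReturnValue interface{}            `json:"ReturnValue"`
}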

This handler implementation illustrates several key instrumentation patterns:

Lazy Initialization: OpenTelemetry is initialized only when the first request arrives, minimizing cold start impact. This pattern is particularly important in Azure Functions where instances may be created and destroyed frequently. Note that the plain boolean guard shown here assumes the host does not dispatch requests concurrently; a sync.Once-based variant, shown in the Best Practices section, is safer.

Span Creation and Context: The tracer.Start() call creates a new span representing the HTTP request operation. The span automatically becomes part of the request context, enabling distributed tracing across service boundaries. The defer span.End() ensures the span is properly closed regardless of how the function exits.

Semantic Conventions: Using OpenTelemetry’s semantic conventions (like semconv.HTTPMethodKey) ensures your telemetry data follows industry standards, making it easier to correlate with other services and use with standard monitoring tools.

Azure Functions Context: The X-Azure-Functions-InvocationId header provides a unique identifier for each function execution, enabling correlation between Azure Functions logs and your custom telemetry data.

Custom Attributes: Adding Azure-specific attributes like the invocation ID and remote address provides valuable context for debugging and analysis. These attributes appear in your telemetry data and can be used for filtering and grouping in queries.

Metrics Collection: The counter increment demonstrates how to collect custom metrics alongside tracing data. Metrics provide aggregate views of your application’s behavior, complementing the detailed information available in traces.

Asynchronous Telemetry: The sendOtelToAzure() call sends telemetry data to Azure Monitor without blocking the request response. This pattern ensures that telemetry collection doesn’t impact user-facing performance.
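
The sendOtelToAzure() helper is not shown above; here is a sketch of how it could map the span context onto a TelemetryData record and hand it to the exporter without blocking the response:

func sendOtelToAzure(ctx context.Context, sc trace.SpanContext, operation,
    message string, statusCode int, duration time.Duration,
    properties map[string]interface{}) {
    if azureExporter == nil {
        return // exporter not configured; skip silently
    }
    data := []TelemetryData{{
        TimeGenerated:  time.Now().UTC(),
        TraceId:        sc.TraceID().String(),
        SpanId:         sc.SpanID().String(),
        OperationName:  operation,
        ServiceName:    os.Getenv("OTEL_SERVICE_NAME"),
        ServiceVersion: os.Getenv("OTEL_SERVICE_VERSION"),
        Level:          "Information",
        Message:        message,
        StatusCode:     statusCode,
        Duration:       float64(duration.Milliseconds()),
        Properties:     properties,
    }}
    // Detach from the request context so the send can outlive the response.
    go func() {
        sendCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()
        if err := azureExporter.SendTelemetry(sendCtx, data); err != nil {
            log.Printf("Failed to send telemetry: %v", err)
        }
    }()
}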

Log Analytics Queries

Query your telemetry data in Log Analytics with KQL. Because the table schema is declared explicitly through the DCR, the columns keep exactly the names you defined; the legacy _s/_d suffixes only apply to tables created through the older HTTP Data Collector API:

// View all OpenTelemetry data
OpenTelemetryData_CL
      | order by TimeGenerated desc

// Filter by service
OpenTelemetryData_CL
      | where ServiceName == "go-custom-handler"
      | order by TimeGenerated desc

// View HTTP request traces
OpenTelemetryData_CL
      | where OperationName == "http_request"
      | project TimeGenerated, TraceId, Duration, StatusCode
      | order by TimeGenerated desc
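
KQL also supports richer aggregations. For example, latency percentiles per operation over the last day (assuming Duration is recorded in milliseconds):

// Latency percentiles per operation (last 24 hours)
OpenTelemetryData_CL
      | where TimeGenerated > ago(24h)
      | summarize p50 = percentile(Duration, 50),
                  p95 = percentile(Duration, 95),
                  p99 = percentile(Duration, 99) by OperationName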

Best Practices

Implementing OpenTelemetry effectively requires following established patterns that balance observability benefits with performance and operational considerations. These best practices have emerged from real-world deployments and help ensure your telemetry implementation is both valuable and sustainable.

1. Environment Configuration

Configuration management is fundamental to building maintainable observability solutions. Using environment variables provides flexibility across different deployment environments while maintaining security and simplifying operations. This approach allows the same application binary to work in development, staging, and production with appropriate telemetry configuration for each environment.

serviceName := os.Getenv("OTEL_SERVICE_NAME")
if serviceName == "" {
    serviceName = "go-custom-handler"  // fallback
}

Always provide sensible defaults to ensure your application remains functional even when some configuration is missing. This defensive programming approach prevents telemetry configuration issues from breaking core functionality.

2. Lazy Initialization

In serverless environments, cold start performance is critical to user experience. Lazy initialization postpones OpenTelemetry setup until it’s actually needed, reducing the time from function creation to first request handling. This pattern is especially important in Azure Functions where billing is based on execution time.

if !otelInitialized {
    shutdown, err := initTelemetry(ctx)
    // Handle initialization...
    otelInitialized = true
}

Consider implementing initialization timeouts to prevent telemetry setup from indefinitely blocking request processing in case of network issues or service unavailability.
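
A sketch that addresses both points: sync.Once makes the guard safe when the host dispatches requests concurrently, and a context timeout bounds how long setup may take.

var initOnce sync.Once

func ensureTelemetry(ctx context.Context) {
    initOnce.Do(func() {
        // Bound initialization so a slow dependency cannot stall the request.
        initCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
        defer cancel()
        shutdown, err := initTelemetry(initCtx)
        if err != nil {
            log.Printf("Failed to initialize OTEL: %v", err)
            return
        }
        otelShutdown = shutdown
        otelInitialized = true
    })
}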

3. Asynchronous Telemetry Sending

Telemetry collection should never block business logic execution. Sending telemetry data asynchronously ensures that observability doesn’t impact user-facing performance, even when monitoring services are slow or temporarily unavailable.

go func() {
    if err := azureExporter.SendTelemetry(context.Background(), telemetryData); err != nil {
        log.Printf("Failed to send telemetry: %v", err)
    }
}()

Implement proper error handling in your background goroutines to prevent telemetry failures from causing memory leaks or resource exhaustion. Consider using bounded channels or worker pools for high-volume scenarios.
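
For high-volume scenarios, a bounded queue drained by a single worker caps memory use and sheds load instead of blocking requests; a sketch:

var telemetryQueue = make(chan []TelemetryData, 100)

// startTelemetryWorker drains the queue in the background.
func startTelemetryWorker() {
    go func() {
        for batch := range telemetryQueue {
            ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
            if err := azureExporter.SendTelemetry(ctx, batch); err != nil {
                log.Printf("Failed to send telemetry: %v", err)
            }
            cancel()
        }
    }()
}

// enqueueTelemetry never blocks: if the queue is full, the batch is dropped.
func enqueueTelemetry(batch []TelemetryData) {
    select {
    case telemetryQueue <- batch:
    default:
        log.Printf("Telemetry queue full, dropping %d records", len(batch))
    }
}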

4. Error Handling and Fallbacks

Robust telemetry implementations gracefully handle failures without impacting application functionality. When monitoring services are unavailable, your application should continue operating normally, perhaps with reduced observability.

if azureExporter == nil {
    log.Printf("Azure exporter not available, skipping telemetry send")
    return
}

Design your error handling to distinguish between transient failures (network issues, service overload) and permanent failures (misconfiguration, authentication problems). Implement appropriate retry strategies for transient failures while avoiding infinite retry loops.

5. Resource Tagging

Consistent resource attributes enable effective querying, filtering, and correlation across your entire system. Establish organization-wide standards for resource attributes to ensure telemetry data can be effectively aggregated and analyzed across services and teams.

resource.WithAttributes(
    semconv.ServiceNameKey.String(serviceName),
    semconv.ServiceVersionKey.String(serviceVersion),
    attribute.String("cloud.provider", "azure"),
    attribute.String("azure.function.name", serviceName),
)

Include enough context to enable effective troubleshooting, but avoid excessive cardinality that could impact storage costs or query performance. Consider using hierarchical naming schemes for services and components.

6. Managed Identity for Security

Azure Managed Identity eliminates the complexity and security risks associated with credential management. Using managed identity ensures that your telemetry authentication tokens are automatically rotated and securely managed by Azure.

cred, err := azidentity.NewDefaultAzureCredential(nil)

This approach also simplifies deployment pipelines since there are no secrets to manage or rotate. The DefaultAzureCredential provider automatically detects the appropriate authentication method for each environment (local development, Azure Functions, etc.).

7. Proper Span Lifecycle Management

Accurate span lifecycle management is essential for meaningful distributed tracing. Every span must be properly started and ended to maintain trace integrity and prevent resource leaks.

ctx, span = tracer.Start(ctx, "operation_name")
defer span.End()

Use descriptive operation names that clearly indicate what the span represents. Consider the cardinality implications of your naming scheme – too many unique span names can impact storage and query performance.

8. Cost-Effective Data Collection

Telemetry data can become expensive at scale. Implement sampling and filtering strategies early to control costs while maintaining observability value. Different types of telemetry may warrant different sampling rates based on their business value and volume characteristics.

sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.1)) // 10% sampling

Consider implementing adaptive sampling that adjusts rates based on current telemetry volume or error rates. High-value scenarios (errors, slow requests) might warrant higher sampling rates than routine operations.
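
One way to approximate this is a head-based sampler that keeps every span started with an error attribute while ratio-sampling the rest. Below is a sketch implementing the sdktrace.Sampler interface; note that only attributes supplied at span start are visible to a head sampler.

type errorAwareSampler struct {
    fallback sdktrace.Sampler
}

func (s errorAwareSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
    // Always keep spans explicitly marked as errors at creation time.
    for _, attr := range p.Attributes {
        if attr.Key == "error" && attr.Value.AsBool() {
            return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample}
        }
    }
    return s.fallback.ShouldSample(p)
}

func (s errorAwareSampler) Description() string {
    return "errorAwareSampler"
}

// Usage:
// sdktrace.WithSampler(errorAwareSampler{fallback: sdktrace.TraceIDRatioBased(0.1)})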

Conclusion

Implementing OpenTelemetry in Azure Functions Flex Consumption with Go custom handlers provides a powerful observability solution that combines the performance benefits of native Go binaries with comprehensive telemetry collection. The integration with Azure Monitor’s DCE/DCR architecture ensures secure, scalable, and cost-effective telemetry data management.

Key takeaways from this implementation:

  • Performance: Lazy initialization and asynchronous telemetry sending minimize impact on function execution
  • Security: Managed Identity authentication eliminates credential management complexity
  • Scalability: The Flex Consumption plan automatically scales based on demand
  • Cost Control: DCR-based filtering and transformation optimize storage costs
  • Flexibility: OpenTelemetry’s vendor-neutral approach provides future-proofing

This approach empowers Go developers to build highly observable serverless applications on Azure while maintaining the performance characteristics that make Go an excellent choice for cloud-native development. The combination of OpenTelemetry’s standardized instrumentation with Azure’s managed infrastructure creates a robust foundation for enterprise-grade serverless applications.

By following the patterns and best practices outlined in this article, developers can achieve comprehensive observability without sacrificing the simplicity and performance that make Go and Azure Functions an attractive combination for modern cloud applications.

The full source code can be found at https://github.com/MoimHossain/GoCustomHandlers.

Thanks for reading!
