Measuring API Latency & Throughput with k6

Modern API platforms live and die by their tail latency. Your users do not care that the “average” response time is fine if 5% of requests are 10× slower. This post walks through using k6 to benchmark APIs behind Azure API Management (APIM), first with a basic latency script and then with per‑request telemetry streamed into Azure Application Insights for deep correlation. We will also show how to run k6 inside Azure Container Apps as a repeatable, cloud‑side load generator.

What is k6 and Why Use It for Latency Work?

k6 is an open‑source, scriptable load testing tool: tests are written in JavaScript, while the engine itself is written in Go for high‑performance execution. Key reasons it fits API latency benchmarking:

  • Scriptable in JS: Easy to codify request patterns, correlation headers, custom logic.
  • Native trend stats: p(90), p(95), p(99), p(99.9) without extra tooling.
  • Deterministic load during a fixed VU or scenario window.
  • Extensibility: Custom output (JSON, InfluxDB, Prometheus, App Insights via HTTP), custom tagging, summary formatting.
  • Container‑friendly: One-line docker run locally, or run in Kubernetes / Azure Container Apps.

For latency investigations you typically care about:

  • Median (p50) for “bulk baseline”.
  • p95 / p99 for tail (user pain region).
  • Max (outliers, potential incidents / retries / cold starts).
  • Error rate (must stay low while pushing higher concurrency or arrival rate).
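
k6 reports these percentiles without extra tooling; a minimal sketch of an options block that requests exactly these trend statistics (summaryTrendStats is a standard k6 option):

export const options = {
  // report the stats that matter for tail analysis in the end-of-test summary
  summaryTrendStats: ['med', 'p(90)', 'p(95)', 'p(99)', 'p(99.9)', 'max']
};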

Dissecting the Basic Script 

Core goals of the baseline script:

  1. Drive a POST request to your APIM endpoint.
  2. Attach a correlation identifier you can later search across APIM diagnostics and backend logs.
  3. Keep configuration entirely in environment variables for portability.

Minimal version (excerpt):

import http from 'k6/http';
import { check, sleep } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

export const options = {
  vus: Number(__ENV.VUS || 5),
  duration: __ENV.DURATION || '30s',
  thresholds: {
    http_req_failed: ['rate<0.01'],
    http_req_duration: ['p(95)<1000', 'p(99)<3000']
  }
};

Generating correlation + W3C traceparent (simplified):

function makeTraceParent(guid) {
  const hex = guid.replace(/-/g, '');                      // GUID -> 32 hex chars
  const traceId = (hex + '0'.repeat(32)).substring(0, 32); // pad defensively to 32 chars
  const spanId = hex.substring(0, 16);                     // first 16 hex chars as span id
  return `00-${traceId}-${spanId}-01`;                     // version-traceId-spanId-flags (01 = sampled)
}

Executing one iteration (one request):

export default function () {
  const url = __ENV.TARGET_URL; // required
  const corr = uuidv4();
  const body = JSON.stringify({ transactionId: corr });
  const headers = {
    'Content-Type': 'application/json',
    'x-correlation-id': corr,
    'traceparent': makeTraceParent(corr)
  };
  const res = http.post(url, body, { headers });
  check(res, { 'status is 200/201/202': r => [200,201,202].includes(r.status) });
  const think = Number(__ENV.THINK_MS || 0);
  if (think > 0) sleep(think / 1000);
}

Key environment variables:

  • TARGET_URL (required)
  • VUS (virtual users)
  • DURATION (e.g., 30s, 1m30s, 5m)
  • THINK_MS (optional pause per iteration)

Run it locally via Docker (WSL example – adjust mount path):

wsl docker run --rm \
  -e TARGET_URL="https://your-apim.azure-api.net/casper/transaction" \
  -e VUS=50 -e DURATION=1m \
  -v /mnt/c/Repo/RPS-Benchmark/scripts:/scripts \
  grafana/k6:latest run /scripts/k6-basic.js

Interpreting core metrics:

  • http_req_duration – end‑to‑end latency.
  • http_reqs – total requests.
  • http_req_failed – failure rate.
  • iteration_duration – includes JS overhead + request.

Percentiles (p90, p95, p99) highlight whether the tail is degrading under load. If p99 sits far above the median, correlate the slowest request IDs and break those requests into phases.
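
If you need to break an outlier into phases, k6 exposes per-request timings on the response object; a small sketch (the 1000 ms threshold is arbitrary) that could sit next to the check in the default function:

if (res.timings.duration > 1000) {
  // phase breakdown for outliers: connect, TLS handshake, server wait, receive
  console.warn(`slow ${corr}: total=${res.timings.duration}ms ` +
    `connect=${res.timings.connecting}ms tls=${res.timings.tls_handshaking}ms ` +
    `waiting=${res.timings.waiting}ms receiving=${res.timings.receiving}ms`);
}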

Running k6 in Azure Container Apps (ACA) as a Job

Azure Container Apps lets you run ephemeral jobs (short-lived executions) or long-running revisions. For load tests, jobs are ideal: they spin up, execute, stop, and you capture their logs.

High-level steps:

  1. (Optional) Build and push a custom image if you want the scripts embedded in it. For simplicity we start from the public grafana/k6 image below, with the custom-image route shown as an alternative (ACA jobs cannot mount scripts from a local path, so baking them in is the most practical option).
  2. Create a resource group and Container Apps environment.
  3. Create a Job that runs k6 with environment variables.
  4. Start the Job and watch logs.

Note: For production‑like network fidelity you can enable VNet integration / private endpoints on the Container Apps environment so your load traffic traverses network paths similar to those of real consumers. That is intentionally omitted here for brevity.

Create Resource Group & Environment

az group create -n rg-k6-bench -l eastus
az containerapp env create \
  -g rg-k6-bench \
  -n k6-env \
  -l eastus

(Optional) Build a Custom k6 Image with Scripts

Create a Dockerfile (if you want your scripts inside the image):

FROM grafana/k6:latest
WORKDIR /scripts
COPY k6-basic.js ./
COPY k6-azmon.js ./

Push to Azure Container Registry (ACR):

az acr create -n k6benchacr -g rg-k6-bench --sku Basic
az acr login -n k6benchacr
docker build -t k6benchacr.azurecr.io/k6-bench:latest .
docker push k6benchacr.azurecr.io/k6-bench:latest

If you skip this, you can reference grafana/k6:latest directly; you just need a way to provide your JS (ACA jobs cannot volume-mount a local path, so a custom image or fetching the script from a remote URL is easier).

Create an ACA Job for the Basic Test

az containerapp job create \
  -g rg-k6-bench \
  -n k6-basic-job \
  --environment k6-env \
  --trigger-type Manual \
  --image grafana/k6:latest \
  --command "k6" \
  --args "run" "/scripts/k6-basic.js" \
  --cpu 0.5 --memory 1Gi \
  --replica-timeout 1800 \
  --replica-completion-count 1 \
  --env-vars TARGET_URL="https://your-apim.azure-api.net/casper/transaction" VUS=50 DURATION=1m

The public grafana/k6 image does not contain your script, so in practice you will point the job at the custom image instead. Pulling from ACR also requires registry credentials, for example via a managed identity:

az containerapp job create \
  -g rg-k6-bench \
  -n k6-basic-job \
  --environment k6-env \
  --trigger-type Manual \
  --image k6benchacr.azurecr.io/k6-bench:latest \
  --registry-server k6benchacr.azurecr.io \
  --registry-identity system \
  --command "k6" \
  --args "run" "/scripts/k6-basic.js" \
  --cpu 0.5 --memory 1Gi \
  --replica-timeout 1800 \
  --replica-completion-count 1 \
  --env-vars TARGET_URL="https://your-apim.azure-api.net/casper/transaction" VUS=50 DURATION=1m

Run the job:

az containerapp job start -g rg-k6-bench -n k6-basic-job

Fetch logs:

az containerapp logs show -g rg-k6-bench -n k6-basic-job --type job

Scale considerations:

  • Increase --parallelism (with a matching --replica-completion-count) to run parallel generators (independent replicas) if you want more aggregate RPS (ensure the backend is sized accordingly).
  • For a consistent arrival rate across replicas, consider scenario scripts using constant-arrival-rate (a minimal sketch follows this list).
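
A minimal constant-arrival-rate sketch (values are illustrative; rate is the number of iterations started per timeUnit, and k6 scales VUs between preAllocatedVUs and maxVUs to sustain it):

export const options = {
  scenarios: {
    steady_rps: {
      executor: 'constant-arrival-rate',
      rate: 100,            // iterations (requests) started per timeUnit
      timeUnit: '1s',       // i.e., ~100 RPS regardless of per-request latency
      duration: '2m',
      preAllocatedVUs: 50,  // VU pool k6 starts with
      maxVUs: 200           // hard cap if latency pushes VU demand up
    }
  }
};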

Network notes (not implemented here):

  • Use --internal-only on the environment + VNet injection for realistic latency.
  • Private endpoint or Private Link for APIM (Internal mode) isolates test traffic.

Shipping Per‑Request Latency to Application Insights

Sometimes a percentile is not enough: you need the exact correlation IDs of the slowest requests to debug them in APIM diagnostics or backend traces. The k6-azmon.js script sends each request (or a sampled subset) as a trace to Application Insights via the ingestion API.

Environment Variables

  • TARGET_URL – API endpoint under test
  • APPINSIGHTS_IKEY – Application Insights instrumentation key (GUID)
  • VUS / DURATION / THINK_MS – load shaping knobs
  • BATCH_SIZE – number of telemetry items per ingestion POST
  • FLUSH_INTERVAL_MS – time-based flush when the buffer is not full
  • SAMPLING – 1 = all requests; 0.1 = 10% sample
  • MAX_BUFFER – safety upper bound before a forced flush
  • PRETTY_SUMMARY – if set, prints a concise latency/ingestion block
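
The excerpts below assume some module-level setup carried by the full script: configuration read from the environment, the shared telemetry buffer and counters, and the classic (instrumentation-key based) Application Insights ingestion endpoint. A minimal sketch with names matching the excerpts:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

const TARGET_URL = __ENV.TARGET_URL;               // required
const IKEY = __ENV.APPINSIGHTS_IKEY;               // instrumentation key (GUID)
const BATCH_SIZE = Number(__ENV.BATCH_SIZE || 5);
const FLUSH_INTERVAL_MS = Number(__ENV.FLUSH_INTERVAL_MS || 2000);
const SAMPLING = Number(__ENV.SAMPLING || 1);      // 1 = all, 0.1 = 10%
const MAX_BUFFER = Number(__ENV.MAX_BUFFER || 200);

// classic ingestion endpoint for instrumentation-key based telemetry
const ingestionUrl = 'https://dc.services.visualstudio.com/v2/track';

let buffer = [];             // telemetry items awaiting flush
let lastFlush = Date.now();
let sentCount = 0, failedItems = 0, flushCalls = 0;
let iterationCounter = 0;    // per-VU iteration counter

function isoNow() { return new Date().toISOString(); }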

Building Telemetry Objects

function makeTelemetry(item) {
  return {
    name: 'Microsoft.ApplicationInsights.Message',
    time: item.time,
    iKey: IKEY,
    tags: {
      'ai.cloud.role': 'k6-loadgen',
      'ai.operation.id': item.correlationId,
      'ai.operation.parentId': item.correlationId.substring(0,16),
      'ai.internal.sdkVersion': 'k6:custom'
    },
    data: {
      baseType: 'MessageData',
      baseData: {
        message: `k6 request ${item.correlationId}`,
        severityLevel: 1,
        properties: {
          testType: 'k6-azmon',
          correlationId: item.correlationId,
          url: item.url,
          status: String(item.status),
          durationMs: String(item.durationMs),
          vu: String(item.vu),
          iteration: String(item.iteration),
          startTime: item.startTime,
          requestBodyBytes: String(item.requestBodyBytes),
          sampling: String(SAMPLING)
        }
      }
    }
  };
}

Request Execution + Sampling

function doRequest(track) {
  const corr = uuidv4();
  const start = Date.now();
  const body = JSON.stringify({ transactionId: corr });
  const headers = {
    'Content-Type': 'application/json',
    'x-correlation-id': corr,
    'traceparent': makeTraceParent(corr)
  };
  const res = http.post(TARGET_URL, body, { headers });
  const dur = Date.now() - start;
  check(res, { 'status<400': r => r.status < 400 });
  if (track) buffer.push(makeTelemetry({
      correlationId: corr,
      durationMs: dur,
      status: res.status,
      url: TARGET_URL,
      vu: __VU,
      iteration: iterationCounter++,
      startTime: new Date(start).toISOString(),
      requestBodyBytes: body.length,
      time: isoNow()
  }));
  if (buffer.length >= MAX_BUFFER) flush(true);
}
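
The default function is not shown above; a minimal sketch of how the pieces plug together (per-iteration sampling decision, a time/size-based flush, optional think time):

export default function () {
  // track this request's telemetry only if it falls inside the sample
  doRequest(Math.random() < SAMPLING);
  flush(); // no-op unless the batch size or flush interval has been reached
  const think = Number(__ENV.THINK_MS || 0);
  if (think > 0) sleep(think / 1000);
}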

Batching & Flushing

function flush(force=false) {
  if (buffer.length === 0) return;
  if (!force && buffer.length < BATCH_SIZE && (Date.now()-lastFlush) < FLUSH_INTERVAL_MS) return;
  const payload = JSON.stringify(buffer);
  const res = http.request('POST', ingestionUrl, payload, { headers: { 'Content-Type': 'application/x-json-stream' } });
  if (res.status >= 400) {
    failedItems += buffer.length;
    console.error(`AppInsights ingestion failed status=${res.status}`);
  } else {
    sentCount += buffer.length;
  }
  flushCalls += 1;
  buffer = [];
  lastFlush = Date.now();
}

Pretty Summary Output

export function handleSummary(data) {
  flush(true);
  if (__ENV.PRETTY_SUMMARY) {
    const m = data.metrics || {};
    function gv(n,k){return m[n]?.values?.[k];}
    const concise = [
      '===== k6 QUICK SUMMARY =====',
      `Requests total=${gv('http_reqs','count')||0}`,
      `Latency(ms) med=${gv('http_req_duration','med')}`,
      'Query: traces | where customDimensions.testType == "k6-azmon"',
      '================================='
    ].join('\n');
    console.log('\n'+concise+'\n');
  }
  return {}; // suppress k6's default end-of-test summary output
}

Running Locally

wsl docker run --rm \
  -e TARGET_URL="https://your-apim.azure-api.net/casper/transaction" \
  -e APPINSIGHTS_IKEY="<ikey-guid>" \
  -e VUS=50 -e DURATION=2m \
  -e BATCH_SIZE=5 -e FLUSH_INTERVAL_MS=2000 -e SAMPLING=1 -e PRETTY_SUMMARY=1 \
  -v /mnt/c/Repo/RPS-Benchmark/scripts:/scripts \
  grafana/k6:latest run /scripts/k6-azmon.js

Deploying the Telemetry Script as an ACA Job

Assuming you baked k6-azmon.js into a custom image (steps similar to earlier):

az containerapp job create \
  -g rg-k6-bench \
  -n k6-azmon-job \
  --environment k6-env \
  --trigger-type Manual \
  --image k6benchacr.azurecr.io/k6-bench:latest \
  --registry-server k6benchacr.azurecr.io \
  --registry-identity system \
  --command "k6" \
  --args "run" "/scripts/k6-azmon.js" \
  --cpu 0.5 --memory 1Gi \
  --replica-timeout 1800 \
  --env-vars TARGET_URL="https://your-apim.azure-api.net/casper/transaction" APPINSIGHTS_IKEY="<ikey-guid>" VUS=40 DURATION=2m BATCH_SIZE=5 FLUSH_INTERVAL_MS=2000 SAMPLING=0.25 PRETTY_SUMMARY=1

Start & view logs:

az containerapp job start -g rg-k6-bench -n k6-azmon-job
az containerapp logs show -g rg-k6-bench -n k6-azmon-job --type job

Kusto Queries in Application Insights

Top 50 slowest (descending):

traces
| where customDimensions.testType == "k6-azmon"
| project timestamp, durationMs=tolong(customDimensions.durationMs), status=tostring(customDimensions.status), correlationId=customDimensions.correlationId, url=customDimensions.url
| order by durationMs desc
| take 50

Tail stats (>=300ms):

traces
| where customDimensions.testType == "k6-azmon"
| extend dur=tolong(customDimensions.durationMs)
| where dur >= 300
| summarize count(), p95=percentile(dur,95), p99=percentile(dur,99), max=max(dur)

Join with exceptions (if backend logs correlation id):

traces
| where customDimensions.testType == "k6-azmon"
| extend corr=tostring(customDimensions.correlationId), dur=tolong(customDimensions.durationMs)
| join kind=leftouter (
    exceptions | project corr=tostring(customDimensions['x-correlation-id']), exType=type, exMsg=message, exTs=timestamp
  ) on corr
| where dur >= 300
| project timestamp, corr, dur, status=customDimensions.status, exType, exMsg
| order by dur desc

Benchmarking & Latency Best Practices

  • Warm-up – Use an initial ramp (e.g., 30–60s) before measuring to avoid cold-start bias.
  • Tail focus – Track p95/p99; log the top N slow correlation IDs for root cause.
  • Arrival model – Consider constant-arrival-rate scenarios if you need consistent RPS independent of VUs.
  • Sampling – For very high volume, sample telemetry (SAMPLING=0.1) to control ingestion cost.
  • Isolation – Run from a controlled network (ACA with VNet) to reduce noise from local Wi-Fi / VPN.
  • Reproducibility – Version-control scripts, pin image tags, and record env vars per run.
  • Correlation – Always propagate x-correlation-id (and a valid W3C traceparent if your backend traces).
  • Data hygiene – Do not include secrets in payloads or headers; treat the instrumentation key as semi-public but rotate it if leaked.
  • Thresholds – Use k6 thresholds to fail fast in CI if latency regresses (see the sketch after this list).
  • Outlier analysis – Export individual traces of the slowest requests; compare phases (DNS, connect, TLS, waiting) if needed.
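
For the thresholds point, k6 can also abort a run as soon as a limit is crossed instead of only failing at the end; a minimal sketch using the standard abortOnFail option:

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'],
    // stop early once p(99) is clearly out of budget (after a warm-up grace period)
    http_req_duration: [{ threshold: 'p(99)<3000', abortOnFail: true, delayAbortEval: '30s' }]
  }
};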

Closing Thoughts

Sustainable performance discipline is about continuous latency awareness, not ad-hoc stress tests. k6’s scripting model plus Azure Container Apps jobs deliver a portable, repeatable mechanism to:

  1. Recreate realistic load patterns quickly.
  2. Capture actionable percentile and error metrics.
  3. Emit per‑request telemetry for deep, correlation-driven diagnostics.

By pushing each request (or a sample) into Application Insights, you bridge summary statistics and forensic detail: no more guessing which single request out of thousands spiked your p99. Combine this with APIM diagnostic logs and backend distributed tracing, and you have an end‑to‑end latency observability loop.

Next experiments you might try:

  • Add a constant-arrival-rate scenario for strict RPS.
  • Introduce multiple endpoints / weights inside one script for mixed traffic (a tiny sketch follows this list).
  • Push k6 output to Prometheus and build a live latency dashboard.
  • Run comparative A/B tests between two backend versions (two TARGET_URLs) and diff percentile curves.
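
For the mixed-traffic idea, a weighted endpoint pick inside one script is enough to start; a tiny sketch (endpoints and weights here are hypothetical):

// hypothetical route table for a mixed-traffic profile
const ROUTES = [
  { url: `${__ENV.BASE_URL}/transaction`, weight: 0.7 },
  { url: `${__ENV.BASE_URL}/status`, weight: 0.3 }
];

function pickRoute() {
  let r = Math.random();
  for (const route of ROUTES) {
    if ((r -= route.weight) <= 0) return route;
  }
  return ROUTES[ROUTES.length - 1]; // guard against floating-point rounding
}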

Happy benchmarking—and may your p99 stay close to your median.
