Prometheus is unreachable

If a Prometheus instance installed with OSM can’t be reached, perform the following steps to identify and resolve any issues.

  1. Verify a Prometheus Pod exists.

    When OSM is installed with osm install --set=OpenServiceMesh.deployPrometheus=true, a Prometheus Pod with a name like osm-prometheus-5794755b9f-rnvlr should exist in the same namespace as the other OSM control plane components, which is osm-system by default.

    If no such Pod is found, use helm to verify that the OSM Helm chart was installed with the OpenServiceMesh.deployPrometheus parameter set to true:

    $ helm get values -a <mesh name> -n <OSM namespace>

    If the parameter is set to anything but true, reinstall OSM with the --set=OpenServiceMesh.deployPrometheus=true flag on osm install.

  2. Verify the Prometheus Pod is healthy.

    The Prometheus Pod identified above should be in a Running state with all of its containers ready, as in the following kubectl get output:

    $ # Assuming OSM is installed in the osm-system namespace:
    $ kubectl get pods -n osm-system -l app=osm-prometheus
    NAME                              READY   STATUS    RESTARTS   AGE
    osm-prometheus-5794755b9f-67p6r   1/1     Running   0          27m

    If the Pod is not Running or its containers are not all ready, use kubectl describe to look for potential issues:

    $ # Assuming OSM is installed in the osm-system namespace:
    $ kubectl describe pods -n osm-system -l app=osm-prometheus

    Once the Prometheus Pod is found to be healthy, Prometheus should be reachable.
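
The two checks above lend themselves to scripting. The following is a minimal sketch, not part of OSM: the helper names prometheus_deploy_enabled and prometheus_pods_healthy are hypothetical, the first assumes helm prints the parameter as a "deployPrometheus: true" YAML line, and the second assumes kubectl's default column layout (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helpers for the two checks above; neither ships with OSM.

# Reads the output of `helm get values -a <mesh name> -n <OSM namespace>` on
# stdin and succeeds only if it contains a "deployPrometheus: true" line.
# Assumption: the values are printed as YAML, one key per line.
prometheus_deploy_enabled() {
  grep -Eq '^[[:space:]]*deployPrometheus:[[:space:]]*true[[:space:]]*$'
}

# Reads `kubectl get pods` rows (NAME READY STATUS ...) on stdin.
# Exit status: 0 if every Pod is Running with all containers ready,
# 1 if any Pod is not, 2 if no Pod rows were given.
prometheus_pods_healthy() {
  awk '
    $1 == "NAME" { next }              # skip the header row if present
    NF >= 3 {
      seen = 1
      split($2, ready, "/")            # READY looks like "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        printf "unhealthy: %s (ready %s, status %s)\n", $1, $2, $3
        bad = 1
      }
    }
    END {
      if (!seen) { print "no Prometheus Pod found"; exit 2 }
      exit (bad ? 1 : 0)
    }
  '
}
```

With the functions loaded in the current shell, they could be fed directly from the commands used in the steps above:

    $ helm get values -a <mesh name> -n <OSM namespace> | prometheus_deploy_enabled && echo enabled
    $ kubectl get pods -n osm-system -l app=osm-prometheus | prometheus_pods_healthy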

Metrics are not showing up in Prometheus

If Prometheus is not scraping metrics from one or more Pods, perform the following steps to identify and resolve any issues.

  1. Verify application Pods are working as expected.

    If workloads running in the mesh are not functioning properly, metrics scraped from those Pods may not look correct. For example, if metrics showing traffic to Service A from Service B are missing, ensure the services are communicating successfully.

    To help further troubleshoot these kinds of issues, see the traffic troubleshooting guide.

  2. Verify the Pods whose metrics are missing have an Envoy sidecar injected.

    Only Pods with an Envoy sidecar container are expected to have their metrics scraped by Prometheus. Ensure each Pod is running a container from an image with envoyproxy/envoy in its name:

    $ kubectl get po -n <pod namespace> <pod name> -o jsonpath='{.spec.containers[*].image}'
    mynamespace/myapp:v1.0.0 envoyproxy/envoy-alpine:v1.17.2

  3. Verify the proxy’s endpoint being scraped by Prometheus is working as expected.

    Each Envoy proxy exposes an HTTP endpoint that shows metrics generated by that proxy and is scraped by Prometheus. Check to see if the expected metrics are shown by making a request to the endpoint directly.

    For each Pod whose metrics are missing, use kubectl to forward the Envoy proxy admin interface port and check the metrics:

    $ kubectl port-forward -n <pod namespace> <pod name> 15000

    Go to http://localhost:15000/stats/prometheus in a browser to check the metrics generated by that Pod. If Prometheus does not seem to be accounting for these metrics, move on to the next step to ensure Prometheus is configured properly.

  4. Verify the intended namespaces have been enrolled in metrics collection.

    For each namespace containing Pods whose metrics should be scraped, use osm mesh list to confirm the namespace is monitored by the intended OSM instance.

    Next, check that the namespace's openservicemesh.io/metrics annotation is set to enabled:

    $ # Assuming OSM is installed in the osm-system namespace:
    $ kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations.openservicemesh\.io/metrics}'

    If no such annotation exists on the namespace or it has a different value, fix it with osm:

    $ osm metrics enable --namespace <namespace>
    Metrics successfully enabled in namespace [<namespace>]

  5. If custom metrics are not being scraped, verify they have been enabled.

    Custom metrics are currently disabled by default and are enabled when the OpenServiceMesh.featureFlags.enableWASMStats parameter is set to true. Verify that the current OSM instance has this parameter set. For example, for a mesh named osm in the osm-system namespace:

    $ helm get values -a osm -n osm-system

    If OpenServiceMesh.featureFlags.enableWASMStats is set to a different value, reinstall OSM and pass --set OpenServiceMesh.featureFlags.enableWASMStats=true to osm install.
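
Steps 2 through 4 above can be repeated quickly across many Pods and namespaces with a few small shell helpers. This is a minimal sketch under stated assumptions: the function names has_envoy_sidecar, count_metric_samples, and metrics_enabled are hypothetical and not part of OSM, and the metrics body is assumed to be in the Prometheus text format, where lines starting with "#" are comments.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 to 4; none of them ship with OSM.

# Step 2: reads the image list printed by the jsonpath query on stdin and
# succeeds if any image name contains envoyproxy/envoy.
has_envoy_sidecar() {
  grep -q 'envoyproxy/envoy'
}

# Step 3: reads the body of /stats/prometheus on stdin and prints the number
# of sample lines, ignoring blank lines and "#" comment lines (HELP/TYPE).
count_metric_samples() {
  grep -cv -e '^#' -e '^[[:space:]]*$' || true
}

# Step 4: succeeds if the annotation value passed as $1 is exactly "enabled".
metrics_enabled() {
  [ "$1" = "enabled" ]
}
```

With the functions loaded in the current shell, and assuming curl is available and the port-forward from step 3 is still running, they could be used like this:

    $ kubectl get po -n <pod namespace> <pod name> -o jsonpath='{.spec.containers[*].image}' | has_envoy_sidecar || echo "no Envoy sidecar"
    $ curl -s http://localhost:15000/stats/prometheus | count_metric_samples
    $ metrics_enabled "$(kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations.openservicemesh\.io/metrics}')" || osm metrics enable --namespace <namespace>

A sample count of 0 in the second command would suggest the proxy itself is not producing metrics, rather than Prometheus failing to scrape them.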