Overview

Anthos Service Mesh (managed Istio) provides secure service-to-service communication, advanced traffic management, and deep observability for microservices on GKE. Google manages the control plane and upgrades it automatically.

Mutual TLS

Automatic encryption between services

Traffic Control

Canary deployments, A/B testing, circuit breaking

Observability

Service topology, latency, error rates

Policy Enforcement

Fine-grained authorization, rate limiting

Why Service Mesh?

Challenge: By default, any pod can talk to any other pod.
Solution: The service mesh enforces mTLS and authorization policies.
Implementation:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production-mcp-server-langgraph
spec:
  mtls:
    mode: STRICT  # All traffic must be mTLS
Result: Encrypted, authenticated communication
Traffic management use cases:
  • Canary releases (10% traffic to v2)
  • A/B testing (iOS users → v2)
  • Blue-green deployments
  • Circuit breaking (prevent cascading failures)
Without mesh: Complex custom code
With mesh: Declarative traffic rules
Built-in metrics:
  • Request rate (QPS per service)
  • P50/P95/P99 latency
  • Success rate (% 2xx responses)
  • Service dependency graph
Without mesh: Instrumentation code in every service
With mesh: Automatic collection by the sidecars
Scenario: Services across dev, staging, and prod clusters
Capability: A single mesh spanning all clusters
Benefit: Consistent policies and cross-cluster service discovery

Architecture

Components:
  • Istiod: Control plane (managed by Google, auto-upgraded)
  • Envoy sidecars: Injected into each pod; intercept and handle the pod's inbound and outbound traffic
  • Telemetry: Metrics sent to Cloud Monitoring

Quick Setup (30 minutes)

1. Enable APIs & Fleet Registration

./deployments/service-mesh/anthos/setup-anthos-service-mesh.sh \
  PROJECT_ID production-mcp-server-langgraph-gke us-central1
What it does:
  • Enables Anthos Service Mesh APIs
  • Registers cluster with GKE Fleet
  • Enables managed service mesh
  • Waits for control plane (~10-15 min)
2. Verify Installation

# Check mesh status
gcloud container fleet mesh describe --project=PROJECT_ID

# Should show:
# state: ACTIVE
# controlPlaneManagement: AUTOMATIC

# Verify Istiod running
kubectl get pods -n istio-system
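The istiod pod should be Running. If the control plane is Google-managed, istiod may run outside your cluster and not appear in this list.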
3. Enable Sidecar Injection

# Label namespace for automatic injection
kubectl label namespace production-mcp-server-langgraph istio-injection=enabled

# Restart deployments to inject sidecars
kubectl rollout restart deployment/production-mcp-server-langgraph \
  -n production-mcp-server-langgraph
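If namespaces are managed declaratively (for example, through GitOps), the injection label can live in the Namespace manifest instead of being applied with `kubectl label`. A minimal sketch, assuming the `istio-injection=enabled` label works for your control plane exactly as in the command above; pods that already exist still need the rollout restart to pick up a sidecar.

```yaml
# Declarative equivalent of:
#   kubectl label namespace production-mcp-server-langgraph istio-injection=enabled
apiVersion: v1
kind: Namespace
metadata:
  name: production-mcp-server-langgraph
  labels:
    istio-injection: enabled   # new pods created in this namespace get an Envoy sidecar
```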
4. Verify Sidecars Injected

# Check pods have 2 containers (app + envoy)
kubectl get pods -n production-mcp-server-langgraph

# Should show:
# NAME                                   READY   STATUS
# production-mcp-server-langgraph-...    2/2     Running
#                                        ^^^
#                                    app + sidecar

# Describe pod to see istio-proxy container
kubectl describe pod POD_NAME -n production-mcp-server-langgraph | grep istio-proxy
5. Enable Strict mTLS

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production-mcp-server-langgraph
spec:
  mtls:
    mode: STRICT
EOF
All traffic to workloads in this namespace must now use mTLS; plaintext requests from clients without a sidecar are rejected.
6. Verify mTLS

# Check Kiali dashboard or use istioctl
istioctl proxy-config secret -n production-mcp-server-langgraph POD_NAME

# Should show TLS certificates

Traffic Management

Canary Deployment

Route 10% of traffic to the new version (v2). Requests carrying the header x-canary: true always go to v2:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mcp-server
  namespace: production-mcp-server-langgraph
spec:
  hosts:
  - mcp-server
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: mcp-server
        subset: v2
      weight: 100
  - route:
    - destination:
        host: mcp-server
        subset: v1
      weight: 90
    - destination:
        host: mcp-server
        subset: v2
      weight: 10  # 10% to canary
Workflow:
  1. Deploy v2 with label version: v2
  2. Apply VirtualService (10% → v2)
  3. Monitor metrics for 30 minutes
  4. If healthy, increase to 50%, then 100%
  5. If unhealthy, revert to 0%
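The VirtualService routes to subsets `v1` and `v2`, which must be defined in a DestinationRule for the same host; without one, requests routed to those subsets fail. A minimal sketch, assuming the stable pods carry the label `version: v1` and the canary pods `version: v2` (workflow step 1):

```yaml
# Defines the v1/v2 subsets referenced by the mcp-server VirtualService.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: mcp-server
  namespace: production-mcp-server-langgraph
spec:
  host: mcp-server
  subsets:
  - name: v1
    labels:
      version: v1   # assumption: label on the current (stable) release pods
  - name: v2
    labels:
      version: v2   # canary pods deployed in workflow step 1
```

With the subsets in place, reverting (workflow step 5) only means setting the v2 weight to 0 and the v1 weight to 100 in the VirtualService.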

Circuit Breaking

Prevent cascading failures:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: postgres-proxy
  namespace: production-mcp-server-langgraph
spec:
  host: postgres-proxy
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5   # replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
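Behavior: A host that returns 5 consecutive errors is ejected from the load-balancing pool for 30 seconds (baseEjectionTime; the ejection grows longer each time the same host is ejected again). At most 50% of hosts can be ejected at once, and ejections are re-evaluated every 30 seconds (interval).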

Retry Policy

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
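metadata:
  name: mcp-server
  namespace: production-mcp-server-langgraph
spec:
  hosts:   # required: the service this routing applies to
  - mcp-server
  http: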
  - route:
    - destination:
        host: mcp-server
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
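The retry policy and the canary VirtualService both target the `mcp-server` host, and Istio merges separate VirtualServices for the same host only in limited cases. A safer pattern is a single resource that carries both the weighted split and the retry settings. A sketch; the 10s overall timeout is an assumed value, chosen to leave room for the original try plus three retries at 2s each:

```yaml
# One VirtualService for the host: canary split and retries on the same route.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mcp-server
  namespace: production-mcp-server-langgraph
spec:
  hosts:
  - mcp-server
  http:
  - route:
    - destination:
        host: mcp-server
        subset: v1
      weight: 90
    - destination:
        host: mcp-server
        subset: v2
      weight: 10          # canary share
    timeout: 10s          # assumption: total budget across all attempts
    retries:
      attempts: 3         # number of retries after the first try
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```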

Security

Strict mTLS

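Three scopes are available:
  • Cluster-wide: policy in the root namespace (istio-system), shown below
  • Namespace-specific: policy in the workload's own namespace, as in the Quick Setup
  • Permissive (migration): mode PERMISSIVE accepts both mTLS and plaintext while sidecars roll out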
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
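Because istio-system is the mesh root namespace, this policy applies to every namespace unless a namespace- or workload-level PeerAuthentication overrides it.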
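For the Permissive (migration) case, the mesh-wide STRICT policy can stay in place while a single workload is relaxed: Istio applies the most specific PeerAuthentication (workload over namespace over mesh-wide). A sketch; the `legacy-worker` label and the plaintext-only port 9090 are hypothetical, and `portLevelMtls` requires a `selector`.

```yaml
# Workload-level exception during migration; mesh-wide STRICT remains.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-worker-permissive
  namespace: production-mcp-server-langgraph
spec:
  selector:
    matchLabels:
      app: legacy-worker   # hypothetical workload still receiving plaintext clients
  mtls:
    mode: PERMISSIVE       # accept both mTLS and plaintext
  portLevelMtls:
    9090:
      mode: DISABLE        # hypothetical plaintext-only port
```

Once every client of the workload has a sidecar, deleting this policy returns it to the mesh-wide STRICT setting.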

Authorization Policies

Deny-all by default:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production-mcp-server-langgraph
spec: {}  # Empty = deny all
Allow specific service:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-mcp-server
  namespace: production-mcp-server-langgraph
spec:
  selector:
    matchLabels:
      app: postgres-proxy
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/mcp-production/sa/mcp-server"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
Result: Only workloads running as the mcp-server service account can call postgres-proxy, and only with GET or POST requests to paths under /api/.
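`methods` and `paths` are HTTP-only fields. If postgres-proxy carries raw PostgreSQL wire traffic over TCP, Istio cannot evaluate those fields, so in an ALLOW policy the rule does not match and the connection is denied. A TCP-friendly variant keeps only identity and port conditions. A sketch; the principal is copied from the policy above, and port 5432 is an assumption about where postgres-proxy listens.

```yaml
# Identity-and-port rule for a TCP service: no HTTP-only fields.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-mcp-server-tcp
  namespace: production-mcp-server-langgraph
spec:
  selector:
    matchLabels:
      app: postgres-proxy
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/mcp-production/sa/mcp-server"]
    to:
    - operation:
        ports: ["5432"]   # assumption: postgres-proxy listening port
```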

Observability

Service Topology

View in Google Cloud Console:
Navigation → Anthos → Service Mesh → Topology
Shows:
  • Service dependency graph
  • Traffic flow between services
  • Error rates per edge

Metrics

  • Request rate
  • Latency (P95)
  • Error rate
Request rate for mcp-server (PromQL):
rate(istio_requests_total{
  destination_service_name="mcp-server",
  destination_workload_namespace="mcp-production"
}[1m])
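The list above names three signals, but only a request-rate query is given. Written as a Prometheus recording-rule file, all three might look like this. A sketch, assuming Prometheus scrapes the standard Istio sidecar metrics; it reuses the label values from the query above and adds `reporter="destination"` so that each request, which both the client and server proxies report, is counted once. The rule names are hypothetical.

```yaml
# Prometheus rule-file format (not a Kubernetes CRD).
groups:
- name: mcp-server-golden-signals
  rules:
  # Request rate (requests per second)
  - record: mcp_server:request_rate:1m
    expr: |
      sum(rate(istio_requests_total{reporter="destination",
        destination_service_name="mcp-server",
        destination_workload_namespace="mcp-production"}[1m]))
  # P95 latency in milliseconds, from the request-duration histogram
  - record: mcp_server:latency_p95_ms:5m
    expr: |
      histogram_quantile(0.95, sum by (le) (
        rate(istio_request_duration_milliseconds_bucket{reporter="destination",
          destination_service_name="mcp-server",
          destination_workload_namespace="mcp-production"}[5m])))
  # Error rate: fraction of requests answered with a 5xx code
  - record: mcp_server:error_ratio:5m
    expr: |
      sum(rate(istio_requests_total{reporter="destination",
        destination_service_name="mcp-server",
        destination_workload_namespace="mcp-production",
        response_code=~"5.."}[5m]))
      /
      sum(rate(istio_requests_total{reporter="destination",
        destination_service_name="mcp-server",
        destination_workload_namespace="mcp-production"}[5m]))
```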

Dashboards

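Install Kiali from the Istio sample add-ons. Kiali reads mesh metrics from Prometheus, so the sample Prometheus add-on is applied first:
# Install Prometheus (metrics backend) and Kiali (service mesh dashboard)
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml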

# Port-forward
kubectl port-forward svc/kiali -n istio-system 20001:20001

# Open http://localhost:20001
Features:
  • Service graph visualization
  • Traffic animation
  • Configuration validation
  • Distributed tracing

Multi-Cluster Mesh

1. Register All Clusters

# Dev cluster
gcloud container fleet memberships register mcp-dev-membership \
  --gke-cluster=us-central1/mcp-dev-gke \
  --project=PROJECT_ID

# Staging cluster
gcloud container fleet memberships register mcp-staging-membership \
  --gke-cluster=us-central1/mcp-staging-gke \
  --project=PROJECT_ID

# Prod cluster (already registered)
2. Enable Mesh for All Clusters

gcloud container fleet mesh update \
  --management automatic \
  --memberships=mcp-dev-membership,mcp-staging-membership,mcp-prod-membership \
  --project=PROJECT_ID
3. Configure Cross-Cluster Service Discovery

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-staging-service
  namespace: production-mcp-server-langgraph
spec:
  hosts:
  - mcp-server.mcp-staging.svc.cluster.local
  location: MESH_INTERNAL
  ports:
  - number: 8000
    name: http
    protocol: HTTP
  resolution: DNS
Use case: Production can call staging services for integration testing

Troubleshooting

Symptom: Pod shows 1/1 containers (should be 2/2)
Checks:
# Verify namespace labeled
kubectl get namespace production-mcp-server-langgraph --show-labels

# Should see: istio-injection=enabled

# Check injection status
kubectl get mutatingwebhookconfigurations
Solution: Label namespace and restart pods
Symptom: Service A can’t connect to Service B
Checks:
# Check PeerAuthentication
kubectl get peerauthentication -n production-mcp-server-langgraph

# Check DestinationRule
kubectl get destinationrule -n production-mcp-server-langgraph

# Verify certificates
istioctl proxy-config secret POD_NAME -n production-mcp-server-langgraph
Common fix: Ensure both sides have sidecars injected
Symptom: Mesh status shows PROVISIONING for more than 20 minutes
Solution:
# Check fleet status
gcloud container fleet mesh describe --project=PROJECT_ID

# View control plane logs (applies only if istiod runs in your cluster)
kubectl logs -n istio-system deployment/istiod

# If stuck, re-enable
gcloud container fleet mesh update \
  --management automatic \
  --memberships=MEMBERSHIP_NAME \
  --project=PROJECT_ID

Best Practices

Start with PERMISSIVE mTLS, then move to STRICT
# Week 1: Permissive (allow migration)
mtls:
  mode: PERMISSIVE

# Week 2: Strict (after all services have sidecars)
mtls:
  mode: STRICT
Use namespace-scoped policies for isolation
# Production has strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production-mcp-server-langgraph
spec:
  mtls:
    mode: STRICT

---
# Dev can be permissive
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: mcp-dev
spec:
  mtls:
    mode: PERMISSIVE
Enable resource limits on sidecars
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
  namespace: istio-system
data:
  values: |
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
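Editing the `istio-sidecar-injector` ConfigMap changes defaults for every injected pod, and with a Google-managed control plane that ConfigMap may not be yours to edit. Istio also reads per-workload overrides from pod-template annotations. A sketch written as a strategic-merge patch for the Deployment from the Quick Setup, reusing the values from the ConfigMap above:

```yaml
# Strategic-merge patch: only the pod-template annotations change.
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "100m"          # istio-proxy CPU request
        sidecar.istio.io/proxyMemory: "128Mi"      # istio-proxy memory request
        sidecar.istio.io/proxyCPULimit: "500m"     # istio-proxy CPU limit
        sidecar.istio.io/proxyMemoryLimit: "256Mi" # istio-proxy memory limit
```

It could be applied with `kubectl patch deployment production-mcp-server-langgraph -n production-mcp-server-langgraph --patch-file sidecar-resources.yaml` (the file name is hypothetical). Because the pod template changes, Kubernetes rolls the pods, and the new pods are injected with the new sizes.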
Monitor mesh health with SLIs
# SLO: 99% of requests complete in < 500ms (latency SLI)
# SLO: 99.9% of requests succeed, i.e. return a non-5xx response (availability SLI)
# Alert when the error budget is being depleted
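As Prometheus alerting rules, the two targets could be checked like this. A simple sketch, not a multi-window burn-rate alert: the 30-minute window, `for: 10m`, the severity label and the alert names are assumptions, and the latency rule assumes a histogram bucket boundary at 500 ms (check the `le` values your Prometheus actually stores).

```yaml
# Prometheus rule-file format; threshold alerts on the two targets.
groups:
- name: mcp-server-slo
  rules:
  # Latency: share of requests at or under 500 ms should stay >= 99%
  - alert: McpServerLatencyTargetMissed
    expr: |
      sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination",
        destination_service_name="mcp-server", le="500"}[30m]))
      /
      sum(rate(istio_request_duration_milliseconds_count{reporter="destination",
        destination_service_name="mcp-server"}[30m]))
      < 0.99
    for: 10m
    labels:
      severity: page
  # Availability: share of non-5xx responses should stay >= 99.9%
  - alert: McpServerAvailabilityTargetMissed
    expr: |
      sum(rate(istio_requests_total{reporter="destination",
        destination_service_name="mcp-server", response_code!~"5.."}[30m]))
      /
      sum(rate(istio_requests_total{reporter="destination",
        destination_service_name="mcp-server"}[30m]))
      < 0.999
    for: 10m
    labels:
      severity: page
```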


Next Steps

1. Install Anthos Service Mesh

./deployments/service-mesh/anthos/setup-anthos-service-mesh.sh PROJECT_ID
2. Enable Sidecar Injection

kubectl label namespace production-mcp-server-langgraph istio-injection=enabled
kubectl rollout restart deployment -n production-mcp-server-langgraph
3. Enable Strict mTLS

kubectl apply -f deployments/service-mesh/anthos/peer-authentication.yaml
4. Configure Traffic Rules

Set up canary deployments, circuit breaking, retries
5. Monitor Service Topology

Console → Anthos → Service Mesh → Topology