Running Java applications in Kubernetes presents unique challenges that differ from traditional deployments. The dynamic nature of container orchestration requires thoughtful configuration to achieve optimal performance. Through my experience deploying enterprise Java applications, I've identified several key techniques that significantly improve reliability and efficiency.
Resource management forms the foundation of stable Java applications in Kubernetes. Containers without proper resource constraints can consume excessive CPU and memory, affecting other applications on the node. I always define both requests and limits to help the scheduler make intelligent placement decisions while preventing resource starvation.
Consider this deployment configuration that establishes sensible boundaries:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-application
spec:
  selector:
    matchLabels:
      app: java-application
  template:
    metadata:
      labels:
        app: java-application
    spec:
      containers:
      - name: java-app
        image: my-java-app:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        env:
        - name: JAVA_OPTS
          value: "-XX:MaxRAMPercentage=75.0"
The JAVA_OPTS variable only takes effect if the image's entrypoint passes it to the java command (as the Dockerfile later in this article does); JAVA_TOOL_OPTIONS is an alternative that the JVM picks up automatically. With MaxRAMPercentage set, the JVM sizes its heap relative to the container's memory limit rather than the host machine's capacity. Without such a setting, the heap can grow past the limit, which often leads to out-of-memory kills by Kubernetes.
Proper health checks are crucial for maintaining application availability. Kubernetes uses liveness and readiness probes to determine container health and traffic routing. I've found that Spring Boot Actuator provides excellent endpoints for these checks.
This configuration establishes robust health monitoring:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 45
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
The initial delay gives the JVM sufficient time to start up completely. I learned this the hard way when probes were failing due to slow JVM initialization. The different endpoints for liveness and readiness allow for more granular health reporting.
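These probe endpoints are not exposed in every setup; assuming Spring Boot 2.3 or later with Actuator on the classpath, a minimal application.properties sketch enables the liveness and readiness health groups:

```properties
# Expose the health endpoint over HTTP
management.endpoints.web.exposure.include=health
# Enable /actuator/health/liveness and /actuator/health/readiness
management.endpoint.health.probes.enabled=true
management.health.livenessstate.enabled=true
management.health.readinessstate.enabled=true
```

When the application detects it is running in Kubernetes, Spring Boot enables the probe groups automatically; the explicit properties are useful for local testing.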
Horizontal pod autoscaling enables applications to handle variable loads efficiently. Based on CPU utilization or custom metrics, Kubernetes can automatically adjust the number of pod replicas. This elasticity proves invaluable during traffic spikes.
Here's a comprehensive autoscaling configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: java-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-application
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 20
        periodSeconds: 60
The behavior section controls scaling aggression. I prefer conservative scale-down policies to prevent rapid replica reduction during temporary load drops. The memory-based scaling provides additional protection against memory leaks.
Distributed caching dramatically improves application performance by reducing database load. In multi-replica deployments, maintaining cache consistency across pods becomes essential. Redis has become my preferred solution for distributed caching.
Implementing Redis caching in Spring Boot requires minimal configuration:
@Configuration
@EnableCaching
public class RedisConfig {

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        return new LettuceConnectionFactory("redis-service", 6379);
    }

    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(30))
                .disableCachingNullValues()
                .serializeValuesWith(SerializationPair.fromSerializer(new GenericJackson2JsonRedisSerializer()));
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(config)
                .withInitialCacheConfigurations(Map.of(
                        "users", RedisCacheConfiguration.defaultCacheConfig()
                                .entryTtl(Duration.ofHours(1)),
                        "products", RedisCacheConfiguration.defaultCacheConfig()
                                .entryTtl(Duration.ofMinutes(15))
                ))
                .build();
    }
}
The configuration establishes different TTL values for various data types. User data might cache longer than product information, which changes more frequently. This approach maximizes cache effectiveness while ensuring data freshness.
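The per-cache TTL idea can be illustrated without Spring; a minimal plain-Java sketch (the TtlCache class and its names are illustrative, not part of the Spring API):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal TTL cache: each entry remembers its own expiry time
class TtlCache<K, V> {
    private record Entry<T>(T value, Instant expiresAt) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Duration ttl;

    TtlCache(Duration ttl) { this.ttl = ttl; }

    void put(K key, V value) {
        store.put(key, new Entry<>(value, Instant.now().plus(ttl)));
    }

    V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null || Instant.now().isAfter(e.expiresAt())) {
            store.remove(key);
            return null;   // expired entries behave as cache misses
        }
        return e.value();
    }
}

public class TtlCacheDemo {
    public static void main(String[] args) {
        // Different TTLs per logical cache, mirroring the Redis configuration
        TtlCache<String, String> users = new TtlCache<>(Duration.ofHours(1));
        TtlCache<String, String> products = new TtlCache<>(Duration.ofMinutes(15));

        users.put("u1", "Alice");
        products.put("p1", "Widget");

        System.out.println(users.get("u1"));     // Alice
        System.out.println(products.get("p2"));  // null (miss)
    }
}
```

Redis handles expiry server-side, of course; the sketch only shows why separate caches with separate TTLs keep fast-changing data fresher than slow-changing data.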
Centralized logging provides crucial visibility into distributed applications. When multiple pod instances run simultaneously, correlating logs becomes challenging. The Loki stack offers lightweight log aggregation specifically designed for Kubernetes.
This values.yaml configuration sets up comprehensive log collection:
loki:
  config:
    schema_config:
      configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: index_
          period: 24h
    storage_config:
      aws:
        s3: s3://us-east-1/log-bucket
        region: us-east-1
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        shared_store: s3
promtail:
  config:
    clients:
    - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
The configuration includes S3 storage for long-term log retention and proper labeling for log correlation. I've found that consistent labeling makes log searching much more efficient during incident investigation.
JVM tuning plays a critical role in containerized environments. Traditional JVM settings designed for physical servers often perform poorly in containers. The newer container-aware JVM flags significantly improve performance.
This Dockerfile demonstrates optimal JVM configuration:
FROM openjdk:17-jdk-slim
WORKDIR /app
COPY target/my-application.jar app.jar
# Set container-aware JVM options
ENV JAVA_OPTS="-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:InitiatingHeapOccupancyPercent=35 \
-XX:ParallelGCThreads=4 \
-XX:ConcGCThreads=2 \
-XX:+AlwaysPreTouch \
-XX:+UseStringDeduplication \
-XX:+ExitOnOutOfMemoryError"
EXPOSE 8080
ENTRYPOINT exec java $JAVA_OPTS -jar app.jar
The UseContainerSupport flag makes the JVM recognize container boundaries; it has been enabled by default since JDK 10, so it appears here only for explicitness. MaxRAMPercentage controls heap size relative to the container memory limit. I typically set it between 70% and 80% to leave room for off-heap memory (metaspace, thread stacks, direct buffers) and system processes.
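To verify what the JVM actually detected, its view of memory and CPU can be printed from inside the container; a small sketch (with -XX:MaxRAMPercentage=75.0 and a 1Gi limit, the reported max heap should be roughly 768 MiB):

```java
// Prints the JVM's view of its memory and CPU budget, which inside a
// container reflects the cgroup limits rather than the host's hardware.
public class JvmLimits {
    public static void main(String[] args) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.printf("Max heap: %d MiB%n", maxHeapBytes / (1024 * 1024));
        System.out.printf("Available processors: %d%n", cpus);
    }
}
```

Running this both on the host and inside the pod is a quick way to confirm the container limits are being honored.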
Proper startup and shutdown handling prevents data corruption and service interruptions. Kubernetes sends SIGTERM signals to containers before termination, giving applications time to complete ongoing work.
Implementing graceful shutdown in Spring Boot:
@Configuration
public class GracefulShutdownConfig {

    @Bean
    public GracefulShutdown gracefulShutdown() {
        return new GracefulShutdown();
    }

    @Bean
    public ServletWebServerFactory servletContainer(GracefulShutdown gracefulShutdown) {
        TomcatServletWebServerFactory factory = new TomcatServletWebServerFactory();
        factory.addConnectorCustomizers(gracefulShutdown);
        return factory;
    }
}

public class GracefulShutdown implements TomcatConnectorCustomizer,
        ApplicationListener<ContextClosedEvent> {

    private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);

    private volatile Connector connector;

    @Override
    public void customize(Connector connector) {
        this.connector = connector;
    }

    // Invoked when the Spring context closes, i.e. after Kubernetes sends SIGTERM
    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        this.connector.pause();
        Executor executor = this.connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            try {
                ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                threadPoolExecutor.shutdown();
                if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
                    log.warn("Tomcat thread pool did not shut down gracefully");
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
This configuration ensures the application stops accepting new requests while allowing existing requests to complete. The 30-second timeout provides sufficient time for most operations to finish.
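Since Spring Boot 2.3, equivalent behavior is available out of the box without a custom connector customizer; assuming a recent Spring Boot version, two properties suffice:

```properties
# Reject new requests on SIGTERM but let in-flight ones finish
server.shutdown=graceful
# Upper bound for the draining phase (mirrors the 30-second timeout above)
spring.lifecycle.timeout-per-shutdown-phase=30s
```

The manual approach remains useful when finer control over the Tomcat connector is needed.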
Configuration management becomes more complex in distributed environments. Instead of embedding configuration in container images, external configuration sources provide greater flexibility.
Using Kubernetes ConfigMaps for external configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: java-app-config
data:
  application.properties: |
    server.port=8080
    spring.datasource.url=jdbc:postgresql://database:5432/mydb
    spring.datasource.username=appuser
    spring.jpa.hibernate.ddl-auto=validate
    spring.cache.type=redis
    spring.redis.host=redis-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-application
spec:
  template:
    spec:
      containers:
      - name: java-app
        image: my-java-app:latest
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        env:
        - name: SPRING_CONFIG_LOCATION
          value: file:/app/config/application.properties
      volumes:
      - name: config-volume
        configMap:
          name: java-app-config
This approach separates configuration from application code, enabling environment-specific settings without rebuilding images. Keep in mind that Spring reads the file at startup, so after updating the ConfigMap the pods still need a restart (for example via kubectl rollout restart) for changes to take effect.
Monitoring and metrics collection provide essential insights into application performance. Prometheus integration offers detailed JVM and application metrics that help identify performance bottlenecks.
Configuring Micrometer for Prometheus metrics:
@Configuration
public class MetricsConfig {

    @Bean
    public MeterRegistryCustomizer<PrometheusMeterRegistry> metricsCommonTags() {
        return registry -> registry.config().commonTags(
                "application", "java-application",
                "environment", System.getenv().getOrDefault("ENV", "dev")
        );
    }

    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}

@Service
public class OrderService {

    @Timed(value = "order.process", description = "Time taken to process order")
    public Order processOrder(OrderRequest request) {
        // Business logic
    }

    @Counted(value = "order.created", description = "Total orders created")
    public Order createOrder(OrderRequest request) {
        // Creation logic
    }
}
The @Timed and @Counted annotations provide detailed method-level metrics. These metrics help identify slow methods and track business transaction volumes.
Network optimization reduces latency between microservices. Proper service configuration and connection pooling significantly impact performance in distributed systems.
Configuring HTTP client connection pooling:
@Configuration
public class HttpClientConfig {

    @Bean
    public ConnectionKeepAliveStrategy keepAliveStrategy() {
        return (response, context) -> {
            HeaderElementIterator it = new BasicHeaderElementIterator(
                    response.headerIterator(HTTP.CONN_KEEP_ALIVE));
            while (it.hasNext()) {
                HeaderElement he = it.nextElement();
                String param = he.getName();
                String value = he.getValue();
                if (value != null && param.equalsIgnoreCase("timeout")) {
                    return Long.parseLong(value) * 1000;
                }
            }
            return 30 * 1000;  // default keep-alive of 30 seconds
        };
    }

    @Bean
    public CloseableHttpClient httpClient() {
        return HttpClients.custom()
                .setMaxConnTotal(100)
                .setMaxConnPerRoute(20)
                .setKeepAliveStrategy(keepAliveStrategy())
                .setConnectionTimeToLive(30, TimeUnit.SECONDS)
                .build();
    }
}
The connection pool configuration prevents connection establishment overhead for frequent inter-service communication. The keep-alive strategy maintains persistent connections when possible.
Security considerations must address both application and infrastructure concerns. Proper secret management and network policies protect sensitive data and restrict unnecessary communication.
Using Kubernetes Secrets for sensitive configuration:
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
type: Opaque
data:
  username: dXNlcm5hbWU=
  password: cGFzc3dvcmQ=
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-application
spec:
  template:
    spec:
      containers:
      - name: java-app
        image: my-java-app:latest
        env:
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: password
Secrets provide a managed store for sensitive information like database passwords and API keys. Note that base64 is an encoding, not encryption: anyone with API access to the Secret can decode the values, so for sensitive environments enable encryption at rest in etcd or integrate an external secret manager.
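To make that concrete, the values in the manifest above decode trivially; a one-line sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeSecret {
    public static void main(String[] args) {
        // The base64 values from the Secret manifest reverse in one call
        String username = new String(Base64.getDecoder().decode("dXNlcm5hbWU="), StandardCharsets.UTF_8);
        String password = new String(Base64.getDecoder().decode("cGFzc3dvcmQ="), StandardCharsets.UTF_8);
        System.out.println(username + " / " + password);  // username / password
    }
}
```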
Performance testing and optimization require continuous monitoring and adjustment. Load testing in environments that mirror production helps identify bottlenecks before they affect users.
Implementing performance monitoring and alerting:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: java-app-monitor
spec:
  selector:
    matchLabels:
      app: java-application
  endpoints:
  - port: http
    interval: 30s
    path: /actuator/prometheus
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: '(http_server_requests_seconds_.*)'
      action: keep
The ServiceMonitor configuration scrapes the Prometheus endpoint at regular intervals. Note that only /actuator/prometheus exposes Prometheus-format metrics; the health endpoint returns JSON and cannot be scraped. The metric relabeling keeps only the HTTP request metrics, reducing storage requirements and improving query performance; broaden the regex if JVM metrics should be retained as well.
These techniques have proven effective across numerous production deployments. The combination of proper resource management, health monitoring, autoscaling, distributed caching, and centralized observability creates a robust foundation for Java applications in Kubernetes environments.
Each application has unique requirements, so I recommend gradual implementation and thorough testing. Start with resource limits and health checks, then progressively add more advanced features like autoscaling and distributed caching.
Regular performance testing helps validate configuration choices and identify areas for improvement. Monitoring actual production traffic provides the most valuable insights into optimal configuration parameters.
The dynamic nature of Kubernetes means configurations may need adjustment as application usage patterns evolve. Continuous monitoring and occasional tuning ensure applications maintain peak performance throughout their lifecycle.
Remember that optimal configurations depend on specific application characteristics and workload patterns. What works for one application might not be ideal for another. Regular performance analysis and adjustment remain essential for maintaining optimal operation.