
Why Project Managers Must Guard Against “Single Points of Failure” in Human Capital

In the world of systems architecture, we’re deeply familiar with the dangers of single points of failure: a server goes down, and suddenly, an entire service collapses. But what about the human side of our operations? What happens when a single employee holds the keys—sometimes literally—to critical infrastructure or institutional knowledge?

As a project manager, you’re not just responsible for timelines and deliverables—you’re also a risk manager. And one of the most insidious risks to any project or company is over-reliance on one individual.


The “Only One Who Knows” Problem

Here are some familiar but risky scenarios:

  • The lead engineer who is the only one with access to production.

  • The architect who built a legacy system but never documented it.

  • The IT admin who’s the sole owner of critical credentials.

  • The contractor who manages deployments but stores scripts only on their local machine.

These situations might feel efficient in the short term—“Let her handle it, she knows it best”—but they are dangerous. Because the moment that person is unavailable (sick leave, resignation, burnout, or worse), your entire project or company is exposed.

This isn’t just about contingency; it’s about resilience.


Human Capital Is Capital

As the adage often attributed to Peter Drucker goes, “What gets measured gets managed.” But too often, human capital is not measured or managed with the rigor applied to financial or technical assets.

Yet your people—their knowledge, access, habits—are core infrastructure.

Consider the risks:

  • Operational disruption if a key team member disappears without handover

  • Security vulnerability if credentials are centralized in one individual’s hands

  • Knowledge drain when processes live only in someone’s memory

  • Compliance risk if proper delegation and documentation are missing


Practical Ways to Mitigate the Risk

As a PM or senior tech manager, you can apply several concrete practices to reduce this risk:

1. 📄 Document Everything

  • Maintain centralized and versioned process documentation

  • Include architecture diagrams, deployment workflows, emergency protocols

  • Use internal wikis or documentation tools like Confluence, Notion, or GitBook

2. 👥 Promote Redundancy Through Collaboration

  • Encourage pair programming, shadowing, or “brown bag” sessions

  • Rotate team members through different systems to broaden familiarity

3. 🔄 Rotate Access and Responsibilities

  • Build redundancy into roles—no one should be a bottleneck

  • Use tools like AWS IAM, 1Password, or HashiCorp Vault for shared, audited access

4. 🔎 Test the System Without Them

  • Simulate unavailability scenarios. Can the team deploy without X? Can someone else resolve critical incidents?

  • This is part of operational resiliency planning


A Real-World Example: HSBC’s Core Vacation Policy

When I worked at HSBC, a global financial institution with high security and compliance standards, they enforced a particularly impactful policy:

👉 Every employee or contractor was required to take at least one consecutive week of “core vacation” each year.

The reasons were twofold:

  1. Operational Resilience: To ensure that no person was irreplaceable, and teams could function in their absence.

  2. 🚨 Fraud Detection: Continuous presence often masks subtle misuse of systems or privileges. A break allows for behaviors to be reviewed or irregularities to surface.

This policy, common in banking and finance, is a brilliant example of using absence as a testing mechanism—not just for risk, but for trust and transparency.


Building Strong People and Even Stronger Systems

Let’s be clear: this is not about making people “replaceable.”
This is about making systems sustainable and protecting your team from burnout, stress, and unrealistic dependence.

You want to:

  • ✅ Respect your team’s contribution

  • ✅ Protect them from overexposure

  • ✅ Ensure your project or company remains healthy and functional

As David Heinemeier Hansson, CTO of Basecamp, once said:

“People should be able to take a real vacation without the company collapsing. If they can’t, it’s a leadership failure, not a workforce problem.”


[DefCon32] Abusing Legacy Railroad Signaling Systems

David Meléndez and Gabriela Gabs Garcia, researchers focused on transportation security, expose critical vulnerabilities in Spain’s legacy railroad signaling systems. Their presentation reveals how accessible hardware tools can compromise these systems, posing risks to train operations. By combining theoretical analysis with practical demonstrations, David and Gabriela urge stakeholders to bolster protections for critical infrastructure.

Vulnerabilities in Railroad Signaling

David and Gabriela begin by outlining the mechanics of railway signaling, which relies on beacons to communicate track status to train operators. Using off-the-shelf tools, they demonstrate how these systems can be manipulated to display false signals, potentially causing derailments or collisions. Their research, motivated by Spain’s high terrorist alert level, highlights the ease of tampering with outdated infrastructure, drawing parallels to past incidents like the 2004 Madrid train bombings.

Exploiting Accessible Technology

The duo details their methodology, showing how domestic hardware can override signal frequencies to mislead train operators. By crafting a device that mimics legitimate signals, attackers could disrupt train circulation without detection. David emphasizes the simplicity of these attacks, underscoring the urgent need for modernized systems to counter such threats, especially given the public availability of required tools.

Risks to Critical Infrastructure

Gabriela addresses the broader implications, noting that Spain’s railway vulnerabilities reflect global risks. The 2004 Madrid bombings, which killed 193 people, serve as a stark reminder of the stakes. Their findings reveal that motivated actors with basic knowledge could exploit these weaknesses, endangering lives and infrastructure. The researchers call for increased investment in security to prevent catastrophic incidents.

Call for Industry Action

Concluding, David and Gabriela advocate for a reevaluation of railway security protocols. They urge stakeholders to implement robust countermeasures, such as encrypted signaling and real-time monitoring, to protect against tampering. Their work aims to spark industry-wide dialogue, encouraging collaborative efforts to safeguard transportation networks worldwide.


Understanding volatile in Java: A Deep Dive with a Cloud-Native Use Case

In the modern cloud-native world, concurrency is no longer a niche concern. Whether you’re building scalable microservices in Kubernetes, deploying serverless functions in AWS Lambda, or writing multithreaded backend services in Java, thread safety is a concept you must understand deeply.

Among Java’s many concurrency tools, the volatile keyword stands out as both simple and powerful—yet often misunderstood.

This article provides a comprehensive look at volatile, including real-world cloud-based scenarios, a complete Java example, and important caveats every developer should know.

What Does volatile Mean in Java?

At its core, the volatile keyword in Java is used to ensure visibility of changes to variables across threads.

  • Guarantees read/write operations are done directly from and to main memory, avoiding local CPU/thread caches.
  • Ensures a “happens-before” relationship, meaning changes to a volatile variable by one thread are visible to all other threads that read it afterward.

❌ The Problem volatile Solves

Let’s consider the classic issue: Thread A updates a variable, but Thread B doesn’t see it due to caching.

public class ServerStatus {
    private static boolean isRunning = true;

    public static void main(String[] args) throws InterruptedException {
        Thread monitor = new Thread(() -> {
            while (isRunning) {
                // still running...
            }
            System.out.println("Service stopped.");
        });

        monitor.start();
        Thread.sleep(1000);
        isRunning = false;
    }
}

Under certain JVM optimizations, the monitor thread may never see the change to isRunning, leaving it spinning in an infinite loop.

✅ Using volatile to Fix the Visibility Issue

public class ServerStatus {
    private static volatile boolean isRunning = true;

    public static void main(String[] args) throws InterruptedException {
        Thread monitor = new Thread(() -> {
            while (isRunning) {
                // monitor
            }
            System.out.println("Service stopped.");
        });

        monitor.start();
        Thread.sleep(1000);
        isRunning = false;
    }
}

This change ensures all threads read the latest value of isRunning from main memory.

☁️ Cloud-Native Use Case: Gracefully Stopping a Health Check Monitor

Now let’s ground this with a real-world cloud-native example. Suppose a Spring Boot microservice runs a background thread that polls the health of cloud instances (e.g., EC2 or GCP VMs). On shutdown—triggered by a Kubernetes preStop hook—you want the monitor to exit cleanly.

public class CloudHealthMonitor {

    private static volatile boolean running = true;

    public static void main(String[] args) {
        Thread healthThread = new Thread(() -> {
            while (running) {
                pollHealthCheck();
                sleep(5000);
            }
            System.out.println("Health monitoring terminated.");
        });

        healthThread.start();

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("Shutdown signal received.");
            running = false;
        }));
    }

    private static void pollHealthCheck() {
        System.out.println("Checking instance health...");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ignored) {}
    }
}

This approach ensures your application exits gracefully, cleans up properly, and avoids unnecessary errors or alerts in monitoring systems.

⚙️ How volatile Works Behind the Scenes

Java allows compilers and processors to reorder instructions for optimization. This can lead to unexpected results in multithreaded contexts.

volatile introduces memory barriers that prevent instruction reordering and force flushes to/from main memory, maintaining predictable behavior.
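
To make this concrete, here is a minimal sketch of the single-writer publication pattern this enables (Publisher is a hypothetical class for illustration):

class Publisher {
    private int data;
    private volatile boolean ready; // volatile write/read act as release/acquire barriers

    void publish() {      // called by the writer thread
        data = 42;        // (1) plain write
        ready = true;     // (2) volatile write: (1) cannot be reordered after (2)
    }

    void consume() {      // called by a reader thread
        if (ready) {      // volatile read: everything written before (2) is now visible
            System.out.println(data); // guaranteed to print 42, never 0
        }
    }
}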

Common Misconceptions

  • “volatile makes everything thread-safe!” ❌ False. It provides visibility, not atomicity.
  • “Use volatile instead of synchronized.” ❌ Only for simple flags; use synchronized for compound logic.
  • “volatile is faster than synchronized.” ✅ Often true—but only if used appropriately.

When Should You Use volatile?

✔ Use it for:

  • Flags like running, shutdownRequested
  • Read-mostly config values that are occasionally changed
  • Safe publication in single-writer, multi-reader setups

✘ Avoid for:

  • Atomic counters (use AtomicInteger; see the sketch after this list)
  • Complex inter-thread coordination
  • Compound read-modify-write operations
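
To see why counters need more than visibility, here is a small sketch (hypothetical CounterDemo): two threads incrementing a volatile int lose updates, while AtomicInteger never does.

import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    private static volatile int volatileCount = 0;          // visibility only, no atomicity
    private static final AtomicInteger atomicCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCount++;               // read-modify-write: not atomic, updates can be lost
                atomicCount.incrementAndGet(); // atomic compare-and-swap
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("volatile: " + volatileCount);      // typically less than 200000
        System.out.println("atomic:   " + atomicCount.get());  // always 200000
    }
}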

✅ Summary Table

Feature                  volatile
-----------------------  ------------------------------------
Visibility Guarantee     ✅ Yes
Atomicity Guarantee      ❌ No
Lock-Free                ✅ Yes
Use for Flags            ✅ Yes
Use for Counters         ❌ No
Cloud Relevance          ✅ Graceful shutdowns, health checks

Conclusion

In today’s cloud-native Java ecosystem, understanding concurrency is essential. The volatile keyword—though simple—offers a reliable way to ensure thread visibility and safe signaling across threads.

Whether you’re stopping a background process, toggling a configuration flag, or signaling graceful shutdowns, volatile remains an invaluable tool for writing correct, responsive, and cloud-ready code.

What About You?

Have you used volatile in a critical system before? Faced tricky visibility bugs? Share your insights in the comments!

Advanced Encoding in Java, Kotlin, Node.js, and Python

Encoding is essential for handling text, binary data, and secure transmission across applications. Understanding advanced encoding techniques can help prevent data corruption and ensure smooth interoperability across systems. This post explores key encoding challenges and how Java/Kotlin, Node.js, and Python tackle them.


1️⃣ Handling Special Unicode Characters (Emoji, Accents, RTL Text)

Java/Kotlin

Java uses UTF-16 internally, but for external data (JSON, databases, APIs), explicit encoding is required:

String text = "🔧 Café مرحبا";
byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
System.out.println(decoded); // 🔧 Café مرحبا

Tip: Always specify StandardCharsets.UTF_8 to avoid platform-dependent defaults.

Node.js

const text = "🔧 Café مرحبا";
const utf8Buffer = Buffer.from(text, 'utf8');
const decoded = utf8Buffer.toString('utf8');
console.log(decoded); // 🔧 Café مرحبا

Tip: Using an incorrect encoding (e.g., latin1) may corrupt characters.

Python

text = "🔧 Café مرحبا"
utf8_bytes = text.encode("utf-8")
decoded = utf8_bytes.decode("utf-8")
print(decoded)  # 🔧 Café مرحبا

Tip: Python 3 handles Unicode by default, but explicit encoding is always recommended.


2️⃣ Encoding Binary Data for Transmission (Base64, Hex, Binary Files)

Java/Kotlin

byte[] data = "Hello World".getBytes(StandardCharsets.UTF_8);
String base64Encoded = Base64.getEncoder().encodeToString(data);
byte[] decoded = Base64.getDecoder().decode(base64Encoded);
System.out.println(new String(decoded, StandardCharsets.UTF_8)); // Hello World

Node.js

const data = Buffer.from("Hello World", 'utf8');
const base64Encoded = data.toString('base64');
const decoded = Buffer.from(base64Encoded, 'base64').toString('utf8');
console.log(decoded); // Hello World

Python

import base64
data = "Hello World".encode("utf-8")
base64_encoded = base64.b64encode(data).decode("utf-8")
decoded = base64.b64decode(base64_encoded).decode("utf-8")
print(decoded)  # Hello World

Tip: Base64 encoding increases data size by ~33% (every 3 raw bytes become 4 encoded characters), which can be a concern for large files.


3️⃣ Charset Mismatches and Cross-Language Encoding Issues

A file encoded in ISO-8859-1 (Latin-1) may cause garbled text when read using UTF-8.

Java/Kotlin Solution:

byte[] bytes = Files.readAllBytes(Paths.get("file.txt"));
String text = new String(bytes, StandardCharsets.ISO_8859_1);

Node.js Solution:

const fs = require('fs');
const text = fs.readFileSync("file.txt", { encoding: "latin1" });

Python Solution:

with open("file.txt", "r", encoding="ISO-8859-1") as f:
    text = f.read()

Tip: Always specify encoding explicitly when working with external files.
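
To see the failure mode in action, here is a small Java sketch (illustrative string) decoding the same bytes with two different charsets:

import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        byte[] utf8 = "Café".getBytes(StandardCharsets.UTF_8);
        // Wrong charset: the two-byte UTF-8 sequence for 'é' becomes two Latin-1 characters
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1)); // CafÃ©
        // Matching charset round-trips cleanly
        System.out.println(new String(utf8, StandardCharsets.UTF_8));      // Café
    }
}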


4️⃣ URL Encoding and Decoding

Java/Kotlin

String encoded = URLEncoder.encode("Hello World!", StandardCharsets.UTF_8);
String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);

Node.js

const encoded = encodeURIComponent("Hello World!");
const decoded = decodeURIComponent(encoded);

Python

from urllib.parse import quote, unquote
encoded = quote("Hello World!")
decoded = unquote(encoded)

Tip: Use UTF-8 for URL encoding to prevent inconsistencies across different platforms.


Conclusion: Choosing the Right Approach

  • Java/Kotlin: Strong type safety, but requires careful Charset management.
  • Node.js: Web-friendly but depends heavily on Buffer conversions.
  • Python: Simple and concise, though strict type conversions must be managed.

📌 Pro Tip: Always be explicit about encoding when handling external data (APIs, files, databases) to avoid corruption.


Mastering DNS Configuration: A, AAAA, CNAME, and Best Practices with OVH

I am currently reorganizing one of my websites, hosted at OVHcloud, and it is worth revisiting some DNS concepts and best practices.

(Disclaimer: I am not affiliated with OVH in any way; I write here simply as a customer.)

DNS (Domain Name System) is the backbone of the internet, translating human-friendly domain names into IP addresses that computers understand. Yet, many website owners and IT professionals struggle with its configuration. Let’s break down the essential DNS records—A, AAAA, and CNAME—and illustrate best practices using OVH’s interface.

Key DNS Records Explained

1️⃣ A Record (Address Record)

  • Maps a domain (e.g., example.com) to an IPv4 address (e.g., 192.168.1.1).
  • Best practice: Ensure you update this if your server IP changes.

2️⃣ AAAA Record (IPv6 Address Record)

  • Similar to A records but maps to an IPv6 address (e.g., 2001:db8::1).
  • Best practice: If your hosting provider supports IPv6, use this alongside A records for better future-proofing.

3️⃣ CNAME Record (Canonical Name Record)

  • Points a domain (e.g., blog.example.com) to another domain (example.wordpress.com).
  • Best practice: Use CNAME for aliases but avoid pointing the root domain (example.com) to another domain using CNAME—stick to A/AAAA records.
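
Putting the three record types together, a zone excerpt could look like this (BIND-style sketch; the addresses come from the reserved documentation ranges):

example.com.        3600  IN  A      192.0.2.10
example.com.        3600  IN  AAAA   2001:db8::1
blog.example.com.   3600  IN  CNAME  example.wordpress.com.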

Configuring DNS Records in OVH

To set up a subdomain (blog.example.com) on OVH:

  1. Log in to your OVH Control Panel.
  2. Navigate to Web Cloud → Domains, then select your domain.
  3. Go to the DNS Zone tab and click Add an entry.
  4. Choose A Record if your blog has a dedicated IPv4, or CNAME if pointing to another domain.
  5. Enter your subdomain (blog) and the corresponding IP or domain.
  6. Save changes and wait for propagation (~24 hours max).

Best Practices for DNS Management

  • Use TTL (Time-To-Live) wisely: Lower values (e.g., 300s) allow faster updates but increase queries to your DNS provider.
  • Keep DNS records minimal: Avoid unnecessary CNAME chains to improve resolution speed.
  • Secure with DNSSEC: If your registrar supports it, enable DNSSEC to prevent DNS spoofing.
  • Regularly review DNS settings: Especially after migrations, new SSL configurations, or changes in hosting.
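
After a change, you can check what resolvers actually serve with a quick dig query (output line shown as a comment, with illustrative values):

dig +noall +answer blog.example.com CNAME
# blog.example.com.  3600  IN  CNAME  example.wordpress.com.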

[DefCon32] Behind Enemy Lines: Engaging and Disrupting Ransomware Web Panels

Vangelis Stykas, Chief Technology Officer at Atropos, delivers a bold exploration of offensive cybersecurity, targeting the command-and-control (C2) web panels of ransomware groups. His talk unveils strategies to infiltrate these systems, disrupt operations, and gather intelligence on threat actors. Vangelis’s work, driven by a desire to challenge criminal enterprises, showcases the power of turning adversaries’ tools against them, offering a fresh perspective on combating ransomware.

Targeting Ransomware Infrastructure

Vangelis opens by highlighting the resilience of ransomware groups, noting that only 3.5% of 140 tested web panels exhibited vulnerabilities, compared to 15–20% for Fortune 100 companies. He recounts infiltrating panels of groups like ALPHV/BlackCat, Everest, and Mallox, exploiting flaws such as outdated WordPress sites and chat features. These breaches enabled Vangelis to extract decryption keys and member identities, disrupting operations and aiding victims.

Methodologies for Infiltration

Delving into technical strategies, Vangelis explains how he exploited low-hanging vulnerabilities in ransomware C2 panels, such as misconfigured APIs and weak authentication. His approach, refined over two years, involved identifying data leak sites and leveraging penetration testing expertise to gain unauthorized access. By targeting infrastructure like Tor networks and custom firewalls, Vangelis demonstrates how attackers’ own security measures can be weaponized against them.

Ethical Dilemmas and Community Impact

Vangelis reflects on the moral complexities of his work, rejecting the vigilante label in favor of being a “Socratic fly” that disrupts the status quo. He urges cyber threat intelligence (CTI) firms to share data openly, noting that faster access to C2 information could amplify his impact. His successes, including contributing to ALPHV/BlackCat’s collapse, highlight the potential of offensive tactics to weaken ransomware ecosystems.

Future of Cyber Offense

Concluding, Vangelis emphasizes the need for persistent innovation in fighting ransomware. He advocates for collaborative intelligence sharing and proactive disruption of criminal infrastructure. By drawing parallels to the “Five Horsemen” of cyber threats, Vangelis inspires researchers to confront adversaries head-on, ensuring that the cybersecurity community remains one step ahead in this ongoing battle.

[DotJs2024] Dante’s Inferno of Fullstack Development (A Brief History)

Fullstack webcraft’s tumult—acronym avalanches, praxis pivots—evokes a helical descent, yet upward spiral. James Q. Quick, a JS evangelist, speaker, and BigCommerce developer experience lead, traversed this inferno at dotJS 2024, channeling Dante’s nine circles via Dan Brown’s lens. A Rubik’s aficionado (sub-two minutes) and Da Vinci Code devotee (Paris-site pilgrim), Quick, born 1991—the web’s inaugural site’s year—wove personal yarns into a scorecard saga, rating eras on SEO, performance, build times, dynamism. His verdict: chaos conceals progress; contextualize to conquer.

Quick decried distraction’s vortex: HTML/CSS/JS/Git/npm, framework frenzy—Vue, React, Svelte, et al.—framework-hopping’s siren song. His jest: “GrokweJS,” halting churn. Web genesis: 1989 Berners-Lee, 1991 inaugural site (HTML how-to), 1996 Space Jam’s static splendor. Circle one: static HTML—SEO stellar, perf pristine, builds nil, dynamism dead. LAMP stacks (two: PHP/MySQL) injected server dynamism—SEO middling, perf client-hobbled, builds absent, dynamism robust.

Client-side JS (three: jQuery/Angular) flipped: SEO tanked (crawlers blind), perf ballooned bundles, builds concatenated, dynamism client-rich. Jamstack’s static resurgence (four: Gatsby/Netlify)—SEO revived, perf CDN-fast, builds protracted, dynamism API-propped—reigned till content deluges. SSR revival (five: Next.js/Nuxt)—SEO solid, perf hybrid, builds lengthy, dynamism server-fresh—bridged gaps.

Hybrid rendering (six: Astro/Next)—per-page static/SSR toggles—eased dynamism sans universal builds. ISR (seven: Next’s coinage)—subset builds, on-demand SSR, CDN-cache—slashed times, dynamism on-tap. Hydration’s bane (eight): JS deluges for interactivity, wasteful. Server components (nine: React/Next, Remix, Astro islands)—stream static shells, async data, cache surgically—optimize bites, interactivity islands.

Quick’s spiral: circles ascend, solving yesteryear’s woes innovatively. Pantheon’s 203 steps with napping tot evoked hope: endure inferno, behold stars.

Static Foundations to Dynamic Dawns

Quick’s scorecard chronicled: HTML’s purity (1991 site) to LAMP’s server pulse, client JS’s interactivity boon-cum-SEO curse. Jamstack’s static revival—Gatsby’s graphs—revitalized speed, API-fed dynamism; SSR’s return balanced freshness with crawlability.

Hybrid Horizons and Server Supremacy

Hybrids like Astro cherry-pick render modes; ISR on-demand builds dynamism sans staleness. Hydration’s excess yields to server components: React’s streams static + async payloads, islands (Astro/Remix) granularize JS—caching confluence for optimal perf.

Efficient Inter-Service Communication with Feign and Spring Cloud in Multi-Instance Microservices

In a world where systems are becoming increasingly distributed and cloud-native, microservices have emerged as the de facto architecture. But as we scale microservices horizontally—running multiple instances for each service—one of the biggest challenges becomes inter-service communication.

How do we ensure that our services talk to each other reliably, efficiently, and in a way that’s resilient to failures?

Welcome to the world of Feign and Spring Cloud.


The Challenge: Multi-Instance Microservices

Imagine you have a user-service that needs to talk to an order-service, and your order-service runs 5 instances behind a service registry like Eureka. Hardcoding URLs? That’s brittle. Manual load balancing? Not scalable.

You need:

  • Service discovery to dynamically resolve where to send the request
  • Load balancing across instances
  • Resilience for timeouts, retries, and fallbacks
  • Clean, maintainable code that developers love

The Solution: Feign + Spring Cloud

OpenFeign is a declarative web client. Think of it as a smart HTTP client where you only define interfaces — no more boilerplate REST calls.

When combined with Spring Cloud, Feign becomes a first-class citizen in a dynamic, scalable microservices ecosystem.

✅ Features at a Glance:

  • Declarative REST client
  • Automatic service discovery (Eureka, Consul)
  • Client-side load balancing (Spring Cloud LoadBalancer)
  • Integration with Resilience4j for circuit breaking
  • Easy integration with Spring Boot config and observability tools

Step-by-Step Setup

1. Add Dependencies

[xml]
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-openfeign</artifactId>
</dependency>
[/xml]

If using Eureka:

[xml]
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
[/xml]


2. Enable Feign Clients

In your main Spring Boot application class:

[java]
@SpringBootApplication
@EnableFeignClients
public class UserServiceApplication { … }
[/java]


3. Define Your Feign Interface

[java]
@FeignClient(name = "order-service")
public interface OrderClient {

    @GetMapping("/orders/{id}")
    OrderDTO getOrder(@PathVariable("id") Long id);
}
[/java]

Spring will automatically:

  • Register this as a bean
  • Resolve order-service from Eureka
  • Load-balance across all its instances
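
For completeness, here is a minimal sketch of how the client is then consumed (UserOrderFacade is a hypothetical service, not part of the setup above):

[java]
@Service
public class UserOrderFacade {

    private final OrderClient orderClient; // injected like any other Spring bean

    public UserOrderFacade(OrderClient orderClient) {
        this.orderClient = orderClient;
    }

    public OrderDTO findOrder(Long id) {
        // Reads like a local call; discovery, load balancing, and HTTP happen underneath
        return orderClient.getOrder(id);
    }
}
[/java]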

4. Add Resilience with Fallbacks

You can configure a fallback to handle failures gracefully:

[java]
@FeignClient(name = "order-service", fallback = OrderClientFallback.class)
public interface OrderClient {

    @GetMapping("/orders/{id}")
    OrderDTO getOrder(@PathVariable("id") Long id);
}
[/java]

The fallback:

[java]
@Component
public class OrderClientFallback implements OrderClient {

    @Override
    public OrderDTO getOrder(Long id) {
        return new OrderDTO(id, "Fallback Order", LocalDate.now());
    }
}
[/java]


⚙️ Configuration Tweaks

Customize Feign timeouts in application.yml:

[yml]
feign:
  client:
    config:
      default:
        connectTimeout: 3000
        readTimeout: 500
[/yml]

Enable retry by declaring a Retryer bean (the property form only accepts a Retryer class name, so the intervals go in code):

[java]
@Bean
public Retryer retryer() {
    // feign.Retryer.Default(period, maxPeriod, maxAttempts): retry after 1 s, capped at 2 s, up to 3 attempts
    return new Retryer.Default(1000, 2000, 3);
}
[/java]


What Happens Behind the Scenes?

When user-service calls order-service:

  1. Spring Cloud uses Eureka to resolve all instances of order-service.
  2. Spring Cloud LoadBalancer picks an instance using round-robin (or your chosen strategy).
  3. Feign sends the HTTP request to that instance.
  4. If it fails, Resilience4j (or your fallback) handles it gracefully.

Observability & Debugging

Use Spring Boot Actuator to expose Feign metrics:

[xml]
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
[/xml]

And tools like Spring Cloud Sleuth + Zipkin for distributed tracing across Feign calls.


Beyond the Basics

To go even further:

  • Integrate with Spring Cloud Gateway for API routing and external access.
  • Use Spring Cloud Config Server to centralize configuration across environments.
  • Secure Feign calls with OAuth2 via Spring Security and OpenID Connect.

✨ Final Thoughts

Using Feign with Spring Cloud transforms service-to-service communication from a tedious, error-prone task into a clean, scalable, and cloud-native solution.
Whether you’re scaling services across zones or deploying in Kubernetes, Feign ensures your services communicate intelligently and resiliently.

Problem: Spring JMS MessageListener Stuck / Not Receiving Messages

Scenario

A Spring Boot application using ActiveMQ with @JmsListener suddenly stops receiving messages after running for a while. No errors in logs, and the queue keeps growing, but the consumers seem idle.

Setup

@JmsListener(destination = "myQueue", concurrency = "5-10")
public void processMessage(String message) {
    log.info("Received: {}", message);
}

  • ActiveMQConnectionFactory was used.

  • The queue (myQueue) was filling up.

  • Restarting the app temporarily fixed the issue.


Investigation

  1. Checked ActiveMQ Monitoring (Web Console)

    • Messages were enqueued but not dequeued.

    • Consumers were still active, but not processing.

  2. Thread Dump Analysis

    • Found that listener threads were stuck in a waiting state.

    • The problem only occurred under high load.

  3. Checked JMS Acknowledgment Mode

    • Default AUTO_ACKNOWLEDGE was used.

    • Suspected an issue with message acknowledgment.

  4. Enabled Debug Logging

    • Added:

      logging.level.org.springframework.jms=DEBUG
    • Found repeated logs like:

      JmsListenerEndpointContainer#0-1 received message, but no further processing
    • This hinted at connection issues.

  5. Tested with a Different Message Broker

    • Using Artemis JMS instead of ActiveMQ resolved the issue.

    • Indicated that it was broker-specific.


Root Cause

ActiveMQ’s TCP connection was silently dropped, but the JMS client did not detect it.

  • When the connection is lost, DefaultMessageListenerContainer doesn’t always recover properly.

  • ActiveMQ does not always notify clients of broken connections.

  • No exceptions were thrown because the connection was technically “alive” but non-functional.


Fix

  1. Enabled keepAlive in ActiveMQ connection

    ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory();
    factory.setUseKeepAlive(true);
    factory.setOptimizeAcknowledge(true);
    return factory;
  2. Forced Reconnection with Exception Listener

    • Implemented:

      factory.setExceptionListener(exception -> {
          log.error("JMS Exception occurred, reconnecting...", exception);
          restartJmsListener();
      });
    • This ensured that if a connection was dropped, the listener restarted.

  3. Switched to DefaultJmsListenerContainerFactory with DMLC

    • SimpleMessageListenerContainer was less reliable in handling reconnections.

    • New Configuration:

      @Bean
      public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(
              ConnectionFactory connectionFactory) {
          DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
          factory.setConnectionFactory(connectionFactory);
          factory.setSessionTransacted(true);
          factory.setErrorHandler(t -> log.error("JMS Listener error", t));
          return factory;
      }

Final Outcome

✅ After applying these fixes, the issue never reoccurred.
🚀 The app remained stable even under high load.


Key Takeaways

  • Silent disconnections in ActiveMQ can cause message listeners to hang.

  • Enable keepAlive and optimizeAcknowledge for reliable connections.

  • Use DefaultJmsListenerContainerFactory with DMLC instead of SMLC.

  • Implement an ExceptionListener to restart the JMS connection if necessary.


How to Bypass Elasticsearch’s 10,000-Result Limit with the Scroll API

If you’ve ever worked with the Elasticsearch API, you’ve likely run into its infamous 10,000-result limit. It’s a default cap that can feel like a brick wall when you’re dealing with large datasets—think log analysis, report generation, or bulk data exports. Fortunately, there’s a slick workaround: the Scroll API. In this post, I’ll walk you through why this limit exists, how the Scroll API solves it, and share practical examples to get you started.

Why the 10,000-Result Limit Exists

Elasticsearch caps standard search results at 10,000 to protect performance. Fetching millions of records in one shot with from and size parameters can strain memory and slow things down. But what if you need all that data? That’s where the Scroll API shines—it’s designed for deep pagination, letting you retrieve everything in manageable chunks.

What Is the Scroll API?

Unlike a typical search, the Scroll API maintains a temporary “scroll context” on the server. You grab a batch of results, get a scroll_id, and use it to fetch the next batch—no need to rerun your query. It’s efficient, scalable, and perfect for big data tasks.

How to Use the Scroll API: Step by Step

Let’s break it down with examples you can try yourself.

Step 1: Start the Scroll

Kick things off with a search request. Add the scroll parameter (like 1m for a 1-minute timeout) and set size to control your batch size. Here’s a basic example:
GET /my_index/_search?scroll=1m
{
  "size": 1000,
  "query": {
    "match_all": {}
  }
}
This pulls the first 1,000 hits and returns a `scroll_id`—a long, encoded string you’ll need for the next step.

Step 2: Fetch More Results

Using that `scroll_id`, request the next batch. You don’t need to repeat the query—just send the ID and timeout:
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "c2NhbjsxMDAwO...YOUR_SCROLL_ID_HERE..."
}
Loop this call until you’ve retrieved all your data. Each response includes a new `scroll_id` (sometimes the same, depending on the version), so keep updating it.

Step 3: Clean Up

When you’re done, delete the scroll context to free up server resources. It’s a small but critical step:
DELETE /_search/scroll/c2NhbjsxMDAwO...YOUR_SCROLL_ID_HERE...

Skip this, and you’ll leave dangling contexts that could bog down your cluster.

A Real-World Example

Let’s say you’re sifting through millions of logs for a specific error. Here’s a targeted scroll query:
GET /logs/_search?scroll=2m
{
  "size": 500,
  "query": {
    "match": {
      "error_message": "timeout"
    }
  }
}

Then, use the Scroll API to paginate through every matching log entry. It’s way cleaner than hacking around with `from` and `size`.

Tips for Scroll API Success

  • Batch Size: Stick to a `size` like 500–1000. Too large, and you’ll strain memory; too small, and you’ll make too many requests.
  • Timeout Tuning: Set the scroll duration (e.g., `1m`, `5m`) based on how fast your script processes each batch. Too short, and the context expires mid-run.
  • Automation: Use a script to handle the loop. Python’s `elasticsearch` library, for instance, has a handy scroll helper:
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
scroll = es.search(index="logs", scroll="2m", size=500, body={"query": {"match": {"error_message": "timeout"}}})
scroll_id = scroll["_scroll_id"]

while len(scroll["hits"]["hits"]):
    print(scroll["hits"]["hits"])  # Process this batch
    scroll = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = scroll["_scroll_id"]

es.clear_scroll(scroll_id=scroll_id)  # Cleanup

Why Scroll Beats the Alternatives

You could tweak `index.max_result_window` to raise the limit, but that’s a performance gamble. Export tools or aggregations might work for summaries, but for raw data retrieval, Scroll is king—efficient and built for the job.
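
For reference, raising the cap is a one-line settings change (sketch; the 20,000 value is purely illustrative), but it trades heap and coordination cost for convenience and still won’t get you to millions of hits:

PUT /my_index/_settings
{
  "index.max_result_window": 20000
}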

Conclusion

The Scroll API has been a game-changer for my Elasticsearch projects, especially when wrestling with massive indices. It’s simple once you get the hang of it, and the payoff is huge.