
Posts Tagged ‘Jonathan Lalou’

Docker Rootless Mode, or Fixing Persistent Docker Daemon Failure in WSL 2

This tutorial addresses a stubborn failure mode of a native Docker Engine inside a WSL 2 (Windows Subsystem for Linux 2) distribution: the Docker daemon refusing to start with a "Device or resource busy" error. After diagnosing the causes, it walks through the fix that finally holds: switching to Docker Rootless mode.

1. The Problem Overview

The core issue is the native system-wide Docker daemon (dockerd) failing to initialize upon startup inside the WSL 2 environment. This failure manifests as a persistent loop of errors:

  1. High-Level Status: Active: failed (Result: exit-code) or Start request repeated too quickly.
  2. Access Block: Attempts to clear corrupted storage often fail with mv: cannot move '/var/lib/docker': Device or resource busy.
  3. Root Cause: The failures stem from a combination of stale lock files (docker.pid), corrupted storage metadata (metadata.db), and fundamental conflicts with the WSL 2 kernel's implementation of features like Cgroups or network socket activation.

To resolve this reliably, the solution is to bypass the system-level conflicts by switching from the problematic rootful daemon to the more stable Docker Rootless mode.


2. Step-by-Step Resolution

The resolution involves three phases: diagnosing the specific low-level crash, performing an aggressive cleanup to free the lock, and finally, installing the stable rootless solution.

Phase 1: Aggressive Cleanup and File Lock Removal

The persistent "Device or resource busy" error is the primary block. Even a full Windows reboot or wsl --shutdown often fails to clear the lock held on /var/lib/docker.

A. Forcefully Shut Down WSL 2

  1. Close all WSL terminals.
  2. Open Windows PowerShell (or CMD).
  3. Execute the global shutdown command: This ensures the Linux kernel and all running processes are terminated, releasing file locks.
    wsl --shutdown
    

B. Identify and Rename the Corrupted Directory

  1. Relaunch your WSL terminal.
  2. Rename the Corrupted Docker Storage: This creates a fresh start for the storage driver. If this fails with Device or resource busy (which is highly likely), proceed to step C.
    sudo mv /var/lib/docker /var/lib/docker.bak
    
  3. [If Rename Fails] Terminate and Delete the Lock File: The rename fails because a leftover daemon process still holds the directory, and a stale PID file falsely signals that the daemon is running.

    # Stop the failing service (just in case it auto-started)
    sudo systemctl stop docker.service
    
    # Delete the stale PID file that falsely signals the daemon is running
    sudo rm /var/run/docker.pid
    
    # Retry the rename from step 2
    sudo mv /var/lib/docker /var/lib/docker.bak
    

Phase 2: Switch to Docker Rootless Mode

Rootless mode installs the daemon under your standard user account, isolating it from the system-level issues that caused the failure.

A. Install Prerequisites

Install the uidmap package, which is necessary for managing user namespaces in the rootless environment.

  1. Check and clear any package locks (if necessary):
    If sudo apt install hangs, find and kill the conflicting process (commonly unattended-upgrades, which ps truncates to unattended-upgr) with sudo kill -9 <PID>, and then delete the lock files:

    sudo rm /var/lib/dpkg/lock-frontend
    sudo rm /var/lib/dpkg/lock
    sudo dpkg --configure -a
    
  2. Install uidmap:
    sudo apt update
    sudo apt install uidmap
    

B. Install the Rootless Daemon

  1. Ensure the system-wide daemon is stopped and disabled to prevent conflicts:
    sudo systemctl stop docker.service
    sudo systemctl disable docker.service
    sudo rm /var/run/docker.sock # Clean up the system socket
    
  2. Run the Rootless setup script:
    dockerd-rootless-setuptool.sh install
    

Phase 3: Configure and Launch

The setup script completes the installation but requires manual configuration to launch the daemon and set the necessary environment variables.

A. Configure Shell Environment

  1. Edit your bash profile (~/.bashrc):
    vi ~/.bashrc
    
  2. Add the necessary environment variables (these lines are typically provided by the setup script and redirect the client to the rootless socket):
    # Docker Rootless configuration for user <your_username>
    export XDG_RUNTIME_DIR=/home/<your_username>/.docker/run
    export PATH=/usr/bin:$PATH
    export DOCKER_HOST=unix:///home/<your_username>/.docker/run/docker.sock
    
  3. Save the file and exit the editor.

B. Startup Sequence (Required on Every WSL Launch)

Because this WSL environment does not use systemd to start the rootless daemon automatically, you must start it by hand. Run the first command once after editing ~/.bashrc (new terminals source it automatically); run the second once per WSL boot:

  1. Source the configuration: Activates the DOCKER_HOST environment variable in the current session.
    source ~/.bashrc
    
  2. Start the Rootless Daemon: Launches the user-level daemon in the background.
    dockerd-rootless.sh &
    
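To avoid retyping these, a small guard can go at the end of ~/.bashrc, after the exports from step A, so the daemon starts only when it is not already running. This is a sketch: the log file path is arbitrary, and it assumes the Phase 3 configuration above.

```shell
# Sketch for the end of ~/.bashrc (assumes the DOCKER_HOST/XDG_RUNTIME_DIR
# exports above). Start the rootless daemon only if the client cannot
# reach it yet; log file location is an arbitrary choice.
if ! docker info >/dev/null 2>&1; then
  nohup dockerd-rootless.sh >"$HOME/.docker/dockerd-rootless.log" 2>&1 &
fi
```

If your distribution has systemd enabled for WSL ([boot] systemd=true in /etc/wsl.conf), the setup script's suggested systemctl --user enable --now docker is an alternative that removes the need for this guard.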

C. Final Verification

Wait a few seconds after launching the daemon, then verify connectivity:

docker ps

The client will now connect to the stable, user-level daemon, resolving the persistent startup failures.

Cloudflare WARP vs. Traditional VPN: A Deep Dive into Identity vs. Optimization

In the landscape of digital security, both Cloudflare’s WARP and a Virtual Private Network (VPN) offer encrypted tunnels for internet traffic. However, their primary objectives are fundamentally different. WARP is an optimization and security layer built on speed, while a traditional VPN is a tunneling tool built for anonymity and location masking. Understanding this distinction is crucial for choosing the right tool for your specific needs.

What is Cloudflare WARP?

Cloudflare WARP is a proprietary application built on the company’s global network backbone, utilizing the fast, modern WireGuard protocol (or its Rust implementation, BoringTun).

  • Encryption & Security: It encrypts all traffic leaving your device, protecting your data and DNS queries from your local Internet Service Provider (ISP) or third-party snoopers on unencrypted public Wi-Fi networks.
  • Performance & Reliability: WARP routes traffic over Cloudflare’s optimized network, aiming to reduce latency and improve browsing speed by avoiding internet congestion, particularly with its premium WARP+ service.

The key philosophical distinction is that WARP is designed for people who want better internet, not necessarily a new digital identity.


The Core Difference: Identity vs. Optimization

The confusion arises because both technologies create an encrypted tunnel. However, a VPN’s tunnel always terminates in a remote, user-selected geographic location to mask identity, whereas WARP’s tunnel terminates at the nearest Cloudflare edge for maximum speed.

Primary Goals and Identity Masking

The core purpose of Cloudflare WARP is securing internet connections and improving speed. Conversely, a Traditional VPN is designed for privacy, anonymity, and bypassing geo-restrictions.

When it comes to IP address masking, traditional VPNs are highly effective, as they change your public IP address to that of the remote VPN server. While WARP does provide a Cloudflare IP address, it is typically localized and positioned near your actual physical location (e.g., in the same city or region). It does not conceal your country of origin. WARP is ineffective for true anonymity because it does not fully disguise your IP address.
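Cloudflare's trace endpoint makes this easy to observe: the output of curl -s https://www.cloudflare.com/cdn-cgi/trace includes a warp= field (on/off) and a colo= field naming the nearby edge data center your traffic exits from. A sketch parsing a captured trace; the values below are hypothetical:

```shell
# Hypothetical trace output; on a real machine, obtain it with:
#   curl -s https://www.cloudflare.com/cdn-cgi/trace
trace='ip=104.28.0.1
colo=CDG
warp=on'

# Pull out the WARP status and the exit data center
warp=$(printf '%s\n' "$trace" | awk -F= '$1=="warp"{print $2}')
colo=$(printf '%s\n' "$trace" | awk -F= '$1=="colo"{print $2}')
echo "warp=$warp via colo=$colo"
```

A nearby colo (e.g., the one serving your own city) illustrates the point: WARP exits close to you, not in a country of your choosing.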

Geographical Access and Control

The difference in goal leads to a major divergence in functionality regarding geo-blocking:

  • Geo-Unblocking: Traditional VPNs are effective at bypassing geo-restrictions because they allow the user to manually select servers in dozens of different countries, making the traffic appear to originate from that location. In contrast, WARP is ineffective for this purpose; since the exit location is automatically selected for performance, it cannot be used to circumvent geographical blocks on streaming services or localized content.
  • Server Selection: A traditional VPN gives users manual control over selecting the server location. WARP offers automatic server selection, connecting you only to the nearest, fastest Cloudflare data center.

Conclusion: Which One Should You Use?

WARP and VPNs are complementary tools serving different security objectives:

  • Choose WARP If: Your primary goals are to encrypt your traffic on public Wi-Fi, prevent your ISP from tracking your DNS queries and browsing habits, and potentially improve connectivity performance. WARP is excellent for general, everyday secure browsing.
  • Choose a Traditional VPN If: Your requirements include anonymity (hiding your country or city), bypassing geo-restrictions for streaming services (like foreign Netflix libraries), evading government censorship, or P2P file sharing.

🛑 DNS Hijacked? Why Your Windows Network Settings Keep Changing to `127.0.2.2` and `127.0.2.3`

If you’ve manually set a specific DNS server (like 10.0.0.1 or 8.8.8.8) only to find it automatically revert to 127.0.2.2 and 127.0.2.3 after a reboot or network event, your system is not broken—it’s being actively managed by a third-party application.

This behavior is a very strong indicator that specialized security, VPN, or filtering software is running on your system, forcing all DNS queries through a local proxy for protection or routing purposes.


🔍 What Does 127.0.2.2 and 127.0.2.3 Actually Mean?

These addresses are intentionally set by a specific type of software and are not standard addresses distributed by your router.

  • Loopback Addresses: The entire 127.0.0.0/8 range (from 127.0.0.1 up to 127.255.255.255) is reserved for loopback or localhost. Any traffic sent to these addresses never leaves your computer; it simply “loops back” to a service running on the same machine.
  • Local DNS Proxy: The applications that cause this create a specialized local DNS server (a proxy) that listens on these specific addresses on your Windows machine.
  • Forced Interception: By setting your network adapter’s DNS to these loopback IPs, the software ensures that every single DNS request is first intercepted and processed by its local proxy before being securely forwarded over a tunnel (like a VPN) or filtered.
  • Reversion is Intentional: When you manually change the DNS, the controlling program detects the change and automatically reverts the settings to the 127.0.2.2 addresses to maintain control over your DNS traffic.
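On Linux (including WSL), you can ask the kernel directly how it would route one of these addresses; anything in 127.0.0.0/8 goes out the loopback device lo, confirming it can only reach a service on this same machine:

```shell
# Route lookup for one of the hijacked DNS addresses; "dev lo" in the
# output means the traffic never leaves this host.
ip route get 127.0.2.2
```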

🚨 Common Culprits for this DNS Reversion

While any DNS-altering security application can cause this, the 127.0.2.2 and 127.0.2.3 addresses are particularly associated with the following categories of software:

  • Cloudflare WARP (or WARP+): This is the most common culprit. WARP uses these exact addresses to route your traffic through its secure DNS tunnel.
  • Web Filtering or Parental Control Software: Apps like CovenantEyes or corporate/school security clients often use a local DNS proxy to enforce content filtering or policy rules.
  • Advanced Antivirus/Security Suites: Some high-end security tools can install DNS-level protection to block malicious domains.
  • VPN Clients: Certain VPN clients may use a similar local DNS strategy to prevent DNS leaks.

🛠 How to Fix and Prevent the DNS Change

To successfully set your DNS to your desired address (like 10.0.0.1), you must first disable or completely remove the application that is actively controlling your DNS.

Solution 1: Identify and Disable the Application (The Primary Fix)

The quickest solution is to look for, pause, or quit the known conflicting software.

  1. Check the System Tray: Look for icons related to Cloudflare WARP, VPN clients, or parental control apps. Disconnect or Exit the program entirely.
  2. Use netstat to Find the Listener (Advanced):
    1. Open PowerShell or Command Prompt as an Administrator.
    2. Run the command: netstat -a -b
    3. Review the output (which may take a moment) and look for a process name associated with UDP port 53 (the standard DNS port). The executable name will tell you exactly what service is running the local DNS proxy.

Solution 2: Perform a Clean Boot

If you can’t easily identify the program, performing a Clean Boot can help isolate it:

  1. Press Windows Key + R, type msconfig, and press Enter.
  2. Go to the Services tab, check the box for Hide all Microsoft services, and then click Disable all.
  3. Go to the Startup tab, click Open Task Manager, and then Disable all non-Microsoft programs.
  4. Restart your PC.
  5. If the DNS settings no longer revert, you have confirmed that one of the disabled programs was the culprit. Re-enable them one by one (restarting after each) until the issue reappears to pinpoint the specific program.

Once the controlling application is disabled or uninstalled, you should be able to set and save your network adapter’s DNS address without it being automatically reverted.

How to Backup and Restore All Docker Images with Gzip Compression

TL;DR:
To back up all your Docker images safely, use docker save to export them and gzip to compress them.
Then, when you need to restore, use docker load to re-import everything.
Below you’ll find production-ready Bash scripts for automated backup and restore — complete with compression and error handling.

📦 Why You Need This

Whether you’re upgrading your system, cleaning your Docker environment, or migrating to another host, exporting your local images is crucial. Docker’s built-in commands make this possible, but using them manually for dozens of images can be tedious and space-inefficient.
This article provides automated scripts that will:

  • Backup every Docker image individually,
  • Compress each file with gzip for storage efficiency,
  • Restore all images automatically with a single command.

🧱 Backup Script (backup_docker_images.sh)

The script below exports all Docker images, one by one, into compressed .tar.gz files.
Each image gets its own archive, named after its repository and tag.

#!/bin/bash
# --------------------------------------------
# Backup all Docker images into compressed .tar.gz files
# --------------------------------------------

set -o pipefail  # make $? reflect a failed "docker save", not just gzip

BACKUP_DIR=~/docker-backup
mkdir -p "$BACKUP_DIR"
cd "$BACKUP_DIR" || exit 1

echo "📦 Starting Docker image backup..."
echo "Backup directory: $BACKUP_DIR"
echo

for image in $(docker image ls --format "{{.Repository}}:{{.Tag}}"); do
  # sanitize file name
  safe_name=$(echo "$image" | tr '/:' '__')
  outfile="${safe_name}.tar"
  gzfile="${outfile}.gz"

  echo "🟢 Saving $image → $gzfile"

  # Save and compress directly (no uncompressed tar left behind)
  docker save "$image" | gzip -c > "$gzfile"

  if [ $? -eq 0 ]; then
    echo "✅ Successfully saved $image"
  else
    echo "❌ Error saving $image"
  fi
  echo
done

echo "🎉 Backup complete!"
ls -lh "$BACKUP_DIR"/*.gz

💡 What This Script Does

  • Creates a ~/docker-backup directory automatically.
  • Iterates over every local Docker image.
  • Uses docker save piped to gzip for direct compression.
  • Prints friendly success and error messages.

Result: You’ll get a set of compressed files like:

jonathan-tomcat__latest.tar.gz
jonathan-mysql__latest.tar.gz
jonathan-grafana__latest.tar.gz
...
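The file names come from the script's tr substitution, which maps both / and : to underscores so the archive name is filesystem-safe. A quick check with a hypothetical image name:

```shell
# tr '/:' '__' replaces '/' with '_' and ':' with '_'
image="myrepo/tomcat:latest"    # hypothetical image name
safe_name=$(echo "$image" | tr '/:' '__')
echo "$safe_name"   # myrepo_tomcat_latest
```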

🔁 Restore Script (restore_docker_images.sh)

This companion script automatically restores every compressed image. It detects both .tar.gz and .tar files in the backup directory, decompresses them, and loads them back into Docker.

#!/bin/bash
# --------------------------------------------
# Restore all Docker images from .tar.gz or .tar files
# --------------------------------------------

BACKUP_DIR=~/docker-backup
cd "$BACKUP_DIR" || { echo "❌ Backup directory not found: $BACKUP_DIR"; exit 1; }

echo "🚀 Starting Docker image restore from $BACKUP_DIR"
echo

shopt -s nullglob   # unmatched patterns expand to nothing instead of themselves
files=(*.tar.gz *.tar)
if [ ${#files[@]} -eq 0 ]; then
  echo "No backup files found."
  exit 0
fi

for file in "${files[@]}"; do

  echo "🟡 Loading $file..."
  if [[ "$file" == *.gz ]]; then
    gunzip -c "$file" | docker load
  else
    docker load -i "$file"
  fi

  if [ $? -eq 0 ]; then
    echo "✅ Successfully loaded $file"
  else
    echo "❌ Error loading $file"
  fi
  echo
done

echo "🎉 Restore complete!"
docker image ls

💡 How It Works

  • Automatically detects .tar.gz or .tar backups.
  • Decompresses each one and loads it into Docker.
  • Prints progress updates as it restores each image.

After running it, your local Docker environment will look exactly like before — same repositories, tags, and image IDs.

⚙️ How to Use

1️⃣ Backup All Docker Images

chmod +x backup_docker_images.sh
./backup_docker_images.sh

You’ll see a live summary of each image as it’s saved and compressed.

2️⃣ Restore Later (After a Prune or Reinstall)

chmod +x restore_docker_images.sh
./restore_docker_images.sh

Docker will reload each image automatically, maintaining all original metadata.

💾 Bonus: Cleaning and Rebuilding Your Docker Environment

If you want to clear all Docker data before restoring your images, run:

docker system prune -a --volumes

⚠️ Warning: This deletes all containers, images, networks, and volumes.
Afterward, simply run the restore script to bring your images back.

🧠 Why Use Gzip?

Docker image archives are often large — several gigabytes each. Compressing them with gzip:

  • Saves 30–70% of disk space,
  • Speeds up transfers (especially over SSH),
  • Keeps the backups cleaner and easier to manage.

You can still restore them directly with gunzip -c file.tar.gz | docker load — no decompression step required.
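Because the backups are ordinary gzip files, you can also verify their integrity without touching Docker, using gzip -t (shown here against a stand-in file created just for the demo):

```shell
# Create a stand-in archive purely for the demonstration
printf 'demo' | gzip -c > demo.tar.gz

# gzip -t checks archive integrity without writing decompressed output
gzip -t demo.tar.gz && echo "demo.tar.gz: OK"

rm demo.tar.gz
```

Running gzip -t across the whole backup directory before a restore catches truncated or corrupted transfers early.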

✅ Summary Table

Task                             Command                            Description
Backup all images (compressed)   ./backup_docker_images.sh          Creates one .tar.gz per image
Restore all images               ./restore_docker_images.sh         Loads back each saved archive
Prune all Docker data            docker system prune -a --volumes   Clears everything before restore

🚀 Conclusion

Backing up your Docker images is a crucial part of any development or CI/CD workflow. With these two scripts, you can protect your local Docker environment from accidental loss, disk cleanup, or reinstallation.
By combining docker save and gzip, you ensure both efficiency and recoverability — making your Docker workstation fully portable and disaster-proof.

Keep calm and backup your containers 🐳💾

⚙️ How to Fix Missing User Setup in Ubuntu 22 on WSL2 (Windows 11)

TL;DR:

If you installed Ubuntu 22.04 on Windows 11 using WSL2 and the system never prompted you to create a user, it means the initial setup script did not run correctly. You are now logging in as root directly. To fix this, manually create your user, grant it sudo rights, and set it as the default login account:

sudo adduser jlalou
sudo usermod -aG sudo jlalou
ubuntu2204.exe config --default-user jlalou

After restarting WSL, you’ll log in as a normal user with administrator privileges.

🧩 Understanding the Issue

When you install Ubuntu from the Microsoft Store or via the wsl --install command, WSL normally runs a first-launch configuration script. That script asks for a username, sets a password, adds the user to the sudo group, and makes it the default account for future sessions.

If that welcome prompt never appeared, Ubuntu is skipping its initialization phase. This often happens when:

  • The first start was interrupted or closed prematurely.
  • The distribution was imported manually with wsl --import.
  • You started WSL as root before the setup script ran.

In this case, WSL falls back to the default root account, leaving no regular user configured.

✅ Step-by-Step Solution

1️⃣ Create Your User Manually

Launch your Ubuntu terminal (it will open as root), then create your desired user account:

adduser jlalou

Enter a password when prompted, and confirm the optional user details. Next, give this new account administrative privileges:

usermod -aG sudo jlalou

You can confirm the membership with:

grep jlalou /etc/group

If you see sudo listed among the groups, the user has been successfully added.
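An equivalent check is id, which prints a user's group memberships directly (shown here with root as a stand-in; substitute the user you created):

```shell
# List the group memberships of a user by name
id -nG root
```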

2️⃣ Make This User the Default Login Account

List your installed distributions:

wsl -l -v

You’ll see something like:

NAME            STATE           VERSION
* Ubuntu-22.04   Running         2

In PowerShell (or Command Prompt), set your new user as the default:

ubuntu2204.exe config --default-user jlalou

(The command name may vary slightly—use Get-Command *ubuntu* in PowerShell if you’re unsure.)
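An alternative that does not depend on the launcher name at all, and also works for distributions imported with wsl --import (which have no <distro>.exe launcher), is setting the default user in /etc/wsl.conf inside the distro. A sketch; it takes effect after the next wsl --shutdown:

```shell
# Run inside the Ubuntu shell: append a [user] section to /etc/wsl.conf
sudo tee -a /etc/wsl.conf >/dev/null <<'EOF'
[user]
default=jlalou
EOF
# Then, from Windows: wsl --shutdown, and reopen the distro
```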

Close all Ubuntu windows and reopen WSL. You should now log in automatically as jlalou.

3️⃣ Verify Everything Works

Once inside the shell, check your identity and privileges:

whoami
# Expected output: jlalou

sudo ls /root
# Should prompt for your password and succeed

If both commands work, your configuration is complete.

🔁 Optional: Trigger the Initial Setup from Scratch

If you prefer to start over and allow Ubuntu’s built-in setup wizard to handle everything automatically, simply unregister and reinstall the distribution:

wsl --unregister Ubuntu-22.04
wsl --install -d Ubuntu-22.04

Upon first launch, Ubuntu will display:

Installing, this may take a few minutes...
Please create a default UNIX user account.

From there you can define your username and password normally.

🧠 Why This Happens

WSL integrates tightly with Windows, but when the initialization script fails, it bypasses Ubuntu’s user-creation process. This can occur when the image is imported, cloned, or restored without the metadata WSL expects. As a result, Ubuntu runs entirely as root, skipping all onboarding logic.

While this is convenient for testing, it’s not secure or practical for daily use. Running as a dedicated user with sudo access ensures safer file permissions, a more predictable environment, and compatibility with Ubuntu’s standard management tools.

🧾 Summary Table

Goal                   Command
Create user            adduser jlalou
Grant sudo rights      usermod -aG sudo jlalou
Set as default login   ubuntu2204.exe config --default-user jlalou
Verify identity        whoami / sudo ls /root

🚀 Conclusion

Missing the initial user setup prompt in Ubuntu 22 under WSL2 can be confusing, but it’s easily corrected. Creating a dedicated user and assigning sudo privileges restores the intended WSL experience—secure, organized, and fully functional. Once configured, you can enjoy seamless integration between Windows 11 and Ubuntu, with the flexibility and power of both operating systems at your fingertips.

Building Resilient Architectures: Patterns That Survive Failure

How to design systems that gracefully degrade, recover quickly, and scale under pressure.

1) Patterns for Graceful Degradation

When dependencies fail, your system should still provide partial service. Examples:

  • Show cached product data if the pricing service is down.
  • Allow “read-only” mode if writes are failing.
  • Provide degraded image quality if the CDN is unavailable.

2) Circuit Breakers

Prevent cascading failures with Resilience4j (or the now-retired Netflix Hystrix):

@CircuitBreaker(name = "inventoryService", fallbackMethod = "fallbackInventory")
public Inventory getInventory(String productId) {
    return restTemplate.getForObject("/inventory/" + productId, Inventory.class);
}

public Inventory fallbackInventory(String productId, Throwable t) {
    return new Inventory(productId, 0);
}

3) Retries with Backoff

Retries should be bounded and spaced out:

@Retry(name = "paymentService", fallbackMethod = "fallbackPayment")
public PaymentResponse processPayment(PaymentRequest req) {
    return restTemplate.postForObject("/pay", req, PaymentResponse.class);
}

RetryConfig config = RetryConfig.custom()
    .maxAttempts(3)
    .waitDuration(Duration.ofMillis(200))
    .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(200, 2.0, 0.5)) // exponential backoff with jitter
    .build();

4) Scaling Microservices in Kubernetes/ECS

Scaling is not just replicas—it’s smart policies:

  • Kubernetes HPA: Scale pods based on CPU or custom metrics (e.g., p95 latency).
    kubectl autoscale deployment api --cpu-percent=70 --min=3 --max=10
  • ECS: Use Service Auto Scaling with CloudWatch alarms on queue depth.
  • Pre-warm caches: Scale up before big events (e.g., Black Friday).

Fixing the “Failed to Setup IP tables” Error in Docker on WSL2

TL;DR:
If you see this error when running Docker on Windows Subsystem for Linux (WSL2):

ERROR: Failed to Setup IP tables: Unable to enable SKIP DNAT rule:
(iptables failed: iptables --wait -t nat -I DOCKER -i br-xxxx -j RETURN:
iptables: No chain/target/match by that name. (exit status 1))

👉 The cause is usually that your system is using the nftables backend for iptables, but Docker expects the legacy backend.
Switching iptables to legacy mode and restarting Docker fixes it:

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

Then restart Docker and verify:

sudo iptables -t nat -L

You should now see the DOCKER chain listed. ✅


🔍 Understanding the Problem

When Docker starts, it configures internal network bridges using iptables.
If it cannot find or manipulate its DOCKER chain, you’ll see this “Failed to Setup IP tables” error.
This problem often occurs in WSL2 environments, where the Linux kernel uses the newer nftables system by default, while Docker still relies on the legacy iptables interface.

In short:

  • iptables-nft (default in modern WSL2) ≠ iptables-legacy (expected by Docker)
  • The mismatch causes Docker to fail to configure NAT and bridge rules

⚙️ Step-by-Step Fix

1️⃣ Check which iptables backend you’re using

sudo iptables --version
sudo update-alternatives --display iptables

If you see something like iptables v1.8.x (nf_tables), you’re using nftables.

2️⃣ Switch to legacy mode

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

Confirm the change:

sudo iptables --version

Now it should say (legacy).

3️⃣ Restart Docker

If you’re using Docker Desktop for Windows:

wsl --shutdown
net stop com.docker.service
net start com.docker.service

or simply quit and reopen Docker Desktop.

If you’re running Docker Engine inside WSL:

sudo service docker restart

4️⃣ Verify the fix

sudo iptables -t nat -L

You should now see the DOCKER chain among the NAT rules:

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

If it appears — congratulations 🎉 — your Docker networking is fixed!


🧠 Extra Troubleshooting Tips

  • If the error persists, flush and rebuild the NAT table:
    sudo service docker stop
    sudo iptables -t nat -F
    sudo iptables -t nat -X
    sudo service docker start
    
  • Check kernel modules (for completeness):
    lsmod | grep iptable
    sudo modprobe iptable_nat
    
  • Keep Docker Desktop and WSL2 kernel up to date — many network issues are fixed in newer builds.

✅ Summary

Step             Command                                  Goal
Check backend    sudo iptables --version                  Identify nft vs legacy
Switch mode      update-alternatives --set ... legacy     Use legacy backend
Restart Docker   sudo service docker restart              Reload NAT rules
Verify           sudo iptables -t nat -L                  Confirm DOCKER chain exists

🚀 Conclusion

This “Failed to Setup IP tables” issue is one of the most frequent Docker-on-WSL2 networking errors.
The root cause lies in the nftables vs legacy backend mismatch — a subtle but critical difference in Linux networking subsystems.
Once you switch to the legacy backend and restart Docker, everything should work smoothly again.

By keeping your WSL2 kernel, Docker Engine, and iptables configuration aligned, you can prevent these issues and maintain a stable developer environment on Windows.

Happy containerizing! 🐋

SRE Principles: From Error Budgets to Everyday Reliability

How to define, measure, and improve reliability with concrete metrics, playbooks, and examples you can apply this week.

In a world where users expect instant, uninterrupted access, reliability is a feature. Site Reliability Engineering (SRE) brings engineering discipline to operations with a toolkit built on error budgets, SLIs/SLOs, and automation. This post turns those ideas into specifics: exact metrics, alert rules, dashboards, code and infra changes, and a lightweight maturity model you can use to track progress.


1) What Is SRE Culture?

1.1 Error Budgets: A Contract Between Speed and Stability

An error budget is the amount of unreliability you are willing to tolerate over a period. It converts reliability targets into engineering freedom.

  • Example: SLO = 99.9% availability over 30 days → error budget = 0.1% unavailability.
  • Translation: Over 30 days (~43,200 minutes), you may “spend” up to 43.2 minutes of downtime before freezing risky changes.
  • Policy: If the budget is heavily spent (e.g., >60%), restrict deployments to reliability fixes until burn rate normalizes.
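The arithmetic behind these numbers is worth keeping at hand; a small helper computes the 30-day budget for any availability SLO:

```shell
# Minutes of downtime a 30-day window allows at a given availability SLO
slo=99.9                                   # availability target in percent
budget_minutes=$(awk -v s="$slo" 'BEGIN { printf "%.1f", 30*24*60 * (100 - s) / 100 }')
echo "${budget_minutes} minutes of error budget per 30 days"   # 43.2 at 99.9%
```

Changing slo to 99.99 shows how sharply the budget shrinks with each extra nine.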

1.2 SLIs & SLOs: A Common Language

SLI (Service Level Indicator) is a measured metric; SLO (Service Level Objective) is the target for that metric.

  • Availability: SLI = % successful requests (non-5xx, within timeout); example SLO = 99.9% over 30 days. Define failure modes clearly (timeouts, 5xx, dependency errors).
  • Latency: SLI = p95 end-to-end latency (ms); example SLO = ≤ 300 ms (p95), ≤ 800 ms (p99). Track server time and total time (incl. downstream calls).
  • Error Rate: SLI = failed / total requests; example SLO = < 0.1% rolling 30 days. Include client cancels/timeouts if user-impacting.
  • Durability: SLI = data loss incidents; example SLO = 0 incidents / year. Backups + restore drills must be part of policy.

1.3 Automation Over Manual Ops

  • Automated delivery: CI/CD with canary or blue–green, automated rollback on SLO breach.
  • Self-healing: Readiness/liveness probes; restart on health failure; auto-scaling based on SLI-adjacent signals (e.g., queue depth, p95 latency).
  • Runbooks & ChatOps: One-click actions (flush cache keyspace, rotate credentials, toggle feature flag) with audit trails.

2) How Do You Measure Reliability?

2.1 Availability (“The Nines”)

SLO       Max Downtime / Year   Per 30 Days
99.0%     ~3d 15h               ~7h 12m
99.9%     ~8h 46m               ~43m
99.99%    ~52m 34s              ~4m 19s
99.999%   ~5m 15s               ~26s

2.2 Latency (Percentiles, Not Averages)

Track p50/p90/p95/p99. Averages hide tail pain. Tie your alerting to user-impacting percentiles.

  • API example: p95 ≤ 300 ms, p99 ≤ 800 ms during business hours; relaxed after-hours SLOs if business permits.
  • Queue example: p99 time-in-queue ≤ 2s; backlog < 1,000 msgs for >99% of intervals.

2.3 Error Rate

Define “failed” precisely: HTTP 5xx, domain-level errors (e.g., “payment declined” may be success from a platform perspective but failure for a specific business flow—track both).

2.4 Example SLI Formulas

# Availability SLI
availability = successful_requests / total_requests

# Latency SLI
latency_p95 = percentile(latency_ms, 95)

# Error Rate SLI
error_rate = failed_requests / total_requests

2.5 SLO-Aware Alerting (Burn-Rate Alerts)

Alert on error budget burn rate, not just raw thresholds.

  • Fast burn: 2% budget in 1 hour → page immediately (could exhaust daily budget).
  • Slow burn: 10% budget in 24 hours → open a ticket, investigate within business hours.
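Burn rate is simply the observed error rate divided by the error rate the SLO allows; a sketch of the arithmetic with hypothetical values:

```shell
# Burn rate: how many times faster than "allowed" the budget is being spent.
slo_allowed=0.001    # a 99.9% SLO tolerates a 0.1% error rate
observed=0.014       # hypothetical: 1.4% of requests failing right now
burn_rate=$(awk -v o="$observed" -v a="$slo_allowed" 'BEGIN { printf "%.0f", o/a }')
echo "burn rate: ${burn_rate}x"
# At 14x, a 30-day budget lasts about 30/14 ≈ 2.1 days: page immediately.
```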

3) How Do You Improve Reliability?

3.1 Code Fixes (Targeted, Measurable)

  • Database hot paths: Add missing index, rewrite N+1 queries, reduce chatty patterns; measure p95 improvement before/after.
  • Memory leaks: Fix long-lived caches, close resources; verify with heap usage slope flattening over 24h.
  • Concurrency: Replace blocking I/O with async where appropriate; protect critical sections with timeouts and backpressure.

3.2 Infrastructure Changes

  • Resilience patterns: circuit breaker, retry with jittered backoff, bulkheads, timeouts per dependency.
  • Scaling & HA: Multi-AZ / multi-region, min pod counts, HPA/VPA policies; pre-warm instances ahead of known peaks.
  • Graceful degradation: Serve cached results, partial content, or fallback modes when dependencies fail.

3.3 Observability Enhancements

  • Tracing: Propagate trace IDs across services; sample at dynamic rates during incidents.
  • Dashboards: One SLO dashboard per service showing SLI, burn rate, top 3 error classes, top 3 slow endpoints, dependency health.
  • Logging: Structure logs (JSON); include correlation IDs; ensure PII scrubbing; add request_id, tenant_id, release labels.
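The logging bullet as a minimal sketch; the field names mirror the ones suggested above:

```python
import json

def format_log(level, msg, **fields):
    """One structured JSON log line; callers pass request_id, tenant_id, release."""
    record = {"level": level, "msg": msg, **fields}
    return json.dumps(record, sort_keys=True)

line = format_log("INFO", "checkout complete",
                  request_id="req-123", tenant_id="t-9", release="1.4.2")
```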

3.4 Reliability Improvement Playbook (Weekly Cadence)

  1. Review SLO attainment & burn-rate charts.
  2. Pick top 1–2 user-visible issues (tail latency spike, recurring 5xx).
  3. Propose one code fix and one infra/observability change.
  4. Deploy via canary; compare SLI before/after; document result.
  5. Close the loop: update runbooks, tests, alerts.

4) Incident Response: From Page to Postmortem

4.1 During the Incident

  • Own the page: acknowledge within minutes; post initial status (“investigating”).
  • Stabilize first: roll back most recent release; fail over; enable feature flag fallback.
  • Collect evidence: time-bounded logs, key metrics, traces; snapshot dashboards.
  • Comms: update stakeholders every 15–30 minutes until stable.

4.2 After the Incident (Blameless Postmortem)

  • Facts first: timeline, impact, user-visible symptoms, SLIs breached.
  • Root cause: 5 Whys; include contributing factors (alerts too noisy, missing runbook).
  • Actions: 1–2 short-term mitigations, 1–2 systemic fixes; assign owners and due dates.
  • Learning: update tests, add guardrails (pre-deploy checks, SLO gates), improve dashboards.

5) Common Anti-Patterns (and What to Do Instead)

  • Anti-pattern: Alert on every 5xx spike → Do this: alert on SLO burn rate and user-visible error budgets.
  • Anti-pattern: One giant “golden dashboard” → Do this: concise SLO dashboard + deep-dive panels per dependency.
  • Anti-pattern: Manual runbooks that require SSH → Do this: ChatOps / runbook automation with audit logs.
  • Anti-pattern: Deploying without rollback plans → Do this: canary, blue–green, auto-rollback on SLO breach.
  • Anti-pattern: No load testing → Do this: regular synthetic load/chaos drills tied to SLOs.

6) A 30-Day Quick Start

  1. Week 1: Define 2–3 SLIs and SLOs; publish error budget policy.
  2. Week 2: Build SLO dashboard; create two burn-rate alerts (fast/slow).
  3. Week 3: Add tracing to top 3 endpoints; implement circuit breaker + timeouts to the noisiest dependency.
  4. Week 4: Run a game day (controlled failure); fix 2 gaps found; document runbooks.

7) Concrete Examples & Snippets

7.1 Example SLI Prometheus (pseudo-metrics)

# Availability SLI
sum(rate(http_requests_total{status=~"2..|3.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# Error Rate SLI
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

# Latency p95 (histogram)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

7.2 Burn-Rate Alert (illustrative)

# Fast-burn: page if 2% of monthly budget is burned in 1 hour
# slow-burn: ticket if 10% burned over 24 hours
# (Use your SLO window and target to compute rates)

7.3 Resilience Config (Java + Resilience4j sketch)

// Circuit breaker + retry with jittered backoff
import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.RetryConfig;

CircuitBreakerConfig cb = CircuitBreakerConfig.custom()
  .failureRateThreshold(50f)
  .waitDurationInOpenState(Duration.ofSeconds(30))
  .permittedNumberOfCallsInHalfOpenState(5)
  .slidingWindowSize(100)
  .build();

RetryConfig retry = RetryConfig.custom()
  .maxAttempts(3)
  // exponential backoff starting at 200 ms, doubling, with 20% jitter
  // (note: don't also set waitDuration; RetryConfig rejects both)
  .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(200L, 2.0, 0.2))
  .build();

7.4 Kubernetes Health Probes

livenessProbe:
  httpGet: { path: /health/liveness, port: 8080 }
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /health/readiness, port: 8080 }
  initialDelaySeconds: 10
  periodSeconds: 5

8) Lightweight SRE Maturity Model

Level | Practices | What to Add Next
Level 1: Awareness | Basic monitoring, ad-hoc on-call, manual deployments | Define SLIs/SLOs, create SLO dashboard, add canary deploys
Level 2: Control | Burn-rate alerts, incident runbooks, partial automation | Tracing, circuit breakers, chaos drills, auto-rollback
Level 3: Optimization | Error budget policy enforced, game days, automated rollbacks | Multi-region resilience, SLO-gated releases, org-wide error budgets

9) Sample Reliability OKRs

  • Objective: Improve checkout service reliability without slowing delivery.
    • KR1: Availability SLO from 99.5% → 99.9% (30-day window).
    • KR2: Reduce p99 latency from 1,200 ms → 600 ms at p95 load.
    • KR3: Cut incident MTTR from 45 min → 20 min via runbook automation.
    • KR4: Implement canary + auto-rollback for 100% of releases.

Conclusion

Reliability isn’t perfection—it’s disciplined trade-offs. By anchoring work to error budgets, articulating SLIs/SLOs that reflect user experience, and investing in automation, observability, and resilient design, teams deliver systems that users trust—and engineers love operating.

Next step: Pick one service. Define two SLIs and one SLO. Add a burn-rate alert and a rollback plan. Measure, iterate, and share the wins.

PostHeaderIcon “Android Application Development with Maven” by Patroklos Papapetrou and Jonathan Lalou, was published by Packt

Abstract

I am glad and proud to announce the publication of “Android Application Development with Maven”, on March 15th 2015, by Packt.

Direct link: https://www.packtpub.com/apache-maven-dependency-management/book

Alternate locations: Amazon.com, Amazon.co.uk, Barnes & Noble.

On this occasion, I’d like to thank the whole Packt team for allowing me to achieve this project.

What you will learn from this book

  • Integrate Maven with your favorite Android IDE
  • Install and configure Maven with your local development environment
  • Create the proper Maven structure for both standalone Android applications and applications that are part of a bigger project
  • Run unit tests using popular frameworks such as Robolectric and collect coverage information using Maven plugins
  • Configure a variety of different tools such as Robotium, Spoon, and Selendroid to run integration tests
  • Handle dependencies and different versions of the same application
  • Manage and automate the release process of your application inside/outside Google Play
  • Discover new tools such as Eclipse, IntelliJ IDEA/Android Studio, and NetBeans, which perfectly integrate with Maven and boost your productivity

In Detail

Android is an open source operating system used for smartphones and tablet computers. The Android market is one of the biggest and fastest growing platforms for application developers, with over a million apps available.

Right from the beginning, this book will cover how to set up your Maven development environment and integrate it with your favorite IDE. By sequentially working through the steps in each chapter, you will quickly master the plugins you need for every phase of the Android development process. You will learn how to use Maven to manage and build your project and dependencies, automate your Android application testing plans, and develop and maintain several versions of your application in parallel. Most significantly, you will learn how to integrate your project into a complete factory.

Approach

Learn how to use and configure Maven to support all phases of the development of an Android application

Who this book is for

Android Application Development with Maven is intended for Android developers or devops engineers who want to use Maven to effectively develop quality Android applications. It would be helpful, but not necessary, if you have some previous experience with Maven.

Table of contents

  • 1: Beginning with the Basics
  • 2: Starting the Development Phase
  • 3: Unit Testing
  • 4: Integration Testing
  • 5: Android Flavors
  • 6: Release Life Cycle and Continuous Integration
  • 7: Other Tools and Plugins

PostHeaderIcon “Apache Maven Dependency Management” by Jonathan Lalou, was published by Packt

Abstract

I am glad and proud to announce the publication of “Apache Maven Dependency Management”, by Packt.

Direct link: https://www.packtpub.com/apache-maven-dependency-management/book

Alternate locations: Amazon.com, Amazon.co.uk, Barnes & Noble.

On this occasion, I’d like to thank the whole Packt team for allowing me to achieve this project.

What you will learn from this book

  • Learn how to use profiles, POM, parent POM, and modules
  • Increase build-speed and decrease archive size
  • Set, rationalize, and exclude transitive dependencies
  • Optimize your POM and its dependencies
  • Migrate projects to Maven including projects with exotic dependencies

In Detail

Managing dependencies in a multi-module project is difficult, because libraries depend on each other transitively. Maven tames this complexity by reading the project files of dependencies to figure out their inter-relations and other related information. Gaining an understanding of project dependencies will allow you to fully utilize Maven and use it to your advantage.

Aiming to give you a clear understanding of Maven’s functionality, this book focuses on specific case studies that shed light on highly useful Maven features which are often disregarded. The content of this book will help you to replace homebrew processes with more automated solutions.

This practical guide focuses on the variety of problems and issues which occur during the conception and development phase, with the aim of making dependency management as effortless and painless as possible. Throughout the course of this book, you will learn how to migrate from non-Maven projects to Maven, learn Maven best practices, and how to simplify the management of multiple projects. The book emphasizes the importance of identifying and fixing potential conflicts before they become issues. The later sections of the book introduce you to the methods that you can use to increase your team’s productivity. This book is the perfect guide to help make you into a proud software craftsman.

Approach

An easy-to-follow, tutorial-based guide with chapters progressing from basic to advanced dependency management.

Who this book is for

If you are working with Java or Java EE projects and you want to take advantage of Maven dependency management, then this book is ideal for you. This book is also particularly useful if you are a developer or an architect. You should be well versed with Maven and its basic functionalities if you wish to get the most out of this book.

Table of contents

  • Preface
  • Chapter 1: Basic Dependency Management
  • Chapter 2: Dependency Mechanism and Scopes
  • Chapter 3: Dependency Designation (advanced)
  • Chapter 4: Migration of Dependencies to Apache Maven
  • Chapter 5: Tools within Your IDE
  • Chapter 6: Release and Distribute
  • Appendix: Useful Public Repositories
  • Index