Recent Posts
Archives

Posts Tagged ‘SoftwareEngineering’

PostHeaderIcon [VoxxedDaysTicino2026] Backlog.md: The Simplest Project Management Tool for the AI Era

Lecturer

Alex Gavrilescu is a full-stack developer with extensive experience in .NET and Vue.js technologies. He has been actively involved in software development for many years and has shifted his focus toward artificial intelligence over the past year. Alex developed Backlog.md as a side project starting at the end of May 2025, while maintaining a full-time role in the casino industry. He shares insights through blog articles on platforms like LinkedIn and X (formerly Twitter). Relevant links include his LinkedIn profile (https://www.linkedin.com/in/alex-gavrilescu/) and X account (https://x.com/alexgavrilescu).

Abstract

This article examines Alex Gavrilescu’s presentation on his journey in AI-assisted software development and the creation of Backlog.md, a terminal-based project management tool designed to enhance predictability and structure in workflows involving AI agents. Drawing from personal experiences, the discussion analyzes the evolution from unstructured prompting to a systematic approach, emphasizing task decomposition, context management, and delegation modes. It explores the tool’s features, limitations, and implications for spec-driven AI development, highlighting how such methodologies foster deterministic outcomes in non-deterministic AI environments.

Context of AI Integration in Development Workflows

In the evolving landscape of software engineering, the integration of artificial intelligence agents has transformed traditional practices. Alex begins by contextualizing his experiences, noting the shift from basic code completions in integrated development environments (IDEs) like Visual Studio’s IntelliSense, which relied on simple machine learning or pattern matching, to more advanced tools. The advent of models like ChatGPT allowed developers to query and incorporate code snippets, reducing friction but still requiring manual transfers.

The introduction of GitHub Copilot marked a significant advancement, embedding AI directly into IDEs for contextual queries and modifications. However, the true leap came with agent modes, where AI operates in a loop, utilizing tools and gathering context autonomously until task completion. Alex distinguishes between “steer mode,” where developers iteratively guide AI through prompts and approvals, and “delegate mode,” where comprehensive instructions are provided upfront for independent execution. His focus leans toward delegation, aiming for reliable outcomes without constant intervention.

This context is crucial as AI models are inherently non-deterministic, yielding varied results from identical prompts. Alex draws parallels to human collaboration, where structured information—clarifying the “why,” “what,” and “how”—ensures success. He references practices like Gherkin scenarios (given-when-then) but simplifies them to acceptance criteria and definitions of done, adapting them for AI efficiency. Early challenges, such as limited context windows in models like those from May 2025, necessitated task breakdown to avoid information loss during compaction.

The implications are profound: unstructured AI use often leads to abandonment, as complexity escalates failure rates. Alex classifies developers into categories like “vibe coders” (improvisational prompting without code review) and “AI product managers” (structured delegation with final reviews), illustrating how his journey from near-abandonment to 95% success stemmed from imposing structure.

Development and Features of Backlog.md

Backlog.md emerged as Alex’s solution to the limitations of manual task structuring. Initially, he created tasks in Markdown files, logging them in Git repositories for sharing and history. This allowed referencing between tasks, scoping to prevent derailment, and assigning tasks to specialized agents (e.g., Opus for UI, Codex for backend). By avoiding database or API dependencies, agents could directly read files, enhancing efficiency.

The tool formalizes this into a command-line interface (CLI) resembling Git commands: backlog task create, backlog task edit, backlog task list. Tasks are stored as Markdown files with a front-matter section for metadata (title, ID, dependencies, status). Sections include “why” for problem context, acceptance criteria with checkboxes for self-verification, implementation plans generated by agents, and notes/summaries for pull request descriptions.
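The task layout described above can be illustrated with a sketch like the following; the field names, section headings, and content are illustrative assumptions based on the description, not Backlog.md’s exact schema:

```markdown
---
id: task-42
title: Add CSV export to the reports page
status: To Do
dependencies: [task-37]
---

## Why
Users currently copy report data by hand; an export removes that friction.

## Acceptance Criteria
- [ ] An "Export CSV" button appears on the reports page
- [ ] The downloaded file matches the on-screen filters
- [ ] Unit tests cover the export service

## Implementation Plan
(filled in by the agent when the task is started)

## Notes
(summary added on completion, reused in the pull request description)
```

Because the file is plain Markdown in the repository, an agent can read it directly, tick the acceptance-criteria checkboxes as it verifies them, and append its notes without any database or API in between.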

Backlog.md supports subtasks, dependencies (e.g., “relates to” or “blocked by”), and a web interface for easier editing, including rich text and dark mode. It operates offline, uses Git for synchronization across branches, and avoids conflicts by leveraging repository permissions for security. Notably, 99% of its code was AI-generated, with Alex reviewing initial tasks, demonstrating the tool’s recursive utility.

Limitations include no direct task initiation from the interface, self-hosting requirements, single-repo support, experimental documentation/decisions sections, and absent integrations like GitHub Issues or Jira. As a solo side project, it lacks production-grade support, but welcomes community contributions via issues or pull requests.

In practice, Alex showcases Backlog.md in a live demo for spec-driven development. Starting with a product requirements document (PRD) generated by an agent like Claude, tasks are decomposed. Implementation plans are reviewed per task to adapt to changes, ensuring accuracy. Sub-agents orchestrate parallel planning, with human checkpoints at description, plan, and code stages.

Methodological Implications for Spec-Driven AI Development

Spec-driven AI development, as outlined, requires clear intent expression before execution. Backlog.md facilitates this by breaking projects into manageable tasks, delegating to agents for research, planning, and coding. A feedback loop refines agent instructions, specs, and processes.

Alex’s workflow begins with PRD creation, followed by task decomposition adhering to Backlog.md guidelines. Agents generate plans only upon task start, preventing obsolescence. For a task-scheduling feature, he demonstrates PRD prompting, task creation, and sub-agent orchestration for plans, emphasizing acceptance criteria for verification.

The methodology promotes one-task-per-context-window sessions, referencing summaries to avoid bloat. Definitions of done, global across projects, enforce testing, linting, and security checks. This counters “vibe coding’s” directional uncertainty, ensuring guardrails like unit tests prevent premature completion claims.

Implications extend to project readiness: documentation for agent onboarding mirrors human processes, with skills, code styles, and self-verification loops enhancing efficiency. Alex references a Factory.ai article on AI-ready maturity levels, underscoring documentation’s role.

Challenges persist in UI verification, requiring human QA, and complex integrations. Yet, the approach allows iterations without full restarts, leveraging cheap tokens for refinements.

Consequences and Future Directions

Backlog.md’s simplicity yields repeatability, boosting success from 50% (slot-machine-like prompting) to 95%. By structuring delegation, it mitigates AI’s non-determinism, fostering predictable workflows. Consequences include democratized AI use—no prior experience needed beyond basic Git—potentially broadening adoption.

For teams, Git synchronization enables collaboration, though self-hosting limits non-technical access. Future enhancements might include multi-repo support, integrations, and improved documentation, driven by its 4,600 GitHub stars and community feedback.

Broader implications question AI’s role: accepting “good enough” results accelerates development, but human input remains vital for steering and verification. As models improve (e.g., Opus 5.6’s million-token window), tools like Backlog.md evolve, but foundational structure endures.

In conclusion, Alex’s tool and methodology exemplify pragmatic AI integration, balancing innovation with reliability in an era where agents redefine development.

Links:

PostHeaderIcon [DevoxxBE2025] Behavioral Software Engineering

Lecturer

Mario Fusco is a Senior Principal Software Engineer at Red Hat, where he leads the Drools project, a business rules management system, and contributes to initiatives like LangChain4j. As a Java Champion and open-source advocate, he co-authored “Java 8 in Action” with Raoul-Gabriel Urma and Alan Mycroft, published by Manning. Mario frequently speaks at conferences on topics ranging from functional programming to domain-specific languages.

Abstract

This examination draws parallels between behavioral economics and software engineering, highlighting cognitive biases that distort rational decision-making in technical contexts. It elucidates key heuristics identified by economists like Daniel Kahneman and Amos Tversky, situating them within engineering practices such as benchmarking, tool selection, and architectural choices. Through illustrative examples of performance evaluation flaws and hype-driven adoptions, the narrative scrutinizes methodological influences on project outcomes. Ramifications for collaborative dynamics, innovation barriers, and professional development are explored, proposing mindfulness as a countermeasure to enhance engineering efficacy.

Foundations of Behavioral Economics and Rationality Myths

Classical economic models presupposed fully efficient markets populated by perfectly logical agents, often termed Homo Economicus, who maximize utility through impeccable reasoning. However, pioneering work by psychologists Daniel Kahneman and Amos Tversky in the late 1970s challenged this paradigm, demonstrating that human judgment is riddled with systematic errors. Their prospect theory, for instance, revealed how individuals weigh losses more heavily than equivalent gains, leading to irrational risk aversion or seeking behaviors. This laid the groundwork for behavioral economics, which integrates psychological insights into economic analysis to explain deviations from predicted rational conduct.

In software engineering, a parallel illusion persists: the notion of an idealized engineer who approaches problems with unerring logic and objectivity. Yet, engineers are susceptible to the same mental shortcuts that Kahneman and Tversky cataloged. These heuristics, evolved for quick survival decisions in ancestral environments, often mislead in modern technical scenarios. For example, the anchoring effect—where initial information disproportionately influences subsequent judgments—can skew performance assessments. An engineer might fixate on a preliminary benchmark result, overlooking confounding variables like hardware variability or suboptimal test conditions.

The availability bias compounds this, prioritizing readily recalled information over comprehensive data. If recent experiences involve a particular technology failing, an engineer might unduly favor alternatives, even if statistical evidence suggests otherwise. Contextualized within the rapid evolution of software tools, these biases amplify during hype cycles, where media amplification creates illusory consensus. Implications extend to resource allocation: projects may pursue fashionable solutions, diverting efforts from proven, albeit less glamorous, approaches.

Heuristics in Performance Evaluation and Tool Adoption

Performance benchmarking exemplifies how cognitive shortcuts undermine objective analysis. The availability heuristic leads engineers to overemphasize memorable failures, such as a vivid recollection of a slow database query, while discounting broader datasets. This can result in premature optimizations or misguided architectural pivots. Similarly, anchoring occurs when initial metrics set unrealistic expectations; a prototype’s speed on high-end hardware might bias perceptions of production viability.

Tool adoption is equally fraught. The pro-innovation bias fosters an uncritical embrace of novel technologies, often without rigorous evaluation. Engineers might adopt container orchestration systems like Kubernetes for simple applications, incurring unnecessary complexity. The bandwagon effect reinforces this, as perceived peer adoption creates social proof, echoing Tversky’s work on conformity under uncertainty.

The not-invented-here syndrome further distorts choices, prompting reinvention of wheels due to overconfidence in proprietary solutions. Framing effects alter problem-solving: the same requirement, phrased differently—e.g., “build a scalable service” versus “optimize for cost”—yields divergent designs. Examples from practice include teams favoring microservices for “scalability” when monolithic structures suffice, driven by availability of success stories from tech giants.

Analysis reveals these heuristics degrade quality: biased evaluations lead to inefficient code, while hype-driven adoptions inflate maintenance costs. Implications urge structured methodologies, such as A/B testing or peer reviews, to counteract intuitive pitfalls.

Biases in Collaborative and Organizational Contexts

Team interactions amplify individual biases, creating collective delusions. The curse of knowledge hinders communication: experts assume shared understanding, leading to ambiguous requirements or overlooked edge cases. Hyperbolic discounting prioritizes immediate deliverables over long-term maintainability, accruing technical debt.

Organizational politics exacerbate these tendencies: non-technical leaders impose decisions, as when unproven tools are mandated on superficial appeal. The sunk-cost fallacy sustains failing projects, ignoring opportunity costs. The Dunning-Kruger effect, where incompetence breeds overconfidence, manifests in unqualified critiques of sound engineering.

Confirmation bias selectively affirms preconceptions, dismissing contradictory evidence. In code reviews, this might involve defending flawed implementations by highlighting partial successes. Contextualized within agile methodologies, these biases undermine iterative improvements, fostering resistance to refactoring.

Implications for dynamics: eroded trust hampers collaboration, reducing innovation. Analysis suggests diverse teams dilute biases, as varied perspectives challenge assumptions.

Strategies to Mitigate Biases in Engineering Practices

Mitigation begins with awareness: educating on Kahneman’s System 1 (intuitive) versus System 2 (deliberative) thinking encourages reflective pauses. Structured decision frameworks, like weighted scoring for tool selection, counteract anchoring and availability.
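A weighted-scoring framework of the kind mentioned above can be sketched in a few lines. The criteria, weights, and tool names here are hypothetical examples, not figures from the talk; the point is that fixing the weights before rating candidates blunts anchoring and availability effects:

```python
def weighted_score(option_scores, weights):
    """Combine per-criterion ratings (0-10) into one comparable number."""
    return sum(option_scores[criterion] * weight
               for criterion, weight in weights.items())

# Hypothetical criteria and weights, agreed on before any tool is rated.
weights = {"performance": 0.4, "maturity": 0.3, "team_familiarity": 0.3}

# Hypothetical candidate ratings.
candidates = {
    "tool_a": {"performance": 8, "maturity": 6, "team_familiarity": 9},
    "tool_b": {"performance": 9, "maturity": 4, "team_familiarity": 5},
}

# Rank candidates by their weighted score, best first.
ranked = sorted(candidates,
                key=lambda name: weighted_score(candidates[name], weights),
                reverse=True)
```

Here the flashier option loses to the one the team can actually maintain, precisely the kind of System 2 correction the framework is meant to force.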

For performance, blind testing—evaluating without preconceptions—promotes objectivity. Debiasing techniques, such as devil’s advocacy, challenge bandwagon tendencies. Organizational interventions include bias training and diverse hiring to foster balanced views.

In practice, adopting evidence-based approaches—rigorous benchmarking protocols—enhances outcomes. Implications: mindful engineering boosts efficiency, reducing rework. Future research could quantify bias impacts via metrics like defect rates.

In essence, recognizing human frailties transforms engineering from intuitive art to disciplined science, yielding superior software.

Links:

  • Lecture video: https://www.youtube.com/watch?v=Aa2Zn8WFJrI
  • Mario Fusco on LinkedIn: https://www.linkedin.com/in/mariofusco/
  • Mario Fusco on Twitter/X: https://twitter.com/mariofusco
  • Red Hat website: https://www.redhat.com/

PostHeaderIcon [DotJs2024] Becoming the Multi-armed Bandit

In the intricate ballet of software stewardship, where intuition waltzes with empiricism, resides the multi-armed bandit—a probabilistic oracle guiding choices amid uncertainty. Ben Halpern, co-founder of Forem and dev.to’s visionary steward, dissected this gem at dotJS 2024. A full-stack polymath blending code with community curation, Ben recounted its infusions across his odyssey—from parody O’Reilly covers viralizing memes to mutton-busting triumphs—framing bandits as bridges between artistic whimsy and scientific rigor, aligning devs with stakeholders in pursuit of optimal paths.

Ben’s prologue evoked dev.to’s genesis: Twitter-era jests birthing a creative agora, bandit logic A/B-testing post formats for engagement zeniths. The archetype—casino levers, pulls maximizing payouts—mirrors dev dilemmas: UI variants, feature rollouts, content cadences. Exploration probes unknowns; exploitation harvests proven yields. Ben advocated epsilon-greedy: baseline exploitation (1-ε pulls best arm), exploratory ventures (ε samples alternatives), ε tuning via Thompson sampling for contextual nuance.
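The epsilon-greedy rule Ben described can be sketched as a small simulation. This is a minimal illustration under assumed Bernoulli payout rates, not code from the talk; the arm rates, epsilon, and pull count are arbitrary:

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, pulls=5000, seed=42):
    """Simulate an epsilon-greedy bandit over arms with fixed payout rates.

    With probability epsilon, explore a random arm; otherwise exploit the
    arm with the best running-mean reward so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms      # how often each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update avoids storing reward history.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

Run against arms paying out at, say, 20%, 50%, and 80%, the pull counts concentrate on the best arm while the estimated means converge toward the true rates, which is exactly the exploitation-with-a-safety-valve behavior Ben described.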

Practical infusions abounded. Load balancing: bandit selects origins, favoring responsive backends. Feature flags: variants vie, metrics crown victors. Smoke tests: endpoint probes, failures demote. ML pipelines: hyperparameter hunts, models ascend via validation. Ben’s dev.to saga: title A/Bs, bandit-orchestrated, surfacing resonant headlines sans bias. Organizational strata: nascent projects revel in exploration—ideation fests yielding prototypes; maturity mandates exploitation—scaling victors, pruning pretenders. This lexicon fosters accord: explorers and scalers, once at odds, synchronize via phases, preempting pivots’ friction.

Caution tempered zeal: bandits thrive on voluminous outcomes, not trivial toggles; overzealous testing paralyzes. As AI cheapens variants—code gen’s bounty—feedback scaffolds intensify, bandits as arbiters ensuring quality amid abundance. Ben’s coda: wield judiciously, blending craft’s flair with datum’s discipline for endeavors audacious yet assured.

Algorithmic Essence and Variants

Ben unpacked epsilon-greedy’s equilibrium: roughly 90% fealty to the best arm, 10% novelty nudges, with Thompson sampling supplying Bayesian, contextual nuance. UCB (Upper Confidence Bound) tempers regret through optimism, ideal for sparse signals—dev.to’s post tweaks, engagement echoes guiding refinements.
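The UCB1 selection rule can be sketched as follows; this is a standard textbook formulation, not Ben’s code. Untried arms are sampled first; after that, each arm scores as its mean reward plus an optimism bonus that shrinks as the arm accumulates observations:

```python
import math

def ucb1_select(counts, values, t):
    """Pick an arm index by the UCB1 rule.

    counts: pulls per arm so far; values: mean reward per arm;
    t: total pulls so far. Arms never tried are chosen first.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
```

The bonus term is what makes UCB suited to sparse signals: a rarely pulled arm keeps a large confidence radius, so it gets revisited even when its current mean looks mediocre.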

Embeddings in Dev Workflows

Load-balancing clusters bandit-route requests; feature flags unleash cohorts, with telemetry crowning victors. ML’s hyperparameter quests and smoke tests’ sentinel sweeps are all bandit-bolstered. Ben’s ethos: binary pass-fails fall short; arrays of assays excel, and infrastructure for insight is paramount.

Strategic Alignment and Prudence

Projects arc: exploration’s ideation inferno yields to scale’s forge. Ben bridged divides—stakeholder symposia conducted in bandit vernacular—averting misalignment. He warned against overreach: grand stakes summon science, while the mundane mandates art’s alacrity, and the coming deluge of variants will demand deft discernment.

Links:

PostHeaderIcon [DevoxxGR2025] Engineering for Social Impact

Giorgos Anagnostaki and Kostantinos Petropoulos, from IKnowHealth, delivered a concise 15-minute talk at Devoxx Greece 2025, portraying software engineering as a creative process with profound social impact, particularly in healthcare.

Engineering as Art

Anagnostaki likened software engineering to creating art, blending design and problem-solving to build functional systems from scratch. In healthcare, this creativity carries immense responsibility, as their work at IKnowHealth supports radiology departments. Their platform, built for Greece’s national imaging repository, enables precise diagnoses, like detecting cancer or brain tumors, directly impacting patients’ lives. This human connection fuels their motivation, transforming code into life-saving tools.

The Radiology Platform

Petropoulos detailed their cloud-based platform on Azure, connecting hospitals and citizens. Hospitals send DICOM imaging files and HL7 diagnosis data via VPN, while citizens access their medical history through a portal, eliminating CDs and printed reports. The system supports remote diagnosis and collaboration, allowing radiologists to share anonymized cases for second opinions, enhancing accuracy and speeding up critical decisions, especially in understaffed regions.

Technical Challenges

The platform handles 2.5 petabytes of imaging data annually from over 100 hospitals, requiring robust storage and fast retrieval. High throughput (up to 600 requests per minute per hospital) demands scalable infrastructure. Front-end challenges include rendering thousands of DICOM images without overloading browsers, while GDPR-compliant security ensures data privacy. Integration with national health systems added complexity, but the platform’s impact—illustrated by Anagnostaki’s personal story of his father’s cancer detection—underscores its value.

Links

PostHeaderIcon [DevoxxBE2024] Mayday Mark 2! More Software Lessons From Aviation Disasters by Adele Carpenter

At Devoxx Belgium 2024, Adele Carpenter delivered a gripping follow-up to her earlier talk, diving deeper into the technical and human lessons from aviation disasters and their relevance to software engineering. With a focus on case studies like Air France 447, Copa Airlines 201, and British Midlands 92, Adele explored how system complexity, redundancy, and human factors like cognitive load and habituation can lead to catastrophic failures. Her session, packed with historical context and practical takeaways, highlighted how aviation’s century-long safety evolution offers critical insights for building robust, human-centric software systems.

The Evolution of Aviation Safety

Adele began by tracing the rapid rise of aviation from the Wright Brothers’ 1903 flight to the jet age, catalyzed by two world wars and followed by a 20% annual growth in commercial air traffic by the late 1940s. This rapid adoption led to a peak in crashes during the 1970s, with 230 fatal incidents, primarily due to pilot error, as shown in data from planecrashinfo.com. However, safety has since improved dramatically, with fatalities dropping to one per 10 million passengers by 2019. Key advancements, like Crew Resource Management (CRM) introduced after the 1978 United Airlines 173 crash, reduced pilot-error incidents by enhancing cockpit communication. The 1990s and 2000s saw further gains through fly-by-wire technology, automation, and wind shear detection systems, making aviation a remarkable engineering success story.

The Perils of Redundancy and Complexity

Using Air France 447 (2009) as a case study, Adele illustrated how excessive redundancy can overwhelm users. The Airbus A330’s three pitot tubes, feeding airspeed data to multiple Air Data Inertial Reference Units (ADIRUs), failed due to icing, causing the autopilot to disconnect and bombard pilots with alerts. In alternate law, without anti-stall protection, the less-experienced pilot’s nose-up input led to a stall, exacerbated by conflicting control inputs in the dark cockpit. This cascade of failures—compounded by sensory overload and inadequate training—resulted in 228 deaths. Adele drew parallels to software, recounting a downtime incident at Trifork caused by a RabbitMQ cluster sync issue, highlighting how poorly understood redundancy can paralyze systems under pressure.

Deadly UX and Consistency Over Correctness

Copa Airlines 201 (1992) underscored the dangers of inconsistent user interfaces. A faulty captain’s vertical gyro fed bad data, disconnecting the autopilot. The pilots, trained on a simulator where a switch’s “left” position selected auxiliary data, inadvertently set both displays to the faulty gyro due to a reversed switch design in the actual Boeing 737. This “deadly UX” caused the plane to roll out of the sky, killing all aboard. Adele emphasized that consistency in design—over mere correctness—is critical in high-stakes systems, as it aligns with human cognitive limitations, reducing errors under stress.

Human Factors: Assumptions and Irrationality

British Midlands 92 (1989) highlighted how assumptions can derail decision-making. Experienced pilots, new to the 737-400, mistook smoke from a left engine fire for a right engine issue due to a design change in air conditioning systems. Shutting down the wrong engine led to a crash beside a motorway, though 79 of 126 survived. Adele also discussed irrational behavior under stress, citing the Manchester Airport disaster (1984), where 55 died from smoke inhalation during an evacuation. Post-crash recommendations, like strip lighting and wider exits, addressed irrational human behavior in emergencies, offering lessons for software in designing for stressed users.

Habituation and Complacency

Delta Airlines 1141 (1988) illustrated the risks of habituation, where routine dulls vigilance. Pilots, accustomed to the pre-flight checklist, failed to deploy flaps, missing a warning due to a modified takeoff alert system. The crash after takeoff killed 14. Adele likened this to software engineers ignoring frequent alerts, like her colleague Pete with muted notifications. She urged designing systems that account for human tendencies like habituation, ensuring alerts are meaningful and workflows prevent complacency. Her takeaways emphasized understanding users’ cognitive limits, balancing redundancy with simplicity, and prioritizing human-centric design to avoid software disasters.

Links:

PostHeaderIcon [DevoxxFR2014] Git-Deliver: Streamlining Deployment Beyond Java Ecosystems

Lecturer

Arnaud Bétrémieux is a passionate developer with 18 years of experience, including 8 professionally, specializing in open-source technologies, GNU/Linux, and languages like Java, PHP, and Lisp. He works at Key Consulting, providing development, hosting, consulting, and expertise services. Sylvain Veyrié, with nearly a decade in Java platforms, serves as Director of Delivery at Transparency Rights Management, focusing on big data, and has held roles in development, project management, and training at Key Consulting.

Abstract

This article investigates git-deliver, a deployment tool leveraging Git’s integrity guarantees for simple, traceable, and atomic deployments across diverse languages. It dissects the tool’s mechanics, from remote setup to rollback features, and discusses customization via scripts and presets, emphasizing its role in replacing ad-hoc scripts in dynamic language projects.

Core Principles and Setup

Git-deliver emerges as a Bash script extending Git with a “deliver” subcommand, aiming for simplicity, reliability, efficiency, and universality in deployments. Targeting non-Java environments like Node.js, PHP, or Rails, it addresses the pitfalls of custom scripts that introduce risks in traceability and atomicity.

A deployment target equates to a Git remote over SSH. For instance, creating remotes for test and production environments involves commands like git remote add test deliver@test.example.fr:/appli and git remote add prod deliver@example.fr:/appli. Deliveries invoke git deliver <remote> <version>, where version can be a branch, commit SHA, or tag.

On the target server, git-deliver initializes a bare Git repository alongside a “delivered” directory containing clones for each deployment. Each clone includes Git metadata and a working copy checked out to the specified version. Symbolic links, particularly “current,” point to the latest clone, ensuring a fixed path for applications and atomic switches—the link updates instantaneously, avoiding partial states.

Directory names incorporate timestamps and abbreviated SHAs, facilitating quick identification of deployed versions. This structure preserves history, enabling audits and rollbacks.
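The atomic switch of the “current” link can be approximated in a few lines. This is a sketch of the underlying POSIX rename trick, not git-deliver’s actual Bash implementation; the directory names are illustrative:

```python
import os
import tempfile

def atomic_switch(link_path, new_target):
    """Repoint link_path at new_target without any intermediate state.

    A new symlink is built under a temporary name, then renamed over the
    old one; rename(2) is atomic on POSIX, so readers always see either
    the old target or the new one, never a missing or half-made link.
    """
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_target, tmp)   # build the replacement link aside
    os.replace(tmp, link_path)    # atomically swap it into place

# Demo: flip a "current" link between two hypothetical delivery clones.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "delivered-2024-01-02"))
os.mkdir(os.path.join(root, "delivered-2024-03-04"))
current = os.path.join(root, "current")
atomic_switch(current, "delivered-2024-01-02")
atomic_switch(current, "delivered-2024-03-04")
```

Rollback falls out of the same mechanism: pointing the link back at an older clone is just another atomic switch.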

Information Retrieval and Rollback Mechanisms

To monitor deployments, git-deliver offers a “status” option. Without arguments, it surveys all remotes, reporting the current commit SHA, tag if applicable, deployment timestamp, and deployer. It also verifies integrity, alerting to uncommitted changes that might indicate manual tampering.

Specifying a remote yields a detailed history of all deliveries, including directory identifiers. Additionally, git-deliver auto-tags each deployment in the local repository, annotating with execution logs and optional messages. Pushing these tags to a central repository shares deployment history team-wide.

Rollback supports recovery: git deliver rollback <remote> reverts to the previous version by updating the “current” symlink to the prior clone. For specific versions, provide the directory name. This leverages preserved clones, ensuring exact restoration even if files were altered post-deployment.

Customization and Extensibility

Deployments divide into stages (e.g., init-remote for first-time setup, post-symlink for post-switch actions), allowing user-provided scripts executed at each. For normal deliveries, scripts might install dependencies or migrate databases; for rollbacks, they handle reversals like database adjustments.

To foster reusability, git-deliver introduces “presets”—collections of stage scripts for frameworks like Rails or Flask. Dependencies between presets (e.g., Rails depending on Ruby) enable modular composition. The “init” command copies preset scripts into a .deliver directory at the project root, customizable and versionable via Git.

This extensibility accommodates varied workflows, such as compiling sources on-server for compiled languages, though git-deliver primarily suits interpreted ones.

Broader Impact on Deployment Practices

By harnessing Git’s push mechanics and integrity checks, git-deliver minimizes errors from manual interventions, ensuring deployments are reproducible and auditable. Its atomic nature prevents service disruptions, crucial for production environments.

While not yet supporting distributed deployments natively, scripts can orchestrate multi-server coordination. Future enhancements might incorporate remote groups for parallel pushes.

In production at Key Consulting, git-deliver demonstrates maturity beyond prototyping, offering a lightweight alternative to complex tools, promoting standardized practices across projects.

Links: