Posts Tagged ‘AIAgents’

[VoxxedDaysTicino2026] Backlog.md: The Simplest Project Management Tool for the AI Era

Speaker

Alex Gavrilescu is a full-stack developer with extensive experience in .NET and Vue.js. He has worked in software development for many years and has focused on artificial intelligence since last year. Alex started Backlog.md as a side project at the end of May 2025, while holding a full-time role in the casino industry. He shares insights through articles on LinkedIn (https://www.linkedin.com/in/alex-gavrilescu/) and X, formerly Twitter (https://x.com/alexgavrilescu).

Abstract

This article examines Alex Gavrilescu’s presentation on his journey in AI-assisted software development and the creation of Backlog.md, a terminal-based project management tool designed to enhance predictability and structure in workflows involving AI agents. Drawing from personal experiences, the discussion analyzes the evolution from unstructured prompting to a systematic approach, emphasizing task decomposition, context management, and delegation modes. It explores the tool’s features, limitations, and implications for spec-driven AI development, highlighting how such methodologies foster deterministic outcomes in non-deterministic AI environments.

Context of AI Integration in Development Workflows

In the evolving landscape of software engineering, the integration of artificial intelligence agents has transformed traditional practices. Alex begins by contextualizing his experiences, noting the shift from basic code completions in integrated development environments (IDEs) like Visual Studio’s IntelliSense, which relied on simple machine learning or pattern matching, to more advanced tools. The advent of models like ChatGPT allowed developers to query and incorporate code snippets, reducing friction but still requiring manual transfers.

The introduction of GitHub Copilot marked a significant advancement, embedding AI directly into IDEs for contextual queries and modifications. However, the true leap came with agent modes, where AI operates in a loop, utilizing tools and gathering context autonomously until task completion. Alex distinguishes between “steer mode,” where developers iteratively guide AI through prompts and approvals, and “delegate mode,” where comprehensive instructions are provided upfront for independent execution. His focus leans toward delegation, aiming for reliable outcomes without constant intervention.
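The difference between the two modes can be illustrated with a minimal agent loop. This is only a sketch under stated assumptions: the tool names, the scripted stand-in for the model, and the approve callback are all hypothetical and do not reflect any particular product. With no callback the loop runs to completion on its own (delegate mode); with a callback, every tool call passes through a human decision first (steer mode).

```python
from typing import Callable, Optional

# Hypothetical tools; a real agent would read files, run commands, and so on.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda _: "2 passed",
}


def scripted_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: picks the next (action, argument) pair."""
    script = [("read_file", "task-1.md"), ("run_tests", ""), ("done", "tests pass")]
    steps_taken = sum(1 for entry in history if entry.startswith("tool:"))
    return script[steps_taken]


def run_agent(
    instructions: str,
    model: Callable[[list[str]], tuple[str, str]],
    approve: Optional[Callable[[str, str], bool]] = None,
    max_steps: int = 10,
) -> tuple[str, list[str]]:
    """Loop until the model reports completion or the step budget runs out."""
    history = [f"user: {instructions}"]
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "done":
            return arg, history
        # Steer mode: a reviewer may veto each tool call before it runs.
        if approve is not None and not approve(action, arg):
            history.append(f"tool: {action} rejected by reviewer")
            continue
        history.append(f"tool: {action} -> {TOOLS[action](arg)}")
    raise RuntimeError("step budget exhausted before the task finished")


# Delegate mode: full instructions up front, no approvals along the way.
result, history = run_agent("Implement task 1", scripted_model)
```

Passing something like approve=lambda action, arg: action != "run_tests" turns the same loop into steer mode, where the reviewer blocks selected steps.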

This context matters because AI models are inherently non-deterministic: identical prompts can yield different results. Alex draws a parallel to human collaboration, where structured information that clarifies the “why,” “what,” and “how” of a task makes success more likely. He references practices like Gherkin scenarios (given-when-then) but simplifies them to acceptance criteria and definitions of done, adapted for AI efficiency. Early challenges, such as the limited context windows of the models available in May 2025, made it necessary to break work into smaller tasks so that information was not lost during context compaction.

The implications are significant: unstructured AI use often ends in abandonment, because failure rates climb as complexity grows. Alex classifies developers into categories such as “vibe coders” (improvisational prompting without code review) and “AI product managers” (structured delegation with a final review), and attributes his own move from near-abandonment to a 95% success rate to imposing structure.

Development and Features of Backlog.md

Backlog.md emerged as Alex’s answer to the limits of structuring tasks by hand. Initially, he wrote tasks as Markdown files and committed them to a Git repository for sharing and history. This made it possible to reference one task from another, scope each task to prevent derailment, and assign tasks to specialized agents (e.g., Opus for UI, Codex for backend). Because there is no database or API in between, agents can read the files directly, which keeps the workflow efficient.

The tool formalizes this into a command-line interface (CLI) modeled on Git commands: backlog task create, backlog task edit, and backlog task list. Tasks are stored as Markdown files with a front-matter section for metadata (title, ID, dependencies, status). The body contains a “why” section for problem context, acceptance criteria with checkboxes for self-verification, an implementation plan generated by the agent, and notes or a summary that can be reused as the pull request description.
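To make the file layout concrete, here is a minimal sketch of how an agent or script might read such a task. The field names, section headings, and the sample task are assumptions chosen for illustration, not the exact schema Backlog.md writes, and the hand-rolled parser only handles flat key: value front matter.

```python
import re

# Illustrative task file; the layout is assumed, not copied from the tool.
TASK_TEXT = """\
---
id: task-12
title: Add task scheduling
status: To Do
dependencies: task-7, task-9
---

## Why
Users cannot plan recurring work.

## Acceptance Criteria
- [x] Schedule can be created from the CLI
- [ ] Invalid dates are rejected with an error

## Implementation Plan

## Notes
"""


def parse_task(text: str) -> tuple[dict[str, str], list[tuple[str, bool]]]:
    """Split front matter from the body and collect checkbox criteria."""
    _, front, body = text.split("---\n", 2)
    meta: dict[str, str] = {}
    for line in front.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # Each criterion is a Markdown checkbox: "- [ ] ..." or "- [x] ...".
    boxes = re.findall(r"^- \[( |x)\] (.+)$", body, flags=re.MULTILINE)
    criteria = [(label, mark == "x") for mark, label in boxes]
    return meta, criteria


meta, criteria = parse_task(TASK_TEXT)
open_items = [label for label, checked in criteria if not checked]
```

Because everything lives in a plain text file, the same checkboxes serve both the human reviewer and the agent verifying its own work.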

Backlog.md supports subtasks, dependencies (e.g., “relates to” or “blocked by”), and a web interface for easier editing, including rich text and dark mode. It works offline, synchronizes through Git across branches to avoid conflicts, and relies on the repository’s existing permissions for security. Notably, 99% of its code was AI-generated, with Alex reviewing the initial tasks, which demonstrates the tool’s recursive utility.
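A “blocked by” relation is what lets an orchestrator decide which tasks can be handed to an agent right now. The following sketch shows one way to compute that set; the dictionary shape, the status values, and the blocked_by key are hypothetical and are not taken from the tool itself.

```python
def ready_tasks(tasks: dict[str, dict]) -> list[str]:
    """Return IDs of unfinished tasks whose blockers are all done."""
    done = {task_id for task_id, task in tasks.items() if task["status"] == "Done"}
    return sorted(
        task_id
        for task_id, task in tasks.items()
        if task["status"] != "Done" and set(task["blocked_by"]) <= done
    )


# Hypothetical backlog: task-3 waits on task-2, task-4 waits on task-3.
backlog = {
    "task-1": {"status": "Done", "blocked_by": []},
    "task-2": {"status": "To Do", "blocked_by": ["task-1"]},
    "task-3": {"status": "To Do", "blocked_by": ["task-2"]},
    "task-4": {"status": "To Do", "blocked_by": ["task-1", "task-3"]},
}

next_up = ready_tasks(backlog)
```

Here only task-2 is ready, since its single blocker is done while the others still wait on unfinished work.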

Limitations include no way to start a task directly from the interface, the need to self-host, support for a single repository only, documentation and decisions sections that are still experimental, and no integrations with tools like GitHub Issues or Jira. As a solo side project, it lacks production-grade support, but Alex welcomes community contributions via issues or pull requests.

In a live demo, Alex uses Backlog.md for spec-driven development. He starts from a product requirements document (PRD) generated by an agent like Claude and decomposes it into tasks. The implementation plan is reviewed per task so that it can adapt to changes, ensuring accuracy. Sub-agents plan tasks in parallel, with human checkpoints at the description, plan, and code stages.

Methodological Implications for Spec-Driven AI Development

Spec-driven AI development, as outlined, requires clear intent expression before execution. Backlog.md facilitates this by breaking projects into manageable tasks, delegating to agents for research, planning, and coding. A feedback loop refines agent instructions, specs, and processes.

Alex’s workflow begins with PRD creation, followed by task decomposition adhering to Backlog.md guidelines. Agents generate plans only upon task start, preventing obsolescence. For a task-scheduling feature, he demonstrates PRD prompting, task creation, and sub-agent orchestration for plans, emphasizing acceptance criteria for verification.

The methodology promotes one task per context-window session, referencing summaries to avoid bloat. Definitions of done, which apply globally across projects, enforce testing, linting, and security checks. This counters the lack of direction in “vibe coding”: guardrails such as unit tests keep an agent from claiming completion prematurely.
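The guardrail idea can be sketched as a gate that a task must pass before it may be marked done. The check names and the callables below are placeholders; in a real project each check would wrap the team's own test, lint, or security command.

```python
from typing import Callable


def can_mark_done(
    criteria_checked: list[bool],
    done_checks: dict[str, Callable[[], bool]],
) -> tuple[bool, list[str]]:
    """A task is done only if every criterion is ticked and every check passes."""
    failures = [name for name, check in done_checks.items() if not check()]
    if not all(criteria_checked):
        failures.insert(0, "acceptance criteria")
    return not failures, failures


# Hypothetical project-wide definition of done; each callable is a stub.
definition_of_done = {
    "unit tests": lambda: True,
    "lint": lambda: False,  # pretend the linter reported an issue
    "security scan": lambda: True,
}

ok, failures = can_mark_done([True, True], definition_of_done)
```

The returned list of failures gives the agent something specific to fix instead of a bare refusal.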

Implications extend to project readiness: documentation for agent onboarding mirrors human processes, with skills, code styles, and self-verification loops enhancing efficiency. Alex references a Factory.ai article on AI-ready maturity levels, underscoring documentation’s role.

Challenges persist in UI verification, requiring human QA, and complex integrations. Yet, the approach allows iterations without full restarts, leveraging cheap tokens for refinements.

Consequences and Future Directions

Backlog.md’s simplicity yields repeatability, raising the success rate from 50% (slot-machine-like prompting) to 95%. By structuring delegation, it mitigates AI’s non-determinism and fosters predictable workflows. It also lowers the barrier to entry: no prior experience is needed beyond basic Git, which could broaden adoption.

For teams, Git synchronization enables collaboration, though self-hosting limits non-technical access. Future enhancements might include multi-repo support, integrations, and improved documentation, driven by its 4,600 GitHub stars and community feedback.

Broader implications question AI’s role: accepting “good enough” results accelerates development, but human input remains vital for steering and verification. As models improve (e.g., Opus 5.6’s million-token window), tools like Backlog.md evolve, but foundational structure endures.

In conclusion, Alex’s tool and methodology exemplify pragmatic AI integration, balancing innovation with reliability in an era where agents redefine development.

[DotAI2024] DotAI 2024: Grigorij Dudnik – Orchestrating AI Ensembles: Empowering Autonomous Application Assembly

Grigorij Dudnik, AI innovator and co-founder/CTO at Takżyli.pl—a poignant platform preserving legacies through memory profiles—unveiled the intricacies of agentic alliances at DotAI 2024. As the mind behind Clean Coder, an open-source scaffold for self-sustaining scriptcraft, Dudnik drew from December 2023’s dawn, when indolence ignited invention: a framework mirroring mortal makers, tasked with frontend forays or backend balms, sans solitary strife. His chronicle chronicled the crusade from crude constructs to collaborative crescendos, where agents alleviate authoring agonies, transforming textual toil into triumphant tapestries.

From Primal Prompts to Polished Pipelines: The Genesis of Guided Generation

Dudnik evoked the elemental era: AutoGen’s nascent node, a solitary sentinel scripting sans scaffolding—plain prose prone to pitfalls, oblivious to oversights or orthogonals. Agonies abounded: functions fractured mid-file, imports invoked in isolation, syntax’s specters unspotted. The antidote? Augmentation’s arsenal—linters as lieutenants, syntax sentinels summoning scrutiny; formatters as faithful forges, refining runes routinely.
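The “syntax sentinel” idea can be sketched in a few lines: parse the agent's proposed code before writing it to disk, and hand any error back as feedback. This is a generic illustration built on Python's standard ast module, not Clean Coder's actual implementation; the function names are hypothetical.

```python
import ast
from typing import Optional


def check_syntax(source: str) -> Optional[str]:
    """Return a short error description, or None if the source parses."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


def accept_edit(source: str) -> tuple[bool, str]:
    """Gate an agent's proposed file content behind the syntax check."""
    problem = check_syntax(source)
    if problem is not None:
        # The message goes back to the agent so it can repair its own edit.
        return False, f"Edit rejected, fix and retry: {problem}"
    return True, "Edit accepted"


good = accept_edit("def add(a, b):\n    return a + b\n")
bad = accept_edit("def add(a, b:\n    return a + b\n")
```

A full linter or formatter would slot into the same gate; the point is that the check runs automatically on every edit rather than relying on the agent to notice its own mistakes.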

Clean Coder crystallized this calculus: agents as artisans, armed with arsenals—Git’s granary for granular grapples, test harnesses for trial by fire. Dudnik delineated the duo: programmer’s prowess in prose-to-practice, tester’s tenacity in truth-seeking—each edict executed, examined, enshrined. Yet, entropy encroached: unchecked check-ins, unverified ventures—chaos in code’s cosmos.

Holistic harmony ensued: scopes sculpted slender—managers morphed to manifestos, task tomes tendered to ticketing titans. Frameworks ferried flows: automations absolving agents of ancillary acts, prompts pared to precision—singular summons yielding superb solutions. Dudnik’s dictum: delimit duties, delegate drudgery—multitasking’s mire mastered through modular mandates.

Elevating Ensembles: Holistic Horizons for Harmonious Handiwork

Dudnik delved deeper into delegation’s dividends: tools trimmed, automations amplified—agents unburdened, outputs outshining. Clean Coder’s canon: context’s clasp via .coderrules charters, RAG’s retrievals refining researches—performance’s pinnacle, where searches surge sans stagnation.

He heralded the handoff: tasks tallied, trials tendered, triumphs tabulated, with the framework’s rigor ensuring fidelity. Dudnik’s startups stand sentinel: Takżyli.pl’s tapestry, woven with agentic weaves, where code’s cadence is conserved and creativity’s core is kept.

In invocation, Dudnik implored ideation: interrogate isolations, offload orthogonals, slenderize scopes, so that agents ascend and authorship is alchemized. His closing QR code led to Clean Coder, with stars as summons: forge frameworks, foster futures, where AI’s ease echoes artisans’ artistry.

[DotJs2025] Modern Day Mashups: How AI Agents are Reviving the Programmable Web

Nostalgia’s glow recalls Web 2.0’s mashup mania—APIs alchemized into novelties, Google Maps wedding Craigslist for HousingMaps’ geospatial grace. Angie Jones, Block’s global VP of developer relations and 27-patent savant, resurrected this renaissance at dotJS 2025, heralding AI agents as programmable web’s phoenix via MCP (Model Context Protocol). An IBM Master Inventor turned educator, Angie’s odyssey—from virtual worlds to Azure’s principal—now orchestrates Goose, Block’s open-source agent, mashing MCPs for emergent enchantments.

Angie’s arc: 2000s’ closed gardens yielded to API avalanches—crime overlays, restaurant radars—yet silos stifled. AI’s advent: agents as conductors, LLMs querying MCPs—modular connectors to calendars, codebases, clouds. Goose’s genesis: MCP client, extensible via SDKs, wielding refs like filesystem fetches or GitHub grapples. Demos dazzled: Slack summons, Drive dossiers, all agent-autonomous—prompts birthing behaviors, mashups manifesting sans scaffolding.
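Under the hood, an MCP client such as an agent talks to each server with JSON-RPC 2.0 messages, using methods like tools/list to discover what a connector offers and tools/call to invoke it. The sketch below only builds the request payloads; the tool name search_listings and its arguments are hypothetical, and the transport and initialization handshake are omitted.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request with a fresh numeric id."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }
    return json.dumps(message)


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list", {})

# Invoke one tool by name; the name and arguments here are made up.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_listings", "arguments": {"city": "Lugano"}},
)
```

A mashup in this model is simply an agent issuing such calls against several servers in one session and combining the results.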

MCP’s mosaic: directories like Glama AI’s report cards (security scores, license litmus), PulseMCP’s popularity pulses, and Block’s nascent registry of metadata-rich, versioned vaults. 2025’s swell: servers tallied in the thousands, the community curating, creators crafting custom conduits, from Figma flows to finance fusions. Angie’s axiom: revive the 2000s’ whimsy, amplified (productivity’s polish, creativity’s canvas) and democratized by open forges.

This resurgence: agents as artisans, web as workshop—mash to manifest, share to spark.

Mashup’s Metamorphosis

Angie animated epochs: HousingMaps’ heuristic hacks to MCP’s modular might—agents querying conduits, emergent apps from elemental exchanges. Goose’s grace: SDK-spawned servers, refs routing realms—Slack’s summons, Drive’s deluge.

MCP’s Marketplace and Momentum

Directories discern: Glama’s grades, PulseMCP’s pulses, with Block’s beacon unifying. Thousands thrive, tinkerers tailoring, from Figma to finance, fun’s frontier.

[DotAI2024] DotAI 2024: Stanislas Polu – Tracing the Evolution of LLM Reasoning and Agency

Stanislas Polu, a trailblazing researcher and co-founder of Dust, offered a panoramic view of large language models’ ascent at DotAI 2024. With a background spanning Polytechnique, Stanford, and pivotal roles at Stripe and OpenAI—where he advanced mathematical reasoning in LLMs—Polu now steers Dust toward AI-augmented enterprise tools. His discourse framed the AI epoch as a societal phase shift, paralleling seismic transitions like agriculture or electrification, and dissected how LLMs’ cognitive prowess is reshaping work and innovation.

Societal Shifts Catalyzed by Emergent Intelligence

Polu likened the transition from the pre-AI to the post-AI era to historical ruptures, pinpointing AlphaZero’s 2017 debut as the inflection. Given only the rules, this system mastered Go and chess beyond human bounds and evoked extraterrestrial ingenuity, crunching simulations to forge strategies unattainable through rote play. ChatGPT’s 2022 emergence amplified this, birthing agents that orchestrate tasks autonomously, while recent milestones like an AI reaching a silver-medal-level score at the International Mathematical Olympiad signal prowess in abstract deduction.

These strides, Polu observed, provoke institutional ripples: Nobel nods to AI-driven physics and biology breakthroughs affirm computation’s ascendancy in discovery. Yet deployment lags potential; in mid-2022, OpenAI’s revenues hovered in the tens of millions of dollars, with scant workplace adoption. This chasm propelled Polu’s pivot from research to product, on the hypothesis that interfaces, not algorithms, bottleneck utility.

Dust embodies this thesis, granting teams bespoke assistants attuned to proprietary data and actions. Unlike monolithic bots, specialized agents—narrowly scoped for tasks like query resolution or report synthesis—yield superior accuracy by mitigating retrieval noise and model hallucinations. Polu’s narrative stresses infrastructure’s role: plumbing data silos and action endpoints to empower models without exposing sensitivities.

Unlocking Workplace Transformation Through Tailored AI

At Dust’s core lie two convictions: seamless enterprise integration and a multiplicity of agents. The former demands robust pipes (secure data federation and API orchestration), while the latter champions modularity, where assistants evolve via iterative refinement, drawing from domain lore to eclipse generalists.

Polu recounted Dust’s genesis amid GPT’s hype, yet workplace AI remains nascent, mired in “pre-GPT” paradigms of siloed tools. His solution: hyper-focused agents that ingest contextual artifacts, execute workflows, and iterate on feedback loops. This architecture not only boosts efficacy but fosters emergent behaviors, like chaining assistants for complex pipelines.

Envision a sales team querying leads enriched by CRM insights, or engineers debugging via code-aware bots—scenarios where Dust’s agnosticism across models ensures longevity. Polu advocated starting small: automate a 30-minute drudgery with GPT or Dust, scaling from there. This pragmatic ethos, he contended, unlocks boundless augmentation, where AI amplifies human ingenuity rather than supplants it.

As enterprises grapple with AI’s dual-edged sword—efficiency gains versus integration hurdles—Polu’s blueprint charts a collaborative path. Dust’s trajectory, blending research rigor with product agility, heralds a workspace where intelligence permeates, propelling productivity into uncharted realms.
