
PostHeaderIcon [NodeCongress2024] Bridging Runtimes: Advanced Testing Strategies for Cloudflare Workers with Vitest

Lecturer: Brendan Coll

Brendan Coll is a software engineer and key contributor to the Cloudflare Workers ecosystem. He is recognized as the creator of Miniflare, an open-source, fully-local simulator designed for the development and testing of Cloudflare Workers. His work focuses heavily on improving the developer experience for serverless and edge computing environments, particularly concerning local development, robust testing, and TypeScript integration. He has played a crucial role in leveraging and contributing to the open-source Workers runtime, workerd, to enhance performance and local fidelity.

Relevant Links:
* Cloudflare Author Profile: https://blog.cloudflare.com/author/brendan-coll/
* Cloudflare TV Discussion on Miniflare: https://cloudflare.tv/event/fireside-chat-with-brendan-coll-the-creator-of-miniflare/dgMlnqZD
* Cloudflare Developer Platform: https://pages.cloudflare.com/

Abstract

This article investigates the architectural methodology employed to integrate the Vitest testing framework, a Node.js-centric tool, with the Cloudflare Workers environment, which utilizes the custom workerd runtime. The analysis focuses on the development of a Custom Pool for process management, the fundamental architectural modifications required within workerd to support dynamic code evaluation, and the introduction of advanced developer experience features such as isolated per-test storage and declarative mocking. The integration serves as a significant case study in porting widely adopted testing standards to alternative serverless runtimes.

Custom Runtimes and the Vitest Testing Architecture

The Context of Alternative Runtimes

Cloudflare Workers operate on the workerd runtime, a V8-based environment optimized for high concurrency and low latency in a serverless, edge context. Developers interact with this environment locally through the Miniflare simulator and the Wrangler command-line interface. The objective of this methodology was to enable the use of Vitest, a popular Node.js testing library that typically relies on Node.js-specific primitives like worker threads, within the workerd runtime.

Methodology: Implementing the Custom Pool

The core innovation for this integration lies in the implementation of a Custom Pool within Vitest. Vitest typically uses pools (e.g., threads, forks) to manage the parallel execution of tests. The Cloudflare methodology replaced the standard Node.js thread management with a Custom Pool designed to orchestrate communication between the Node.js driver process (which runs Vitest itself) and the dedicated workerd process (where the actual Worker code executes).

This Custom Pool utilizes a two-way Inter-Process Communication (IPC) channel, typically established over sockets, to send test code and configuration to the isolated workerd environment and to receive results and logs back from it.
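The driver/worker round trip can be sketched generically. The following is an illustrative Python analogue only, under the assumption of a simple JSON message protocol; the real pool is JavaScript running inside the Vitest process, speaking to a separate workerd process:

```python
# Illustrative sketch only (the real pool is Node.js talking to workerd):
# a "driver" sends a test descriptor over a two-way IPC channel and
# receives a result back, mirroring the pool <-> runtime round trip.
import json
import socket
import threading

def worker(conn: socket.socket) -> None:
    """Stand-in for the runtime side: read a request, reply with a result."""
    payload = json.loads(conn.recv(4096).decode())
    reply = {"file": payload["file"], "status": "passed"}
    conn.sendall(json.dumps(reply).encode())

def run_driver() -> dict:
    """Stand-in for the test-runner side: push one test file, await its outcome."""
    driver_end, worker_end = socket.socketpair()
    t = threading.Thread(target=worker, args=(worker_end,))
    t.start()
    driver_end.sendall(json.dumps({"file": "example.test.ts"}).encode())
    result = json.loads(driver_end.recv(4096).decode())
    t.join()
    return result

print(run_driver())  # {'file': 'example.test.ts', 'status': 'passed'}
```

The design point is the separation of concerns: the driver never executes test code itself, it only serializes work over the channel and interprets structured results, which is what lets the execution side be swapped from Node.js worker threads to an entirely different runtime.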

Architectural Challenges: Dynamic Code Evaluation

A major architectural challenge arose from workerd's initial lack of support for dynamic code evaluation methods such as eval() or new Function(), which are essential for test runners like Vitest to process and execute test files dynamically.

The solution involved introducing a new primitive into the workerd runtime called the Module Inspector. This primitive enables the runtime to accept code dynamically and execute it as a module, thereby satisfying the requirements of the Vitest framework. This necessary modification to the underlying runtime highlights the complexity involved in making non-Node.js environments compatible with the Node.js testing ecosystem.

Enhanced Developer Experience (DX) and Test Isolation

The integration extends beyond mere execution compatibility by introducing features focused on improving testing ergonomics and isolation:

  1. Isolated Storage: The use of Miniflare enables hermetic, per-test isolation of all storage resources, including KV (Key-Value storage), R2 (Object storage), and D1 (Serverless Database). This is achieved by creating and utilizing a temporary directory for each test run, ensuring that no test can pollute the state of another, which is a fundamental requirement for reliable unit and integration testing.
  2. Durable Object Test Helpers: A specialized helper function, described in the talk as "get and wait for durable object," was developed to simplify the testing of Durable Objects (Cloudflare’s stateful serverless primitive). This allows developers to interact with a Durable Object instance directly, treating it effectively as a standard JavaScript class for testing purposes.
  3. Declarative HTTP Mocking: To facilitate isolated testing of external dependencies, the methodology leverages the undici MockAgent for declarative HTTP request mocking. This system intercepts all outgoing fetch requests, using undici's DispatchHandlers to match and return mocked responses, thereby eliminating reliance on external network access during testing. The onComplete handler is utilized to construct and return a standard Response object based on the mocked data.
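The declarative register-then-intercept pattern behind item 3 can be sketched independently of undici. The class and method names below are invented for illustration and do not match undici's actual MockAgent API; the point is only the shape of the technique: declare expected request/response pairs up front, and fail loudly on anything unregistered.

```python
# Generic sketch of declarative HTTP mocking (the talk uses undici's
# MockAgent in JavaScript; this Python analogue only illustrates the idea).
class FakeMockAgent:
    def __init__(self):
        self._routes = {}  # (method, url) -> canned response body

    def intercept(self, method: str, url: str, body: str) -> None:
        """Declare a canned response for a matching outgoing request."""
        self._routes[(method.upper(), url)] = body

    def fetch(self, url: str, method: str = "GET") -> str:
        """Stand-in for fetch(): never touches the network."""
        try:
            return self._routes[(method.upper(), url)]
        except KeyError:
            raise RuntimeError(f"no mock registered for {method} {url}")

agent = FakeMockAgent()
agent.intercept("GET", "https://api.example.com/user", '{"id": 1}')
print(agent.fetch("https://api.example.com/user"))  # {"id": 1}
```

Raising on unmatched requests (rather than silently passing them through) is what makes the tests hermetic: any accidental real network dependency surfaces as an immediate failure.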

Links:

PostHeaderIcon [DevoxxBE2024] Wired 2.0! Create Your Ultimate Learning Environment by Simone de Gijt

Simone de Gijt’s Devoxx Belgium 2024 session offered a neuroscience-informed guide to optimizing learning for software developers. Building on her Wired 1.0 talk, Simone explored how to retain knowledge amidst the fast-evolving tech landscape, including AI’s impact. Over 48 minutes, she shared strategies like chunking, leveraging emotional filters, and using AI tools like NotebookLM and Napkin to enhance learning. Drawing from her background as a speech and language therapist turned Java/Kotlin developer, she provided actionable techniques to create a focused, effective learning environment.

Understanding the Information Processing Model

Simone introduced the information processing model, explaining how sensory input filters through short-term memory to the working memory, where problem-solving occurs. Emotions act as a gatekeeper, prioritizing survival-related or emotionally charged data. Negative experiences, like struggling in a meeting, can attach to topics, discouraging engagement. Simone advised developers to ensure a calm state of mind before learning, as stress or emotional overload can block retention. She highlighted that 80% of new information is lost within 24 hours unless actively encoded, emphasizing the need for deliberate learning strategies.

Sense and Meaning: Foundations of Learning

To encode knowledge effectively, Simone proposed two key questions: “Do I understand it?” and “Why do I need to know it?” Understanding requires a foundational knowledge base; if lacking, developers should step back to build it. Relevance ensures the brain prioritizes information, making it memorable. For example, linking a conference talk’s concepts to immediate job tasks increases retention. Simone stressed focusing on differences rather than similarities when learning (e.g., distinguishing Java’s inheritance from polymorphism), as this aids retrieval by creating distinct mental cues.

Optimizing Retrieval Through Chunking

Retrieval relies on cues, mood, context, and storage systems. Simone emphasized “chunking” as a critical skill, where information is grouped into meaningful units. Senior developers excel at chunking, recalling code as structured patterns rather than individual lines, as shown in a study where seniors outperformed juniors in code recall due to better organization. She recommended code reading clubs to practice chunking, sharing a GitHub resource for organizing them. Categorical chunking, using a blueprint like advantages, disadvantages, and differences, further organizes knowledge for consistent retrieval across topics.

Timing and Cycles for Effective Learning

Simone discussed biological cycles affecting focus, noting a “dark hole of learning” post-midday when energy dips. She advised scheduling learning for morning or late afternoon peaks. The primacy-recency effect suggests splitting a learning session into three cycles of prime time (intense focus), downtime (reflection or breaks), and a second prime time. During downtime, avoid distractions like scrolling X, as fatigue amplifies procrastination. Instead, practice with new knowledge or take a walk to boost blood flow, enhancing retention by allowing the brain to consolidate information.

AI as a Learning Accelerator

Simone hypothesized that AI tools like ChatGPT, NotebookLM, and Napkin accelerate learning by providing personalized, accessible content but may weaken retrieval by reducing neural pathway reinforcement. She demonstrated using ChatGPT to plan a quantum computing session, dividing it into three blocks with reflection and application tasks. NotebookLM summarized sources into podcasts, while Napkin visualized concepts like process flows. These tools enhance engagement through varied sensory inputs but require critical thinking to evaluate outputs. Simone urged developers to train this skill through peer reviews and higher-order questioning, ensuring AI complements rather than replaces human judgment.

Links:

PostHeaderIcon [DefCon32] Listen to the Whispers: Web Timing Attacks that Actually Work

Timing attacks, long dismissed as theoretically potent yet practically elusive, gain new life through innovative techniques. James Kettle bridges the “timing divide,” transforming abstract concepts into reliable exploits against live systems. By amplifying signals and mitigating noise, Kettle unveils server secrets like masked misconfigurations, blind injections, hidden routes, and untapped attack surfaces.

Traditional hurdles—network jitter and server noise—once rendered attacks unreliable. HTTP/2’s concurrency, enhanced by Kettle’s single-packet method, synchronizes requests in one TLS record, eliminating jitter. Coalescing headers via sacrificial PING frames counters sticky ordering, making attacks “local” regardless of distance.

Server noise, from load variances to cloud virtualization, demands signal amplification: repeating headers for cumulative delays or denial-of-service tactics like nested XML entities. Repetition exploits caching, reducing variability; trimming requests minimizes unnecessary processing.

Parameter Discovery and Control Flow Insights

Kettle adapts Param Miner for time-based parameter/header guessing, uncovering hidden features on thousands of bug bounty sites. Timing reveals parameters altering responses subtly, like JSON-validated headers or cache keys signaling web cache poisoning risks.

Control flow changes, such as exceptions, emerge vividly. A Web Application Firewall (WAF) bypass exemplifies this: repeated “exec” parameters trigger prolonged analysis, escalating to denial-of-service; excess parameters expose max-header limits, enabling evasion.

IP spoofing headers like “True-Client-IP” induce DNS caching delays, confirmed via pingbacks. Non-caching variants suggest third-party geo-lookups, bypassing with hostnames.

Server-Side Injection Vulnerabilities

Timing excels at blind injections in non-sleep-capable languages. Serde JSON injections manifest as microsecond differentials; combining with client-side reflections infers standalone processing, aiding exploitation.

Blind Serde parameter pollution contrasts reserved/unreserved characters, yielding exploits. Doppelgangers—non-blind equivalents—guide understanding, turning detections into impacts.

SQL injections via sleep evade WAFs but overlap existing tools; timing shines where sleep fails, though exploitation demands deep target insight.

Scoped Server-Side Request Forgery Detection

Overlooked for years, scoped SSRF—proxies accessing only target subdomains—defies DNS pingbacks. Timing detects it via DNS caching or label-length timeouts: valid hostnames incur resolution and caching delays, while invalid ones either fail fast or hang on overlong labels.

Automating exploration, Kettle probes subdomains directly and via proxies, flagging discrepancies like missing headers. Exploits span firewall bypasses, internal DNS resolutions uncovering staging servers, pre-launch consoles, and frontend circumventions.

Frontend impersonation leverages trusted internal headers for authentication bypasses, exploitable via proxies, direct backend access, or smuggling. Timing guesses header names, enabling severe breaches.

Links:


PostHeaderIcon [DotJs2024] Adding Undo/Redo to Complex Web Apps

Enhancing user agency in labyrinthine applications hinges on forgiving interactions, where missteps dissolve with a keystroke. Bassel Ghandour, a senior software engineer at Contentsquare, distilled this essence into a brisk yet profound primer at dotJS 2024. From Paris’s vibrant tech scene—now his U.S. outpost—Ghandour lamented a botched virtual Tokyo greeting, swiftly invoking undo’s allure. His focus: retrofitting undo/redo into state-heavy web apps, sidestepping snapshot pitfalls for action-centric resilience.

Ghandour commenced with state management basics in a todo app: frontend mirrors app state, enter-press morphs it—additions, UI ripples. Naive undo? Timestamped snapshots, hopping between epochs. Reality intrudes: actions cascade side effects, backend ops interweave, concurrency clashes. Rapid todo barrages spawn interleaved sequences; snapshot reversion mid-thread invites chaos. Solution: encapsulate sequences under UUIDs, treating batches as atomic units. Parallel: forge inverses—add’s delete, toggle’s revert—mapping each to its antithesis.

This duality—do and undo in tandem—preserves fidelity. User crafts todo: UUID wraps creation, displays; inverse queues deletion. Subsequent show-toggle: nested UUID, inverse queued. Undo invokes the stack’s apex inverse, state reverts cleanly; redo replays forwards. Ghandour’s flow: capture actions, inverse-map, sequence-bundle, command-apply. Backend sync? Optimistic updates, rollbacks on failure. Contentsquare’s engineering blog details implementations, from Redux sagas to custom dispatchers.

Ghandour’s brevity belied depth: this pattern scales to e-commerce carts, design canvases, empowering serene navigation amid complexity. By prioritizing actions over states, developers liberate users, fostering trust in intricate digital environs.

Encapsulating Actions for Resilience

Ghandour advocated UUID-wrapped sequences, neutralizing concurrency: todo volleys become discrete do/undo pairs, applied reversibly. Inverse mapping—add-to-delete—ensures symmetry, backend integrations via optimistic commits. This sidesteps snapshot bloat, embracing flux with grace.

Implementing Undo/Redo Commands

Stacks manage history: push do with inverse, pop applies antithesis. Redo mirrors. Ghandour teased Contentsquare’s saga: Redux orchestration, UI hooks triggering cascades—scalable, testable, user-delighting.
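The pattern Ghandour described can be sketched generically (Python here for brevity; the talk's implementation is JavaScript with Redux). Each action is bundled with its inverse under a fresh UUID and pushed onto an undo stack; undo pops and applies the inverse, redo replays the original:

```python
import uuid
from dataclasses import dataclass
from typing import Callable

# Minimal sketch of the action-based pattern: every "do" is paired with
# its inverse under a unique id, and two stacks drive undo/redo.
@dataclass
class Command:
    id: str
    do: Callable[[], None]
    undo: Callable[[], None]

class History:
    def __init__(self):
        self._undo_stack: list[Command] = []
        self._redo_stack: list[Command] = []

    def apply(self, do: Callable[[], None], undo: Callable[[], None]) -> None:
        cmd = Command(str(uuid.uuid4()), do, undo)
        cmd.do()
        self._undo_stack.append(cmd)
        self._redo_stack.clear()  # a new action invalidates the redo history

    def undo(self) -> None:
        cmd = self._undo_stack.pop()
        cmd.undo()                # apply the inverse: add's delete, etc.
        self._redo_stack.append(cmd)

    def redo(self) -> None:
        cmd = self._redo_stack.pop()
        cmd.do()                  # replay forwards
        self._undo_stack.append(cmd)

todos: list[str] = []
h = History()
h.apply(lambda: todos.append("buy milk"), lambda: todos.pop())
h.undo()   # todos == []
h.redo()   # todos == ["buy milk"]
```

In the real setting each `Command` would wrap a whole UUID-tagged sequence of cascading actions (and their backend rollbacks) rather than a single list mutation, but the stack discipline is the same.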

Links:

PostHeaderIcon [DefCon32] Breaking Secure Web Gateways for Fun and Profit

Secure Web Gateways (SWGs), integral to enterprise Secure Access Service Edge (SASE) and Security Service Edge (SSE) frameworks, promise robust defenses against web threats. Vivek Ramachandran and Jeswin Mathai expose architectural flaws in these systems, introducing “Last Mile Reassembly Attacks” that evade detection across major vendors. Their findings underscore the limitations of network-level analysis in confronting modern browser capabilities.

SWGs intercept SSL traffic for malware scanning, threat prevention, URL filtering, and data loss prevention (DLP). Yet, as browsers evolve into sophisticated compute environments, attackers exploit client-side processing to reassemble threats post-proxy. Ramachandran highlights how SWGs lack context on DOM changes, events, and user interactions, operating blindly on flat traffic. Cloud constraints—file size limits (15-50 MB) and incomplete archive scanning—exacerbate vulnerabilities, often forcing blanket policies.

Vendors’ service level agreements (SLAs) claim 100% prevention of known malware, but these attacks shatter such guarantees. Pricing models ($2-4 per user/month) prioritize efficiency over exhaustive analysis, leaving gaps in protocol support and file handling.

Unmonitored Channels and Hiding in Plain Sight

Mathai demonstrates unmonitored protocols like WebRTC, WebSockets, gRPC, and Server-Sent Events smuggling malware undetected. These channels, essential for real-time apps, bypass interception; blocking them degrades user experience. Demos show seamless downloads of known malicious files via these vectors, indistinguishable from standard HTTP.

Further evasion involves embedding payloads in HTML, CSS, JavaScript, or SVG, extracting them client-side for reconstruction. SWGs scan individual resources but miss browser-side assembly. Encryption/decryption and encoding/decoding (e.g., Base64, UUencode) transform binaries in memory, dropping unencrypted files without triggering content disposition headers.

Last Mile Reassembly Techniques

Core to their research, Last Mile Reassembly fragments files into chunks—straight splits, reverses, randomized sizes, or mixes—fetched via multiple requests and reassembled via JavaScript. SWGs analyze fragments independently, failing to detect malice. Extending to WebAssembly modules constructing documents (e.g., malicious Excel) locally, no file download occurs from the proxy’s view.
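The core mechanic is simple enough to sketch. The payload and chunk size below are hypothetical; the point is that no individual fragment contains the scannable artifact, while trivial client-side code restores it:

```python
# Illustrative sketch of the reassembly idea: a payload is split into
# chunks (here also sent in reversed order) so no single response carries
# the full file; the client puts it back together. Payload is made up.
payload = b"pretend-flagged-binary-content"

def fragment(data: bytes, size: int) -> list[bytes]:
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return chunks[::-1]            # reversed order further defeats naive scans

def reassemble(chunks: list[bytes]) -> bytes:
    return b"".join(chunks[::-1])  # the client-side step the proxy never sees

chunks = fragment(payload, 5)
assert all(len(c) < len(payload) for c in chunks)  # each piece is partial
print(reassemble(chunks) == payload)               # True
```

A gateway scanning each response independently sees only short, signatureless byte runs; the malicious whole only ever exists in browser memory, which is exactly the blind spot the talk targets.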

File uploads reverse this: insiders fragment sensitive data, sending as form submissions evading DLP rules. Overlapping fragments mimic historical network attacks, fully bypassing inspections.

Phishing sites, converted to MHTML archives and smuggled via reassembly, repaint via canvas, reusing known malicious pages undetected. SWGs fingerprint server-side but overlook client-side rendering.

Architectural Challenges and Vendor Responses

SWGs’ server-side nature precludes real-time browser syncing or per-tab emulation, unscalable amid millions of events. Ramachandran argues for browser-integrated security to access rich data, contrasting cloud-centric models’ economic allure with practical failures.

Vendor engagements yielded mixed results: some acknowledged issues and pursued fixes; others claimed partial detection or disengaged. Open-sourcing 25 bypasses at browser.security empowers testing, urging vendors to address rather than block the site.

Their toolkit facilitates red-team simulations, exposing SLAs’ fragility. Enterprises must rethink web threat defenses, prioritizing client-side visibility over network proxies.

Links:

PostHeaderIcon [OxidizeConf2024] Exploring Slint as a Rust Alternative to QML for GUI Development

The Evolution of GUI Development

In the ever-evolving realm of graphical user interface (GUI) development, the quest for robust, safe, and efficient tools is paramount. David Vincze, a senior software engineer at Felgo, presented a compelling exploration of Slint as a Rust-based alternative to QML at OxidizeConf2024. With a background steeped in C++ and Qt, particularly in automotive instrument clusters, David shared his insights into how Slint offers a fresh perspective for developers accustomed to QML’s declarative approach within the Qt framework. His presentation illuminated the potential of Slint to address the limitations of QML, particularly its reliance on C++ and JavaScript, which can introduce runtime errors that challenge developers in safety-critical environments.

QML, a mature language with over a decade of use, has been a cornerstone for cross-platform GUI development, enabling developers to write a single codebase that runs on embedded, mobile, desktop, and web platforms. Its JSON-like syntax, coupled with reactive property bindings and JavaScript logic, simplifies prototyping and maintenance. However, David highlighted the inherent risks, such as performance bottlenecks due to JavaScript’s runtime interpretation and the dependency on the extensive Qt ecosystem, which can entail licensing costs. Slint, a newer toolkit built with Rust, emerges as a promising alternative, compiling to native code to catch errors at build time, thus enhancing reliability for embedded systems.

Comparing Slint and QML

David’s analysis centered on a practical comparison between Slint and QML, drawing from a demo weather application and home automation UI developed at Felgo. This project, available on Felgo’s GitHub as the Rusty Weather App, reimplemented a QML-based application in Slint, showcasing its multiplatform capabilities on desktop, embedded Raspberry devices, and Android. The comparison revealed striking similarities in their declarative syntax, with both languages using component-based structures to define UI elements. However, Slint’s code is notably more compact, and its components can be exported to native code, offering greater integration flexibility compared to QML’s C++-centric approach.

A key differentiator is Slint’s compilation to native code, which eliminates runtime errors common in QML’s JavaScript logic. This is particularly advantageous for embedded systems, where performance and reliability are critical. David demonstrated how Slint’s lightweight runtime and reactive property bindings mirror QML’s strengths but leverage Rust’s memory safety to prevent common programming errors. However, Slint lacks some of QML’s advanced features, such as multimedia support, 3D rendering, and automated UI testing, which are still in development. Despite these gaps, Slint’s rapid evolution, with frequent releases, signals its potential to rival QML in the future.

Challenges and Opportunities in Transitioning

Transitioning from QML to Slint presents both opportunities and challenges. David emphasized Slint’s benefits, including its integration with Rust’s ecosystem, which offers a robust package manager (Cargo) and seamless cross-compilation. The Slint VS Code extension, with its live preview feature, accelerates development by allowing real-time UI modifications without recompilation. This contrasts with QML’s reliance on tools like Qt Creator, which, while comprehensive, tie developers to the Qt ecosystem. Slint’s open-source nature and multi-language APIs (supporting Rust, C++, JavaScript, and Python) further enhance its appeal for diverse projects.

However, David acknowledged challenges, particularly in areas like internationalization, where Slint’s reliance on the Gettext library complicates translation processes compared to Qt’s well-established framework. Features like multi-window support and internal timers are also underdeveloped in Slint, posing hurdles for developers accustomed to QML’s mature ecosystem. Despite these, David advocated for Slint’s adoption in Rust-centric projects, citing its predictability and performance advantages, especially for embedded development. The community’s active development and planned UI testing support suggest that Slint’s limitations may soon be addressed, making it a compelling choice for forward-thinking developers.

Links:

PostHeaderIcon [DefCon32] Abusing Windows Hello Without a Severed Hand

In the realm of cybersecurity, exploring vulnerabilities in authentication mechanisms often reveals unexpected pathways for exploitation. Ceri Coburn and Dirk-jan Mollema delve into the intricacies of Windows Hello, Microsoft’s passwordless technology, highlighting how attackers can manipulate its components without relying on physical biometric data. Their presentation uncovers the architecture of Windows Hello, from key storage providers to protectors and keys, demonstrating real-world abuses that challenge the system’s perceived robustness.

Coburn begins by outlining the foundational elements of Windows Hello, emphasizing its role in generating keys for operating system logins, passkeys, and third-party applications. The distinction between Windows Hello and Windows Hello for Business lies primarily in the latter’s focus on certificate-based authentication for Active Directory environments. Both utilize key storage providers (KSPs), which serve as APIs for cryptographic operations. Traditional providers include software-based ones, TPM-backed platforms, and smart card integrations, but Windows Hello introduces the Passport KSP, acting as a proxy to these existing systems.

The Passport KSP comprises two services: the NGC service for application communication via RPC and the NGC controller service for metadata storage under the local service account, accessible only with system-level privileges. Each user enrollment creates a unique container folder identified by a GUID, housing protectors, key metadata, and recovery options. Protectors represent authentication methods like PINs or biometrics, encrypting intermediate PINs that unlock enrolled keys. These intermediate PINs—split into signing, decryption, and external variants—remain constant across protectors, allowing bypasses once accessed.

Unprivileged Attacks and Primary Refresh Tokens

Shifting focus, Mollema addresses attacks feasible without administrative privileges, centering on Primary Refresh Tokens (PRTs) in Windows Hello for Business scenarios. PRTs function as single sign-on tokens, requested via JSON Web Tokens (JWTs) signed by device certificates, ensuring trust from Entra (formerly Azure AD). When using Windows Hello, these requests incorporate data signed by private keys, including nonces to prevent replays.

A critical flaw arises from the ability to generate assertions without prompting for PINs or biometrics post-login, as keys are cached in sessions. Mollema demonstrates crafting “golden assertions” with extended validity, though Microsoft mitigated this by enforcing nonces server-side in May 2024. Nonetheless, within a five-minute window, attackers can request new PRTs on rogue devices, bypassing TPM protections and enabling persistence for up to 90 days.

This technique exploits RDP scenarios where PRTs on non-TPM devices expose credentials. Even with virtualization-based security or LSA protections, such attacks persist, underscoring the need for device compliance monitoring and restrictions on RDP to non-TPM systems.

Privileged Exploitation of Containers and Protectors

Under privileged access, Coburn dissects container structures, revealing metadata in .dat files detailing user SIDs, backing KSPs, and recovery keys. Protectors encrypt intermediate PINs differently: PIN protectors use PBKDF2 derivation for software KSPs or hex conversion for TPM unsealing. Biometric protectors, surprisingly, rely on system DPAPI keys, enabling reversal without actual biometrics via Vault decryption.

Recovery protectors, exclusive to business scenarios, involve Azure-encrypted blobs requiring MFA claims, yet their storage outside protector folders poses risks. Pre-boot and deprecated companion device protectors receive brief mentions, with further research needed.

Abuses include brute-forcing software-backed PINs via Hashcat masks, exploiting known lengths for rapid cracks—seconds for eight digits. TPM-backed PINs resist better, though four-digit variants succumb in months due to anti-hammering.
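The economics of that brute force are easy to demonstrate. Everything below is a hypothetical sketch: the salt, iteration count, and KDF settings are invented and are not Windows Hello's real parameters; it only shows why a purely numeric mask collapses the search space once the derived material is recoverable offline:

```python
import hashlib

# Hypothetical sketch (salt, iteration count, and KDF settings are made up,
# not Windows Hello's actual parameters) of an offline mask attack on a
# short numeric PIN.
SALT = b"example-salt"
ITERATIONS = 1_000

def derive(pin: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), SALT, ITERATIONS)

target = derive("0042")  # pretend this digest was recovered from the container

def crack(digits: int):
    """Try every PIN of the given length, like a Hashcat ?d?d?d?d mask."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if derive(candidate) == target:
            return candidate
    return None

print(crack(4))  # 0042
```

A d-digit PIN admits only 10^d candidates, so even eight digits is a hundred million KDF evaluations, which GPU-accelerated tools chew through in the "seconds" the talk cites for software-backed keys; TPM sealing resists precisely because the derivation cannot be replayed offline.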

Key Types and Persistence Implications

Enrolled keys leverage intermediate PINs: vault keys decrypt local passwords in consumer setups, entry keys handle business enrollments and passkeys, and external keys support third-party apps like Okta FastPass. Software-backed keys allow extraction off-device, amplifying risks.

Mollema extends this to PRT theft, using cached keys for assertions on different devices, even without TPMs, facilitating identity persistence. Reported vulnerabilities led to CVE assignments, with server-side enforcements post-July 2023.

Endpoint mitigations include Windows Hello Extended Session Security (ESS), rewriting containers in JSON under secure processes. Detections monitor NGC metadata access, alerting on non-controller processes.

Their tools—Shay for Hello abuses and ROADtools for Azure AD—aid offensive and defensive efforts, drawing from blogs by Teal and others.

Links:

PostHeaderIcon [DefCon32] Taming the Beast: Inside Llama 3 Red Team Process

As large language models (LLMs) like Llama 3, trained on 15 trillion tokens, redefine AI capabilities, their risks demand rigorous scrutiny. Alessandro Grattafiori, Ivan Evtimov, and Royi Bitton from Meta’s AI Red Team unveil their methodology for stress-testing Llama 3. Their process, blending human expertise and automation, uncovers emergent risks in complex AI systems, offering insights for securing future models.

Alessandro, Ivan, and Royi explore red teaming’s evolution, adapting traditional security principles to AI. They detail techniques for discovering vulnerabilities, from prompt injections to multi-turn adversarial attacks, and assess Llama 3’s resilience against cyber and national security threats. Their open benchmark, CyberSecEvals, sets a standard for evaluating AI safety.

The presentation highlights automation’s role in scaling attacks and the challenges of applying conventional security to AI’s unpredictable nature, urging a collaborative approach to fortify model safety.

Defining AI Red Teaming

Alessandro outlines red teaming as a proactive hunt for AI weaknesses, distinct from traditional software testing. LLMs, with their vast training data, exhibit emergent behaviors that spawn unforeseen risks. The team targets capabilities like code generation and strategic planning, probing for exploits like jailbreaking or malicious fine-tuning.

Their methodology emphasizes iterative testing, uncovering how helpfulness training can lead to vulnerabilities, such as hallucinated command flags.

Scaling Attacks with Automation

Ivan details their automation framework, using multi-turn adversarial agents to simulate complex attacks. These agents, built on Llama 3, attempt tasks like vulnerability exploitation or social engineering. While effective, they struggle with long-form planning, mirroring a novice hacker’s limitations.

CyberSecEvals benchmarks these risks, evaluating models across high-risk scenarios. The team’s findings, shared openly, enable broader scrutiny of AI safety.

Cyber and National Security Threats

Royi addresses advanced threats, including attempts to weaponize LLMs for cyberattacks or state-level misuse. Tests reveal Llama 3’s limitations in complex hacking, but emerging techniques like “abliteration” remove safety guardrails, posing risks for open-weight models.

The team’s experiments with uplifting non-expert users via AI assistance show promise but highlight gaps in achieving expert-level exploits, referencing Google’s Project Naptime.

Future Directions and Industry Gaps

The researchers advocate integrating security lessons into AI safety, emphasizing automation and open-source collaboration. Alessandro notes the psychological toll of red teaming, handling extreme content like nerve gas research. They call for more security experts to join AI safety efforts, addressing gaps in testing emergent risks.

Their work, supported by CyberSecEvals, sets a foundation for safer AI, urging the community to explore novel vulnerabilities.

Links:

PostHeaderIcon [PyData Global 2024] Making Gaussian Processes Useful

Bill Engels and Chris Fonnesbeck, both brilliant software developers from PyMC Labs, delivered an insightful 90-minute tutorial at PyData Global 2024 titled “Making Gaussian Processes Useful.” Aimed at demystifying Gaussian processes (GPs) for practicing data scientists, their session bridged the gap between theoretical complexity and practical application. Using baseball analytics as a motivating example, Chris introduced Bayesian modeling and GPs, while Bill provided hands-on strategies for overcoming computational and identifiability challenges. This post explores their comprehensive approach, offering actionable insights for leveraging GPs in real-world scenarios.

Bayesian Inference and Probabilistic Programming

Chris kicked off the tutorial by grounding the audience in Bayesian inference, often implemented through probabilistic programming. He described it as writing software with partially random outputs, enabled by languages like PyMC that provide primitives for random variables. Unlike deterministic programming, probabilistic programming allows modeling distributions over variables, including functions via GPs. Chris explained that Bayesian inference involves specifying a joint probability model for data and parameters, using Bayes’ formula to derive the posterior distribution. This posterior reflects what we learn about unknown parameters after observing data, with the likelihood and priors as key components. The computational challenge lies in the normalizing constant, a multidimensional integral that probabilistic programming libraries handle numerically, freeing data scientists to focus on model specification.
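The relationship Chris described can be written compactly; the denominator is the normalizing constant, the multidimensional integral that probabilistic programming libraries approximate numerically:

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta)\, p(\theta)\, d\theta}
```

Here \(p(y \mid \theta)\) is the likelihood, \(p(\theta)\) the prior, and \(p(\theta \mid y)\) the posterior that captures what is learned about the parameters after observing the data.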

Hierarchical Modeling with Baseball Data

To illustrate Bayesian modeling, Chris used the example of estimating home run probabilities for baseball players. He introduced a simple unpooled model where each player’s home run rate is modeled with a beta prior and a binomial likelihood, reflecting home runs hit over plate appearances. Using PyMC, this model is straightforward to implement, with each line of code corresponding to a mathematical component. However, Chris highlighted its limitations: players with few at-bats yield highly uncertain estimates, leaning heavily on the flat prior. This led to the introduction of hierarchical modeling, or partial pooling, where individual home run rates are drawn from a population distribution with hyperparameters (mean and standard deviation). This approach shrinks extreme estimates, producing more realistic rates, as seen when comparing unpooled estimates (with outliers up to 80%) to pooled ones (clustered below 10%, aligning with real-world data like Barry Bonds’ 15% peak).
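The shrinkage effect can be sketched outside PyMC. Treating the population distribution as `k` pseudo-observations at a league-wide rate (the numbers below are hypothetical, not the tutorial's data) shows how estimates are pulled toward the population mean in proportion to data scarcity:

```python
import numpy as np

# Hypothetical sketch of partial pooling, not the talk's PyMC model:
# a Beta prior with k pseudo-observations at the population rate acts
# like extra "phantom" plate appearances for every player.

def shrink(hits, appearances, pop_rate, k):
    """Shrink per-player rates toward pop_rate with k pseudo-observations."""
    hits = np.asarray(hits, dtype=float)
    n = np.asarray(appearances, dtype=float)
    return (hits + k * pop_rate) / (n + k)

hits = [4, 1]      # home runs
apps = [200, 5]    # plate appearances: an everyday player vs. a call-up
unpooled = np.array(hits) / np.array(apps)
pooled = shrink(hits, apps, pop_rate=0.03, k=50)
print(unpooled)  # the 1-for-5 player looks like a 20% slugger
print(pooled)    # his pooled estimate collapses toward the 3% population rate
```

The everyday player's estimate barely moves, while the small-sample outlier shrinks dramatically, which is precisely the behavior Chris demonstrates with the full hierarchical model.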

Gaussian Processes as a Hierarchical Extension

Chris transitioned to GPs, framing them as a generalization of hierarchical models for continuous predictors, such as player age affecting home run rates. Unlike categorical groups, GPs model relationships where similarity decreases with distance (e.g., younger players’ performance is more similar). A GP is a distribution over functions, parameterized by a mean function (often zero) and a covariance function, which defines how outputs covary based on input proximity. Chris emphasized two key properties of multivariate Gaussians—easy marginalization and conditioning—that make GPs computationally tractable despite their infinite dimensionality. By evaluating a covariance function at specific inputs, a GP yields a finite multivariate normal, enabling flexible, nonlinear modeling without explicitly parameterizing the function’s form.
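Both properties can be seen in a few lines of numpy: evaluate a covariance function at chosen inputs, and the infinite-dimensional GP collapses to an ordinary finite multivariate normal you can sample from. The kernel parameters below are assumptions for illustration, not values from the talk:

```python
import numpy as np

# Sketch: an exponentiated quadratic covariance function evaluated on a
# grid of ages, then one function draw from the resulting finite
# multivariate normal (the marginalization property in action).

def exp_quad(x1, x2, length_scale=5.0, amplitude=1.0):
    """Exponentiated quadratic (squared exponential) covariance matrix."""
    d = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)

ages = np.linspace(20, 40, 50)
K = exp_quad(ages, ages)
rng = np.random.default_rng(0)
# Add diagonal jitter for numerical stability before sampling.
f = rng.multivariate_normal(np.zeros(len(ages)), K + 1e-9 * np.eye(len(ages)))
print(K.shape)  # (50, 50): nearby ages covary strongly, distant ones weakly
```

The draw `f` is one plausible age curve under the prior; conditioning on observed data, the second key property, is what turns this prior over functions into a posterior.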

Computational Challenges and the HSGP Approximation

One of the biggest hurdles with GPs is their computational cost, particularly for latent GPs used with non-Gaussian data like binomial home run counts. Chris explained that the posterior covariance function requires inverting a matrix, which scales cubically with the number of data points (e.g., thousands of players). This makes exact GPs infeasible for large datasets. To address this, he introduced the Hilbert Space Gaussian Process (HSGP) approximation, which reduces cubic compute time to linear by approximating the GP with a finite set of basis functions. These functions depend on the data, while coefficients rely on hyperparameters like length scale and amplitude. Chris demonstrated implementing an HSGP in PyMC to model age effects, specifying 100 basis functions and a boundary three times the data range, resulting in a model that ran in minutes rather than years.
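The mechanics of the approximation can be sketched in numpy: sine basis functions on an expanded interval, weighted by the kernel's spectral density. The helper below is my own illustration of the Hilbert-space basis expansion, not the tutorial's PyMC code, though it follows Chris's settings of many basis functions and a boundary set to a multiple of the data range:

```python
import numpy as np

def exp_quad(x1, x2, ell=1.0, sigma=1.0):
    """Exact exponentiated quadratic covariance matrix."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def hsgp_cov(x, m=100, c=3.0, ell=1.0, sigma=1.0):
    """Approximate the exp-quad covariance with m sine basis functions."""
    L = c * np.max(np.abs(x))          # boundary: c times the data range
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2 * L)     # square roots of the eigenvalues
    # Basis functions depend only on the data...
    phi = np.sqrt(1 / L) * np.sin(sqrt_lam[None, :] * (x[:, None] + L))
    # ...while the spectral density carries the kernel hyperparameters.
    spd = sigma**2 * ell * np.sqrt(2 * np.pi) * np.exp(-0.5 * (ell * sqrt_lam) ** 2)
    return phi @ (spd[:, None] * phi.T)

x = np.linspace(-1, 1, 40)            # e.g. centered, scaled player ages
err = np.max(np.abs(hsgp_cov(x) - exp_quad(x, x)))
print(err)  # close to zero: the expansion matches the exact kernel
```

Because the basis matrix `phi` is fixed once the data are known, each new hyperparameter proposal during sampling only reweights the basis, which is where the cubic-to-linear speedup comes from.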

Practical Debugging with GPs

Bill took over to provide practical tips for fitting GPs, emphasizing their sensitivity to priors and the need for debugging. He revisited the baseball example, modeling batting averages with a hierarchical model before introducing a GP to account for age effects. Bill showed that a standard hierarchical model treats players as exchangeable, pooling information equally across all players. A GP, however, allows local pooling, where players of similar ages inform each other more strongly. He introduced the exponentiated quadratic covariance function, which uses a length scale to define “closeness” in age and a scale parameter for effect size. Bill highlighted common pitfalls, such as small length scales reducing a GP to a standard hierarchical model or large length scales causing identifiability issues with intercepts, and provided solutions like informative priors (e.g., inverse gamma, log-normal) to constrain length scales to realistic ranges.
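Both failure modes are visible directly in the kernel matrix. A small illustration (assumed ages and length scales, not Bill's notebook code):

```python
import numpy as np

# Sketch of the two length-scale pitfalls, using the exponentiated
# quadratic kernel directly.

def exp_quad(x1, x2, ell, sigma=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

ages = np.linspace(20, 40, 10)
tiny = exp_quad(ages, ages, ell=0.01)  # ~identity matrix: no pooling between
                                       # ages, i.e. back to a plain hierarchical model
huge = exp_quad(ages, ages, ell=1e4)   # ~all-ones matrix: a constant offset,
                                       # unidentifiable alongside an intercept
print(tiny[0, 1], huge[0, 1])  # off-diagonals near 0 and near 1, respectively
```

An informative prior such as an inverse gamma keeps the length scale away from both extremes, which is exactly why Bill recommends it over a vague default.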

Advanced GP Modeling for Slugging Percentage

Bill concluded with a sophisticated model for slugging percentage, a metric reflecting hitting power, using 10 years of baseball data. The model included player, park, and season effects, with an HSGP to capture age effects. He initially used an exponentiated quadratic covariance function but encountered sampling issues (divergences), a common problem with GPs. Bill fixed this by switching to a Matérn 5/2 covariance function, which assumes less smoothness and better suits real-world data, and adopting a centered parameterization for stronger age effects. These changes reduced divergences to near zero, producing a reliable model. The resulting age curve peaked at 26, aligning with baseball wisdom, and showed a decline for older players, demonstrating the GP’s ability to capture nonlinear trends.
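For reference, the Matérn 5/2 kernel Bill switches to has a simple closed form; the sketch below (parameter values are assumptions, not the model's fitted values) shows it is only finitely differentiable at zero distance, unlike the infinitely smooth exponentiated quadratic:

```python
import numpy as np

# Illustrative Matern 5/2 covariance function: tolerates rougher
# real-world curves than the exponentiated quadratic.

def matern52(x1, x2, ell=5.0, sigma=1.0):
    """Matern 5/2 covariance matrix over inputs x1 and x2."""
    r = np.abs(x1[:, None] - x2[None, :]) / ell
    s = np.sqrt(5.0) * r
    return sigma**2 * (1 + s + s**2 / 3) * np.exp(-s)

ages = np.linspace(20, 40, 30)
K = matern52(ages, ages)
print(K.shape)  # (30, 30), unit variance on the diagonal
```

Swapping the kernel in a PyMC model is a one-line change, which makes this kind of experiment cheap when diagnosing divergences.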

Key Takeaways and Resources

Bill and Chris emphasized that GPs extend hierarchical models by enabling local pooling over continuous variables, but their computational and identifiability challenges require careful handling. Informative priors, appropriate covariance functions (e.g., Matérn over exponentiated quadratic), and approximations like HSGP are critical for practical use. They encouraged using PyMC for its high-level interface and the Nutpie sampler for efficiency, while noting alternatives like GPflow for specialized needs. Their GitHub repository, linked below, includes slides and notebooks for further exploration, making this tutorial a valuable resource for data scientists aiming to apply GPs effectively.

Links:


PostHeaderIcon [DefCon32] Securing CCTV Cameras Against Blind Spots

As CCTV systems underpin public safety, their vulnerabilities threaten to undermine trust. Jacob Shams, a security researcher, exposes a critical flaw in object detection: location-based confidence weaknesses, or “blind spots.” His analysis across diverse locations—Broadway, Shibuya Crossing, and Castro Street—reveals how pedestrian positioning impacts detection accuracy, enabling malicious actors to evade surveillance. Jacob’s novel attack, TipToe, exploits these gaps to craft low-confidence paths, reducing detection rates significantly.

Jacob’s research spans five object detectors, including YOLOv3 and Faster R-CNN, under varied lighting conditions. By mapping confidence levels to position, angle, and distance, he identifies areas where detection falters. TipToe leverages these findings, offering a strategic evasion tool with implications for urban security and beyond.

The study underscores the need for robust CCTV configurations, urging developers to address positional biases in detection algorithms to safeguard critical infrastructure.

Understanding Blind Spots

Jacob’s experiments reveal that pedestrian position—distance, angle, height—affects detector confidence by up to 0.7. Heatmaps from lab and real-world footage, including Shibuya Crossing, highlight areas of low confidence that persist across YOLOv3, SSD, and other detectors. These blind spots, independent of video quality or lighting, create exploitable gaps.

For instance, at Shibuya, TipToe reduces average path confidence by 0.16, enabling stealthy movement. This phenomenon, consistent across locations, exposes systemic flaws in current detection models.

The TipToe Evasion Attack

TipToe constructs minimum-confidence paths through CCTV scenes, leveraging positional data to minimize detection. Jacob demonstrates its efficacy, achieving significant confidence reductions in public footage. Unlike invasive methods like laser interference, TipToe requires no suspicious equipment, relying solely on strategic positioning.
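The talk does not spell out TipToe's internals, but the core idea of a minimum-confidence path can be sketched as a shortest-path search over a confidence heatmap, with each cell's detector confidence as its traversal cost. The grid and helper below are entirely hypothetical:

```python
import heapq

# Hypothetical sketch (not Jacob's implementation): Dijkstra's algorithm
# over a grid of per-cell detector confidences, finding the entry-to-exit
# path with the lowest summed confidence.

def min_confidence_path(grid, start, goal):
    """Return (path, total confidence) of the cheapest path start -> goal."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: grid[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Reconstruct the path from goal back to start.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

heatmap = [
    [0.9, 0.9, 0.9],
    [0.1, 0.2, 0.1],   # a low-confidence corridor
    [0.9, 0.9, 0.9],
]
path, total = min_confidence_path(heatmap, (1, 0), (1, 2))
print(path)  # hugs the low-confidence middle row
```

The attacker's only "equipment" is the heatmap itself, which is consistent with Jacob's point that strategic positioning alone suffices.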

This attack highlights the ease of exploiting blind spots, urging integrators to reassess camera placement and algorithm tuning.

Mitigating Detection Weaknesses

Jacob proposes recalibrating object detectors to account for positional variances, enhancing confidence in weak areas. Multi-angle camera setups and advanced models could further reduce blind spots. His open-source tools encourage community validation, fostering improvements in surveillance security.

The research calls for a paradigm shift in CCTV design, prioritizing resilience against evasion tactics to protect public spaces.