[PyData Global 2024] Making Gaussian Processes Useful
Bill Engels and Chris Fonnesbeck, both brilliant software developers from PyMC Labs, delivered an insightful 90-minute tutorial at PyData Global 2024 titled “Making Gaussian Processes Useful.” Aimed at demystifying Gaussian processes (GPs) for practicing data scientists, their session bridged the gap between theoretical complexity and practical application. Using baseball analytics as a motivating example, Chris introduced Bayesian modeling and GPs, while Bill provided hands-on strategies for overcoming computational and identifiability challenges. This post explores their comprehensive approach, offering actionable insights for leveraging GPs in real-world scenarios.
Bayesian Inference and Probabilistic Programming
Chris kicked off the tutorial by grounding the audience in Bayesian inference, often implemented through probabilistic programming. He described it as writing software with partially random outputs, enabled by languages like PyMC that provide primitives for random variables. Unlike deterministic programming, probabilistic programming allows modeling distributions over variables, including functions via GPs. Chris explained that Bayesian inference involves specifying a joint probability model for data and parameters, using Bayes’ formula to derive the posterior distribution. This posterior reflects what we learn about unknown parameters after observing data, with the likelihood and priors as key components. The computational challenge lies in the normalizing constant, a multidimensional integral that probabilistic programming libraries handle numerically, freeing data scientists to focus on model specification.
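Written out, with θ for the unknown parameters and y for the observed data, Bayes’ formula makes the troublesome normalizing constant explicit. The denominator p(y) is an integral over every parameter dimension. MCMC samplers such as PyMC’s NUTS sidestep it by working only with the unnormalized numerator.

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
$$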
Hierarchical Modeling with Baseball Data
To illustrate Bayesian modeling, Chris used the example of estimating home run probabilities for baseball players. He introduced a simple unpooled model where each player’s home run rate is modeled with a beta prior and a binomial likelihood, reflecting home runs over plate appearances. Using PyMC, this model is straightforward to implement, with each line of code corresponding to a mathematical component. However, Chris highlighted its limitations: players with few at-bats yield highly uncertain estimates that lean heavily on the flat prior. This led to the introduction of hierarchical modeling, or partial pooling, where individual home run rates are drawn from a population distribution with hyperparameters (mean and standard deviation). This approach shrinks extreme estimates toward the population and produces more realistic rates. Unpooled estimates included outliers as high as 80%, while pooled ones clustered below 10%, consistent with real-world data in which even Barry Bonds peaked at about 15%.
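The shrinkage effect of partial pooling can be illustrated without MCMC. The sketch below is not the PyMC model from the talk. It swaps in an empirical-Bayes shortcut:

- It estimates the population Beta distribution by the method of moments.
- It then applies the conjugate beta-binomial update, so each player’s estimate becomes a weighted average of their raw rate and the population mean.

The function name and the player counts are made up for illustration.

```python
import numpy as np

def beta_binomial_shrinkage(home_runs, plate_appearances):
    """Empirical-Bayes partial pooling for per-player home run rates.

    Fits a population Beta(a, b) to the raw rates by the method of moments,
    then returns each player's conjugate posterior mean (a + hr) / (a + b + pa).
    """
    hr = np.asarray(home_runs, dtype=float)
    pa = np.asarray(plate_appearances, dtype=float)
    raw = hr / pa                                  # unpooled estimates
    mean, var = raw.mean(), raw.var()
    # Beta moments: mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1)
    concentration = mean * (1.0 - mean) / var - 1.0
    if concentration <= 0:
        raise ValueError("raw rates too dispersed for a method-of-moments Beta fit")
    a, b = mean * concentration, (1.0 - mean) * concentration
    shrunk = (a + hr) / (a + b + pa)               # posterior mean of Beta(a + hr, b + pa - hr)
    return raw, shrunk, mean

# Hypothetical players: two with almost no plate appearances, three regulars.
raw, shrunk, prior_mean = beta_binomial_shrinkage(
    home_runs=[1, 0, 30, 25, 40, 2],
    plate_appearances=[2, 5, 600, 550, 650, 40],
)
print(np.round(raw, 3))
print(np.round(shrunk, 3))
```

The player with 1 home run in 2 plate appearances moves strongly toward the population mean, while the regulars barely move. That is the behavior the hierarchical model achieves with full uncertainty over its hyperparameters.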
Gaussian Processes as a Hierarchical Extension
Chris transitioned to GPs, framing them as a generalization of hierarchical models to continuous predictors, such as player age affecting home run rates. Unlike categorical groups, GPs model relationships where similarity decreases with distance (e.g., players of similar ages perform more alike than players far apart in age). A GP is a distribution over functions, parameterized by a mean function (often zero) and a covariance function, which defines how outputs covary based on input proximity. Chris emphasized two key properties of multivariate Gaussians, easy marginalization and conditioning, that make GPs computationally tractable despite their infinite dimensionality. By evaluating a covariance function at specific inputs, a GP yields a finite multivariate normal, enabling flexible, nonlinear modeling without explicitly parameterizing the function’s form.
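To make both properties concrete, here is a small numpy sketch, not code from the tutorial. It evaluates an exponentiated quadratic covariance at a handful of ages, which marginalizes the GP down to a finite multivariate normal. It then conditions on hypothetical observed age effects using the standard Gaussian conditioning formulas. The ages, effect values, length scale, and jitter are illustrative assumptions.

```python
import numpy as np

def exp_quad(x1, x2, eta=1.0, ell=1.0):
    """Exponentiated quadratic covariance eta^2 * exp(-(x - x')^2 / (2 * ell^2))."""
    d = np.subtract.outer(np.asarray(x1, dtype=float), np.asarray(x2, dtype=float))
    return eta**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_condition(x_obs, y_obs, x_new, eta=1.0, ell=1.0, jitter=1e-6):
    """Posterior mean and covariance of a zero-mean GP at x_new given (almost) exact observations."""
    K = exp_quad(x_obs, x_obs, eta, ell) + jitter * np.eye(len(x_obs))
    K_s = exp_quad(x_new, x_obs, eta, ell)
    K_ss = exp_quad(x_new, x_new, eta, ell)
    L = np.linalg.cholesky(K)                      # O(n^3): the bottleneck for large n
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, np.asarray(y_obs, dtype=float)))
    v = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, K_ss - v.T @ v

ages = [20, 25, 30, 35]
effect = [0.0, 0.5, 0.3, -0.2]                     # hypothetical age effects
mean, cov = gp_condition(ages, effect, x_new=[25, 27.5, 100], ell=5.0)
print(np.round(mean, 3), np.round(np.diag(cov), 3))
```

Near the observed ages the posterior tracks the data with almost no variance. At age 100, far beyond the length scale, it falls back to the prior: mean near zero and variance near eta².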
Computational Challenges and the HSGP Approximation
One of the biggest hurdles with GPs is their computational cost, particularly for latent GPs used with non-Gaussian data like binomial home run counts. Chris explained that computing the GP posterior requires inverting a covariance matrix, an operation that scales cubically with the number of data points (e.g., thousands of players). This makes exact GPs infeasible for large datasets. To address this, he introduced the Hilbert Space Gaussian Process (HSGP) approximation. It reduces the cubic cost to roughly linear in the number of data points by approximating the GP with a finite set of basis functions. The basis functions depend only on the inputs (and a chosen boundary), while their coefficients depend on hyperparameters like length scale and amplitude. Chris demonstrated implementing an HSGP in PyMC to model age effects, specifying 100 basis functions and a boundary three times the data range, resulting in a model that ran in minutes rather than years.
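The mechanics of the approximation can be sketched in numpy for one input dimension. This is a standalone illustration, not PyMC’s implementation, though the names m and c follow the convention of PyMC’s HSGP class. The sketch works as follows:

- Inputs are centered, and the boundary L is set to c times the largest absolute centered input.
- The basis functions are sine eigenfunctions of the Laplacian on [-L, L].
- The only hyperparameter-dependent part is the kernel’s spectral density evaluated at the square-root eigenvalues.

The ages, m, c, and length scale are illustrative assumptions.

```python
import numpy as np

def hsgp_basis(x, m, L):
    """First m eigenfunctions of the Laplacian on [-L, L] (zero at both ends)."""
    sqrt_eigvals = np.pi * np.arange(1, m + 1) / (2.0 * L)
    phi = np.sin(sqrt_eigvals * (x[:, None] + L)) / np.sqrt(L)   # shape (n, m)
    return phi, sqrt_eigvals

def expquad_spectral_density(omega, eta, ell):
    """Spectral density of the 1-D exponentiated quadratic kernel."""
    return eta**2 * np.sqrt(2.0 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)

def hsgp_prior_draw(x, m, c, eta, ell, rng):
    """Draw f(x) from the approximate GP prior at O(n * m) cost instead of O(n^3)."""
    x = np.asarray(x, dtype=float)
    x_c = x - x.mean()                            # center the inputs
    L = c * np.max(np.abs(x_c))                   # boundary: c times the half-range
    phi, sqrt_eigvals = hsgp_basis(x_c, m, L)     # depends only on inputs and L
    weights = np.sqrt(expquad_spectral_density(sqrt_eigvals, eta, ell))  # depends on hyperparameters
    beta = rng.standard_normal(m)                 # the only sampled coefficients
    return phi @ (weights * beta), phi, weights

rng = np.random.default_rng(0)
ages = np.linspace(20, 40, 50)
f, phi, w = hsgp_prior_draw(ages, m=30, c=3.0, eta=1.0, ell=10.0, rng=rng)

# The implied covariance should closely match the exact kernel away from the boundary.
K_approx = (phi * w**2) @ phi.T
K_exact = np.exp(-0.5 * ((ages[:, None] - ages[None, :]) / 10.0) ** 2)
print(np.max(np.abs(K_approx - K_exact)))
```

Because the basis matrix never changes during sampling, a sampler only has to explore the m coefficients and the hyperparameters. This is where the speedup comes from.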
Practical Debugging with GPs
Bill took over to provide practical tips for fitting GPs, emphasizing their sensitivity to priors and the need for debugging. He revisited the baseball example, modeling batting averages with a hierarchical model before introducing a GP to account for age effects. Bill showed that a standard hierarchical model treats players as exchangeable, pooling information equally across all players. A GP, however, allows local pooling, where players of similar ages inform each other more strongly. He introduced the exponentiated quadratic covariance function, which uses a length scale to define “closeness” in age and a scale parameter for effect size. Bill highlighted two common pitfalls:

- Very small length scales reduce a GP to a standard hierarchical model, since each age barely informs its neighbors.
- Very large length scales make the age curve nearly flat and hard to distinguish from the intercept, causing identifiability issues.

He offered solutions like informative priors (e.g., inverse gamma, log-normal) to constrain length scales to realistic ranges.
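Both pitfalls show up directly in the correlation the kernel implies between two ages. In the sketch below, a tiny length scale makes ages 25 and 30 essentially independent, while a huge one makes them perfectly correlated, that is, a flat curve. The log-normal prior (median 6 years, sigma 0.5) and the 2 to 15 year "plausible" range are hypothetical choices for illustration, not values from the talk.

```python
import numpy as np

def age_correlation(age_a, age_b, ell):
    """Correlation implied by an exponentiated quadratic kernel between two ages."""
    return float(np.exp(-0.5 * ((age_a - age_b) / ell) ** 2))

# Tiny, moderate, and huge length scales (in years) for ages 25 vs 30.
for ell in (0.1, 5.0, 1000.0):
    print(f"ell={ell:>7}: corr(25, 30) = {age_correlation(25, 30, ell):.4f}")

# Hypothetical informative prior: log-normal with median 6 years.
rng = np.random.default_rng(42)
ell_draws = rng.lognormal(mean=np.log(6.0), sigma=0.5, size=100_000)
share_plausible = np.mean((ell_draws > 2.0) & (ell_draws < 15.0))
print(f"prior mass on 2-15 years: {share_plausible:.3f}")
```

Checking how much prior mass lands in a sensible range is a cheap way to confirm that a prior actually rules out the degenerate extremes.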
Advanced GP Modeling for Slugging Percentage
Bill concluded with a sophisticated model for slugging percentage, a metric reflecting hitting power, using 10 years of baseball data. The model included player, park, and season effects, with an HSGP to capture age effects. He initially used an exponentiated quadratic covariance function but encountered sampling issues (divergences), a common problem with GPs. Bill fixed this by switching to a Matern 5/2 covariance function, which assumes less smoothness and better suits real-world data, and adopting a centered parameterization for stronger age effects. These changes reduced divergences to near zero, producing a reliable model. The resulting age curve peaked at 26, aligning with baseball wisdom, and showed a decline for older players, demonstrating the GP’s ability to capture nonlinear trends.
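The difference between the two kernels is easy to tabulate. Matérn 5/2 produces sample paths that are twice differentiable, rather than infinitely smooth like the exponentiated quadratic. Its correlation also drops faster at short range and keeps a heavier tail at long range. The sketch below assumes a unit length scale and unit amplitude.

```python
import numpy as np

def exp_quad(r, ell=1.0):
    """Exponentiated quadratic correlation at distance r (infinitely smooth draws)."""
    return np.exp(-0.5 * (np.asarray(r, dtype=float) / ell) ** 2)

def matern52(r, ell=1.0):
    """Matern 5/2 correlation: (1 + s + s^2 / 3) * exp(-s), with s = sqrt(5) * |r| / ell."""
    s = np.sqrt(5.0) * np.abs(np.asarray(r, dtype=float)) / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

distances = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
for r, eq, m52 in zip(distances, exp_quad(distances), matern52(distances)):
    print(f"r={r:.1f}  expquad={eq:.4f}  matern52={m52:.4f}")
```

The rougher, heavier-tailed Matérn correlation is often a better match for noisy real-world curves like age effects, which is consistent with the improvement Bill reported.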
Key Takeaways and Resources
Bill and Chris emphasized that GPs extend hierarchical models by enabling local pooling over continuous variables, but their computational and identifiability challenges require careful handling. Informative priors, appropriate covariance functions (e.g., Matern over exponentiated quadratic), and approximations like HSGP are critical for practical use. They encouraged using PyMC for its high-level interface and the Nutpie sampler for efficiency, while noting alternatives like GPflow for specialized needs. Their GitHub repository, linked below, includes slides and notebooks for further exploration, making this tutorial a valuable resource for data scientists aiming to apply GPs effectively.
Links:
- Bill Engels’ LinkedIn
- Chris Fonnesbeck’s LinkedIn
- PyMC Labs
- PyMC
- GitHub Repository
- PyData Global 2024
- NumFOCUS
[DefCon32] Securing CCTV Cameras Against Blind Spots
As CCTV systems underpin public safety, their vulnerabilities threaten to undermine trust. Jacob Shams, a security researcher, exposes a critical flaw in object detection: location-based confidence weaknesses, or “blind spots.” His analysis across diverse locations, including Broadway, Shibuya Crossing, and Castro Street, reveals how pedestrian positioning affects detection confidence, enabling malicious actors to evade surveillance. Jacob’s novel attack, TipToe, exploits these gaps to craft low-confidence paths, significantly lowering the detector’s confidence along the chosen route.
Jacob’s research spans five object detectors, including YOLOv3 and Faster R-CNN, under varied lighting conditions. By mapping confidence levels to position, angle, and distance, he identifies areas where detection falters. TipToe leverages these findings, offering a strategic evasion tool with implications for urban security and beyond.
The study underscores the need for robust CCTV configurations, urging developers to address positional biases in detection algorithms to safeguard critical infrastructure.
Understanding Blind Spots
Jacob’s experiments reveal that pedestrian position—distance, angle, height—affects detector confidence by up to 0.7. Heatmaps from lab and real-world footage, including Shibuya Crossing, highlight areas of low confidence, persisting across YOLOv3, SSD, and others. These blind spots, independent of video quality or lighting, create exploitable gaps.
For instance, at Shibuya, TipToe reduces average path confidence by 0.16, enabling stealthy movement. This phenomenon, consistent across locations, exposes systemic flaws in current detection models.
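Jacob’s exact mapping pipeline isn’t described here, but the basic idea of a positional confidence heatmap can be sketched. Each detection’s image position is binned into a grid, and the detector confidences are averaged per cell. In the numpy sketch below, the function name, cell size, frame dimensions, and detections are illustrative assumptions. Cells with no detections are left as NaN.

```python
import numpy as np

def confidence_heatmap(xs, ys, confs, width, height, cell=32):
    """Average detector confidence per grid cell; NaN where no detection fell."""
    nx, ny = int(np.ceil(width / cell)), int(np.ceil(height / cell))
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    # Map pixel coordinates to grid indices, clamping points on the far edge.
    ix = np.clip((np.asarray(xs) // cell).astype(int), 0, nx - 1)
    iy = np.clip((np.asarray(ys) // cell).astype(int), 0, ny - 1)
    np.add.at(total, (iy, ix), np.asarray(confs, dtype=float))  # unbuffered accumulation
    np.add.at(count, (iy, ix), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count

# Hypothetical detections: (x, y) pixel positions and their confidences.
heat = confidence_heatmap(xs=[10, 20, 100], ys=[10, 15, 10],
                          confs=[0.9, 0.7, 0.3], width=128, height=64)
print(heat)
```

Aggregating many pedestrians’ detections this way surfaces persistent low-confidence regions, independent of any single frame.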
The TipToe Evasion Attack
TipToe constructs minimum-confidence paths through CCTV scenes, leveraging positional data to minimize detection. Jacob demonstrates its efficacy, achieving significant confidence reductions in public footage. Unlike invasive methods like laser interference, TipToe requires no suspicious equipment, relying solely on strategic positioning.
This attack highlights the ease of exploiting blind spots, urging integrators to reassess camera placement and algorithm tuning.
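The talk summary doesn’t spell out how TipToe builds its paths, so the following is only an analogous sketch, not Jacob’s algorithm. Given a per-cell confidence grid, such as a positional heatmap, it runs Dijkstra’s algorithm on a 4-connected grid. Stepping into a cell costs that cell’s confidence, so the cheapest route is the one with the lowest cumulative confidence. TipToe reportedly targets average path confidence, so minimizing the sum is a simplifying assumption. The grid values are made up.

```python
import heapq
import numpy as np

def min_confidence_path(conf, start, goal):
    """Dijkstra over a 4-connected grid where entering a cell costs its confidence."""
    rows, cols = conf.shape
    best = np.full(conf.shape, np.inf)
    prev = {}
    best[start] = conf[start]
    heap = [(float(conf[start]), start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if cost > best[r, c]:
            continue                               # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + float(conf[nr, nc])
                if new_cost < best[nr, nc]:
                    best[nr, nc] = new_cost
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    # Walk predecessors back from the goal to reconstruct the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical grid: a high-confidence "wall" down the middle column.
grid = np.array([[0.1, 0.9, 0.1],
                 [0.1, 0.9, 0.1],
                 [0.1, 0.1, 0.1]])
path = min_confidence_path(grid, start=(0, 0), goal=(0, 2))
print(path)
```

The route detours around the well-covered cells instead of crossing them. This is the kind of positional evasion that requires no equipment at all.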
Mitigating Detection Weaknesses
Jacob proposes recalibrating object detectors to account for positional variances, enhancing confidence in weak areas. Multi-angle camera setups and advanced models could further reduce blind spots. His open-source tools encourage community validation, fostering improvements in surveillance security.
The research calls for a paradigm shift in CCTV design, prioritizing resilience against evasion tactics to protect public spaces.
[DefCon32] Smishing Smackdown: Unraveling the Threads of USPS Smishing and Fighting Back
In an era where digital scams proliferate, SMS phishing, or smishing, has surged, exploiting trust in institutions like the United States Postal Service (USPS). S1nn3r, a red team operator and founder of Phantom Security Group, recounts her journey tackling the “Smishing Triad,” a sophisticated operation distributing scam kits. Motivated by personal encounters with these fraudulent texts, S1nn3r’s investigation uncovers vulnerabilities in the kits, enabling access to their admin panels and exposing over 390,000 stolen credit card details across 900 domains.
S1nn3r’s expertise in web application testing, honed through bug bounties, drives her to reverse-engineer these kits. Collaborating with peers, she identifies two critical flaws, granting entry to administrative interfaces. This access reveals not only victim data but also scammer details like login IPs and passwords. Her findings, shared with banks and the USPS Inspector’s Office, aid in protecting nearly 880,000 victims, highlighting the power of proactive cybersecurity.
The talk illuminates the technical ingenuity behind smishing campaigns and offers strategies to combat them, emphasizing client-side filtering to thwart future attacks.
Anatomy of the Smishing Triad
S1nn3r begins by dissecting the USPS smishing campaign, which spiked during the holiday season. These messages, mimicking USPS alerts, lure users to fraudulent sites via links. The Smishing Triad’s kit, a scalable tool sold to scammers, automates these attacks, capturing credentials and financial data.
Through meticulous analysis, S1nn3r uncovers the kit’s structure, leveraging web vulnerabilities to infiltrate admin panels. This access exposes databases containing victim information, revealing the campaign’s vast reach.
Exploiting Kit Vulnerabilities
The investigation reveals two pivotal weaknesses: insecure authentication and misconfigured APIs. By exploiting these, S1nn3r gains administrative control, extracting data from over 40 panels. This includes scammer metadata, such as IPs and cracked passwords, offering insights into their operations.
Her collaboration with a Wired journalist and law enforcement underscores the real-world impact, linking stolen credit cards to specific scams. This evidence strengthens investigations, despite challenges in victim identification.
Countermeasures and Future Defenses
S1nn3r advocates enhanced client-side filtering, suggesting AI-driven solutions to detect suspicious texts. Third-party integrations, like Truecaller, offer practical defenses by flagging non-official USPS links. She cautions against man-in-the-middle attacks on SMS, emphasizing scalable, user-friendly protections.
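A minimal version of this kind of client-side filtering can be sketched as an allow-list check: if a text mentions USPS but links to a host that is not an official USPS domain, flag it. The allow-list (only usps.com and its subdomains) and the function name are assumptions for illustration. A real filter, like the third-party apps mentioned, would use far richer signals, and this sketch only inspects http(s) links.

```python
import re
from urllib.parse import urlparse

OFFICIAL_USPS_DOMAINS = {"usps.com"}          # assumption: minimal allow-list
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def is_suspicious_usps_text(message: str) -> bool:
    """Flag texts that mention USPS but link to a host outside the allow-list."""
    if "usps" not in message.lower():
        return False
    for url in URL_RE.findall(message):
        host = (urlparse(url).hostname or "").lower()
        official = any(host == d or host.endswith("." + d) for d in OFFICIAL_USPS_DOMAINS)
        if not official:
            return True                        # USPS-themed text pointing elsewhere
    return False

print(is_suspicious_usps_text("USPS: Your package is on hold. Update your address at https://usps.com-track.info/redeliver"))
print(is_suspicious_usps_text("USPS: track your parcel at https://tools.usps.com/go/TrackConfirmAction?tLabels=9400"))
```

Matching on the parsed hostname rather than the raw string is what catches lookalikes such as usps.com-track.info, which merely begin with the official name.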
Her work, shared via open-source tools, invites further research to dismantle smishing ecosystems, urging collective action against evolving scams.
[Scala IO Paris 2024] Escaping the False Dichotomy: Sanely Automatic Derivation in Scala
In the ScalaIO Paris 2024 session “Slow Auto, Inconvenient Semi: Escaping False Dichotomy with Sanely Automatic Derivation,” Mateusz Kubuszok delivered a user-focused exploration of typeclass derivation in Scala. With over nine years of Scala experience and as a co-maintainer of the Chimney library, Mateusz examined the trade-offs between automatic and semi-automatic derivation, proposing a “sanely automatic” approach that balances usability, compile-time performance, and runtime efficiency. Using JSON serialization libraries like Circe and Jsoniter-Scala as case studies, the talk highlighted how library design choices impact users and offered a practical solution to common derivation pain points, supported by benchmarks and a public repository.
Understanding Typeclasses and Derivation
Mateusz began by demystifying typeclasses, describing them as parameterized interfaces (e.g., an Encoder[T] for JSON serialization) whose implementations are provided by the compiler via implicits or givens. Typeclass derivation automates creating these implementations for complex types like case classes or sealed traits by combining implementations for their fields or subtypes. For example, encoding a User(name: String, address: Address) requires encoders for String and Address, which the compiler recursively resolves.
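As a runtime analogy in Python, the recursive shape of derivation can be mimicked. This is not Scala’s compile-time mechanism and not Circe’s API. A hypothetical derive_encoder builds an encoder for a dataclass by recursively combining encoders for its fields. It bottoms out at hand-written instances for primitive types, just as User needs instances for String and Address.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints

# Hand-written "instances" for primitive types: the base case of the recursion.
PRIMITIVE_ENCODERS = {str: lambda v: v, int: lambda v: v, float: lambda v: v, bool: lambda v: v}

def derive_encoder(tpe):
    """Build an encoder for tpe by recursively combining encoders for its fields."""
    if tpe in PRIMITIVE_ENCODERS:
        return PRIMITIVE_ENCODERS[tpe]
    if is_dataclass(tpe):
        hints = get_type_hints(tpe)
        field_encoders = [(f.name, derive_encoder(hints[f.name])) for f in fields(tpe)]
        return lambda value: {name: enc(getattr(value, name)) for name, enc in field_encoders}
    raise TypeError(f"no encoder for {tpe!r}")   # analogous to a missing implicit

@dataclass
class Address:
    street: str
    city: str

@dataclass
class User:
    name: str
    address: Address

encode_user = derive_encoder(User)
print(json.dumps(encode_user(User("Ada", Address("1 Main St", "Paris")))))
```

The TypeError branch plays the role of a failed implicit search. It is also a natural place for the field-level error messages the talk advocates.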
Derivation comes in two flavors: automatic and semi-automatic. Automatic derivation, popularized by Circe, uses implicits to generate implementations on demand, but can lead to slow compilation and runtime performance issues. Semi-automatic derivation requires explicit calls (e.g., deriveEncoder[Address]) to define implicits, ensuring consistency but adding manual overhead. Mateusz emphasized that these choices, made by library authors, significantly affect users through compilation times, error messages, and performance.
The Pain Points of Automatic and Semi-Automatic Derivation
Automatic derivation in Circe recursively generates encoders for nested types, checking for existing implicits before falling back to derivation. This can cause circular dependencies or stack overflows if not managed carefully. Semi-automatic derivation avoids this by always generating new instances, but requires users to define implicits for every intermediate type, increasing code verbosity. Mateusz shared anecdotes of developers banning automatic derivation due to compilation times ballooning to 10 minutes for complex JSON hierarchies.
Benchmarks on a nested case class (five levels deep) revealed stark differences. On Scala 2.13, Circe’s automatic derivation took 14 seconds to compile a single file, versus 12 seconds for semi-automatic. On Scala 3, automatic derivation soared to 46 seconds (cold JVM) or 16 seconds (warm), while semi-automatic took 10 seconds or 1 second, respectively. Runtime performance also suffered: automatic derivation on Scala 3 was up to 10 times slower than semi-automatic, likely due to large inlined methods overwhelming JVM optimization.
Comparing Libraries: Circe vs. Jsoniter-Scala
Mateusz contrasted Circe with Jsoniter-Scala, which prioritizes performance. Jsoniter-Scala uses a single macro to generate recursive codec implementations, avoiding intermediate implicits except for custom overrides. This reduces memory allocation and compilation overhead. Benchmarks showed Jsoniter-Scala compiling faster than Circe (e.g., 2–3 times faster on Scala 3) and running three times faster, even when Circe was given a head start by testing on JSON representations rather than strings.
Jsoniter-Scala’s approach minimizes implicit searches, embedding logic in macros instead of relying on the compiler’s typechecker. For example, encoding a User with an Address field involves one codec handling all nested types, unlike Circe’s recursive implicit resolution. This results in fewer allocations and better error messages, as macros can pinpoint failure causes (e.g., a missing implicit for a specific field).
Sanely Automatic Derivation: A New Approach
Inspired by Jsoniter-Scala, Mateusz proposed “sanely automatic derivation” to combine automatic derivation’s convenience with semi-automatic’s performance. Using his Chimney library as a testbed, he split typeclasses into two traits: one for user-facing APIs and another for automatic derivation, invisible to implicit searches to avoid circular dependencies. This allows recursive derivation with minimal implicits, using macros to handle nested types efficiently.
Mateusz implemented this for a Jsoniter-Scala wrapper on Scala 3, achieving compilation times comparable to Jsoniter-Scala’s single-implicit approach and faster than Circe’s semi-automatic derivation. Runtime performance matched Jsoniter-Scala’s, with negligible overhead. Error messages were improved by logging macro actions (e.g., which field caused a failure) via Scala’s macro settings, viewable in IDEs like VS Code without console output.
A Fair Comparison: Custom Typeclass Benchmark
To ensure fairness, Mateusz created a custom Show typeclass (similar to Circe’s ShowPretty) for pretty-printing case classes using StringBuilder. He implemented it with Shapeless (Scala 2), Mirrors (Scala 3), Magnolia (both), and his sanely automatic approach. Initial results showed his approach outperforming Shapeless and Mirrors but trailing Magnolia’s semi-automatic derivation. By adding caching within macros (reusing derived implementations for repeated types), his approach became the fastest across all platforms, compiling in less time than Shapeless, Mirrors, or Magnolia, with better runtime performance.
This caching, inspired by Jsoniter-Scala, avoids re-deriving the same type multiple times within a macro, reducing method size and enabling JVM optimization. The change required minimal code, demonstrating that library authors can adopt this approach with a single, non-invasive pull request.
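The effect of caching can be illustrated with another Python runtime analogy, not Mateusz’s macro code. A hypothetical derive_show recurses through dataclass fields. An optional cache lets a type that appears several times, like Address below, be derived once and reused. The derivations counter shows how much repeated work the cache removes. Unlike a real implementation, this sketch does not handle recursive types.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints

PRIMITIVES = (str, int, float, bool)

def derive_show(tpe, cache=None, derivations=None):
    """Derive a pretty-printer for a dataclass, recursing into field types.

    With a cache dict, each distinct type is derived once and then reused;
    derivations counts how often a derivation body actually runs per type.
    """
    if tpe in PRIMITIVES:
        return repr
    if cache is not None and tpe in cache:
        return cache[tpe]                          # reuse instead of re-deriving
    if not is_dataclass(tpe):
        raise TypeError(f"cannot derive Show for {tpe!r}")
    if derivations is not None:
        derivations[tpe.__name__] = derivations.get(tpe.__name__, 0) + 1
    hints = get_type_hints(tpe)
    parts = [(f.name, derive_show(hints[f.name], cache, derivations)) for f in fields(tpe)]

    def show(value):
        inner = ", ".join(f"{name} = {sh(getattr(value, name))}" for name, sh in parts)
        return f"{tpe.__name__}({inner})"

    if cache is not None:
        cache[tpe] = show
    return show

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Order:
    billing: Address
    shipping: Address

@dataclass
class Customer:
    home: Address
    last_order: Order

uncached, cached = {}, {}
derive_show(Customer, cache=None, derivations=uncached)
show_customer = derive_show(Customer, cache={}, derivations=cached)
print(uncached)
print(cached)
```

Without the cache, Address is derived once per occurrence. In a macro, that repetition translates into larger generated methods, which is what the caching change avoids.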
Implications for Scala’s Future
Mateusz concluded by addressing Scala’s reputation for slow compilation, citing a VirtusLab survey in which 53% of developers complained about compile times, often tied to typeclass derivation. Libraries like Shapeless and Magnolia prioritize the convenience of library authors over the experience of end users, leading to opaque errors and performance issues. His sanely automatic derivation offers a user-friendly alternative: one import, fast compilation, efficient runtime, and debuggable errors.
By sharing his Chimney Macro Commons library, Mateusz encourages library authors to rethink derivation strategies. While macros require more maintenance than Shapeless or Magnolia, they become viable as libraries scale and prioritize user needs. He urged developers to provide feedback to library maintainers, challenging assumptions that automatic and semi-automatic are the only options, to make Scala more accessible and production-ready.
Links:
- ScalaIO Paris 2024 Session Page
- Benchmark Repository
- Chimney GitHub
- @MateuszKubuszok on Twitter/X
- Mateusz on LinkedIn
- Mateusz on GitHub
Hashtags: #Scala #TypeclassDerivation #ScalaIOParis2024 #Circe #JsoniterScala #Chimney #Performance #Macros #Scala3
[DevoxxGR2024] The Art of Debugging Inside K8s Environment at Devoxx Greece 2024 by Andrii Soldatenko
At Devoxx Greece 2024, Andrii Soldatenko, a seasoned software engineer and tech evangelist at Dynatrace, delivered an engaging presentation on mastering the art of debugging within Kubernetes (K8s) environments. With a blend of humor, practical insights, and real-world strategies, Andrii illuminated the complexities of troubleshooting cloud-native applications. Drawing from his extensive experience, he provided actionable techniques to enhance debugging efficiency, making the session a valuable resource for developers navigating the intricacies of Kubernetes. His talk emphasized proactive design, robust tooling, and a systematic approach to resolving issues in distributed systems.
The Challenges of Debugging in Kubernetes
Andrii began by acknowledging the inherent difficulties of debugging in modern cloud-native environments. Unlike traditional development, where a local debugger suffices, Kubernetes introduces layers of complexity with containers, pods, and distributed architectures. He humorously outlined his “eight stages of debugging,” from denial (“this can’t happen”) to self-realization (“I wrote this code”), resonating with developers who face similar emotional journeys. These stages underscore the psychological and technical hurdles of troubleshooting in K8s, where issues often stem from accidental complexities like misconfigured resources or network policies.
The dynamic nature of Kubernetes, with its orchestration of pods, nodes, and services, demands a shift in debugging mindset. Andrii emphasized that while writing YAML manifests for K8s is straightforward, ensuring they function as intended is not. He highlighted the absence of comprehensive debugging guides, noting that most literature focuses on deployment rather than troubleshooting. This gap inspired his talk, which aimed to equip developers with practical strategies to diagnose and resolve issues effectively.
Strategies for Effective Debugging
To tackle Kubernetes debugging, Andrii proposed a structured approach, starting with a high-level mind map for assessing pod states. For instance, a pod stuck in “Pending” might indicate resource shortages or port conflicts, while a crashing pod (for example, one in CrashLoopBackOff) could signal health probe failures. He focused on scenarios where pods are running but behaving unexpectedly, a common yet challenging issue. Andrii advocated revisiting init containers, which perform setup tasks like data migrations. By temporarily replacing their commands with a sleep directive, developers can use kubectl exec to inspect the container’s state, checking volumes, permissions, or network access.
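The init-container trick can be sketched as a manifest transformation. The Python function below is a hypothetical helper, not a tool from the talk. It deep-copies a Pod manifest (as a dict) and replaces each init container’s command with sleep, so the container stays alive long enough to exec into. It assumes the image provides a sleep binary, which distroless images do not. The pod name and image are placeholders.

```python
import copy

def sleepify_init_containers(pod_manifest: dict, seconds: int = 3600) -> dict:
    """Return a copy of a Pod manifest whose init containers only sleep.

    The container image must ship a `sleep` binary; distroless images do not.
    """
    patched = copy.deepcopy(pod_manifest)          # never mutate the caller's manifest
    for container in patched.get("spec", {}).get("initContainers", []):
        container["command"] = ["sleep", str(seconds)]
        container.pop("args", None)                # drop the original task's arguments
    return patched

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        "initContainers": [
            {"name": "migrate", "image": "example/app:1.0", "command": ["./migrate"], "args": ["--up"]}
        ],
        "containers": [{"name": "app", "image": "example/app:1.0"}],
    },
}
debug_pod = sleepify_init_containers(pod)
```

Apply the patched manifest; kubectl apply -f accepts JSON as well as YAML. Then, if the image includes a shell, a command like kubectl exec -it app -c migrate -- sh opens a shell inside the sleeping init container.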
For containers lacking debugging tools, Andrii introduced ephemeral containers, a Kubernetes feature designed for interactive troubleshooting (alpha since version 1.16, stable since 1.25). By launching an ephemeral container with tools like netcat or a debugger, developers can inspect a pod’s state without altering its primary container. He shared a practical example of debugging a Go application by sharing process namespaces, allowing access to the application’s processes. This approach enables setting breakpoints and navigating code, even in minimal, distroless containers.
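Attaching an ephemeral container is typically done with kubectl debug. The hypothetical Python helper below only builds the command line. Its --target flag asks Kubernetes to share the target container’s process namespace, which is what lets a debugger in the ephemeral container see the Go process. The busybox image is a placeholder assumption. Debugging Go would need an image that ships Delve or another debugger.

```python
import shlex

def kubectl_debug_argv(pod, target_container, image="busybox:1.36", namespace=None):
    """Build a `kubectl debug` command that attaches an ephemeral container.

    --target joins the process namespace of target_container (runtime support
    permitting), so tools in the debug image can see the app's processes.
    """
    argv = ["kubectl", "debug", "-it", pod, f"--image={image}", f"--target={target_container}"]
    if namespace:
        argv += ["--namespace", namespace]
    return argv

print(shlex.join(kubectl_debug_argv("app", "app", namespace="staging")))
```

Passing the list to subprocess.run would execute it. Keeping it as a list avoids shell-quoting problems with pod or image names.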
Leveraging Tools for Enhanced Debugging
Andrii showcased several tools to streamline Kubernetes debugging. He recommended building custom debug containers tailored to specific needs, such as including sqlite, python, or network utilities, and shared his own debug container on GitHub. For network-related issues, he highlighted a pre-existing container with tools like tcpdump, which simplifies packet inspection without requiring manual installations. Andrii also praised Stern, a CLI tool for tailing logs across multiple pods in a replica set, making it easier to trace requests and identify exceptions.
For developers using Visual Studio Code, Andrii demonstrated remote debugging by configuring a launch.json file to connect to a Kubernetes pod. By exposing a debug port and using tools like Telepresence, developers can intercept cluster traffic and test changes locally, bypassing slow CI/CD cycles. He also highlighted K9s, a terminal-based UI for Kubernetes, with a custom plugin for initiating debug sessions via kubectl debug. These tools collectively enhance efficiency, allowing developers to focus on problem-solving rather than manual configuration.
Best Practices for Proactive Debugging
Andrii concluded with actionable best practices to prevent and address debugging challenges. He stressed embedding version information, like Git commit SHAs, into container images to synchronize codebases during remote debugging. Scaling down traffic to a single pod ensures consistent debugging sessions, avoiding request distribution across replicas. He also advocated for a blameless culture, where developers use debuggers to slow down and analyze issues methodically rather than rushing to fix symptoms.
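One way to embed the commit SHA can be sketched in Python under stated assumptions. The helper reads the SHA with git rev-parse HEAD and stamps it into the image twice: as a build argument and as the standard OCI label org.opencontainers.image.revision. It also tags the image with a shortened SHA. The registry name is hypothetical, and the --build-arg only takes effect if the Dockerfile declares ARG GIT_SHA.

```python
import subprocess

def current_git_sha(repo_dir: str = ".") -> str:
    """Return the full commit SHA checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()

def docker_build_argv(image: str, sha: str, context: str = ".") -> list:
    """Build a `docker build` command that stamps the commit SHA into the image."""
    return [
        "docker", "build",
        "--build-arg", f"GIT_SHA={sha}",                        # needs `ARG GIT_SHA` in the Dockerfile
        "--label", f"org.opencontainers.image.revision={sha}",  # OCI key for the source revision
        "-t", f"{image}:{sha[:12]}",                            # short SHA as the image tag
        context,
    ]

argv = docker_build_argv("registry.example.com/app", "0123456789abcdef0123456789abcdef01234567")
print(argv)
```

In a real checkout you would pass current_git_sha() instead of the fixed SHA. For the single-pod part of the advice, kubectl scale deployment/<name> --replicas=1 pins traffic to one replica.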
By sharing his GitHub repository and additional resources, Andrii encouraged attendees to experiment with these techniques. His talk was a compelling call to action for developers to embrace robust debugging practices, ensuring resilience and reliability in Kubernetes environments. Through practical demonstrations and a lighthearted approach, he demystified the complexities of cloud-native debugging, empowering developers to tackle issues with confidence.
[DevoxxBE2024] Mayday Mark 2! More Software Lessons From Aviation Disasters by Adele Carpenter
At Devoxx Belgium 2024, Adele Carpenter delivered a gripping follow-up to her earlier talk, diving deeper into the technical and human lessons from aviation disasters and their relevance to software engineering. With a focus on case studies like Air France 447, Copa Airlines 201, and British Midland 92, Adele explored how system complexity, redundancy, and human factors like cognitive load and habituation can lead to catastrophic failures. Her session, packed with historical context and practical takeaways, highlighted how aviation’s century-long safety evolution offers critical insights for building robust, human-centric software systems.
The Evolution of Aviation Safety
Adele began by tracing the rapid rise of aviation from the Wright brothers’ 1903 flight to the jet age, catalyzed by two world wars, with commercial air traffic growing 20% a year by the late 1940s. This rapid adoption led to a peak in crashes during the 1970s, with 230 fatal incidents, primarily due to pilot error, as shown in data from planecrashinfo.com. However, safety has since improved dramatically, with fatalities dropping to one per 10 million passengers by 2019. Key advancements, like Crew Resource Management (CRM), introduced after the 1978 crash of United Airlines 173, reduced pilot-error incidents by enhancing cockpit communication. The 1990s and 2000s saw further gains through fly-by-wire technology, automation, and wind shear detection systems, making aviation a remarkable engineering success story.
The Perils of Redundancy and Complexity
Using Air France 447 (2009) as a case study, Adele illustrated how excessive redundancy can overwhelm users. The Airbus A330’s three pitot tubes, feeding airspeed data to multiple Air Data Inertial Reference Units (ADIRUs), failed due to icing, causing the autopilot to disconnect and the pilots to be bombarded with alerts. In alternate law, without anti-stall protection, the less-experienced pilot’s nose-up input led to a stall, exacerbated by conflicting control inputs in the dark cockpit. This cascade of failures, compounded by sensory overload and inadequate training, resulted in 228 deaths. Adele drew parallels to software, recounting a downtime incident at Trifork caused by a RabbitMQ cluster sync issue, highlighting how poorly understood redundancy can paralyze systems under pressure.
Deadly UX and Consistency Over Correctness
Copa Airlines 201 (1992) underscored the dangers of inconsistent user interfaces. A faulty vertical gyro on the captain’s side fed bad data, disconnecting the autopilot. The pilots, trained on a simulator where a switch’s “left” position selected auxiliary data, inadvertently set both displays to the faulty gyro due to a reversed switch design in the actual Boeing 737. This “deadly UX” caused the plane to roll out of the sky, killing all aboard. Adele emphasized that consistency in design, over mere correctness, is critical in high-stakes systems, as it aligns with human cognitive limitations, reducing errors under stress.
Human Factors: Assumptions and Irrationality
British Midland 92 (1989) highlighted how assumptions can derail decision-making. Experienced pilots, new to the 737-400, mistook smoke from a left engine fire for a right engine issue due to a design change in air conditioning systems. Shutting down the wrong engine led to a crash beside a motorway, though 79 of 126 survived. Adele also discussed irrational behavior under stress, citing the Manchester Airport disaster (1985), where 55 died from smoke inhalation during an evacuation. Post-crash recommendations, like strip lighting and wider exits, addressed irrational human behavior in emergencies, offering lessons for software in designing for stressed users.
Habituation and Complacency
Delta Air Lines 1141 (1988) illustrated the risks of habituation, where routine dulls vigilance. Pilots, accustomed to the pre-flight checklist, failed to deploy flaps, missing a warning due to a modified takeoff alert system. The crash after takeoff killed 14. Adele likened this to software engineers ignoring frequent alerts, like her colleague Pete with muted notifications. She urged designing systems that account for human tendencies like habituation, ensuring alerts are meaningful and workflows prevent complacency. Her takeaways emphasized understanding users’ cognitive limits, balancing redundancy with simplicity, and prioritizing human-centric design to avoid software disasters.
[DevoxxBE2024] How JavaScript Happened: A Short History of Programming Languages
In an engaging session at Devoxx Belgium 2024, Mark Rendle traced the evolution of programming languages leading to JavaScript’s creation in 1995. Titled “How JavaScript Happened: A Short History of Programming Languages,” the talk blended humor and history, from Ada Lovelace’s 1840s program to JavaScript’s rapid development for Netscape Navigator 2.0. Despite a brief battery scare during the presentation, Rendle’s storytelling and FizzBuzz examples across languages captivated the audience, offering insights into language design and JavaScript’s eclectic origins.
The Dawn of Programming
Rendle began in the 1840s with Ada Lovelace, who wrote the first program for Charles Babbage’s unbuilt Analytical Engine, introducing programming notation roughly a century before electronic computers existed. The 1940s saw programmable machines like Colossus, built to crack German ciphers, and ENIAC, programmed by women who deciphered its operation without manuals. These early systems, configured via patch cables, laid the groundwork for modern computing, though programming remained labor-intensive.
The Rise of High-Level Languages
The 1950s marked a shift with Fortran, created by John Backus to simplify machine code translation for IBM’s 704 mainframe. Fortran introduced if statements, the asterisk for multiplication (due to punch card limitations), and the iterator variable i, still ubiquitous today. ALGOL 58 and 60 followed, bringing block structures, if-then-else, and BNF grammar, formalized by Backus. Lisp, developed by John McCarthy, introduced first-class functions, the heap, and early garbage collection, while Simula pioneered object-oriented programming with classes and inheritance.
From APL to C and Beyond
Rendle highlighted APL’s concise syntax, enabled by its unique keyboard and dynamic typing, influencing JavaScript’s flexibility. The 1960s and 70s saw BCPL, B, and C, with C introducing curly braces, truthiness, and the iconic “hello world” program. Smalltalk added reflection, virtual machines, and the console, while ML introduced functional programming concepts like arrow functions. Scheme, a simplified Lisp, directly influenced JavaScript’s initial design as a browser scripting language, shaped to compete with Java applets.
JavaScript’s Hasty Creation
In 1995, Brendan Eich created JavaScript in ten days for Netscape Navigator 2.0, initially as a Scheme-like language with a DOM interface. To counter Java applets, it adopted a C-like syntax and prototypal inheritance (inspired by Self), as classical inheritance wasn’t feasible in Scheme. Rendle humorously speculated on advising Eich to add static typing and classical inheritance, noting JavaScript’s roots in Fortran, ALGOL, Lisp, and others. Despite its rushed origins, JavaScript inherited a rich legacy, from Fortran’s syntax to Smalltalk’s object model.
The Legacy and Future of JavaScript
Rendle concluded by reflecting on JavaScript’s dominance, driven by its browser integration, and its ongoing evolution, with features like async/await (from C#) and proposed gradual typing. He dismissed languages like COBOL and Pascal for lacking influential contributions, crediting BASIC for inspiring programmers despite adding little to language design. JavaScript, a synthesis of 70 years of innovation, continues to evolve, shaped by decisions from 1955 to today, proving no language is immune to historical influence.
Hashtags: #JavaScript #ProgrammingHistory #MarkRendle #DevoxxBE2024
[DefCon32] QuickShell: Sharing Is Caring About RCE Attack Chain on QuickShare
In the interconnected world of file sharing, Google's QuickShare, bridging Android and Windows, presents a deceptively inviting attack surface. Or Yair and Shmuel Cohen, researchers at SafeBreach, uncover ten vulnerabilities and chain five of them into QuickShell, a remote code execution (RCE) attack. Their journey, sparked by QuickShare's expansion to Windows, reveals logical weaknesses that enable file writes, traffic redirection, and system crashes, which together make RCE possible.
Or, a vulnerability research lead, and Shmuel, formerly of Check Point, dissect QuickShare’s Protobuf-based protocol. Initial fuzzing yields crashes but no exploits, prompting a shift to logical vulnerabilities. Their findings, responsibly disclosed to Google, lead to patches and two CVEs, addressing persistent Wi-Fi connections and file approval bypasses.
QuickShare’s design, facilitating seamless device communication, lacks robust validation, allowing attackers to manipulate file transfers and network connections. The RCE chain combines these flaws, achieving unauthorized code execution on Windows systems.
Protocol Analysis and Fuzzing
Or and Shmuel begin with QuickShare's protocol, using hooks to decode its Protobuf messages. Their custom fuzzer targets the Windows app and finds crashes, but no exploitable memory corruption. Pivoting to logical flaws, they uncover issues like unauthenticated file writes and path traversals that expose user directories.
The tools they built to communicate with QuickShare devices enable precise vulnerability discovery, revealing weaknesses in QuickShare's trust model.
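Path traversal is the easiest of these bugs to illustrate. The sketch below shows one common receiver-side defence for sender-supplied file names. It resolves the name against the download directory, normalizes the result, and rejects anything that escapes that directory. This is a minimal Java example under that assumption. The class and method names are hypothetical and do not reflect QuickShare's code or the researchers' tools.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical receiver-side check for sender-supplied file names.
 * Not QuickShare's implementation; it only illustrates the validation idea.
 */
final class SafeDownloadPath {
    private SafeDownloadPath() {}

    /**
     * Resolves the name inside downloadDir and rejects anything that would land
     * outside it, such as "../../x" or an absolute path.
     */
    static Path resolve(Path downloadDir, String senderFileName) {
        Path base = downloadDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check
        Path candidate = base.resolve(senderFileName).normalize();
        if (!candidate.startsWith(base) || candidate.equals(base)) {
            throw new IllegalArgumentException("Rejected file name: " + senderFileName);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path downloads = Paths.get("downloads");
        System.out.println(resolve(downloads, "photo.jpg"));
        try {
            resolve(downloads, "../../Windows/System32/evil.dll");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the normalized result rather than scanning the raw string for "..", as the sketch does, also catches absolute paths and mixed separators.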
Vulnerability Discoveries
The researchers identify ten issues, among them file write bypasses, denial-of-service (DoS) crashes, and Wi-Fi redirection via crafted access points. Notable vulnerabilities include forcing file approvals without user consent and redirecting traffic to malicious networks.
A novel HTTPS MITM technique amplifies the attack, intercepting communications to escalate privileges. These flaws, present in both Android and Windows, highlight systemic design oversights.
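The approval bypass comes down to a receiver accepting file data before the user has consented. The opposite behavior can be sketched as a small state machine that drops any payload chunk for a transfer that was not explicitly approved. The sketch assumes a simple offer, decision, and data flow. That flow and the class and method names are hypothetical and are not QuickShare's Protobuf schema.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical receiver that writes file data only for transfers the user
 * approved. The offer / decision / chunk flow is an assumption for
 * illustration, not QuickShare's real messages.
 */
final class ApprovalGatedReceiver {
    enum State { OFFERED, APPROVED, REJECTED }

    private final Map<Long, State> transfers = new HashMap<>();

    /** The sender announces a file; nothing is written yet. */
    void onOffer(long payloadId) {
        transfers.putIfAbsent(payloadId, State.OFFERED);
    }

    /** Must be driven by the local user's UI action, never by a network message. */
    void onUserDecision(long payloadId, boolean accepted) {
        if (transfers.get(payloadId) == State.OFFERED) {
            transfers.put(payloadId, accepted ? State.APPROVED : State.REJECTED);
        }
    }

    /** Returns true only if this chunk may be written to disk. */
    boolean acceptChunk(long payloadId) {
        // Unknown, still-pending, or rejected transfers are dropped, so a sender
        // cannot skip the consent step by pushing data first.
        return transfers.get(payloadId) == State.APPROVED;
    }

    public static void main(String[] args) {
        ApprovalGatedReceiver receiver = new ApprovalGatedReceiver();
        System.out.println(receiver.acceptChunk(42L)); // false: never offered
        receiver.onOffer(42L);
        System.out.println(receiver.acceptChunk(42L)); // false: not yet approved
        receiver.onUserDecision(42L, true);
        System.out.println(receiver.acceptChunk(42L)); // true
    }
}
```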
Crafting the RCE Chain
QuickShell chains five vulnerabilities: a DoS to destabilize QuickShare, a file write to plant malicious payloads, a path traversal to target system directories, a Wi-Fi redirection to control connectivity, and a final exploit triggering RCE. This unconventional chain leverages seemingly minor bugs, transforming them into a potent attack.
Demonstrations show persistent connections and code execution, underscoring the chain’s real-world impact.
Takeaways for Developers and Defenders
Or and Shmuel emphasize that minor bugs, often dismissed, can cascade into severe threats. The DoS flaw, critical to their chain, exemplifies how non-security issues enable attacks. They advocate holistic security assessments that look beyond memory corruption to a product's logical behavior.
Google's responsive fixes, completed by January 2025, validate the research's impact. The team has released its tools as open source to invite further exploration, and it urges developers to prioritize robust validation in file-sharing systems.
[DevoxxBE2024] Thinking Like an Architect
In a reflective talk at Devoxx Belgium 2024, Gregor Hohpe, a veteran architect, shared insights from two decades of experience in “Thinking Like an Architect.” Hohpe debunked the myth of architects as all-knowing decision-makers, instead portraying them as “IQ boosters” who enhance team decision-making through models, metaphors, and multi-level communication. Despite a minor issue with a clicker during the presentation, his engaging delivery and relatable examples, like the “architect elevator,” offered practical strategies for navigating complex organizational and technical landscapes.
Connecting Levels with the Architect Elevator
Hohpe introduced the “architect elevator,” a metaphor for architects’ role in bridging organizational layers—from developers to executives. He argued that the most valuable architects connect business strategy to technical implementation, translating complex trade-offs into terms executives understand without oversimplifying. For example, automation and frequent releases (developer priorities) enable security and cost-efficiency (executive concerns). This connection counters the isolation caused by layered organizations, where management may assume all is well due to buzzwords like Kubernetes, while developers operate with unchecked freedom.
Seeing More Dimensions in Decision-Making
Architects expand solution spaces by revealing additional dimensions, Hohpe explained. Using a sketch of a cylinder that looks like a circle from one angle and a rectangle from another, he showed how architects resolve debates such as speed versus quality by introducing options like automated testing, which improves both at once. At AWS, Hohpe tackled vendor lock-in by framing it as a two-dimensional trade-off: switching costs versus benefits. This approach, inspired by Adrian Cockcroft's analogy of marriage as "accepted lock-in," fosters rational discussions, avoiding binary thinking and helping teams find balanced solutions.
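A toy calculation makes the switching-cost-versus-benefit framing concrete. The sketch assumes a simple formula: benefit over a planning horizon, minus switching cost weighted by the chance of actually switching. The option names and numbers are made up. It illustrates two-dimensional thinking and is not a model Hohpe presented.

```java
/**
 * Illustrative lock-in comparison. The formula, option names, and numbers are
 * assumptions for the example, not figures from the talk.
 */
final class LockInTradeOff {
    record Option(String name, double annualBenefit, double switchingCost, double switchProbability) {
        /** Benefit over the horizon minus the switching cost we expect to pay. */
        double expectedNetValue(int years) {
            return annualBenefit * years - switchingCost * switchProbability;
        }
    }

    public static void main(String[] args) {
        Option[] options = {
            new Option("Managed proprietary service", 120_000, 300_000, 0.2),
            new Option("Portable open-source stack", 60_000, 50_000, 0.2),
        };
        for (Option o : options) {
            System.out.printf("%s: %.0f%n", o.name(), o.expectedNetValue(3));
        }
    }
}
```

With these inputs the managed service wins over three years (300,000 versus 170,000). If the probability of switching rises to 1.0, the result flips, which is exactly the kind of discussion the two-dimensional view invites.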
Selling Options to Defer Decisions
Hohpe likened architects to options traders: by creating options, they let teams defer decisions until uncertainty is reduced. For instance, standard APIs keep the choice of implementation language open, giving up some protocol options in return for that adaptability. At a financial firm, he explained this to executives in terms of options trading, noting that an option's value rises with volatility, a concept they instantly grasped from the Black-Scholes formula. This metaphor underscores architecture's increasing relevance in uncertain environments, aligning it with agile methodologies, which thrive under similar conditions. However, options come at the cost of complexity, a trade-off architects must weigh.
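The options-trading argument rests on one property of the Black-Scholes formula: all else being equal, an option's price increases with volatility. The Java sketch below prices a European call with the standard Black-Scholes equation. The JDK has no built-in normal CDF, so the sketch uses the Abramowitz-Stegun polynomial approximation instead. The inputs in `main` are illustrative only.

```java
/**
 * Black-Scholes price of a European call. Inputs in main are made-up numbers;
 * the point is that the price grows with volatility (sigma).
 */
final class OptionValue {
    private OptionValue() {}

    /** Standard normal CDF via Abramowitz-Stegun 26.2.17 (absolute error below 7.5e-8). */
    static double normCdf(double x) {
        if (x < 0) {
            return 1.0 - normCdf(-x);
        }
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                + t * (-1.821255978 + t * 1.330274429))));
        double pdf = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
        return 1.0 - pdf * poly;
    }

    /** s: spot price, k: strike, r: risk-free rate, sigma: volatility, t: years to expiry. */
    static double callPrice(double s, double k, double r, double sigma, double t) {
        double sqrtT = Math.sqrt(t);
        double d1 = (Math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * sqrtT);
        double d2 = d1 - sigma * sqrtT;
        return s * normCdf(d1) - k * Math.exp(-r * t) * normCdf(d2);
    }

    public static void main(String[] args) {
        for (double sigma : new double[] {0.1, 0.3, 0.6}) {
            System.out.printf("sigma=%.1f -> call=%.2f%n", sigma, callPrice(100, 100, 0.03, sigma, 1.0));
        }
    }
}
```

Running `main` prints a call price that grows as sigma goes from 0.1 to 0.6. That is the intuition behind the metaphor: the more uncertain the future, the more a deferred decision is worth.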
Zooming In and Out for System-Wide Perspective
To tackle complexity, architects must zoom in and out, balancing local and global optima. Hohpe illustrated this with two systems using identical components but different connections, yielding opposite characteristics (e.g., latency versus resilience). Local optimization, like perfecting a single component, often fails to ensure system-wide success, as seen in operations where “all lights are green, but nothing works.” By viewing systems holistically, architects ensure decisions align with broader goals, avoiding pitfalls like excessive layering that propagates changes unnecessarily.
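A back-of-the-envelope model shows why the wiring matters as much as the parts. The sketch takes three components with made-up latencies and availabilities. It compares a synchronous call chain with a queue-connected pipeline. Two simplifications are chosen purely for illustration and are not details from the talk: a fixed per-hop queueing delay, and durable queues that let downstream outages delay work rather than fail it.

```java
import java.util.List;

/**
 * Illustrative only: identical components, wired two ways. Numbers, the fixed
 * queue delay, and the durable-queue assumption are hypothetical.
 */
final class Wiring {
    record Component(String name, double latencyMs, double availability) {}

    record Profile(double latencyMs, double availability) {}

    /** Synchronous chain: latencies add up and every component must be up. */
    static Profile synchronousChain(List<Component> cs) {
        double latency = 0;
        double availability = 1;
        for (Component c : cs) {
            latency += c.latencyMs();
            availability *= c.availability();
        }
        return new Profile(latency, availability);
    }

    /**
     * Queue-connected pipeline: each hop adds queueing delay, but a request is
     * accepted whenever the first component is up.
     */
    static Profile queuedPipeline(List<Component> cs, double queueDelayMs) {
        double latency = 0;
        for (Component c : cs) {
            latency += c.latencyMs();
        }
        latency += queueDelayMs * (cs.size() - 1);
        return new Profile(latency, cs.get(0).availability());
    }

    public static void main(String[] args) {
        List<Component> cs = List.of(
                new Component("web", 20, 0.999),
                new Component("orders", 30, 0.995),
                new Component("billing", 50, 0.99));
        Profile sync = synchronousChain(cs);
        Profile queued = queuedPipeline(cs, 200);
        System.out.printf("sync:   %.0f ms, availability %.4f%n", sync.latencyMs(), sync.availability());
        System.out.printf("queued: %.0f ms, availability %.4f%n", queued.latencyMs(), queued.availability());
    }
}
```

With these inputs the synchronous chain answers in 100 ms but is only about 98.4% available. The queued pipeline takes 500 ms yet accepts requests 99.9% of the time. The components are the same; the characteristics are opposite.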
Using Models to Navigate Uncertainty
Hohpe emphasized models as architects’ best tools for simplifying complexity. Comparing geocentric and heliocentric solar system models, he showed how the right model makes decisions obvious, even if imperfect. Models vary by purpose—topographical maps for hiking, population density for logistics—requiring architects to choose based on the question at hand. In uncertain environments, models shine by forcing assumptions, enabling scenario-based planning (e.g., low, medium, high user loads). Hohpe urged architects to avoid absolutes, embracing shades of gray to find optimal trade-offs.
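Scenario-based planning can be as simple as a model that forces every assumption into the open. In the sketch below, the per-scenario peak loads, per-instance throughput, and safety headroom are all hypothetical numbers. Changing any one of them immediately shows how the answer shifts.

```java
/**
 * Scenario-based capacity model. All numbers are assumptions; writing them
 * down explicitly is the point of the model.
 */
final class CapacityScenarios {
    enum Scenario {
        LOW(200), MEDIUM(1_000), HIGH(5_000);

        final int peakRequestsPerSecond;

        Scenario(int peakRequestsPerSecond) {
            this.peakRequestsPerSecond = peakRequestsPerSecond;
        }
    }

    static final int REQUESTS_PER_INSTANCE = 250; // assumed, from a hypothetical load test
    static final int HEADROOM_PERCENT = 30;       // assumed safety margin

    /** Instances needed to serve the scenario's peak plus headroom (integer ceiling). */
    static int instancesNeeded(Scenario s) {
        long numerator = (long) s.peakRequestsPerSecond * (100 + HEADROOM_PERCENT);
        long denominator = 100L * REQUESTS_PER_INSTANCE;
        return (int) ((numerator + denominator - 1) / denominator);
    }

    public static void main(String[] args) {
        for (Scenario s : Scenario.values()) {
            System.out.println(s + ": " + instancesNeeded(s) + " instances");
        }
    }
}
```

Under these assumptions the model asks for 2, 6, and 26 instances for the low, medium, and high scenarios.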
Links:
- Devoxx Belgium 2024
- The Architect Elevator
- Cloud Strategy by Gregor Hohpe
- Adrian Cockcroft’s Blog
- Black-Scholes Formula
Hashtags: #SoftwareArchitecture #ArchitectMindset #AgileArchitecture #DevoxxBE2024
[DevoxxBE2024] Project Leyden: Improving Java’s Startup Time by Per Minborg, Sébastien Deleuze
Per Minborg and Sébastien Deleuze delivered an insightful joint presentation at Devoxx Belgium 2024, unveiling the transformative potential of Project Leyden to enhance Java application startup time, warmup, and footprint. Per, from Oracle’s Java Core Library team, and Sébastien, a Spring Framework core committer at Broadcom, explored how Leyden shifts computation across time to optimize performance. Despite minor demo hiccups, such as Wi-Fi-related delays, their talk combined technical depth with practical demonstrations, showcasing how Spring Boot 3.3 leverages Leyden’s advancements, cutting startup times significantly and paving the way for future Java optimizations.
Understanding Project Leyden’s Mission
Project Leyden, an open-source initiative under OpenJDK, aims to address long-standing Java performance challenges: startup time, warmup time, and memory footprint. Per explained startup as the duration from launching a program to its first useful operation, like displaying “Hello World” or serving a Spring app’s initial request. Warmup, conversely, is the time to reach peak performance via JIT compilation. Leyden’s approach involves shifting computations earlier (e.g., at build time) or later (e.g., via lazy initialization) while preserving Java’s dynamic nature. Unlike GraalVM Native Image or Project CRaC, which sacrifice dynamism for speed, Leyden maintains compatibility, allowing developers to balance performance and flexibility.
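Plain Java can already shift some computation later. The minimal sketch below uses the well-known initialization-on-demand holder idiom, not anything Leyden-specific. The expensive table is built only when first requested, so it no longer sits on the startup path. The table contents and the build counter are placeholders for illustration.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Initialization-on-demand holder: defers an expensive computation until
 * first use. The table is a stand-in for real work.
 */
final class LazyConfig {
    static final AtomicInteger BUILDS = new AtomicInteger();

    private LazyConfig() {}

    private static final class Holder {
        // Initialized on the first call to table(), not when LazyConfig loads;
        // the JVM runs class initialization once and thread-safely.
        static final Map<String, String> TABLE = buildTable();
    }

    private static Map<String, String> buildTable() {
        BUILDS.incrementAndGet();
        return Map.of("greeting", "Hello World");
    }

    static Map<String, String> table() {
        return Holder.TABLE;
    }

    public static void main(String[] args) {
        System.out.println("builds before first use: " + BUILDS.get()); // 0
        System.out.println(table().get("greeting"));                   // Hello World
        System.out.println("builds after first use: " + BUILDS.get());  // 1
    }
}
```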
Class Data Sharing (CDS) and AOT Cache: Today’s Solutions
Per introduced Class Data Sharing (CDS), a feature available since JDK 5, and its evolution into the Ahead-of-Time (AOT) Cache, a cornerstone of Leyden's strategy. CDS preloads JDK classes, while AppCDS, introduced in JDK 10, extends this to application classes. The AOT Cache, an upcoming enhancement, stores class objects, resolved linkages, and method profiles, so much of that work no longer happens at startup. Sébastien demonstrated this with a Spring Boot Pet Clinic application, reducing startup from 3.2 seconds to 800 milliseconds using CDS and AOT Cache. The process involves a training run to generate the cache, which is then reused for faster deployments, though it requires consistent JVM and classpath configurations.
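Comparing startup with and without a CDS archive or AOT cache means measuring the same point in every run. The minimal sketch below assumes that "ready" is the moment application setup finishes. At that point it reads JVM uptime from the standard `RuntimeMXBean`. On a stock JDK, runs could be compared using a dynamic CDS archive. A training run creates it with `-XX:ArchiveClassesAtExit=app.jsa`, and later runs reuse it with `-XX:SharedArchiveFile=app.jsa`. The AOT cache options in Leyden early-access builds differ by build. The `initialize` method is a hypothetical stand-in for real setup work.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.ArrayList;
import java.util.List;

final class StartupTimer {
    private StartupTimer() {}

    /** Milliseconds since the JVM started, as reported by the runtime MXBean. */
    static long uptimeMillis() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getUptime();
    }

    /** Hypothetical placeholder for real initialization (loading config, wiring beans). */
    static List<String> initialize() {
        List<String> routes = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            routes.add("/owners/" + i);
        }
        return routes;
    }

    public static void main(String[] args) {
        List<String> routes = initialize();
        // First point at which the app could serve a request: log uptime here
        // and compare it across runs with and without the archive.
        System.out.println("Ready with " + routes.size() + " routes after "
                + uptimeMillis() + " ms of JVM uptime");
    }
}
```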
Spring Boot’s Synergy with Leyden
Sébastien highlighted the collaboration between the Spring and Leyden teams, initiated after a 2023 JVM Language Summit case study. Spring Boot 3.3 introduces features to simplify CDS and AOT Cache usage, such as extracting executable JARs into a CDS-friendly layout. A demo showed how a single command extracts the JAR, runs a training phase, and generates a cache, which is then embedded in a container image. This reduced startup times by up to 4x and memory usage by 20% when combined with Spring's AOT optimizations. Sébastien also demonstrated how the AOT Cache retains JIT "warmness," enabling near-peak performance from startup, though a small gap in peak (plateau) performance remains and is being addressed.
Future Horizons and Trade-offs
Looking ahead, Leyden plans to introduce stable values, a hybrid between mutable and immutable fields, offering final-like performance with flexible initialization. Per emphasized that Leyden avoids the heavy constraints of GraalVM (e.g., limited reflection) or CRaC (e.g., Linux-only, security concerns with serialized secrets). While CRaC achieves millisecond startups, its lifecycle complexities and security risks limit adoption. Leyden’s AOT Cache, conversely, offers significant gains (2–4x faster startups) with minimal constraints, making it ideal for most use cases. Developers can experiment with Leyden’s early access builds to optimize their applications, with further enhancements like code cache storage on the horizon.
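The stable-value idea is a field that is set at most once and then treated like a constant. Today's Java can approximate its semantics. The sketch below is a hand-rolled double-checked holder with hypothetical names. It is not the JDK's stable value API. An ordinary class like this also cannot give the JIT the constant folding that a JVM-supported stable value is designed to enable.

```java
import java.util.function.Supplier;

/**
 * Hypothetical set-at-most-once holder approximating stable-value semantics.
 * Not the JDK API: it only shows the "lazy, then effectively final" behavior.
 */
final class SetOnce<T> {
    private volatile T value; // null means "not set yet"

    /** Returns the stored value, computing and storing it on the first call only. */
    T computeIfUnset(Supplier<? extends T> supplier) {
        T current = value;
        if (current == null) {
            synchronized (this) {
                current = value;
                if (current == null) {
                    current = supplier.get();
                    if (current == null) {
                        throw new NullPointerException("supplier returned null");
                    }
                    value = current; // published once via the volatile write
                }
            }
        }
        return current;
    }

    boolean isSet() {
        return value != null;
    }

    public static void main(String[] args) {
        SetOnce<String> greeting = new SetOnce<>();
        System.out.println(greeting.isSet());                         // false
        System.out.println(greeting.computeIfUnset(() -> "Hello"));   // Hello
        System.out.println(greeting.computeIfUnset(() -> "ignored")); // Hello
        System.out.println(greeting.isSet());                         // true
    }
}
```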
Hashtags: #ProjectLeyden #Java #SpringBoot #AOTCache #CDS #StartupTime #JVM #DevoxxBE2024 #PerMinborg #SébastienDeleuze