Jonathan Lalou's Blog

Posts Tagged ‘Elasticsearch’

[DevoxxPL2022] Did Anyone Say SemVer? • Philipp Krenn

Philipp Krenn, a developer advocate at Elastic, captivated audiences at Devoxx Poland 2022 with a witty and incisive exploration of semantic versioning (SemVer). Drawing from Elastic’s experiences with Elasticsearch, Philipp dissected the nuances of versioning, revealing why SemVer often ignites passionate debates. His talk navigated the ambiguities of defining APIs, the complexities of breaking changes, and the cultural dynamics of open-source versioning, offering a pragmatic lens for developers grappling with version management.

Decoding Semantic Versioning

Philipp introduced SemVer, as formalized on semver.org, with its major.minor.patch structure: a patch release fixes bugs, a minor release adds features, and a major release introduces breaking changes. This simplicity, however, belies complexity in practice. He posed a sorting challenge with version strings like alpha.-, 2.-, and 11.-, illustrating SemVer’s arcane precedence rules, and humorously cautioned against such obfuscation unless “trolling users.” Philipp noted that SemVer’s focus on APIs raises a fundamental question: what constitutes an API? For Elasticsearch, the REST API is sacrosanct, so breaking it warrants a major version bump, whereas plugin APIs, which expose internal Java packages, tolerate frequent breaks, sparking user frustration when plugins fail.
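To make the precedence rules concrete, here is a minimal sketch of the ordering described in section 11 of the SemVer 2.0.0 specification, written in plain Java. The class name `SemVerPrecedence` and the sample version strings are illustrative assumptions, not the ones from the talk, and the sketch does no input validation.

```java
import java.util.Comparator;

/** Minimal sketch of SemVer 2.0.0 precedence (semver.org, section 11). No input validation. */
final class SemVerPrecedence {

    /** Comparator view of {@link #compare}, handy for List.sort. */
    static final Comparator<String> ORDER = SemVerPrecedence::compare;

    static int compare(String a, String b) {
        String[] x = split(a);
        String[] y = split(b);
        // 1. Compare major, minor, and patch numerically, left to right.
        String[] coreX = x[0].split("\\.");
        String[] coreY = y[0].split("\\.");
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(Long.parseLong(coreX[i]), Long.parseLong(coreY[i]));
            if (c != 0) return c;
        }
        // 2. A version without a pre-release tag outranks one that has a tag.
        if (x[1].isEmpty() || y[1].isEmpty()) {
            return Boolean.compare(x[1].isEmpty(), y[1].isEmpty());
        }
        // 3. Compare pre-release identifiers one by one; if all shared ones are equal,
        //    the version with more identifiers wins.
        String[] p = x[1].split("\\.");
        String[] q = y[1].split("\\.");
        for (int i = 0; i < Math.min(p.length, q.length); i++) {
            int c = compareIdentifier(p[i], q[i]);
            if (c != 0) return c;
        }
        return Integer.compare(p.length, q.length);
    }

    private static int compareIdentifier(String s, String t) {
        boolean sNumeric = s.matches("\\d+");
        boolean tNumeric = t.matches("\\d+");
        if (sNumeric && tNumeric) return Long.compare(Long.parseLong(s), Long.parseLong(t));
        if (sNumeric != tNumeric) return sNumeric ? -1 : 1; // numeric ranks below alphanumeric
        return Integer.signum(s.compareTo(t));              // ASCII order for alphanumerics
    }

    /** Returns {core, preRelease}; build metadata after '+' is ignored for precedence. */
    private static String[] split(String version) {
        int plus = version.indexOf('+');
        String v = plus >= 0 ? version.substring(0, plus) : version;
        int dash = v.indexOf('-');
        return dash < 0
                ? new String[] {v, ""}
                : new String[] {v.substring(0, dash), v.substring(dash + 1)};
    }
}
```

Sorting `1.0.0`, `1.0.0-alpha`, `1.0.0-11`, and `1.0.0-2` with `SemVerPrecedence.ORDER` yields `1.0.0-2`, `1.0.0-11`, `1.0.0-alpha`, `1.0.0`: numeric identifiers compare as numbers and rank below alphanumeric ones, and the untagged release comes last. That ordering is easy to get wrong with a plain string sort.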

The Ambiguity of Breaking Changes

The definition of a breaking change varies by perspective, Philipp argued. Upgrading a supported JDK version, for instance, divides opinions—some view it as a system-altering break, others as an implementation detail. Security fixes further muddy the waters, as seen in Elastic’s handling of unintended insecure usage, where API “fixes” disrupted user workflows. Philipp cited the Log4j2 vulnerability, where maintainers supported multiple JDK versions across minor releases, avoiding major version increments. Accidental breaks, common in open-source projects, and asymmetric feature additions—easy to add, hard to remove—compound SemVer’s challenges, often leading to user distrust when expectations misalign.

Cultural and Practical Dilemmas

Philipp explored why SemVer debates are so heated, attributing the friction to differing interpretations of “correct” versioning. He critiqued version ranges, prevalent in npm but rare in Java, for introducing instability through transitive dependency updates, and advocated tools like Dependabot to manage updates explicitly. Experimental APIs, marked as unstable, offer an escape hatch for breaking changes without major version bumps, though they demand diligent release note scrutiny from users. Pre-1.0 versions, dubbed the “Wild West,” lack SemVer guarantees, enabling unfettered changes but risking user confusion. Philipp contrasted SemVer with alternatives like calendar versioning, used by Ubuntu, noting that calendar versioning has declined as SemVer dominates modern ecosystems.

Posted in en-US | Tags: APIManagement, DevoxxPL2022, DevoxxPoland, Elastic, Elasticsearch, PhilippKrenn, SemanticVersioning

[DevoxxPL2022] Integrate Hibernate with Your Elasticsearch Database • Bartosz de Boulange

Author: Jonathan Lalou

At Devoxx Poland 2022, Bartosz de Boulange, a Java developer at BGŻ BNP Paribas, delivered an insightful presentation on Hibernate Search, a tool that integrates traditional Object-Relational Mapping (ORM) with search backends like Elasticsearch. Bartosz’s talk focused on enabling full-text search capabilities within SQL-based applications, offering a practical solution for developers seeking to enhance search functionality without migrating entirely to a NoSQL ecosystem. Through a blend of theoretical insights and hands-on coding demonstrations, he illustrated how Hibernate Search can address complex search requirements in modern applications.

The Power of Full-Text Search

Bartosz began by addressing the challenge of implementing robust search functionality in applications backed by SQL databases. For instance, in a bookstore application, users might need to search for specific phrases within thousands of reviews. Traditional SQL queries built on LIKE clauses are often inadequate for such tasks because they match raw substrings and perform no text analysis. Hibernate Search solves this by enabling full-text search, which includes character filtering, tokenization, and normalization. These steps allow developers to remove irrelevant characters, break text into searchable tokens, and standardize those tokens for efficient querying. Compared with native SQL full-text search capabilities, Hibernate Search offers a more streamlined and scalable approach, making it well suited to applications requiring sophisticated search features.
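The three analysis stages can be illustrated with a small plain-Java sketch. It does not use Hibernate Search or Elasticsearch, whose analyzers are configurable and far richer; the class `TextAnalysisSketch` and its specific rules (strip tags, split on non-letters, lowercase, drop accents) are assumptions chosen only to show the idea.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative analysis chain: character filter, then tokenizer, then normalizer. */
final class TextAnalysisSketch {

    static List<String> analyze(String text) {
        // 1. Character filtering: replace HTML-like tags with a space.
        String filtered = text.replaceAll("<[^>]*>", " ");

        List<String> tokens = new ArrayList<>();
        // 2. Tokenization: split on anything that is not a letter, digit, or combining mark.
        for (String raw : filtered.split("[^\\p{L}\\p{Nd}\\p{M}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            // 3. Normalization: decompose accents, drop the combining marks, lowercase.
            String decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD);
            tokens.add(decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

Running the same `analyze` step on both the stored review text and the user's query means that a search for "cafe" and a review containing "Café" end up comparing identical tokens, which is the core idea behind full-text matching.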

Integrating Hibernate with Elasticsearch

The core of Bartosz’s presentation was a step-by-step guide to integrating Hibernate Search with Elasticsearch. He outlined five key steps: creating JPA entities, adding Hibernate Search dependencies, annotating entities for indexing, configuring fields for NoSQL storage, and performing initial indexing. By annotating entities with @Indexed, developers can create indexes in Elasticsearch at application startup. Fields are annotated as @FullTextField for tokenization and search, @KeywordField for sorting, or @GenericField for basic querying. Bartosz emphasized the importance of the @FullTextField, which enables advanced search capabilities like fuzzy matching and phrase queries. His live coding demo showcased how to set up a Docker Compose file with MySQL and Elasticsearch, configure the application, and index a bookstore’s data, demonstrating the ease of integrating these technologies.
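The annotation step might look roughly like the sketch below. To keep it compilable without any dependencies, the four annotations are declared locally as stand-ins; in a real project they come from the package `org.hibernate.search.mapper.pojo.mapping.definition.annotation`, and the entity would also carry JPA's `@Entity` and `@Id`. The `Book` entity, its fields, and the `MappingInspector` helper are hypothetical and are not taken from the talk's demo.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Local stand-ins for the Hibernate Search annotations of the same names (assumption:
// declared here only so the sketch runs on its own; the real ones have more attributes).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Indexed {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface FullTextField {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface KeywordField {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface GenericField {}

/** Hypothetical bookstore entity. */
@Indexed
class Book {
    Long id;                            // not mapped to an index field in this sketch
    @FullTextField String title;        // tokenized and analyzed for full-text search
    @FullTextField String description;
    @KeywordField String isbn;          // kept as a single, unanalyzed token
    @GenericField int pageCount;        // non-text value for basic querying
}

/** Hypothetical helper that reports which kind of index field each attribute maps to. */
final class MappingInspector {

    static Map<String, String> describe(Class<?> type) {
        Map<String, String> mapping = new TreeMap<>(); // sorted for stable output
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(FullTextField.class)) {
                mapping.put(field.getName(), "full-text");
            } else if (field.isAnnotationPresent(KeywordField.class)) {
                mapping.put(field.getName(), "keyword");
            } else if (field.isAnnotationPresent(GenericField.class)) {
                mapping.put(field.getName(), "generic");
            }
        }
        return mapping;
    }
}
```

With the real library, the mapping is read by Hibernate Search itself at startup rather than by a helper like `MappingInspector`. Note also that in Hibernate Search 6 a field must be declared sortable (for example `@KeywordField(sortable = Sortable.YES)`) before it can be used in a sort.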

Scalability and Synchronization Challenges

A significant advantage of using Elasticsearch with Hibernate Search is its scalability. Unlike the embedded Apache Lucene backend, which is limited to a single node and suited to smaller projects, Elasticsearch distributes data across multiple nodes, making it a better fit for enterprise applications. However, Bartosz highlighted a key challenge: synchronization between the SQL database and Elasticsearch. Changes in the SQL database may not be reflected immediately in Elasticsearch due to communication overhead. To address this, he introduced an experimental outbox polling coordination strategy, which uses additional SQL tables to maintain update order. While still in development, this feature promises to improve data consistency, a critical aspect for production environments.
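The general outbox idea can be sketched in a few lines of plain Java. This is an in-memory simulation under loud assumptions: the list stands in for the extra SQL table, the map stands in for Elasticsearch, and every name (`OutboxSketch`, `recordChange`, `poll`) is hypothetical. It does not reproduce Hibernate Search's actual tables or processing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory simulation of the outbox pattern: record changes first, index them later in order. */
final class OutboxSketch {

    /** One row of the hypothetical outbox table. */
    static final class OutboxEvent {
        final long sequence;
        final String entityId;
        final String newTitle;

        OutboxEvent(long sequence, String entityId, String newTitle) {
            this.sequence = sequence;
            this.entityId = entityId;
            this.newTitle = newTitle;
        }
    }

    private final List<OutboxEvent> outboxTable = new ArrayList<>(); // stands in for a SQL table
    private final Map<String, String> searchIndex = new HashMap<>(); // stands in for Elasticsearch
    private long nextSequence = 1;

    /** Would run in the same transaction as the entity change: it only records the event. */
    void recordChange(String entityId, String newTitle) {
        outboxTable.add(new OutboxEvent(nextSequence++, entityId, newTitle));
    }

    /** Background poller: applies pending events in sequence order, then clears them. */
    int poll() {
        outboxTable.sort(Comparator.comparingLong((OutboxEvent e) -> e.sequence));
        int applied = 0;
        for (OutboxEvent event : outboxTable) {
            searchIndex.put(event.entityId, event.newTitle);
            applied++;
        }
        outboxTable.clear();
        return applied;
    }

    String indexedTitle(String entityId) {
        return searchIndex.get(entityId);
    }
}
```

Because events are applied strictly by sequence number, two quick edits to the same book leave the index holding the later title, even though indexing happens after the fact. The trade-off is the delay between the SQL commit and the next poll.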

Practical Applications and Benefits

Bartosz demonstrated practical applications of Hibernate Search through a bookstore example, where users could search for books by title, description, or reviews. His demo showed how to query Elasticsearch for terms like “Hibernate” or “programming,” retrieving results ranked by relevance. Additionally, Hibernate Search supports advanced features like sorting by distance for geolocation-based queries and projections for retrieving partial documents, reducing reliance on the SQL database for certain operations. These capabilities make Hibernate Search a versatile tool for developers aiming to enhance search performance while maintaining their existing SQL infrastructure.

Links:

BGŻ BNP Paribas company website

Posted in en-US | Tags: BartoszDeBoulange, BGŻBNPParibas, Development, DevoxxPL2022, DevoxxPoland, Elasticsearch, FullTextSearch, HibernateSearch, Java, NoSQL, SQL

[DevoxxFR2015] Visualizing Data with Elasticsearch, Logstash, and Kibana

Author: Jonathan Lalou

At Devoxx France 2015, David Pilato and Colin Surprenant, both deeply involved with Elasticsearch, shared a compelling narrative on leveraging the ELK stack (Elasticsearch, Logstash, Kibana) to transform raw data into actionable insights. David, a French customs service developer, and Colin, an Elasticsearch engineer, demonstrated how they met a marketing team’s data analysis needs in a fraction of the expected time.

Rapid Data Insights with ELK

David recounted a scenario where his marketing team needed to understand customer behavior and sentiment on Twitter. Using Logstash’s Twitter input plugin, he ingested data with simple configuration, leveraging API keys from Twitter’s developer portal. Elasticsearch indexed this data, enabling rapid searches, while Kibana visualized patterns, revealing customer trends in under 30 minutes.

This efficiency, David noted, showcases ELK’s power for real-time analytics.

Handling Dynamic Data Challenges

Colin addressed technical nuances, such as how Logstash’s file input handles log rotation intelligently. Their setup coped with dynamic file changes, ensuring continuous data flow. The Q&A clarified that Logstash’s plugin configuration uses a Ruby-like syntax, which keeps setup simple. The duo’s approach turned complex data into clear visualizations, freeing time for open-source contributions.

This agility, Colin emphasized, accelerates business decision-making.

Community and Future Engagement

David and Colin invited attendees to a workshop and BOF session, promoting hands-on ELK exploration. Their open-source advocacy, evidenced by David’s RSS River plugin and Elasticsearch community leadership, underscored the stack’s accessibility. They encouraged leveraging GitHub resources for further learning.

This session inspires developers to harness ELK for data-driven solutions.

Posted in en-US | Tags: ColinSurprenant, DataVisualization, DavidPilato, DevoxxFR2015, Elasticsearch, Kibana, Logstash