Recent Posts
Archives

Posts Tagged ‘XML’

PostHeaderIcon [DevoxxUK2024] Processing XML with Kafka Connect by Dale Lane

Dale Lane, a seasoned developer at IBM with a deep focus on event-driven architectures, delivered a compelling session at DevoxxUK2024, unveiling a powerful Kafka Connect plugin designed to streamline XML data processing. With extensive experience in Apache Kafka and Flink, Dale addressed the challenges of integrating XML data into Kafka pipelines, a task often fraught with complexity due to the format’s incompatibility with Kafka’s native data structures like Avro or JSON. His presentation offers practical solutions for developers seeking to bridge external systems with Kafka, transforming XML into more manageable formats or generating XML outputs for legacy systems. Through clear examples, Dale illustrates how this open-source plugin enhances flexibility and efficiency in Kafka Connect pipelines, empowering developers to handle diverse data integration scenarios with ease.

Understanding Kafka Connect Pipelines

Dale begins by demystifying Kafka Connect, a robust framework for moving data between Kafka and external systems. He outlines two primary pipeline types: source pipelines, which import data from external systems into Kafka, and sink pipelines, which export Kafka data to external destinations. A source pipeline typically involves a connector to fetch data, optional transformations to modify or filter it, and a converter to serialize the data into formats like Avro or JSON for Kafka topics. Conversely, a sink pipeline starts with a converter to deserialize Kafka data, followed by transformations and a connector to deliver it to an external system. This foundational explanation sets the stage for understanding where and how XML processing fits into these workflows, ensuring developers grasp the pipeline’s modular structure before diving into specific use cases.

Converting XML for Kafka Integration

A common challenge Dale addresses is integrating XML data from external systems, such as IBM MQ or XML-based web services, into Kafka’s ecosystem, which favors structured formats. He introduces the Kafka Connect plugin, available on GitHub under an Apache license, as a solution to parse XML into structured records early in the pipeline. For instance, using an IBM MQ source connector, the plugin can transform XML documents from a message queue into a generic structured format, allowing subsequent transformations and serialization into JSON or Avro. Dale demonstrates this with a weather API that returns XML strings, showing how the plugin converts these into structured objects for further processing, making them compatible with Kafka tools that struggle with raw XML. This approach significantly enhances the usability of external data within Kafka’s ecosystem.

Generating XML Outputs from Kafka

For scenarios where external systems require XML, Dale showcases the plugin’s ability to convert Kafka’s JSON or Avro messages into XML strings within a sink pipeline. He provides an example using a Kafka topic with JSON messages destined for an IBM MQ system, where the plugin, integrated as part of the sink connector, transforms structured data into XML before delivery. Another case involves an HTTP sink connector posting to an XML-based web service, such as an XML-RPC API. Here, the pipeline deserializes JSON, applies transformations to align with the API’s payload requirements, and uses the plugin to produce an XML string. This flexibility ensures seamless communication with legacy systems, bridging modern Kafka workflows with traditional XML-based infrastructure.

Enhancing Pipelines with Schema Support

Dale emphasizes the plugin’s schema handling capabilities, which add robustness to XML processing. In source pipelines, the plugin can reference an external XSD schema to validate and structure XML data, which is then paired with an Avro converter to submit schemas to a registry, ensuring compatibility with Kafka’s schema-driven ecosystem. In sink pipelines, enabling schema inclusion generates an XSD alongside the XML output, providing a clear description of the data’s structure. Dale illustrates this with a stock price connector, where enabling schema support produces XML events with accompanying XSDs, enhancing interoperability. This feature is particularly valuable for maintaining data integrity across systems, making the plugin a versatile tool for complex integration tasks.

Links:

PostHeaderIcon Spring: Failed to read schema document

Case

I try to deploy a Mule ESB configuration, using this XML:

[xml]<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:pattern="http://www.mulesoft.org/schema/mule/pattern"
xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
http://www.mulesoft.org/schema/mule/core/3.1/mule.xsd
http://www.mulesoft.org/schema/mule/pattern
http://www.mulesoft.org/schema/mule/pattern/3.1/mule-pattern.xsd
">
<pattern:simple-service name="authenticationService"
address="http://localhost:1234/authenticationService"
component-class="lalou.jonathan.esb.components.AuthenticationComponent"
type="direct" />
</mule>[/xml]

I get the following error:

[java]Ignored XML validation warning
org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document ‘http://www.mulesoft.org/schema/mule/pattern/3.1/mule-pattern.xsd'[/java]

Extended Stacktrace

[java]2011-11-22 16:10:25,375 WARN  xml.XmlBeanDefinitionReader         – Ignored XML validation warning
org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document ‘http://www.mulesoft.org/schema/mule/pattern/3.1/mule-pattern.xsd’, because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.warning(ErrorHandlerWrapper.java:96)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:380)[/java]

Notice that Mule ESB files are similar to classic Spring files. Of course, I first checked the pointed XSD was actually reachable.
Anyway, this error should be raised when your application, for any reason -firewall, proxies, network interruption-, cannot access the remote site where the XSD is hosted.

Fix

  • Copy the XSD to a local folder
  • Create a file spring.schemas
  • Make it available in the classpath in META-INF.
  • Add the following line
  • [java]http\://www.mulesoft.org/schema/mule/pattern/3.1/mule-pattern.xsd=WEB-INF/classes/mule-pattern.xsd[/java]

    The pattern is: missing resource (beware of escaping colon) = path in classpath of the local XSD

  • Rebuild, pack and run!

PostHeaderIcon Useful DTD

DTDs are useful when your XML editor take them in account: detecting errors, suggestions, complete statements… For instance, I save much time with IntelliJ IDEA automatic completion ; unlike, Eclipse amazingly does not implement this feature.

Here is a list of some widely used DTDs:

File DTD
weblogic-application.xml [xml]<!DOCTYPE weblogic-application PUBLIC "-//BEA Systems, Inc.//DTD WebLogic Application 7.0.0//EN" "http://www.oracle.com/technology/weblogic/weblogic-application/1.1/weblogic-application.xsd">[/xml]
weblogic-application.xml [xml]<!DOCTYPE weblogic-application PUBLIC "-//BEA Systems, Inc.//DTD WebLogic Application 7.0.0//EN" "http://www.oracle.com/technology/weblogic/weblogic-application/1.1/weblogic-application.xsd">[/xml]
web.xml [xml]<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd" >[/xml]
*.hbm.xml [xml]<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">[/xml]
GWT modules [xml]<!DOCTYPE module SYSTEM "http://google-web-toolkit.googlecode.com/svn/trunk/distro-source/core/src/gwt-module.dtd">[/xml]
GWT UI [xml]<!DOCTYPE ui:UiBinder SYSTEM "http://dl.google.com/gwt/DTD/xhtml.ent">[/xml]
Tangosol / Oracle Coherence [xml]<!DOCTYPE coherence SYSTEM "coherence.dtd">[/xml]
Log4J [xml]<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">[/xml]

Tangosol and Log4J DTDs are included within their distribution JARs: you have to extract them or to give their path to IntelliJ IDEA.

PostHeaderIcon Dynamic serviceUrl with Spring’s HttpInvokerProxyFactoryBean

Abstract

How to set dynamically the URL used by a HttpInvokerProxyFactoryBean in a Spring-deployed WAR?

Detailed Case

I have to deploy a GWT/GXT application, calling two distant services:
a remote EJB
a service accessed through Spring Remoting

Here is the Spring configuration file I firstly used:

[xml]
<util:properties id="jndiProperties" location="classpath:jndi.properties"/>
<jee:remote-slsb id="myRemoteEJBService" jndi-name="ejb.remote.myRemoteService"
business-interface="lalou.jonathan.myRemoteEJBService"
environment-ref="jndiProperties" cache-home="false"
lookup-home-on-startup="false" refresh-home-on-connect-failure="true" />

<bean id="mySpringRemoteService"
class="org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean">
<property name="serviceInterface"
value="lalou.jonathan.services.mySpringRemoteService" />
<property name="serviceUrl" value="${spring.remote.service.url}"/>
</bean>
[/xml]

Unhappily, even though the remote EJB is retrieved (which proves that the jndi file is available in the classpath and rightly loaded), the Spring Remote service is not. I had to write the URL in hard in the configuration file… This is not very efficient when you work in a large team, with different production and testings environments!

This is the log when myRemoteEJBService bean is loaded:
[java]2010-08-17 16:05:42,937 DEBUG support.DefaultListableBeanFactory – Creating shared instance of singleton bean ‘myRemoteEJBService’
2010-08-17 16:05:42,937 DEBUG support.DefaultListableBeanFactory – Creating instance of bean ‘myRemoteEJBService’
2010-08-17 16:05:42,937 DEBUG support.DefaultListableBeanFactory – Eagerly caching bean ‘myRemoteEJBService’ to allow for resolving potential circular references
2010-08-17 16:05:42,937 DEBUG support.DefaultListableBeanFactory – Returning cached instance of singleton bean ‘jndiProperties’
2010-08-17 16:05:42,937 DEBUG support.DefaultListableBeanFactory – Invoking afterPropertiesSet() on bean with name ‘myRemoteEJBService’
2010-08-17 16:05:42,937 DEBUG framework.JdkDynamicAopProxy – Creating JDK dynamic proxy: target source is EmptyTargetSource: no target class, static
2010-08-17 16:05:42,953 DEBUG support.DefaultListableBeanFactory – Finished creating instance of bean ‘myRemoteEJBService'[/java]

That is the log when mySpringRemoteService is loaded:
[java]2010-08-17 16:05:42,968 DEBUG support.DefaultListableBeanFactory – Creating shared instance of singleton bean ‘mySpringRemoteService’
2010-08-17 16:05:42,968 DEBUG support.DefaultListableBeanFactory – Creating instance of bean ‘mySpringRemoteService’
2010-08-17 16:05:42,984 DEBUG support.DefaultListableBeanFactory – Eagerly caching bean ‘mySpringRemoteService’ to allow for resolving potential circular references
2010-08-17 16:05:43,234 DEBUG support.DefaultListableBeanFactory – Invoking afterPropertiesSet() on bean with name ‘mySpringRemoteService’
2010-08-17 16:05:43,250 DEBUG framework.JdkDynamicAopProxy – Creating JDK dynamic proxy: target source is EmptyTargetSource: no target class, static
2010-08-17 16:05:43,250 DEBUG support.DefaultListableBeanFactory – Finished creating instance of bean ‘mySpringRemoteService'[/java]

You can notice that no mention to jndiProperties appears. Here is the key of the problem: jndiProperties is considered as a bean among others, which cannot be accessed easyly from the HttpInvokerProxyFactoryBean.

Fix

To fix the issue, you have to add an actual property holder in Spring XML configuration file, ie after:

[xml]<util:properties id="jndiProperties" location="classpath:jndi.properties"/>[/xml]

add an instanciation of PropertyPlaceholderConfigurer:

[xml]<bean id="propertyConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="location" value="classpath:jndi.properties"/>
</bean>[/xml]

PostHeaderIcon Unable to validate using XSD: Your JAXP provider does not support XML Schema

Here is the stacktrace that happened this morning:

Unable to validate using XSD: Your JAXP provider org.apache.crimson.jaxp.DocumentBuilderFactoryImpl@e6a73d does not support XML Schema. Are you running on Java 1.4 or below with Apache Crimson? Upgrade to Apache Xerces (or Java 1.5) for full XSD support.

To solve it, I had to remove the j2ee-X.X.jar from my project.xml Maven config file.

PostHeaderIcon Java, SAX, XML: About special characters encoding

Unlike what most people think, characters such as &eacute are not part of XML specification! And SAX follows rigorously this spec…

Read the rest of this entry »