This is the Git source code repository for the Smooks project.
The easiest way to get started with Smooks is to download and try out the examples. The examples are the recommended base upon which to integrate Smooks into your application.
Smooks is an open-source, extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration. It can be used as a lightweight framework on which to hook your own processing logic for a wide range of data formats but, out-of-the-box, Smooks ships with features that can be used individually or seamlessly together:
Java Binding: Populate POJOs from a source (CSV, EDI, XML, POJOs, etc…). Populated POJOs can either be the final result of a transformation, or serve as a bridge for further transformations like what is seen in template resources which generate textual results such as XML. Additionally, Smooks supports collections (maps and lists of typed data) that can be referenced from expression languages and templates.
Transformation: perform a wide range of data transformations and mappings. XML to XML, CSV to XML, EDI to XML, XML to EDI, XML to CSV, POJO to XML, POJO to EDI, POJO to CSV, etc…
Templating: extensible template-driven transformations, with support for XSLT, FreeMarker, and StringTemplate.
Scalable Processing: process huge payloads while keeping a small memory footprint. Split, transform and route fragments to JMS, filesystem, database, and other destinations.
Enrichment: enrich fragments with data from a database or other data sources.
Complex Fragment Validation: rule-based fragment validation.
Persistence: read fragments from, and save fragments to, a database with either JDBC, persistence frameworks (like MyBatis, Hibernate, or any JPA compatible framework), or DAOs.
Combine: leverage Smooks’s transformation, routing and persistence functionality for Extract Transform Load (ETL) operations.
Validation: perform basic or complex validation on fragment content. This is more than simple type/value-range validation.
Smooks was conceived to perform fragment-based transformations on messages. Supporting fragment-based transformation opened up the possibility of mixing and matching different technologies within the context of a single transformation. This meant that one could leverage distinct technologies for transforming fragments, depending on the type of transformation required by the fragment in question.
In the process of evolving this fragment-based transformation solution, it dawned on us that we were establishing a fragment-based processing paradigm. Concretely, a framework was being built for targeting custom visitor logic at message fragments. A visitor does not need to be restricted to transformation. A visitor could be implemented to apply all sorts of operations on fragments, and therefore, the message as a whole.
Smooks supports a wide range of data structures - XML, EDI, JSON, CSV, POJOs (POJO to POJO!). A pluggable reader interface allows you to plug in a reader implementation for any data format.
The primary design goal of Smooks is to provide a framework that isolates and processes fragments in structured data (XML and non-XML) using existing data processing technologies (such as XSLT, plain vanilla Java, Groovy script).
A visitor targets a fragment with the visitor’s resource selector value. The targeted fragment can take in as much or as little of the source stream as you like. A fragment is identified by the name of the node enclosing the fragment. You can target the whole stream using the node name of the root node as the selector or through the reserved #document selector.
Note | The terms fragment and node denote different meanings. It is usually acceptable to use the terms interchangeably because the difference is subtle and, more often than not, irrelevant. A node may be the outer node of a fragment, excluding the child nodes. A fragment is the outer node and all its child nodes along with their character nodes (text, etc…). When a visitor targets a node, it typically means that the visitor can only process the fragment’s outer node as opposed to the fragment as a whole, that is, the outer node and its child nodes. |
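The distinction in the note can be pictured with the JDK's DOM API alone. The sketch below is not Smooks code: the class name NodeVersusFragment and the sample order document are assumptions made for illustration, and a shallow clone merely stands in for "the outer node" while a deep clone stands in for "the fragment".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; nothing here is part of the Smooks API.
final class NodeVersusFragment {

    /** Parses an XML string with the JDK DOM parser and returns the root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    /** The "node" view: the outer element and its attributes, without child nodes. */
    static Node outerNode(Element element) {
        return element.cloneNode(false);
    }

    /** The "fragment" view: the outer element together with all descendants and text. */
    static Node fragment(Element element) {
        return element.cloneNode(true);
    }

    public static void main(String[] args) {
        Element order = parse("<order id=\"332\"><customer>Joe</customer></order>");
        System.out.println("node children: " + outerNode(order).getChildNodes().getLength());
        System.out.println("fragment text: " + fragment(order).getTextContent());
    }
}
```

A visitor that only sees the node view can still read the id attribute, but the customer text is only reachable through the fragment view.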
Smooks 2 introduces the DFDL cartridge and revamps its EDI cartridge, while dropping support for Java 7 along with other notable changes:
DFDL cartridge
DFDL is a specification for describing file formats in XML. The DFDL cartridge leverages Apache Daffodil to parse files and unparse XML. This opens up Smooks to a wide array of data formats like SWIFT, ISO8583, HL7, and many more.
Added compatibility with Java 9 and later versions; retained compatibility for Java 8
Compose any series of transformations on an event outside the main execution context before directing the pipeline output to the execution result stream or to other destinations
Complete overhaul of the EDI cartridge and strengthening of EDI functionality
Rewritten to extend the DFDL cartridge and provide much better support for reading EDI documents
Added functionality to serialize EDI documents
As in previous Smooks versions, incorporated special support for EDIFACT
SAX NG filter
Replaces SAX filter and supersedes DOM filter
Brings with it a new visitor API which unifies the SAX and DOM visitor APIs
Cartridges migrated to SAX NG
Supports XSLT and StringTemplate resources unlike the legacy SAX filter
Mementos: a convenient way to stash and un-stash a visitor’s state during its execution lifecycle
Independent release cycles for all cartridges and one Maven BOM (bill of materials) to track them all
License change
After reaching consensus among our code contributors, we’ve dual-licensed Smooks under LGPL v3.0 and Apache License 2.0. This license change keeps Smooks open source while adopting a permissive stance to modifications.
New Smooks XSD schema (xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd")
Uniform XML namespace declarations: dropped default-selector-namespace and selector-namespace XML attributes in favour of declaring namespaces within the standard xmlns attribute from the smooks-resource-config element.
Removed default-selector attribute from smooks-resource-config element: selectors need to be set explicitly
Dropped Smooks-specific annotations in favour of JSR annotations
Farewell @ConfigParam, @Config, @AppContext, and @StreamSinkWriter. Welcome @Inject.
Farewell @Initialize and @Uninitialize. Welcome @PostConstruct and @PreDestroy.
Separate top-level Java namespaces for API and implementation to provide a cleaner and more intuitive package structure: API interfaces and internal classes were relocated to org.smooks.api and org.smooks.engine respectively
Improved XPath support for resource selectors
Functions like not() are now supported
Numerous dependency updates
Maven coordinates change: we are now publishing Smooks artifacts under Maven group IDs prefixed with org.smooks
Replaced the default SAX parser implementation, Apache Xerces, with FasterXML’s Woodstox: benchmarks consistently showed Woodstox outperforming Xerces
Monitoring and management support with JMX
Comparing the code examples for Smooks 1 with those for Smooks 2 can be a useful guide in migrating to Smooks 2. While not exhaustive, we have compiled a list of notes to assist your migration:
Smooks 2 no longer supports Java 7. Your application needs to be compiled to at least Java 8 to run Smooks 2.
Replace javax.xml.transform.Source parameter in Smooks#filterSource(…) method calls with:
org.smooks.io.source.JavaSource instead of org.milyn.payload.JavaSource
org.smooks.io.source.StringSource instead of org.milyn.payload.StringSource
org.smooks.io.source.ByteSource instead of org.milyn.payload.ByteSource
org.smooks.io.source.DOMSource instead of org.milyn.payload.DOMSource
org.smooks.io.source.JavaSourceWithoutEventStream instead of org.milyn.payload.JavaSourceWithoutEventStream
org.smooks.io.source.ReaderSource instead of javax.xml.transform.stream.StreamSource when the latter is constructed from java.io.Reader
org.smooks.io.source.StreamSource instead of javax.xml.transform.stream.StreamSource when the latter is constructed from java.io.InputStream
org.smooks.io.source.URLSource instead of javax.xml.transform.stream.StreamSource when the latter is constructed from a system ID
Replace javax.xml.transform.Result parameter in Smooks#filterSource(…) method calls with:
org.smooks.io.sink.StringSink instead of org.milyn.payload.StringResult
org.smooks.io.sink.JavaSink instead of org.milyn.payload.JavaResult
org.smooks.io.sink.ByteSink instead of org.milyn.payload.ByteResult
org.smooks.io.sink.DOMSink instead of javax.xml.transform.dom.DOMResult
org.smooks.io.sink.StreamSink instead of javax.xml.transform.stream.StreamResult when the latter is constructed from java.io.OutputStream
org.smooks.io.sink.WriterSink instead of javax.xml.transform.stream.StreamResult when the latter is constructed from java.io.Writer
Replace closeResult attribute in the XML config element core:filterSettings with closeSink.
Replace class interfaces:
org.milyn.delivery.ExecutionLifecycleInitializable with org.smooks.api.lifecycle.PreExecutionLifecycle
org.milyn.delivery.ExecutionLifecycleCleanable with org.smooks.api.lifecycle.PostExecutionLifecycle
org.milyn.delivery.VisitLifecycleCleanable with org.smooks.api.lifecycle.PostFragmentLifecycle
org.milyn.delivery.ConfigurationExpander with org.smooks.api.delivery.ResourceConfigExpander
org.milyn.event.ResourceBasedEvent with org.smooks.api.delivery.event.ResourceAwareEvent
Remove references to org.milyn.util.CollectionsUtil and write your own implementation for this class.
Implement org.smooks.api.resource.visitor.sax.ng.SaxNgVisitor instead of org.milyn.delivery.sax.SAXVisitor.
Replace Smooks#addConfiguration(…) method calls with Smooks#addResourceConfig(…).
Replace Smooks#addConfigurations(…) method calls with Smooks#addResourceConfigs(…).
Replace references to:
org.milyn.javabean.DataDecoder with org.smooks.api.converter.TypeConverterFactory.
org.milyn.cdr.annotation.Configurator with org.smooks.api.lifecycle.LifecycleManager.
org.milyn.javabean.DataDecoderException with org.smooks.api.converter.TypeConverterException.
org.milyn.cdr.SmooksResourceConfigurationStore with org.smooks.api.Registry.
org.milyn.cdr.SmooksResourceConfiguration with org.smooks.api.resource.config.ResourceConfig.
Replace calls to setDefaultResource() with setSystem()
Replace calls to isDefaultResource() with isSystem()
org.milyn.delivery.sax.SAXToXMLWriter with org.smooks.io.DomSerializer.
org.milyn.delivery.dom.serialize.Serializer references with org.smooks.api.resource.visitor.SerializerVisitor
org.milyn.event.types.ConfigBuilderEvent references with org.smooks.api.delivery.event.ContentDeliveryConfigExecutionEvent
Replace org.milyn.* Java package references with org.smooks.api, org.smooks.engine, org.smooks.io or org.smooks.support.
Change legacy document root fragment selectors from $document to #document.
Remove the milyn-smooks-all dependency from the Maven POM and import the Smooks BOM instead. Declare the corresponding dependency of each Smooks cartridge used within the project but omit the artifact version.
Replace Smooks Maven coordinates to match the coordinates as described in the Maven guide.
Replace ExecutionContext#isDefaultSerializationOn() method calls with ExecutionContext#getContentDeliveryRuntime().getDeliveryConfig().isDefaultSerializationOn().
Replace ExecutionContext#getContext() method calls with ExecutionContext#getApplicationContext().
Replace org.milyn.cdr.annotation.AppContext annotations with javax.inject.Inject annotations.
Replace org.milyn.cdr.annotation.ConfigParam annotations with javax.inject.Inject annotations:
Substitute the @ConfigParam name attribute with the @javax.inject.Named annotation.
Wrap java.util.Optional around the field to mimic the behaviour of the @ConfigParam optional attribute.
Replace org.milyn.delivery.annotation.Initialize annotations with jakarta.annotation.PostConstruct annotations.
Replace org.milyn.delivery.annotation.Uninitialize annotations with jakarta.annotation.PreDestroy annotations.
Follow the EDIFACT-to-Java example to migrate an implementation that binds an EDIFACT document to a POJO.
Follow the Java-to-EDIFACT example to migrate an implementation that serialises a POJO into an EDIFACT document.
Set ContainerResourceLocator from DefaultApplicationContextBuilder#setResourceLocator instead of from ApplicationContext#setResourceLocator.
See the FAQ.
See the Maven guide for details on how to integrate Smooks into your project via Maven.
A commonly accepted definition of Smooks is of it being a Transformation Engine. Nonetheless, at its core, Smooks makes no reference to data transformation. The core codebase is designed to hook visitor logic into an event stream produced from a source of some kind. As such, in its most distilled form, Smooks is a Structured Data Event Stream Processor.
An application of a structured data event processor is transformation. In implementation terms, a Smooks transformation solution is a visitor reading the event stream from a source to produce a different representation of the input. However, Smooks’s core capabilities enable much more than transformation. A range of other solutions can be implemented based on the fragment-based processing model:
Java binding: population of a POJO from the source.
splitting & routing: perform complex splitting and routing operations on the source stream, including routing data in different formats (XML, EDI, CSV, POJO, etc…) to multiple destinations concurrently.
huge message processing: declaratively consume (transform, or split and route) huge messages without writing boilerplate code.
The following gives a 10,000 foot view of Smooks:
Smooks’s fundamental behaviour is to take an input source, such as CSV, and from it generate an event stream to which visitors are applied to produce a result, such as EDI. In Smooks nomenclature, this behaviour is called filtering. During filtering, other Smooks actors participate, including:
resources
application context
execution context
bean context
registry
listeners
All of these actors are explained in later sections. Several sources and result types are supported which equate to different transformation types, including but not limited to:
XML to XML
XML to POJO
POJO to XML
POJO to POJO
EDI to XML
EDI to POJO
POJO to EDI
CSV to XML
CSV to …
… to …
Smooks maps the source to the result with the help of a highly-tunable SAX event model. The hierarchical events generated from an XML source (startElement, endElement, etc…) drive the SAX event model, though the event model can be just as easily applied to other structured data sources (EDI, CSV, POJO, etc…). The most important events are typically the before and after visit events. The following illustration conveys the hierarchical nature of these events:
One or more of the SaxNgVisitor interfaces need to be implemented in order to consume the SAX event stream produced from the source, depending on which events are of interest.
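To make the before and after visit events concrete without relying on the Smooks API, the following sketch uses only the JDK's SAX parser. The class BeforeAfterEvents and its visit method are hypothetical names; the handler simply records an event when the start tag and the end tag of the targeted element go by, which is roughly the moment a visitor's visitBefore and visitAfter methods would be called.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class built on plain JDK SAX, not on Smooks.
final class BeforeAfterEvents {

    /** Records one "before" and one "after" event for every element named targetName. */
    static List<String> visit(String xml, String targetName) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals(targetName)) {
                    events.add("visitBefore:" + qName);   // start tag seen, children not yet read
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(targetName)) {
                    events.add("visitAfter:" + qName);    // end tag seen, children already read
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        visit("<root><foo><bar/></foo></root>", "foo").forEach(System.out::println);
    }
}
```

Events for the nested bar element fall between the two recorded events, which is the hierarchical ordering the event model relies on.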
The following is a hello world app demonstrating how to implement a visitor that is fired on the visitBefore and visitAfter events of a targeted node in the event stream. In this case, Smooks configures the visitor to target element foo:
The visitor implementation is straightforward: one method implementation per event. As shown above, a Smooks config (more about resource-config later on) is written to target the visitor at a node’s visitBefore and visitAfter events.
The Java code executing the hello world app is a two-liner:
Smooks smooks = new Smooks("/smooks/echo-example.xml");
smooks.filterSource(new StreamSource(inputStream));
Observe that in this case the program does not produce a result. The program does not even interact with the filtering process in any way because it does not provide an ExecutionContext to smooks.filterSource(...).
This example illustrated the lower-level mechanics of Smooks’s programming model. In reality, most users are not going to want to solve their problems at this level of detail. Smooks ships with substantial pre-built functionality, that is, pre-built visitors. Visitors are bundled based on functionality: these bundles are called Cartridges.
A Smooks execution consumes a source of one form or another (XML, EDI, POJO, JSON, CSV, etc…), and from it, generates an event stream that fires different visitors (Java, Groovy, DFDL, XSLT, etc…). The goal of this process can be to produce a new result stream in a different format (data transformation), bind data from the source to POJOs and produce a populated Java object graph (Java binding), produce many fragments (splitting), and so on.
At its core, Smooks views visitors and other abstractions as resources. A resource is applied when a selector matches a node in the event stream. The generality of such a processing model can be daunting from a usability perspective because resources are not tied to a particular domain. To counteract this, Smooks 1.1 introduced an Extensible Configuration Model feature that allows specific resource types to be specified in the configuration using dedicated XSD namespaces of their own. Instead of having a generic resource config such as:
<resource-config selector="order-item">
    <resource type="ftl"><!--
        <item>
            <id>${.vars["order-item"].@id}</id>
            <productId>${.vars["order-item"].product}</productId>
            <quantity>${.vars["order-item"].quantity}</quantity>
            <price>${.vars["order-item"].price}</price>
        </item>
    -->
    </resource>
</resource-config>
an Extensible Configuration Model allows us to have a domain-specific resource config:
<ftl:freemarker applyOnElement="order-item">
    <ftl:template><!--
        <item>
            <id>${.vars["order-item"].@id}</id>
            <productId>${.vars["order-item"].product}</productId>
            <quantity>${.vars["order-item"].quantity}</quantity>
            <price>${.vars["order-item"].price}</price>
        </item>
    -->
    </ftl:template>
</ftl:freemarker>
When comparing the above snippets, the latter resource has:
A more strongly typed domain specific configuration and so is easier to read,
Auto-completion support from the user’s IDE because the Smooks 1.1+ configurations are XSD-based, and
No need to set the resource type in its configuration.
Central to how Smooks works is the concept of a visitor. A visitor is a Java class performing a specific task on the targeted fragment such as applying an XSLT script, binding fragment data to a POJO, validating fragments, etc…
Resource selectors are another central concept in Smooks. A selector chooses the node/s a visitor should visit, as well as working as a simple opaque lookup value for non-visitor logic.
When the resource is a visitor, Smooks will interpret the selector as an XPath-like expression. There are a number of things to be aware of:
The order in which the XPath expression is applied is the reverse of the normal order, such as the order followed in an XSLT script. Smooks inspects backwards from the targeted fragment node, as opposed to forwards from the root node.
Not all of the XPath specification is supported. A selector supports the following XPath syntax:
text() and attribute value selectors: a/b[text() = 'abc'], a/b[text() = 123], a/b[@id = 'abc'], a/b[@id = 123].
text() is only supported on the last selector step in an expression: a/b[text() = 'abc'] is legal while a/b[text() = 'abc']/c is illegal.
text() is only supported on visitor implementations that implement the AfterVisitor interface only. If the visitor implements the BeforeVisitor or ChildrenVisitor interfaces, an error will result.
or & and logical operations: a/b[text() = 'abc' and @id = 123], a/b[text() = 'abc' or @id = 123]
Namespaces on both the elements and attributes: a:order/b:address[@b:city = 'NY'].
Note | This requires the namespace prefix-to-URI mappings to be defined. A configuration error will result if not defined. Read the namespace declaration section for more details. |
Supports = (equals), != (not equals), < (less than), > (greater than).
Index selectors: a/b[3].
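The "backwards from the targeted node" evaluation order can be sketched in a few lines of plain Java. This is a simplified model and not the Smooks matcher: the class ReverseSelectorMatch is a hypothetical name, the selector is limited to element names separated by slashes, and predicates, namespaces and index selectors are deliberately left out.

```java
import java.util.List;

// Hypothetical, simplified model of reverse selector matching; not Smooks code.
final class ReverseSelectorMatch {

    /**
     * Returns true when the selector steps (for example "a/b") match the tail of
     * the path leading from the root to the current node. Steps are compared
     * starting at the current node and walking back towards the root.
     */
    static boolean matches(List<String> pathFromRoot, String selector) {
        String[] steps = selector.split("/");
        if (steps.length > pathFromRoot.size()) {
            return false; // the selector asks for more ancestors than exist
        }
        for (int i = 0; i < steps.length; i++) {
            String step = steps[steps.length - 1 - i];                   // last step first
            String node = pathFromRoot.get(pathFromRoot.size() - 1 - i); // current node first
            if (!step.equals(node)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of("order", "a", "b"), "a/b"));
        System.out.println(matches(List.of("a", "x", "b"), "a/b"));
    }
}
```

Because matching starts at the current node, a selector such as a/b applies to a b element anywhere in the document as long as its parent is a, without the selector having to spell out the path from the root.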
The xmlns attribute is used to bind a selector prefix to a namespace:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:c="http://c"
                      xmlns:d="http://d">
    <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
        <resource>com.acme.visitors.MyCustomVisitorImpl</resource>
    </resource-config>
</smooks-resource-list>
Alternatively, namespace prefix-to-URI mappings can be declared using the legacy core config namespace element:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:namespaces>
        <core:namespace prefix="c" uri="http://c"/>
        <core:namespace prefix="d" uri="http://d"/>
    </core:namespaces>
    <resource-config selector="c:item[@c:code = '8655']/d:units[text() = 1]">
        <resource>com.acme.visitors.MyCustomVisitorImpl</resource>
    </resource-config>
</smooks-resource-list>
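Why a selector prefix needs a URI binding can be shown with the JDK's own XPath engine. The sketch below is standard javax.xml.xpath usage rather than Smooks's selector implementation, and the class PrefixBindingSketch and the sample document are assumptions. The document uses the prefixes x and y while the expression uses c and d, so a match is only possible because both sides resolve to the same namespace URIs.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration using the JDK XPath engine, not the Smooks selector engine.
final class PrefixBindingSketch {

    /** Counts the nodes selected by the expression, resolving prefixes through prefixToUri. */
    static int countMatches(String xml, String expression, Map<String, String> prefixToUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }

                @Override
                public String getPrefix(String namespaceUri) {
                    return null; // reverse lookup is not needed to evaluate the expression
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceUri) {
                    return Collections.emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the expression", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<x:item xmlns:x=\"http://c\" xmlns:y=\"http://d\" x:code=\"8655\">"
                + "<y:units>1</y:units></x:item>";
        String expression = "//c:item[@c:code = '8655']/d:units[text() = 1]";
        System.out.println(countMatches(xml, expression, Map.of("c", "http://c", "d", "http://d")));
    }
}
```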
Smooks relies on a Reader for ingesting a source and generating a SAX event stream. A reader is any class implementing XMLReader. By default, Smooks uses the XMLReader returned from XMLReaderFactory.createXMLReader(). You can easily implement your own XMLReader to create a non-XML reader that generates the source event stream for Smooks to process:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <reader class="com.acme.ZZZZReader" />
    <!--
        Other Smooks resources, e.g. <jb:bean> configs for
        binding data from the ZZZZ data stream into POJOs....
    -->
</smooks-resource-list>
The reader config element is referencing a user-defined XMLReader. It can be configured with a set of handlers, features and parameters:
<reader class="com.acme.ZZZZReader">
    <handlers>
        <handler class="com.X" />
        <handler class="com.Y" />
    </handlers>
    <features>
        <setOn feature="http://a" />
        <setOn feature="http://b" />
        <setOff feature="http://c" />
        <setOff feature="http://d" />
    </features>
    <params>
        <param name="param1">val1</param>
        <param name="param2">val2</param>
    </params>
</reader>
Packaged Smooks modules, known as cartridges, provide support for non-XML readers but, by default, Smooks expects an XML source. Omit the class name from the reader element to set features on the default XML reader:
<reader>
    <features>
        <setOn feature="http://a" />
        <setOn feature="http://b" />
        <setOff feature="http://c" />
        <setOff feature="http://d" />
    </features>
</reader>
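As a rough idea of what a non-XML reader involves, the sketch below turns comma-separated lines into SAX events using only JDK classes. Everything about it is an assumption for illustration: the class name CsvLineReader, the records, record and field element names, and the choice to extend XMLFilterImpl, which is used here only to inherit the XMLReader getters and setters. It also assumes the InputSource carries a character stream. A reader meant for Smooks may need to honour further contracts not covered here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical reader: emits SAX events for each comma-separated line of the input.
final class CsvLineReader extends XMLFilterImpl {

    private static final Attributes NO_ATTRIBUTES = new AttributesImpl();

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        ContentHandler handler = getContentHandler();
        // Assumption: the caller supplied a character stream, not a byte stream.
        BufferedReader lines = new BufferedReader(input.getCharacterStream());
        handler.startDocument();
        handler.startElement("", "records", "records", NO_ATTRIBUTES);
        for (String line = lines.readLine(); line != null; line = lines.readLine()) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            handler.startElement("", "record", "record", NO_ATTRIBUTES);
            for (String field : line.split(",")) {
                String text = field.trim();
                handler.startElement("", "field", "field", NO_ATTRIBUTES);
                handler.characters(text.toCharArray(), 0, text.length());
                handler.endElement("", "field", "field");
            }
            handler.endElement("", "record", "record");
        }
        handler.endElement("", "records", "records");
        handler.endDocument();
    }

    /** Runs the reader over a CSV string and lists the start tags and text it emitted. */
    static List<String> eventsFor(String csv) {
        List<String> events = new ArrayList<>();
        CsvLineReader reader = new CsvLineReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("<" + qName + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add(new String(ch, start, length));
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(csv)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not read the sample CSV", e);
        }
        return events;
    }

    public static void main(String[] args) {
        eventsFor("pen,2\npaper,10\n").forEach(System.out::println);
    }
}
```

Once the source is expressed as start, text and end events, downstream handlers no longer need to know that the input was not XML.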
Smooks can present output to the outside world in two ways:
As instances of Sink: client code extracts output from the Sink instance after passing an empty one to Smooks#filterSource(...).
As side effects: during filtering, resource output is sent to web services, local storage, queues, data stores, and other locations. Events trigger the routing of fragments to external endpoints, as happens when splitting and routing.
Unless configured otherwise, a Smooks execution does not accumulate the input data to produce all the outputs. The reason is simple: performance! Consider a document consisting of hundreds of thousands (or millions) of orders that need to be split up and routed to different systems in different formats, based on different conditions. The only way of handling documents of these magnitudes is by streaming them.
Important | Smooks can generate output in either, or both, of the above ways, all in a single filtering pass of the source. It does not need to filter the source multiple times in order to generate multiple outputs, which is critical for performance. |
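The single-pass idea can be sketched with the JDK's StAX parser: the document is read once, and each order item is handed to a destination as soon as its start tag is seen, without holding the whole document in memory. The class SinglePassRouting, the bulk and standard destination names, the quantity threshold of 10 and the attribute-based sample document are all assumptions for illustration, not Smooks behaviour.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of streaming split-and-route using JDK StAX, not Smooks.
final class SinglePassRouting {

    /** Routes each order-item id to a destination while the document streams past once. */
    static Map<String, List<String>> route(String xml) {
        Map<String, List<String>> destinations = new LinkedHashMap<>();
        destinations.put("bulk", new ArrayList<>());
        destinations.put("standard", new ArrayList<>());
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("order-item")) {
                    String id = reader.getAttributeValue(null, "id");
                    int quantity = Integer.parseInt(reader.getAttributeValue(null, "quantity"));
                    // Assumed routing rule: large quantities go to the "bulk" destination.
                    destinations.get(quantity >= 10 ? "bulk" : "standard").add(id);
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("Could not stream the sample document", e);
        }
        return destinations;
    }

    public static void main(String[] args) {
        String xml = "<order>"
                + "<order-item id=\"1\" quantity=\"2\"/>"
                + "<order-item id=\"2\" quantity=\"50\"/>"
                + "</order>";
        System.out.println(route(xml));
    }
}
```

In a real deployment the lists would be replaced by queues, files or database writes, but the shape stays the same: one pass over the source, several outputs.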
A look at the Smooks API reveals that Smooks can be supplied with multiple Sink instances:
public void filterSource(Source source, Sink... sinks) throws SmooksException
Smooks can work with implementations of the StreamSink and DOMSink sink types, as well as:
JavaSink: sink type for capturing the contents of the Smooks JavaBean context.
StringSink: sink wrapping a StringWriter, useful for testing.
Important | As yet, Smooks does not support capturing output to multiple Sink instances of the same type. For example, you can specify multiple StreamSink instances in Smooks.filterSource(...) but Smooks will only output to the first StreamSink instance. |
The StreamSink and DOMSink types receive special attention from Smooks. When the default.serialization.on global parameter is turned on, which it is by default, Smooks serializes the stream of events to XML while filtering the source. The XML is fed to the Sink instance if a StreamSink or DOMSink is passed to Smooks#filterSource.
Note | This is the mechanism used to perform a standard 1-input/1-xml-output character-based transformation. |
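Serializing an event stream back to XML, with the text ending up in a stream, can be approximated with the JDK alone. The sketch below wires a SAX parser to an identity TransformerHandler; it uses the JDK's javax.xml.transform.stream.StreamResult rather than a Smooks StreamSink, and the class name EventSerializationSketch is an assumption. It only illustrates the general 1-input/1-XML-output shape described in the note.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical illustration using JDK SAX and TrAX classes, not Smooks sinks.
final class EventSerializationSketch {

    /** Parses the XML into SAX events and serializes those events straight back to text. */
    static String serialize(String xml) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler(); // identity transform
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();
            reader.setContentHandler(serializer); // every event flows directly to the serializer
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the event stream", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<order><item>1</item></order>"));
    }
}
```

A handler placed between the reader and the serializer could rewrite events on the way through, which is the slot a visitor occupies in this kind of transformation.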
Smooks is also able to generate different types of output during filtering, that is, while filtering the source event stream but before it reaches the end of the stream. A classic example of this output type is when it is used to split and route fragments to different endpoints for processing by other processes.
A pipeline is a flexible, yet simple, Smooks construct that isolates the processing of a targeted event from its main processing as well as from the processing of other pipelines. In practice, this means being able to compose any series of transformations on an event outside the main execution context before directing the pipeline output to the execution sink stream or to other destinations. With pipelines, you can enrich data, rename/remove nodes, and much more.
Under the hood, a pipeline is just another instance of Smooks, made self-evident from the Smooks config element declaring a pipeline:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:smooks filterSourceOn="...">
        <core:action>
            ...
        </core:action>
        <core:config>
            <smooks-resource-list>
                ...
            </smooks-resource-list>
        </core:config>
    </core:smooks>
</smooks-resource-list>
core:smooks fires a nested Smooks execution whenever an event in the stream matches the filterSourceOn selector. The pipeline within the inner smooks-resource-list element visits the selected event and its child events. It is worth highlighting that the inner smooks-resource-list element behaves identically to the outer one, and therefore, it accepts resources like visitors, readers, and even pipelines (a pipeline within a pipeline!). Moreover, a pipeline is transparent to its nested resources: a resource’s behaviour remains the same whether it’s declared inside a pipeline or outside it.
The optional core:action element tells the nested Smooks instance what to do with the pipeline’s output. The next sections list the supported actions.
Merges the pipeline’s output with the sink stream:
...
<core:action>
    <core:inline>
        ...
    </core:inline>
</core:action>
...
As described in the subsequent sections, an inline action replaces, prepends, or appends content.
Substitutes the selected fragment with the pipeline output:
...
<core:inline>
    <core:replace/>
</core:inline>
...
Adds the output before the selector start tag:
<core:inline>
    <core:prepend-before/>
</core:inline>
Adds the output after the selector start tag:
<core:inline>
    <core:prepend-after/>
</core:inline>
Adds the output before the selector end tag:
<core:inline>
    <core:append-before/>
</core:inline>
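Where each inline action places the pipeline output relative to the selected fragment can be modelled with simple string concatenation. This is only a mental model under stated assumptions: the class InlineActionSketch, its Action enum and the apply method are hypothetical, the fragment is reduced to a start tag, its content and an end tag, and only the four actions listed above are covered.

```java
// Hypothetical string-based model of the inline actions; not Smooks code.
final class InlineActionSketch {

    enum Action { REPLACE, PREPEND_BEFORE, PREPEND_AFTER, APPEND_BEFORE }

    /** Shows where the pipeline output lands relative to the selected fragment. */
    static String apply(Action action, String startTag, String content, String endTag, String output) {
        switch (action) {
            case REPLACE:
                return output;                                   // fragment is substituted
            case PREPEND_BEFORE:
                return output + startTag + content + endTag;     // before the start tag
            case PREPEND_AFTER:
                return startTag + output + content + endTag;     // right after the start tag
            case APPEND_BEFORE:
                return startTag + content + output + endTag;     // right before the end tag
            default:
                throw new IllegalArgumentException("Unsupported action: " + action);
        }
    }

    public static void main(String[] args) {
        for (Action action : Action.values()) {
            System.out.println(action + " -> " + apply(action, "<a>", "x", "</a>", "<new/>"));
        }
    }
}
```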
Binds the output to the execution context’s bean store:
...
<core:action>
    <core:bindTo id="..."/>
</core:action>
...
The core:rewrite construct is a reader designed to offer a convenient mechanism for substituting the event stream entering a pipeline with one that the pipeline resources can process.
core:rewrite enables one or more of its enclosed visitors to substitute targeted events with new events. In the example that follows, the pipeline feeds the event stream to core:rewrite, and core:rewrite, in turn, feeds targeted events to the nested FreeMarker visitors:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                      xmlns:ftl="https://www.smooks.org/xsd/smooks/freemarker-2.0.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd">
    ...
    ...
    <core:smooks filterSourceOn="#document">
        <core:action>
            <core:inline>
                <core:replace/>
            </core:inline>
        </core:action>
        <core:config>
            <smooks-resource-list>
                <core:rewrite>
                    <ftl:freemarker applyOnElement="#document" applyBefore="true">
                        <ftl:template>header.xml.ftl</ftl:template>
                    </ftl:freemarker>
                    <core:smooks filterSourceOn="record" maxNodeDepth="0">
                        <core:config>
                            <smooks-resource-list>
                                <ftl:freemarker applyOnElement="#document">
                                    <ftl:template>body.xml.ftl</ftl:template>
                                </ftl:freemarker>
                            </smooks-resource-list>
                        </core:config>
                    </core:smooks>
                    <ftl:freemarker applyOnElement="#document">
                        <ftl:template>footer.xml.ftl</ftl:template>
                    </ftl:freemarker>
                </core:rewrite>
                <edifact:unparser schemaUri="/d96a/EDIFACT-Messages.dfdl.xsd" unparseOnNode="*">
                    <edifact:messageTypes>
                        <edifact:messageType>ORDERS</edifact:messageType>
                    </edifact:messageTypes>
                </edifact:unparser>
            </smooks-resource-list>
        </core:config>
    </core:smooks>
    ...
    ...
</smooks-resource-list>
A visitor within core:rewrite writes XML fragments to the sink stream, replacing the targeted events. For example, in the config above, the FreeMarker visitors are replacing the #document and record events with materialised XML templates. More precisely, core:rewrite converts the materialised XML into a new event stream which is then processed by the downstream pipeline resources, in this case, edifact:unparser.
Tip | The full example is available in the smooks-examples repository. |
When implementing your own visitor for core:rewrite, call org.smooks.io.Stream#out(org.smooks.api.ExecutionContext).write(java.lang.String) within one of the overridden visit methods to replace the event stream as shown below:
package org.smooks.benchmark;

...
...

public class BibliographyVisitor implements AfterVisitor {

    private final static String TEMPLATE = "<entry><author>%s</author><title>%s</title></entry>";
    private DOMXPath domXPath;
    private DOMXPath titleXPath;

    @PostConstruct
    public void postConstruct() throws JaxenException {
        this.domXPath = new DOMXPath("//author");
        this.titleXPath = new DOMXPath("//title");
    }

    @Override
    public void visitAfter(Element element, ExecutionContext executionContext) {
        try {
            List<Element> authors = ((List<Element>) domXPath.evaluate(element));
            List<Element> titles = ((List<Element>) titleXPath.evaluate(element));

            // Close the CDATA section inside the <title> element, not after </entry>.
            Stream.out(executionContext).write(String.format(TEMPLATE,
                    authors.isEmpty() ? "N/A" : authors.get(0).getTextContent(),
                    "<![CDATA[" + (titles.isEmpty() ? "N/A" : titles.get(0).getTextContent()) + "]]>"));
        } catch (IOException | JaxenException e) {
            throw new SmooksException(e);
        }
    }
}
BibliographyVisitor is a custom visitor which visits end events. The visitAfter method evaluates the author elements together with the title elements and writes XML to the sink stream, replacing the selected events.
The basic functionality of Smooks can be extended through the development of a Smooks cartridge. A cartridge is a Java archive (JAR) containing reusable resources (also known as Content Handlers). A cartridge augments Smooks with support for a specific type of input source or event handling.
Visit the GitHub repositories page for the complete list of Smooks cartridges.
A Smooks filter delivers generated events from a reader to the application’s resources. Smooks 1 had the DOM and SAX filters. The DOM filter was simple to use but kept all the events in memory while the SAX filter, though more complex, delivered the events in streaming fashion. Having two filter types meant two different visitor APIs and execution paths, with all the baggage it entailed.
Smooks 2 unifies the legacy DOM and SAX filters without sacrificing convenience or performance. The new SAX NG filter drops the API distinction between DOM and SAX. Instead, the filter streams SAX events as partial DOM elements to SAX NG visitors targeting the element. A SAX NG visitor can read the targeted node as well as any of the node’s ancestors but not the targeted node’s children or siblings in order to keep the memory footprint to a minimum.
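The notion of a partial DOM element can be imitated with JDK classes: each start tag becomes a DOM element attached to its already-open ancestors, and each element is detached again once its end tag is reached. The sketch below is a loose model and not the SAX NG filter itself; PartialDomSketch, pathOf and describe are hypothetical names. It reports what a visitor would see at the start of the targeted element: the ancestor chain is available, while children have not been read yet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical model of streaming SAX events as partial DOM elements; not Smooks code.
final class PartialDomSketch {

    /** Builds "ancestor/.../node" by walking the element's parents towards the root. */
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder(element.getTagName());
        for (Node parent = element.getParentNode(); parent instanceof Element; parent = parent.getParentNode()) {
            path.insert(0, ((Element) parent).getTagName() + "/");
        }
        return path.toString();
    }

    /** Describes every element named targetName as seen when its start tag arrives. */
    static List<String> describe(String xml, String targetName) {
        List<String> seen = new ArrayList<>();
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DefaultHandler handler = new DefaultHandler() {
                private Node current = document;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    Element element = document.createElement(qName);
                    for (int i = 0; i < attributes.getLength(); i++) {
                        element.setAttribute(attributes.getQName(i), attributes.getValue(i));
                    }
                    current.appendChild(element);
                    current = element;
                    if (qName.equals(targetName)) {
                        seen.add(pathOf(element) + " children=" + element.getChildNodes().getLength());
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    Node finished = current;
                    current = finished.getParentNode();
                    current.removeChild(finished); // discard finished elements to bound memory
                }
            };
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the sample document", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        describe("<a><b><c id=\"1\"><d/></c></b></a>", "c").forEach(System.out::println);
    }
}
```

Detaching each element at its end tag is what keeps memory proportional to the depth of the document rather than to its size.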
The SAX NG filter can mimic DOM by setting itsmax.node.depth
parameter to 0 (default value is 1), allowing each visitor to process the complete DOM tree in itsvisitAfter(...)
method:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <params>
        <param name="max.node.depth">0</param>
    </params>
    ...
</smooks-resource-list>
A max.node.depth value greater than 1 tells the filter to read and keep a node’s descendants up to the desired depth. Take the following input as an example:
<order id="332">
    <header>
        <customer number="123">Joe</customer>
    </header>
    <order-items>
        <order-item id="1">
            <product>1</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
        <order-item id="2">
            <product>2</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
        <order-item id="3">
            <product>3</product>
            <quantity>2</quantity>
            <price>8.80</price>
        </order-item>
    </order-items>
</order>
Along with the config:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <params>
        <param name="max.node.depth">2</param>
    </params>
    <resource-config selector="order-item">
        <resource>org.acme.MyVisitor</resource>
    </resource-config>
</smooks-resource-list>
At any given time, there will always be a single order-item in memory containing product because max.node.depth is 2. Each new order-item overwrites the previous order-item to minimise the memory footprint. MyVisitor#visitAfter(...) is invoked 3 times, each invocation corresponding to an order-item fragment. The first invocation will process:
<order-item id='1'>
    <product>1</product>
</order-item>
While the second invocation will process:
<order-item id='2'>
    <product>2</product>
</order-item>
Whereas the last invocation will process:
<order-item id='3'>
    <product>3</product>
</order-item>
Programmatically, implementing org.smooks.api.resource.visitor.sax.ng.ParameterizedVisitor will give you fine-grained control over the visitor’s targeted element depth:
...
public class DomVisitor implements ParameterizedVisitor {

    @Override
    public void visitBefore(Element element, ExecutionContext executionContext) {
    }

    @Override
    public void visitAfter(Element element, ExecutionContext executionContext) {
        System.out.println("Element: " + XmlUtil.serialize(element, true));
    }

    @Override
    public int getMaxNodeDepth() {
        return Integer.MAX_VALUE;
    }
}
ParameterizedVisitor#getMaxNodeDepth() returns an integer denoting the targeted element’s maximum tree depth the visitor can accept in its visitAfter(...) method.
Filter-specific knobs are set through the smooks-core configuration namespace (https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd) introduced in Smooks 1.3:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <core:filterSettings type="SAX NG" (1)
                         defaultSerialization="true" (2)
                         terminateOnException="true" (3)
                         closeSource="true" (4)
                         closeSink="true" (5)
                         rewriteEntities="true" (6)
                         readerPoolSize="3"/> (7)

    <!-- Other visitor configs etc... -->

</smooks-resource-list>
(1) type (default: SAX NG): the type of processing model that will be used. SAX NG is the recommended type. The DOM type is deprecated.
(2) defaultSerialization (default: true): whether default serialization should be switched on. When it is on, Smooks locates a StreamSink (or DOMSink) among the Sink objects provided to the Smooks.filterSource method and serializes all events to that Sink instance. This behavior can be turned off using this global configuration parameter and can be overridden on a per-fragment basis by targeting a visitor at that fragment that takes ownership of the org.smooks.io.FragmentWriter object.
(3) terminateOnException (default: true): whether an exception should terminate execution.
(4) closeSource (default: true): close InputStream instance streams passed to the Smooks.filterSource method. The exception here is System.in, which will never be closed.
(5) closeSink (default: true): close Sink streams passed to the Smooks.filterSource method. The exceptions here are System.out and System.err, which will never be closed.
(6) rewriteEntities: rewrite XML entities when reading and writing (default serialization) XML.
(7) readerPoolSize (default: 0): the reader pool size. Some Reader implementations are very expensive to create (e.g. Xerces). Pooling Reader instances (i.e. reusing them) can result in a huge performance improvement, especially when processing lots of "small" messages. The default value of 0 means unpooled: a new Reader instance is created for each message. Configure in line with your application’s threading model.
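To illustrate why readerPoolSize matters, here is a minimal sketch of instance pooling in plain Java. It is not the Smooks implementation: ReaderPoolSketch, borrow, release, and simulate are hypothetical names, and a StringBuilder stands in for an expensive reader. A pool size of 0 creates a new instance per message, while a positive size reuses idle instances.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical pool, for illustration only; not a Smooks class.
final class ReaderPoolSketch<R> {

    private final BlockingQueue<R> idle;      // null when pooling is disabled (size 0)
    private final Supplier<R> factory;        // creates the "expensive" instance
    private final AtomicInteger created = new AtomicInteger();

    ReaderPoolSketch(int poolSize, Supplier<R> factory) {
        this.idle = poolSize > 0 ? new ArrayBlockingQueue<R>(poolSize) : null;
        this.factory = factory;
    }

    /** Returns an idle instance when one is available, otherwise creates a new one. */
    R borrow() {
        R reader = idle == null ? null : idle.poll();
        if (reader == null) {
            created.incrementAndGet();
            reader = factory.get();
        }
        return reader;
    }

    /** Hands the instance back; it is dropped when pooling is off or the pool is full. */
    void release(R reader) {
        if (idle != null) {
            idle.offer(reader);
        }
    }

    int createdCount() {
        return created.get();
    }

    /** Processes the given number of messages sequentially and reports how many instances were created. */
    static int simulate(int poolSize, int messages) {
        ReaderPoolSketch<StringBuilder> pool = new ReaderPoolSketch<>(poolSize, StringBuilder::new);
        for (int i = 0; i < messages; i++) {
            StringBuilder reader = pool.borrow();
            pool.release(reader);
        }
        return pool.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("unpooled instances created: " + simulate(0, 100));
        System.out.println("pooled instances created:   " + simulate(3, 100));
    }
}
```

With sequential processing the pooled variant creates a single instance; under concurrency the number created depends on how many threads borrow at once, which is why the pool size should follow the application's threading model.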
Smooks streams events that can be captured, and inspected, while in-flight or after execution. HtmlReportGenerator is one such class that inspects in-flight events to go on and generate an HTML report from the execution:
Smooks smooks = new Smooks("/smooks/smooks-transform-x.xml");
ExecutionContext executionContext = smooks.createExecutionContext();

executionContext.getContentDeliveryRuntime().addExecutionEventListener(new HtmlReportGenerator("/tmp/smooks-report.html"));
smooks.filterSource(executionContext, new StreamSource(inputStream), new StreamSink(outputStream));
HtmlReportGenerator is a useful tool in the developer’s arsenal for diagnosing issues, or for comprehending a transformation.
An example HtmlReportGenerator report can be seen online here.
Of course you can also write and use your own ExecutionEventListener implementations.
Caution | Only use the HtmlReportGenerator in development. When enabled, the HtmlReportGenerator incurs a significant performance overhead and, with large messages, can even result in OutOfMemory exceptions. |
You can terminate Smooks’s filtering before it reaches the end of a stream. The following config terminates filtering at the end of the customer fragment:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <!-- Visitors... -->

    <core:terminate onElement="customer"/>

</smooks-resource-list>
The default behavior is to terminate at the end of the targeted fragment, on the visitAfter event. To terminate at the start of the targeted fragment, on the visitBefore event, set the terminateBefore attribute to true:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">

    <!-- Visitors... -->

    <core:terminate onElement="customer" terminateBefore="true"/>

</smooks-resource-list>
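The difference between terminating after and before the targeted fragment can be sketched with the JDK's own SAX parser. This is an analogy under stated assumptions, not how Smooks implements core:terminate: TerminateSketch, TerminateException, and filter are hypothetical names, and a dedicated exception is used to stop the parse early.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of early termination; not a Smooks class.
final class TerminateSketch {

    /** Signals a deliberate early stop rather than a parse failure. */
    static final class TerminateException extends SAXException {
        TerminateException(String element) {
            super("terminated on " + element);
        }
    }

    /** Returns the names of the elements whose start event was delivered before termination. */
    static List<String> filter(String xml, String onElement, boolean terminateBefore) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                if (terminateBefore && qName.equals(onElement)) {
                    throw new TerminateException(qName); // stop before the fragment is visited
                }
                seen.add(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (!terminateBefore && qName.equals(onElement)) {
                    throw new TerminateException(qName); // stop once the fragment has ended
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (TerminateException expected) {
            // Early termination was requested: the rest of the stream is never read.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<order><header><customer>Joe</customer></header><order-items/></order>";
        System.out.println("terminate after:  " + filter(xml, "customer", false));
        System.out.println("terminate before: " + filter(xml, "customer", true));
    }
}
```

In both runs the order-items element is never delivered; with terminateBefore the customer element itself is not visited either.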
The Bean Context is a container for objects which can be accessed during a Smooks execution. One bean context is created per execution context, that is, per Smooks#filterSource(...) operation. Provide an org.smooks.io.sink.JavaSink object to Smooks#filterSource(...) if you want the contents of the bean context to be returned at the end of the filtering process:
// Get the data to filter
StreamSource source = new StreamSource(getClass().getResourceAsStream("data.xml"));

// Create a Smooks instance (cachable)
Smooks smooks = new Smooks("smooks-config.xml");

// Create the JavaSink, which will contain the filter result after filtering
JavaSink sink = new JavaSink();

// Filter the data from the source, putting the result into the JavaSink
smooks.filterSource(source, sink);

// Get the Order bean which was created by the JavaBean cartridge
Order order = (Order) sink.getBean("order");
Resources like visitors access the bean context’s beans at runtime from the BeanContext. The BeanContext is retrieved from ExecutionContext#getBeanContext(). You should first retrieve a BeanId from the BeanIdStore when adding or retrieving objects from the BeanContext. A BeanId is a special key that ensures higher performance than String keys; however, String keys are also supported. The BeanIdStore must be retrieved from ApplicationContext#getBeanIdStore(). A BeanId object can be created by calling BeanIdStore#register(String). If you know that the BeanId is already registered, then you can retrieve it by calling BeanIdStore#getBeanId(String). BeanId is scoped at the application context. You normally register it in the @PostConstruct annotated method of your visitor implementation and then reference it as a member variable from the visitBefore and visitAfter methods.
Note | BeanId and BeanIdStore are thread-safe. |
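One plausible reason a pre-registered key can outperform a String key is that it can carry a fixed slot index, so the per-execution lookup becomes a list access instead of a hash lookup. The sketch below illustrates that idea only; it is an assumption, not the Smooks implementation, and BeanIdSketch, Id, IdStore, and Context are hypothetical stand-ins for BeanId, BeanIdStore, and BeanContext.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of index-based bean keys; not Smooks classes.
final class BeanIdSketch {

    /** Stand-in for a bean key: a name plus a fixed slot index. */
    static final class Id {
        final String name;
        final int index;

        Id(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Application-scoped store: a name is registered once and keeps its index. */
    static final class IdStore {
        private final Map<String, Id> ids = new HashMap<>();

        synchronized Id register(String name) {
            Id id = ids.get(name);
            if (id == null) {
                id = new Id(name, ids.size());
                ids.put(name, id);
            }
            return id;
        }

        synchronized Id get(String name) {
            return ids.get(name);
        }
    }

    /** Execution-scoped context: beans live in slots addressed by the key's index. */
    static final class Context {
        private final List<Object> slots = new ArrayList<>();

        void put(Id id, Object bean) {
            while (slots.size() <= id.index) {
                slots.add(null);
            }
            slots.set(id.index, bean);
        }

        Object get(Id id) {
            return id.index < slots.size() ? slots.get(id.index) : null;
        }
    }

    public static void main(String[] args) {
        IdStore store = new IdStore();          // shared, like an application-level store
        Id orderId = store.register("order");   // typically done once, at initialization

        Context context = new Context();        // one per execution
        context.put(orderId, "order bean");
        System.out.println(context.get(orderId));
    }
}
```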
A number of pre-installed beans are available in the bean context at runtime:
The following are examples of how each of these would be used in a FreeMarker template.
${PUUID.execContext}
${PUUID.random}
${PTIME.startMillis}
${PTIME.startNanos}
${PTIME.startDate}
${PTIME.nowMillis}
${PTIME.nowNanos}
${PTIME.nowDate}
Global configuration settings are, as the name implies, configuration options that can be set once and be applied to all resources in a configuration.
Smooks supports two types of globals, default properties and global parameters:
Global Configuration Parameters: Every <resource-config> in a Smooks configuration can specify <param> elements for configuration parameters. These parameter values are available at runtime through the ResourceConfig, or are reflectively injected through the @Inject annotation. Global Configuration Parameters are parameters that are defined centrally (see below) and are accessible to all runtime components via the ExecutionContext (vs ResourceConfig). More on this in the following sections.
Default Properties: Specify default values for <resource-config> attributes. These defaults are automatically applied to ResourceConfigs when their corresponding <resource-config> does not specify the attribute. More on this in the following section.
Global properties differ from the default properties in that they are not specified on the root element and are not automatically applied to resources.
Global parameters are specified in a <params> element:
<params>
    <param name="xyz.param1">param1-val</param>
</params>
Global Configuration Parameters are accessible via the ExecutionContext, e.g.:
public void visitAfter(Element element, ExecutionContext executionContext) {
    String param1 = executionContext.getConfigParameter("xyz.param1", "defaultValueABC");
    ....
}
Default properties are properties that can be set on the root element of a Smooks configuration and are applied to all resource configurations in the smooks-conf.xml file. For example, if you have a resource configuration file in which all the resource configurations have the same target profile, you could specify default-target-profile=order to save specifying the profile on every resource configuration:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      default-target-profile="order">

    <resource-config>
        <resource>com.acme.VisitorA</resource>
        ...
    </resource-config>

    <resource-config>
        <resource>com.acme.VisitorB</resource>
        ...
    </resource-config>

</smooks-resource-list>
The following default configuration options are available:
default-target-profile: Default target profile that will be applied to all resources in the Smooks configuration file where a target-profile is not defined.
default-condition-ref: Refers to a global condition by the condition’s id. This condition is applied to resources that define an empty "condition" element (i.e. <condition/>) that does not reference a globally defined condition.
Smooks configurations are easily modularized through use of the <import> element. This allows you to split Smooks configurations into multiple reusable configuration files and then compose the top level configurations using the <import> element, e.g.
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <import file="bindings/order-binding.xml" />
    <import file="templates/order-template.xml" />

</smooks-resource-list>
You can also inject replacement tokens into the imported configuration by using <param> sub-elements on the <import>. This allows you to make tweaks to the imported configuration.
<!-- Top level configuration... -->
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

    <import file="bindings/order-binding.xml">
        <param name="orderRootElement">order</param>
    </import>

</smooks-resource-list>
<!-- Imported parameterized bindings/order-binding.xml configuration... -->
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:jb="https://www.smooks.org/xsd/smooks/javabean-1.6.xsd">

    <jb:bean beanId="order" class="org.acme.Order" createOnElement="@orderRootElement@">
        .....
    </jb:bean>

</smooks-resource-list>
Note how the replacement token injection points are specified using @tokenname@.
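The effect of @tokenname@ replacement can be sketched as a simple text substitution. This is an illustration of the idea, not Smooks's import processing: ImportTokenSketch and resolve are hypothetical names, and leaving unknown tokens untouched is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of @tokenname@ substitution; not a Smooks class.
final class ImportTokenSketch {

    // A token is a name wrapped in '@' characters, e.g. @orderRootElement@.
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.]+)@");

    /** Replaces every known token in the imported configuration text with its parameter value. */
    static String resolve(String config, Map<String, String> params) {
        Matcher matcher = TOKEN.matcher(config);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = params.get(matcher.group(1));
            // Assumption of this sketch: tokens without a matching parameter are left as-is.
            String replacement = value != null ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String imported = "<jb:bean beanId=\"order\" createOnElement=\"@orderRootElement@\">";
        System.out.println(resolve(imported, Map.of("orderRootElement", "order")));
    }
}
```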
When using Smooks standalone you are in full control of the type of output that Smooks produces since you specify it by passing a certain Sink to the filter method. But when integrating Smooks with other frameworks (JBossESB, Mule, Camel, and others) this needs to be specified inside the framework’s configuration. Starting with version 1.4 of Smooks you can declare the data types that Smooks produces and you can use the Smooks API to retrieve the Sink(s) that Smooks exports.
To declare the type of sink that Smooks produces you use the 'exports' element as shown below:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:exports>
        <core:result type="org.smooks.io.sink.JavaSink"/>
    </core:exports>
</smooks-resource-list>
The exports element declares the results that are produced by this Smooks configuration. An exports element can contain one or more result elements. A framework that uses Smooks could then perform filtering like this:
// Get the Exported types that were configured.
Exports exports = Exports.getExports(smooks.getApplicationContext());

if (exports.hasExports()) {
    // Create the instances of the Sink types.
    // (Only the types, i.e. the Class type, are declared in the 'type' attribute.)
    Sink[] sinks = exports.createSinks();
    smooks.filterSource(executionContext, getSource(exchange), sinks);
    // The Sink(s) will now be populated by the Smooks filtering process and
    // available to the framework in question.
}
There might also be cases where you only want a portion of the result extracted and returned. You can use the ‘extract’ attribute to specify this:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd">
    <core:exports>
        <core:result type="org.smooks.io.sink.JavaSink" extract="orderBean"/>
    </core:exports>
</smooks-resource-list>
The extract attribute is intended to be used when you are only interested in a sub-section of a produced result. In the example above we are saying that we only want the object named orderBean to be exported. The other contents of the JavaSink will be ignored. Another example where you might want to use this kind of extracting could be when you only want a ValidationSink of a certain type, for example to only return validation errors.
Below is an example of using the extracts option from an embedded framework:
// Get the Exported types that were configured.
Exports exports = Exports.getExports(smooks.getApplicationContext());

if (exports.hasExports()) {
    // Create the instances of the Sink types.
    // (Only the types, i.e. the Class type, are declared in the 'type' attribute.)
    Sink[] sinks = exports.createSinks();
    smooks.filterSource(executionContext, getSource(exchange), sinks);
    List<Object> objects = Exports.extractSinks(sinks, exports);
    // Now make the object available to the framework that this code is running:
    // Camel, JBossESB, Mule, etc...
}
As with any software, performance can be one of the first things to suffer when Smooks is configured or used incorrectly.
Cache and reuse the Smooks object. Initialization of Smooks takes some time and therefore it is important that it is reused.
Pool reader instances where possible. This can result in a huge performance boost, as some readers are very expensive to create.
If possible, use SAX NG filtering. However, you need to check that all Smooks cartridges in use are SAX NG compatible. SAX NG processing is faster than DOM processing and has a consistently small memory footprint. It is especially recommended for processing large messages. See the Filtering Process Selection (DOM or SAX?) section. SAX NG is the default filter since Smooks 2.
Turn off debug logging. Smooks performs some intensive debug logging in parts of the code. This can result in significant additional processing overhead and lower throughput. Also remember that NOT having your logging configured (at all) may result in debug log statements being executed!
Contextual selectors can have a negative effect on performance, e.g. evaluating a match for a selector like "a/b/c/d/e" requires more processing than a selector like "d/e". There will be situations where your data model requires deep selectors, but where it does not, you should try to optimize them for the sake of performance.
Unit testing with Smooks is simple:
public class MyMessageTransformTest {

    @Test
    public void test_transform() throws Exception {
        Smooks smooks = new Smooks(getClass().getResourceAsStream("smooks-config.xml"));

        try {
            Source source = new StreamSource(getClass().getResourceAsStream("input-message.xml"));
            StringSink sink = new StringSink();

            smooks.filterSource(source, sink);

            // compare the expected xml with the transformation result.
            XMLUnit.setIgnoreWhitespace(true);
            XMLAssert.assertXMLEqual(new InputStreamReader(getClass().getResourceAsStream("expected.xml")), new StringReader(sink.getResult()));
        } finally {
            smooks.close();
        }
    }
}
The test case above uses XMLUnit.
The following maven dependency was used for xmlunit in the above test:
<dependency>
    <groupId>xmlunit</groupId>
    <artifactId>xmlunit</artifactId>
    <version>1.1</version>
</dependency>
One of the main features introduced in Smooks v1.0 is the ability to process huge messages (GBs in size). Smooks supports the following types of processing for huge messages:
One-to-One Transformation: This is the process of transforming a huge message from its source format (e.g. XML), to a huge message in a target format e.g. EDI, CSV, XML etc.
Splitting & Routing: Splitting of a huge message into smaller (more consumable) messages in any format (EDI, XML, Java, etc…) and Routing of those smaller messages to a number of different destination types (filesystem, JMS, database).
Persistence: Persisting the components of the huge message to a database, from where they can be more easily queried and processed. Within Smooks, we consider this to be a form of Splitting and Routing (routing to a database).
All of the above is possible without writing any code (i.e. in a declarative manner). Typically, any of the above types of processing would have required writing quite a bit of ugly/unmaintainable code. It might also have been implemented as a multi-stage process where the huge message is split into smaller messages (stage #1) and then each smaller message is processed in turn to persist, route, etc… (stage #2). This would all be done in an effort to make that ugly/unmaintainable code a little more maintainable and reusable. With Smooks, most of these use-cases can be handled without writing any code. As well as that, they can also be handled in a single pass over the source message, splitting and routing in parallel (plus routing to multiple destinations of different types and in different formats).
Note | Be sure to read the section on Java Binding. |
If the requirement is to process a huge message by transforming it into a single message of another format, the easiest mechanism with Smooks is to apply multiple FreeMarker templates to the Source message Event Stream, outputting to a Smooks.filterSource sink stream.
This can be done in one of 2 ways with FreeMarker templating, depending on the type of model that’s appropriate:
Using FreeMarker + NodeModels for the model.
Using FreeMarker + a Java Object model for the model. The model can be constructed from data in the message, using the Javabean Cartridge.
Option #1 above is obviously the option of choice, if the tradeoffs are OK for your use case. Please see the FreeMarker Templating docs for more details.
The following images show an <order> message, as well as the <salesorder> message to which we need to transform the <order> message:
Imagine a situation where the <order> message contains millions of <order-item> elements. Processing a huge message in this way with Smooks and FreeMarker (using NodeModels) is quite straightforward. Because the message is huge, we need to identify multiple NodeModels in the message, such that the runtime memory footprint is as low as possible. We cannot process the message using a single model, as the full message is just too big to hold in memory. In the case of the <order> message, there are 2 models, one for the main <order> data (blue highlight) and one for the <order-item> data (beige highlight):
So in this case, the most data that will be in memory at any one time is the main order data, plus one of the order-items. Because the NodeModels are nested, Smooks makes sure that the order data NodeModel never contains any of the data from the order-item NodeModels. Also, as Smooks filters the message, the order-item NodeModel will be overwritten for every order-item (i.e. they are not collected). See SAX NG.
Configuring Smooks to capture multiple NodeModels for use by the FreeMarker templates is just a matter of configuring the DomModelCreator visitor, targeting it at the root node of each of the models. Note again that Smooks also makes this available to SAX filtering (the key to processing huge messages). The Smooks configuration for creating the NodeModels for this message is:
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                      xmlns:ftl="https://www.smooks.org/xsd/smooks/freemarker-2.0.xsd">

    <!--
       Create 2 NodeModels. One high level model for the "order"
       (header, etc...) and then one for the "order-item" elements...
    -->
    <resource-config selector="order,order-item">
        <resource>org.smooks.engine.resource.visitor.dom.DomModelCreator</resource>
    </resource-config>

    <!-- FreeMarker templating configs to be added below... -->
Now the FreeMarker templates need to be added. We need to apply 3 templates in total:
A template to output the order "header" details, up to but not including the order items.
A template for each of the order items, to generate the <item> elements in the <salesorder>.
A template to close out the message.
With Smooks, we implement this by defining 2 FreeMarker templates. One to cover #1 and #3 (combined) above, and a second to cover the <item> elements.
The first FreeMarker template is targeted at the <order-items> element and looks as follows:
    <ftl:freemarker applyOnElement="order-items">
        <ftl:template><!--<salesorder>
    <details>
        <orderid>${order.@id}</orderid>
        <customer>
            <id>${order.header.customer.@number}</id>
            <name>${order.header.customer}</name>
        </customer>
    </details>
    <itemList>
        <?TEMPLATE-SPLIT-PI?>
    </itemList>
</salesorder>-->
        </ftl:template>
    </ftl:freemarker>
You will notice the `<?TEMPLATE-SPLIT-PI?>` processing instruction. This tells Smooks where to split the template, outputting the first part of the template at the start of the <order-items> element, and the other part at the end of the <order-items> element. The <item> element template (the second template) will be output in between.
The second FreeMarker template is very straightforward. It simply outputs the <item> elements at the end of every <order-item> element in the source message:
    <ftl:freemarker applyOnElement="order-item">
        <ftl:template><!-- <item>
    <id>${.vars["order-item"].@id}</id>
    <productId>${.vars["order-item"].product}</productId>
    <quantity>${.vars["order-item"].quantity}</quantity>
    <price>${.vars["order-item"].price}</price>
</item>-->
        </ftl:template>
    </ftl:freemarker>

</smooks-resource-list>
Because the second template fires on the end of the <order-item> elements, it effectively generates output into the location of the <?TEMPLATE-SPLIT-PI?> Processing Instruction in the first template. Note that the second template could have also referenced data in the "order" NodeModel.
And that’s it! This is available as a runnable example in the Tutorials section.
This approach to performing a One-to-One Transformation of a huge message works simply because the only objects in memory at any one time are the order header details and the current <order-item> details (in the Virtual Object Model). Obviously it can’t work if the transformation is so obscure as to always require full access to all the data in the source message, e.g. if the message needs to have all the order items reversed in order (or sorted). In such a case, however, you do have the option of routing the order details and items to a database and then using the database’s storage, query and paging features to perform the transformation.
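The split-template idea can be sketched in a single streaming pass using the JDK's StAX reader. This is an analogy, not the Smooks or FreeMarker implementation: SplitTemplateSketch and transform are hypothetical names, the templates are simplified format strings, and values are written without XML escaping for brevity. Only the order id and the current item are held at any time.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of splitting an outer template around per-item output.
final class SplitTemplateSketch {

    // Simplified outer template; the marker separates the head part from the tail part.
    private static final String OUTER =
            "<salesorder><details><orderid>%s</orderid></details>"
                    + "<itemList><?TEMPLATE-SPLIT-PI?></itemList></salesorder>";
    private static final String ITEM = "<item><id>%s</id><productId>%s</productId></item>";

    static String transform(String xml) {
        String[] outer = OUTER.split("<\\?TEMPLATE-SPLIT-PI\\?>", 2);
        StringBuilder out = new StringBuilder();
        String orderId = "";
        String itemId = "";
        String product = "";
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("order")) {
                        orderId = reader.getAttributeValue(null, "id");
                    } else if (name.equals("order-items")) {
                        out.append(String.format(outer[0], orderId)); // head part, at the start
                    } else if (name.equals("order-item")) {
                        itemId = reader.getAttributeValue(null, "id"); // overwrites the previous item
                    } else if (name.equals("product")) {
                        product = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("order-item")) {
                        out.append(String.format(ITEM, itemId, product)); // one item per fragment end
                    } else if (name.equals("order-items")) {
                        out.append(outer[1]); // tail part, at the end
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<order id=\"332\"><order-items>"
                + "<order-item id=\"1\"><product>1</product></order-item>"
                + "<order-item id=\"2\"><product>2</product></order-item>"
                + "</order-items></order>";
        System.out.println(transform(xml));
    }
}
```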
Smooks supports a number of options when it comes to splitting and routing fragments. The ability to split the stream into fragments and route these fragments to different endpoints (File, JMS, etc…) is a fundamental capability. Smooks improves this capability with the following features:
Basic Fragment Splitting: basic splitting means that no fragment transformation happens prior to routing. Basic splitting and routing involves defining the XPath of the fragment to be split out and defining a routing component (e.g., Apache Camel) to route that unmodified split fragment.
Complex Fragment Splitting: basic fragment splitting works for many use cases and is what most splitting and routing solutions offer. Smooks extends the basic splitting capabilities by allowing you to perform transformations on the split fragment data before routing is applied. For example, merging the customer-details order information into each order-item before routing the split order-item fragment.
In-Flight Stream Splitting & Routing (Huge Message Support): Smooks is able to process gigabyte streams because it can perform in-flight event routing; events are not accumulated when the max.node.depth parameter is left unset.
Multiple Splitting and Routing: conditionally split and route multiple fragments (in different formats: XML, EDI, POJOs, etc…) to different endpoints in a single filtering pass of the source. One could route an OrderItem Java instance to the HighValueOrdersValidation JMS queue for order items with a value greater than $1,000 and route all order items as XML/JSON to an HTTP endpoint for logging.
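The conditional, multi-destination routing described above can be sketched in plain Java. This is a conceptual illustration, not Smooks routing configuration: RoutingSketch, OrderItem, and Route are hypothetical names, and in-memory lists stand in for the JMS queue and the HTTP endpoint. Fragments are pulled from an iterator one at a time, so nothing is accumulated by the router itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration of conditional routing to several endpoints in one pass.
final class RoutingSketch {

    static final class OrderItem {
        final int id;
        final int quantity;
        final double price;

        OrderItem(int id, int quantity, double price) {
            this.id = id;
            this.quantity = quantity;
            this.price = price;
        }

        double value() {
            return quantity * price;
        }
    }

    /** A destination plus the condition under which a fragment is sent to it. */
    static final class Route {
        final Predicate<OrderItem> condition;
        final Consumer<OrderItem> endpoint;

        Route(Predicate<OrderItem> condition, Consumer<OrderItem> endpoint) {
            this.condition = condition;
            this.endpoint = endpoint;
        }
    }

    /** Offers each fragment to every route as it arrives; a fragment may match several routes. */
    static void route(Iterator<OrderItem> fragments, List<Route> routes) {
        while (fragments.hasNext()) {
            OrderItem item = fragments.next();
            for (Route route : routes) {
                if (route.condition.test(item)) {
                    route.endpoint.accept(item);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> highValueQueue = new ArrayList<>(); // stand-in for a JMS queue
        List<Integer> auditLog = new ArrayList<>();       // stand-in for an HTTP logging endpoint

        List<Route> routes = Arrays.asList(
                new Route(item -> item.value() > 1000, item -> highValueQueue.add(item.id)),
                new Route(item -> true, item -> auditLog.add(item.id)));

        route(Arrays.asList(new OrderItem(1, 2, 8.80), new OrderItem(2, 3, 500.00)).iterator(), routes);
        System.out.println("high value: " + highValueQueue + ", audit: " + auditLog);
    }
}
```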
All existing Smooks functionality (Java Binding, EDI processing, etc…) is built through extension of a number of well-defined APIs. We will look at these APIs in the coming sections.
The main extension points/APIs in Smooks are:
Reader APIs: Those for processing Source/Input data (Readers) so as to make it consumable by other Smooks components as a series of well defined hierarchical events (based on the SAX event model) for all of the message fragments and sub-fragments.
Visitor APIs: Those for consuming the message fragment SAX events produced by a source/input reader.
Another very important aspect of writing Smooks extensions is how these components are configured. Because this is common to all Smooks components, we will look at this first.
All Smooks components are configured in exactly the same way. As far as the Smooks Core code is concerned, all Smooks components are "resources" and are configured via a ResourceConfig instance, which we talked about in earlier sections.
Smooks provides mechanisms for constructing namespace (XSD) specific XML configurations for components, but the most basic configuration (and the one that maps directly to the ResourceConfig class) is the basic XML configuration from the base configuration namespace (https://www.smooks.org/xsd/smooks-2.0.xsd).
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <resource-config selector="">
        <resource></resource>
        <param name=""></param>
    </resource-config>
</smooks-resource-list>
Where:
The selector attribute is the mechanism by which the resource is "selected", e.g. it can be an XPath for a visitor. We’ll see more of this in the coming sections.
The resource element is the actual resource. This can be a Java class name or some other form of resource (such as a template). For the purposes of this section, however, let’s just assume the resource to be a Java class name.
The param elements are configuration parameters for the resource defined in the resource element.
Smooks takes care of all the details of creating the runtime representation of the resource (e.g. constructing the class named in the resource element) and injecting all the configuration parameters. It also works out what the resource type is, and from that, how to interpret things like the selector e.g., if the resource is a visitor instance, it knows the selector is an XPath, selecting a Source message fragment.
After your component has been created, you need to configure it with the <param> element details. This is done using the @Inject annotation.
The Inject annotation reflectively injects the named parameter (from the <param> elements) having the same name as the annotated property itself (the name can actually be different, but by default, it matches against the name of the component property).
Suppose we have a component as follows:
public class DataSeeder {

    @Inject
    private File seedDataFile;

    public File getSeedDataFile() {
        return seedDataFile;
    }

    // etc...
}
We configure this component in Smooks as follows:
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">
    <resource-config selector="dataSeeder">
        <resource>com.acme.DataSeeder</resource>
        <param name="seedDataFile">./seedData.xml</param>
    </resource-config>
</smooks-resource-list>
This annotation eliminates a lot of noisy code from your component because it:
Handles decoding of the value before setting it on the annotated component property. Smooks provides type converters for all the main types (Integer, Double, File, Enums, etc…), but you can implement and use a custom TypeConverter where the out-of-the-box converters don’t cover specific decoding requirements. Smooks will automatically use your custom converter if it is registered. See the TypeConverter Javadocs for details on registering a TypeConverter implementation such that Smooks will automatically locate it for converting a specific data type.
Supports enum constraints for the injected property, generating a configuration exception where the configured value is not one of the defined choice values. For example, you may have a property which has a constrained value set of "ON" and "OFF". You can use an enum for the property type to constrain the value, raise exceptions, etc…:
@Inject
private OnOffEnum foo;
Can specify default property values:
@Inject
private Boolean foo = true;
Can specify whether the property is optional:
@Inject
private java.util.Optional<Boolean> foo;
By default, all properties are required but setting a default implicitly marks the property as being optional.
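The behaviour described above (decoding by field type, enum constraints, and defaults implying optionality) can be sketched with plain reflection. This is not the Smooks injection code: InjectSketch, the @ConfigParam annotation, configure, and convert are hypothetical names, and only a handful of types are converted.

```java
import java.io.File;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical illustration of reflective parameter injection; not Smooks classes.
final class InjectSketch {

    /** Hypothetical marker standing in for an injection annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ConfigParam { }

    enum OnOff { ON, OFF }

    static final class DataSeeder {
        @ConfigParam File seedDataFile;        // required: no default value
        @ConfigParam OnOff mode;               // required and constrained to the enum constants
        @ConfigParam Boolean verbose = true;   // optional: the default is kept when not configured
    }

    /** Sets each annotated field from the parameter with the same name as the field. */
    static void configure(Object component, Map<String, String> params) {
        for (Field field : component.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(ConfigParam.class)) {
                continue;
            }
            field.setAccessible(true);
            try {
                String raw = params.get(field.getName());
                if (raw != null) {
                    field.set(component, convert(raw, field.getType()));
                } else if (field.get(component) == null) {
                    throw new IllegalArgumentException("Missing required parameter: " + field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Decodes the raw string into the field's type; unknown enum constants are rejected. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object convert(String raw, Class<?> type) {
        if (type == String.class) return raw;
        if (type == File.class) return new File(raw);
        if (type == Boolean.class) return Boolean.valueOf(raw);
        if (type == Integer.class) return Integer.valueOf(raw);
        if (type.isEnum()) return Enum.valueOf((Class) type, raw);
        throw new IllegalArgumentException("No converter for " + type.getName());
    }

    public static void main(String[] args) {
        DataSeeder seeder = new DataSeeder();
        configure(seeder, Map.of("seedDataFile", "./seedData.xml", "mode", "ON"));
        System.out.println(seeder.seedDataFile + " " + seeder.mode + " " + seeder.verbose);
    }
}
```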
The Inject annotation is great for configuring your component with simple values, but sometimes your component needs more involved configuration for which we need to write some "initialization" code. For this, Smooks provides @PostConstruct.
On the other side of this, there are times when we need to undo work performed during initialization when the associated Smooks instance is being discarded (garbage collected), e.g. to release some resources acquired during initialization. For this, Smooks provides @PreDestroy.
The basic initialization/un-initialization sequence can be described as follows:
smooks = new Smooks(..);

// Initialize all annotated components
@PostConstruct

// Use the smooks instance through a series of filterSource invocations...
smooks.filterSource(...);
smooks.filterSource(...);
smooks.filterSource(...);
... etc ...

smooks.close();

// Uninitialize all annotated components
@PreDestroy
In the following example, let’s assume we have a component that opens multiple connections to a database on initialization and then needs to release all those database resources when we close the Smooks instance.
public class MultiDataSourceAccessor {

    @Inject
    private File dataSourceConfig;

    Map<String, Datasource> datasources = new HashMap<String, Datasource>();

    @PostConstruct
    public void createDataSources() {
        // Add DS creation code here....
        // Read the dataSourceConfig property to read the DS configs...
    }

    @PreDestroy
    public void releaseDataSources() {
        // Add DS release code here....
    }

    // etc...
}
Notes:
@PostConstruct and @PreDestroy methods must be public, zero-arg methods.
@Inject properties are all initialized before the first @PostConstruct method is called. Therefore, you can use @Inject component properties as input to the initialization process.
@PreDestroy methods are all called in response to a call to the Smooks.close method.
Smooks supports a mechanism for defining custom configuration namespaces for components. This allows you to support custom, XSD based (validatable) configurations for your components rather than treating them all as vanilla Smooks resources via the base configuration.
The basic process involves:
Writing a configuration XSD for your component that extends the base https://www.smooks.org/xsd/smooks-2.0.xsd configuration namespace. This XSD must be supplied on the classpath with your component. It must be located in the /META-INF folder and have the same path as the namespace URI. For example, if your extended namespace URI is http://www.acme.com/schemas/smooks/acme-core-1.0.xsd, then the physical XSD file must be supplied on the classpath in "/META-INF/schemas/smooks/acme-core-1.0.xsd".
Writing a Smooks configuration namespace mapping configuration file that maps the custom namespace configuration into a ResourceConfig instance. This file must be named (by convention) based on the name of the namespace it is mapping and must be physically located on the classpath in the same folder as the XSD. Extending the above example, the Smooks mapping file would be "/META-INF/schemas/smooks/acme-core-1.0.xsd-smooks.xml". Note the "-smooks.xml" postfix.
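The naming convention in the two steps above can be expressed as a small helper. This is a sketch for checking where your files should live on the classpath; the `NamespaceResourcePaths` class is hypothetical and is not part of the Smooks API.

```java
import java.net.URI;

final class NamespaceResourcePaths {

    /** Classpath location of the XSD: "/META-INF" followed by the path of the namespace URI. */
    static String xsdPath(String namespaceUri) {
        return "/META-INF" + URI.create(namespaceUri).getPath();
    }

    /** Classpath location of the mapping file: the XSD path plus the "-smooks.xml" postfix. */
    static String mappingPath(String namespaceUri) {
        return xsdPath(namespaceUri) + "-smooks.xml";
    }

    public static void main(String[] args) {
        String namespace = "http://www.acme.com/schemas/smooks/acme-core-1.0.xsd";
        System.out.println(xsdPath(namespace));
        System.out.println(mappingPath(namespace));
    }
}
```

For the acme-core example, the two methods return the same two paths quoted in the steps above.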
The easiest way to get familiar with this mechanism is by looking at existing extended namespace configurations within the Smooks code itself. All Smooks components (including e.g. the Java Binding functionality) use this mechanism for defining their configurations. Smooks Core itself defines a number of extended configuration namespaces, as can be seen in the source.
Implementing and configuring a new Source Reader for Smooks is straightforward. The Smooks specific parts of the process are easy and are not really the issue. The level of effort involved is a function of the complexity of the Source data format for which you are implementing the reader.
Implementing a Reader for your custom data format immediately opens all Smooks capabilities to that data format e.g. Java Binding, Templating, Persistence, Validation, Splitting & Routing, etc… So a relatively small investment can yield a quite significant return. The only requirement, from a Smooks perspective, is that the Reader implements the standard org.xml.sax.XMLReader interface from the JDK. However, if you want to be able to configure the Reader implementation, it needs to implement the org.smooks.api.resource.reader.SmooksXMLReader interface (which is just an extension of org.xml.sax.XMLReader). So, you can easily use (or extend) an existing org.xml.sax.XMLReader implementation, or implement a new Reader from scratch.
Let’s now look at a simple example of implementing a Reader for use with Smooks. In this example, we will implement a Reader that can read a stream of Comma Separated Value (CSV) records, converting the CSV stream into a stream of SAX events that can be processed by Smooks, allowing you to do all the things Smooks allows (Java Binding, etc…).
We start by implementing the basic Reader class:
    public class MyCSVReader implements SmooksXMLReader {
        // Implement all of the XMLReader methods...
    }
Two methods from the XMLReader interface are of particular interest:
setContentHandler(ContentHandler): This method is called by Smooks Core. It sets the ContentHandler instance for the reader. The ContentHandler instance methods are called from inside the parse(InputSource) method.
parse(InputSource): This is the method that receives the Source data input stream, parses it (i.e. in the case of this example, the CSV stream) and generates the SAX event stream through calls to the ContentHandler instance supplied in the setContentHandler(ContentHandler) method.
We need to configure our CSV reader with the names of the fields associated with the CSV records. Configuring a custom reader implementation is the same as for any Smooks component, as described in the Configuring Smooks Components section above.
So focusing a little more closely on the above methods and our fields configuration:
    public class MyCSVReader implements SmooksXMLReader {

        private ContentHandler contentHandler;

        @Inject
        private String[] fields; // Auto decoded and injected from the "fields" <param> on the reader config.

        public void setContentHandler(ContentHandler contentHandler) {
            this.contentHandler = contentHandler;
        }

        public void parse(InputSource csvInputSource) throws IOException, SAXException {
            // TODO: Implement parsing of CSV Stream...
        }

        // Other XMLReader methods...
    }
So now we have our basic Reader implementation stub. We can start writing unit tests to test the new reader implementation.
First thing we need is some sample CSV input. Let's use a simple list of names:
    Tom,Fennelly
    Mike,Fennelly
    Mark,Jones
Second thing we need is a test Smooks configuration to configure Smooks with our MyCSVReader. As stated before, everything in Smooks is a resource and can be configured with the basic configuration. While this works fine, it’s a little noisy, so Smooks provides a basic configuration element specifically for the purpose of configuring a reader. The configuration for our test looks like the following:
    <?xml version="1.0"?>
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

        <reader class="com.acme.MyCSVReader">
            <params>
                <param name="fields">firstname,lastname</param>
            </params>
        </reader>

    </smooks-resource-list>
And of course we need the JUnit test class:
    public class MyCSVReaderTest extends TestCase {

        public void test() {
            Smooks smooks = new Smooks(getClass().getResourceAsStream("mycsvread-config.xml"));
            StringSink serializedCSVEvents = new StringSink();

            smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")), serializedCSVEvents);

            System.out.println(serializedCSVEvents);

            // TODO: add assertions, etc...
        }
    }
So now we have a basic setup with our custom Reader implementation, as well as a unit test that we can use to drive our development. Of course, our reader's parse method is not doing anything yet and our test class is not making any assertions, etc… So let's start implementing the parse method:
    public class MyCSVReader implements SmooksXMLReader {

        private ContentHandler contentHandler;

        @Inject
        private String[] fields; // Auto decoded and injected from the "fields" <param> on the reader config.

        public void setContentHandler(ContentHandler contentHandler) {
            this.contentHandler = contentHandler;
        }

        public void parse(InputSource csvInputSource) throws IOException, SAXException {
            BufferedReader csvRecordReader = new BufferedReader(csvInputSource.getCharacterStream());
            String csvRecord;

            // Send the start of message events to the handler...
            contentHandler.startDocument();
            contentHandler.startElement(XMLConstants.NULL_NS_URI, "message-root", "", new AttributesImpl());

            csvRecord = csvRecordReader.readLine();
            while (csvRecord != null) {
                String[] fieldValues = csvRecord.split(",");

                // perform checks...

                // Send the events for this record...
                contentHandler.startElement(XMLConstants.NULL_NS_URI, "record", "", new AttributesImpl());
                for (int i = 0; i < fields.length; i++) {
                    contentHandler.startElement(XMLConstants.NULL_NS_URI, fields[i], "", new AttributesImpl());
                    contentHandler.characters(fieldValues[i].toCharArray(), 0, fieldValues[i].length());
                    contentHandler.endElement(XMLConstants.NULL_NS_URI, fields[i], "");
                }
                contentHandler.endElement(XMLConstants.NULL_NS_URI, "record", "");

                csvRecord = csvRecordReader.readLine();
            }

            // Send the end of message events to the handler...
            contentHandler.endElement(XMLConstants.NULL_NS_URI, "message-root", "");
            contentHandler.endDocument();
        }

        // Other XMLReader methods...
    }
If you run the unit test class now, you should see the following output on the console (formatted):
    <message-root>
        <record>
            <firstname>Tom</firstname>
            <lastname>Fennelly</lastname>
        </record>
        <record>
            <firstname>Mike</firstname>
            <lastname>Fennelly</lastname>
        </record>
        <record>
            <firstname>Mark</firstname>
            <lastname>Jones</lastname>
        </record>
    </message-root>
After this, it is just a case of expanding the tests, hardening the reader implementation code, etc…
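One possible hardening step is to replace the "perform checks..." placeholder with a field-count check, since a short record would otherwise fail with an ArrayIndexOutOfBoundsException inside the loop. The sketch below uses only the JDK SAX API so it can be tested without Smooks. The `CsvRecordEmitter` and `RecordingHandler` names are hypothetical, and rejecting the record with a SAXException is just one of several reasonable policies.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class CsvRecordEmitter {

    /** Emits one "record" element; rejects records whose value count differs from the field names. */
    static void emitRecord(ContentHandler handler, String[] fields, String csvRecord) throws SAXException {
        String[] values = csvRecord.split(",", -1); // limit -1 keeps trailing empty values
        if (values.length != fields.length) {
            throw new SAXException("Expected " + fields.length + " fields but found "
                    + values.length + " in record: " + csvRecord);
        }
        handler.startElement("", "record", "record", new AttributesImpl());
        for (int i = 0; i < fields.length; i++) {
            handler.startElement("", fields[i], fields[i], new AttributesImpl());
            handler.characters(values[i].toCharArray(), 0, values[i].length());
            handler.endElement("", fields[i], fields[i]);
        }
        handler.endElement("", "record", "record");
    }

    /** Test helper that records the events it receives as flat strings. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("<" + localName + ">");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events.add(new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("</" + localName + ">");
        }
    }
}
```

A handler like `RecordingHandler` also gives the unit test something concrete to assert against, instead of printing the serialized events to the console.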
Now you can use your reader to perform all sorts of operations supported by Smooks. As an example, the following configuration could be used to bind the names into a List of PersonName objects:
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:jb="https://www.smooks.org/xsd/smooks/javabean-1.6.xsd">

        <reader class="com.acme.MyCSVReader">
            <params>
                <param name="fields">firstname,lastname</param>
            </params>
        </reader>

        <jb:bean beanId="peopleNames" class="java.util.ArrayList" createOnElement="message-root">
            <jb:wiring beanIdRef="personName" />
        </jb:bean>

        <jb:bean beanId="personName" class="com.acme.PersonName" createOnElement="message-root/record">
            <jb:value property="first" data="record/firstname" />
            <jb:value property="last" data="record/lastname" />
        </jb:bean>

    </smooks-resource-list>
And then a test for this configuration could look as follows:
    public class MyCSVReaderTest extends TestCase {

        public void test_java_binding() {
            Smooks smooks = new Smooks(getClass().getResourceAsStream("java-binding-config.xml"));
            JavaSink javaSink = new JavaSink();

            smooks.filterSource(new StreamSource(getClass().getResourceAsStream("names.csv")), javaSink);

            List<PersonName> peopleNames = (List<PersonName>) javaSink.getBean("peopleNames");

            // TODO: add assertions etc
        }
    }
For more on Java Binding, see the Java Binding section.
Tips:
Reader instances are never used concurrently. Smooks Core will create a new instance for every message, or will pool and reuse instances as per the readerPoolSize FilterSettings property.
If your Reader requires access to the Smooks ExecutionContext for the current filtering context, your Reader needs to implement the SmooksXMLReader interface.
If your Source data is a binary data stream your Reader must implement the StreamReader interface. See next section.
You can programmatically configure your reader (e.g. in your unit tests) using a GenericReaderConfigurator instance, which you then set on the Smooks instance.
While the basic configuration is fine, it’s possible to define a custom configuration namespace (XSD) for your custom CSV Reader implementation. This topic is not covered here. Review the source code to see the extended configuration namespace for the Reader implementations supplied with Smooks (out-of-the-box) e.g. the EDIReader, CSVReader, JSONReader, etc… From this, you should be able to work out how to do this for your own custom Reader.
Prior to Smooks v1.5, binary readers needed to implement the StreamReader interface. This is no longer a requirement. All XMLReader instances receive an InputSource (to their parse method) that contains an InputStream if the InputStream was provided in the StreamSource passed in the Smooks.filterSource method call. This means that all XMLReader instances are guaranteed to receive an InputStream if one is available, so there is no need to mark the XMLReader instance.
In Smooks v1.5 we tried to make it a little easier to implement a custom reader for reading flat file data formats. By flat file we mean "record" based data formats, where the data in the message is structured in flat records as opposed to a more hierarchical structure. Examples of this would be Comma Separated Value (CSV) and Fixed Length Field (FLF). The new API introduced in Smooks v1.5 should remove the complexity of the XMLReader API (as outlined above).
The API is composed of 2 interfaces plus a number of support classes. These interfaces work as a pair. They need to be implemented if you wish to use this API for processing a custom Flat File format not already supported by Smooks.
    /**
     * {@link RecordParser} factory class.
     * <p/>
     * Configurable by the Smooks {@link org.smooks.cdr.annotation.Configurator}
     */
    public interface RecordParserFactory {

        /**
         * Create a new Flat File {@link RecordParser} instance.
         * @return A new {@link RecordParser} instance.
         */
        RecordParser newRecordParser();
    }

    /**
     * Flat file Record Parser.
     */
    public interface RecordParser<T extends RecordParserFactory> {

        /**
         * Set the parser factory that created the parser instance.
         * @param factory The parser factory that created the parser instance.
         */
        void setRecordParserFactory(T factory);

        /**
         * Set the Flat File data source on the parser.
         * @param source The flat file data source.
         */
        void setDataSource(InputSource source);

        /**
         * Parse the next record from the message stream and produce a {@link Record} instance.
         * @return The records instance.
         * @throws IOException Error reading message stream.
         */
        Record nextRecord() throws IOException;
    }
The RecordParserFactory implementation is responsible for creating the RecordParser instances for the Smooks runtime. The RecordParserFactory is the class that Smooks configures, so this is where you place all your @Inject details. The created RecordParser instances are supplied with a reference to the RecordParserFactory instance that created them, so it is easy enough to provide them with access to the configuration via getters on the RecordParserFactory implementation.
The RecordParser implementation is responsible for parsing out each record (a Record contains a set of Fields) in the nextRecord() method. Each instance is supplied with the message stream via the setDataSource(InputSource) method shown in the interface above. The RecordParser should store a reference to this InputSource and use it in the nextRecord() method. A new instance of a given RecordParser implementation is created for each message being filtered by Smooks.
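To make the "store the stream, then pull one record per call" shape concrete, here is a simplified, JDK-only sketch. It deliberately does not implement the Smooks RecordParser interface: the `PipeDelimitedRecordParser` class is hypothetical, records are returned as plain lists of strings rather than Record instances, and returning null at end of stream is an assumption made for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a record parser over a pipe-delimited, line-per-record format. */
final class PipeDelimitedRecordParser {

    private final BufferedReader reader;

    /** Stores the message stream once; a new parser would be created for each message. */
    PipeDelimitedRecordParser(Reader source) {
        this.reader = new BufferedReader(source);
    }

    /** Returns the field values of the next non-blank line, or null at end of stream. */
    List<String> nextRecord() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                return Arrays.asList(line.split("\\|", -1)); // limit -1 keeps trailing empty fields
            }
        }
        return null;
    }
}
```

Because the parser holds per-message state (the reader position), creating one instance per message, as described above, keeps instances from interfering with each other.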
Configuring your implementation in the Smooks configuration is as simple as the following:
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:ff="https://www.smooks.org/xsd/smooks/flatfile-1.6.xsd">

        <ff:reader fields="first,second,third" parserFactory="com.acme.ARecordParserFactory">
            <params>
                <param name="aConfigParameter">aValue</param>
                <param name="bConfigParameter">bValue</param>
            </params>
        </ff:reader>

        <!-- Other Smooks configurations e.g. <jb:bean> configurations -->

    </smooks-resource-list>
The Flat File configuration also supports basic Java binding configurations, inlined in the reader configuration.
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:ff="https://www.smooks.org/xsd/smooks/flatfile-1.6.xsd">

        <ff:reader fields="firstname,lastname,gender,age,country" parserFactory="com.acme.PersonRecordParserFactory">
            <!-- The field names must match the property names on the Person class. -->
            <ff:listBinding beanId="people" class="com.acme.Person" />
        </ff:reader>

    </smooks-resource-list>
To execute this configuration:
    Smooks smooks = new Smooks(configStream);
    JavaSink sink = new JavaSink();

    smooks.filterSource(new StreamSource(messageReader), sink);

    List<Person> people = (List<Person>) sink.getBean("people");
Smooks also supports creation of Maps from the record set:
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:ff="https://www.smooks.org/xsd/smooks/flatfile-1.6.xsd">

        <ff:reader fields="firstname,lastname,gender,age,country" parserFactory="com.acme.PersonRecordParserFactory">
            <ff:mapBinding beanId="people" class="com.acme.Person" keyField="firstname" />
        </ff:reader>

    </smooks-resource-list>
The above configuration would produce a Map of Person instances, keyed by the "firstname" value of each Person. It would be executed as follows:
    Smooks smooks = new Smooks(configStream);
    JavaSink sink = new JavaSink();

    smooks.filterSource(new StreamSource(messageReader), sink);

    Map<String, Person> people = (Map<String, Person>) sink.getBean("people");

    Person tom = people.get("Tom");
    Person mike = people.get("Mike");
Virtual Models are also supported, so you can define the class attribute as a java.util.Map and have the record field values bound into Map instances, which are in turn added to a List or a Map.
VariableFieldRecordParser and VariableFieldRecordParserFactory are abstract implementations of the RecordParser and RecordParserFactory interfaces. They provide very useful base implementations for a Flat File Reader, providing base support for:
The utility Java binding configurations as outlined in the previous section.
Support for "variable field" records i.e. a flat file message that contains multiple record definitions. The different records are identified by the value of the first field in the record and are defined as follows: fields="book[name,author] | magazine[*]". Note the record definitions are pipe separated. "book" records will have a first field value of "book" while "magazine" records will have a first field value of "magazine". An asterisk ("*") as the field definition for a record tells the reader to generate the field names in the generated events (e.g. "field_0", "field_1", etc…).
The ability to read the next record chunk, with support for a simple record delimiter, or a regular expression (regex) pattern that marks the beginning of each record.
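The shape of the variable field definition string can be illustrated with a small parser. This is only a sketch of how such a definition breaks down, not the parsing code used by VariableFieldRecordParserFactory; the `VariableFieldDefinitions` class is hypothetical, and representing "*" as an empty list is a choice made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VariableFieldDefinitions {

    /**
     * Parses a definition such as "book[name,author] | magazine[*]" into a map of
     * record name to field names. An empty list stands for "*" (generated field names).
     */
    static Map<String, List<String>> parse(String definition) {
        Map<String, List<String>> records = new LinkedHashMap<>();
        for (String part : definition.split("\\|")) {
            String recordDef = part.trim();
            int open = recordDef.indexOf('[');
            if (open < 1 || !recordDef.endsWith("]")) {
                throw new IllegalArgumentException("Malformed record definition: " + recordDef);
            }
            String name = recordDef.substring(0, open).trim();
            String fields = recordDef.substring(open + 1, recordDef.length() - 1).trim();
            List<String> fieldNames = fields.equals("*")
                    ? Collections.<String>emptyList()
                    : Arrays.asList(fields.split("\\s*,\\s*"));
            records.put(name, fieldNames);
        }
        return records;
    }

    /** Generated name for the field at the given index of a "*" record, e.g. "field_0". */
    static String generatedFieldName(int index) {
        return "field_" + index;
    }
}
```

For the definition quoted above, the "book" entry carries the two declared field names, while the "magazine" entry is empty and its events would use generated names.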
The CSV and Regex readers are implemented using these abstract classes. See the csv-variable-record and flatfile-to-xml-regex examples. The Regex Reader implementation is also a good example that can be used as a basis for your own custom flat file reader.
Visitors are the workhorse of Smooks. Most of the out-of-the-box functionality in Smooks (Java binding, templating, persistence, etc…) was created by creating one or more visitors. Visitors often collaborate through the ExecutionContext and ApplicationContext objects, accomplishing a common goal by working together.
Important | Smooks treats all visitors as stateless objects. A visitor instance must be usable concurrently across multiple messages, that is, across multiple concurrent calls to the Smooks.filterSource method. All state associated with the current Smooks.filterSource execution must be stored in the ExecutionContext. For more details see the ExecutionContext and ApplicationContext section. |
The SAX NG visitor API is made up of a number of interfaces. These interfaces are based on the SAX events that a SaxNgVisitor implementation can capture and process. Depending on the use case being solved with the SaxNgVisitor implementation, you may need to implement one or all of these interfaces.
BeforeVisitor: Captures the startElement SAX event for the targeted fragment element:
    public interface BeforeVisitor extends Visitor {
        void visitBefore(Element element, ExecutionContext executionContext);
    }
ChildrenVisitor: Captures the character based SAX events for the targeted fragment element, as well as Smooks generated (pseudo) events corresponding to the startElement events of child fragment elements:
    public interface ChildrenVisitor extends Visitor {
        void visitChildText(CharacterData characterData, ExecutionContext executionContext) throws SmooksException, IOException;

        void visitChildElement(Element childElement, ExecutionContext executionContext) throws SmooksException, IOException;
    }
AfterVisitor: Captures the endElement SAX event for the targeted fragment element:
    public interface AfterVisitor extends Visitor {
        void visitAfter(Element element, ExecutionContext executionContext);
    }
As a convenience for those implementations that need to capture all the SAX events, the above three interfaces are pulled together into the ElementVisitor interface.
Illustrating these events using a piece of XML:
    <message>
        <target-fragment>      <--- BeforeVisitor.visitBefore
            Text!!             <--- ChildrenVisitor.visitChildText
            <child>            <--- ChildrenVisitor.visitChildElement
            </child>
        </target-fragment>     <--- AfterVisitor.visitAfter
    </message>
Note | Of course, the above is just an illustration of a Source message event stream and it looks like XML, but could be EDI, CSV, JSON, etc… Think of this as just an XML serialization of a Source message event stream, serialized as XML for easy reading. |
Element: As can be seen from the above SAX NG interfaces, the Element type is passed in all method calls. This object contains details about the targeted fragment element, including attributes and their values. We’ll discuss text accumulation and StreamSink writing in the coming sections.
SAX is a stream based processing model. It doesn’t create a Document Object Model (DOM) of any form. It doesn’t "accumulate" event data in any way. This is why it is a suitable processing model for processing huge message streams.
The Element will always contain attributes associated with the targeted element, but will not contain the fragment child text data, whose SAX events (ChildrenVisitor.visitChildText) occur between the BeforeVisitor.visitBefore and AfterVisitor.visitAfter events (see above illustration). The filter does not accumulate text events on the Element because, as already stated, that could result in a significant performance drain. Of course the downside to this is the fact that if your SaxNgVisitor implementation needs access to the text content of a fragment, you need to explicitly tell Smooks to accumulate text for the targeted fragment. This is done by stashing the text into a memento from within the ChildrenVisitor.visitChildText method and then restoring the memento from within the AfterVisitor.visitAfter method implementation of your SaxNgVisitor as shown below:
    public class MyVisitor implements ChildrenVisitor, AfterVisitor {

        @Override
        public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
            executionContext.getMementoCaretaker().stash(
                new TextAccumulatorMemento(new NodeVisitable(characterData.getParentNode()), this),
                textAccumulatorMemento -> textAccumulatorMemento.accumulateText(characterData.getTextContent()));
        }

        @Override
        public void visitChildElement(Element childElement, ExecutionContext executionContext) {
        }

        @Override
        public void visitAfter(Element element, ExecutionContext executionContext) {
            TextAccumulatorMemento textAccumulatorMemento = new TextAccumulatorMemento(new NodeVisitable(element), this);
            executionContext.getMementoCaretaker().restore(textAccumulatorMemento);
            String fragmentText = textAccumulatorMemento.getTextContent();

            // ... etc ...
        }
    }
It is a bit ugly having to implement ChildrenVisitor.visitChildText just to tell Smooks to accumulate the text events for the targeted fragment. For that reason, we have the @TextConsumer annotation that can be used to annotate your SaxNgVisitor implementation, removing the need to implement the ChildrenVisitor.visitChildText method:
    @TextConsumer
    public class MyVisitor implements AfterVisitor {

        public void visitAfter(Element element, ExecutionContext executionContext) {
            String fragmentText = element.getTextContent();

            // ... etc ...
        }
    }
Note that the complete fragment text will not be available until the AfterVisitor.visitAfter event.
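The reason the text is only complete at the end event can be seen with the plain JDK SAX API, independent of Smooks. The sketch below is an analogy rather than Smooks code: the `TextAccumulationSketch` class is hypothetical, it buffers character events between the start and end events of a named element, and for simplicity it does not handle targeted elements nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TextAccumulationSketch {

    /** Collects the complete text of every element with the given name. */
    static List<String> textOf(String xml, String elementName) throws Exception {
        List<String> results = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder buffer; // non-null only while inside a targeted element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    buffer = new StringBuilder(); // comparable to visitBefore: no text seen yet
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (buffer != null) {
                    buffer.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (buffer != null && qName.equals(elementName)) {
                    results.add(buffer.toString()); // comparable to visitAfter: text is complete
                    buffer = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return results;
    }
}
```

The buffer is the cost being discussed above: it is only allocated for elements that ask for it, which mirrors why Smooks accumulates text only for fragments that opt in.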
The Smooks.filterSource(Source, Sink) method can take one or more of a number of different Sink type implementations, one of which is the StreamSink class (see Multiple results/sinks). By default, Smooks will always serialize the full Source event stream as XML to any StreamSink instance provided to the Smooks.filterSource(Source, Sink) method.
So, if the Source provided to the Smooks.filterSource(Source, Sink) method is an XML stream and a StreamSink instance is provided as one of the Sink instances, the Source XML will be written out to the StreamSink unmodified, unless the Smooks instance is configured with one or more SaxNgVisitor implementations that modify one or more fragments. In other words, Smooks streams the Source in and back out again through the StreamSink instance. Default serialization can be turned on/off by configuring the filter settings.
If you want to modify the serialized form of one of the message fragments (i.e. "transform"), you need to implement a SaxNgVisitor to do so and target it at the message fragment using an XPath-like expression.
Note | Of course, you can also modify the serialized form of a message fragment using one of the out-of-the-box Templating components. These components are also SaxNgVisitor implementations. |
The key to implementing a SaxNgVisitor geared towards transforming the serialized form of a fragment is telling Smooks that the SaxNgVisitor implementation in question will be writing to the StreamSink. You need to tell Smooks this because Smooks supports targeting of multiple SaxNgVisitor implementations at a single fragment, but only one SaxNgVisitor is allowed to write to the StreamSink, per fragment. If a second SaxNgVisitor attempts to write to the StreamSink, a SAXWriterAccessException will result and you will need to modify your Smooks configuration.
In order to be "the one" that writes to the StreamSink, the SaxNgVisitor needs to acquire ownership of the Writer to the StreamSink. It does this by simply making a call to the ExecutionContext.getWriter().write(…) method from inside the BeforeVisitor.visitBefore method's implementation:
    public class MyVisitor implements ElementVisitor {

        @Override
        public void visitBefore(Element element, ExecutionContext executionContext) {
            Writer writer = executionContext.getWriter();

            // ... write the start of the fragment...
        }

        @Override
        public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
            Writer writer = executionContext.getWriter();

            // ... write the child text...
        }

        @Override
        public void visitChildElement(Element childElement, ExecutionContext executionContext) {
        }

        @Override
        public void visitAfter(Element element, ExecutionContext executionContext) {
            Writer writer = executionContext.getWriter();

            // ... close the fragment...
        }
    }
Note | If you need to control serialization of sub-fragments you need to reset the Writer instance so as to divert serialization of the sub-fragments. You do this by calling ExecutionContext.setWriter. |
Sometimes you know that the target fragment you are serializing/transforming will never have sub-fragments. In this situation, it’s a bit ugly to have to implement the BeforeVisitor.visitBefore method just to make a call to the ExecutionContext.getWriter().write(...) method to acquire ownership of the Writer. For this reason, we have the @StreamSinkWriter annotation. Used in combination with the @TextConsumer annotation, we can remove the need to implement all but the AfterVisitor.visitAfter method:
    @TextConsumer
    @StreamSinkWriter
    public class MyVisitor implements AfterVisitor {

        public void visitAfter(Element element, ExecutionContext executionContext) {
            Writer writer = executionContext.getWriter();

            // ... serialize to the writer ...
        }
    }
Smooks provides the DomSerializer class to make serializing of element data, as XML, a little easier. This class allows you to write a SaxNgVisitor implementation like:
    @StreamSinkWriter
    public class MyVisitor implements ElementVisitor {

        private DomSerializer domSerializer = new DomSerializer(true, true);

        @Override
        public void visitBefore(Element element, ExecutionContext executionContext) {
            try {
                domSerializer.writeStartElement(element, executionContext.getWriter());
            } catch (IOException e) {
                throw new SmooksException(e);
            }
        }

        @Override
        public void visitChildText(CharacterData characterData, ExecutionContext executionContext) {
            try {
                domSerializer.writeText(characterData, executionContext.getWriter());
            } catch (IOException e) {
                throw new SmooksException(e);
            }
        }

        @Override
        public void visitChildElement(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
        }

        @Override
        public void visitAfter(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
            try {
                domSerializer.writeEndElement(element, executionContext.getWriter());
            } catch (IOException e) {
                throw new SmooksException(e);
            }
        }
    }
You may have noticed that the arguments in the DomSerializer constructor are booleans. These are the closeEmptyElements and rewriteEntities args, which should be based on the closeEmptyElements and rewriteEntities filter settings, respectively. Smooks provides a small code optimization/assist here. If you annotate the DomSerializer field with @Inject, Smooks will create the DomSerializer instance and initialize it with the closeEmptyElements and rewriteEntities filter settings for the associated Smooks instance:
    @TextConsumer
    public class MyVisitor implements AfterVisitor {

        @Inject
        private DomSerializer domSerializer;

        public void visitAfter(Element element, ExecutionContext executionContext) throws SmooksException, IOException {
            try {
                domSerializer.writeStartElement(element, executionContext.getWriter());
                domSerializer.writeText(element, executionContext.getWriter());
                domSerializer.writeEndElement(element, executionContext.getWriter());
            } catch (IOException e) {
                throw new SmooksException(e);
            }
        }
    }
SaxNgVisitor configuration works in exactly the same way as any other Smooks component. See Configuring Smooks Components.
The most important thing to note with respect to configuring visitor instances is the fact that the selector attribute is interpreted as an XPath (like) expression. For more on this see the docs on Selectors.
Also note that visitors can be programmatically configured on a Smooks instance. Among other things, this is very useful for unit testing.
Let’s assume we have a very simple SaxNgVisitor implementation as follows:
    @TextConsumer
    public class ChangeItemState implements AfterVisitor {

        @Inject
        private DomSerializer domSerializer;

        @Inject
        private String newState;

        public void visitAfter(Element element, ExecutionContext executionContext) {
            element.setAttribute("state", newState);

            try {
                domSerializer.writeStartElement(element, executionContext.getWriter());
                domSerializer.writeText(element, executionContext.getWriter());
                domSerializer.writeEndElement(element, executionContext.getWriter());
            } catch (IOException e) {
                throw new SmooksException(e);
            }
        }
    }
Declaratively configuring ChangeItemState to fire on fragments having a status of "OK" is as simple as:
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd">

        <resource-config selector="order-items/order-item[@status = 'OK']">
            <resource>com.acme.ChangeItemState</resource>
            <param name="newState">COMPLETED</param>
        </resource-config>

    </smooks-resource-list>
Of course it would be really nice to be able to define a cleaner and more strongly typed configuration for the ChangeItemState component, such that it could be configured something like:
    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:order="http://www.acme.com/schemas/smooks/order.xsd">

        <order:changeItemState itemElement="order-items/order-item[@status = 'OK']" newState="COMPLETED" />

    </smooks-resource-list>
For details on this, see the section on Defining Custom Configuration Namespaces.
This visitor could also be programmatically configured on a Smooks instance as follows:
    Smooks smooks = new Smooks();

    // Assumes ChangeItemState also exposes a fluent setNewState(String) setter (not shown above)
    smooks.addVisitor(new ChangeItemState().setNewState("COMPLETED"), "order-items/order-item[@status = 'OK']");

    smooks.filterSource(new StreamSource(inReader), new StreamSink(outWriter));
One aspect of the visitor lifecycle has already been discussed in the general context of Smooks component initialization and un-initialization.
Smooks supports two additional component lifecycle events, specific to visitor components, via the PostExecutionLifecycle and PostFragmentLifecycle interfaces.
Visitor components implementing this lifecycle interface will be able to perform post Smooks.filterSource lifecycle operations.
    public interface PostExecutionLifecycle {
        void onPostExecution(ExecutionContext executionContext);
    }
The basic call sequence can be described as follows (note the onPostExecution calls):
    smooks = new Smooks(..);

    smooks.filterSource(...);
        ** VisitorXX.onPostExecution **
    smooks.filterSource(...);
        ** VisitorXX.onPostExecution **
    smooks.filterSource(...);
        ** VisitorXX.onPostExecution **
    ... etc ...
This lifecycle method allows you to ensure that resources scoped around the Smooks.filterSource execution lifecycle can be cleaned up for the associated ExecutionContext.
Visitor components implementing this lifecycle interface will be able to perform post AfterVisitor.visitAfter lifecycle operations.
    public interface PostFragmentLifecycle extends Visitor {
        void onPostFragment(Fragment<?> fragment, ExecutionContext executionContext);
    }
The basic call sequence can be described as follows (note the onPostFragment calls):
smooks.filterSource(...); <message> <target-fragment> <--- VisitorXX.visitBefore Text!! <--- VisitorXX.visitChildText <child> <--- VisitorXX.visitChildElement </child> </target-fragment> <--- VisitorXX.visitAfter ** VisitorXX.onPostFragment ** <target-fragment> <--- VisitorXX.visitBefore Text!! <--- VisitorXX.visitChildText <child> <--- VisitorXX.visitChildElement </child> </target-fragment> <--- VisitorXX.visitAfter ** VisitorXX.onPostFragment ** </message> VisitorXX.onPostExecutionsmooks.filterSource(...); <message> <target-fragment> <--- VisitorXX.visitBefore Text!! <--- VisitorXX.visitChildText <child> <--- VisitorXX.visitChildElement </child> </target-fragment> <--- VisitorXX.visitAfter ** VisitorXX.onPostFragment ** <target-fragment> <--- VisitorXX.visitBefore Text!! <--- VisitorXX.visitChildText <child> <--- VisitorXX.visitChildElement </child> </target-fragment> <--- VisitorXX.visitAfter ** VisitorXX.onPostFragment ** </message> VisitorXX.onPostExecution
This lifecycle method allows you to ensure that resources scoped around a single fragment execution of a SaxNgVisitor implementation can be cleaned up for the associated ExecutionContext.
ExecutionContext is scoped specifically around a single execution of a Smooks.filterSource method. All Smooks visitors must be stateless within the context of a single execution. A visitor is created once in Smooks and referenced across multiple concurrent executions of the Smooks.filterSource method. All data stored in an ExecutionContext instance will be lost on completion of the Smooks.filterSource execution. ExecutionContext is a parameter in all visit invocations.
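To illustrate why per-execution state belongs in the ExecutionContext rather than in visitor fields, here is a minimal sketch in plain Java. The ExecutionContext class, the CountingVisitor, and the runConcurrently driver are hypothetical stand-ins and not Smooks types; the sketch only assumes what the paragraph above states, namely one shared visitor instance and one context per execution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StatelessVisitorSketch {

    // Hypothetical stand-in for ExecutionContext: created per execution, never shared.
    static final class ExecutionContext {
        private final Map<String, Object> attributes = new HashMap<>();

        void put(String key, Object value) { attributes.put(key, value); }

        Object get(String key) { return attributes.get(key); }
    }

    // Shared visitor with no mutable fields; its running count lives in the context.
    static final class CountingVisitor {
        private static final String COUNT_KEY = "CountingVisitor.count";

        void visitAfter(String fragment, ExecutionContext executionContext) {
            executionContext.put(COUNT_KEY, count(executionContext) + 1);
        }

        int count(ExecutionContext executionContext) {
            Integer current = (Integer) executionContext.get(COUNT_KEY);
            return current == null ? 0 : current;
        }
    }

    // Runs several stand-in executions concurrently against ONE visitor instance
    // and returns the fragment count each execution observed.
    static List<Integer> runConcurrently(int executions, int fragmentsPerExecution) {
        CountingVisitor visitor = new CountingVisitor(); // created once, shared by all
        ExecutorService pool = Executors.newFixedThreadPool(executions);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < executions; i++) {
                futures.add(pool.submit(() -> {
                    ExecutionContext executionContext = new ExecutionContext(); // one per execution
                    for (int f = 0; f < fragmentsPerExecution; f++) {
                        visitor.visitAfter("fragment-" + f, executionContext);
                    }
                    return visitor.count(executionContext);
                }));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get());
            }
            return counts;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000));
    }
}
```

Because each context is confined to the execution that created it, every execution sees exactly its own fragment count. Had the counter been a field on the visitor, the concurrent executions would have interfered with one another.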
ApplicationContext is scoped around the associated Smooks instance: only one ApplicationContext instance exists per Smooks instance. This context object can be used to store data that needs to be maintained (and accessible) across multiple Smooks.filterSource executions. Components (any component, including SaxNgVisitor components) can gain access to their associated ApplicationContext instance by declaring an ApplicationContext class property and annotating it with @Inject:
    public class MySmooksResource {

        @Inject
        private ApplicationContext appContext;

        // etc...
    }
Note | Smooks instrumentation is available starting from v2.0.0-RC4. |
A Smooks application can be instrumented with JMX, which allows IT operations to monitor and manage Smooks from tools such as JConsole, Grafana, and New Relic. Apart from the MBeans provided by the Java runtime and third-party libraries, Smooks provides MBeans for:
resources (e.g., resource config parameters)
visitors (e.g., performance metrics, failure visit count, events visited)
reader pools (e.g., no. of active readers)
To activate instrumentation in Smooks, you need to:
Add the smooks-management module to your Java class path. The Maven coordinates for this module are:

    <dependency>
        <groupId>org.smooks</groupId>
        <artifactId>smooks-management</artifactId>
        <version>2.1.0</version>
    </dependency>
Declare management:instrumentationResource in the Smooks config to activate instrumentation:

    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                          xmlns:management="https://www.smooks.org/xsd/smooks/management-1.0.xsd">

        <management:instrumentationResource/>
        ...
        ...
    </smooks-resource-list>
management:instrumentationResource supports the following attributes:

Name | Description | Default
---|---|---
usePlatformMBeanServer | Whether to use the MBeanServer from this JVM. | true
mBeanServerDefaultDomain | The default JMX domain of the MBeanServer. | org.smooks
mBeanObjectDomainName | The JMX domain that all object names will use. | org.smooks
includeHostName | Whether to include the hostname in the MBean naming. | false
Optionally, declare the org.smooks.management.InstrumentationInterceptor interceptor to instrument visitors:

    <smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                          xmlns:core="https://www.smooks.org/xsd/smooks/smooks-core-1.6.xsd"
                          xmlns:management="https://www.smooks.org/xsd/smooks/management-1.0.xsd">

        <management:instrumentationResource/>

        <core:interceptors>
            <core:interceptor class="org.smooks.management.InstrumentationInterceptor"/>
        </core:interceptors>
        ...
        ...
    </smooks-resource-list>
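Once instrumentation is active, one way to check what has been registered is to query the MBean server from inside the same JVM. The sketch below uses only the standard javax.management API. It assumes the default settings from the attribute table, that is, the platform MBeanServer and the org.smooks object domain; the SmooksMBeanLister class and its listMBeanNames helper are hypothetical names introduced for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class SmooksMBeanLister {

    // Returns the canonical names of all MBeans registered under the given
    // JMX domain, sorted alphabetically. An unknown domain yields an empty set.
    static Set<String> listMBeanNames(MBeanServer server, String domain) {
        ObjectName pattern;
        try {
            pattern = new ObjectName(domain + ":*"); // matches every key property list in the domain
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid JMX domain: " + domain, e);
        }
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(pattern, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumes usePlatformMBeanServer=true and mBeanObjectDomainName=org.smooks (the defaults).
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (String name : listMBeanNames(server, "org.smooks")) {
            System.out.println(name);
        }
    }
}
```

If nothing is printed, either instrumentation has not been activated in this JVM or a non-default domain was configured.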
In addition to the Smooks classes that are already instrumented, you can instrument your own classes with the help of the annotations in the org.smooks.management.annotation package, which are listed below:
Annotating a class with org.smooks.management.annotation.ManagedResource effectively turns the class into an MBean, which allows an instance of the class to be registered with an MBean server. Here is an example of a class annotated with @ManagedResource:
    ...
    ...

    import org.smooks.api.ApplicationContext;
    import org.smooks.api.management.InstrumentationAgent;
    import org.smooks.api.management.InstrumentationResource;
    import org.smooks.management.annotation.ManagedResource;

    import javax.management.MalformedObjectNameException;
    import javax.management.ObjectName;

    @ManagedResource(description = "My Class")
    public class MyManagedClass {

        // new ObjectName(String) declares the checked MalformedObjectNameException
        public MyManagedClass(ApplicationContext applicationContext) throws MalformedObjectNameException {
            // Look up the singleton InstrumentationResource from the Smooks registry
            InstrumentationResource instrumentationResource = applicationContext.getRegistry().lookup(InstrumentationResource.INSTRUMENTATION_RESOURCE_TYPED_KEY);
            InstrumentationAgent instrumentationAgent = instrumentationResource.getInstrumentationAgent();
            // Register this @ManagedResource object with the MBean server
            instrumentationAgent.register(this, new ObjectName("org.smooks:type=app,name=MyClass"));
        }

        ...
        ...
    }
The final statement of the constructor in the snippet above uses InstrumentationAgent to register the @ManagedResource annotated object with the MBean server so that it can be queried and invoked by JMX clients. InstrumentationAgent is obtained from an instance of org.smooks.api.management.InstrumentationResource. The singleton InstrumentationResource can be looked up from the Smooks registry, as shown in the first statement of the constructor.
Important | Instrumentation needs to be activated in order to obtain a non-null InstrumentationResource from the lookup. |
Annotating a method with org.smooks.management.annotation.ManagedAttribute exposes the method as an MBean attribute. Here is an example of a method annotated with @ManagedAttribute:
    ...
    ...

    import org.smooks.management.annotation.ManagedAttribute;
    import org.smooks.management.annotation.ManagedResource;

    @ManagedResource
    public class MyManagedClass {

        private String myAttribute;

        ...
        ...

        @ManagedAttribute(description = "My attribute")
        public String getMyAttribute() {
            return myAttribute;
        }
    }
Annotating a method with org.smooks.management.annotation.ManagedOperation exposes the method and its parameters as an MBean operation. Here is an example of a method annotated with @ManagedOperation:
    ...
    ...

    import org.smooks.management.annotation.ManagedOperation;
    import org.smooks.management.annotation.ManagedResource;

    @ManagedResource
    public class MyManagedClass {

        ...
        ...

        @ManagedOperation(description = "My operation")
        public void myOperation(String myFirstParam, Integer mySecondParam) {
            ...
            ...
        }
    }
Annotating a class with org.smooks.management.annotation.ManagedNotification allows JMX clients to subscribe to notifications broadcast from instances of this class. Here is an example of a class annotated with @ManagedNotification:
    ...
    ...

    import org.smooks.api.ApplicationContext;
    import org.smooks.api.management.InstrumentationAgent;
    import org.smooks.api.management.InstrumentationResource;
    import org.smooks.management.ModelMBeanAssembler;
    import org.smooks.management.annotation.ManagedNotification;
    import org.smooks.management.annotation.ManagedResource;

    import javax.management.MBeanException;
    import javax.management.MalformedObjectNameException;
    import javax.management.Notification;
    import javax.management.ObjectName;
    import javax.management.modelmbean.ModelMBeanInfo;
    import javax.management.modelmbean.RequiredModelMBean;

    import java.util.concurrent.atomic.AtomicLong;

    @ManagedResource
    @ManagedNotification(name = "My Notification", notificationTypes = {"javax.management.Notification"})
    public class MyManagedClass {

        private final RequiredModelMBean requiredModelMBean;
        private final AtomicLong sequenceNumber = new AtomicLong();

        // new ObjectName(String) declares the checked MalformedObjectNameException
        public MyManagedClass(ApplicationContext applicationContext) throws MalformedObjectNameException {
            InstrumentationResource instrumentationResource = applicationContext.getRegistry().lookup(InstrumentationResource.INSTRUMENTATION_RESOURCE_TYPED_KEY);
            InstrumentationAgent instrumentationAgent = instrumentationResource.getInstrumentationAgent();
            // Assemble the MBean descriptor dynamically from the annotations on this class
            ModelMBeanAssembler modelMBeanAssembler = new ModelMBeanAssembler();
            ModelMBeanInfo modelMBeanInfo = modelMBeanAssembler.getModelMbeanInfo(this.getClass());
            // Register with the assembled descriptor and keep the returned MBean for broadcasting
            requiredModelMBean = instrumentationAgent.register(this, new ObjectName("org.smooks:type=app,name=MyClass"), modelMBeanInfo, false);
        }

        public void sendNotification(String message) throws MBeanException {
            Notification notification = new Notification("My Notification", requiredModelMBean, sequenceNumber.addAndGet(1), message);
            // Broadcast the notification object built above
            requiredModelMBean.sendNotification(notification);
        }
    }
As illustrated in the constructor above, to register a @ManagedNotification instance, the MBean descriptor should be assembled dynamically with org.smooks.management.ModelMBeanAssembler. The assembled javax.management.modelmbean.ModelMBeanInfo is then registered using InstrumentationAgent. InstrumentationAgent.register(Object, ObjectName, ModelMBeanInfo, boolean) returns a RequiredModelMBean object that can be referenced to broadcast notifications, as seen in the sendNotification method.
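On the receiving side, a JMX client subscribes by adding a listener for the MBean's object name. The sketch below shows that step with the standard javax.management API only. Since Smooks is not on the class path here, DemoBroadcaster is a hypothetical stand-in built on NotificationBroadcasterSupport rather than the RequiredModelMBean returned by InstrumentationAgent, and the object name used in main is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

class NotificationSubscriberSketch {

    // Empty standard MBean interface; it must be public and named <class>MBean.
    public interface DemoBroadcasterMBean { }

    // Stand-in broadcaster (NOT a Smooks class) playing the role of the managed object.
    public static final class DemoBroadcaster extends NotificationBroadcasterSupport
            implements DemoBroadcasterMBean { }

    // Registers the stand-in MBean, subscribes a listener through the MBean server,
    // sends one notification per message, and returns the messages the listener saw.
    static List<String> subscribeAndSend(String objectName, String... messages) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        List<String> received = new CopyOnWriteArrayList<>();
        try {
            ObjectName name = new ObjectName(objectName);
            DemoBroadcaster broadcaster = new DemoBroadcaster();
            server.registerMBean(broadcaster, name);
            try {
                // The client-side step: subscribe by object name (no filter, no handback).
                NotificationListener listener =
                        (notification, handback) -> received.add(notification.getMessage());
                server.addNotificationListener(name, listener, null, null);

                long sequenceNumber = 0;
                for (String message : messages) {
                    broadcaster.sendNotification(new Notification(
                            "javax.management.Notification", name, ++sequenceNumber, message));
                }
            } finally {
                server.unregisterMBean(name); // leave the MBean server as we found it
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(subscribeAndSend("org.example:type=sketch,name=Demo", "first", "second"));
    }
}
```

A remote client would obtain an MBeanServerConnection instead of the platform MBeanServer, but the addNotificationListener call takes the same arguments.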
Adding more than one notification type can be achieved with org.smooks.management.annotation.ManagedNotifications. Here is an example of a class annotated with @ManagedNotifications:
    ...
    ...

    import org.smooks.api.ApplicationContext;
    import org.smooks.api.management.InstrumentationAgent;
    import org.smooks.api.management.InstrumentationResource;
    import org.smooks.management.ModelMBeanAssembler;
    import org.smooks.management.annotation.ManagedNotification;
    import org.smooks.management.annotation.ManagedNotifications;
    import org.smooks.management.annotation.ManagedResource;

    import javax.management.MBeanException;
    import javax.management.MalformedObjectNameException;
    import javax.management.Notification;
    import javax.management.ObjectName;
    import javax.management.modelmbean.ModelMBeanInfo;
    import javax.management.modelmbean.RequiredModelMBean;

    import java.util.concurrent.atomic.AtomicLong;

    @ManagedResource
    @ManagedNotifications(value = {
            @ManagedNotification(name = "My Notification", notificationTypes = {"javax.management.Notification"}),
            @ManagedNotification(name = "Another Notification", notificationTypes = {"javax.management.Notification"})})
    public class MyManagedClass {

        private final RequiredModelMBean requiredModelMBean;
        private final AtomicLong sequenceNumber = new AtomicLong();

        // new ObjectName(String) declares the checked MalformedObjectNameException
        public MyManagedClass(ApplicationContext applicationContext) throws MalformedObjectNameException {
            InstrumentationResource instrumentationResource = applicationContext.getRegistry().lookup(InstrumentationResource.INSTRUMENTATION_RESOURCE_TYPED_KEY);
            InstrumentationAgent instrumentationAgent = instrumentationResource.getInstrumentationAgent();
            ModelMBeanAssembler modelMBeanAssembler = new ModelMBeanAssembler();
            ModelMBeanInfo modelMBeanInfo = modelMBeanAssembler.getModelMbeanInfo(this.getClass());
            requiredModelMBean = instrumentationAgent.register(this, new ObjectName("org.smooks:type=app,name=MyClass"), modelMBeanInfo, false);
        }

        public void sendMyNotification(String message) throws MBeanException {
            Notification notification = new Notification("My Notification", requiredModelMBean, sequenceNumber.addAndGet(1), message);
            // Broadcast the notification object built above
            requiredModelMBean.sendNotification(notification);
        }

        public void sendAnotherNotification(String message) throws MBeanException {
            Notification notification = new Notification("Another Notification", requiredModelMBean, sequenceNumber.addAndGet(1), message);
            requiredModelMBean.sendNotification(notification);
        }
    }
Please see the following guidelines if you’d like to contribute code to Smooks.
Smooks is open source and licensed under the terms of the Apache License Version 2.0, or the GNU Lesser General Public License version 3.0 or later. You may use Smooks according to either of these licenses as is most appropriate for your project.
SPDX-License-Identifier: Apache-2.0 OR LGPL-3.0-or-later