Sugar for building and running context-free transducers in Java
jrte/ribose
TLDR; To skip this long screed and learn how to build and work with ribose, jump to the Disclaimer at the end. I built my first transducers with INR, mapping ASCII* into a semiring of C function (effector) pointers, at Bell Northern Research in Ottawa in the late 1980s. Been watching the world stumble by without it ever since, encumbering serialized forms of even the simplest object models with all manner of ill-fitting suits.
Why don't information architects use idiomatic Unicode* semiring expressions, tailored expertly to their specific domains and intentions, to describe basic domain artifacts and combine them in more complex forms for persisting and communicating domain-specific information with entities interacting with their domain? Why don't modern programming languages and computing machines present robust support for semirings and automata? Will XML ever go away?
I don't know.
Ribose (formerly jrte) is about inversion of control for high-volume text analysis and information extraction and transformation in general. Many stream-oriented tasks, such as cleaning and extraction for data analytic workflows, involve recognizing and acting upon features embedded, more or less sparsely, within a larger context. Software developers receive some onerous help in that regard from generic software libraries that support common document standards (e.g., XML, JSON, MS Word, etc.), but dependency on these libraries adds complexity, vulnerability and significant runtime costs to software deployments. And these libraries are of no use at all when information is presented in idiomatic formats that require custom software to deserialize.
Ribose specializes ginr, an industrial strength open source compiler for multidimensional regular patterns, to produce finite state transducers (FSTs) that map syntactic features to semantic effector methods expressed by a target class. Ribose transduction patterns are composed and manipulated using algebraic (semiring) operators and compiled to FSTs for runtime deployment. Ginr admits arbitrary bytes (\xHH) in ribose patterns and transcodes Unicode glyphs to UTF-8 byte sequences; ribose compiles FSTs to operate in the byte* domain regardless of UTF-8 or binary origin. Out of band (>255) signals may also be embedded in ribose transducer patterns to guide stream processing in compiled FSTs. Regular patterns may be nested to cover context-free inputs, and the ribose runtime supports unbounded lookahead to resolve ambiguities or deal with context-sensitive inputs. Inputs are presented to ribose runtime transducers as streams of byte-encoded information and regular or context-free inputs are transduced in linear time.
There is quite a lot of byte-encoded information being passed around these days (right-click in any browser window and "View Page Source" to see a sample) and it is past time to think of better ways to process this type of data than crunching it on instruction-driven calculator machines. Ribose and ginr promote a pattern-oriented, data driven approach to designing, developing and processing information workflows. Ribose is a ship-in-a-bottle showpiece put together to shine a spotlight on ginr and to demonstrate what computing might be if finite state transduction, augmented with a transducer stack and coupled with a classical CPU/RAM computer, was a common modality for processing sequential information (i.e., almost everything except arithmetic).
Regular patterns and automata are to computing ecosystems what soil and microbiota are to the stuff living above ground. Strange that we don't see explicit support for their construction and runtime use in modern programming languages and computing machines. Ribose is presented only to demonstrate the general idea of pattern-oriented design and development. As is, it successfully runs a limited suite of test cases and can be used to build domain-specific ribose models, but it is not regularly maintained and not suitable for general use. Others are encouraged to clone and improve it or implement more robust expressions of the general idea. Or be my hero and get clean and simple support for compiling semiring pattern expressions to runnable FSTs on the roadmap for Java or Rust.
The general idea is to show how to make information work for you rather than you having to work to instruct a computer about how to work with information. Or, at least, how to reduce costs associated with information workflows. Ribose views information as the instructional component in streaming contexts, providing a highly workable alternative to WIP. This idea is outlined below and explored, a bit snarkily, in the stories posted in the ribose wiki. This has no connection whatsoever with POSIX and Perl 'regular expressions' (regex) or 'pattern expression grammars' (PEGs), which are commonly used for ad hoc pattern matching. In the following I refer to the algebraic expressions used to specify ribose transducers as 'regular patterns' to distinguish them from regex and PEG constructs.
Ginr is the star of the ribose circus. It was developed by J Howard Johnson at the University of Waterloo in the early 1980s. One of its first applications was to transduce the typesetting code for the Oxford English Dictionary from an archaic layout to SGML. I first used it at Bell Northern Research to implement a ribose-like framework to serve in a distributed database mediation system involving diverse remote services and data formats. The involved services were all driven by a controller transducing conversational scripts from a control channel. The controller drove a serial data link, transmitting queries and commands to remote services on the output channel and switching context-specific response transducers onto the input channel. Response transducers reduced query responses to SQL statements for the mediator and reduced command responses to guidance to be injected into the control channel to condition the course of the ongoing conversation.
Ginr subsequently disappeared from the public domain and has only recently been published with an open source license on GitHub. It has been upgraded with substantial improvements, including 32-bit state and symbol enumerators and compiler support for transcoding Unicode symbols to UTF-8 bytes. It provides a full complement of algebraic operators that can be applied to reliably produce very complex (and very large) automata. Large and complex patterns can be decomposed into smaller and simpler patterns, compiled to FSTs, and reconstituted on ribose runtime stacks, just as complex procedural algorithms in Java are decomposed into simpler methods that Java threads orchestrate on call stacks in the JVM runtime.
Ribose is a proof of concept exercise intended to demonstrate the general idea of pattern-oriented information processing. It specializes ginr to express transducers using terms of the form (A b, X[`Y` ...] ...), where A is a pattern involving input symbols, b is an input symbol that is not a prefix of A, X is the first effector invoked when b is read, and `Y ...` is a list of parameters bound to X.
Ribose suggests a pattern-oriented approach to information that minimizes dependency on external libraries and could reduce complexity, vulnerability and development and runtime costs in information workflows. Ribose generalizes the transducer design pattern that is commonly applied to filter, map and reduce collections of data in functional programming paradigms. Common usage of this design pattern treats the presentation of inputs as a simple series T* without structure. Ribose extends and refines this design pattern, allowing transducers to precisely navigate (filter), map (select effect vector) and reduce (execute effect vector) complex information under the direction of syntactic cues in the input.
Here the filter component is expressed as a collection of nested regular patterns describing an input source, using symbolic algebraic expressions to articulate the syntactic structure of the input. These unary input patterns are then extended to binary transduction patterns that map syntactic features to effect vectors that incrementally reduce information extracted from the input. The syntactic structure provides a holistic navigable map of the input and exposes cut points where semantic actions should be applied. This presents a clear separation of syntactic and semantic concerns: syntax is expressed in a purely symbolic domain where patterns are described and manipulated algebraically, while semantics are expressed poetically in a native programming language as effectors in a domain-specific target class. Syntactic patterns are articulated without concern for target semantics and effector implementation of semantic actions is greatly simplified in the absence of syntactic concerns.
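As a rough illustration of this filter/map/reduce reading, the following plain-Java sketch hand-codes a two-state transducer for the input pattern digit+ (',' digit+)* and sums the numbers it recognizes. It does not use the ribose API: the class name `SumTransducer`, its state numbering and its three effectors (`paste`, `add`, `clear`) are hypothetical names chosen for this example. In ribose the transitions would be compiled by ginr from a pattern rather than written out by hand.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not the ribose API): input bytes select effect vectors. */
final class SumTransducer {
    // Target state, touched only by the effectors below.
    private final StringBuilder field = new StringBuilder();
    private long total = 0;

    // Effectors: small semantic actions with no syntactic logic of their own.
    private void paste(int b) { field.append((char) b); }
    private void add() { total += Long.parseLong(field.toString()); }
    private void clear() { field.setLength(0); }

    /** Transduces input of the form digit+ (',' digit+)* and returns the sum. */
    long run(byte[] input) {
        int state = 0; // 0: expecting the first digit of a number, 1: inside a number
        for (byte raw : input) {
            int b = raw & 0xFF;
            if (b >= '0' && b <= '9') {          // (digit, paste)
                paste(b);
                state = 1;
            } else if (b == ',' && state == 1) { // (',', add clear)
                add();
                clear();
                state = 0;
            } else {
                throw new IllegalArgumentException("unexpected byte " + b + " in state " + state);
            }
        }
        if (state != 1) {
            throw new IllegalArgumentException("input ended before a number was complete");
        }
        add();   // end of input plays the role of a final delimiter
        clear();
        return total;
    }

    static long sum(String text) {
        return new SumTransducer().run(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(sum("12,30,400"));
    }
}
```

The branching lives entirely in the transition logic, and each effector stays a one-liner. That division of labour is the point of the sketch, not the particular encoding of states.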
The ribose runtime transduces byte* streams simply and only because byte is the least common denominator for data representation in most networks and computing machines. Ginr compiles complex Unicode glyphs in ribose patterns to multiple UTF-8 byte transitions, so all ribose transductions are effected in the byte* domain and only extracted textual features are decoded and widened to 16-bit Unicode code points. Bytes 0xF8-0xFF, which are never expressed in UTF-8 encodings, are available for in-band signalling and can be used to express syntactic structure (push/pop object stack, open/close array) and semantic guidance (data names, types), obviating concern about embedded "special characters". Binary data can be embedded using self-terminating binary patterns or prior length information, as long as they are distinguishable from other artifacts in the information stream. Raw binary encodings may be especially useful in domains, such as online gaming or real-time process control, that demand compact and efficient messaging protocols with relaxed readability requirements. Semantic effectors may also inject previously captured bytes or out-of-band signals, such as countdown termination, into the input stream to direct the course of transductions.
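The claim that bytes 0xF8-0xFF never occur in UTF-8 can be checked with nothing but the Java standard library. The sketch below encodes every Unicode scalar value and records the largest byte value seen. The class and method names (`Utf8ByteRange`, `maxUtf8Byte`, `isReservedForSignals`) are hypothetical and unrelated to the ribose API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical check, independent of ribose: which byte values can UTF-8 produce? */
final class Utf8ByteRange {
    /** Largest unsigned byte value in the UTF-8 encoding of any Unicode scalar value. */
    static int maxUtf8Byte() {
        int max = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Lone surrogates are not scalar values and cannot be encoded on their own.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                max = Math.max(max, b & 0xFF);
            }
        }
        return max;
    }

    /** True for the byte values that the text above sets aside for in-band signalling. */
    static boolean isReservedForSignals(int unsignedByte) {
        return unsignedByte >= 0xF8 && unsignedByte <= 0xFF;
    }

    public static void main(String[] args) {
        int max = maxUtf8Byte();
        System.out.printf("largest UTF-8 byte: 0x%02X%n", max);
        System.out.println("collides with 0xF8-0xFF: " + isReservedForSignals(max));
    }
}
```

The largest byte is the lead byte of the encoding of U+10FFFF, so the reserved range sits safely above anything a conforming UTF-8 encoder emits.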
The ribose runtime operates multiple concurrent transductions, each encapsulated in a Transductor object that provides a set of core compositing and control effectors and coordinates a transduction process. Nested FSTs are pushed and popped on transductor stacks, with two-way communication between caller and callee effected by injecting information for immediate transduction. Incremental effects are applied synchronously as each input symbol is read, culminating in complete reduction and assimilation of the input into the target domain. For regular and most context-free input patterns transduction is effected in a single pass without lookahead. Context-sensitive or ambiguous input patterns can be classified and resolved with unbounded lookahead (select clear paste* in) or backtracking (mark reset) using core transductor effectors.
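To make the stack discipline concrete, here is a minimal plain-Java sketch in which one regular "list body" recognizer is pushed on each '[' and popped on each ']', so that nested (context-free) input is covered in a single pass. It is an analogy only: `NestedListTransduction` and `Frame` are hypothetical names, and the push and pop steps merely stand in for the start and stop effectors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch (not the ribose API): a stack of regular recognizers. */
final class NestedListTransduction {
    /** One running instance of the regular list-body pattern: item* ']' */
    private static final class Frame {
        int items = 0;
    }

    /** Returns the item count of each list, in the order the lists are closed. */
    static List<Integer> run(String input) {
        Deque<Frame> stack = new ArrayDeque<>();
        List<Integer> closed = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (!stack.isEmpty()) {
                    stack.peek().items++;      // a nested list is one item of its parent
                }
                stack.push(new Frame());       // analogous to start: push a nested transducer
            } else if (c == ']') {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("unbalanced ']' at offset " + i);
                }
                closed.add(stack.pop().items); // analogous to stop: pop and resume the caller
            } else {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("item outside any list at offset " + i);
                }
                stack.peek().items++;          // a plain item in the current list
            }
        }
        if (!stack.isEmpty()) {
            throw new IllegalArgumentException("unclosed '['");
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(run("[a[bc][d[e]]]"));
    }
}
```

Each frame only ever runs a regular pattern. The nesting is carried by the stack, and every input symbol is visited exactly once.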
Here is a simple example, taken from the ribose transducer that reduces the serialized form of compiled ginr automata to construct ribose transducers. The input pattern is simple and unambiguous and is easily expressed:

    header = 'INR' (digit+ tab):4 digit+ nl; # a fixed alphabetic constant, 4 tab-delimited unsigned integers and a final unsigned integer delimited by newline
    transition = (digit+ tab):4 byte* nl; # 4 tab-delimited unsigned integers followed by a sequence of bytes of length indicated by the 4th integer, ending with newline
    automaton = header transition*; # the complete automaton

The automaton input pattern above is extended to the Automaton transducer pattern below, which checks for a specific tag and marshals integer fields into an immutable Header record and an array of Transition records. Fields are extracted to raw byte[] arrays using the clear, select and paste effectors until a newline triggers a domain-specific header or transition effector to decode and marshal them into Header and Transition records. Finally the transitions are reduced in the automaton effector to a 259x79 transition matrix, which the ribose compiler will reduce to a 13x27 transition matrix by coalescing equivalent input symbols (e.g., digits in this scenario) using the equivalence relation on the input domain induced by the ginr transition matrix.

    Number = (digit, paste)+;
    Symbol = (byte, paste count)* eol;
    eol = cr? nl;
    inr = 'INR';

    Automaton = nil? (
    # header
      (inr, select[`~version`] clear) Number
      (tab, select[`~tapes`] clear) Number
      (tab, select[`~transitions`] clear) Number
      (tab, select[`~states`] clear) Number
      (tab, select[`~symbols`] clear) Number
      (eol, header (select[`~from`] clear))
    # transitions
      (
        Number
        (tab, select[`~to`] clear) Number
        (tab, select[`~tape`] clear) Number
        (tab, select[`~length`] clear) Number
        (tab, select[`~symbol`] clear count[`~length` `!eol`]) Symbol
        (eol, transition (select[`~from`] clear))
      )*
    # automaton
      (eos, automaton stop)
    ):dfamin;

    Automaton$(0,1 2):prsseq `build/compiler/Automaton.pr`;

The final prsseq operator verifies that the Automaton$(0, 1 2) automaton is a single-valued partial rational function mapping recognizable sequences from the input semiring into the semiring of effectors and effector parameters. The branching and repeating patterns expressed in the input syntax drive the selection of non-branching effect vectors, obviating much of the fine-grained control logic that would otherwise be expressed in line with effect in a typical programming language, without support from an external parsing library. Most of the work is performed by transductor effectors that latch bytes into named fields that, when complete, are decoded and assimilated into the target domain by a tightly focused domain-specific effector.
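The latching behaviour described above can be imitated in a few lines of plain Java. The sketch below is not the ribose API and simplifies the effector semantics: `HeaderLatch`, its `Header` value class and the helper `parse` are hypothetical, and the field names simply follow the header part of the pattern. It latches digits into named fields with stand-ins for select, clear and paste, and only decodes them when the terminating newline fires the header effector.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not the ribose API) of latching bytes into named fields. */
final class HeaderLatch {
    /** Immutable value object assembled by the header effector. */
    static final class Header {
        final int version, tapes, transitions, states, symbols;
        Header(int version, int tapes, int transitions, int states, int symbols) {
            this.version = version;
            this.tapes = tapes;
            this.transitions = transitions;
            this.states = states;
            this.symbols = symbols;
        }
    }

    private static final String[] ORDER = {"version", "tapes", "transitions", "states", "symbols"};
    private final Map<String, StringBuilder> fields = new HashMap<>();
    private StringBuilder selected;

    // Simplified stand-ins for the select, clear and paste effectors.
    private void select(String name) { selected = fields.computeIfAbsent(name, k -> new StringBuilder()); }
    private void clear() { selected.setLength(0); }
    private void paste(int b) { selected.append((char) b); }

    private int decode(String name) { return Integer.parseInt(fields.get(name).toString()); }

    // Domain-specific effector: decode the latched fields into a Header.
    private Header header() {
        return new Header(decode("version"), decode("tapes"), decode("transitions"),
                decode("states"), decode("symbols"));
    }

    Header run(byte[] line) {
        if (line.length < 3 || line[0] != 'I' || line[1] != 'N' || line[2] != 'R') {
            throw new IllegalArgumentException("missing INR tag");
        }
        int next = 0;
        select(ORDER[next]);
        clear();
        for (int i = 3; i < line.length; i++) {
            int b = line[i] & 0xFF;
            boolean fieldComplete = selected.length() > 0;
            if (b >= '0' && b <= '9') {
                paste(b);                               // (digit, paste)
            } else if (b == '\t' && fieldComplete && next < ORDER.length - 1) {
                select(ORDER[++next]);                  // (tab, select[...] clear)
                clear();
            } else if (b == '\n' && fieldComplete && next == ORDER.length - 1
                    && i == line.length - 1) {
                return header();                        // (eol, header ...)
            } else {
                throw new IllegalArgumentException("unexpected byte " + b + " at offset " + i);
            }
        }
        throw new IllegalArgumentException("header not terminated by newline");
    }

    static Header parse(String line) {
        return new HeaderLatch().run(line.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        Header h = parse("INR210\t3\t12\t5\t40\n");
        System.out.println(h.version + " " + h.tapes + " " + h.transitions + " " + h.states + " " + h.symbols);
    }
}
```

Note how the decoding step knows nothing about tabs or newlines. By the time it runs, the syntax has already been consumed by the transitions that selected it.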
Expressions such as this can be combined with other expressions using concatenation, union, repetition and composition operators to construct more complex patterns. More generally, ribose patterns are amenable to algebraic manipulation in the semiring, and ginr enables this to be exploited to considerable advantage. For example, Transducer = Header Transition* eos covers a complete serialized automaton, and Transducer210 = ('INR210' byte* eos) @@ Transducer restricts Transducer to accept only version 210 automata (ginr's @ composition operator absorbs matching input and reduces pattern arity, the @@ join operator retains matching input and preserves arity).
In a nutshell, algorithms are congruent to patterns. The logic is in the syntax.
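The coalescing of equivalent input symbols mentioned for the Automaton example (where a 259-input matrix shrinks to 13 input classes) can be sketched in plain Java. The assumption here is the textbook one: two inputs are equivalent when they drive every state to the same next state. The ribose compiler's actual procedure may differ, and `InputClasses`, `classify` and `toyMatrix` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group input symbols whose transition rows are identical. */
final class InputClasses {
    /**
     * matrix[input][state] holds the next state, or -1 if there is no transition.
     * Returns classOf[input], where inputs with identical rows share an ordinal.
     */
    static int[] classify(int[][] matrix) {
        Map<List<Integer>, Integer> seen = new HashMap<>();
        int[] classOf = new int[matrix.length];
        for (int input = 0; input < matrix.length; input++) {
            List<Integer> key = new ArrayList<>();
            for (int next : matrix[input]) {
                key.add(next);
            }
            Integer ordinal = seen.get(key);
            if (ordinal == null) {
                ordinal = seen.size();   // first time this row shape is seen
                seen.put(key, ordinal);
            }
            classOf[input] = ordinal;
        }
        return classOf;
    }

    /** Toy 256-input, 2-state matrix for the pattern digit+ (',' digit+)*. */
    static int[][] toyMatrix() {
        int[][] m = new int[256][2];
        for (int[] row : m) {
            Arrays.fill(row, -1);
        }
        for (int d = '0'; d <= '9'; d++) {
            m[d][0] = 1;                 // a digit starts a number
            m[d][1] = 1;                 // or continues one
        }
        m[','][1] = 0;                   // a comma ends a number
        return m;
    }

    static int countClasses(int[] classOf) {
        return (int) Arrays.stream(classOf).distinct().count();
    }

    public static void main(String[] args) {
        int[] classOf = classify(toyMatrix());
        System.out.println("input classes: " + countClasses(classOf));
    }
}
```

In the toy matrix all ten digits collapse into one class, the comma gets its own, and every other byte falls into a third, so a 256-row matrix can be replaced by a 256-entry class map plus a 3-row matrix.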
Ginr operates in a symbolic domain involving a finite set of symbols and algebraic semiring operators that recombine symbols to express syntactic patterns. Support for Unicode symbols and binary data is built in, and Unicode in ginr source patterns is rendered as UTF-8 byte sequences in compiled automata. UTF-8 text is transduced without decoding and extracted bytes are decoded only in target effectors. Ribose transducer patterns may introduce additional atomic symbols as tokens representing out-of-band (>255) control signals.
Input patterns are expressed in {byte,signal}* semirings, and may involve UTF-8 and binary bytes from an external source as well as control signals interjected by target effectors. Ribose transducer patterns are expressed in (input,effector,parameter)* semirings, mapping input patterns onto parametric effectors expressed by domain-specific target classes. They identify syntactic features of interest in the input and apply target effectors to extract and assimilate features into the target domain.
A ribose model is associated with a target class and is a container for related collections of transducers, target effectors, static effector parameters, control signals and field registers for accumulating extracted bytes. The ITransductor implementation that governs ribose transductions provides a base set of effectors to

- extract and compose data in selected fields (select, paste, copy, cut, clear),
- count down from preset value and signal end of countdown (count),
- push/pop transducers on the transduction stack (start, stop),
- mark/reset at a point in the input stream (mark, reset),
- inject input for immediate transduction (in, signal),
- or write extracted data to an output stream (out).
All ribose models implicitly inherit the transductor effectors, along with an extensible set of control signals {nul,nil,eol,eos} and an anonymous field that is preselected for every transduction and reselected when select is invoked with no parameter. New signals and fields referenced in transducer patterns implicitly extend the base signal and field collections. Additional effectors may be defined in specialized ITarget implementation classes.
The ribose transductor implements ITarget and its effectors are sufficient for most ribose models that transduce input to standard output via the out[...] effector. Domain-specific target classes may extend SimpleTarget to express additional effectors, typically as inner classes specializing BaseEffector<Target> or BaseParametricEffector<Target,ParameterType>. All effectors are provided with a reference to the containing target instance and an IOutput view for extracting fields as byte[], integer, floating point or Unicode char[] values, typically for inclusion in immutable value objects that are incorporated into the target model.
Targets need not be monolithic. In fact, every ribose transduction involves a composite target comprised of the transductor and at least one other target class (e.g., SimpleTarget). In a composite target one target class is selected as the representative target, which instantiates and gathers effectors from subordinate targets to merge with its own effectors into a single collection to merge with the transductor effectors. Composite targets allow separable concerns within complex semantic domains to be encapsulated in discrete interoperable and reusable targets. For example, a validation model containing a collection of transducers that syntactically recognize domain artifacts would be bound to a target expressing effectors to support semantic validation. The validation model and target, supplied by the service vendor, can then be combined with specialized models in receiving domains to obtain a composite model including validation and reception models and effectors. With some ginr magic receptor patterns can be joined with corresponding validator patterns to obtain receptors that validate in stepwise synchrony with reception and assimilation into the receiving domain.
Ribose as it stands is rough but ready for use in a wide range of use cases. Simple tasks that extract and composite recognized features for output to the standard output stream (e.g., as SQL or CSV) can be effected without any Java coding as noted above. More complex transduction use cases, such as rendering complete or partial object models from web service responses, can be realized by coding a specialized target class that presents custom effectors to assimilate extracted fields into the service data model. Very large or continuous inputs can be presented as serial segments of arbitrary size and are typically transduced with very low memory overhead.
Perhaps not today but Java service vendors with a pattern orientation could do a lot to encourage and streamline service uptake by providing transductive validation models containing patterns describing exported domain artifacts along with validation models and targets. In consumer domains, the vendor patterns would provide concise and highly readable syntactic and semantic maps of the artifacts in the vendor's domain. Here they can serve as starting points for preparing specialized receptor patterns that call out to effectors in consumer target models, and these patterns can be joined with the service validation patterns as described above. Service vendors could also include in their validation models transducers for rendering domain artifacts in other forms, e.g. structured text or some or other markup language, allowing very concisely serialized artifacts to be comprehensible without sacrificing brevity and efficient parsing.
In computing ecosystems regular patterns and their equivalent automata, like microbiota in biological ecosystems, are ubiquitous and do almost all of the work. String them out on another construct like a stack or a spine and they can perform new tricks.
Consider ribonucleic acid (RNA), a strip of sugar (ribose) molecules strung together, each bound to one of four nitrogenous bases (A|U|G|C), encoding genetic information. Any ordered contiguous group of three bases constitutes a codon, and 61 of the 64 codons are mapped deterministically onto the 20 amino acids used in protein synthesis (the other three are control codons). This mapping is effected by a remarkable molecular machine, the ribosome, which ratchets messenger RNA (mRNA) through an aperture to align the codons for translation and build a protein molecule, one amino acid at a time (click on the image below to see a real-time animation of this process). Over aeons, nature has programmed myriad mRNA scripts and compiled them into DNA libraries to be distributed among the living. So this trick of using sequential information from one domain (e.g., mRNA->codons) to drive a process in another domain (amino acids->protein) is not new.
For a more recent example, consider a C function compiled to a sequence of machine instructions with an entry point (call) and maybe one or more exit points (return). This can be decomposed into a set of vectors of non-branching instructions, each terminating with a branch (or return) instruction. These vectors are ratcheted through the control unit of a CPU and each sequential instruction is decoded and executed to effect specific local changes in the state of the machine. Branching instructions evaluate machine state to select the next vector for execution. All of this introspective navel gazing and running around is effected by a von Neumann CPU, chasing an instruction pointer. As long as the stack pointer is fixed on the frame containing the function the instruction pointer will trace a regular pattern within the bounds of the compiled function. This regularity would be obvious in the source code for the function as implemented in a procedural programming language like C, where the interplay of concatenation (';'), union (if/else/switch) and repetition (while/do/for) is apparent. It may not be so obvious in source code written in other, e.g. functional, programming languages, but it all gets compiled down to machine code to run on von Neumann CPUs, on the ground or in the cloud.
Programming instruction-driven machines to navigate complex patterns in sequential data or asynchronous workflows is an arduous task in any modern programming language, requiring a mess of fussy, fine-grained twiddling that is error prone and difficult to compose and maintain. Refactoring the twiddling into a nest of regular input patterns leaves a simplified collection of code snippets that just need to be sequenced correctly as effectors, and extending input patterns to orchestrate effector sequencing via transduction seems like a natural thing to do. Transducer patterns expressed in symbolic terms can be manipulated using well-founded and wide-ranging algebraic techniques, often without impacting effector semantics. Effector semantics are very specific and generally expressed in a few lines of code, free from syntactic concerns, in a procedural programming language. Their algebraic properties also enable regular patterns to be reflected in other mathematical domains where they may be amenable to productive analysis.
Ribose presents a pattern-oriented, transductive approach to sequential information processing, factoring syntactic concerns into nested patterns that coordinate the application of tightly focused effector functions of reduced complexity. Patterns are expressed algebraically as regular expressions in the byte semiring and extended as rational functions into effector semirings, and effectors are implemented as tightly focused methods expressed by a target object in the receiver's domain. This approach is not new; IBM produced an FST-driven XML Accelerator to transduce complex data schemata (XML, then JSON) at wire speed. They did it the hard way, sweating over lists of transitions, apparently still unaware of semiring algebra. They deployed it alongside WebSphere but the range of acceptable input formats is solely selected by the vendor. We know that sequential data can be processed at wire speeds using transduction technology, and we have "Universal Turing Machines" capable of running any computable "algorithm". Where is the "Universal Transduction Machine" that can recognize any nesting of regular "patterns" constructed from the byte* semiring and transduce conformant data into the receiving domain?
From an extreme but directionally correct perspective it can be said that almost all software processes operating today are running on programmable calculators loaded with zigabytes of RAM. Modern computing machines are the multigenerational inheritors of von Neumann's architecture, which was originally developed to support numeric use cases like calculating ballistic trajectories. These machines are "Turing tarpits complete" (like origami), so all that is required to accommodate textual data is a numeric encoding of text characters. Programmers can do the rest. Since von Neumann's day we've seen lots of giddy-up but the main focus in machine development has been on miniaturization and optimizations to compensate for RAM access lag (John Backus' 'von Neumann bottleneck'). Sadly, when the first text character enumerations were implemented, their designers failed to note that their text ordinals constituted the basis for a text semiring wherein syntactic patterns in textual media could be extended to direct effects within a target semantic domain.
It is a great mystery why support for semiring algebra is nonexistent in almost all programming languages and why hardware support for finite state transduction is absent from commercial computing machinery, even though a much greater proportion of computing bandwidth is now consumed to process sequential byte-encoded information. It may have something to do with money and the vaunted market forces that drive continuous invention and refinement. The folks that design and develop computing hardware and compilers are heavily invested in the von Neumann status quo, and may directly or indirectly extract rents for CPU and RAM resources. They profit enormously as, globally, the machines arduously generate an ever-increasing volume of data to feed back into themselves. So the monetary incentive to improve support for compute-intensive tasks like parsing reams of text may be weak. Meanwhile, transduction technology has been extensively developed and widely deployed. It is the basis for lexical analysis in compiler technology and natural language processing, among other things. But it is buried within proprietary or specialized software and is inaccessible to most developers.
Unfortunately I can only imagine what commercial hardware and software engineering tools would be like today if they had evolved with FST logic and pattern algebra built in from the get go. But it's a sure bet that the machines would burn a lot less oil and software development workflows would be more streamlined and productive. Information architects would work with domain experts to design serialized representations for domain artifacts and recombine these in nested regular patterns to realize more complex forms for internal persistence and transmission between processing nodes. These data representations would be designed and implemented simply, directly and efficiently without involving external data representation schemes like XML or JSON. There's a Big Use Case for that.
Ribose encourages pattern-oriented design and development, which is based almost entirely on semiring algebra. Input patterns in text and other symbolic domains are expressed and manipulated algebraically and extended to map syntactic features onto vectors of machine instructions. A transducer stack extends the range of transduction to cover context-free input structures that escape semiring confinement. Effector access to RAM and an input stack support transductions involving context-sensitive inputs.
In this way the notion of a program, whereby a branching and looping series of instructions select data in RAM and mutate machine state, is replaced by a pattern that extends a description of an input source so that the data select instructions (effectors) that merge the input data into target state. Here target, effector and pattern are analogues of machine, instruction, program. A transducer is a compiled pattern and a transduction is a process that applies a specific input sequence to a stack of nested transducers to direct the application of effectors to a target model in RAM.
Ribose suggests that the von Neumann CPU model would benefit from inclusion of finite state transduction logic to coordinate the sequencing of instructions under the direction of nested transducers driven by streams of numerically encoded sequential media, and that programming languages should express robust support for semiring algebra to enable construction of multidimensional regular patterns and compilation to FSTs as first-order objects. Transduction of data from input channel (file, socket, etc.) interfaces into user space should be supported by operating system kernels. Runtime support for transduction requires little more than a transducer stack and a handful of byte[] buffers.
The Burroughs Corporation B5000, produced in 1961, was first to present a stack-oriented instruction set to support emergent compiler technology for stack-centric programming languages (e.g., Algol). The call stack then became the locus of control in runtime process execution of code as nested regular patterns of machine instructions. Who will be first, in the 21st century, to introduce robust compiler support for regular patterns and automata? Will hardware vendors follow suit and introduce pattern-oriented instruction sets to harness their blazing fast calculators to data-driven transductors of sequential information? Will it take 75 years to learn how to work effectively in pattern-oriented design and development environments?
Architects who want a perfect zen koan to break their minds on should contemplate the essential value of abstract data representation languages that can express everything without knowing anything. Developers might want to bone up on semiring algebra. It may look intimidating at first but it is just like arithmetic with nonnegative integers and addition and multiplication and corresponding identity elements 0 and 1, but with sets of strings and union and concatenation with identity elements Ø (empty set) and ϵ (empty string). The '*' semiring operator is defined as the union of all concatenation powers of its operand. Analogous rules apply identically in both domains, although concatenation does not commute in semirings. See Math is Hard? for a scolding.
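For readers who prefer to poke at the algebra in code, here is a small plain-Java sketch of the semiring of string sets described above. It is purely illustrative and has nothing to do with how ginr represents patterns: `StringSemiring` and its methods are hypothetical names, and `star` is truncated at a maximum power because the true closure is generally infinite.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: sets of strings under union and concatenation. */
final class StringSemiring {
    static final Set<String> ZERO = Collections.emptySet();   // identity for union (empty set)
    static final Set<String> ONE = Collections.singleton(""); // identity for concatenation (empty string)

    static Set<String> of(String... strings) {
        return new TreeSet<>(Arrays.asList(strings));
    }

    /** Plays the role of addition. */
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** Plays the role of multiplication; note that it does not commute. */
    static Set<String> concat(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>();
        for (String x : a) {
            for (String y : b) {
                result.add(x + y);
            }
        }
        return result;
    }

    /** Union of concatenation powers 0..maxPower: a finite approximation of a*. */
    static Set<String> star(Set<String> a, int maxPower) {
        Set<String> result = new TreeSet<>(ONE);
        Set<String> power = ONE;
        for (int i = 1; i <= maxPower; i++) {
            power = concat(power, a);
            result.addAll(power);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> a = of("x"), b = of("y"), c = of("z");
        // Concatenation distributes over union, as multiplication does over addition.
        System.out.println(concat(a, union(b, c)).equals(union(concat(a, b), concat(a, c))));
        // Concatenation does not commute.
        System.out.println(concat(a, b).equals(concat(b, a)));
        System.out.println(star(of("ab"), 2));
    }
}
```

Swapping `Set<String>` for nonnegative integers, `union` for addition and `concat` for multiplication turns each of the identities exercised in `main` into a familiar arithmetic one, except for commutativity.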
Best of all, semiring algebra is stable and free from bugs and vulnerabilities. Ginr is mature and stable, although it needs much more testing in diverse symbolic domains, and its author should receive some well-deserved recognition and kudos. Ribose is a hobby horse, created only to lend some support to my claims about pattern-oriented design and development.
Good luck with all that.
Ribose is presented for demonstration only and is not regularly maintained. You may use it to compile and run the included examples, or create your own transducers to play with. Or clone and refine it and make it available to the rest of the world. Transcode it to C and wrap it in a Python thing. Do what you will, it's open source.
Binary executable copies of ginr (for Linux) and ginr.exe (for Windows) are included in etc/ginr for personal use (with the author's permission); ginr guidance is reposted in the sidebar in the ribose wiki. You are encouraged to clone or download and build ginr directly from the ginr repo.
Ribose has been developed and tested with OpenJDK 11 and 17 in Ubuntu 18 and Windows 10. It should build on any unix-ish platform, including git bash, Msys2\mingw or Windows Subsystem for Linux (WSL) for Windows, with ant, java, bash, cat, wc, grep in the executable search path. The JAVA_HOME and ANT_HOME environment variables must be set properly, e.g. export JAVA_HOME=$(readlink ~/jdk-17.0.7).
Clone the ribose repo and run ant clean package to percolate the ribose and test libraries and API documentation into the jars/ and javadoc/ directories. This will also build the ribose compiler and test models from transducer patterns in the patterns/ directory. The default ci-test target performs a clean build and runs the CI tests.
```
-: # set home paths for java and ant
-: export JAVA_HOME="$(realpath ./jdk-17.0.7)"
-: export ANT_HOME="$(realpath ./ant-1.10.12)"
-: # clone ribose
-: git clone https://github.com/jrte/ribose.git
Cloning into 'ribose'...
...
Resolving deltas: 100% (2472/2472), done.
-: # build ribose, test and javadoc jars and compiler, test models
-: cd ribose
-: ant package
Buildfile: F:\Ubuntu\git\jrte\build.xml
...
BUILD SUCCESSFUL
-: # list build products
-: ls jars
ribose-0.0.2.jar ribose-0.0.2-api.jar ribose-0.0.2-test.jar
-: jar -tvf jars/ribose-0.0.2.jar|grep -oE '[a-z/]+TCompile.model'
com/characterforming/jrte/engine/TCompile.model
-: find . -name '*.model' -o -name '*.map'
./build/Test.map
./build/Test.model
./TCompile.map
./TCompile.model
-: # run the CI tests
-: ant
```

Instructions for building ribose models and running transductors from the shell and in the Java VM are included in the ribose API documentation (javadoc). The com.characterforming.ribose package documentation specifies the arguments for the runnable Ribose class and presents the ribose runtime interfaces. The main interfaces are IRuntime, ITransductor and ITarget. The runnable Ribose class can be executed directly or from the shell scripts in the project root:
- rinr: compile ginr patterns from a folder containing ginr source files (*.inr) to DFAs (*.dfa)
- ribose compile | run | decompile:
  - compile: compile a collection of DFAs into a ribose model for a specific target class
  - run: run a transduction from a byte stream onto a target instance
  - decompile: decompile a transducer
The shell scripts are tailored to work within the ribose repo environment but can serve as templates for performing equivalent operations in other environments. Other than ginr, ribose has no dependencies and is contained entirely within the ribose jar file.
See the javadoc overview, package and interface documentation for information regarding use of the ribose compiler and transduction runtime API in the JVM.
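To give a feel for what "running a transduction from a byte stream onto a target instance" means before opening the javadoc, here is a hand-written sketch in plain Java. It does not use the ribose API and is not how ribose is implemented: `DigitTransducer`, `Target`, `paste`, `emit` and `extract` are hypothetical names, and the two-state machine is coded by hand, whereas ribose compiles its transducers from ginr patterns. The sketch only shows the inversion of control: input bytes drive state transitions, and transitions call effector methods on the target.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, NOT the ribose API: a hand-coded two-state transducer. */
final class DigitTransducer {
    /** Hypothetical target: effector methods invoked as features are recognized. */
    static final class Target {
        final List<String> values = new ArrayList<>();
        private final StringBuilder field = new StringBuilder();

        /** Effector: append one digit byte to the field under construction. */
        void paste(int b) { field.append((char) b); }

        /** Effector: deliver the completed field and clear it. */
        void emit() {
            values.add(field.toString());
            field.setLength(0);
        }
    }

    /** Drive the target from the stream, one transition per input byte. */
    static void run(InputStream in, Target target) throws IOException {
        boolean inNumber = false;             // two states: skipping or inside a digit run
        for (int b = in.read(); b >= 0; b = in.read()) {
            boolean digit = b >= '0' && b <= '9';
            if (digit) {
                target.paste(b);
                inNumber = true;
            } else if (inNumber) {
                target.emit();                // a non-digit byte closes the digit run
                inNumber = false;
            }
        }
        if (inNumber) {
            target.emit();                    // flush a run that ends at end of stream
        }
    }

    /** Convenience wrapper: extract all ASCII digit runs from UTF-8 encoded text. */
    static List<String> extract(String text) {
        Target target = new Target();
        try {
            run(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.values;
    }

    public static void main(String[] args) {
        System.out.println(extract("cpu=12 mem=2048;"));   // [12, 2048]
    }
}
```

The machine works on bytes rather than characters, so multi-byte UTF-8 sequences in the input are simply skipped as non-digits.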
For some background reading and a historical perspective visit the ribose wiki.
See LICENSE for ribose licensing details.
Please somebody burn this into an FPGA and put it in a box like this and sell it. But don't bind the sweetness to a monster and hide it in the box; be sure to provide and maintain a robust compiler for generalized rational functions and relations (hint). Show information architects that they can encode basic domain artifacts as patterns in text semirings (e.g., UNICODE*) and combine patterns to represent more complex artifacts for persistence and transmission and decoding off the wire. Know that they understand their domains far better than you and can do this without heavy-handed guidance from externalities like IBM, Microsoft, Amazon, Google or yourself.
You never know. Folks who process vast volumes of byte-encoded textual data (e.g., UTF-8*) off the wire to feed their search engines, or from persistent stores to feed their giant AI brains, might find unimagined ways to repurpose your box. And you won't have to tweak your FPGA one bit, if you've done it right, because these novel adaptations will be effected outside the box, in the pattern domain. A thriving pattern-oriented community will share libraries of patterns to cover common artifacts and everyone will love you.
Then XML will go away. JSON too. Think about it.
Thank you.