Yamaha YM7128B Surround Processor emulation library
TexZK/YM7128B_emu
This library aims to emulate the Yamaha YM7128B Surround Processor.
The original goal is to contribute to the emulation of the AdLib Gold sound card, which used this integrated circuit to add surround effects to the audio output, as heard in the beautiful soundtrack of the Dune video game, made specifically for that rare sound card.
As I do not know how the actual circuits are made, I cannot emulate the chip perfectly. Instead, my goal is to simulate the overall sound, based on the scarce information available, and on some assumptions drawn from other Yamaha chips of the same age, as far as I know.
If anybody can analyze a decapped chip, that would of course be the best information, as already done for many vintage sound chips.
I am willing to get some real chips, as long as they are available for purchase, at least for black-box comparisons with dedicated test vectors.
The library provides chip emulation with the following engines:
- Fixed: it should emulate an actual chip, using fixed point arithmetic. It features all the actual chip constraints, like sample rates, the presence of an output oversampler, and saturated arithmetic. Algorithms are rather slow to compute, and resampling is typically heavy.
- Float: like Fixed, using floating point arithmetic instead. It increases the sound quality, while keeping the architecture similar to that of an actual chip. Algorithms are fast to compute, but resampling still has a significant impact on performance.
- Ideal: it represents an ideal implementation of the algorithms. Arithmetic is in floating point, there are no saturations, and no sample rate conversions, not even at the output stage. Algorithms are fast, despite the loss in accuracy with respect to an actual chip.
- Short: somewhat a hybrid of Fixed and Ideal, in that it uses full-range 16-bit fixed point arithmetic with saturations, but without sample rate conversions. Algorithms are fast, despite the loss in accuracy with respect to an actual chip.

The following table gives an overview of the features of each engine:
Feature | Fixed | Float | Ideal | Short |
---|---|---|---|---|
Input filter | suggested: 6th order 15 kHz | suggested: 6th order 15 kHz | not required | not required |
Input signal | Q1.13 | double, normalized | double | Q1.15 |
Input rate | 23550 Hz | 23550 Hz | suggested: above 40 kHz | suggested: above 40 kHz |
Saturated arithmetics | yes | yes, normalized | no | yes |
Signal operand | Q1.15 | double | double | Q1.15 |
Gain operand | Q1.11 | double | double | Q1.15 |
Feedback operand | Q1.5 | double | double | Q1.5 |
Oversampler operand | Q1.11 | double | no oversampling | no oversampling |
Output rate | 47100 Hz | 47100 Hz | same as input | same as input |
Output signal | Q1.13 | double, normalized | double | Q1.15 |
Output filter | suggested: 3rd order 15 kHz | suggested: 3rd order 15 kHz | not required | not required |
Status memory | allocated by the user | allocated by the user | allocated by the user | allocated by the user |
Delay memory | part of the status | part of the status | dynamic heap allocation | dynamic heap allocation |
Performance | very slow | slow | fast | fast |
Accuracy | best? | good | poor | poor |
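
The Q formats in the table can be read as signed fixed point values with one sign bit plus the given number of fractional bits, so Q1.15 fits a 16-bit word and Q1.13 a 14-bit one. Below is a minimal sketch of how a normalized `double` sample could be converted to such a format with saturation. The function name `q_from_double` is hypothetical (it is not part of the library), and truncation toward zero is an assumption.

```c
#include <stdint.h>

/* Convert a normalized sample in [-1, +1) to a signed fixed point value
 * with `frac_bits` fractional bits (15 for Q1.15, 13 for Q1.13).
 * Out-of-range inputs saturate instead of wrapping around.
 * Hypothetical helper, not taken from the library. */
static int32_t q_from_double(double x, unsigned frac_bits)
{
    const int32_t one = (int32_t)1 << frac_bits;  /* 1.0 in this format */
    const int32_t max = one - 1;                  /* largest positive value */
    const int32_t min = -one;                     /* most negative value */
    const double scaled = x * (double)one;

    if (scaled >= (double)max) return max;
    if (scaled <= (double)min) return min;
    return (int32_t)scaled;  /* assumption: truncate toward zero */
}
```

For instance, `q_from_double(0.5, 15)` gives 16384, while `q_from_double(1.0, 15)` saturates to 32767.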
To use this library, just include `YM7128B_emu.c` and `YM7128B_emu.h` in your project.
All the engines implement the same conceptual flow:
- Status memory allocation.
- Call `Ctor()` method to invalidate internal data.
- Call `Reset()` method to clear emulated registers.
- Call `Setup()` to allocate internal delay memory (only for Ideal engine).
- Call `Start()` to start the algorithms.
- Processing loop:
  - Filter input samples.
  - Resample input samples.
  - For each sample:
    - Call `Process()` method, with appropriate data types.
  - Resample output samples.
  - Filter output samples.
- Call `Stop()` method to stop the algorithms.
- Call `Dtor()` method to deallocate and invalidate internal data.
- Status memory deallocation.
Register access timings are not emulated.
The biggest concern is about the weird sampling rates of the original chip: 23.55 kHz for input, and 47.1 kHz for output.
Since modern common audio sample rates are based either on audio CD (44.1 kHz), or legacy professional audio (48 kHz), the sampling rates of most late '80s Yamaha products need sample rate conversion both at input and output to be emulated in software.
Quality realtime conversions of such unusual sample rates intrinsically require more CPU time than the chip emulation itself, so expect them to be the actual bottleneck of the emulation.
Also, sample rate conversions add delays to their outputs, which are unwelcome in realtime processing.
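
To illustrate why these rates are awkward, here is a small sketch that reduces a conversion ratio to lowest terms. The names `gcd_ul`, `RateRatio` and `rate_ratio` are hypothetical and only serve this example; they are not part of the library, nor of any resampling library.

```c
/* Greatest common divisor, by Euclid's algorithm. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Rational conversion ratio: output_rate / input_rate = up / down. */
typedef struct {
    unsigned long up;    /* interpolation factor */
    unsigned long down;  /* decimation factor */
} RateRatio;

/* Reduce the ratio between two sample rates to lowest terms. */
static RateRatio rate_ratio(unsigned long input_rate, unsigned long output_rate)
{
    const unsigned long g = gcd_ul(input_rate, output_rate);
    RateRatio r;
    r.up = output_rate / g;
    r.down = input_rate / g;
    return r;
}
```

Going from 44100 Hz to 23550 Hz reduces to 157/294, and from 47100 Hz to 48000 Hz to 160/157. Since 23550 = 150 x 157 and 157 is prime, no simpler ratio exists against the common rates, which is part of what makes a quality conversion expensive.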
This library does not provide sample rate conversion itself, because proper conversion (without audible distortion) is not trivial at all. Instead, there are many libraries available, each with its quality rating, performance, and licensing. You can find a comprehensive comparison at Infinite Wave's website.
Personally, I have tried the following open source libraries with success:
- secret rabbit code 0.1.9 (download - github): A nice mature C library, easy to use, licensed as 2-clause BSD. Not the most lightweight for realtime, in either performance or code size.
- r8brain free 4.6 (github): Free version of a commercial C++ library, licensed as 2-clause BSD. Suitable for realtime, and with small footprint.
- zita-resampler 1.6.2 (download - github): A nice modern C++ library, licensed as GPL3. Suitable for realtime, and with small footprint.
The datasheet suggests some input and output analog filtering, to reduce aliasing effects, more specifically a 6th order low-pass input filter, and a 3rd order low-pass output filter. Such filters should be included in the system emulation, because an actual physical system needs them. I do not know which filters the AdLib Gold actually used.
I do not have specifications about them, but they should be Butterworth filters (common analog audio filters), and I guess the cut-off frequency is around 15 kHz (common for FM radio, Sound Blaster, etc.).
Again, analog filters are not provided by the library, to give the user freedom about their implementation (e.g. Robert Bristow-Johnson's Audio EQ Cookbook - text).
Here you can find some descriptions and discussions about implementation details.
I chose C89 with a bit of C99. I was going to use C++20 for my own pleasure, but instead I find good old and mature C to be the best for such a tiny library. C is also easier to integrate with other languages, and the mighty features of a colossal language like C++ are more of a burden than of actual use in this case.
The code itself should be cross-platform and clean enough not to give compilation errors or ambiguity.
Currently the library is developed and tested under Windows with MSVC++ 14. Of course, I will at least provide support for gcc and clang under Linux. I do not have access to macOS, but I guess the Linux support should fit.
I chose the path of verbosity for variable declaration, to help debugging fixed point math and buffering. Indeed, when compiled for performance, all those variable declarations get optimized away easily.
I did not write the code for explicit vectorization, preferring a KISS approach at this stage of development. Actually I am not satisfied with how the MSVC++ compiler is generating machine code, and I guess that optimizing the code for vectorization should improve the performance by some margin, especially the parts for parallel 8-tap delay and output oversampling.
The datasheet claims 14-bit floating point sampling for both input and output. There is no information about such esoteric format itself, but there are datasheets of Yamaha products of the same age that do describe it. I know only the YM3014B for reference, which claims 16-bit (linear) dynamic range, against floating point samples with 3-bit exponent and 10-bit mantissa.
Anyway, the datasheet clearly states that the analog input is converted to a 14-bit digital signal, so we can assume that samples are 14-bit wide. There is no mention of sign bits, as the A/D converter is unipolar (Vdd/2 center voltage bias), but I think the 14-bit samples are unbiased (signed) by the A/D converter itself, to agree with all the two's complement computations. The reverse operation is done by the D/A converter.
I am actually concerned about the bit size of each sample. The 14-bit multiplication results sound pretty awful for small signals, the way I am emulating the system right now. Tests with 16-bit sample emulation sound better, so I am wondering whether signals are actually processed as 14-bit or more.
Since the chip is a DSP, one of its most common operations is multiplication. As this operation requires many logic gates, I guess there is only one multiplier in the chip, reused within the hundreds of clock ticks per sample.
The only information about multiplication comes from the description of feedback coefficients in the datasheet, which states that the 6 data bits of the registers are mapped onto the 6 most significant bits of the 12-bit two's complement operand, sign bit included. I think that the remaining 6 bits are not wasted only for such operation, but instead the very same multiplier is shared also for gain (decibel) multiplications.
So, we have a multiplier that has a signal operand 14-bit wide, and a gain operand 12-bit wide. The result should be a signal itself, 14-bit wide.
The multiplication is implemented as a classic 16-bit x 16-bit signed fixed-point multiplication (aka Q15), keeping only the 14 most significant bits.
Actually, since 14-bit operands sound awful, unlike the few recordings found on the internet, I am using 16-bit operands right now.
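
The multiplication described above can be sketched as follows. This is a generic Q15 multiplier under stated assumptions, not the library's own code: the function names are hypothetical, the product is rescaled by an integer division (which truncates toward zero), and the 14-bit variant simply clears the two least significant bits.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate value to the signed 16-bit range. */
static int16_t q15_saturate(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Q15 multiplication: 16-bit x 16-bit gives a 32-bit product, which is
 * rescaled back to 16 bits and saturated.
 * Assumption: rescaling by division, truncating toward zero. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    const int32_t product = (int32_t)a * (int32_t)b;
    return q15_saturate(product / 32768);
}

/* Keep only the 14 most significant bits of a 16-bit sample,
 * by clearing the two least significant bits. */
static int16_t keep_top14(int16_t x)
{
    return (int16_t)(x & ~0x3);
}
```

Saturation matters only for the corner case of -1.0 x -1.0, whose exact result (+1.0) is not representable in Q15. With 16-bit operands `keep_top14()` is just skipped.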
The datasheet is very clear about feedback coefficients: they are 6-bit values mapped directly from the register data to the 12-bit two's complement operand of the multiplier, padded with zeros at the least significant operand bits.
Feedback coefficients seem to be a special case of gains, being mapped directlyas operands, while gains are remapped with a lookup table.
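
A literal reading of that mapping can be sketched as below. The function names are hypothetical and the code only follows the description given here; it is not taken from the library.

```c
#include <stdint.h>

/* Map the 6 data bits of a feedback register onto the 6 most significant
 * bits of a 12-bit two's complement operand, padding the low bits with
 * zeros, then sign-extend the 12-bit pattern. Hypothetical helper. */
static int16_t feedback_operand12(uint8_t reg)
{
    int16_t value = (int16_t)((reg & 0x3F) << 6);  /* 12-bit pattern */
    if (value & 0x800) {
        value = (int16_t)(value - 0x1000);         /* sign bit set */
    }
    return value;
}

/* Same operand as a normalized coefficient in [-1, +1). */
static double feedback_as_double(uint8_t reg)
{
    return (double)feedback_operand12(reg) / 2048.0;
}
```

With this mapping the largest positive coefficient is 0.96875 (register value `0x1F`), and the most negative is -1.0 (register value `0x20`).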
The datasheet specifies the gain coefficients in decibels, listed in a table with 32 entries. Entries go from unity gain (0 dB) at the highest index, down to -60 dB at the second-lowest index, in steps of 2 dB. The lowest index (zero) is reserved for silence. This gives a good granularity to volume levels (2 dB is a common step size in incremental volume control).
I think that such entries are saved into a single table with non-negative 11-bit linear coefficients, from silence (all bits 0) to maximum volume (all bits 1).
I guess the negative coefficients are actually loaded as the one's complement (bit flip) instead of the two's complement, as often seen in Yamaha's synthesizers of the same age, to save silicon area despite a tiny gain error.
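
As a sketch of these guesses, the table and the one's complement negation could look like this. All names are hypothetical, rounding to nearest is an assumption, and the actual values stored in the chip (or in the library) may differ.

```c
#include <stdint.h>

enum {
    GAIN_ENTRIES = 32,    /* table size, from the datasheet */
    GAIN_MAX     = 0x7FF  /* 11 bits, all ones: unity gain */
};

/* Fill the table: index 0 is silence, index 31 is 0 dB, and each step
 * down attenuates by 2 dB, reaching -60 dB at index 1. */
static void gain_table_init(uint16_t table[GAIN_ENTRIES])
{
    const double step = 0.7943282347242815;  /* 10^(-2/20), i.e. -2 dB */
    double linear = 1.0;
    int i;

    table[0] = 0;
    for (i = GAIN_ENTRIES - 1; i >= 1; --i) {
        table[i] = (uint16_t)(linear * GAIN_MAX + 0.5);  /* assumption: round */
        linear *= step;
    }
}

/* Negate by flipping all bits (one's complement): cheaper in silicon
 * than a two's complement negation, but off by one LSB. */
static int16_t negate_ones_complement(uint16_t coefficient)
{
    return (int16_t)(~(int)coefficient);
}
```

Here the negation of the maximum coefficient 2047 yields -2048 instead of -2047, which is the tiny gain error mentioned above.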
The digital delay line allows up to 100 ms of delay at the nominal input rate. This leads to a buffer of at least 2355 samples (0.1 s x 23550 Hz). On the actual chip, such a buffer should be a shifting FIFO with 32 predetermined tap positions.
The delay line emulation is actually implemented as a random-access ring buffer, as shifting the whole buffer at each input sample would be a waste of time.
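
A minimal sketch of such a ring buffer follows, assuming 16-bit samples and the 2355-sample length computed above. The type and function names are hypothetical, and the library's own data layout may differ.

```c
#include <stddef.h>
#include <stdint.h>

enum { DELAY_LENGTH = 2355 };  /* 100 ms at 23550 Hz */

typedef struct {
    int16_t samples[DELAY_LENGTH];
    size_t  head;  /* index of the most recently written sample */
} DelayLine;

/* Fill the delay line with silence. */
static void delay_clear(DelayLine *line)
{
    size_t i;
    for (i = 0; i < DELAY_LENGTH; ++i) {
        line->samples[i] = 0;
    }
    line->head = 0;
}

/* Store a new sample, overwriting the oldest one. Only the head index
 * moves; no sample is ever shifted. */
static void delay_push(DelayLine *line, int16_t sample)
{
    line->head = (line->head + 1) % DELAY_LENGTH;
    line->samples[line->head] = sample;
}

/* Read the sample written `delay` pushes ago (0 = newest).
 * The caller must ensure delay < DELAY_LENGTH. */
static int16_t delay_tap(const DelayLine *line, size_t delay)
{
    const size_t index = (line->head + DELAY_LENGTH - delay) % DELAY_LENGTH;
    return line->samples[index];
}
```

Each input sample then costs one write plus one read per tap, regardless of the buffer length.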
The chip includes a 2x oversampling interpolator at the output stage, to help reduce the analog circuitry needed to reconstruct the output from the 47.1 kHz oversampled D/A data.
I guess the interpolator is a FIR filter that reuses the same DSP circuitry for gain and coefficient fixed-point multiplication, to save silicon area.
The datasheet shows only the magnitude versus frequency response. So, based on such information, I tried to match the response with Iowa Hills FIR Filter Designer 7.0 by trial and error. The parameters I found do not look exactly as per the diagram of the datasheet, but the actual response should not differ too much.
I also think that the kernel is not minimum-phase, to save further silicon area, thanks to the mirrored coefficient values. I am no expert, but it looks like minimum-phase is also not welcome in audio, because of phase incoherence among frequencies, despite having shorter delays. I left the possibility to choose the minimum-phase feature by configuring the `YM7128B_USE_MINPHASE` preprocessor symbol.
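
To show the general idea of a 2x FIR interpolator with mirrored coefficients, here is a minimal sketch using zero-stuffing. The 3-tap kernel is only a placeholder (plain linear interpolation), not the kernel matched against the datasheet, and all names are hypothetical.

```c
enum { KERNEL_LENGTH = 3 };

/* Placeholder symmetric (linear-phase) kernel: linear interpolation.
 * NOT the coefficients of the chip or of the library. */
static const double kernel[KERNEL_LENGTH] = { 0.5, 1.0, 0.5 };

typedef struct {
    double history[KERNEL_LENGTH];  /* newest sample first */
} Oversampler;

/* Fill the history with silence. */
static void oversampler_clear(Oversampler *o)
{
    int k;
    for (k = 0; k < KERNEL_LENGTH; ++k) {
        o->history[k] = 0.0;
    }
}

/* Feed one sample at the output rate and compute one FIR output. */
static double oversampler_step(Oversampler *o, double input)
{
    double acc = 0.0;
    int k;
    for (k = KERNEL_LENGTH - 1; k > 0; --k) {
        o->history[k] = o->history[k - 1];
    }
    o->history[0] = input;
    for (k = 0; k < KERNEL_LENGTH; ++k) {
        acc += kernel[k] * o->history[k];
    }
    return acc;
}

/* Produce two output samples per input sample: the input sample itself,
 * followed by a stuffed zero, both filtered by the kernel. */
static void oversampler_process(Oversampler *o, double input, double output[2])
{
    output[0] = oversampler_step(o, input);
    output[1] = oversampler_step(o, 0.0);
}
```

With this placeholder kernel, the first output of each pair is the average of the current and previous input samples, and the second is the current input sample. A longer kernel only changes `KERNEL_LENGTH` and the coefficient array.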
It is possible to configure the floating point data type used for processing, via the `YM7128B_FLOAT` preprocessor symbol. It defaults to `double` (double precision).
Please note that, contrary to common belief, the `double` data type is actually very fast on machines with hardware support for it.
I think that you should switch to `float` (single precision) only if:
- the hardware is limited to it, or
- machine code vectorization gets faster, or
- conversion from/to buffer data to/from double precision is slower.
This repository provides a fully-featured example. It is a stream processor, in that it reads sample data from standard input, processes it, and writes the result to standard output.
Please refer to its own help page, by calling the canonical `YM7128B_pipe --help`, or reading it embedded in its source code.
- Ensure the following packages are installed: `sudo apt install alsa-utils build-essential`
- Enter the example folder and run make_gcc.sh: `cd example`, then `bash make_gcc.sh`
- You should find the generated executable file as `YM7128B_pipe`.
- Play some audio directly with `aplay` (the `\` is for command line continuation), using the `dune/warsong` preset:

      ./YM7128B_pipe -r 23550 -f S16_LE --preset dune/warsong < sample_mono_23550Hz_S16LE.raw \
      | aplay -c 2 -r 47100 -f S16_LE