Yamaha YM7128B Surround Processor emulation library


TexZK/YM7128B_emu


This library aims to emulate the Yamaha YM7128B Surround Processor.

Block diagram

The original goal is to contribute to the emulation of the AdLib Gold sound card, which used this integrated circuit to add surround effects to the audio output, as heard in the beautiful soundtrack of the Dune videogame, made specifically for that rare sound card.

As I do not know how the actual circuits are made, I cannot emulate the chip perfectly. Instead, my goal is to simulate the overall sound, based on the scarce information available, and with some assumptions based on other Yamaha chips of the same age, as far as I know.

If anybody can analyze a decapped chip, that would of course be the best information, as already done for many vintage sound chips.

I am willing to get some real chips, as long as they are available for purchase, at least for black-box comparisons with dedicated test vectors.


Engines

The library provides chip emulation with the following engines:

  • Fixed: it should emulate an actual chip, using fixed point arithmetic. It features all the actual chip constraints, like sample rates, the presence of an output oversampler, and saturated arithmetic. Algorithms are rather slow to compute, and resampling is typically heavy.

  • Float: like Fixed, using floating point arithmetic instead. It increases the sound quality, while keeping the architecture similar to that of an actual chip. Algorithms are fast to compute, but resampling still has a significant impact on performance.

  • Ideal: it represents an ideal implementation of the algorithms. Arithmetic is in floating point, there are no saturations, and no sample rate conversions, not even at the output stage. Algorithms are fast, at the cost of accuracy with respect to an actual chip.

  • Short: a hybrid of Fixed and Ideal, in that it uses full-range 16-bit fixed point arithmetic with saturations, but without sample rate conversions. Algorithms are fast, at the cost of accuracy with respect to an actual chip.

The following table gives an overview of the features of each engine:

| Feature | Fixed | Float | Ideal | Short |
|---|---|---|---|---|
| Input filter | suggested: 6th order 15 kHz | suggested: 6th order 15 kHz | not required | not required |
| Input signal | Q1.13 | double, normalized | double | Q1.15 |
| Input rate | 23550 Hz | 23550 Hz | suggested: above 40 kHz | suggested: above 40 kHz |
| Saturated arithmetic | yes | yes, normalized | no | yes |
| Signal operand | Q1.15 | double | double | Q1.15 |
| Gain operand | Q1.11 | double | double | Q1.15 |
| Feedback operand | Q1.5 | double | double | Q1.5 |
| Oversampler operand | Q1.11 | double | no oversampling | no oversampling |
| Output rate | 47100 Hz | 47100 Hz | same as input | same as input |
| Output signal | Q1.13 | double, normalized | double | Q1.15 |
| Output filter | suggested: 3rd order 15 kHz | suggested: 3rd order 15 kHz | not required | not required |
| Status memory | allocated by the user | allocated by the user | allocated by the user | allocated by the user |
| Delay memory | part of the status | part of the status | dynamic heap allocation | dynamic heap allocation |
| Performance | very slow | slow | fast | fast |
| Accuracy | best? | good | poor | poor |

Usage

To use this library, just include YM7128B_emu.c and YM7128B_emu.h in your project.

All the engines implement the same conceptual flow:

  1. Status memory allocation.
  2. Call Ctor() method to invalidate internal data.
  3. Call Reset() method to clear emulated registers.
  4. Call Setup() to allocate internal delay memory (only for Ideal engine).
  5. Call Start() to start the algorithms.
  6. Processing loop:
    1. Filter input samples.
    2. Resample input samples.
    3. For each sample:
      1. Call Process() method, with appropriate data types.
    4. Resample output samples.
    5. Filter output samples.
  7. Call Stop() method to stop the algorithms.
  8. Call Dtor() method to deallocate and invalidate internal data.
  9. Status memory deallocation.

Register access timings are not emulated.
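The flow above can be sketched with a stand-in object. Everything named `demo_*` and `DemoChip` below is hypothetical and is not the library's API: it is a plain delay used only to show the order of the calls. Input/output filtering and resampling (steps 6.1, 6.2, 6.4, 6.5) are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an engine status structure. */
typedef struct {
    double* delay;   /* delay memory, allocated by demo_setup() */
    size_t  length;  /* delay length, in samples */
    size_t  head;    /* next position to read and overwrite */
    int     running;
} DemoChip;

static void demo_ctor(DemoChip* c)  { c->delay = NULL; c->length = 0; c->head = 0; c->running = 0; }
static void demo_reset(DemoChip* c) { c->head = 0; }
static void demo_start(DemoChip* c) { c->running = 1; }
static void demo_stop(DemoChip* c)  { c->running = 0; }
static void demo_dtor(DemoChip* c)  { free(c->delay); demo_ctor(c); }

static int demo_setup(DemoChip* c, size_t length)
{
    if (length == 0) return 0;
    free(c->delay);
    c->delay = (double*)calloc(length, sizeof(double));
    c->length = (c->delay != NULL) ? length : 0;
    return c->delay != NULL;
}

/* Processes one sample: returns the input delayed by `length` samples. */
static double demo_process(DemoChip* c, double x)
{
    double y = c->delay[c->head];
    c->delay[c->head] = x;
    c->head = (c->head + 1) % c->length;
    return y;
}

static int demo_run(const double* in, double* out, size_t count, size_t delay)
{
    DemoChip chip;                     /* 1. status memory (here: on the stack) */
    size_t i;
    demo_ctor(&chip);                  /* 2. invalidate internal data */
    demo_reset(&chip);                 /* 3. clear emulated state */
    if (!demo_setup(&chip, delay)) {   /* 4. allocate internal delay memory */
        demo_dtor(&chip);
        return 0;
    }
    demo_start(&chip);                 /* 5. start the algorithms */
    for (i = 0; i < count; ++i)        /* 6. processing loop, one sample at a time */
        out[i] = demo_process(&chip, in[i]);
    demo_stop(&chip);                  /* 7. stop the algorithms */
    demo_dtor(&chip);                  /* 8. deallocate and invalidate */
    return 1;                          /* 9. stack storage is released on return */
}
```

For the real engines, check the function names and argument types in YM7128B_emu.h.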


Sample rate conversion

The biggest concern is about the weird sampling rates of the original chip: 23.55 kHz for input, and 47.1 kHz for output.

Since modern common audio sample rates are based either on audio CD (44.1 kHz), or legacy professional audio (48 kHz), the sampling rates of most late '80s Yamaha products need sample rate conversion both at input and output to be emulated in software.

Quality realtime conversions of such unusual sample rates intrinsically require more CPU time than the chip emulation itself, so expect them to be the actual bottleneck of the emulation.

Also, sample rate conversions add delays to their outputs, which are not welcome in realtime processing.
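To get a feeling for how awkward these rates are, the conversion ratios can be reduced to lowest terms. The helper names below (`gcd_ul`, `RateRatio`, `rate_ratio`) are made up for this illustration and are not part of the library.

```c
#include <assert.h>

/* Greatest common divisor, by Euclid's algorithm. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Conversion ratio dst/src in lowest terms: interpolate by `up`, decimate by `down`. */
typedef struct {
    unsigned long up;
    unsigned long down;
} RateRatio;

static RateRatio rate_ratio(unsigned long src_hz, unsigned long dst_hz)
{
    RateRatio r;
    unsigned long g;
    assert(src_hz != 0 && dst_hz != 0);
    g = gcd_ul(src_hz, dst_hz);
    r.up = dst_hz / g;
    r.down = src_hz / g;
    return r;
}
```

For example, `rate_ratio(44100, 23550)` reduces to 157/294 and `rate_ratio(47100, 48000)` to 160/157. The prime factor 157 in the chip rates rules out a small-integer ratio, so a rational resampler needs a long polyphase filter.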

Libraries

This library does not provide sample rate conversion itself, because proper conversion (without audible distortion) is not trivial at all. Instead, there are many libraries available, each with its quality rating, performance, and licensing. You can find a comprehensive comparison at Infinite Wave's website.

Personally, I have tried the following open source libraries with success:

Analog filtering

The datasheet suggests some input and output analog filtering, to reduce aliasing effects, more specifically a 6th order low-pass input filter, and a 3rd order low-pass output filter. Such filters should be taken into account in the system emulation, because an actual physical system needs them. I do not know how the AdLib Gold implemented this filtering.

I do not have specifications about them, but they should be Butterworth filters (common analog audio filters), and I guess the cut-off frequency is around 15 kHz (common for FM radio, Sound Blaster, etc.).

System block diagram

Again, analog filters are not provided by the library, to give the user freedom about their implementation (e.g. Robert Bristow-Johnson's Audio EQ Cookbook).


Implementation details

Here you can find some descriptions and discussions about implementation details.

Language

I chose C89 with a bit of C99. I was going to use C++20 for my own pleasure, but instead I find good old and mature C to be the best for such a tiny library. C is also easier to integrate with other languages, and the mighty features of a colossal language like C++ are more of a burden than of actual use in this case.

Cross-platform support

The code itself should be cross-platform and clean enough not to give compilation errors or ambiguity.

Currently the library is developed and tested under Windows with MSVC++ 14. Of course, I will at least provide support for gcc and clang under Linux. I do not have access to macOS, but I guess the Linux support should fit.

Code style

I chose the path of verbosity for variable declaration, to help debugging fixed point math and buffering. Indeed, when compiled for performance, all those variable declarations get optimized away easily.

I did not write the code for explicit vectorization, preferring a KISS approach at this stage of development. Actually I am not satisfied with the machine code that the MSVC++ compiler generates, and I guess that optimizing the code for vectorization should improve the performance by some margin, especially the parts for the parallel 8-tap delay and output oversampling.

Sample format

The datasheet claims 14-bit floating point sampling for both input and output. There is no information about such esoteric format itself, but there are datasheets of Yamaha products of the same age that do describe it. I know only the YM3014B for reference, which claims 16-bit (linear) dynamic range, against floating point samples with 3-bit exponent and 10-bit mantissa.

Anyway, the datasheet clearly states that the analog input is converted to a 14-bit digital signal, so we can assume that samples are 14-bit wide. There is no mention of sign bits, as the A/D converter is monopolar (Vdd/2 center voltage bias), but I think the 14-bit samples are unbiased (signed) by the A/D converter itself, to agree with all the two's complement computations. The reverse operation is done by the D/A converter.

I am actually concerned about the bit size of each sample. The 14-bit multiplication results sound pretty awful for small signals, the way I am emulating the system right now. Tests with 16-bit sample emulation sound better, so I am wondering whether signals are actually processed as 14-bit or more.

Multiplication

In a DSP, one of the most common operations is multiplication. Since this operation requires many logic gates, I guess there is only one multiplier in the chip, reused within the hundreds of clock ticks per sample.

The only information about multiplication comes from the description of feedback coefficients in the datasheet, which states that the 6 data bits of the registers are mapped onto the 6 most significant bits of the 12-bit two's complement operand, sign bit included. I think that the remaining 6 bits are not there just to be wasted on this operation; instead, the very same multiplier is also shared with gain (decibel) multiplications.

So, we have a multiplier that has a signal operand 14-bit wide, and a gain operand 12-bit wide. The result should be a signal itself, 14-bit wide.

The multiplication is implemented as a classic 16-bit x 16-bit signed fixed-point multiplication (aka Q15), keeping only the 14 most significant bits.

Actually, since 14-bit operands sound awful, unlike the few recordings found on the internet, I am using 16-bit operands right now.
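A minimal sketch of such a Q15 multiplication follows, with an optional truncation to the 14 most significant bits. The function names are invented for this illustration, and the code assumes that right-shifting a negative value is an arithmetic shift, as on all common compilers.

```c
#include <stdint.h>

/* Saturates a 32-bit value to the signed 16-bit range. */
static int16_t clamp16(int32_t x)
{
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* Q1.15 x Q1.15 -> Q1.15 signed multiplication, with saturation. */
static int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q2.30 intermediate */
    return clamp16(product >> 15);              /* assumes arithmetic shift */
}

/* Same as above, keeping only the 14 most significant bits of the result. */
static int16_t mul_q15_14bit(int16_t a, int16_t b)
{
    return (int16_t)(mul_q15(a, b) & ~3);       /* clear the 2 lowest bits */
}
```

With these definitions, `mul_q15(3, 16384)` (a tiny signal times 0.5) gives 1, while `mul_q15_14bit(3, 16384)` gives 0: with fewer result bits, low-level signals are truncated away sooner.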

Feedback coefficients

The datasheet is very clear about feedback coefficients: they are 6-bit values mapped directly from the register data to the 12-bit two's complement operand of the multiplier, padded with zeros at the least significant operand bits.

Feedback coefficient operand

Feedback coefficients seem to be a special case of gains, being mapped directly as operands, while gains are remapped with a lookup table.
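The register-to-operand mapping described above can be sketched as follows. The function name is hypothetical; the result is the 12-bit operand sign-extended into a 16-bit integer, so that dividing it by 2048 gives the coefficient as a fraction.

```c
#include <stdint.h>

/* Maps 6-bit register data onto the 6 most significant bits of a
 * 12-bit two's complement operand; the 6 lowest bits are zero. */
static int16_t feedback_operand12(uint8_t reg)
{
    int raw = (reg & 0x3F) << 6;            /* bits 11..6 = register data */
    return (int16_t)((raw ^ 0x800) - 0x800); /* sign-extend from bit 11 */
}
```

For example, register value 0x1F maps to 1984 (1984/2048 = 0.96875, the largest positive coefficient), 0x20 maps to -2048 (-1.0), and 0x3F maps to -64 (-0.03125).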

Gain coefficients

The datasheet mentions the gain coefficients in terms of decibels, listed in a table with 32 entries. Entries go from unity gain (0 dB) at the highest index, down to -60 dB at the second-lowest index, in steps of 2 dB. The lowest index (zero) is reserved for silence. This gives a good granularity to volume levels (2 dB is a common step size in incremental volume control).

I think that such entries are saved into a single table with non-negative 11-bit linear coefficients, from silence (all bits 0) to maximum volume (all bits 1).

I guess the negative coefficients are actually loaded as the ones' complement (bit flip) instead of the two's complement, as often seen in Yamaha's synthesizers of the same age, to save silicon area despite a tiny gain error.
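Under the assumptions above, such a table could be generated as in this sketch. The rounding to the nearest integer, the function names, and the separate `negative` flag are choices made for this illustration; the library's actual table values may differ.

```c
#include <stdint.h>

/* Fills a 32-entry table of non-negative 11-bit linear gains:
 * index 31 = 0 dB (2047), 2 dB less per step down to index 1 = -60 dB,
 * index 0 = silence. */
static void gain_table_init(uint16_t table[32])
{
    const double step = 0.79432823472428150207;  /* 10^(-2/20), i.e. -2 dB */
    double gain = 1.0;
    int i;
    for (i = 31; i >= 1; --i) {
        table[i] = (uint16_t)(gain * 2047.0 + 0.5);  /* round to nearest */
        gain *= step;
    }
    table[0] = 0;
}

/* Signed gain operand; negative gains use the ones' complement (bit flip). */
static int16_t gain_operand(const uint16_t table[32], unsigned index, int negative)
{
    int16_t gain = (int16_t)table[index & 31u];
    return negative ? (int16_t)~gain : gain;
}
```

Index 21 (-20 dB) yields 205, about one tenth of 2047. The bit flip turns 2047 into -2048 and 0 into -1, which is the tiny gain error mentioned above.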

Digital delay line

The digital delay line allows up to 100 ms of delay at the nominal input rate. This leads to a buffer of at least 2355 samples. Such buffer should be a shifting FIFO with 32 pre-determined tap positions.

The delay line emulation is actually implemented as a random-access ring buffer, as shifting the whole buffer at each input sample is a waste of time.
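The ring buffer idea can be sketched as follows: instead of shifting samples, only a head index moves, and a tap is read at an offset from it. Names and the `short` sample type are assumptions of this example, not the library's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DELAY_LENGTH 2355  /* 100 ms at 23550 Hz */

typedef struct {
    short  buffer[DELAY_LENGTH];
    size_t head;  /* index of the most recent sample */
} DelayLine;

static void delay_clear(DelayLine* d)
{
    memset(d->buffer, 0, sizeof(d->buffer));
    d->head = 0;
}

/* Stores a new sample; older samples stay where they are. */
static void delay_push(DelayLine* d, short sample)
{
    d->head = (d->head == 0) ? (DELAY_LENGTH - 1) : (d->head - 1);
    d->buffer[d->head] = sample;
}

/* Reads the sample pushed `delay` samples ago (0 = most recent). */
static short delay_tap(const DelayLine* d, size_t delay)
{
    size_t index;
    assert(delay < DELAY_LENGTH);
    index = d->head + delay;
    if (index >= DELAY_LENGTH) index -= DELAY_LENGTH;
    return d->buffer[index];
}
```

Each push costs a constant amount of work regardless of the buffer length, and any of the tap positions can be read with a single index computation.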

Oversampler

The chip includes a 2x oversampling interpolator at the output stage, to help reduce the analog circuitry needed to reconstruct the output from the 47.1 kHz oversampled D/A data.

I guess the interpolator is a FIR filter that reuses the same DSP circuitry for gain and coefficient fixed-point multiplication, to save silicon area.

The datasheet shows only the magnitude versus frequency response. So, based on such information, I tried to match the response with Iowa Hills FIR Filter Designer 7.0 by trial and error. The parameters I found do not look exactly as per the diagram of the datasheet, but the actual response should not differ too much:

Oversampler emulation versus datasheet

I also think that the kernel is not minimum-phase, to save further silicon area, thanks to the mirrored coefficient values. I am no expert, but it looks like minimum-phase is also not welcome in audio, because of phase incoherence among frequencies, despite having shorter delays. I left the possibility to choose the minimum-phase feature by configuring the YM7128B_USE_MINPHASE preprocessor symbol.
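The general structure of a 2x FIR interpolator with mirrored (symmetric) coefficients can be sketched as below: each input sample is followed by a zero, and the zero-stuffed stream goes through the FIR. The 3-tap kernel here is a placeholder that performs plain linear interpolation; it is not the kernel matched against the datasheet, and all names are hypothetical.

```c
#define OVS_TAPS 3

/* Placeholder symmetric kernel (linear interpolation), NOT the chip's. */
static const double ovs_kernel[OVS_TAPS] = { 0.5, 1.0, 0.5 };

typedef struct {
    double history[OVS_TAPS];  /* zero-stuffed stream, newest first */
} Oversampler;

static void ovs_clear(Oversampler* o)
{
    int i;
    for (i = 0; i < OVS_TAPS; ++i) o->history[i] = 0.0;
}

/* Feeds one sample of the zero-stuffed stream and returns the FIR output. */
static double ovs_fir(Oversampler* o, double x)
{
    double sum = 0.0;
    int i;
    for (i = OVS_TAPS - 1; i > 0; --i) o->history[i] = o->history[i - 1];
    o->history[0] = x;
    for (i = 0; i < OVS_TAPS; ++i) sum += ovs_kernel[i] * o->history[i];
    return sum;
}

/* Produces two output samples for each input sample. */
static void ovs_process(Oversampler* o, double in, double out[2])
{
    out[0] = ovs_fir(o, in);   /* actual sample */
    out[1] = ovs_fir(o, 0.0);  /* stuffed zero */
}
```

Feeding 1.0 and then 3.0 yields the pairs (0.5, 1.0) and (2.0, 3.0): the original samples pass through unchanged, and the samples in between are the midpoints. A longer kernel would follow the same structure, with a sharper frequency response and more multiplications per output sample.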

Floating-point

It is possible to configure the floating point data type used for processing, via the YM7128B_FLOAT preprocessor symbol. It defaults to double (double precision).

Please note that, contrary to common belief, the double data type is actually very fast on machines with hardware support for it.

I think that you should switch to float (single precision) only if:

  • the hardware is limited to it, or
  • machine code vectorization gets faster, or
  • conversion from/to buffer data to/from double precision is slower.

YM7128B_pipe example

This repository provides a fully-featured example. It is a stream processor, in that it processes sample data coming from standard input, and generates outputs on the standard output.

Please refer to its own help page, by calling the canonical YM7128B_pipe --help, or reading it embedded in its source code.

Usage example with Lubuntu 20.04

  1. Ensure the following packages are installed:

     sudo apt install alsa-utils build-essential

  2. Enter the example folder and run make_gcc.sh:

     cd example
     bash make_gcc.sh

  3. You should find the generated executable file as YM7128B_pipe.

  4. Play some audio directly with aplay (the \ is for command line continuation), using the dune/warsong preset:

     ./YM7128B_pipe -r 23550 -f S16_LE --preset dune/warsong < sample_mono_23550Hz_S16LE.raw \
     | aplay -c 2 -r 47100 -f S16_LE
