Sunday, 17 February 2019

Mod player in Rust - part 4. Finally some music

In this post we will finally have some music. Below a quick preview of the player in action. 

In this post I will spend some time discussing how the Amiga sound hardware worked. The original mod file format and how it is played back is intimately linked with the Amiga hardware.
Tracking the play position

The speed at which the mod is played is based on counting vertical blank interrupts ( the interrupt triggered at each hardware screen refresh https://en.wikipedia.org/wiki/Vertical_blank_interrupt ) On PAL systems the VBI frequency is 50hz and on the NTSC systems it is 60hz. The song's playing speed is defined as the number of vertical blanks between two lines in a pattern. ( This means that songs developed on a PAL system will play slightly too fast on an NTSC system. )
Mod files also have an option to specify the playback speed in beats per minute (BPM) but this was not used often because many of the mod songs were intended to be used with demos. Demos often ran all of their code on the vertical blank interrupt but for many values of BPM the player could not have run on the same interrupt, making it much harder to manage the processor time. ( There also seems to be a lot variability between players and mods over how exactly the BPM should be handled ) For now I am going to concentrate on mods that used VBI based speed.
When playing a mod with a VBI based speed on an Amiga, keeping track of the speed was just a matter of checking how many times the interrupt had been called since playing the last line. This is not an option for my Rust player as it does not run on an interrupt. Instead I can use the device sample rate to track theoretical vblanks using the following formula
samplesPerVblank = \frac {deviceSampleRate}{vblankFrequency}
If the playback device has a sample rate of 48000 and I am simulating the playback on a PAL device, the samplesPerVblank is 48000/50 = 960. That means that after every 960 samples it is time to increment the vertical blank counter by one, and if it matches the playback speed it is time to play a new line.
Once we have played all the lines from the current pattern it is time to move to the next pattern.
The PlayerState keeps track of all the above information.
struct PlayerState{
    song_pattern_position: u32,             // where in the pattern table are we currently
    current_line: u32,                      // current position in the pattern
    song_speed: u32,                        // in vblanks
    current_vblank : u32,                   // how many vblanks since last play line
    current_vblank_sample : u32,            // how many device samples have we played for the current 'vblank'
    samples_per_vblank: u32,                // how many device samples per 'vblank'
}

Playing samples

The mod files use frequency shifting to player different notes of the same sound. If the original sound effect was sampled at 10kh, playing it back at double the original speed increases its frequency to 20kh and plays it one octave higher.
On the original Amiga hardware the DMA (Direct Memory Access) is responsible for transferring audio data from memory and sending it to the DAC ( Digital to Analog Converter ). The DMA speed is controlled by setting the number of system clock ticks between samples. On a PAL system each clock tick is 281.937 nanoseconds, thus there are 3,546,895 clock ticks per second ( on a NTSC system there a tick is 279.365 nanoseconds and there are 3,579,545 clock ticks per second).
For hardware design reasons the smallest interval between samples is 123 system clock ticks on a PAL system so this works out as 34.678 micro seconds per sample or about 28.8 khz. (On NTSC systems the minimum clock counts between samples is 124)
The note data in mod files specify the number of system clocks ticks each sample byte is played for before fetching a new one. This is the period in the note data. When the note states a period of 428 it means that means that each sample should be held for 120 microseconds (428 clocks * 0.281397 microsecs per clock = 120 microsecs) yielding a sample rate of about 8kHz. To play the same sample one octave higher a period of 214 would be used.
We can emulate this by behaviour by resampling the sounds at a speed that matches intended playback speed. We just need to map device sample duration into Amiga clock ticks; If the playback device has a sample rate of 48,000 we know that each device sample corresponds to 73.89 clock ticks (3,546,895/48,000 = 73.89). So each time we calculate a new sample we advance the playback position by;
\frac {clockTicksPerSecond}{period*deviceSampleRate} = \frac{clockTicksPerDeviceSample}{period}
To keep track of clockTicksPerDeviceSampleI add an additional variable to the PlayerState;
    clock_ticks_per_device_sample : f32,    // how many amiga hardware clock ticks per device sample
Some of the mod effects are created by varying the channel's period value while the note is playing. This is why the channel stores the period value rather than the value for clockTicksPerDeviceSample/period
In addition to the period. Each channel need to track what it's sample number, position within the sample and playback volume;
struct ChannelInfo {
    sample_num: u8,         // which sample is playing 
    sample_pos: f32,         
    period : f32,           
    volume: f32, 
}

Putting it together

I am going to quickly run through the main blocks of the playing code. This is all pretty straight forward Rust code.
First, I replace the instrument playing logic from previous version of the code with a call to next_sample which returns a tuple that has the left and right samples;
        cpal::StreamData::Output { buffer: cpal::UnknownTypeOutputBuffer::F32(mut buffer) } => {
            for sample in buffer.chunks_mut(format.channels as usize) {
                let ( left, right ) = next_sample(&song, &mut player_state);
                sample[0] = left;
                sample[1] = right;
            }
        }
The above uses destructuring to assign the members of the returned tuple into left and right. Sadly, Rust doesn't seem to allow for destructuring directly into the destination array.
The code for next_sample tracks how many samples it has played for the current vblank and whether enough vblanks have passed to play a new line. The code keeps track of vblanks and lines separately because many of the effects require updates that occur on the vblank.
fn next_sample(song: &Song, player_state: &mut PlayerState) -> (f32, f32) {
    let mut left = 0.0;
    let mut right = 0.0;

    // Have we reached a new vblank
    if player_state.current_vblank_sample >= player_state.samples_per_vblank {
        player_state.current_vblank_sample = 0;

        // Is it time to play a new note line
        if player_state.current_vblank >= player_state.song_speed {
            player_state.current_vblank = 0;
            play_line( song, player_state );
        }
        player_state.current_vblank += 1;
    }
    player_state.current_vblank_sample += 1;
    // .. handle sample play back
The next interesting bit is in play_line which updates the current line and pattern position and plays a note for each channel;
fn play_line(song: &Song, player_state: &mut PlayerState ) {
    // use the pattern table to work out which pattern we are in    
    let pattern_idx = song.pattern_table[player_state.song_pattern_position as usize];
    let pattern = &song.patterns[ pattern_idx as usize];

    pattern.print_line(player_state.current_line as usize);
    let line = &pattern.lines[ player_state.current_line as usize ];
    for channel_number in 0..line.len(){
        play_note(&line[ channel_number as usize ], player_state, channel_number, song);
    }
    player_state.current_line += 1;
    if player_state.current_line >= 64 {
        // end of the pattern
        player_state.song_pattern_position += 1;
        player_state.current_line = 0;
        println!("");
    }
}
The play_note function sets up the new sample ( if one is set ) and handles effects. For now, this code will only handle a minimal set of effects to keep the code size and complexity down. The function needs access to the PlayerState structure instead of just the channel data because some of the effects, like setting the speed, are global.
fn play_note(note: &Note, player_state: &mut PlayerState, channel_num: usize, song: &Song) {
    if note.sample_number > 0 {
        // sample number 0, means that the sample keeps playing. The sample indices starts at one, so subtract 1 to get to 0 based index
        player_state.channels[channel_num].volume = song.samples[(note.sample_number - 1) as usize].volume as f32 / 64.0;    // Get volume from sample
        player_state.channels[channel_num].sample_num = note.sample_number;
        player_state.channels[channel_num].sample_pos = 0.0;
        if note.period != 0 {
            player_state.channels[channel_num].period = note.period as f32;
        }
    }

    match note.effect {
        Effect::SetSpeed{ speed } => {
            player_state.song_speed = speed as u32;
        }
        Effect::SetVolume{ volume } => {
            player_state.channels[channel_num].volume = volume as f32;
        }
        Effect::None => {}
        _ => {
            println!("Unhandled effect" );
        }
    }
}
Finally, the second half of the next_sample performs the sample retrieval and mixing into device channels
for channel_number in 0..player_state.channels.len() {
    let channel_info: &mut ChannelInfo = &mut player_state.channels[channel_number];
    if channel_info.sample_num != 0 {
        let current_sample: &Sample = &song.samples[(channel_info.sample_num - 1) as usize];
        let mut channel_value: f32 = current_sample.samples[(channel_info.sample_pos as u32) as usize] as f32;             
        // max channel vol (64), sample range [ -128,127] scaled to [-1,1] 
        channel_value *= channel_info.volume / (128.0*64);

        // update position and check if we have reached the end of the sample
        channel_info.sample_pos +=  player_state.clock_ticks_per_device_sample / channel_info.period;
        if channel_info.sample_pos >= current_sample.size as f32 {
            channel_info.sample_num = 0;
        }
        let channel_selector = ( channel_number as u8 ) & 0x0003; 
        if channel_selector == 0 || channel_number as u32 == 0 || channel_number == 3 {
            left += channel_value;
        } else {
            right += channel_value;
        }
    }
}
I need to control the range of the sound output from the mixer so it has a range that works with the playback device. As cpal uses the range [-1,1] for devices supporting floating point channels I have decided to use the same range for mixer output.
Mod files only support stereo. The mod channels 0 and 3 should be played on the left channel and 1 and 2 on the right. If there are more channels, the same allocation scheme is used ( 4 and 7 are let, 5 and 6 are right)

Success

The code we have developed so far will play simple mod files that have minimal effects. The mod file I have included ( axelf.mod ) is an example of such a mod file. Most mods make heavy use of effects and cannot be played with this code.
In my next post I will look into supporting more effects so I can support a larger set of mod files. The code is also growing to be quite large so it is time to look at a Rust modules as a better of organizing code.
All the code for the current article is at
https://github.com/janiorca/articles/tree/master/mod_player-4

No comments:

Post a Comment

glium - instancing

I have spent a bit of time looking into instancing with  glium  and OpenGL. It turns out that  glium  makes it very easy to make use of th...