Code Slow: April 2019

Monday, 29 April 2019

Mod player in Rust - part 7. Regression tests ( and discovering Serde )

While working on the mod player, many of my changes have had unintended side effects and sometimes broke songs that were working fine. Sometimes these breakages took some time to notice and required digging through changes to work out what happened.

To stop these unintended changes to the audio output I am setting up regression tests that compare current output to expected output ( expected output is the output produced by the latest production version of the code). With regression tests in place I can work with the code safe in the knowledge that the audio output only changes when I intend for it to.

Using cargo for tests

Cargo has support for two kinds of tests

Unit tests cover small pieces of code, such as a single function or structure. The unit tests typically exist in the same file as the functionality they test. ( so they live in src.)
Integration tests test the code like an external user would. Only public APIs are used by the integration test. Integration tests live in their own tests folder.

The regression tests I want to write are clearly integration tests.
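As a rough sketch of the difference, a unit test sits in a tests module next to the code it checks and can reach private items. The clamp_volume function below is a hypothetical stand-in and not part of mod_player; an integration test in the tests folder could only call it if it were made pub.

```rust
// Hypothetical private helper living somewhere under src/.
// MOD volumes run from 0 to 64, so anything outside that range is clamped.
fn clamp_volume(volume: i32) -> u8 {
    volume.max(0).min(64) as u8
}

// Unit tests live in the same file and are only compiled by `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn volume_is_clamped_to_mod_range() {
        assert_eq!(clamp_volume(-5), 0);
        assert_eq!(clamp_volume(32), 32);
        assert_eq!(clamp_volume(100), 64);
    }
}
```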

Cargo has nice support for running tests. All tests can be run with the command;

cargo test

Always running all the tests can be slow when developing code. Cargo has the --test option to specify the integration test to run. So to only run the test in tests\regression.rs, use the command;

cargo test --test regression

The other useful option is --release to run the release version of the code. This is an absolute must when you work on optimizing the code. To run my regression test in release mode I use the command

cargo test --test regression --release

The final option I want to mention is --nocapture. By default, all print statements from the test are captured and suppressed by the test runner. The --nocapture option turns off this behaviour and lets the print statements get to the output. To enable the --nocapture option for my regression test running in release mode I use the command;

cargo test --test regression --release -- --nocapture

The funny looking -- before the --nocapture is required because there are actually two executables involved in running the test;

cargo, which manages the compilation,
the test runner which contains the actual compiled test,

Any option before the -- is for cargo and any option after is for the test runner.
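A small sketch of what this means in practice, assuming a test file such as tests\regression.rs. The render_dummy_song function is a hypothetical stand-in for the real player. The println! line only shows up for a passing test when the test runner is given --nocapture.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the real work a regression test would do.
fn render_dummy_song() -> (Vec<f32>, Duration) {
    let start = Instant::now();
    let samples: Vec<f32> = (0..1_000).map(|i| (i as f32 * 0.01).sin()).collect();
    (samples, start.elapsed())
}

#[test]
fn reports_render_time() {
    let (samples, elapsed) = render_dummy_song();
    // Hidden for a passing test unless the runner is given --nocapture.
    println!("rendered {} samples in {:?}", samples.len(), elapsed);
    assert_eq!(samples.len(), 1_000);
}
```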

Designing the tests

First I needed to decide how to compare the current output to the expected output.

The naïve approach would be to store the entire output and compare the old and new outputs to detect any changes. This would require a lot of storage for the validated output and is clearly impractical for a reasonable set of music files.

I did consider some clever schemes of storing every nth sample but this would miss changes that fall between the stored samples ( very possible given the nature of the music data ). I also thought about using a variable step size to avoid periodicity issues with a fixed step. This was getting far too clever and my experience is that cleverness tends to have large hidden costs.
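The weakness of the every nth sample idea can be shown with a few lines of code. This is only an illustration: the i16 sample type and both function names are assumptions made for the sketch.

```rust
// Compares only every `step`-th sample of two buffers (`step` must be non-zero).
fn strided_equal(a: &[i16], b: &[i16], step: usize) -> bool {
    a.len() == b.len()
        && a.iter()
            .step_by(step)
            .zip(b.iter().step_by(step))
            .all(|(x, y)| x == y)
}

// Builds two buffers that differ in exactly one sample, at index 5.
fn example_buffers() -> (Vec<i16>, Vec<i16>) {
    let expected: Vec<i16> = (0..16).collect();
    let mut actual = expected.clone();
    actual[5] = 999;
    (expected, actual)
}
```

With a step of 4 only indices 0, 4, 8 and 12 are compared, so strided_equal reports the two buffers as equal even though the sample at index 5 changed.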

I ultimately decided that a simple checksum is good enough for what I am trying to do. The downside is that it won't show the precise location of the change but on the upside it is easy to implement and the validated output is easy to store. I used the crc crate to calculate my checksum. ( The tests can pull in crates like this through dev-dependencies without burdening the final library. )
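To give an idea of what the checksum step involves, here is a self-contained sketch that does not use the crc crate. It swaps in a hand-written bitwise CRC-32, and it assumes the samples are f32 values; the real test code may use a different sample type and checksum width.

```rust
// Bitwise CRC-32 (the common reflected variant with polynomial 0xEDB88320).
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is all ones when the lowest bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Checksums a buffer of samples by feeding their little-endian bytes to crc32.
// The f32 sample type is an assumption made for this sketch.
fn checksum_samples(samples: &[f32]) -> u32 {
    let mut bytes = Vec::with_capacity(samples.len() * 4);
    for sample in samples {
        bytes.extend_from_slice(&sample.to_le_bytes());
    }
    crc32(&bytes)
}
```

Any change to a single sample changes the resulting checksum, which is all the regression test needs to know.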

Once I knew what test data I needed to store ( a song and its expected checksum ), I had to decide how to store it.

The simplest solution would be to simply store the expected checksums in the test code. This is perfectly workable for a small set of tests but can become tedious if I want to test a large number of files and especially if I need to rebase my tests.

I decided to store them in a json file that contains each music file and its expected checksum. While figuring out how to read and write JSON in Rust I came across Serde which is very cool.

Discovering Serde

Serde is a serialization and deserialization framework for Rust. It consists of the Serde framework and plug-ins for handling different data formats. To bring json serialization with Serde to my test I add the following lines to the [dev-dependencies] section of my Cargo.toml file

serde = { version = "1.0.90", features = ["derive"] }
serde_json = "1.0.39"

The first line brings in the framework. The derive feature needs to be enabled to use the Rust derive macros for creating the Serde serializer traits.

The second line adds a dependency for the json plug-in.

To declare a structure that can be serialized and deserialized by Serde I write the code

#[derive(Serialize, Deserialize)]
struct ExpectedChecksums {
    song_checksums: HashMap<String, u64>,
}

The first line tells the compiler to use the procedural macros to generate implementations of the Serialize and Deserialize traits. The rest of the structure is just a vanilla structure.

To write the new structure into a json file I can use the following code;

    let actual_checksums = ExpectedChecksums { song_checksums };
    ...
    let serialized = serde_json::to_string_pretty(&actual_checksums).unwrap();
    fs::write("reg_actual.json", &serialized).expect("Cant write actual results");

The actual work of turning the structure into a string ready to be written to disk is just one line of code! Reading my structure back from disk is equally simple

    let expected_json = fs::read_to_string("reg_expected.json")?;
    let result: ExpectedChecksums = serde_json::from_str(&expected_json)?;

Note that serde directly creates the ExpectedChecksums structure I want. Based on my very brief experience of using serde, it is one of the most impressive serialization frameworks I have used and I will be making more use of it in the future.
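Once the expected and actual checksums are both in memory as maps, the regression check itself is a plain map comparison. A minimal sketch, assuming the HashMap<String, u64> layout from the structure above; find_mismatches is a hypothetical helper name.

```rust
use std::collections::HashMap;

// Returns the names of songs whose checksum is missing or differs,
// sorted so the report is stable between runs.
fn find_mismatches(
    expected: &HashMap<String, u64>,
    actual: &HashMap<String, u64>,
) -> Vec<String> {
    let mut failed: Vec<String> = expected
        .iter()
        .filter(|(song, checksum)| actual.get(*song) != Some(*checksum))
        .map(|(song, _)| song.clone())
        .collect();
    failed.sort();
    failed
}
```

A test can then assert that the returned list is empty and print the offending song names when it is not.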

Measuring performance

With a set of standardized tests in place it was easy to add measurements for performance. There are several crates for accurate timing but I found that the functionality in std::time was good enough for my use case.

The code for measuring the time it took to play the song distils to;

    let before_play = time::Instant::now();
    ... //play the song
    let after_play = time::Instant::now();
    let play_time = after_play.duration_since(before_play);

I did have to split my test code into two stages;

The first stage plays the song into a memory buffer ( Vec )
The second stage calculates the checksum for the buffer and compares it to the expected value

The split was necessary because the checksum calculation was taking a long time. In debug mode most of the time was spent on the checksum and not the music playing.
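The two stages can be sketched roughly as follows. Both render_song and checksum are hypothetical stand-ins ( a test tone and a simple multiply-and-add over the sample bits ), not the real player or the crc crate. The point is only that the timer stops before the checksum starts.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the player: fills a buffer with a test tone.
fn render_song(sample_count: usize) -> Vec<f32> {
    (0..sample_count).map(|i| (i as f32 * 0.05).sin()).collect()
}

// Placeholder checksum (wrapping multiply-and-add over the sample bits),
// not a real CRC.
fn checksum(samples: &[f32]) -> u64 {
    samples.iter().fold(0u64, |acc, s| {
        acc.wrapping_mul(31).wrapping_add(u64::from(s.to_bits()))
    })
}

// Stage one is timed on its own; stage two runs outside the measurement.
fn play_and_check(sample_count: usize) -> (Duration, u64) {
    let before_play = Instant::now();
    let samples = render_song(sample_count); // stage 1: play into memory
    let play_time = before_play.elapsed();

    let sum = checksum(&samples); // stage 2: not included in play_time
    (play_time, sum)
}
```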

Analysing Code Performance in Debug and Release

I have been wondering about the performance of my code. The main issue is that it produces one sample pair at a time, incurring quite a bit of overhead that could be optimized away if it produced more samples per call. Until the regression tests were in place I did not have a systematic way of evaluating how fast or slow the code was.

I ran all my regression tests in Release and Debug.

song	Release (usecs)	Debug (usecs)	Debug/Release	Play speed
CHIP_SLAYER!.MOD	234,854	5,313,997	22.63	940
BUBBLE_BOBBLE.MOD	59,327	1,075,331	18.13	755
cream_of_the_earth.mod	291,041	6,161,283	21.17	862
switchback.mod	284,893	5,436,910	19.08	755
stardstm.MOD	285,210	6,273,368	22.00	908
overload.mod	469,587	10,906,507	23.23	951
sarcophaser.MOD	309,964	6,947,591	22.41	916
BOG_WRAITH.mod	33,078	800,037	24.19	1045
wasteland.mod	155,086	3,413,873	22.01	891

The columns Release (usecs) and Debug (usecs) measure how many microseconds it took to play the entire song into a memory buffer in Release and Debug mode. The column Debug/Release is the ratio between the two and a good indication of how much slower the debug mode is compared to the release mode. Roughly, the debug mode is 20X slower than the release mode.

The column Play speed shows how much faster than real time the play speed is in Release mode. A play speed of 10 would mean that the player can play back 10 seconds in 1 second. The play speed for all the tests is between 750 and 1050. The other way to think about it is that it takes around 0.1% - 0.13% of one core to play the music in real time. For debug mode this would be roughly 2%-3%.
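The arithmetic behind these columns is simple enough to write down. The function names below are made up for this sketch; the numbers in the note after the code come from the table.

```rust
// How many seconds of audio are produced per second of rendering.
fn play_speed(song_secs: f64, render_usecs: f64) -> f64 {
    song_secs / (render_usecs / 1_000_000.0)
}

// Share of one core (in percent) needed to keep up with real-time playback.
fn core_share_percent(play_speed: f64) -> f64 {
    100.0 / play_speed
}

// Slowdown of the debug build relative to the release build.
fn debug_release_ratio(debug_usecs: f64, release_usecs: f64) -> f64 {
    debug_usecs / release_usecs
}
```

For example, core_share_percent(750.0) is about 0.13 and core_share_percent(1050.0) is about 0.095, which is where the 0.1% - 0.13% range comes from. For the first row of the table, debug_release_ratio(5_313_997.0, 234_854.0) comes out at about 22.63.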

Before making these measurements I was thinking about optimizing the code but given these measurements I am going to heed Donald Knuth's famous quote "premature optimization is the root of all evil."

Wrapping up

I have updated all the code on https://github.com/janiorca/mod_player and will update the crate.

Tuesday, 2 April 2019

Mod player in Rust - part 6. Creating and publishing the crate

The module music player has progressed quite well. It now handles most of the effects and unusual storage types.

I think it is now good enough to create and publish a crate so that anyone can use the mod player functionality. In order to publish a crate there are a couple of areas I need to address;

documentation
examples
creating and publishing the crate

Creating documentation

I like crates that come with good documentation that makes it easy to understand what the crate does and explains how to use it. I should try to do the same for my crate.

Rust comes with all the functionality for building good documentation. Typing the following command in the project folder ( the same folder as the .toml ) builds the html documentation from the source files and opens it in a browser.

cargo doc --open

The doc compiler looks for special comments that start with /// or //! to use for building the documentation. Any lines starting with /// are assumed to document the item ( function, enum, structure etc.) that follows the comment.

So for example the following code adds documentation for the Note structure.

/// Describes what sound sample to play and an effect (if any) that should be applied. 
pub struct Note{
    sample_number: u8,
    period: u32,            // how many clock ticks each sample is held for
    effect: Effect,
}

In the above example the second comment does not get included in the documentation for two reasons: it uses a normal // comment and the field it describes is not public. By default only public items appear in the generated documentation. It would make very little sense to have documentation for items that are not visible to the library users.
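A simplified variation of the structure shows the different cases side by side. The field layout here is made up for the sketch ( effect_argument is a hypothetical field ) and is not the real Note from mod_player.

```rust
/// Describes what sound sample to play and for how long each sample is held.
pub struct Note {
    /// Index of the sample to play. Shown in the docs: public and `///`.
    pub sample_number: u8,
    // Plain comment: never copied into the generated documentation.
    pub period: u32,
    /// Doc comment on a private field: left out of the default output.
    effect_argument: u8,
}
```

In the generated page sample_number appears with its description, period appears without one, and effect_argument is not listed at all.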

The other type of comment, //! is documentation for the surrounding module. When I add the following lines to the top of my lib.rs file they get added to the documentation's index page.

//! # mod_player
//! 
//! mod_player is a crate that reads and plays back mod audio files. The mod_player decodes the audio one sample pair (left,right) at a time
//! that can be streamed to an audio device or a file.

Documentation examples

Good libraries should make it easy to find working example code. Rust lets you insert tested example code right into the documentation.

Inserting ``` after the /// in the comment section tells Rust's doc compiler to treat what follows as a code block. So inserting the following comment

//! To use the library to decode a mod file and save it to disk
//! 
//! ```rust
//! fn main() {
//!     //...
//! }
//! ```

produces the following documentation:

To use the library to decode a mod file and save it to disk
fn main() {
   //..
}

The really cool thing is that the included example code can be fully tested code. So if you ever change the way the library should be used, the tests will catch any outdated documentation examples.

Running these tests could not be simpler. Typing

cargo test

on the command line will instruct cargo to run all the tests in the folder, including the code examples in the documentation.

You can also use rustdoc directly but I found it much harder to get this to work correctly when the examples have external dependencies. For now, I have decided to let cargo handle this for me.

The example compiler does its best to make it easy to create and maintain examples. It will automatically wrap the example code inside a fn main() {} block if one does not exist.
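A minimal sketch of a documented function whose example relies on that automatic wrapping. The volume_to_gain function and the my_crate name are placeholders for illustration. The example is fenced with ~~~ instead of three backticks, which the doc compiler should treat the same way since it follows the usual Markdown fence rules.

```rust
/// Converts a MOD volume (0 to 64) into a gain between 0.0 and 1.0.
///
/// ~~~
/// // No fn main() here: the example compiler adds the wrapper itself.
/// // `my_crate` is a placeholder for the real crate name.
/// assert_eq!(my_crate::volume_to_gain(32), 0.5);
/// ~~~
pub fn volume_to_gain(volume: u8) -> f32 {
    f32::from(volume.min(64)) / 64.0
}
```

If the function is later renamed or its behaviour changes, cargo test fails on the example, which is exactly the protection against stale documentation described above.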

Examples

In addition to the examples in the documentation I want to include a couple of more fully fleshed out examples. The examples are placed into the examples folder.

My examples use several dependencies to illustrate how to use the library; I use hound for writing WAV files and cpal for playing to an audio device. The library itself does not care about how the audio is used so I do not want my library to have a dependency on either of these crates. This is where the [dev-dependencies] section in the toml becomes useful.

I have the following section in my toml file

[dev-dependencies]
cpal = "0.8.2"
hound = "3.4.0"

This tells Rust that the examples and tests have a dependency on these crates. The crate itself does not have these dependencies and these will not be picked up by any project using the crate.

To run an example I use the cargo command run

cargo run --example streaming_player

The above command tells cargo to go into the examples folder and compile and run the file streaming_player.rs.

The examples replace the various versions of main.rs I used for testing the library. By turning that code into examples I have converted previous throw-away code into a useful part of the library documentation.

Publishing the crate

This too turned out to be fairly easy. First I had to create an account at crates.io. I needed to use my github account to create a crates.io account and grant it access to my github account. It was not terribly clear exactly why the access was needed and what it would be used for. I wish crates.io would have a clearer explanation of this.

I also needed to create an access token for crates.io that my installed cargo can use to access my crates. This just involved going to my crates.io Account settings and creating a new token by giving it a name, clicking on create and copying the resulting command line into my local session

cargo login

After running this, my cargo can access crates.io.

Now I need to package up my crate. Unsurprisingly, the command line for this is

cargo package

This bundles the crate into a .crate archive, checks that the packaged crate still builds and reports any problems with the metadata in the .toml file.

One of the required fields is license, so I needed to pick the license for my crate. spdx.org has a long list of standard open source licenses and their identifiers. I picked the MIT license because it is widely used and permits the use of the code in open source and proprietary projects.

Once the package builds it can be published. The command for this is;

cargo publish

This uploads the crate to crates.io and makes the crate visible to all.

Crate documentation

This gets the crate published but its page looks a bit barren. The only text is the description tag from the .toml file.

Turns out that crates.io does not extract the documentation from the source code comments but expects to find a separate readme.md file. This is a bit of a pain because the top level module documentation would be ideally suited for the crate description.

The crate is available at crates.io and the source code is at github

Next Steps

There are parts of the code that have become quite messy and could do with refactoring. Before I do any such thing I need to make sure my changes don't break the functionality so I need to add tests.

As I am now publishing the crate I think I should add options for different playing modes and what to do when encountering unhandled effects.
