Monday, 29 April 2019

Mod player in Rust - part 7. Regression tests ( and discovering Serde )

While working on the mod player I many of my changes have had unintended side effects and sometimes broke songs that were working fine. Sometimes there breakages took some time to notice and required digging through changes to work out what happened.
To stop these unintended changes to the audio output I am setting up regression tests that compare current output to expected output ( expected output is the output produced by the latest production version of the code). With regression tests in place I can work with the code safe in the knowledge that the audio output only changes when I intend for it to.

Using cargo for tests

Cargo has support for two kinds of tests
  • Unit tests cover small pieces of code, such as a single function or structure. The unit tests typically exist in the same file as the functionality they test. ( so they live in src.)
  • Integration tests test the code like an external user would. Only public APIs are used by the integration test. Integration tests live in their own tests folder.
The regression tests I want to write are clearly integration tests.
Cargo has nice support for running tests. All tests can be run with the command;
cargo test
Always running all the tests can be slow when developing code. Cargo has the --test option to specify the integration test to run. So to only run the test in tests\regression.rs, use the command;
cargo test --test regression
The other useful option is --release to run the release version of the code. This is an absolute must when you work on optimizing the code. To run my regression test in release mode I use the command
cargo test --test regression --release
The final option I want to mention is --nocapture. By default, all print statements from the test captured and supressed by the test runner. The --nocapture turns off this behaviour and lets the print statements get to the output. To enable to the --nocapture option for my regression test running in release mode I use the command;
cargo test --test regression --release -- --nocapture
The funny looking -- before the --nocapture is required because there are actually two executables involved in running the test;
  • cargo, which manages the compilation,
  • the test runner which contains the actual compiled test,
any option before the -- is for cargo and any option after is for the test runner.

Designing the tests

First I needed to decide how to compare the current output to the expected output.
The naïve approach would be to store the entire output and compare the old and new outputs to detect any changes. This would require a lot of storage for the validated output and is clearly impractical for a reasonable set of music files.
I did consider some clever schemes of storing every nth sample but this would miss changes that fall between n ( very possible given the nature of the music data ). I also thought about using a variable step size to avoid size periodicity issues. This was getting far too clever and my experience is that cleverness tends to have large hidden costs.
I ultimately decided that a simple checksum is good enough for what I am trying to do. The downside is that it wont show the precise location of the change but on the upside it is easy to implement and to store the validate output. I used the crc crate to calculate my checksum. ( Because the tests use dev-dependencies for all the test code without burdening the final library )
Once I knew what test data I needed to store; a song and its expected checksum. I had to decide how to store it.
The simplest solution would be to simply store the expected checksums in the test code. This is perfectly workable for a small set of tests but can become tedious if I want to test a large number of files and especially if I need to rebase my tests.
I decided to store them in a json file that contains each music file and its expected checksum. While figuring out how to read and write JSON in Rust I came across Serde which is very cool.

Discovering Serde

Serde is a serialization and deserialization frame work for Rust. It consists of the Serde framework and plug-ins for handling different data formats. To bring json serialization with Serde to my test I add the following lines to my cargo.toml files [dev-dependencies] section
serde = { version = "1.0.90", features = ["derive"] }
serde_json = "1.0.39"
The first line brings in the framework. The derive feature needs to be enabled to use the Rust derive macros for creating the Serde serializer traits.
The second line adds a dependency for the json plug-in.
To declare a structure that can be serialized and deserialized by Serde I write the code
#[derive(Serialize, Deserialize)]
struct ExpectedChecksums {
    song_checksums: HashMap<String, u64>,
}
The first line tells the compiler to use the procedural macros to generate the Serialize and Deserialize traits. Rest of the structure is just a vanilla structure.
To use read and write the new structure into a json file I can use the following code;
    let actual_checksums = ExpectedChecksums { song_checksums };
    ...
    let serialized = serde_json::to_string_pretty(&actual_checksums).unwrap();
    fs::write("reg_actual.json", &serialized).expect("Cant write actual results");
The actual work of turning the the structure into a string ready to be written to disk is just one line of code! To read my structure back from disk is equally simple
    let expected_json = fs::read_to_string("reg_expected.json")?;
    let result: ExpectedChecksums = serde_json::from_str(&expected_json)?;
Note that serde directly creates the ExpectedChecksums structure I want. Based on my very brief experience of using serde, it is one of the most impressive serialization frameworks I have used and I will be making more use of it in the future.

Measuring performance

With a set of standardized tests in place it was easy to add measurements for performance. There are several crates for accurate timing but I found that the functionality in std::time was good enough for my use case.
The code for measuring the time it took to play the song distils to;
    let before_play = time::Instant::now();
    ... //play the song
    let after_play = time::Instant::now();
    let play_time = after_play.duration_since(before_play);
I did have to split my test code into two stages;
  • First stage plays the song into memory buffer ( Vec)
  • The second stage calculates the checksum for the buffer and compares it to the expected value
The split was necessary because the checksum calculation was taking a long time. In debug mode it was most of the time was spent on the checksum and note the music playing.

Analysing Code Performance in Debug and Release

I have been wondering about the performance of my code. The main issue is that it produces one sample pair at a time, incurring quite a bit of overhead that could be optimized away if it produced more samples per call. Until the regression tests were in place I did not have a systematic way of evaluating how fast or slow the code was.
I ran all my regression tests in Release and Debug
songRelease (usecs)Debug (usecs)Debug/ReleasePlay speed
CHIP_SLAYER!.MOD234,8545,313,99722.63940
BUBBLE_BOBBLE.MOD59,3271,075,33118.13755
cream_of_the_earth.mod291,0416,161,28321.17862
switchback.mod284,8935,436,91019.08755
stardstm.MOD285,2106,273,36822.00908
overload.mod469,58710,906,50723.23951
sarcophaser.MOD309,9646,947,59122.41916
BOG_WRAITH.mod33,078800,03724.191045
wasteland.mod155,0863,413,87322.01891
The columns Release (usecs) and Debug (usecs), measure how many microseconds it took to play the entire song into a memory buffer in Release and Debug mode. The column; Debug/Release is the ratio between the two and a good indication of how much slower the debug mode is compared to the release mode. Roughly, the debug mode is 20X slower than the release mode.
The column Play speed shows how much faster than real time the play speed is in Release mode. A play speed of 10 would mean that the player can play back 10 seconds in 1 second. The play speed for all the tests is between 750 and 1050. The other way to think about it is that it take around 0.1% - 0.13% of one core to play the music in real time. For debug mode this would be roughly 2%-3%.
Before making these measurements I was thinking about optimizing the code but given these measurements I am going heed Donald Knuth's famous quote "premature optimization is the root of all evil."

Wrapping up

I have updated all the code on https://github.com/janiorca/mod_player and will update the crate.

No comments:

Post a Comment

glium - instancing

I have spent a bit of time looking into instancing with  glium  and OpenGL. It turns out that  glium  makes it very easy to make use of th...