Generating Music Notation Videos (for the iPod)

Dr. Thomas Tensi
Paganinistraße 60, D-81247 München

Introduction

When transferring some music to my mobile MP3-player for practising, I wondered whether it would be possible to have the musical notation available at the same time. Since the modern MP3-players support videos, I had the idea to produce a video displaying the musical score while playing the music.

The whole method is based on the notation program lilypond which can produce single score pages as bitmaps and a MIDI file at the same time.

Of course, the whole thing is not the same as a MIDI sequencer where you can dynamically select the voices to be shown or vary the tempo for practice. Nevertheless it was okay for me, because you can, for example, have all the other voices play in the video while your part of the score is shown on the display. And besides I have not yet met a genuine MIDI sequencer for my video iPod...

All of this video production, for any MPEG-4-capable portable player, can be done entirely with open-source software, and the resulting videos are only slightly larger than an MP3 audio file without score (at least with an MPEG-4 codec). But a word of caution: you must not be afraid of command-line programs, because we shall use several of them. I shall show the commands one at a time to illustrate the process, but it is highly recommended to put them into a shell script on Unix or a BAT file on Windows (an example BAT file is included in the download archive).

As a little motivation for reading on, figure 1 shows one frame of Bach's chorale BWV639 on the iPod.

Fig. 1: iPod video frame showing measure 6 from BWV639

The full video can be seen here.

Prerequisites

For the description of the process I assume that we are starting from scratch, i.e., neither the music nor any score representation exists so far. The process may be varied accordingly: if, for example, you want to generate a score for an existing music file, you can skip the MIDI file generation and conversion.

In the assumed context the process for getting a video in AVI format is as follows:

write a score text file in lilypond format with your favourite text editor (notepad is fine),
have the music notation program lilypond produce single score pages as bitmaps and a MIDI file,
convert the MIDI file into a WAV file with the player timidity,
write a short config file with your favourite text editor, and
generate a script file for the avisynth frame server (where the script is already the desired video in AVI format)

Once you have an AVI video, it can be encoded specifically for your target MP3 video player. In phase two below I shall describe the process for the video iPod.

So for the plain generation of the music notation video you need:

a text editor,
the music notation program lilypond,
the MIDI player and converter timidity++ plus instrument samples (so-called soundfonts available for free at Hammersound),
the avisynth frame server

For the iPod video encoding you need either MeGUI (which contains all the necessary audio and video codecs), or you can roll your own tool chain with

x264 for encoding video in MPEG-4 AVC (H.264) format,
faac for encoding audio in MPEG-4 AAC, and
NicMP4Box (NicMP4Box.zip) for multiplexing the audio and video streams into an MPEG-4 video.

Once you have this all installed and have put the binary directories in your system's search path, we are ready to go. In this tutorial I can only show a toy example for space reasons, but of course the idea works for bigger files, too. The download archive contains the bach example.

Phase 1: Generating an AVI video

Step 1.1: Preparing the lilypond score file

The first step is to write a lilypond text file, which contains the notes of the score. You can also generate that file from a MIDI file via tools included in the lilypond distribution, but let's assume we have to write it from scratch.

Lilypond is a very technical text file format for scores and I shall not go into details, but the main idea is that notes are described by their names (like "c") together with durations forming sequences. Those sequences can be given symbolic names and combined into systems within a score.

The score for a simple c-major scale up and down in quarter notes and a trailing half note will look like this:

  myMajorScale = \relative c' {
    c4 d e f |
    g a b c  |
    b a g f  |
    e d c2   |
  }

The digits after the note pitches are the durations: "4" means a quarter note, "2" means a half note, and a note without a duration has the same duration as the note before it.
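To make the duration rule concrete, here is a minimal Python sketch (not part of lilypond or of the download archive) that applies it to the scale above. The function name note_lengths is made up for this illustration, and the parser only understands plain tokens such as "c4" or "d"; real lilypond input is far richer.

```python
from fractions import Fraction


def note_lengths(notes):
    """Return each note's length as a fraction of a whole note.

    Simplified illustration: only tokens like "c4" or "d" are handled,
    and bar checks ("|") are skipped.
    """
    lengths = []
    current = None
    for token in notes.split():
        if token == "|":
            continue
        digits = "".join(ch for ch in token if ch.isdigit())
        if digits:
            # An explicit duration replaces the remembered one.
            current = Fraction(1, int(digits))
        if current is None:
            raise ValueError("this sketch needs a duration on the first note")
        lengths.append(current)
    return lengths


scale = "c4 d e f | g a b c | b a g f | e d c2 |"
lengths = note_lengths(scale)
print(len(lengths), sum(lengths))  # prints: 15 4
```

The 15 notes add up to four whole notes, i.e. the four measures in 4/4 time that the scale occupies.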

To make the file complete, we add a reference to an include file that sets up the output size, a command that sets the staff size, and the score section, which results in:

  \version "2.10.0"
  \include "iPodNotationVideo.ly"

  #(set-global-staff-size 20)

  myMajorScale = \relative c' { c4 d e f | g a b c | b a g f | e d c2 | }

  \score{
    <<
      \new Staff { \clef "treble"  \key c \major
                   \tempo 4 = 60
                   \myMajorScale }
    >>
    \layout{}
    \midi{}
  }

As you can see, there is an \include directive for a file that sets the output page properties specific to the target device. In our case it defines the page size as 64mm×48mm (which suits the iPod), adjusts some spacing on the page and suppresses all headers. Make sure that this file, called iPodNotationVideo.ly, is in the same directory as the lilypond file above. You can download both in an archive here.

Step 1.2: Typesetting the lilypond file

Now - assuming that lilypond is in the command search path of your system - we can let lilypond typeset the file into bitmap files and generate a midi file. The command is

  lilypond --png demo.ly

This should produce four new files: demo.ps, demo-page1.png, demo-page2.png and demo.midi. The PNG files are the key frames of our video; the MIDI file will later be converted to an audio format suitable for a video (first WAV, then AAC). The PostScript file is only the intermediate from which the PNG files are generated and is not needed further on.

Let's have a look at one of the frames (figure 2). It shows the last two measures of the demo song.

Fig. 2: Measures 3 and 4 of the demo song as PNG file

Step 1.3: Generating the audio file

The next step is to generate some WAV file with the audio of the demo song. We will use the timidity player for that:

  timidity demo.midi -Ow -o demo.wav

The result is the WAV file demo.wav, which will later be processed by an audio encoder (faac in phase two).

Step 1.4: Setting up and generating the AviSynth script

Now comes the most complicated part of the whole video generation: We have to generate a script for the AviSynth frame server from a configuration file and an AviSynth include file.

The configuration file has to set four variables: the pattern for the PNG files, the name of the WAV file, the number of pictures and the list of picture durations (the frame rate is set in the include file). Let us look at the variables in detail:

picturePattern: a pattern for the PNG file names where the running count is encoded as %d; in our example the pattern is "demo-page%d.png"
wavFileName: the path name of the wav file generated above; in our case it is "demo.wav"
pictureCount: the count of pictures in the video; we only have 2 in the demo
timeList: a comma-separated list of durations (in seconds) for which each picture is shown; the first entry applies to the first picture and so on; fractions of a second are written with a decimal point

Several of those parameters are easy to define once the lilypond generation has been done. The time list is a bit trickier: how do we find out how long each picture must be shown?

Fortunately it is simple arithmetic, because we have defined the tempo of the track in the lilypond file.

Assume for example that we have three pictures showing 2 measures, 1 measure and 3 measures, a tempo of 90 quarters per minute and a 3/4 time signature. Picture 1 has to be shown for

2 measures × 3 quarters/measure × 1/90 min/quarter × 60 s/min = 4 s

By the same logic the times for the other picture pages would be 2s and 6s and the resulting time list is "4,2,6".

In our demo example both pictures show two measures and with a tempo of 60 quarters per minute and a 4/4 time signature this leads to a time list of "8,8".

Musicians typically prefer that notes are visible a little bit before they are played, so we have to shorten the first picture duration by some amount and add this to the duration of the last. Let's assume 0.4s is fine which leads to a time list of "7.6,8.4".
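The arithmetic above is easy to script. The following Python sketch is only an illustration under the assumptions stated in this section (constant tempo, constant time signature, tempo given in beats per minute); the function names are hypothetical and not part of any tool mentioned here.

```python
def page_durations(measures_per_page, tempo_bpm, beats_per_measure):
    """Seconds each score page covers at a constant tempo and time signature."""
    seconds_per_measure = beats_per_measure * 60.0 / tempo_bpm
    return [count * seconds_per_measure for count in measures_per_page]


def shift_lead(durations, lead):
    """Show every page `lead` seconds early.

    The first page loses `lead` seconds and the last page gains them,
    so the total length stays the same.
    """
    shifted = list(durations)
    shifted[0] -= lead
    shifted[-1] += lead
    return shifted


def format_time_list(durations):
    """Join durations as "a,b,c"; the 'g' format keeps 6 significant digits."""
    return ",".join(f"{d:g}" for d in durations)


# Three pages with 2, 1 and 3 measures, 90 quarters per minute, 3/4 time.
print(format_time_list(page_durations([2, 1, 3], 90, 3)))  # prints: 4,2,6

# The demo: two pages of two measures, 60 quarters per minute, 4/4 time,
# every page shown 0.4 s early.
print(format_time_list(shift_lead(page_durations([2, 2], 60, 4), 0.4)))  # prints: 7.6,8.4
```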

Unfortunately timidity also adds a long note decay to the end of the audio file depending on the last notes and instruments. In our case it is about two seconds; so we either have to add another 2 seconds to the last entry in the time list or accept that no score is shown for that period. For simplicity we'll do the latter.
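If you prefer the first alternative, the length of the decay need not be guessed: it can be read from the WAV file. Here is a minimal Python sketch using the standard wave module, assuming an uncompressed PCM WAV file such as the one timidity writes with -Ow; the function names are made up for this illustration.

```python
import wave


def wav_seconds(path):
    """Length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def extend_last_entry(durations, audio_seconds):
    """Lengthen the last picture so the score stays visible until the audio ends.

    If the pictures already cover the whole audio, nothing is changed.
    """
    extended = list(durations)
    shortfall = audio_seconds - sum(extended)
    if shortfall > 0:
        extended[-1] += shortfall
    return extended
```

For the demo one would call extend_last_entry([7.6, 8.4], wav_seconds("demo.wav")) and write the result into timeList.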

The complete configuration file demo.cfg for AviSynth looks like this:

  picturePattern = "demo-page%d.png"
  wavFileName = "demo.wav"
  pictureCount = 2
  timeList = "7.6,8.4"

All the above configuration information has to be concatenated with an AviSynth script fragment @framePicImport.avsinc from the download archive to form a complete AviSynth script. The fragment defines some helper routines for generating the video stream from the pictures and the timing parameters and combines it with the audio stream.

I shall only show the core part of the generic AviSynth fragment without going into the details of how the frames get duplicated appropriately. The core of the fragment looks like this:

# -- device specific settings --
videoFrameRate = 25 #fps
deviceWidth  = 640 # pixels
deviceHeight = 480 # pixels

# -- video creation and adaption --
imageSource(picturePattern, 1, pictureCount, videoFrameRate, \
            pixel_type="rgb32")
lanczosResize(deviceWidth, deviceHeight)
Util_expandPictureVideo(timeList)
video = convertToYV12

# -- audio import --
audio = wavSource(wavFileName)

# -- combining the streams --
RETURN audioDub(video, audio)

The first part sets device-specific parameters like frame rate, width and height of the device in pixels. Then the video is created by importing the pictures (one frame per picture) at the desired frame rate and resizing them to the resolution of the output device. A utility function then replicates each frame an appropriate number of times so that it is displayed according to the given time list. Finally the audio is imported and the video and audio streams are combined.
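To give an idea of what the replication amounts to, here is a Python sketch of one possible way to turn a time list into frame counts. It is not the code of Util_expandPictureVideo from the archive; the function name and the rounding strategy (rounding the cumulative end time of each picture, so that rounding errors do not add up) are assumptions made for this illustration.

```python
def frames_per_picture(time_list, fps=25):
    """Number of video frames for each picture in a "a,b,c" time list."""
    durations = [float(entry) for entry in time_list.split(",")]
    counts = []
    elapsed = 0.0
    frames_so_far = 0
    for duration in durations:
        elapsed += duration
        # Round the end position of the picture, not its length, so that
        # the picture changes never drift away from the audio.
        end_frame = round(elapsed * fps)
        counts.append(end_frame - frames_so_far)
        frames_so_far = end_frame
    return counts


print(frames_per_picture("7.6,8.4"))  # prints: [190, 210]
```

At 25 frames per second the two demo pictures get 190 and 210 frames, which is 400 frames or 16 seconds in total.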

The concatenation is done depending on your operating system. In Windows it is

  copy /B demo.cfg+@framePicImport.avsinc demo.avs

in Unix it is

  cat demo.cfg @framePicImport.avsinc >demo.avs
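When the configuration values come out of a script anyway, the configuration lines and the concatenation can be produced in one go. The following Python sketch is one way to do that; the function name build_avs_script is hypothetical, and it assumes plain ASCII file names and copies the fragment byte for byte.

```python
from pathlib import Path


def build_avs_script(base_name, picture_count, time_list, fragment_path):
    """Return the bytes of a complete AviSynth script (config + fragment)."""
    config_lines = [
        f'picturePattern = "{base_name}-page%d.png"',
        f'wavFileName = "{base_name}.wav"',
        f"pictureCount = {picture_count}",
        f'timeList = "{time_list}"',
    ]
    # Assumption: ASCII names only; CRLF line ends as usual on Windows.
    header = ("\r\n".join(config_lines) + "\r\n").encode("ascii")
    return header + Path(fragment_path).read_bytes()
```

For the demo, Path("demo.avs").write_bytes(build_avs_script("demo", 2, "7.6,8.4", "@framePicImport.avsinc")) would write the same script as the commands above.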

Hooray!! When we look at the video with a player capable of showing AviSynth videos (e.g. the Windows Media Player), it looks like the result we intended to have!

But we're not done yet: we have to encode the video for the target device (using a standard tool chain)...

Phase 2: Encoding the AVI video for the iPod

The following section describes the conversion process for MPEG-4 (which applies to many video-capable players). If your player does not support MPEG-4 videos, you have to adapt the process accordingly.

Step 2.1: Encoding the video file

Encoding the video is done via a single pass with the MPEG4-encoder x264. Any multipass encoding is useless because the required video bitrate will be quite low: score pages do not change very much...

Here is the command for encoding the video in H.264 (MPEG-4 AVC) format (everything in a single line, please!):

  x264 --bitrate 700 --level 3 --nf --no-cabac --subme 6 --analyse none
       --qpmin 16 --vbv-maxrate 1000 --me umh --merange 12 --thread-input
       --progress --no-psnr --output demo-(VIDEO).mp4 demo.avs

The resulting video file demo-(VIDEO).mp4 can already be checked, but of course it is silent.

Step 2.2: Encoding the audio file

Encoding the audio is done via the AAC-encoder faac.

Here is the command for encoding the audio:

  faac -b 128 -P -X -R 44100 -B 16 -C 2 --mpeg-vers 4 -o demo.aac demo.wav

The resulting audio file demo.aac can already be checked. It should sound similar to the WAV-file, but if you have very good ears you can hear the difference...

Step 2.3: Combining the audio and video stream

Both streams must be combined into a single MPEG4-file by the multiplexing program nicmp4box as follows:

  nicmp4box -add demo.aac -add demo-(VIDEO).mp4 -new demo.mp4

The resulting file demo.mp4 (available here in an archive) can be transferred to the iPod, where it should play.

We're done!!!!

Automating the process

Of course, typing the above commands manually every time is tedious. Below is a Windows script for automating the process. It assumes that the path of the lilypond file directory is given as the first parameter and the file name, without path and without the extension .ly, as the second parameter. A nonempty third parameter flags that the temporary files should be kept (for inspection).

The configuration file with extension .cfg has to be in the same directory as the lilypond file. Because the picture pattern and the WAV file name always follow the same scheme, the script generates those two settings itself, so the configuration file only has to contain pictureCount and timeList.

  REM ## music notation video generator from lilypond file %1\%2.ly
  REM ## and its configuration file %1\%2.cfg; resulting video is in
  REM ## %1\%2.mp4; a non-empty %3 parameter flags that temporary files
  REM ## should be kept

  SET avsFragmentFile=C:\video_files\@framePicImport.avsinc
  SET keepTemporaryFiles=%3

  PUSHD %1
  REM -- preparation of a functional AviSynth script --
  lilypond --png %2.ly
  timidity %2.midi -Ow -o %2.wav
  ECHO picturePattern = "%2-page%%d.png"  >%2.tmp
  ECHO wavFileName = "%2.wav"  >>%2.tmp
  COPY /b %2.tmp+%2.cfg+%avsFragmentFile% %2.avs >NUL

  REM -- encoding for iPod --
  SET x264options=--bitrate 700 --level 3 --nf --no-cabac --subme 6
  SET x264options=%x264options% --analyse none --qpmin 16 --vbv-maxrate 1000
  SET x264options=%x264options% --me umh --merange 12 --thread-input
  SET x264options=%x264options% --progress --no-psnr

  x264 %x264options% --output %2-(VIDEO).mp4 %2.avs
  faac -b 128 -P -X -R 44100 -B 16 -C 2 --mpeg-vers 4 -o %2.aac %2.wav
  nicmp4box -add %2.aac -add %2-(VIDEO).mp4 -new %2.mp4

  REM -- cleanup --
  IF NOT %keepTemporaryFiles%x==x GOTO :ENDIF
    DEL %2.tmp
    DEL %2.ps
    DEL %2-*.png
    DEL %2.midi
    DEL %2.wav
    DEL %2.avs
    DEL %2.aac
    DEL %2-(VIDEO).mp4
  :ENDIF

  POPD

You have to adapt the path of the AviSynth fragment file according to your setup. When you store that BAT file (e.g. under the name makeNotationVideo.bat), you can generate the video for the lilypond file C:\documents\test\fred.ly with the command:

  makeNotationVideo C:\documents\test fred

Variations

By modifying the tool chain several variations are possible: One can

take an existing audio file (a MIDI or an MP3) and add a score with a manually-tuned time list as the video stream,
even use a score with lyrics to produce a karaoke file,
take a music video, strip the video stream and overlay the video with the appropriately timed score, or
...?

So there are many ways to extend the basic idea shown here in whatever direction suits you.

Summary

The article has demonstrated how to make music notation videos for portable video players with only a little effort. The necessary tools are all open-source; the one important step to master is the textual music notation language lilypond.

All other processing steps are easy, and the resulting notation video shows high-quality score pages limited only by the dimensions and the resolution of the target device.

Download

You can download an archive with this page (as the manual), the demo and Bach639 lilypond and configuration files, the lilypond include, the AviSynth fragment file and the Windows batch file for your own use.

The files and the method described are put into the public domain, but both are unsupported. If you want to comment, you can contact me by electronic mail (see below), but I cannot promise an immediate answer or even any answer at all.

Acknowledgements

Thanks to Rüdiger Murach for posing that challenge to me in a coffee-break discussion at work and to my partner Ulrike Gröttrup for her patience whenever I showed her yet another boring iPod notation video...
