Generating Music Notation Videos (for the iPod)
Dr. Thomas Tensi, Paganinistraße 60, D-81247 München
When transferring some music to my mobile MP3 player for practising, I wondered whether it would be possible to have the musical notation available at the same time. Since modern MP3 players support video, I had the idea of producing a video that displays the musical score while the music plays.
The whole method is based on the notation program lilypond, which can produce single score pages as bitmaps and a MIDI file at the same time.
Of course, the whole thing is not the same as a MIDI sequencer where you can dynamically select the voices to be shown or vary the tempo for practice. Nevertheless it was okay for me, because you can, for example, have all the other voices play in the video while your part of the score is shown on the display. And besides I have not yet met a genuine MIDI sequencer for my video iPod...
All this video production for any MPEG4-capable portable player can be done completely with open-source software, and the resulting videos are only slightly larger than an MP3 audio file without score (at least with an MPEG4 codec). But a word of caution: you must not be afraid of command-line programs, because we shall use several of them. I shall show the commands one at a time to illustrate the process, but it is highly recommended to put them into a shell script in Unix or a BAT file in Windows (an example BAT file is also in the download archive).
As a little motivation for reading on, figure 1 shows one frame of Bach's chorale BWV639 on the iPod.
Fig. 1: iPod video frame showing measure 6 from BWV639
The full video can be seen here.
For the description of the process I assume that we are starting from scratch, i.e., no music exists so far and no score representation whatsoever. The process may be varied accordingly: if, for example, you want to generate a score video for an existing music file, you would skip the MIDI file generation and encoding.
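In the assumed context the process for getting a video in AVI format is as follows: write the score as a lilypond text file, let lilypond typeset it into PNG pages and a MIDI file, convert the MIDI file into a WAV file with timidity, and combine the pictures and the audio with an AviSynth script.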
Once you have an AVI video, it can be encoded specifically for your target MP3 video player. In phase two below I shall describe the process for the video iPod.
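So for the plain generation of the music notation video you need lilypond (for typesetting the score and producing the MIDI file), timidity (for rendering the MIDI file as a WAV file) and the AviSynth frame server (for combining pictures and audio into a video).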
For the iPod video encoding you need either MeGUI (which contains all the necessary audio and video codecs) or you can roll your own with the tools used below: the video encoder x264, the AAC encoder faac and the multiplexer nicmp4box.
Once you have all this installed and have put the binary directories in your system's search path, we are ready to go. In this tutorial I can only show a toy example for space reasons, but of course the idea works for bigger files, too. The download archive contains the Bach example.
The first step is to write a lilypond text file, which contains the notes of the score. You can also generate that file from a MIDI file via tools included in the lilypond distribution, but let's assume we have to write it from scratch.
Lilypond is a very technical text file format for scores and I shall not go into details, but the main idea is that notes are described by their names (like "c") together with durations forming sequences. Those sequences can be given symbolic names and combined into systems within a score.
The score for a simple c-major scale up and down in quarter notes and a trailing half note will look like this:
myMajorScale = \relative c' {
  c4 d e f | g a b c |
  b a g f | e d c2 |
}
The digits after the note pitches are the durations: "4" means a quarter note, "2" means a half note, and a note without a duration has the same duration as the note before it.
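The duration rule can be made concrete with a small Python sketch. It is not part of lilypond or of the download archive, and the helper name `note_lengths` is made up for this illustration. It assumes only the tiny subset of the notation used in this tutorial: plain pitch names with an optional duration number, and no dots, chords or other constructs.

```python
import re
from fractions import Fraction


def note_lengths(music, default=4):
    """Return the length of each note as a fraction of a whole note.

    Assumption: `music` contains only plain notes such as "c4" or "d";
    bar checks ("|") are skipped. A note without a number inherits the
    duration of the note before it.
    """
    lengths = []
    current = default
    for token in music.split():
        if token == "|":
            continue
        match = re.fullmatch(r"([a-g](?:is|es)*[',]*)(\d+)?", token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        if match.group(2):
            current = int(match.group(2))
        lengths.append(Fraction(1, current))
    return lengths


scale = "c4 d e f | g a b c | b a g f | e d c2 |"
lengths = note_lengths(scale)
total_quarters = sum(lengths) * 4
print(len(lengths), "notes,", total_quarters, "quarter beats")
```

For the scale above this counts 15 notes with a total of 16 quarter beats, i.e. four full measures in 4/4 time.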
To make the file complete we add a reference to an include file setting up the output size, some command to define the font-size and the score section which results in:
\version "2.10.0"
\include "iPodNotationVideo.ly"
#(set-global-staff-size 20)

myMajorScale = \relative c' {
  c4 d e f | g a b c |
  b a g f | e d c2 |
}

\score{
  <<
    \new Staff {
      \clef "treble"
      \key c \major
      \tempo 4 = 60
      \myMajorScale
    }
  >>
  \layout{}
  \midi{}
}
As you can see, there is an \include directive for a file setting the output page properties, which are specific to the desired output device. In our case it defines the page size to be 64mm×48mm (which is fine for the iPod), adjusts some space dimensions on the page and suppresses all headers. Make sure that this file called iPodNotationVideo.ly is in the same directory as the lilypond file above. You can download both in an archive here.
Now - assuming that lilypond is in the command search path of your system - we can let lilypond typeset the file into bitmap files and generate a MIDI file. The command is
lilypond --png demo.ly
This should produce four new files: demo.ps, demo-page1.png, demo-page2.png and demo.midi. The png-files are the key frames of our video, the midi file will later on be converted to some form suitable for a video (e.g. an MP3-file). The PostScript file is the basis for generating the PNG files and is not needed further on.
Let's have a look at one of the frames (figure 2). It shows the last two measures of the demo song.
Fig. 2: Measures 3 and 4 of the demo song as PNG file
The next step is to generate some WAV file with the audio of the demo song. We will use the timidity player for that:
timidity demo.midi -Ow -o demo.wav
The result is the WAV-file demo.wav which will be later processed by some audio codec (like MP3lame).
Now comes the most complicated part of the whole video generation: We have to generate a script for the AviSynth frame server from a configuration file and an AviSynth include file.
The configuration file has to set several variables: the frame rate, the pattern for the PNG-files, the name of the WAV file, the number of pictures and the list of picture durations. Let us look at the variables in detail:
Several of those parameters are easy to define once the lilypond generation has been done. The time list is a bit tricky: how do we get the time each picture must be shown?
Fortunately it is simple arithmetic, because we have defined the tempo of the track in the lilypond file.
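Assume for example that we have three pictures showing 2 measures, 1 measure and 3 measures, a tempo of 90 quarters per minute and a 3/4 time signature. Each measure then lasts 3 × 60/90 s = 2 s, so picture 1 has to be shown for 2 × 2 s = 4 s, picture 2 for 1 × 2 s = 2 s and picture 3 for 3 × 2 s = 6 s, which gives a time list of "4,2,6".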
In our demo example both pictures show two measures and with a tempo of 60 quarters per minute and a 4/4 time signature this leads to a time list of "8,8".
Musicians typically prefer that notes are visible a little bit before they are played, so we have to shorten the first picture duration by some amount and add this to the duration of the last. Let's assume 0.4s is fine which leads to a time list of "7.6,8.4".
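The arithmetic for the time list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `time_list` and `format_time_list` are made up here, the tempo is assumed to count the same note value as the denominator of the time signature (quarters in both examples), and the tempo and time signature are assumed not to change within the piece.

```python
def time_list(measures_per_picture, tempo, beats_per_measure, lead=0.0):
    """Return the number of seconds each picture is shown.

    tempo             -- beats per minute
    beats_per_measure -- numerator of the time signature
    lead              -- seconds taken from the first picture and added
                         to the last, so each page appears a bit early
    """
    seconds_per_measure = beats_per_measure * 60.0 / tempo
    times = [count * seconds_per_measure for count in measures_per_picture]
    if lead and len(times) > 1:
        times[0] -= lead
        times[-1] += lead
    return [round(t, 3) for t in times]


def format_time_list(times):
    """Render the durations in the comma-separated form used for timeList."""
    return ",".join(f"{t:g}" for t in times)


# three pictures with 2, 1 and 3 measures, tempo 90, 3/4 time
print(format_time_list(time_list([2, 1, 3], tempo=90, beats_per_measure=3)))
# the demo song: two pictures with 2 measures each, tempo 60, 4/4 time
print(format_time_list(time_list([2, 2], tempo=60, beats_per_measure=4)))
print(format_time_list(time_list([2, 2], tempo=60, beats_per_measure=4, lead=0.4)))
```

Shifting the lead time from the first entry to the last keeps the total video length unchanged while moving every page change forward by the same amount.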
Unfortunately timidity also adds a long note decay to the end of the audio file depending on the last notes and instruments. In our case it is about two seconds; so we either have to add another 2 seconds to the last entry in the time list or accept that no score is shown for that period. For simplicity we'll do the latter.
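If you prefer the first option, the extra time does not have to be guessed. The following Python sketch reads the length of the WAV file and lengthens the last entry of the time list so that the score stays visible until the audio ends. The helper names `wav_duration` and `extend_last_entry` are made up for this illustration, and the sketch assumes an uncompressed PCM WAV file, which is what Python's standard `wave` module can read.

```python
import wave


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def extend_last_entry(times, audio_seconds):
    """Lengthen the last picture so that the video covers the whole audio."""
    tail = audio_seconds - sum(times)
    if tail <= 0:
        return list(times)
    return list(times[:-1]) + [round(times[-1] + tail, 3)]


# Example with a fixed audio length of 18 seconds; for a real file use
# extend_last_entry([7.6, 8.4], wav_duration("demo.wav")) instead.
print(extend_last_entry([7.6, 8.4], 18.0))
```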
The complete configuration file demo.cfg for AviSynth looks like this:
picturePattern = "demo-page%d.png"
wavFileName = "demo.wav"
pictureCount = 2
timeList = "7.6,8.4"
All the above configuration information has to be concatenated with an AviSynth script fragment @framePicImport.avsinc from the download archive to form a complete AviSynth script. The fragment defines some helper routines for generating the video stream from the pictures and the timing parameters and combines it with the audio stream.
I shall only show the core part of the generic AviSynth fragment without going into the details of how the frames get duplicated appropriately. The core of the fragment looks like this:
# -- device specific settings --
videoFrameRate = 25   # fps
deviceWidth = 640     # pixels
deviceHeight = 480    # pixels

# -- video creation and adaption --
imageSource(picturePattern, 1, pictureCount, videoFrameRate, \
            pixel_type="rgb32")
lanczosResize(deviceWidth, deviceHeight)
Util_expandPictureVideo(timeList)
video = convertToYV12

# -- audio import --
audio = wavSource(wavFileName)

# -- combining the streams --
RETURN audioDub(video, audio)
The first part sets device-specific parameters like the frame rate and the width and height of the device in pixels. Then the video is created by importing the pictures - one frame per picture - with the desired frame rate and resizing them to the resolution of the output device. A utility function then replicates each frame an appropriate number of times so that it is displayed according to the time list given. Finally the audio is imported and the video and audio streams are combined.
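To give an idea of what the frame replication amounts to, here is a minimal Python sketch that converts a time list into a number of frames per picture. It is not the code of Util_expandPictureVideo from the archive; the function name `frame_counts` and the rounding strategy are assumptions made for this illustration.

```python
def frame_counts(times, fps=25):
    """Return the number of video frames for each picture.

    The frame boundaries are derived from the cumulative time, so that
    rounding errors do not add up over many pictures.
    """
    counts = []
    elapsed = 0.0
    frames_so_far = 0
    for seconds in times:
        elapsed += seconds
        boundary = round(elapsed * fps)
        counts.append(boundary - frames_so_far)
        frames_so_far = boundary
    return counts


# time list of the demo song at 25 frames per second
print(frame_counts([7.6, 8.4]))
```

For the demo song this gives 190 frames for the first picture and 210 frames for the second, i.e. 400 frames or 16 seconds in total.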
The concatenation is done depending on your operating system. In Windows it is
copy /B demo.cfg+@framePicImport.avsinc demo.avs
In Unix it is
cat demo.cfg @framePicImport.avsinc >demo.avs
Hooray!! When looking at the video with some player capable of showing AviSynth videos (such as the Windows Media Player), it looks like the result we intended to have!
But we're not done yet: we have to encode the video for the target device (using a standard tool chain)...
In the following section the conversion process for MPEG4 is described (which applies to many video-capable players). When your player does not support MPEG4 videos, you have to adapt the process accordingly.
Encoding the video is done via a single pass with the MPEG4-encoder x264. Any multipass encoding is useless because the required video bitrate will be quite low: score pages do not change very much...
Here is the command for encoding the video in H.264 (MPEG-4 AVC), which is the format x264 produces (everything in a single line, please!):
x264 --bitrate 700 --level 3 --nf --no-cabac --subme 6 --analyse none --qpmin 16 --vbv-maxrate 1000 --me umh --merange 12 --thread-input --progress --no-psnr --output demo-(VIDEO).mp4 demo.avs
The resulting video file demo-(VIDEO).mp4 can already be checked, but of course it is silent.
Encoding the audio is done via the AAC-encoder faac.
Here is the command for encoding the audio:
faac -b 128 -P -X -R 44100 -B 16 -C 2 --mpeg-vers 4 -o demo.aac demo.wav
The resulting audio file demo.aac can already be checked. It should sound similar to the WAV-file, but if you have very good ears you can hear the difference...
Both streams must be combined into a single MPEG4-file by the multiplexing program nicmp4box as follows:
nicmp4box -add demo.aac -add demo-(VIDEO).mp4 -new demo.mp4
The resulting file demo.mp4 (available here in an archive) can be transferred to the iPod and it should work.
We're done!!!!
Of course, typing the above commands manually is nonsense. In the following I have written down a Windows script for automating the process. It assumes that the path of the lilypond file directory is given as the first parameter and the file name without path and without the extension .ly as the second parameter. A non-empty third parameter flags that the temporary files should be kept (for inspection).
The configuration file with extension .cfg has to be in the same directory as the lilypond file. As the picture pattern and the WAV file name always follow the same scheme, the corresponding configuration settings are generated by the script and need not occur in the configuration file.
REM ## music notation video generator from lilypond file %1\%2.ly
REM ## and its configuration file %1\%2.cfg; resulting video is in
REM ## %1\%2.mp4; a non-empty %3 parameter flags that temporary files
REM ## should be kept

SET avsFragmentFile=C:\video_files\@framePicImport.avsinc
SET keepTemporaryFiles=%3
PUSHD %1

REM -- preparation of a functional AviSynth script --
lilypond --png %2.ly
timidity %2.midi -Ow -o %2.wav
ECHO picturePattern = "%2-page%%d.png" >%2.tmp
ECHO wavFileName = "%2.wav" >>%2.tmp
COPY /b %2.tmp+%2.cfg+%avsFragmentFile% %2.avs >NUL

REM -- encoding for iPod --
SET x264options=--bitrate 700 --level 3 --nf --no-cabac --subme 6
SET x264options=%x264options% --analyse none --qpmin 16 --vbv-maxrate 1000
SET x264options=%x264options% --me umh --merange 12 --thread-input
SET x264options=%x264options% --progress --no-psnr
x264 %x264options% --output %2-(VIDEO).mp4 %2.avs
faac -b 128 -P -X -R 44100 -B 16 -C 2 --mpeg-vers 4 -o %2.aac %2.wav
nicmp4box -add %2.aac -add %2-(VIDEO).mp4 -new %2.mp4

REM -- cleanup --
IF NOT %keepTemporaryFiles%x==x GOTO :ENDIF
    DEL %2.tmp
    DEL %2.ps
    DEL %2-*.png
    DEL %2.midi
    DEL %2.wav
    DEL %2.avs
    DEL %2.aac
    DEL %2-(VIDEO).mp4
:ENDIF
POPD
You have to adapt the path of the AviSynth fragment file according to your setup. When you store that BAT file (e.g. under the name makeNotationVideo.bat), you can run the generation for the lilypond file C:\documents\test\fred.ly with the command:
makeNotationVideo C:\documents\test fred
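For readers who prefer a scripting language over a BAT file, the same sequence of steps can be sketched in Python. This is an illustration only and not part of the download archive: the function names `avs_script` and `tool_commands` are made up, and the command-line options are simply copied from the commands shown above. The sketch only builds the script text and the argument lists; it does not run any tool.

```python
X264_OPTIONS = [
    "--bitrate", "700", "--level", "3", "--nf", "--no-cabac",
    "--subme", "6", "--analyse", "none", "--qpmin", "16",
    "--vbv-maxrate", "1000", "--me", "umh", "--merange", "12",
    "--thread-input", "--progress", "--no-psnr",
]


def avs_script(name, config_text, fragment_text):
    """Return the AviSynth script text: the generated settings, then the
    user configuration, then the generic fragment."""
    generated = (
        f'picturePattern = "{name}-page%d.png"\n'
        f'wavFileName = "{name}.wav"\n'
    )
    return generated + config_text.rstrip("\n") + "\n" + fragment_text


def tool_commands(name):
    """Return the argument lists for the external tools in the order in
    which the batch file calls them."""
    video = f"{name}-(VIDEO).mp4"
    return [
        ["lilypond", "--png", f"{name}.ly"],
        ["timidity", f"{name}.midi", "-Ow", "-o", f"{name}.wav"],
        ["x264", *X264_OPTIONS, "--output", video, f"{name}.avs"],
        ["faac", "-b", "128", "-P", "-X", "-R", "44100", "-B", "16",
         "-C", "2", "--mpeg-vers", "4", "-o", f"{name}.aac", f"{name}.wav"],
        ["nicmp4box", "-add", f"{name}.aac", "-add", video,
         "-new", f"{name}.mp4"],
    ]


for command in tool_commands("fred"):
    print(" ".join(command))
```

To actually run the steps, each argument list could be passed to `subprocess.run(command, check=True)` from within the lilypond file directory, with the text returned by `avs_script` written to the .avs file after the timidity step and before the x264 step.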
By modifying the tool chain several variations are possible: One can
So there are many ways to extend the basic idea shown here in your own direction.
The article has demonstrated how to make music notation videos for portable video players with only a little effort. The necessary tools are all open-source; the most important step to master is the textual music notation language lilypond.
All other processing steps are easy to do, and the resulting notation video shows high-quality score pages, limited only by the dimensions and the resolution of the target device.
You can download an archive with this page (as the manual), the demo and Bach639 lilypond and configuration files, the lilypond include, the AviSynth fragment file and the Windows batch file for your own use.
The files and the method described are put into the public domain, but both are unsupported. If you want to comment, you can contact me by electronic mail (see below), but I cannot promise an immediate answer or even any answer at all.
Thanks to Rüdiger Murach for posing that challenge to me in a coffee-break discussion at work and to my partner Ulrike Gröttrup for her patience when always showing her another new boring iPod notation video...