Maybe you could describe how you managed to add multiple variations.
Other than that, I think the best way would be to proceed in a fully automated way and eventually add some configuration files for fine-tuning and fixing bugs.
For the most part this does not sound complicated at all, as most of the information and code is already available.
It just boils down to:
1) Extract samples from Speech.xwb and convert them to mp3 (or better, ogg, because of mp3 licensing issues). => Already possible with unxwb + lame/oggenc
2) Extract strings from MONKEY1.00x using scummtr => Just invoke the tool
3) Process scummtr output and match it to the sample names in speech.info => Nearly done, needs minor tweaks
4) Merge samples in monster.so? and output processed strings => Format is quite trivial so not a big task
5) Write strings back to MONKEY1.00x with scummtr => Just invoke the tool
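
To give an idea of how little glue the automated part needs, here is a rough Python sketch of the conversion half of step 1 only. It assumes the samples were already extracted as .wav files into one directory (I have not checked exactly what unxwb writes out), and the names oggenc_command and convert_samples are made up for illustration. The only external call is oggenc with its -q (quality) and -o (output file) options; the quality value of 4 is just a placeholder.

```python
import subprocess
from pathlib import Path


def oggenc_command(wav_path, quality=4):
    """Build the oggenc call for one WAV file (hypothetical helper).

    The .ogg file is written next to the input file.
    """
    wav_path = Path(wav_path)
    ogg_path = wav_path.with_suffix(".ogg")
    # -q sets the Vorbis quality level, -o names the output file.
    return ["oggenc", "-q", str(quality), "-o", str(ogg_path), str(wav_path)]


def convert_samples(sample_dir, run=subprocess.run):
    """Convert every *.wav in sample_dir and return the expected .ogg paths.

    Assumption: the extracted samples are plain .wav files in one directory.
    'run' is injectable so the loop can be tried without oggenc installed.
    """
    converted = []
    for wav_path in sorted(Path(sample_dir).glob("*.wav")):
        # check=True makes subprocess.run raise if oggenc exits with an error.
        run(oggenc_command(wav_path), check=True)
        converted.append(wav_path.with_suffix(".ogg"))
    return converted
```

Calling convert_samples("samples") would then encode everything in a (hypothetical) samples directory. The scummtr steps could be wrapped the same way, but I left them out because the exact command line depends on the game and file layout.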