In fact, it’s so hard that I’m going to expend a considerable number of words in this post on the topic of how not to film speech – by which I mean having a character speak in your machinima without actually filming them saying it or, more specifically, without filming their lips moving.
There are two reasons for taking this approach. The first is that speech in SL, frankly, doesn’t look great. Sure, many Bento-enabled mesh heads these days come with speech animations which, in all fairness, do look way better than the moving lip animations we used to have on the old system heads. That’s not the issue. The issue is that these movements are formed by routines which have no connection whatsoever to the actual words being spoken by the person controlling the avatar that owns the lips. The best you’re likely to achieve is something that looks like an extremely poorly dubbed foreign language film. At worst, it looks just plain weird.
To some extent, animators get some leeway here. There are close on a hundred different muscles involved in speaking, whereas most facial expressions use far fewer. Accordingly, you are more likely to see facial expressions done convincingly in an animation than speech.
But, whilst animated lip movements are often a very rough approximation of the real thing, they are at least done in a way that avoids creating incongruity. When a cartoon character is articulating a long vowel sound, for example, you can generally expect its mouth to be open rather than shut. There’s a lot you can get away with in movie-making without breaking the suspension of disbelief required of your viewer, but when you let an incongruity like mismatched speech creep in, you’re starting to play it fast and loose with that vital ingredient.
The second reason is that, as I discovered when making STÖMOL, not all mesh heads have moving lips. I already had quite a chunk of footage ‘in the can’ when it came to filming my main character talking, and only then discovered that lip movement was not something his head actually did, even though it was Bento-enabled. There was just no way I was going to go back and refilm all those scenes with a different head – quite apart from anything else, many of the regions I had filmed in had closed by then.
Filming speech obliquely
So my decision in STÖMOL was to film moving mouths as infrequently as possible. I wanted my viewers to be attending to what was being said, not distracted by the oddness of the lip movements (or complete lack of them). This was achieved (and I’ll leave it for you to decide whether or not it was achieved well) through a number of different approaches:
Voiceover. Whilst not exactly a filming of speech, it’s worth mentioning voiceover as a valid way of conveying verbally delivered information that sidesteps the need to see a person speaking it. Some people really dislike voiceover – famously, Blade Runner (a film that a few people have – correctly – identified as an influence on STÖMOL) had a voiceover track added when test audiences complained that they didn’t understand what was going on; this was then removed in the subsequent editions of the movie because fans hated it so much. I’m not particularly fond of it myself as a technique, but considered it a necessary evil for my movie, given all the other restrictions I was working with. It all comes down to how you use it, of course: put too much information into a voiceover and it starts to feel forced and artificial. The approach I took was to try to be as minimalist as possible with voiceover monologue, giving just enough information to set context.
Obscuring speakers’ mouths through composition. It’s actually more common than you might think for speaking characters in a movie to be offscreen or viewed from behind. The important thing to keep in mind when you use this approach is to leave enough clues for the viewer as to who is doing the speaking in each scene. In STÖMOL, all the speaking scenes involved Stömol himself and one other person: in each scene, it was always the other person who spoke first, and they would be depicted in a way that left the viewer in no doubt that it was that character who was speaking in that moment. Any voice that then followed this voice had, therefore, to be Stömol, so it didn’t matter that the viewer couldn’t see his face when he was speaking: it was just intuitive that it was him doing the talking at that point. Of course, once the viewer was familiar with his voice, this requirement became less important.
Obscuring speakers’ mouths with props. For a couple of the other characters in STÖMOL I covered their faces with masks, an aesthetic which works well in a science-fiction context (and, in this case, both characters were vigilantes of a sort, wanting to conceal their everyday identity), but perhaps not so well outside of that (at least for any story not set in 2020…). Having these characters wear masks was also an opportunity to deliberately distort their voices electronically, which added to the soundscape of the movie and also worked as a useful way of masking audio issues (see more about that below).
Filming speech directly
But, for some characters, there really was no way of getting out of filming direct speech. And I did want to have at least some of this in the movie, because not having any would, in and of itself, have looked a little odd. I spent a lot of time thinking about how to do this effectively and you can judge the end results for yourself. It’s not perfect, by any stretch of the imagination, but I think it just about works.
The first thing you must decide on is whether you want to record characters speaking at the same time as you are filming them – ie, to have them perform their lines live. To do this you will need to make sure that the option for lips to move when someone speaks in voice is enabled (I think it is by default in Firestorm), as shown below. If you are using a mesh head it will need to be Bento-enabled in order for the lips to move (otherwise you will need to resort to a system head).
The small advantage to doing it this way is that the animated lip movements will at least approximate the length of the utterance you want your actor to say: when they start speaking their lips will start to move, and when they stop speaking their lips will stop. There are a few disadvantages, however. First, the audio quality of their speech will be limited by a combination of your actor’s mic, Second Life’s voice processing and capacity issues such as lag and internet connection speed. SL also alters voice volume and stereo placement according to where the voice is coming from in the spatial environment, which is yet another factor to take into account. Second, you are making things harder for your actor(s), who might easily be able to turn up for a shoot at a given time, but not so easily able to voice act at that particular moment: they might have lots of noise in the background, such as other family members talking or watching TV, or they might just be plain embarrassed to act ‘live’ and prefer the privacy of recording their lines by themselves and getting them just right before sending you the audio files. Many of the cast members of STÖMOL had never voice acted before, so giving them the flexibility to record their lines in their own time made it much more comfortable for them and also made it likelier that I would receive well-acted lines.
Additionally, not all animated heads stop moving their lips when you stop talking. I recently discovered that my new LeLutka head’s speaking animation is a loop lasting about two to three seconds – so if I were to say just the single word ‘no’ into my microphone, the head’s lips would keep moving until the end of that loop.
A final issue with this approach is that it requires you to have your script – or, at least, the script for the scene you are filming in that moment – worked out in advance. If you’ve watched my recorded interviews on the making of STÖMOL, you’ll know that this is not the approach I take. I pretty much make up the plot of all my stories as I go along – whether they are filmed or written – but there are also pragmatic reasons for this approach when it comes to filming in SL: Second Life is not a fixed place and regions come and go all the time; unless you plan on taking complete control of all your sets (ie, by building them yourself) you’re going to have to be flexible around this, and this could well entail making changes to your plot to reflect new locations you hadn’t planned on filming in (and the absence of locations you had). Of course, when I was recording a scene with direct speech in it, I did have a rough idea of what sort of thing might be said in it – I just didn’t want to commit myself there and then to an exact wording.
So the approach I took instead was to film footage of direct speech without sound and then, later on, write dialogue to fit my edit. This was quite a painstaking process, which involved watching my clips over and over to get a feel for the length of each utterance and what might fit it. Once I had an idea of a wording, I would re-watch the clips, speaking the lines myself over them to check that they fit. When the fit was good enough, I sent the script to my actors and asked them to record their lines and send me the audio files. Of course, I couldn’t be certain that my actors would perform their lines at the same speed as me, so it was only ever going to be an approximation. It’s occurred to me since that one way of getting this even more exact might be to record myself speaking the lines at the speed that fits the clip and send this to my actors for them to ‘match.’ I’m not completely sold on this idea, though, because it might add stress for my actors, whom I would prefer to be relaxed and focused on acting their lines rather than worrying about things like speed.
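Once the recordings come back, a quick comparison of each line’s length against the clip it was written for can show which lines will need trimming, a little extra room in the edit, or a re-record. This is a minimal sketch, assuming you have noted each clip’s length and each recording’s length in seconds and keyed both by a shot name of your own choosing; the function name, the shot names and the 15% tolerance are all hypothetical choices for illustration, not a rule.

```python
def timing_report(clip_seconds, line_seconds, tolerance=0.15):
    """Compare each recorded line's length with the clip it was written for.

    clip_seconds, line_seconds: dicts keyed by a (hypothetical) shot name.
    tolerance: fraction a line may differ from its clip before it is flagged
    (0.15 is an arbitrary default).
    """
    report = {}
    for shot, clip_len in clip_seconds.items():
        if shot not in line_seconds:
            report[shot] = "missing"
            continue
        ratio = line_seconds[shot] / clip_len
        if ratio > 1 + tolerance:
            report[shot] = f"too long ({ratio:.0%} of clip)"
        elif ratio < 1 - tolerance:
            report[shot] = f"too short ({ratio:.0%} of clip)"
        else:
            report[shot] = "ok"
    return report
```

For example, `timing_report({"bar_scene_1": 4.0}, {"bar_scene_1": 5.0})` flags the line as running to 125% of the clip, which tells you before you open your editor that this one will need some negotiating.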
Although I wasn’t asking actors to say lines live, I still needed their avatars’ lips to move. So I asked them to type random things into their local chat box, since typing there also creates lip movement. You don’t have to actually enter these comments into chat, by the way (ie, hit the enter key): it’s the act of typing that creates lip movement. I filmed a number of shots this way, sometimes asking actors to type a long string of letters and sometimes just one or two, so that I had both long and short utterances to play with later.
Recording voice audio
A key disadvantage of this approach I took to filming direct speech is that it places responsibility for recording audio into the hands of the actors – and every actor will be different in terms of the hardware and software they have access to for creating audio recordings (and their level of skill in using it). If you do choose to take this approach then this is a limitation that you’re just going to have to work with and you’ll probably need to give support to some of your actors in terms of advising them on various technical issues.
As I said in part one of this series, I use Audacity for sound. Audacity can also record sound directly from a mic plugged into a PC and export the recordings as MP3 or WAV files (it’s usually better to save as WAV files rather than mess about with MP3 bit rates – and to choose 16-bit WAV files rather than 24 or 32-bit, since these are more widely compatible with other software). I’d be lying if I said that Audacity is simple, but it is free – so your actors won’t have to buy anything – and functions such as recording from a mic aren’t too complicated if you follow a how-to video (of which there are plenty for Audacity). It also comes in versions for Windows, Mac and Linux.
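If you’re collecting WAV files from several actors, a short script can check that each one is the 16-bit format you asked for before you start editing. This is a minimal sketch using Python’s standard-library `wave` module; the function names are my own (hypothetical), and `write_tone` exists only to make test files. Note that `wave` only reads integer PCM, so a 32-bit float export (or an MP3 renamed to .wav) is reported as “not a plain PCM WAV” rather than read.

```python
import math
import wave


def write_tone(path, freq_hz, seconds, rate=44100, sample_width=2):
    """Write a mono sine tone as a PCM WAV file (handy for testing the checker).

    sample_width is in bytes: 2 = 16-bit, 3 = 24-bit.
    """
    peak = 2 ** (8 * sample_width - 1) - 1
    frames = bytearray()
    for i in range(int(rate * seconds)):
        value = int(0.5 * peak * math.sin(2 * math.pi * freq_hz * i / rate))
        frames += value.to_bytes(sample_width, "little", signed=True)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(bytes(frames))


def describe_wav(path):
    """Return channels, bit depth, sample rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "bits": wf.getsampwidth() * 8,
            "rate": rate,
            "seconds": wf.getnframes() / rate,
        }


def check_actor_file(path):
    """One-line verdict on whether an actor's file is a 16-bit PCM WAV."""
    try:
        info = describe_wav(path)
    except (wave.Error, EOFError) as exc:
        # Float WAVs, non-RIFF files and truncated files all end up here.
        return f"{path}: not a plain PCM WAV ({exc}); ask for a 16-bit WAV export"
    if info["bits"] != 16:
        return f"{path}: {info['bits']}-bit; ask for a 16-bit WAV export"
    return f"{path}: OK ({info['rate']} Hz, {info['channels']} ch, {info['seconds']:.1f} s)"
```

Running `print(check_actor_file("actor_line_01.wav"))` over each file you receive (the filename is just an example) gives you a one-line verdict per file to pass back to the actor if anything needs re-exporting.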
You could also ask your actors to record their lines on their phones. We think of phone voice quality as being pretty poor, but that’s because it gets shoved through various algorithms and reduced in bandwidth if it’s to be transmitted to another phone. If you’re just using a recording app to save recordings straight onto the local storage on your device, then your phone mic may well be capable of a much better recording than you’d think (although we’re not talking studio quality here). It’s also – obviously – highly portable, which has a range of advantages (see more below), and it’s relatively easy to send audio files from a phone, since most recording apps will have the standard ‘share’ function, allowing users to upload recordings to all the usual places, including email. Of course, phones vary in the quality of their mic and apps vary in the quality of their recording. Experiment. And make sure you don’t handle the phone whilst you’re recording, or hold it in a steady grip and keep your fingers still – any impacts on the phone case, however slight, will sound like thunder on the recording.
If you’re going to be voice acting yourself in your machinima, you might want to consider buying a decent quality microphone for your own lines. After a bit of research I bought myself a Zoom H2n Handy Recorder at a cost of about $150. This great little device can act completely independently from your computer, recording files directly onto an SD card. I got this primarily so that I wasn’t restricted to recording only at my PC, but I later discovered an additional benefit: it turned out that some part of my PC system – I suspect the onboard sound processing – was degrading the sound from any mic plugged into it, resulting in a thin and slightly sibilant recording. So a recording saved onto the recorder’s SD card was much better quality than one made under identical circumstances with the recorder plugged into my PC as a mic.
If you’re going to use your mic for voiceover, you will also want to invest in a pop filter (which costs around $10), since ‘plosive’ letters like P and B produce little puffs of air that hit the mic when you’re speaking really close to it – and for voiceover you want to be sitting quite close to your mic.
Sound recording is a highly complex issue well beyond the scope of this article, but I’ve listed below a few important issues you need to be aware of, as well as a few workarounds. You might also like to pass some of these on to your actors when you ask them to record their lines.
Minimise all background noise as much as possible. Take a moment to stop and listen to the sounds around you. You might think you’re reading this in silence right now, but when you really pay attention to the sound around you, you’ll tune in to all sorts of noises that you’ve unconsciously filtered out, but which your microphone will likely pick up. Right now, for example, I can hear the faint high pitch of a battery charger nearby, the hum of a fridge next to my desk and also the hum of a freezer in the kitchen. All of this can be picked up. Whilst you can use filters in software like Audacity to remove constant background noises, these aren’t perfect and every time you run them you are also slightly degrading the quality of your recording (and, given the likelihood that your recording has been created on a low quality device in the first place, you really don’t want to reduce its quality any more than is absolutely necessary). So it’s better if you reduce these sounds as much as possible before you hit the record button. Switch off as many appliances as you can, or move to a room where there are fewer sources of sound.
Make outside recordings at night. Similarly, there are all sorts of sounds that will be picked up if you want to record your lines outdoors, such as traffic in the distance, people talking and overhead aircraft. I live in quite a rural area and thought I would get away with recording in a nearby park, but I forgot to think about birdsong! Recording at night time will likely provide you with a much quieter background, though it needs to be a still night since even the smallest of breezes will also be picked up. Ultimately, you’re just going to have to be patient and wait for the right conditions to come along.
Avoid recording on built-in laptop microphones. These are often low quality and return a very poor recording, in which the lower frequencies are weak and background noises amplified. If there really is no other option but to do this, keep the laptop unplugged and running on battery when you record, since laptop inbuilt mics can sometimes pick up an internal buzzing sound created by mains electricity.
Echo is your enemy. If you’re recording indoors then you’re going to have walls around you. Walls reflect sound. You might not think your room has much echo to it, but the moment you listen back to your recording you will notice that it sounds like it’s been recorded in a room, and that’s because of echo. If the character in your machinima is depicted as being in a room of similar size to the one you’re in, then you might just get away with it; if they’re depicted as being outdoors, however, then it’s going to sound immediately incongruous. This is the main reason, incidentally, why you might sometimes want to create recordings outside.
Whilst there are tools in audio editors that you can use to remove or reduce constant background noise (such as hiss or the hum of air conditioning), there is pretty much nothing you can do about echo, so your only option is to try to reduce it at the point of recording. One method is to record outside; another is to speak at a quieter volume and closer to your mic. There are limitations to this latter approach, however – the quieter you go, the harder it will be to speak in a normal range of intonation (you won’t be able to get away with shouting, for example). And if you get too close, your lines will sound more like voiceover than natural dialogue. Other tricks you can try are to speak facing something which will reflect sound less, for example sitting on the floor in front of a sofa or standing in front of some curtains or a bookcase – this still won’t be perfect (you’ll still get some echo from the ceiling and other walls), but it will help. It goes without saying that carpeted rooms are better.
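Before reaching for noise-removal tools, it can be worth measuring how noisy your room actually is. This is a minimal sketch, assuming a 16-bit mono WAV of a few seconds of ‘silence’ (room tone) recorded with your usual setup; it reports the average (RMS) level in dBFS, where 0 is the loudest a 16-bit file can hold and more negative numbers mean quieter. The function names are hypothetical, and the number is only useful for comparing your own setups against each other, not as an absolute standard.

```python
import math
import struct
import wave


def read_mono_16bit(path):
    """Read a 16-bit mono PCM WAV into a list of integer samples."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        raw = wf.readframes(wf.getnframes())
    # '<h' = little-endian signed 16-bit, the layout WAV files use.
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def rms_dbfs(samples):
    """RMS level in dB relative to 16-bit full scale (32768).

    Digital silence returns -inf; more negative means a quieter recording.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / 32768)
```

Record the same few seconds of room tone with the fridge on and then off (or in two different rooms), run each through `rms_dbfs(read_mono_16bit(...))`, and the more negative figure is your quieter option. Each 6 dB drop is roughly a halving of the noise’s amplitude.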
But there is one trick for reducing echo that works surprisingly well, which you probably wouldn’t have thought of…
Consider recording lines in a car. Unlike rooms, car interiors have very few reflecting surfaces, and those they do have (eg, your windscreen) are mostly angled so that reflected sound hits an absorbing surface before it reaches your mic. Your car is the recording booth you never realised you owned.
Layer some background noise or music over spoken lines in editing to cover noise. Whilst this still won’t really work all that well for echo, adding some background noise befitting the environment shown in your scene will go a long way towards masking low levels of noise on your recording. This could include street sounds, background music, wind, rain, and so on.
Deliberately distort your sound. As I mentioned earlier, a couple of the characters in my movie wore masks as my cunning way of avoiding having to film their lips move. This was also an opportunity to distort their voices as a result of wearing a mask. For the character of Adevarul I used an online filter I found, but for Waarheid I did the distortion myself in Audacity, using the in-built echo function to introduce a tiny echo that sounded like an electronic filter. When voices are distorted like this, the viewer pays less attention to issues with sound quality, since these sound like they are the intentional result of the device used within the movie to distort the voice, rather than the result of a low quality recording.
Another distortion I used in a couple of places in STÖMOL was reverb. Whenever speech took place in a large space, I added plenty of reverb to give an echo effect (though note that echo and reverb are technically two different things so far as sound editors like Audacity are concerned). This had the dual effect of making the audio environment in the movie sound more realistic and masking some of the low-level background noise on my tracks.
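For anyone curious about what the three post-production tricks above (layering ambience, a tiny echo, and reverb) actually do to the audio, here is a minimal sketch of the underlying principles, working on plain lists of 16-bit integer samples. It is not how Audacity implements its Echo or Reverb effects, and all names and default values (the -18 dB ambience level, 15 ms delay, 0.5 decay, the reverb delay times and wet level) are hypothetical starting points to adjust by ear. The echo uses the same two settings Audacity’s Echo effect exposes, a delay time and a decay factor; with a very short delay the repeats don’t sound like separate echoes but colour the voice, which is presumably why a tiny echo can sound like an electronic filter. Reverb, by contrast, is many reflections so closely spaced that they blur into a single decaying tail, approximated here by several parallel echoes at slightly different delays.

```python
def clip16(value):
    """Round and clamp a sample to the signed 16-bit range."""
    return max(-32768, min(32767, int(round(value))))


def mix_under(dialogue, ambience, ambience_db=-18.0):
    """Lay an ambience track under dialogue, looping the ambience if it is shorter.

    ambience_db is the gain applied to the ambience (arbitrary default).
    """
    gain = 10 ** (ambience_db / 20)  # convert dB to a linear multiplier
    out = []
    for i, d in enumerate(dialogue):
        a = ambience[i % len(ambience)] if ambience else 0
        out.append(clip16(d + gain * a))
    return out


def feedback_echo(samples, rate, delay_ms=15.0, decay=0.5):
    """Echo where each repeat is `decay` times the previous one (keep decay < 1).

    Output has the same length as the input, so repeats past the end are cut off.
    """
    delay = max(1, int(rate * delay_ms / 1000))
    out = [float(s) for s in samples]
    for i in range(delay, len(out)):
        out[i] += decay * out[i - delay]
    return [clip16(v) for v in out]


def crude_reverb(samples, rate, wet=0.3, feedback=0.7,
                 delays_ms=(29.7, 37.1, 41.1, 43.7)):
    """Dense wash of reflections from several parallel feedback echoes."""
    n = len(samples)
    tail = [0.0] * n
    for ms in delays_ms:
        delay = max(1, int(rate * ms / 1000))
        comb = [float(s) for s in samples]
        for i in range(delay, n):
            comb[i] += feedback * comb[i - delay]
        for i in range(n):
            # Keep only the reflections (not the dry signal), averaged across delays.
            tail[i] += (comb[i] - samples[i]) / len(delays_ms)
    return [clip16(s + wet * t) for s, t in zip(samples, tail)]
```

The delay times in `crude_reverb` are deliberately not multiples of each other, so the reflections from the different delays don’t pile up on the same samples; a real reverb adds further stages to smooth the tail, which is one reason to use your editor’s built-in effect rather than this sketch for actual work.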
That’s it for this part. I have one more part to this series planned, which will be something of a mixed bag of bits and pieces, including a couple of editing tips and some stuff about copyrighted materials (images and sounds). Until then, happy filming!