FXE Lipsyncer tool
[continued from Voice over questions... c]
original code by 0100010
original NwVault release:
I’ve adopted 0100010’s codebase and created a repository at Github:
(demo video: total 2m22s, lipsyncs start at 30s)
what is a lipsyncer? what is an .FXE file? what the heck is going on?
tl;dr: i don't really know.
That is, I understand the application and what it does: it takes a wave file of the spoken voice, plus the text of what is said, and breaks the audio down into distinct sound-types (phonemes) that are used to generate a .FXE file. The FXE file is read by the NwN2 engine to play pseudo-realistic mouth movements on the creature that speaks a voiceover.
why is text important? why can’t the stupid computer just figure it out from the wavefile?
Because… the spoken voice has an infinity of nuances. Intonation, pronunciation, and the quality of the recording itself all affect how well a SpeechRecognitionEngine can interpret what is said. So the Lipsyncer makes two passes: the first pass tries to get a Recognition from a dictation lexicon, then a second pass tries to get a Recognition based only on any provided text. The results of both passes are displayed and you can pick the better one.
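For the technically curious, here's a minimal sketch of that two-pass idea using the System.Speech namespace. Note the hedges: the Lipsyncer itself is built on Interop.SpeechLib, not System.Speech, and the file name and spoken line below are hypothetical placeholders, not anything from the actual tool. Windows with a working en-US recognizer is assumed.

```csharp
// Illustrative sketch only: the Lipsyncer uses Interop.SpeechLib, but the
// same two-pass scheme is easy to show with System.Speech.Recognition.
// "voiceover.wav" and the text line are hypothetical placeholders.
using System;
using System.Globalization;
using System.Speech.Recognition;

class TwoPassSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Pass 1: free dictation - the engine guesses the words on its own.
            engine.SetInputToWaveFile("voiceover.wav");
            engine.LoadGrammar(new DictationGrammar());
            RecognitionResult pass1 = engine.Recognize();

            // Pass 2: constrain the engine to the text the author provided.
            engine.UnloadAllGrammars();
            engine.SetInputToWaveFile("voiceover.wav"); // rewind by reloading
            engine.LoadGrammar(new Grammar(new GrammarBuilder("well met traveler")));
            RecognitionResult pass2 = engine.Recognize();

            // Show both results so the user can pick the better one.
            Console.WriteLine("dictation : " + (pass1 != null ? pass1.Text : "(none)"));
            Console.WriteLine("from text : " + (pass2 != null ? pass2.Text : "(none)"));
        }
    }
}
```

The second pass matters because a grammar built from the known text gives the recognizer a tiny search space, so it usually times the phonemes better than open dictation does.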
what is a SpeechRecognitionEngine?
Ah, there’s the rub. The Lipsyncer is NOT a SpeechRecognitionEngine. SpeechRecognitionEngines ship with Windows, and which ones you get depends on what language of Windows you’re using. You may or may not be able to download additional SpeechRecognitionEngines for your OS.
what? why so many SpeechRecognitionEngines?
Because each one is designed to be used in a specific language or set of languages. There may be universal SpeechRecognitionEngines but I haven’t seen any in my limited investigations yet.
To find what SpeechRecognitionEngines are currently installed on your computer, go to Control Panel|Speech Recognition|Advanced Speech Options|Speech Recognition|Language. The dropdown lists your SpeechRecognizers with their languages.
At present, the Lipsyncer works only with engines listed as “Microsoft Speech Recognizer”. You don’t have to change anything in the dropdown as far as the Lipsyncer is concerned; the Lipsyncer has its own dropdown list that works independently – that is, changing the SpeechRecognizer in the Lipsyncer’s dropdown does NOT affect that of your operating system or vice versa.
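If you'd rather let a program do the looking, a .NET application can enumerate the installed recognizers itself; this is roughly how a dropdown like the Lipsyncer's could be populated. Again a hedge: this is a System.Speech sketch, not the tool's actual Interop.SpeechLib code, and it assumes a Windows machine with at least one recognizer installed.

```csharp
// Sketch: list every installed SpeechRecognitionEngine with its language.
// Uses System.Speech for illustration; the Lipsyncer itself goes through
// Interop.SpeechLib / SAPI instead.
using System;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // Each RecognizerInfo describes one installed recognizer.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // e.g. "Microsoft Speech Recognizer 8.0 for Windows (English - US) [en-US]"
            Console.WriteLine(info.Description + " [" + info.Culture.Name + "]");
        }
    }
}
```

The `Culture` property is what tells you the recognizer's language, which is exactly the information the Control Panel dropdown shows.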
However. I have a recognizer for EnglishUS and another for EnglishGB on my system. The US version works fine, but I get garbled results with the GB version. Go figure: at this point I can’t even assume that the EnglishUS version on, say, a French Windows OS is the same as the EnglishUS version on a US Windows OS. It’s spaghetti, especially when one tries to account for the progression of SpeechRecognition software over the years.
technical: There are several .NET namespaces that can be used. “System.Speech”, “Microsoft.Speech”, and the one that 0100010 chose for the FxeGenerator, “Interop.SpeechLib”. There also appear to be an increasing number of independently produced (non-Microsoft) APIs available.
They each appear to have their specialties and idiosyncrasies. The priority seems geared toward command and control (telling your computer what to do), and that’s not what we want here.
“Interop.SpeechLib” interfaces quite well through SAPI 5.4 with the EnglishUS SpeechRecognizer that’s on my machine. But a newer platform might be better for the future … because the availability of SpeechRecognizers for various languages is limited at present. Your version of Windows, in your language, might not even have one.
I don’t have a release of 0100010’s FXE Generator yet. But if you have custom voiceovers and want to demo a debug version of the rewritten and upgraded FXE Lipsyncer, just send me a PM.
Windows w/ .NET 3.5 req’d