Note: this post is a brief summary of an academic article I recently wrote that was published in the International Journal of Qualitative Methods. If you want to find out more (or are looking for an academic text), then click here.
You've done a few interviews and now you need to transcribe them. Or maybe there's a podcast and you want a copy of its script. Or, even more simply, you want to turn your speech into words. And, above all, you want to do that in a privacy-safe way without uploading your audio to the Internet, quickly, and for free. Can you even do that?!
Yes, you can! But before I show you how, let me explain why you need to think about privacy here. As you may have noticed, there is a plethora of websites that offer to transcribe audio or video files automatically and in just a few seconds. Some of them are free, others offer a pay-as-you-go plan, most ask you to purchase a plan - they are businesses like any other, after all.
Some of these plans are cheap, others are very expensive. Cost aside, however, these platforms don't perform transcription on the device but require you to upload your files over the Internet. This means your data (your audio or video files) need to leave your device and, in so doing, they become subject to these companies' privacy policies. Each company has its own policy, and some of them might be GDPR-compliant, but in general this means your files might be used to train AI models; accessed by third parties; stored in countries where privacy regulations are not as strict as in the one you live in; or even reviewed by a human.
And what's worse, these privacy policies are often vague or written in formal, technical and complicated language, making it really hard for the average person to understand what's going to happen to their files. Plus, privacy policies are updated regularly, so you would need to monitor them constantly and work out, each time, what impact any changes will have.
And here is where Whisper comes in. Launched by OpenAI (does ChatGPT ring a bell?) and trained on 680,000 hours of audio data, Whisper is an open-source speech recognition system that supports transcription in 99 languages. However, installing Whisper requires running some code and using it, too, is not easy. Since it's open-source though, a few programmes have been developed that use its models but make them more accessible: SpeechPulse (for Windows) and MacWhisper (for Mac).
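To give a sense of what "running some code" means in practice, here is a minimal sketch in Python, assuming the third-party openai-whisper package and ffmpeg are already installed. The function names (format_timestamp, segments_to_text, transcribe_file) are my own illustrative helpers, not part of Whisper; only whisper.load_model and model.transcribe come from the package. Bear in mind that the package downloads the model weights the first time a model is loaded, so that one step needs an Internet connection before the rest can run on the device.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into an HH:MM:SS label."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments) -> str:
    """Build a plain-text transcript, one timestamped line per segment.

    Each segment is expected to be a dict with 'start' and 'text' keys,
    which is the shape Whisper uses in its result["segments"] list.
    """
    lines = []
    for segment in segments:
        label = format_timestamp(segment["start"])
        lines.append(f"[{label}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(audio_path: str, model_name: str = "base", language=None) -> str:
    """Transcribe one audio file locally and return a timestamped transcript.

    Assumption: openai-whisper is installed (pip install openai-whisper).
    The import sits inside the function so the helpers above can be used
    without it.
    """
    import whisper  # third-party package, not in the standard library

    model = whisper.load_model(model_name)  # e.g. "base", "small", "large"
    result = model.transcribe(audio_path, language=language)
    return segments_to_text(result["segments"])
```

Calling transcribe_file("interview.mp3", model_name="large", language="en") would then return the transcript as text, where interview.mp3 is a hypothetical file name. Even this short version involves installing Python, a package and ffmpeg, which is why ready-made programmes are the easier route for most people.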
Both programmes offer the possibility, through a one-time payment of $19.95 and €30 (plus taxes), respectively, to download all Whisper language models and transcribe multiple files at the same time. To use them, simply open your file in the programme and let it generate an automated transcript for you. Below, for instance, is the transcript generated offline by MacWhisper in Large mode of the first 90 seconds of this interview, after my revisions. If you are transcribing live or don't want to load the file itself, simply connect the audio input and output ports of your device with a male-to-male aux cable and play your audio/video files, or use virtual audio devices like VB-Cable in case you don't have any microphone ports (see here for instructions).
Privacy-wise, SpeechPulse ‘works fully offline’, whereas in the case of MacWhisper ‘[a]ll transcription is done on your device, no data leaves your machine’, in line with the product’s remarkably clear Privacy Policy (‘[w]e don’t want to know anything about you’). That said, one can never be safe enough and, given that interviews will contain personal or sensitive information, my recommendation is to always transcribe offline on an ad hoc device, i.e. a standalone device that is used only for transcribing and other offline tasks, and on which Internet access has been permanently disabled after downloading all the language packs needed and entering any activation key to unlock additional features.
This is where one of SpeechPulse’s and MacWhisper’s greatest comparative advantages lies: since they aren't subscription-based, once they're installed they no longer require an Internet connection to renew their license, therefore offering a lifetime solution for offline transcription. Since the device should only be used offline, there's of course the question of how any programme updates could be installed, given that new languages may be added in the future or that performance of current languages may be improved. However, this can easily be solved by downloading any new language packs or programme versions (or even new programmes that don't exist yet) on another device and transferring them to the offline device via a USB drive or SD card (see note 1).
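As an optional extra that I did not cover in the article, one could check that a file arrived on the USB drive or SD card intact before installing it on the offline device. The following is a minimal sketch using only Python's standard library; sha256_of and copies_match are hypothetical helper names, and the approach only confirms that the copy is identical to the file you downloaded, not that the download itself is trustworthy.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() keeps calling read() until it returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path):
    """True if both files have exactly the same contents."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

For example, copies_match("Downloads/language-pack.bin", "/Volumes/USB/language-pack.bin") would compare the downloaded file with its copy on the drive; both paths are made up for illustration.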
Be aware that running SpeechPulse and MacWhisper requires a device that meets certain technical requirements. It may not always be possible to either buy or keep a new laptop only for transcription, but purchasing a second-hand one might do the trick (see note 2). Rather than a laptop, one could purchase a tablet instead, therefore reducing costs even further (see note 3). The device itself could also be re-sold after use, though one should make sure that all data have been permanently deleted before passing it on to someone else (see note 4).
In conclusion, using programmes that rely on Whisper’s language models, such as SpeechPulse and MacWhisper, is at the moment by far the best way to transcribe audio or video files. With these programmes you can transcribe at any time and from any part of the world in various languages for free (or at very little cost through a one-off payment) and, above all, have your data processed on the device. If you don't already own one, buying an ad hoc device that is disconnected from the Internet will require a certain initial investment. However, there are ways to minimise costs, both in the present (e.g. purchasing a tablet or second-hand laptop, or a second-hand tablet) and in the future (e.g. re-selling the device when no longer needed). Gone are thus the days when ‘providing accurate transcriptions of long blocks of actual human conversation’ was deemed ‘beyond the abilities of even today’s most advance software’.
Note 1: a further level of security should be added by encrypting all relevant files and by installing firewall software such as GlassWire or Portmaster (for Windows), Little Snitch (for Mac) or any equivalent programme before air-gapping the device, and then using it to block any potential connection attempts from the transcription programme.
Note 2: if purchasing a second-hand device, make sure that it is reset to factory settings and that it is safe to use, consulting a professional if necessary.
Note 3: please note, due to lack of funding it was not possible for me to test either SpeechPulse or MacWhisper on a tablet.
Note 4: since data can be recovered even after a factory reset, it is important to perform this with the advice of a professional in order to ensure that data have been encrypted and deleted permanently.