26 June, 2007

Using Speech Recognition Software to Help Create Accessible Resources

We've been creating more audio and video resources recently, and we wanted to develop a process that would let us create transcriptions of them quickly and easily.

We think transcriptions would be a step towards making our resources accessible to all potential users. A transcription also lets people scan the contents to decide whether a video is worth their time.

We've just purchased a copy of Dragon NaturallySpeaking 9 Preferred, which can transcribe audio files in several formats:

.wav
.vox
.mp3
.wma
.sri

This suits us, as the software we use to create video (Premiere Pro 1.5) and screencasts (Camtasia Studio 3) can both export audio as .wav files.
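Before handing a batch of exported audio to Dragon, it can help to check that each file is in one of the formats listed above. The following is a minimal Python sketch of such a check. It looks only at the file extension (an assumption: it does not inspect the audio encoding inside the file), and the file names are hypothetical.

```python
from pathlib import Path

# Audio formats Dragon NaturallySpeaking 9 Preferred can transcribe, per the list above.
SUPPORTED_EXTENSIONS = {".wav", ".vox", ".mp3", ".wma", ".sri"}


def can_transcribe(path: str) -> bool:
    """Return True if the file's extension is in the supported list (case-insensitive).

    Only the extension is checked; the file's actual contents are not examined.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


# Hypothetical exports from Premiere Pro / Camtasia Studio and elsewhere.
exports = ["intro_screencast.wav", "lecture.MP3", "interview.aiff", "notes.txt"]
print([f for f in exports if can_transcribe(f)])
# prints ['intro_screencast.wav', 'lecture.MP3']
```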

To test it, I transcribed a short (134-second) video manually. It contained 391 words and took me about 20 minutes to transcribe (without any dedicated transcription software or foot pedals).

Before I trained it to recognise my voice, Dragon got 296 of those words right. After 30 minutes of general training it got 308 words correct, and adding specialist words that I used in the videos but that weren't in Dragon's vocabulary pushed that to 311 words.

Correcting this final transcript took about 10 minutes, which suggests this approach would halve the time it has been taking me to create transcriptions.
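The time saving can be expressed as minutes of staff effort per minute of recorded audio. This is a rough Python sketch using only the figures from this test. It leaves out the one-off 30 minutes of training and however long Dragon itself takes to process the file, so treat it as an approximation rather than the full cost of the workflow.

```python
AUDIO_SECONDS = 134      # length of the test video
MANUAL_MINUTES = 20      # typing the transcript by hand
CORRECTION_MINUTES = 10  # correcting Dragon's output (excludes training and processing time)


def effort_per_audio_minute(work_minutes: float, audio_seconds: float) -> float:
    """Minutes of staff time needed per minute of recorded audio."""
    return work_minutes / (audio_seconds / 60)


manual = effort_per_audio_minute(MANUAL_MINUTES, AUDIO_SECONDS)
assisted = effort_per_audio_minute(CORRECTION_MINUTES, AUDIO_SECONDS)

print(f"manual: {manual:.1f} min per audio minute")       # manual: 9.0 min per audio minute
print(f"with Dragon: {assisted:.1f} min per audio minute")  # with Dragon: 4.5 min per audio minute
print(f"time saved: {1 - assisted / manual:.0%}")         # time saved: 50%
```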

Training Dragon for 30 minutes and adding the extra vocabulary raised its accuracy from 76% to 80% (training alone reached about 79%). That doesn't sound like much, but I think it would be worth asking staff to train Dragon if they were creating a lot of resources that needed transcribing.
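For reference, the percentages can be reproduced by dividing the words Dragon got right by the 391 words in the manual transcript. That this simple ratio is how the figures were derived is an assumption (it matches them after rounding). It is not a figure Dragon reports, and it is not the word error rate used in speech-recognition research, which also counts inserted and deleted words.

```python
TOTAL_WORDS = 391  # words in the manually typed transcript

# Words Dragon got right at each stage, as reported above.
results = {
    "untrained": 296,
    "after 30 min training": 308,
    "training + custom vocabulary": 311,
}


def accuracy(correct: int, total: int) -> float:
    """Percentage of reference words that were transcribed correctly."""
    return 100 * correct / total


for stage, correct in results.items():
    print(f"{stage}: {accuracy(correct, TOTAL_WORDS):.1f}%")
# untrained: 75.7%
# after 30 min training: 78.8%
# training + custom vocabulary: 79.5%
```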

In conclusion, this won't make transcription almost automatic, which would matter if we wanted to transcribe everything. It will, however, help us create transcriptions when we want them.

Looking online, transcription services charge from about £0.60 per minute for 1-2 speakers to £1.25 per minute for 5 or more speakers. Outsourcing to one of these services might be a good solution if we have more of this work than we can handle ourselves.
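As a rough comparison, the quoted rates can be turned into a per-recording estimate. The sketch below assumes pro-rata billing per minute of audio (real services may round up to the next minute or apply minimum charges), and the constant names are hypothetical. No rate is quoted for 3-4 speakers, so that tier is left out.

```python
# GBP per minute of audio, from the rates quoted above.
RATE_1_TO_2_SPEAKERS = 0.60
RATE_5_PLUS_SPEAKERS = 1.25


def transcription_cost(duration_seconds: float, rate_per_minute: float) -> float:
    """Estimated cost in GBP, assuming the service charges pro rata per minute."""
    return duration_seconds / 60 * rate_per_minute


test_video = 134  # seconds, the same video used in the test above
print(f"£{transcription_cost(test_video, RATE_1_TO_2_SPEAKERS):.2f}")  # £1.34
print(f"£{transcription_cost(test_video, RATE_5_PLUS_SPEAKERS):.2f}")  # £2.79
```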
