Businesses develop technology which helps us communicate online. Captions and transcriptions are examples of this. Read on to find out more.
One of life's struggles is getting meaning across. A concept can be explained to someone in three different ways and they still might not understand it. Even saying nothing can mean so much.
There are hand gestures, nods and shakes of the head, shrugs: all of these help us express what we mean.
Filter all this through technology and it becomes a little more difficult: discerning sarcasm in text messages, making eye contact over FaceTime, hearing someone's pre-recorded message clearly.
Devices, platforms, and pieces of software have sought to make communication easier, and they have succeeded. Captioning and transcribing are two such tools, and they often fall under the single heading of speech-to-text. What are they, and what are their benefits?
Captioning
Open and closed captions both turn speech into text. The text records what is said, who says it, and non-speech sounds such as music and environmental noise. Each piece of text is time-coded and displayed at the bottom of the video, in sync with the audio.
Captions differ from subtitles in that they provide much more information and assume the audience cannot hear the audio at all. Like subtitles, though, captions can be in a different language from the content's original one.
The difference between open and closed captions is permanence: open captions are always visible and cannot be turned off, whereas closed captions can be turned off.
Transcribing
Transcriptions record much the same information as captions: what is said and who said it. Depending on the service, they may also include time-codes. However, transcriptions do not appear on the video; they are separate. A transcription is a text file and is treated as material in its own right, not reliant on the audio or the video to make sense.
Human and AI
Transcription is another market that straddles the line between human and machine.
Transcription is an ancient human art. The spoken words of monarchs, chiefs, emperors, religious leaders, and peasants alike were often written down.
This is still true today. PR professionals, court reporters, journalists, academics, secretaries, and many more all turn speech into text as part of their work. They do it accurately and reliably, though they are more expensive than their machine counterparts.
Businesses such as Verbit use AI to turn audio, whether pre-recorded or live, into captions or transcriptions. Accuracy is high, and so is the speed at which the work gets done. With each new job, the data fed into the AI helps it learn and improve. Of course, humans design and programme these systems, so it is really a working relationship between the two.
Use
Speech-to-text is often seen as primarily an accessibility tool. People who are d/Deaf or have a hearing impairment use captions and transcriptions to engage with audio-visual content.
This is true. However, people who are not deaf and do not have a hearing impairment also rely on these services. Studies show that most users watch video content with the sound off, so captions are a good way of catering to that habit.
Captions are all about understanding content in the moment, whether that is a social media video, a training video, or a live conference.
Transcription services are a little different. They are more often used after the event: after the lecture has been attended, after the client has made their statement, after the meeting has finished. As mentioned, they are separate from the video.
Even so, transcriptions can help people in the moment. Students who would normally take copious notes do not need to if a transcription of the lecture will be sent to them, so they can focus on taking in and understanding the content as it is delivered.
Fundamentally, caption and transcription services help people communicate better, both as speakers and as listeners, by making both speech and text clearer.