Zillah Watson and Tristan Fearne from BBC Research & Development told us about this project.
What is the aim of this project?
We wanted to find a way to make the BBC's radio and television archives more searchable.
How big is the archive?
Our starting point was a massive audio archive of World Service broadcasts in English, dating back 6 decades and covering over 36,000 radio programmes. That’s more than 3 years' continuous listening!
How did you do it?
We used speech-to-text software to create transcripts of radio programmes. Then algorithms were used to extract topic tags from the transcripts. The tags then enable you to search the programmes.
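As a rough illustration of the transcript-to-tags-to-search flow described above, here is a minimal Python sketch. It does not show the BBC's actual method: simple word counting stands in for the topic-extraction algorithms, and the function names, stopword list, and sample transcripts are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "on", "for"}


def extract_tags(transcript, max_tags=3):
    """Return the most frequent non-stopword terms as stand-in topic tags."""
    words = [w.strip(".,!?;:\"'").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(max_tags)]


def build_index(transcripts):
    """Map each tag to the set of programme IDs whose transcript produced it."""
    index = defaultdict(set)
    for programme_id, text in transcripts.items():
        for tag in extract_tags(text):
            index[tag].add(programme_id)
    return index


def search(index, tag):
    """Look up programmes by tag, ignoring case."""
    return sorted(index.get(tag.lower(), set()))


# Made-up transcripts, standing in for speech-to-text output.
transcripts = {
    "prog-1": "Cricket news: the cricket season opens in London.",
    "prog-2": "Election results: the election was close.",
}
index = build_index(transcripts)
print(search(index, "cricket"))
```

The point of the sketch is the shape of the pipeline: once each programme carries tags, searching the archive becomes a lookup from tag to programmes.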
Great idea, so what came next?
The next stage was to add speaker recognition - each programme has been segmented to show where different people are talking. Users can help tell us who each voice belongs to. And once a speaker’s voice has been identified, you can find other programmes in the archive featuring the same person. So you can search for all the programmes which contain, say, Nelson Mandela’s voice.
Sounds smart, so how do you want to make it better?
The tags weren’t perfect, so we decided to see if users of the archive could help improve them by voting tags up or down. Users can also help select better photos, which are automatically pulled in by the tags. So far in our experiment over 67,000 tags have been improved.
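One simple way to picture voting tags up or down is a running score per programme and tag. The sketch below is an assumption for illustration only, not the site's actual design: the `TagVotes` class, its method names, and the acceptance threshold are all hypothetical.

```python
from collections import defaultdict


class TagVotes:
    """Keep a net score (upvotes minus downvotes) for each programme/tag pair."""

    def __init__(self):
        self._scores = defaultdict(int)

    def vote(self, programme_id, tag, up):
        """Record one vote: +1 if `up` is true, otherwise -1."""
        self._scores[(programme_id, tag)] += 1 if up else -1

    def score(self, programme_id, tag):
        """Return the net score, or 0 if nobody has voted on this tag."""
        return self._scores.get((programme_id, tag), 0)

    def accepted_tags(self, programme_id, threshold=0):
        """List voted-on tags whose net score is at or above the threshold."""
        return sorted(
            tag
            for (pid, tag), s in self._scores.items()
            if pid == programme_id and s >= threshold
        )


votes = TagVotes()
votes.vote("prog-1", "cricket", up=True)
votes.vote("prog-1", "cricket", up=True)
votes.vote("prog-1", "football", up=False)  # a listener flags a wrong tag
print(votes.accepted_tags("prog-1"))
```

In this sketch a tag drops out of the accepted list as soon as downvotes outnumber upvotes; a real system could weight votes or require more of them before changing a tag.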
I like the sound of that.
That’s what we’re hoping. We’ve written in greater detail about how we designed and built the site.