Using Trint, a speech-to-text web application that uses machine learning to transcribe audio and video, a summer project team made remarkable progress toward eliminating the backlog of CBS, NBC and CNN news programs in the Vanderbilt Television News Archive (VTNA) and improving the accuracy of our collection. The team transcribed broadcast news shows, including a number of special news events.
The team uploaded videos of news broadcasts and then labeled each transcript by program segment, including introductions, news stories, commercials and the good-night segments. The timecoded transcripts were then quality-checked for the spelling of names, places and products. Once corrected, the files were exported as closed caption text documents and .xml files ready to upload to the VTNA database.
The summer team added 895 broadcasts in just five months, helping reduce the backlog from four years to six months. Because news episodes are recorded every day, it is wonderful news that several members of the summer team have committed to continuing their remote work with the program this fall. Five students joined the team for the fall semester.