Taking a cue from language documentation work, would a tool like ELAN be helpful for this type of project?
https://www.mpi.nl/corpus/html/elan_ug/index.html