Corpora

Romansh Tuatschin documentation corpus
  • Part of the Project Tuatschin SNF research project, Universität Zürich
  • > 70 hours of audio and video recordings
  • > 100,000 sentences/1,000,000 tokens
  • Automatically annotated for part-of-speech and morphosyntactic information, and code-mixing information (language identification)
  • Manually corrected
  • Lemmatised, English lemma translation, German translation
  • Content: natural speech (approx. 80%); other genres (theatre, interviews, films and raw film footage; approx. 20%)
  • Team members at the University of Zürich: Michele Loporcaro, Sabine Stoll, Géraldine Walther, Claudia Cathomas, Jekaterina Mažara, Philippe Maurer, Nora Julmi, Rolf Hotz, Nathalie Schweizer
Romansh Tuatschin acquisition corpus
  • Part of the Project Tuatschin SNF research project, Universität Zürich
  • First documentation of language acquisition for a variety of Romansh
  • approx. 450 hours of audio and video recordings
  • approx. 450,000 sentences/2,500,000 tokens
  • Automatically annotated for part-of-speech and morphosyntactic information, and code-mixing information (language identification)
  • Manually corrected
  • Lemmatised, English lemma translation, German translation
  • Content: recordings of 6 children (age range 2;0-4;1)
    • in natural settings at home or outside
    • 4-5 hours per child per month over 1-2 years
    • children: 2x approx. 2 to 4 years, 2x approx. 2 to 3 years, 2x approx. 3 to 4 years
  • Team members at the University of Zürich: Michele Loporcaro, Sabine Stoll, Géraldine Walther, Claudia Cathomas, Jekaterina Mažara, Philippe Maurer, Nora Julmi, Rolf Hotz, Nathalie Schweizer