Linguistic resources

Lexical resources

Alexina_PARSLI lexical resources

(Alexina resources can be found on the Alexina project page)

Leffla: Latin (Walther, 2013)
  • approx. 2,200 verbal entries; 115,000 inflected forms
  • 3 different implemented analyses
Lefff with Alexina_PARSLI: French (Sagot & Walther, 2011; Walther & Sagot, 2011)
  • in collaboration with Benoît Sagot (ALMAnaCH, INRIA)
  • new versions built from the Lefff (Sagot, 2010)
  • approx. 7,800 verbal entries; approx. 370,000 inflected forms
  • 4 different implemented analyses
MaltLex: Maltese (Camilleri & Walther, 2012)
  • in collaboration with Maris Camilleri (University of Surrey)
  • approx. 560 verbal entries; 9,000 inflected forms
  • 2 different implemented analyses
TuatLex: Tuatschin (Romansh) (Walther & Sagot, 2017)
  • in collaboration with Claudia Cathomas (Universität Zürich) & Benoît Sagot (ALMAnaCH, INRIA)
  • approx. 2,200 entries; 46,200 inflected forms
KhaLex: Khaling (Kiranti) (Walther et al., 2013)
  • in collaboration with Guillaume Jacques (CRLAO, CNRS) & Benoît Sagot (ALMAnaCH, INRIA)
  • approx. 170 verbal entries; 50,100 inflected forms
  • 4 different implemented analyses
ThuLex: Thulung (Kiranti)
  • in collaboration with Aimée Lahaussois (HTL, CNRS)
  • just starting

Alexina lexical resources

(Alexina resources can be found on the Alexina project page)

PerLex: Persian (Sagot & Walther, 2010; Sagot et al., 2011)
  • in collaboration with Benoît Sagot (ALMAnaCH, INRIA) within the PerGram project
  • approx. 30,000 entries; 550,000 inflected forms
SoraLex: Sorani Kurdish (Walther & Sagot, 2010)
  • in collaboration with Benoît Sagot (ALMAnaCH, INRIA)
  • approx. 520 entries; 30,000 inflected forms
KurLex: Kurmanji Kurdish (Walther et al., 2010)
  • in collaboration with Benoît Sagot (ALMAnaCH, INRIA) & Karën Fort (Université Paris Sorbonne)
  • approx. 22,000 entries; 410,000 inflected forms

Corpora

Romansh Tuatschin documentation corpus
  • Part of the Project Tuatschin SNF research project, Universität Zürich
  • > 70 hours of audio and video recordings
  • > 100,000 sentences/1,000,000 tokens
  • Automatically annotated for part-of-speech and morphosyntactic information, and code-mixing information (language identification)
  • Manually corrected
  • Lemmatised, English lemma translation, German translation
  • Content: natural speech (approx. 80%); other genres (theatre, interviews, films and raw film footage, approx. 20%)
  • Team members at the University of Zürich: Michele Loporcaro, Sabine Stoll, Géraldine Walther, Claudia Cathomas, Jekaterina Mažara, Philippe Maurer, Nora Julmi, Rolf Hotz, Nathalie Schweizer
Romansh Tuatschin acquisition corpus
  • Part of the Project Tuatschin SNF research project, Universität Zürich
  • First documentation of language acquisition for a variety of Romansh
  • approx. 450 hours of audio and video recordings
  • approx. 450,000 sentences/2,500,000 tokens
  • Automatically annotated for part-of-speech and morphosyntactic information, and code-mixing information (language identification)
  • Manually corrected
  • Lemmatised, English lemma translation, German translation
  • Content: recordings of 6 children (age range 2;0-4;1)
    • in natural settings at home or outside
    • 4-5 hours per child per month over 1-2 years
    • children: 2x approx. 2 to 4 years, 2x approx. 2 to 3 years, 2x approx. 3 to 4 years
  • Team members at the University of Zürich: Michele Loporcaro, Sabine Stoll, Géraldine Walther, Claudia Cathomas, Jekaterina Mažara, Philippe Maurer, Nora Julmi, Rolf Hotz, Nathalie Schweizer