TalkBank MOR and UD Grammars

We are currently transitioning the TalkBank system for morphosyntactic analysis from the MOR/POST/MEGRASP system to the UD (Universal Dependencies) system, which is described in detail here. We apply UD taggers to TalkBank files using Chris Manning's Stanza system, which has been built into the Batchalign2 program created by Houjun Liu.

The great advantage of UD over MOR is that it is available for many more languages. It also seems to perform as well as or better than MOR at computing dependency relations on the %gra line. However, its morphological analysis on the %mor line is not yet as good as MOR's. So, for English and a few other languages, we will continue to use MOR for this purpose until we have fully harmonized its codes with UD. For English only, the UD tiers are called %umod and %ugra, leaving the names %mor and %gra for the tiers created by MOR.
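The tier-naming convention just described can be sketched as a small lookup. This is only an illustration and not part of Batchalign2 or any TalkBank tool: the function name ud_tier_names is hypothetical, and it assumes that for every language other than English the UD output is written to the standard %mor and %gra tiers.

```python
from typing import Tuple


def ud_tier_names(lang_code: str) -> Tuple[str, str]:
    """Return the (morphology, grammar) tier names used for UD output.

    Hypothetical helper for illustration only. It assumes the convention
    stated in the text: English ("eng") keeps %mor/%gra for MOR output,
    so its UD tiers get distinct names; all other languages are assumed
    to use the standard tier names for UD output.
    """
    if lang_code.strip().lower() == "eng":
        return ("%umod", "%ugra")
    return ("%mor", "%gra")


if __name__ == "__main__":
    # "spa" is an assumed three-letter code for Spanish, used as an example.
    for code in ("eng", "spa"):
        mor_tier, gra_tier = ud_tier_names(code)
        print(f"{code}: UD morphology on {mor_tier}, UD dependencies on {gra_tier}")
```

Keeping the English exception in a single place like this would make it easy to drop once the MOR and UD codes are harmonized.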

As of March 2024, we have tagged these languages in CHILDES using UD: Afrikaans, Cantonese, Catalan, Croatian, Dutch, Estonian, German, Icelandic, Irish, Italian, Japanese, Korean, Mandarin, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovenian, Spanish, Swedish, Turkish, and Welsh. Once UD grammars become available, we hope to apply UD through Batchalign to languages such as Sesotho and Nungon. Currently, application to Arabic, Bulgarian, Farsi, Greek, Hebrew, Russian, and Tamil is blocked because the transcripts were done in a non-standard romanization that UD does not support. Application to Danish and Hungarian will require extensive cleanup of the transcripts. Users may still wish to rely on the MOR grammars for English and Hebrew and the word segmenter for Chinese.

  • English (eng): This grammar was built by Brian MacWhinney and Mitzi Morris.
  • Hebrew (heb): This grammar was developed by Aviad Albert, Bracha Nir, Shuly Wintner, Brian MacWhinney, and Ruth Berman.