TalkBank Contributing

We expect that researchers will contribute corpora constructed with TalkBank programs and tools. It is the obligation of TalkBank and TalkBank users to ensure that these contributions are properly acknowledged and cited and that the data are correctly stored and distributed.

To contribute a new data set:

  1. First, please send an email message to me (Brian MacWhinney) at macw@cmu.edu describing your contribution. For FluencyBank contributions, please write to both Brian MacWhinney and Nan Bernstein Ratner (nratner@umd.edu).
  2. Additional instructions that are specific to HomeBank are here.
  3. Additional instructions that are specific to PsychosisBank are here.
  4. Make sure that your contribution is in accord with your IRB Regulations and your desired Contribution Options.
  5. Next, you should send us your media files. Video files should be in .mp4 format, and audio files should be in .wav or .mp3 format, but we can convert from other formats. Please use names for your video files that are as short as possible, with nothing more than the code name for the participant, the session number, and, for children, the age in the format YYMMDD. To transfer media, you can follow these steps:
    • Connect to https://talkbank.wetransfer.com
    • You will have to check a box saying that you agree to their terms and conditions. You will only have to do this once.
    • In the box on the left, enter macw@cmu.edu as "email to".
    • Enter your email as "your email" and add a message, if you wish.
    • Then click the "Plus" icon in the upper right.
    • Drag the files you wish to transfer into the "Add Your Files" window or click the plus to locate them.
    • Click the "Transfer" button and wait for the transfer to finish. You can do other work on your computer during this time.
    • The WeTransferPlus facility then sends us an email advising us when the transfer is complete. Please use only WeTransfer and not Dropbox, Box, or Google Drive for file transfer.
  6. If we have agreed that we will use ASR to create initial versions of transcripts from your media, we will run this process and send the resulting transcripts back to you for approval and corrections. In this case, you can skip the next three steps.
  7. If you already have CHAT files, please make sure that your CHAT files pass CLAN's CHECK program. If your data are not in CHAT or do not pass CHECK, they can still be contributed, but we will need to work to bring them into CHAT format.
  8. TalkBank uses a strict system for matching transcripts to media. This requires that each transcript align with only one media file and that the names of the transcript file and the media file be the same (ignoring the extensions). For example, the file 020456.cha must have a matching 020456.mp4 (or .wav or .mp3) media file. In addition, the @Media line in the *.cha file should use the name of the media which matches the name of the transcript. In general, please try to use short file names to make processing easier. Information already provided in folder names and the @ID lines does not need to be duplicated in file names.
  9. Please combine your transcripts into a single .zip file and send that file as an email attachment to macw@cmu.edu.
  10. Please send us the information needed to create a corpus web page, such as this one. You can use this template to create that HTML page. You just need to replace the various XXX fields with the necessary information.
  11. Along with the documentation, please complete this contribution form, scan it, and send us the scan and documentation either through WeTransfer or as an email attachment to macw@cmu.edu.
  12. Once everything is in place, we will create a webpage for your corpus, like this one, along with a DOI number, and we will announce the addition of the new corpus to the Info-CHILDES list or the AphasiaBank list.
  13. If you need to cite your corpus, you can use the format in this example, as suggested by the APA manual: Bernstein Ratner, Nan. (1988). Bernstein Ratner Corpus. (data file) Retrieved from https://childes.talkbank.org doi:10.21415/T5CC7X.
  14. We are very thankful for the kindness and collegiality you are showing in contributing your hard-won data.

Guidelines for corpus documentation are given in section 4.5 of the CHAT manual.
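
Before sending files, it may help to check the transcript-to-media matching rule from step 8 on your own machine. The following is a minimal sketch, not a TalkBank tool: the function names and the two folder arguments are hypothetical, and it assumes that the media name is the first comma-separated value on the @Media line. It does not replace CLAN's CHECK program.

```python
from pathlib import Path

# Media extensions named in the instructions above.
MEDIA_EXTENSIONS = {".mp4", ".wav", ".mp3"}


def media_header_name(cha_path):
    """Return the media name from the first @Media line, or None if absent.

    Assumption: the name is the first comma-separated value on that line.
    """
    with open(cha_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("@Media:"):
                value = line[len("@Media:"):].strip()
                return value.split(",")[0].strip()
    return None


def check_contribution(transcript_dir, media_dir):
    """Return a list of human-readable problems (empty if none are found)."""
    media_files = [
        path for path in Path(media_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS
    ]
    problems = []
    for cha in sorted(Path(transcript_dir).rglob("*.cha")):
        # The transcript and media names must match, ignoring extensions.
        matches = [path for path in media_files if path.stem == cha.stem]
        if not matches:
            problems.append(f"{cha.name}: no matching media file")
        elif len(matches) > 1:
            names = ", ".join(sorted(path.name for path in matches))
            problems.append(f"{cha.name}: more than one media file ({names})")

        # The @Media line should use the same name as the transcript.
        header = media_header_name(cha)
        if header is None:
            problems.append(f"{cha.name}: no @Media line found")
        elif header != cha.stem:
            problems.append(
                f"{cha.name}: @Media names '{header}', expected '{cha.stem}'"
            )
    return problems
```

Calling check_contribution with the folder that holds your .cha files and the folder that holds your media (they may be the same folder) returns a list of messages that you can print and work through; an empty list means that the sketch found no mismatches.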