In addition to the word and character frequency measures (i.e., the number of times a word or a character occurs in the corpus), we also calculated the contextual diversity (CD) measure for the words and the characters. CD is defined as the number of films in which the word or character appears. Extensive analyses by Adelman et al. [16] suggest that CD is a (slightly) more informative measure than frequency, a finding confirmed by Brysbaert and New [12] and Keuleers et al. [13]. We did not calculate the CD measure for the PoS-dependent frequencies because, to our knowledge, this information has not yet been needed.
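
To make the distinction between the two measures concrete, the following minimal sketch computes both from a toy corpus. It is an illustration only and not the pipeline used for the present database: the function name `frequency_and_cd`, the film identifiers, and the pre-tokenized input format are all assumptions introduced here.

```python
from collections import Counter


def frequency_and_cd(films):
    """Return (frequency, contextual diversity) counters.

    `films` is a hypothetical mapping from a film identifier to the
    list of tokens (words or characters) found in its subtitles.
    """
    freq = Counter()  # total occurrences across the whole corpus
    cd = Counter()    # number of films containing the token at least once
    for tokens in films.values():
        freq.update(tokens)
        cd.update(set(tokens))  # each film contributes at most 1 per token
    return freq, cd


# Toy corpus (invented for illustration)
films = {
    "film_a": ["the", "cat", "the", "dog"],
    "film_b": ["the", "bird"],
    "film_c": ["cat", "cat", "cat"],
}

freq, cd = frequency_and_cd(films)
print(freq["the"], cd["the"])  # 3 2
print(freq["cat"], cd["cat"])  # 4 2
```

In this toy example "cat" has a higher frequency than "the" (4 versus 3) but the same CD (2), because most of its occurrences are concentrated in a single film.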