Semantic features
25 information content units (ICUs)
The distinct subjects, places, objects and actions depicted in the Cookie Theft image.
The list of information content units (ICUs) for the Cookie Theft picture description task, as established by Yorkston and Beukelman (1980).
The definition of ICUs can be found here
Total number of ICUs
Total number of ICUs that appear in the sample, i.e. the number of ICUs labeled as "TRUE".
Efficiency
Ratio of the total length of the sample to the total number of ICUs present in the sample.
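As a rough illustration of the two ICU summary measures, the sketch below counts the ICUs flagged as present and divides the sample length by that count. It assumes that sample length is measured in words (the unit is not stated above) and that ICU flags arrive as a dictionary of booleans. The function name `icu_summary` and the example keys are hypothetical.

```python
def icu_summary(icu_flags, n_words):
    """Summarize ICU presence for one sample.

    icu_flags: dict mapping an ICU name to True/False (hypothetical layout).
    n_words:   sample length; assumed here to be a word count.
    """
    # Count the ICUs labeled TRUE.
    n_icu_true = sum(1 for present in icu_flags.values() if present)
    # Ratio of sample length to number of ICUs present.
    # Assumption: undefined (None) when no ICU is present.
    efficiency = n_words / n_icu_true if n_icu_true else None
    return {"n_icu_true": n_icu_true, "icu_efficacity": efficiency}


# Hypothetical example: two of three ICUs mentioned in a 30-word sample.
summary = icu_summary(
    {"ICU_boy": True, "ICU_cookie": True, "ICU_sink": False},
    n_words=30,
)
```

With this definition a lower ratio means fewer words were spent per content unit conveyed.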
Idea density
Average semantic similarity between (conceptually distinct) ideas transmitted within a window of words moved through the text.
Within each window moved through the text, the cosine distances (the measure of semantic similarity) between all pairs of word embeddings will be averaged. Word embeddings will be extracted from the spaCy en_core_web_lg model, which also supports syntactic dependency parsing and part-of-speech tagging. Windows of 3, 10, 25 and 40 words will be implemented, each advanced in increments of half the window length.
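The windowed computation can be sketched as follows. This is a minimal illustration, not the actual implementation. It works on plain lists of floats so that it runs without spaCy. It makes three assumptions: the half-window increment is rounded down (with a minimum of 1), a text shorter than the window is treated as a single window, and the per-window averages are themselves averaged into one score. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: zero vectors count as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def windowed_mean_distance(vectors, window):
    """Average pairwise cosine distance per window, then across windows."""
    step = max(1, window // 2)  # assumption: half the window, rounded down
    last_start = max(1, len(vectors) - window + 1)
    window_means = []
    for start in range(0, last_start, step):
        chunk = vectors[start:start + window]
        pairs = list(combinations(chunk, 2))
        if not pairs:
            continue  # fewer than two vectors: nothing to compare
        total = sum(cosine_distance(u, v) for u, v in pairs)
        window_means.append(total / len(pairs))
    if not window_means:
        return None
    return sum(window_means) / len(window_means)


# Toy 2-dimensional "embeddings" that alternate between two directions.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
score = windowed_mean_distance(toy_vectors, window=3)
```

In practice the vectors would come from the spaCy pipeline, for example by collecting `token.vector` for each token of a document processed with `en_core_web_lg`, and the function would be called once per window size (3, 10, 25 and 40).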
Variable names
| Variable name | Description | English | French |
|---|---|---|---|
| idea_density_3 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 3 words moved through the text | yes | yes |
| idea_density_10 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 10 words moved through the text | yes | yes |
| idea_density_25 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 25 words moved through the text | yes | yes |
| idea_density_40 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 40 words moved through the text | yes | yes |
| ICU_{category} | Whether this content unit is mentioned in the text. Only available for the Cookie Theft task in English and French and the picnic task in English | yes | yes |
| n_icu_true | Total number of ICUs that appear in the sample (labeled as 'TRUE') | yes | yes |
| icu_efficacity | Ratio of the total length of the sample to the total number of ICUs present in the sample | yes | yes |