
Semantic features

25 information content units (ICUs)

Distinct subjects, places, objects, and actions depicted in the Cookie Theft picture.

The ICUs are the content units for the Cookie Theft picture description task as established by Yorkston and Beukelman (1980).

The definition of ICUs can be found here

Total number of ICUs

Total number of ICUs that appear in the sample, i.e., the number of ICUs labeled "TRUE".
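As a minimal sketch, assuming the per-ICU results are available as a mapping from each `ICU_{category}` variable to a boolean, the total is simply the number of TRUE flags. The function name `count_icus` and the category names in the example are hypothetical.

```python
from typing import Mapping


def count_icus(icu_labels: Mapping[str, bool]) -> int:
    """n_icu_true: count the ICU_{category} flags labeled TRUE."""
    return sum(1 for present in icu_labels.values() if present)


# Hypothetical category names, for illustration only.
labels = {"ICU_boy": True, "ICU_girl": False, "ICU_cookie": True}
n_icu_true = count_icus(labels)
```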

Efficiency

Ratio of the total length of the sample to the total number of ICUs present in the sample.
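A minimal sketch of the ratio, assuming (not stated in the source) that sample length is counted in whitespace-delimited words and that the ratio is left undefined (`None`) when no ICU is mentioned. The function name `icu_efficiency` is hypothetical; the corresponding output variable is `icu_efficacity`.

```python
from typing import Optional


def icu_efficiency(text: str, n_icu_true: int) -> Optional[float]:
    """Sample length divided by the number of ICUs present."""
    n_words = len(text.split())  # assumption: length = whitespace-delimited words
    if n_icu_true == 0:
        return None  # assumption: ratio undefined when no ICU is mentioned
    return n_words / n_icu_true
```

With this definition, a lower value means fewer words are needed per content unit conveyed.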

Idea density

Average semantic similarity between (conceptually distinct) ideas transmitted within a window of words moved through the text.

The average cosine similarity between all pairs of word embeddings within a window moved through the text. Word embeddings are extracted from the spaCy en_core_web_lg model, which also supports syntactic dependency parsing and part-of-speech tagging. Within each window, the cosine similarities of all word pairs are averaged. Windows of 3, 10, 25, and 40 words are used, each advanced by a step of half the window length.
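A minimal sketch of the windowed computation, in pure Python so it runs without spaCy installed. Several details are assumptions not stated in the source: the score for a text is the mean of the per-window averages; the step for odd window sizes is rounded down (`size // 2`, so 1 for the 3-word window); only full windows are scored, and a text shorter than the window is scored as a single window; words with all-zero vectors (such as out-of-vocabulary tokens) are skipped. The helper names are hypothetical. The leading comment shows how the vectors could be obtained from spaCy's `Token.vector`.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence

Vector = Sequence[float]

# Vectors could come from spaCy, e.g. (not executed here):
#   import spacy
#   nlp = spacy.load("en_core_web_lg")
#   vectors = [tok.vector for tok in nlp(text) if not tok.is_punct]


def cosine_similarity(u: Vector, v: Vector) -> Optional[float]:
    """Cosine similarity of two vectors; None if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None  # assumption: skip words without a usable vector
    return dot / (norm_u * norm_v)


def mean(values: List[float]) -> Optional[float]:
    return sum(values) / len(values) if values else None


def window_similarity(window: Sequence[Vector]) -> Optional[float]:
    """Average cosine similarity over all word pairs in one window."""
    sims = []
    for u, v in combinations(window, 2):
        s = cosine_similarity(u, v)
        if s is not None:
            sims.append(s)
    return mean(sims)


def idea_density(vectors: Sequence[Vector], size: int) -> Optional[float]:
    """Slide a `size`-word window with a step of half its length and
    average the per-window scores (assumption: mean over windows)."""
    step = max(1, size // 2)  # assumption: 3 -> 1, 10 -> 5, 25 -> 12, 40 -> 20
    if len(vectors) <= size:
        starts = [0]  # assumption: a short text is scored as one window
    else:
        starts = range(0, len(vectors) - size + 1, step)  # full windows only
    scores = []
    for start in starts:
        score = window_similarity(vectors[start:start + size])
        if score is not None:
            scores.append(score)
    return mean(scores)


def idea_density_features(vectors: Sequence[Vector]) -> dict:
    """Compute idea_density_3, _10, _25 and _40 for one text."""
    return {f"idea_density_{n}": idea_density(vectors, n) for n in (3, 10, 25, 40)}
```

Because the pairwise average is taken inside each window, small windows mainly capture local coherence between neighboring words, while the 25- and 40-word windows reflect similarity across longer stretches of the description.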

Variable names

| Variable name | Description | English | French |
|---|---|---|---|
| idea_density_3 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 3 words moved through the text | yes | yes |
| idea_density_10 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 10 words moved through the text | yes | yes |
| idea_density_25 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 25 words moved through the text | yes | yes |
| idea_density_40 | Average semantic similarity between (conceptually distinct) ideas transmitted within a window of 40 words moved through the text | yes | yes |
| ICU_{category} | Whether this content unit is mentioned in the text. Only available for the Cookie Theft task (English and French) and the Picnic task (English) | yes | yes |
| n_icu_true | Total number of ICUs that appear in the sample (labeled "TRUE") | yes | yes |
| icu_efficacity | Ratio of the total length of the sample to the total number of ICUs present in the sample | yes | yes |