Syntactic features
Universal syntactic dependencies
Universal syntactic dependencies are a set of rules that model grammatical relationships in languages. They are characterized by a hierarchical structure in which words are connected according to their syntactic function in a sentence. These rules are called "universal" because they are intended to be applicable across different languages, providing a common framework for linguistic analysis.
The directional dependency relation is a specification of dependency grammar that establishes a connection between a syntactic unit (e.g. a verb) and the entities that make up its relational structure (such as subjects and objects). In a dependency tree, which is a graphical representation of these relationships, the words or morphemes are the nodes and the dependency relationships are the edges, often annotated with syntactic functions such as subject object, etc. https://fr.wikipedia.org/wiki/Grammaire_de_dépendance
Total number of each syntactic dependency: This means counting how many times each type of dependency relation (such as subject, object, complement, etc.) appears in a text.
Calculation with spaCy dependencies (DEP): spaCy is able to analyze a sentence and identify these dependency relationships. Each word in a sentence is associated with a DEP tag that describes its syntactic role.
Two calculation methods: - In absolute number: Count the total number of times a specific syntactic dependency appears. - In relation to the total number of words: Calculate the frequency of each syntactic dependency in relation to the total number of words in the sample, giving a relative measure.
Example
To illustrate the directional dependency relationship in syntax, let's consider the simple sentence: "The cat eats a mouse."
In a dependency tree for this sentence :
- "eats" would be the root because it's the verb, the main action of the sentence.
- "The cat" would be an actant, more precisely the subject of the verb "eats". There would therefore be a directional arrow starting from "mange" and pointing towards "chat", indicating that "chat" is the subject of "mange".
- A mouse" would be another actant, the direct object of the verb "mange". Similarly, a directional arrow would run from "mange" to "souris" to indicate this relationship.
In this tree, each word is connected by lines (or edges) that show how each word depends on the verb (or other words) for its syntactic function in the sentence.
References
Length of syntax dependencies
Average and maximum length of syntactic dependencies. Average and maximum number of words in a sample's syntactic dependencies.
Left and right children
Direct dependents of a word that are connected to it by a single arc to its left or right in the dependency tree.
We measure the average number of left and right children for each word in a sample of texts. This helps us to understand the syntactic structure of the sentences in this sample.
Calculated using spaCy commands n_left
and n_right
in two ways:
- as an absolute number
- in relation to the total number of words in the sample
https://spacy.io/usage/linguistic-features#navigating |
Verbs with inflections (conjugated verbs)
Verbs in the sample that do not match their lemma as extracted by spaCy.
Calculated in two ways:
- in absolute number
- in relation to the total number of words in the sample
Subordinate clauses
Group of words that does not express a complete thought, does not constitute a complete sentence. Complex clauses involving subordination occur when a syntactic dependent (main or not) is used as a causal structure.
Total number of the 4 basic universal dependency types calculated using spaCy's default dependency parse:
- Clausal subjects (csubj)
- Clausal complements divided into those whose subject must be checked (subject outside the clause;
xcomp
) and those whose subject is not checked (subject inside the clause;ccomp
). - Adverbial clause modifiers (
advcl
) - Adnominal clause modifiers (
acl
)
Calculated in two ways:
- as an absolute number
- in relation to the total number of words in the sample
https://universaldependencies.org/u/overview/complex-syntax.html
Average sentence length
Average number of words per sentence. The average number of words per sentence in the sample will be calculated. Sentence boundaries will be determined by spaCy's default "dependency parse".
https://spacy.io/usage/linguistic-features#sbd
Incomplete sentences
Phrases that do not contain a minimum of one verb and its subject. Total number of sentences in the sample that contain no verb with its subject.
Calculated in two ways:
- in absolute numbers
- in relation to the total number of words in the sample
Could indicate: lexical-semantic deficits, syntactic deficits, difficulties with discourse planning (Boschi et al. 2017).
Number of prepositional phrases
Sentences that contain a preposition its object (noun or pronoun) and any object modifier, (Boschi et al. 2017).
Calculated in the following two ways:
- in absolute number
- in relation to the total number of words in the sample
Number of verbal phrases
Basic sentences containing at least one verb and its dependents. Calculated using basic spaCy implementations.
Calculated in two ways:
- absolute number
- in relation to the total number of words in the sample
Length and number of noun phrases
A nominal phrase is a group of words centered around a noun (substantive) that functions as the subject, object or complement in a sentence. For example, in the sentence "The black cat sleeps on the carpet", "The black cat" is a noun phrase.
The length of a nominal phrase is the number of words that make it up. It can vary from two words like "A house" to a longer sequence like "The big house by the road".
Total number and average length of noun phrases in the sample. Calculated using basic spaCy implementations.
Calculated in two ways:
- absolute number
- in relation to the total number of words in the sample
https://spacy.io/usage/linguistic-features#noun-chunks
Verb tenses used
Forms verbs take to indicate when the action takes place in time. Total number of verbs conjugated in the present, past and future tenses in the sample.
Calculated in two ways:
- in absolute numbers
- in relation to the total number of words in the sample
Clauses per sentence
Groups of words comprising a subject and a verb normally used to add further details about a noun in a sentence. Average number of clauses per sentence calculated using basic spaCy implementations.
Proportion of nouns with determiners
Proportion of names for which a determiner is present. Number of names in the sample attached to a determiner out of the total number of names in the sample. Calculated using spaCy's dependency parse.
Coordinated phrases
Phrases linked by one or more coordinating conjunctions. Total number of sentences in the sample containing the following coordinating conjunctions: "and", "but", "for", "nor", "or", "yet", "so" (Boschi et al. 2017).
Calculated in the following two ways:
- in absolute numbers
- in relation to the total number of words in the sample
Variable names
Longueur_moyenne_des_dependances
Longueur_maximale_des_dependances
Moyenne_enfants_gauches
Moyenne_enfants_droits
Total_enfants_gauches
Total_enfants_droits
Nombre_de_verbes_inflexion
Verbe_inflection_relatif
Sujets_Clausaux_absolu
Sujets_Clausaux_relatif
Complements_Clausaux_Controles_absolu
Complements_Clausaux_Controles_relatif
Complements_Clausaux_Non_Controles_absolu
Complements_Clausaux_Non_Controles_relatif
Modificateurs_Clauses_Adverbiaux_absolu
Modificateurs_Clauses_Adverbiaux_relatif
Modificateurs_Clauses_Adnominaux_absolu
Modificateurs_Clauses_Adnominaux_relatif
Longueur_moyenne_phrases
Nombre_de_phrases_incompletes_absolu
Nombre_de_phrases_incompletes_relatif
Nombre_de_phrases_prepositionnelles_absolu
Nombre_de_phrases_prepositionnelles_relatif
Nombre_de_phrases_verbales_absolu
Nombre_de_phrases_verbales_relatif
Nombre_absolu_phrases_nominales
Longueur_moyenne_phrases_nominales
Frequence_relative_phrases_nominales
Nbre_verb_present_absolu
Nbre_verb_present_relatif
Nbre_verb_past_absolu
Nbre_verb_past_relatif
Nbre_verb_future_absolu
Nbre_verb_future_relatif
Nbre_clauses_par_phrase
Proportion_noms_determinants
Nombre_de_phrases_coordonnees
Frequence_relative_phrases_coordonnees