# speechmetryflow
Automated Nextflow-based workflow designed to extract both audio and text metrics from speech tasks (such as picture descriptions) at scale.
## Running
```bash
nextflow run lingualab/speechmetryflow -r {last_release_or_tag} --input participant_ids.csv
```

Replace the `-r` option value with the release or tag you want to use.
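If you are unsure which release to pin, Nextflow can list the revisions (branches and tags) of a pulled pipeline; the tag `v1.0.0` below is only a hypothetical example:

```bash
# Download (or update) the pipeline, then list its available revisions and tags
nextflow pull lingualab/speechmetryflow
nextflow info lingualab/speechmetryflow

# Pin the run to a specific tag (v1.0.0 is a placeholder)
nextflow run lingualab/speechmetryflow -r v1.0.0 --input participant_ids.csv
```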
## Files needed
### participant_ids.csv
This CSV file must contain at least 4 columns:
- `participant_id`: required for the pipeline to find your files. The file names must begin with the `participant_id`. To specify the folder where your files are located, see `nextflow.config`.
- `language`: 2 choices, `en` or `fr`.
- `sex`: 2 choices, `male` or `female`.
- `task`: 2 choices, `cookie_theft` or `picnic`.
Example:
| participant_id | language | sex | task |
|---|---|---|---|
| sub-PKM8767 | en | male | cookie_theft |
| sub-SBK4467 | en | female | picnic |
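On disk this corresponds to a plain comma-separated file with a header row; a minimal sketch of the same two participants, assuming commas as the delimiter:

```csv
participant_id,language,sex,task
sub-PKM8767,en,male,cookie_theft
sub-SBK4467,en,female,picnic
```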
### nextflow.config
Example for the elm server:
```groovy
params {
    audio_folder = "/data/brambati/dataset/CCNA/derivatives/audio_extract"
    text_folder = "/data/brambati/dataset/CCNA/derivatives/cookie_txt"
}
```
And then run:
```bash
nextflow run lingualab/speechmetryflow -r {last_release_or_tag} -profile unf_elm --input participant_ids.csv
```
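If you are not running on the elm server, the same two folders can point anywhere on your system. Assuming `audio_folder` and `text_folder` are regular pipeline params (as in the config block above), Nextflow also lets you override them directly on the command line; the paths below are placeholders:

```bash
nextflow run lingualab/speechmetryflow -r {last_release_or_tag} \
    --input participant_ids.csv \
    --audio_folder /path/to/your/audio_files \
    --text_folder /path/to/your/transcripts
```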
## Output
The pipeline produces 3 CSV files:

- `population_lingualab_audio`: metrics computed with `lingualabpy_lingualab_audio` from `lingualabpy`
- `population_uhmometer_metrics`: metrics computed with uhm-o-meter
- `population_lingualab_text`: metrics computed with Text2Variable