Nasjonalbiblioteket

NST Pronunciation Lexicon for Danish

  • Datasets
  • Public access 

    Publicly available to everyone. Access may still require registration and an API key request, as long as anyone can request such registration and/or API keys.


  • Open data 

    The dataset is classified as public access and has at least one distribution with an approved open license.

Description

This pronunciation lexicon for Danish was originally produced by Nordic Language Technology (NST) and contains approximately 238,000 entries. The word list consists of a frequency-based 100k list and some additional material.

The lexicon is available as a single file in simple text format. Each entry occupies one line and contains 51 data fields separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the information you need.
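As a rough illustration of how the semicolon-separated format can be processed, the following Python sketch splits each line into its 51 fields and keeps only selected columns. The function names and the column indices in the usage note are hypothetical, since the meaning of each field position is not given here. The sketch also assumes that no field contains a literal semicolon and that no quoting is used.

```python
from typing import Iterable, Iterator

# The description states that every entry has 51 fields.
EXPECTED_FIELDS = 51


def parse_entry(line: str) -> list[str]:
    """Split one lexicon line into its semicolon-separated fields."""
    fields = line.rstrip("\r\n").split(";")
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}"
        )
    return fields


def select_columns(
    lines: Iterable[str], columns: tuple[int, ...]
) -> Iterator[tuple[str, ...]]:
    """Yield only the requested (0-based) columns of each non-blank line."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = parse_entry(line)
        yield tuple(fields[i] for i in columns)
```

To use it on the downloaded file, open the file with the appropriate encoding (check the distribution's documentation, as the encoding is not stated here) and pass the file object to `select_columns`, for example `select_columns(f, (0, 1))`, replacing the placeholder indices with the positions of the fields you need.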

The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions per entry. All transcriptions have been done manually. Some lexical tools for working with the lexicon can be downloaded as a separate zip file.

The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet).

Distributions
1

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket)
Public access
Grapheme-to-Phoneme Models for Norwegian (Nasjonalbiblioteket)
Public access
SCARRIE Lexicon (Nasjonalbiblioteket)
Public access
ONOMASTICA Pronunciation Lexicon (Nasjonalbiblioteket)
Public access
N-grams from NBdigital 2021 (Nasjonalbiblioteket)
Public access