Nasjonalbiblioteket

NST Pronunciation Lexicon for Norwegian Bokmål

Description

This pronunciation lexicon for Norwegian Bokmål was originally produced by Nordic Language Technology (NST), and contains approximately 785,000 entries. The word list is based on the 100,000 most frequent word forms in NST's Norwegian text corpus.

The lexicon is available as one large CSV file. Each entry (line) contains 51 fields separated by semicolons. Not all fields are equally relevant for all purposes, but the format makes it easy to extract the relevant information.
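As a minimal sketch of how such a file could be read, the Python snippet below uses the standard csv module with a semicolon delimiter and keeps only entries that have exactly 51 fields. The function name read_entries, the column indices, the file name and the encoding are hypothetical, since the field layout is not described here. Quote handling is switched off on the assumption that fields are not quoted, because SAMPA itself uses the double-quote character as a stress mark.

```python
import csv
import io

EXPECTED_FIELDS = 51  # field count stated in the description


def read_entries(stream, columns):
    """Yield the chosen columns (0-based indices) of each 51-field entry."""
    # QUOTE_NONE: treat '"' as an ordinary character (assumption: unquoted fields).
    reader = csv.reader(stream, delimiter=";", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != EXPECTED_FIELDS:
            continue  # skip malformed lines rather than guess at their layout
        yield tuple(row[i] for i in columns)


# Synthetic placeholder line, only to show the call; not real lexicon data.
sample = ";".join(f"field{i}" for i in range(EXPECTED_FIELDS)) + "\n"
first_and_last = list(read_entries(io.StringIO(sample), columns=[0, 50]))

# With a real file (hypothetical name and encoding):
# with open("lexicon.csv", encoding="utf-8", newline="") as f:
#     for fields in read_entries(f, columns=[0]):
#         ...
```

Reading the file line by line in this way avoids loading all of the roughly 785,000 entries into memory at once.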

The lexicon contains, among other things, information about the decomposition of compounds and one or more phonetic transcriptions. The phonetic transcription was done partly manually, but to a large extent automatically using an inflector. Parts of the output of this process were manually checked afterwards. The inflector and other lexical tools that can be used to process the lexicon can be downloaded as a separate file.

The transcription format is SAMPA (Speech Assessment Methods Phonetic Alphabet). See http://www.phon.ucl.ac.uk/home/sampa/index.html.

A script for converting the SAMPA transcriptions to IPA can be found on GitHub (https://github.com/peresolb/sampa_to_ipa).
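To illustrate what such a conversion involves, here is a minimal Python sketch based on a lookup table and greedy longest-match replacement. It is not the linked script and does not reproduce it. The table holds only a small subset of standard SAMPA symbols, and the function name sampa_to_ipa is hypothetical. A full converter would need the complete symbol inventory used in the lexicon, including any multi-character symbols.

```python
# Illustrative subset of standard SAMPA symbols; not the lexicon's full inventory.
SAMPA_TO_IPA = {
    '"': "ˈ",  # primary stress
    "%": "ˌ",  # secondary stress
    ":": "ː",  # length mark
    "@": "ə",
    "A": "ɑ",
    "N": "ŋ",
    "S": "ʃ",
    "2": "ø",
    "9": "œ",
    "}": "ʉ",
}


def sampa_to_ipa(transcription, table=SAMPA_TO_IPA):
    """Convert a SAMPA string to IPA, trying the longest symbols first."""
    symbols = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(transcription):
        for sym in symbols:
            if transcription.startswith(sym, i):
                out.append(table[sym])
                i += len(sym)
                break
        else:
            # Characters without a table entry are copied through unchanged.
            out.append(transcription[i])
            i += 1
    return "".join(out)


# Illustrative SAMPA string, not quoted from the lexicon.
print(sampa_to_ipa('"h}:s'))  # prints ˈhʉːs
```

Matching the longest symbols first matters once the table contains symbols that are prefixes of others, so that a two-character symbol is not split into two single-character ones.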

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012 (Nasjonalbiblioteket, public access)
ONOMASTICA Pronunciation Lexicon 2 (Nasjonalbiblioteket, public access)
Translation Memories from Semantix AS (Nasjonalbiblioteket, public access)
NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket, public access)
spaCy for Norwegian Nynorsk (Nasjonalbiblioteket, public access)