Skip to main content
Nasjonalbiblioteket

Norwegian Conversation Speech Corpus

  • Datasets
  • Public access 

    Publicly available to everyone. Access may still require registration and an API key request, as long as anyone can request such registration and/or API keys.

    Read more about access levels here

  • Open data 

    The dataset is classified as public access and has at least one distribution with an approved open license.

Description

NB Samtale is a speech corpus made by the Language Bank at the National Library of Norway. The corpus contains orthographically transcribed speech from podcasts and recordings of live events at the National Library. The corpus is intended as an open source dataset for Automatic Speech Recognition (ASR) development, and is specifically aimed at improving ASR systems' handle on conversational speech.

The corpus consists of 12,080 segments, a total of 24 hours transcribed speech from 69 speakers. The corpus ensures both gender and dialect variation, and speakers from five broad dialect areas are represented. Both Bokmål and Nynorsk transcriptions are present in the corpus, with Nynorsk making up approximately 25% of the transcriptions.

We greatly appreciate feedback and suggestions for improvements. PLease contact us at sprakbanken@nb.no.

Distributions
1

Nameless distribution
  • zip
Download

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

NST Pronunciation Lexicon for SwedishNasjonalbiblioteket
Public access
Grapheme-to-Phoneme Models for NorwegianNasjonalbiblioteket
Public access
SCARRIE LexiconNasjonalbiblioteket
Public access
ONOMASTICA Pronunciation LexiconNasjonalbiblioteket
Public access
N-grams from NBdigital 2021Nasjonalbiblioteket
Public access