Nasjonalbiblioteket

N-gram - Norwegian Bokmål

Description

These n-grams (n = 1 to 6) were built from the texts in the Norwegian Newspaper Corpus and the news texts in the text corpus from Nordic Language Technology AS (NST). In total, the source material comprises 1,175 million words of running text.

The n-grams are provided in two orderings, one sorted alphabetically and one sorted by frequency. Frequency lists (unigrams) are published as a separate download. A simplified version listing the 1000 most frequent n-grams is also available for download.
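As an illustration of what lists like these contain, the following is a minimal sketch of counting n-grams for n = 1 to 6 and producing the alphabetical, frequency-sorted, and top-k views described above. It is not the procedure used to build this dataset: the whitespace tokenization, the toy sentence, the tie-breaking rule, and the names `count_ngrams`, `alphabetical`, `by_frequency`, and `top_k` are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(tokens, max_n=6):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy input; splitting on whitespace is an assumption standing in for
# whatever tokenization the real dataset used.
tokens = "det er en fin dag og det er en god dag".split()
counts = count_ngrams(tokens)

# Select the bigrams and build the two orderings.
bigrams = {gram: c for gram, c in counts.items() if len(gram) == 2}
alphabetical = sorted(bigrams.items())
# Highest count first; ties broken alphabetically (an assumed convention).
by_frequency = sorted(bigrams.items(), key=lambda item: (-item[1], item[0]))

# A "simplified" list keeps only the k most frequent entries.
top_k = by_frequency[:3]

for gram, c in top_k:
    print(" ".join(gram), c)
```

The same selection can be repeated for any n from 1 to 6 by changing the length filter. For a corpus on the scale of the one described here, an in-memory `Counter` would likely be too small, so a real pipeline would need some form of on-disk or streamed counting.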

Distributions (1)

Download
Description:
Not provided
Access URL:
https://hdl.handle.net/21.11146/12
Direct download:
  1. https://www.nb.no/sbfil/dok/ngram_nob.pdf
API:
Not provided
Documentation:
Not provided
License:
Conforms to:
Not provided

APIs providing this dataset (0)

No registered APIs provide this dataset.

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012
Nasjonalbiblioteket
Public access

ONOMASTICA Pronunciation Lexicon 2
Nasjonalbiblioteket
Public access

Translation Memories from Semantix AS
Nasjonalbiblioteket
Public access

NST Pronunciation Lexicon for Swedish
Nasjonalbiblioteket
Public access

spaCy for Norwegian Nynorsk
Nasjonalbiblioteket
Public access