Nasjonalbiblioteket

NST N-gram - Norwegian Bokmål

  • Datasets
  • Public access 

    Publicly available to everyone. Access may still require registration or an API key, provided that anyone can request them.

  • Open data 

    The dataset is classified as public access and has at least one distribution with an approved open license.

Description

These n-grams are derived from parts of the Text Corpus from Nordic Language Technology AS (NST). The source material consists of 510 million words of running text.

The n-grams are also available as an overview listing only the 1000 most frequent n-grams (n=1-6).

The full version contains all derived n-grams (n=1-6), provided both sorted alphabetically and sorted by frequency. Frequency lists (unigrams) are also available separately.
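
To illustrate what the description means by n-grams (n=1-6) sorted alphabetically and by frequency, here is a minimal sketch in Python. It is not the procedure NST or Nasjonalbiblioteket used: the whitespace tokenization, the function names, and the toy sentence are all assumptions made for illustration, and the actual file layout inside the archive is not described on this page.

```python
from collections import Counter


def count_ngrams(tokens, max_n=6):
    """Count every n-gram of length 1..max_n in a token sequence."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the tokens.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def sorted_alphabetically(counter):
    """Return (ngram, count) pairs ordered by the n-gram itself."""
    return sorted(counter.items(), key=lambda item: item[0])


def sorted_by_frequency(counter, top_k=None):
    """Return (ngram, count) pairs, most frequent first.

    Ties are broken alphabetically so the order is deterministic.
    top_k=1000 mimics the "1000 most frequent" overview listing.
    """
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top_k is None else ranked[:top_k]


# Toy text, not taken from the NST corpus.
text = "det er en bok og det er en avis"
tokens = text.split()  # assumption: plain whitespace tokenization

counts = count_ngrams(tokens)
bigrams_by_freq = sorted_by_frequency(counts[2], top_k=1000)
bigrams_alpha = sorted_alphabetically(counts[2])

for ngram, freq in bigrams_by_freq:
    print(freq, " ".join(ngram))
```

For a corpus on the scale of 510 million words, holding all counts in memory like this would likely be impractical; the sketch only shows the relationship between the full sorted lists and the top-1000 overview.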

Distributions (1)

Nameless distribution
  • gtar
  • Description: Not provided
  • Access URL: https://hdl.handle.net/21.11146/3
  • Direct download:
  • API: Not provided
  • Documentation: Not provided
  • License:
  • Conforms to: Not provided

APIs providing this dataset (0)

No registered APIs provide this dataset.

Similar datasets

  • NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket): Public access
  • Grapheme-to-Phoneme Models for Norwegian (Nasjonalbiblioteket): Public access
  • SCARRIE Lexicon (Nasjonalbiblioteket): Public access
  • ONOMASTICA Pronunciation Lexicon (Nasjonalbiblioteket): Public access
  • N-grams from NBdigital 2021 (Nasjonalbiblioteket): Public access