Nasjonalbiblioteket

NST N-gram - Swedish

  • Datasets
  • Public access 

    Publicly available to everyone. Access may still require registration and an API key request, as long as anyone can request such registration and/or API keys.

  • Open data 

    The dataset is classified as public access and has at least one distribution with an approved open license.

Description

This collection of n-grams (n = 1 to 6) was produced from approximately 400 million words of running text from the Swedish text corpus of Nordic Language Technology AS. It contains all the n-grams, sorted both alphabetically and by frequency. A second format is also available that makes it possible to select text types; that version contains more texts and is based on approximately 437 million words. A simplified version, listing the 1,000 most frequent n-grams, is also available separately.
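As a rough illustration of what n-gram lists "sorted alphabetically and by frequency" mean, the following minimal sketch counts n-grams for n = 1 to 6 over a toy token list and produces both orderings. The function names, the whitespace tokenization, and the sample sentence are assumptions made for this example only; they do not describe how the NST collection was built or how its files are laid out.

```python
from collections import Counter


def count_ngrams(tokens, max_n=6):
    """Count all n-grams of length 1..max_n in a list of tokens."""
    counts = {}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the tokens; an n-gram is a tuple.
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


def sorted_alphabetically(counter):
    # Plain code-point order; real Swedish collation would need a locale.
    return sorted(counter.items(), key=lambda item: item[0])


def sorted_by_frequency(counter, top=None):
    # Most frequent first; ties are broken by code-point order.
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return ranked if top is None else ranked[:top]


# Toy input (hypothetical); naive whitespace tokenization.
tokens = "det är en bil och det är en båt".split()
ngram_counts = count_ngrams(tokens)
bigrams_alpha = sorted_alphabetically(ngram_counts[2])
bigrams_freq = sorted_by_frequency(ngram_counts[2], top=3)

for ngram, freq in bigrams_freq:
    print(" ".join(ngram), freq)
```

The `top` parameter mirrors the idea of a simplified list holding only the most frequent n-grams; passing `top=1000` would keep the 1,000 highest-ranked entries under the same assumptions.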

Distributions
1

Nameless distribution
  • gtar
Description: Not provided
Access URL: https://hdl.handle.net/21.11146/11
Direct download:
API: Not provided
Documentation: Not provided
License:
Conforms to: Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

  • NST Pronunciation Lexicon for Swedish, Nasjonalbiblioteket (Public access)
  • Grapheme-to-Phoneme Models for Norwegian, Nasjonalbiblioteket (Public access)
  • SCARRIE Lexicon, Nasjonalbiblioteket (Public access)
  • ONOMASTICA Pronunciation Lexicon, Nasjonalbiblioteket (Public access)
  • N-grams from NBdigital 2021, Nasjonalbiblioteket (Public access)