Nasjonalbiblioteket

NST N-gram - Danish News Text

Description

This corpus contains n-grams derived from a 290-million-word corpus of Danish news text from the newspapers Berlingske Tidende, Ekstrabladet and Politiken. The texts cover the period 1995-1999. The corpus was originally developed by Nordic Language Technology (NST) in 1997-2003. The n-grams were generated by Uni Research for the National Library and the Language Bank.

Sequences of one to six words have been generated (i.e., unigrams, bigrams, trigrams, 4-grams, 5-grams and 6-grams) and ordered both by frequency and alphabetically. For convenience, a collection of the 1000 most frequent n-grams of all types listed above is also made available as a separate download.
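To illustrate what the n-gram lists described above contain, here is a minimal Python sketch that counts sequences of one to six words and orders them both by frequency and alphabetically. It is not the procedure used to build this dataset: the whitespace tokenization, the function names (`count_ngrams`, `by_frequency`, `alphabetically`) and the sample sentence are all assumptions made for illustration, and the alphabetical ordering here is plain Unicode code point order rather than Danish collation.

```python
from collections import Counter


def count_ngrams(tokens, max_n=6):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def by_frequency(counter, top=None):
    """Order n-grams by descending count; ties are broken alphabetically."""
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top is None else ranked[:top]


def alphabetically(counter):
    """Order n-grams by their words (Unicode code point order)."""
    return sorted(counter.items(), key=lambda kv: kv[0])


# Hypothetical sample text, split on whitespace (an assumed tokenization).
tokens = "det er en god dag og det er en god nyhed".split()
counts = count_ngrams(tokens)

# The three most frequent bigrams, analogous to a "most frequent" list.
top_bigrams = by_frequency(counts[2], top=3)
```

Calling `by_frequency(counts[n], top=1000)` for each order `n` would give a list comparable in shape to the separate download of the 1000 most frequent n-grams, under the same assumptions.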

Distributions
1

Download
Description:
Not provided
Access URL:
https://hdl.handle.net/21.11146/28
Direct download:
  1. https://www.nb.no/sbfil/dok/ngram_dan.pdf
API:
Not provided
Documentation:
Not provided
License:
Conforms to:
Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012 (Nasjonalbiblioteket)
Public access
SCARRIE Lexicon (Nasjonalbiblioteket)
Public access
ONOMASTICA Pronunciation Lexicon 2 (Nasjonalbiblioteket)
Public access
Texts from Norwegian Wikipedia (Nasjonalbiblioteket)
Public access
N-gram - Norwegian Nynorsk (Nasjonalbiblioteket)
Public access