Nasjonalbiblioteket

Norwegian Newspaper Corpus

Description

The Norwegian Newspaper Corpus was a project at the University of Bergen that crawled news websites to collect articles.

This version of the Norwegian Newspaper Corpus consists of text from 1998 to 2019. The corpus contains approximately 1.68 billion words of Norwegian Bokmål and about 68 million words of Norwegian Nynorsk.

There is also a simplified version of the corpus available (1998-2011), where duplicate sentences have been removed and the sentences are ordered alphabetically.
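
The simplification described above (duplicate removal followed by alphabetical ordering) can be illustrated with a minimal sketch. The function name `simplify` is hypothetical, and the procedure actually used to build the simplified corpus is not documented on this page. In particular, the sketch sorts by Unicode code point, which differs from Norwegian alphabetical order for letters such as æ, ø and å.

```python
def simplify(sentences):
    """Drop exact duplicate sentences and return the rest in sorted order.

    Assumption: duplicates are exact string matches after stripping
    surrounding whitespace; empty lines are discarded.
    """
    unique = {s.strip() for s in sentences if s.strip()}
    # sorted() on str compares Unicode code points, not Norwegian collation.
    return sorted(unique)
```

A locale-aware sort key would be needed to reproduce proper Norwegian ordering; that is left out here to keep the sketch independent of system locale settings.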

The texts from 1998-2011 are collected in a single downloadable file; the remaining data are structured as one file per year. See the documentation files for a description of the content and file formats.
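
As a rough sketch of working with the one-file-per-year layout, the snippet below builds the yearly archive URLs and lists the first entries of a downloaded archive. The URL pattern and the 2012-2019 range are taken from the direct download links on this page. The function names are hypothetical, and the internal layout of the archives is not described here, so the sketch only lists member names rather than parsing them.

```python
import itertools
import tarfile

# Pattern taken from the direct download links listed on this page.
BASE_URL = "https://www.nb.no/sbfil/tekst/nak_{year}.tar"


def yearly_archive_urls(first_year=2012, last_year=2019):
    """Return one archive URL per year, inclusive of both endpoints."""
    return [BASE_URL.format(year=y) for y in range(first_year, last_year + 1)]


def list_members(tar_path, limit=10):
    """Return the names of up to `limit` entries in a local tar archive."""
    with tarfile.open(tar_path) as archive:
        return [m.name for m in itertools.islice(archive, limit)]
```

After downloading one of the archives by any means, `list_members("nak_2012.tar")` gives a quick look at its contents before unpacking the whole file.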

Distributions
1

Download
Description:
Not provided
Access URL:
https://hdl.handle.net/21.11146/4
Direct download:
  1. https://www.nb.no/sbfil/tekst/nak_2018.tar
  2. https://www.nb.no/sbfil/tekst/nak_2015.tar
  3. https://www.nb.no/sbfil/tekst/nak_2016.tar
  4. https://www.nb.no/sbfil/tekst/nak_2013.tar
  5. https://www.nb.no/sbfil/tekst/nak_2014.tar
  6. https://www.nb.no/sbfil/tekst/nak_2012.tar
  7. https://www.nb.no/sbfil/tekst/nak_2019.tar
  8. https://www.nb.no/sbfil/tekst/nak_2017.tar
API:
Not provided
Documentation:
Not provided
License:
Conforms to:
Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012, Nasjonalbiblioteket
Public access
ONOMASTICA Pronunciation Lexicon 2, Nasjonalbiblioteket
Public access
Translation Memories from Semantix AS, Nasjonalbiblioteket
Public access
NST Pronunciation Lexicon for Swedish, Nasjonalbiblioteket
Public access
spaCy for Norwegian Nynorsk, Nasjonalbiblioteket
Public access