Nasjonalbiblioteket

Tagged Norwegian Bokmål texts from NBdigital

Description

This corpus contains 4,807 morphologically tagged texts in Norwegian Bokmål from the National Library of Norway's corpus of texts in the public domain. All texts have been published after 1960.

The texts were automatically tagged with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/english/index.html), with syntactic disambiguation. In theory, this should give a tagging accuracy of approximately 96.5%. However, the texts were digitized and processed with OCR automatically (with an average word confidence of approximately 90%), so the overall accuracy is probably considerably lower.
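As a rough illustration of why OCR quality pulls the overall accuracy down, the two figures above can be combined in a back-of-envelope estimate. This is only a sketch under strong assumptions that are not stated in the source: that OCR word confidence can be read as the share of correctly recognized words, that OCR errors and tagging errors are independent, and that a misread word never receives a correct tag. The function name is hypothetical.

```python
def estimate_overall_accuracy(tagger_accuracy: float, ocr_word_confidence: float) -> float:
    """Rough joint estimate under the assumptions above: a token counts as
    correct only if OCR read it correctly AND the tagger tagged it correctly."""
    return tagger_accuracy * ocr_word_confidence


# Figures quoted in the description: ~96.5% tagger accuracy, ~90% OCR word confidence.
estimate = estimate_overall_accuracy(0.965, 0.90)
print(f"Rough overall estimate: {estimate:.0%}")
```

The result should be read as an order-of-magnitude indication only; the actual accuracy of the corpus has not been measured here.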

The data is stored as one XML file per text/book, with a simple XML structure. See the documentation file for an example.
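Since the exact element names are only given in the documentation file, a first look at the corpus can be done without assuming any schema. The sketch below uses Python's standard xml.etree.ElementTree to count which element tags occur across the files. The directory name nbdigital_tagged and both function names are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_element_tags(root: ET.Element) -> Counter:
    """Count how often each element tag occurs under (and including) root."""
    return Counter(element.tag for element in root.iter())


def summarize_corpus(corpus_dir: str) -> Counter:
    """Sum the tag counts over every .xml file found directly in corpus_dir."""
    totals = Counter()
    for xml_path in sorted(Path(corpus_dir).glob("*.xml")):
        # ET.parse reads the encoding from the XML declaration, if present.
        totals += count_element_tags(ET.parse(xml_path).getroot())
    return totals


# Example call (the path is an assumption, not part of the dataset):
# print(summarize_corpus("nbdigital_tagged").most_common(10))
```

Inspecting the most frequent tags this way gives a quick picture of the structure before writing a parser against the element names in the documentation file.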

Distributions
1

Download
Description: Not provided
Access URL: https://hdl.handle.net/21.11146/43
Direct download:
API: Not provided
Documentation: Not provided
License:
Conforms to: Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012 (Nasjonalbiblioteket)
Public access
Translation Memories from Semantix AS (Nasjonalbiblioteket)
Public access
NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket)
Public access
Grapheme-to-Phoneme Models for Norwegian (Nasjonalbiblioteket)
Public access
spaCy for Norwegian Nynorsk (Nasjonalbiblioteket)
Public access