
Tagged Norwegian Bokmål texts from NBdigital

Description

This corpus contains 4,807 morphologically tagged texts in Norwegian Bokmål, drawn from the National Library of Norway's corpus of public-domain texts. All texts were published after 1960.

The texts were tagged automatically with the Oslo-Bergen tagger (see http://www.tekstlab.uio.no/obt-ny/english/index.html), using syntactic disambiguation. In theory, this should give a tagging accuracy of approximately 96.5%. However, the texts were digitized and OCR-read automatically (with an average word confidence of approximately 90%), so the overall accuracy is probably considerably lower.
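As a rough back-of-the-envelope illustration of why the overall accuracy is likely lower, the two figures above can be combined. This is only a sketch under strong assumptions that the description does not state: that the average OCR word confidence approximates the share of correctly recognised words, that a misread word is never tagged correctly, and that the tagger keeps its nominal accuracy on the correctly read words. The function name is hypothetical.

```python
def rough_combined_accuracy(tagger_accuracy: float, ocr_word_confidence: float) -> float:
    """Crude estimate of overall tagging accuracy on OCR-read text.

    Assumption (not from the corpus description): a token counts as
    correctly tagged only if it was read correctly by OCR AND tagged
    correctly, and the two events are treated as independent.
    """
    return tagger_accuracy * ocr_word_confidence


# Figures quoted in the description: 96.5% tagger accuracy, 90% word confidence.
estimate = rough_combined_accuracy(0.965, 0.90)
print(f"Rough estimate: {estimate:.3f}")
```

Under these assumptions the estimate comes out at roughly 87%. The real figure could be higher or lower, since OCR errors in neighbouring words can also mislead the tagger's disambiguation.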

The data is stored as one XML file per text/book, with a simple XML structure. See the documentation file for an example.
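Since the element names are defined only in the documentation file, a first look at the corpus can be taken without assuming any of them. The sketch below, using only the Python standard library, counts how often each element name occurs across all XML files in a directory. The function names and the directory layout (a flat folder of `*.xml` files) are assumptions for illustration, not part of the dataset description.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def tag_inventory(xml_path) -> Counter:
    """Count how often each element name occurs in one XML file."""
    counts = Counter()
    # iterparse streams the file, so large books need not fit in memory.
    for _event, elem in ET.iterparse(str(xml_path), events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # release the element's content once it has been counted
    return counts


def corpus_inventory(corpus_dir) -> Counter:
    """Sum the element counts over every *.xml file in corpus_dir.

    Assumption: the unpacked corpus is a flat directory of XML files.
    """
    total = Counter()
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        total += tag_inventory(path)
    return total
```

Calling `corpus_inventory("path/to/unpacked/corpus").most_common()` (with a path of your own) lists the element names by frequency, which can then be checked against the example in the documentation file before writing a more specific parser.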

Distributions
1

Download
Description:
Not provided
Access URL:
https://hdl.handle.net/21.11146/43
Direct download:
API:
Not provided
Documentation:
Not provided
License:
Conforms to:
Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.