Nasjonalbiblioteket

Public Domain Texts from NBdigital

Description

This corpus consists of public domain texts from the National Library's online collection. It contains 26,344 books and other written material by 10,756 different authors. The author count includes, for example, public institutions credited for publicly available material.

The material can be downloaded as compressed tar files containing the texts in two formats: HTML and plain text without any markup. Both formats use UTF-8 character encoding.
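As a minimal sketch of how the plain-text files in such an archive might be read, the Python function below walks a compressed tar file and decodes each matching member as UTF-8. The function name iter_plain_texts and the ".txt" suffix are assumptions made for illustration; the actual file names and directory layout inside the archives are not described here and should be checked after downloading.

```python
import tarfile
from typing import Iterator, Tuple


def iter_plain_texts(archive_path: str, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (member name, decoded text) for each matching file in a tar archive.

    The default suffix ".txt" is an assumption about how the plain-text
    files are named; adjust it to match the real archive contents.
    """
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:
            # Skip directories, links, and files in the other format.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # The corpus is stated to be UTF-8; errors="replace" keeps the
            # loop running if an individual file contains stray bytes.
            yield member.name, handle.read().decode("utf-8", errors="replace")
```

Reading members one at a time avoids unpacking the whole archive to disk, which may be convenient for a corpus of this size. Passing suffix=".html" would select the HTML versions instead, again assuming that is how they are named.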

The quality of the texts varies depending on the quality of the OCR. In addition to texts in Norwegian (Bokmål and Nynorsk), the collection contains texts in several other languages.

Distributions (1)

Download
Description: Not provided
Access URL: https://hdl.handle.net/21.11146/34
Direct download:
API: Not provided
Documentation: Not provided
License:
Conforms to: Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

Norsk Ordbank - Norwegian Nynorsk 2005-2012 (Nasjonalbiblioteket), public access
ONOMASTICA Pronunciation Lexicon 2 (Nasjonalbiblioteket), public access
Translation Memories from Semantix AS (Nasjonalbiblioteket), public access
NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket), public access
spaCy for Norwegian Nynorsk (Nasjonalbiblioteket), public access