Nasjonalbiblioteket

Discussions from Wikipedia

Description

This corpus is a dump of discussion threads from the Norwegian Wikipedia, where authors discuss various issues regarding the publication of specific Wikipedia articles.

The material is split into two files, one for Norwegian Bokmål (nb.wikipedia.json) and one for Nynorsk (nn.wikipedia.json). Each file is a structured JSON array in which each element represents one discussion. An element is a single-level (flat) object holding the discussion text and its metadata, with eight key/value pairs per discussion:

  • title: title of article under discussion
  • pageid: page (text) identifier
  • revid: revision information
  • wikidata: other data
  • contentcategories: metadata
  • hiddencategories: metadata
  • text: discussion text
  • bytelength: length of text in number of bytes

An example record can be found in the accompanying PDF file (2019_wikidisc.pdf).
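The file layout described above can be read with a few lines of Python. The following is only a minimal sketch: the helper names `load_discussions` and `missing_keys` are made up for illustration, and the sample record is invented, including its value types (for instance, that the category fields are lists and that bytelength counts UTF-8 bytes). Only the eight key names and the top-level JSON array come from the description.

```python
import json
import tempfile
from pathlib import Path

# The eight keys listed in the dataset description.
EXPECTED_KEYS = {
    "title", "pageid", "revid", "wikidata",
    "contentcategories", "hiddencategories", "text", "bytelength",
}


def load_discussions(path):
    """Read one corpus file and return its list of discussion records."""
    with open(path, encoding="utf-8") as handle:
        discussions = json.load(handle)
    if not isinstance(discussions, list):
        raise ValueError("expected a top-level JSON array")
    return discussions


def missing_keys(discussion):
    """Return the documented keys that a record lacks, sorted by name."""
    return sorted(EXPECTED_KEYS - discussion.keys())


# Invented record for illustration only; the value types are assumptions.
sample = [{
    "title": "Eksempel",
    "pageid": 1,
    "revid": 2,
    "wikidata": "",
    "contentcategories": [],
    "hiddencategories": [],
    "text": "Hei på deg",
    "bytelength": len("Hei på deg".encode("utf-8")),
}]

# Write the sample to a temporary file so the sketch runs without the corpus.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.wikipedia.json"
    demo_path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")
    records = load_discussions(demo_path)

for record in records:
    print(record["title"], record["bytelength"], missing_keys(record))
```

To use this on the real corpus, pass the path of the downloaded nb.wikipedia.json or nn.wikipedia.json to `load_discussions` in place of the temporary demo file. Checking `missing_keys` on a few records first is a cheap way to confirm that the files match the description before relying on any particular field.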

Distributions

Download
Description: Not provided
Access URL: https://hdl.handle.net/21.11146/66
Direct download:
API: Not provided
Documentation: Not provided
License:
Conforms to: Not provided

