Nasjonalbiblioteket

Discussions from Wikipedia

Description

This corpus is a dump of discussion threads from the Norwegian Wikipedia, where authors discuss various issues regarding the publication of specific Wikipedia articles.

The material is split into two files, one for Norwegian Bokmål (nb.wikipedia.json) and one for Nynorsk (nn.wikipedia.json). Each file is a structured JSON array in which each element is one discussion. Each element has a single level of keys holding the text and its metadata. There are eight key/value pairs per discussion:

  • title: title of article under discussion
  • pageid: text identifier
  • revid: audit information
  • wikidata: other data
  • contentcategories: metadata
  • hiddencategories: metadata
  • text: discussion text
  • bytelength: length of text in number of bytes

An example of a discussion record can be found in the PDF file (2019_wikidisc.pdf).
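
As a rough illustration of the structure described above, the following Python sketch reads one of the corpus files and checks a record against the eight documented keys. The helper names `load_discussions` and `missing_keys` are hypothetical, and the code assumes only what the description states: a top-level JSON array whose elements are single-level objects. The value types of the individual fields are not specified here, so the sketch does not rely on them.

```python
import json
from pathlib import Path

# The eight keys listed in the dataset description.
EXPECTED_KEYS = {
    "title", "pageid", "revid", "wikidata",
    "contentcategories", "hiddencategories", "text", "bytelength",
}


def load_discussions(path):
    """Read one corpus file and return its discussions as a list.

    Hypothetical helper: assumes the file is a UTF-8 encoded JSON array.
    """
    with Path(path).open(encoding="utf-8") as handle:
        discussions = json.load(handle)
    if not isinstance(discussions, list):
        raise ValueError("expected a top-level JSON array")
    return discussions


def missing_keys(discussion):
    """Return the documented keys that a discussion record lacks."""
    return EXPECTED_KEYS - set(discussion)


# Usage: runs only if the Bokmål file has been downloaded to the
# current directory (the local path is an assumption).
corpus_path = Path("nb.wikipedia.json")
if corpus_path.exists():
    records = load_discussions(corpus_path)
    print(len(records), "discussions loaded")
    print("first title:", records[0].get("title"))
    print("missing keys in first record:", missing_keys(records[0]))
```

The whole file is loaded into memory at once, which is the simplest approach for a single JSON array. For very large files a streaming JSON parser may be preferable.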

Distributions (1)

Download
Description: Not provided
Access URL: https://hdl.handle.net/21.11146/66
Direct download:
API: Not provided
Documentation: Not provided
License:
Conforms to: Not provided

APIs providing this dataset

No registered APIs provide this dataset.

Similar datasets

  • NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket): public access
  • Grapheme-to-Phoneme Models for Norwegian (Nasjonalbiblioteket): public access
  • SCARRIE Lexicon (Nasjonalbiblioteket): public access
  • ONOMASTICA Pronunciation Lexicon (Nasjonalbiblioteket): public access
  • N-grams from NBdigital 2021 (Nasjonalbiblioteket): public access