Nasjonalbiblioteket

Norwegian-English Parallel Corpus from Public Web Sites

  • Datasets
  • Public access 

    Publicly available to everyone. Access may still require registration and an API key request, as long as anyone can request such registration and/or API keys.

  • Open data 

    The dataset is classified as public access and has at least one distribution with an approved open license.

Description

This is a sentence-aligned parallel corpus built from the public web sites www.nav.no, www.nyinorge.no and skatteetaten.no. These web sites provide information in both Norwegian Bokmål and Nynorsk, and parts of this content are translated into English. The material is split into two corpora, one for Norwegian Bokmål-English and one for Norwegian Nynorsk-English. Only sentences with a corresponding translation are included in the corpora.

The corpora were made by Paul Meurer and Andrew Salway at the University of Bergen for the Language Bank. See the attached report for a description of how this was done.

The corpora are also available in Corpuscle, the Clarino Bergen Centre's corpus management and analysis system (https://clarino.uib.no/korpuskel/).

Distributions
1

Nameless distribution
  • zip
Description:
Not provided
Access URL:
https://hdl.handle.net/21.11146/68
Direct download:
API:
Not provided
Documentation:
Not provided
License:
Conforms to:
Not provided

APIs providing this dataset
0

No registered APIs provide this dataset.

Similar datasets

  • NST Pronunciation Lexicon for Swedish (Nasjonalbiblioteket): Public access
  • Grapheme-to-Phoneme Models for Norwegian (Nasjonalbiblioteket): Public access
  • SCARRIE Lexicon (Nasjonalbiblioteket): Public access
  • ONOMASTICA Pronunciation Lexicon (Nasjonalbiblioteket): Public access
  • N-grams from NBdigital 2021 (Nasjonalbiblioteket): Public access