Nasjonalbiblioteket

Translation Memories from Semantix AS

Description

This corpus contains translation memories provided to the National Library of Norway by Semantix AS. The translations have been carried out on behalf of various public agencies and institutions.

The corpus is composed of texts originally written in English or Norwegian Bokmål, aligned with their translations into the other language. There are a very few examples of translation into Norwegian Nynorsk, but for simplicity these have been classified as Norwegian Bokmål.

All translations from English to Norwegian Bokmål are collected in one file, and all translations from Norwegian Bokmål to English in another. The files are in TMX 1.4 format (an XML-based exchange format for translation memories). Each translation unit (TU) is marked with the institution for which the translation was carried out. A TU roughly corresponds to a meaningful linguistic unit, typically a sentence or a heading, but it may also consist of a single word or several clauses.
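A minimal Python sketch of reading one of these files with the standard library, assuming standard TMX 1.4 markup (`<tu>` elements containing `<tuv xml:lang="…"><seg>…</seg></tuv>`). How the institution is recorded is not stated here; the sketch assumes it appears as a `<prop>` element and simply collects every `<prop>` value by its `type` attribute. The function names `iter_tus` and `segment_text` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Tuple

# ElementTree reports xml:lang in Clark notation with the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segment_text(seg: ET.Element) -> str:
    """Text directly inside <seg>. The contents of all inline child elements
    (e.g. <bpt>, <ept>, <ph>, which hold formatting codes) are dropped, but the
    text following each of them is kept."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts)


def iter_tus(source) -> Iterator[Tuple[Dict[str, str], Dict[str, str]]]:
    """Yield (props, segments) for every <tu> in a TMX file path or file object.

    props    maps each <prop> "type" attribute to its text (ASSUMED to include
             the institution); segments maps language code -> segment text.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        props = {p.get("type", ""): (p.text or "") for p in elem.findall("prop")}
        segments: Dict[str, str] = {}
        for tuv in elem.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                segments[lang] = segment_text(seg)
        yield props, segments
        elem.clear()  # release the finished TU to keep memory use flat
```

`iterparse` streams the file, so even the larger Norwegian Bokmål > English file does not need to fit in memory as a full tree. The actual language codes used in the files (for example `nb` versus `nb-NO`) should be checked against a downloaded copy.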

The corpus contains a total of 1,325,013 TUs, distributed as follows:

  • English > Norwegian Bokmål: 250,053 TUs
  • Norwegian Bokmål > English: 1,074,960 TUs
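The two per-direction counts add up to the stated total (250,053 + 1,074,960 = 1,325,013). A minimal sketch for re-checking a downloaded copy, assuming each TU is a plain `<tu>` element without an XML namespace, as in standard TMX 1.4; the function name `count_tus` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Figures stated in the description of this corpus.
EN_TO_NB = 250_053
NB_TO_EN = 1_074_960
TOTAL = 1_325_013

# The per-direction counts should sum to the total.
assert EN_TO_NB + NB_TO_EN == TOTAL


def count_tus(source) -> int:
    """Count <tu> elements in a TMX file path or file object by streaming it."""
    count = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tu":
            count += 1
            elem.clear()  # discard the finished TU's children
    return count
```

With hypothetical local file names, `count_tus("en-nb.tmx") + count_tus("nb-en.tmx")` would be expected to equal `TOTAL` if the downloaded files match the figures above.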

The documentation file contains an overview of the agencies and institutions, and the number of TUs belonging to each institution.
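A per-institution tally like the one in the documentation file could be reproduced along these lines. This is a sketch under an explicit assumption: the prop type `x-institution` is hypothetical, since the exact way the institution is marked in each TU is not described here, and should be replaced after inspecting a file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL prop type holding the institution name; adjust after inspecting a file.
INSTITUTION_PROP = "x-institution"


def tus_per_institution(source, prop_type: str = INSTITUTION_PROP) -> Counter:
    """Return a Counter mapping institution name -> number of TUs.

    TUs without a matching (non-empty) <prop> are counted under "<unknown>".
    """
    counts: Counter = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        name = ""
        for prop in elem.findall("prop"):
            if prop.get("type") == prop_type:
                name = (prop.text or "").strip()
                break
        counts[name or "<unknown>"] += 1
        elem.clear()  # release the finished TU
    return counts
```

`counts.most_common()` then lists institutions by TU count, which can be compared with the overview in the documentation file.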

Distributions

Download
  • Description: Not provided
  • Access URL: https://hdl.handle.net/21.11146/62
  • Direct download:
  • API: Not provided
  • Documentation: Not provided
  • License:
  • Conforms to: Not provided

APIs providing this dataset

No registered APIs provide this dataset.

Similar datasets

  • Norsk Ordbank - Norwegian Nynorsk 2005-2012 (Nasjonalbiblioteket), public access
  • SCARRIE Lexicon (Nasjonalbiblioteket), public access
  • N-grams from NBdigital 2021 (Nasjonalbiblioteket), public access
  • ONOMASTICA Pronunciation Lexicon 2 (Nasjonalbiblioteket), public access
  • Målfrid 2023 – Freely Available Documents from Norwegian State Institutions (Nasjonalbiblioteket), public access