Documentation

Programmatic access to Palladia benchmarks

Overview

This page explains how to programmatically access the demo data used on this website. The Palladia GitHub repository provides free and unlimited access to benchmark results. All benchmark responses are JSON-encoded.

Base URL
https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks

Usage

JavaScript

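const category = 'EarlyModernLatin';
// Named documentName so it does not clash with the browser's global `document`.
const documentName = '1564-Thucydides-Valla';
const filename = '00363.bin';

const url = `https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks/GT4HistOCR/corpus/${category}/${documentName}/${filename}.json`;

fetch(url)
  .then(response => {
    // fetch() also resolves on HTTP errors, so check the status explicitly.
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => {
    console.log('OCR Data:', data);
  })
  .catch(error => console.error('Error:', error));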

Python

import requests

category = 'EarlyModernLatin'
document = '1564-Thucydides-Valla'
filename = '00363.bin'

url = f'https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks/GT4HistOCR/corpus/{category}/{document}/{filename}.json'

# A timeout keeps the call from hanging indefinitely on a stalled connection.
response = requests.get(url, timeout=10)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f'Error: {response.status_code}')

cURL

curl \
  -H "Accept: application/json" \
  https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks/GT4HistOCR/corpus/EarlyModernLatin/1564-Thucydides-Valla/00363.bin.json

Endpoints

GET /manifest.json

Retrieve the list of all benchmarked documents.
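
As a minimal sketch of calling this endpoint, the snippet below builds the manifest URL from the base URL and decodes the response. The helper names manifest_url and fetch_manifest are hypothetical, and the standard library's urllib is used here instead of requests only to keep the example dependency-free. The structure of the manifest is not described on this page, so the code makes no assumptions about its keys.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks"


def manifest_url(base_url: str = BASE_URL) -> str:
    """Build the manifest URL (hypothetical helper, not part of Palladia)."""
    return f"{base_url.rstrip('/')}/manifest.json"


def fetch_manifest(timeout: float = 10.0):
    """Download and decode the manifest. This performs a network request."""
    with urlopen(manifest_url(), timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_manifest() returns the decoded JSON value; inspect it (for example with type() or by printing it) before relying on any particular field.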

GET /GT4HistOCR/corpus/{corpus_name}/{document_name}/_summary.json

Retrieve detailed document information.
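
The path template above can be filled in as follows. This is a sketch only: summary_url is a hypothetical helper, and the corpus and document names are the ones from the Usage examples on this page. Percent-encoding the path segments is an added precaution for names containing spaces or other reserved characters, not something this page requires.

```python
from urllib.parse import quote

BASE_URL = "https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks"


def summary_url(corpus_name: str, document_name: str, base_url: str = BASE_URL) -> str:
    """Build the _summary.json URL for one document (hypothetical helper)."""
    # Encode each segment so unusual characters cannot alter the path.
    corpus = quote(corpus_name, safe="")
    document = quote(document_name, safe="")
    return f"{base_url}/GT4HistOCR/corpus/{corpus}/{document}/_summary.json"


url = summary_url("EarlyModernLatin", "1564-Thucydides-Valla")
print(url)
```

The resulting URL can be fetched the same way as in the Usage examples.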

GET /GT4HistOCR/corpus/{corpus_name}/{document_name}/{benchmark_file}.json

Retrieve the benchmark results for a specific file. Replace the ".json" extension with ".png" to retrieve the source image instead.
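
The extension swap can be sketched like this. The helper name image_url is hypothetical; it only rewrites the trailing ".json" and does not check that the image exists on the server.

```python
def image_url(benchmark_url: str) -> str:
    """Derive the source-image URL from a benchmark JSON URL (hypothetical helper)."""
    suffix = ".json"
    if not benchmark_url.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}: {benchmark_url}")
    # Replace only the final extension, leaving the rest of the path untouched.
    return benchmark_url[: -len(suffix)] + ".png"


benchmark = (
    "https://raw.githubusercontent.com/Dassoo/Palladia/refs/heads/main/benchmarks"
    "/GT4HistOCR/corpus/EarlyModernLatin/1564-Thucydides-Valla/00363.bin.json"
)
print(image_url(benchmark))
```

Slicing off the suffix, rather than calling str.replace, avoids touching a ".json" that might appear earlier in the path.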