Welcome to the Digital Humanities Toolkit (DHTK)

A tool for processing digital text in different formats and languages

DHTK is a project that aims to create a library that helps digital humanities researchers find data and metadata in a simple way. Scholars involved in Digital Humanities research often need a number of datasets containing cultural data. The Internet is a great source for this kind of information, and repositories such as Project Gutenberg, Europeana, Archive.org, and similar sites are valuable resources. Unfortunately, these resources are organized and structured in different, often incompatible ways. Their differing APIs and metadata make these repositories difficult to exploit.

Finding resources, checking metadata, and finally cleaning data can be a long and tedious process. This is where DHTK comes in: it offers a unified API to accomplish those tasks and thus support research in Digital