AI helps historians complete ancient Greek inscriptions damaged over millennia – TechCrunch

As if being a scholar of ancient Greek weren’t difficult enough, the primary texts these scholars rely on are thousands of years old and often damaged beyond repair. Historians may have a powerful new tool in Ithaca, a machine learning model built by DeepMind that makes surprisingly accurate guesses about missing words, as well as about where and when a text was written. It’s an unusual application of AI, but it shows how useful the technology can be outside the tech world.

The problem of incomplete ancient texts crosses many disciplines in which experts work with degraded materials. The original document can be made of stone, clay, or papyrus, written in Akkadian, Ancient Greek, or Linear A, and describe anything from a grocer’s bill to a hero’s journey. What they all have in common is damage accumulated over thousands of years.

These gaps, where text has been worn or torn away, are often called lacunae, and they can be as short as a missing letter or as long as a chapter or even an entire story. Completing them may be trivial or impossible, but you have to start somewhere, and that’s where Ithaca is supposed to help.

Trained on a huge library of ancient Greek texts, Ithaca (named after Odysseus’ native island) can not only suggest what a missing word or phrase is, but can also estimate how old the text is and where it was written. It’s not going to complete an entire epic cycle on its own; it’s meant to be a tool for those working with these texts, not a solution.

A paper published in the journal Nature demonstrates its effectiveness, taking as an example certain decrees from Periclean Athens. These were thought to have been written around 445 BCE, but Ithaca suggested, based on its textual analysis, that they actually date to around 420 BCE, in accordance with more recent evidence. That might not seem like a lot, but imagine if the Bill of Rights had been written 20 years later!

Picture credits: DeepMind

As for restoring the text itself, the experts in the study got it about 25% right on their first pass; not exactly stellar, although of course text restoration is not meant to be an afternoon lark but a long-term project. Paired with Ithaca, however, they quickly reached 72% accuracy. This is often the case in other situations where humans are ultimately more accurate but can speed up their process by quickly eliminating dead ends or being given a starting point. With medical data, it can be easy to overlook an anomaly that an AI could quickly point out, but ultimately it’s human expertise that picks up on the details and finds the right answer.

You can test a clean version of Ithaca here if you have an ancient Greek text riddled with gaps at hand, or use one of the provided examples to see how it fills in the gaps. For longer pieces, or those with more than 10 letters missing, try it in this Colab notebook. The code is available at this GitHub page.

While ancient Greek is an obvious and fruitful place for Ithaca to start, the team is already hard at work on other languages as well. Akkadian, Demotic, Hebrew, and Maya are all on the list, and hopefully more will be added over time.

“Ithaca exemplifies the potential contribution of natural language processing and machine learning in the humanities,” said Ion Androutsopoulos, a professor at the University of Athens who worked on the project. “We need more projects like Ithaca to further showcase this potential, but also suitable courses and teaching materials to train future researchers who will have a better joint understanding of humanities and AI methods.”
