Our recent foray to Google Cloud Next '18 led us to some fascinating, weird and wonderful tech, but one of the most eye-catching titles was surely this:
The Hieroglyphics Initiative: Cloud AI and AutoML in the Translation of Ancient Egyptian.
I got chills. The fact is, the modern decipherment of hieroglyphic scripts is one of the most iconic and astounding stories in linguistics and archaeology.
As hieroglyphs fell out of use, their meaning was forgotten. The first attempt to decipher them was made in the 9th and 10th centuries by Arab historians in medieval Egypt; they were able to make some headway by comparing them to the contemporary Coptic language.
European scholars took up the baton around six centuries later, building on these early works by Egyptian scholars, but while they made steps towards a solution, they failed to find it.
Then, in 1799, the Rosetta Stone was discovered: a localised document in ancient Egyptian using both hieroglyphs and demotic script, and in ancient Greek. (I still find the mere existence of this document wonderful; our ancestors clearly had the sophistication to care about l10n and i18n.) By some extremely clever reasoning, scholars were able to work out that hieroglyphs were partly a phonetic script: that is, some symbols encoded sounds rather than meanings, as the letters of our own alphabet do. (By contrast, Chinese is an example of an ideographic script, where symbols have fixed meanings but not necessarily fixed pronunciations.)
This feat of scholarship and codebreaking, and the artefact that enabled it, have captured the imagination of the world to such a great extent that a company specialising in teaching second languages is named after it.
Now, machine translation is one of the most brilliant and successful applications of deep learning there is. You can imagine how excited I was to discover what had been cooked up in the name of Egyptology.
However, this is not a Decipherment Of Linear B style story about a cryptanalytical approach to understanding ancient texts through clustering and unsupervised learning techniques. Rather, this is about a remarkable use of supervised learning to produce a tool to aid the laborious process of interpreting and translating hieroglyphic texts.
Assassin's Creed Origins garnered praise from reviewers for its historical accuracy — creating an immersive experience closely matching our understanding of ancient Egypt. As such, the game owes a debt of gratitude to Egyptology, and the game makers, Ubisoft, wanted to repay some of that debt by using AI to help interpret hieroglyphs.
To do that, they teamed up with Psycle, a company specialising in (among other things) education tools. Partnering with Google, they attacked this extremely challenging problem.
To understand why this problem is so challenging, it's worth taking a closer look at hieroglyphs themselves.
Hieroglyphs are a highly nuanced and complex form of writing. They consist of both ideographic symbols (symbols that convey a particular meaning) and phonetic symbols (symbols that convey a particular sound).
A modern analogue might be Japanese, which has Kanji symbols (ideograms based on Chinese characters), and Hiragana and Katakana (phonetic alphabets for Japanese and foreign words respectively); Japanese has many more characters than ancient Egyptian and is skewed towards ideograms, but Japanese also has millions of living native writers who can tell us what something means.
Rather like beautiful and subtle Arabic typography, hieroglyphs can change according to their position and context. For example, the beaks of birds, the heads of people and so on face towards the start of the line, against the direction in which the symbols should be read. Like some Greek texts, hieroglyphic texts may be written boustrophedonically (a wonderful word meaning "as the ox turns", where one line travels left-to-right and the next line travels right-to-left, like an ox ploughing a field). Others may go from right to left or from top to bottom.
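To make the reading-order problem concrete, here is a minimal sketch of how you might linearise a grid of signs once you know the direction of the first line and whether the text is boustrophedonic. The function name and its parameters are my own invention for illustration, and it only handles horizontal rows, not top-to-bottom columns.

```python
def reading_order(lines, first_direction="ltr", boustrophedon=False):
    """Return the signs of a text in the order they should be read.

    lines: list of rows, each a list of sign labels as they appear
        from left to right on the stone.
    first_direction: "ltr" or "rtl", the direction of the first row.
    boustrophedon: if True, the direction flips on every new row.
    """
    if first_direction not in ("ltr", "rtl"):
        raise ValueError("first_direction must be 'ltr' or 'rtl'")
    ordered = []
    direction = first_direction
    for row in lines:
        ordered.extend(row if direction == "ltr" else reversed(row))
        if boustrophedon:
            direction = "rtl" if direction == "ltr" else "ltr"
    return ordered


rows = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
print(reading_order(rows, "ltr", boustrophedon=True))
# ['a', 'b', 'c', 'f', 'e', 'd', 'g', 'h', 'i']
```

In practice the hard part is working out the direction in the first place, which is exactly where the way the birds and people are facing comes in.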
In addition, engraved texts may be degraded; the primary symbol in a group depends heavily on context, and there is no punctuation (though cartouches, groups of symbols within a box, do help somewhat). When a correct reading can rely on something as subtle as the specific shape of a bird's beak, interpretation becomes a laborious task indeed.
There are somewhere in the region of a thousand distinct hieroglyphs. The most commonly used list is Gardiner's sign list, which includes 763 signs in 26 categories; Unicode includes 1071 signs, organised using Gardiner's categorisation. Another historian, Georg Möller, codified even more. The fact that it's hard for me to tell you definitively how many hieroglyphs there are may give some further insight into the difficulties Egyptologists face.
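You can check the Unicode figure for yourself. The sketch below uses Python's standard `unicodedata` module to count the named signs and group them by the leading letters of their code. It assumes the original allocation runs from U+13000 to U+1342E and that every sign is named "EGYPTIAN HIEROGLYPH" followed by a Gardiner-style code, which is my understanding of the block rather than something the talk covered.

```python
import re
import unicodedata
from collections import Counter

PREFIX = "EGYPTIAN HIEROGLYPH "


def hieroglyph_signs(start=0x13000, end=0x1342E):
    """Yield (character, sign_code) for each named sign in the range."""
    for codepoint in range(start, end + 1):
        name = unicodedata.name(chr(codepoint), "")
        if name.startswith(PREFIX):
            yield chr(codepoint), name[len(PREFIX):]


def category_of(sign_code):
    """Leading letters of a code such as 'A001' or 'AA001'."""
    return re.match(r"[A-Z]+", sign_code).group(0)


signs = list(hieroglyph_signs())
per_category = Counter(category_of(code) for _, code in signs)
print(len(signs), "signs across", len(per_category), "category prefixes")
```

The number of prefixes will not necessarily match Gardiner's 26, because the Unicode names may use a few prefixes of their own.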
This is where Assassin's Creed and Ubisoft come into the picture. They reasoned that a system that can identify hieroglyphs even partially would accelerate scholarly efforts to interpret texts, simply by reducing the amount of brute work involved in figuring out what each sign is.
However, machine learning is data hungry. Moreover, about 100 glyphs cover 80% of all known texts, meaning the remaining several hundred have very few examples in existence at all.
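To see how lopsided that kind of distribution is, here is a small sketch that counts how many of the most frequent signs you need before you cover a given share of a corpus. The corpus is entirely made up (a Zipf-like curve over 800 placeholder sign names); only the function is the point.

```python
from collections import Counter


def signs_needed_for_coverage(counts, target=0.8):
    """Smallest number of most-frequent signs whose occurrences
    add up to at least `target` of all occurrences in the corpus."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target * total:
            return rank
    return len(counts)


# Made-up corpus: the sign at rank r appears roughly 1000 / r times.
toy_counts = Counter({f"sign_{r}": 1000 // r for r in range(1, 801)})
print(signs_needed_for_coverage(toy_counts, 0.8))
```

Run against real sign counts, the same function would tell you where the "long tail" of rarely seen signs begins.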
If you don't have a lot of data, one tried and true technique is data augmentation, which can be broadly considered a method of making more data out of a small amount of existing data, and this is where Ubisoft and Psycle did their first particularly clever thing.
They created a set of canonical forms of the Gardiner signs. They then created a website where gameplayers could draw round the edge of these signs.
This partially mimicked the kind of natural variation you might expect in a set of real hieroglyphs. After all, these would all have been made by real human scribes and engravers trying to reproduce the canonical hieroglyphic forms.
These scribes would not have been tracing round an outline, so the data generated by this website would still be "cleaner" than real hieroglyph data. In particular, there would be little of the variation in aspect ratio or relative proportions that you find in many writing systems, Latin included. But at least there would be some variation to give the algorithms a better chance with real, messy data.
Tracing did introduce a little variation in proportion. We see this kind of variation in writing systems today: for example, different fonts may have a different x-height (the height of the lower case letters in proportion to the upper case ones). However, since people were following outlines, there would be much less of it than in real inscriptions.
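Ubisoft and Psycle got their variation from real people tracing outlines. The more common, purely programmatic flavour of data augmentation looks something like the sketch below, which is my own stand-in and not their method: it wobbles each point of a canonical outline slightly and stretches the whole shape a little. All names and default values are invented for illustration.

```python
import random


def jitter_outline(outline, max_offset=0.03, max_stretch=0.1, rng=None):
    """Return a randomly perturbed copy of an outline.

    outline: list of (x, y) points inside a unit square.
    max_offset: largest per-point wobble, mimicking an unsteady hand.
    max_stretch: largest relative change in overall width or height,
        mimicking differences in proportion between scribes.
    """
    rng = rng or random.Random()
    sx = 1 + rng.uniform(-max_stretch, max_stretch)
    sy = 1 + rng.uniform(-max_stretch, max_stretch)
    return [
        (x * sx + rng.uniform(-max_offset, max_offset),
         y * sy + rng.uniform(-max_offset, max_offset))
        for x, y in outline
    ]


def augment(outline, copies, seed=0):
    """Make several perturbed copies; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [jitter_outline(outline, rng=rng) for _ in range(copies)]


canonical = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
variants = augment(canonical, copies=5)
print(len(variants), "variants of", len(canonical), "points each")
```

The stretch factor adds the kind of variation in proportion that tracing tends to suppress, which is one reason you might combine both approaches.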
Even with this augmented data set, there would not be enough to train a deep convolutional network of the type you'd want to classify hieroglyphs. Fortunately, a standard technique has emerged to deal with this kind of problem: transfer learning.
The idea behind transfer learning is pretty simple: you take an existing deep network trained on the customary millions of data points, then replace only the final layer of the network, and retrain that on the data you care about. For example, a generic image recognition system can be specialised to solve one particular image recognition problem.
The idea behind this is that the early layers of a deep network tend to detect simple features common to many of the training examples. For example, in images, early layers will detect edges, blobs, corners and gradients — extremely basic visual features. Later layers will combine these in specific patterns to produce more advanced features — simple shapes at first, then more complex ones — by composing the features from earlier layers together.
By replacing only the final layer, these earlier, general features can be reused to perform very specific tasks without having to redo all the work of producing them.
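Here is a deliberately tiny sketch of that idea in plain Python. The "pretrained" part is a hand-written stand-in (a fixed function that responds to horizontal and vertical strokes), not a real network, and the dataset is six invented 3x3 images. What it does show faithfully is the shape of the technique: the feature extractor is never touched, and only a new final softmax layer is trained.

```python
import math


def frozen_features(image):
    """Stand-in for the pretrained layers: fixed, never updated here.

    Returns [strongest row response, strongest column response].
    """
    rows = [sum(r) / len(r) for r in image]
    cols = [sum(c) / len(c) for c in zip(*image)]
    return [max(rows), max(cols)]


def predict_proba(weights, biases, x):
    """Softmax output of the final layer for one feature vector."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_head(features, labels, n_classes, lr=0.5, epochs=200):
    """Fit only the new final layer by full-batch gradient descent."""
    n_feat, n = len(features[0]), len(features)
    weights = [[0.0] * n_feat for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * n_feat for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(n_feat):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_feat):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases


def classify(image, weights, biases):
    probs = predict_proba(weights, biases, frozen_features(image))
    return probs.index(max(probs))


def bar(index, horizontal):
    """A 3x3 image containing one horizontal or vertical bar."""
    return [[1.0 if (r if horizontal else c) == index else 0.0
             for c in range(3)] for r in range(3)]


# Toy "small dataset": class 0 is a horizontal bar, class 1 a vertical one.
images = [bar(i, True) for i in range(3)] + [bar(i, False) for i in range(3)]
labels = [0, 0, 0, 1, 1, 1]
weights, biases = train_head([frozen_features(im) for im in images],
                             labels, n_classes=2)
print([classify(im, weights, biases) for im in images])
```

In a real system the stand-in would be the early layers of a network trained on millions of images, and the head would have one output per hieroglyph.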
I like to compare this to learning a second language. Children spend maybe ten or fifteen years mastering their first language; a ten-month-old generally can't say many words, but makes sounds that are recognisably aligned to their parents' language; single words, then two-word phrases, then extremely simple sentences follow. Concepts like "you" and "me" require some alignment, because these are words that change meaning depending on who says them, and so toddlers will initially refer to themselves and the people they are talking to by name; names are fixed, and therefore easier.
By contrast, a teenager can be intelligible in basic conversations in a second language after a year, though they can't say much and will make mistakes no native speaker would — even a toddler. But they won't have to re-learn what the concepts of "me" and "you" are for.
Using a general image recognition system (in this case, Google's excellent Inception network), they were able to train the last layer on the set of hieroglyphs and get pretty good accuracy for the hundred most common hieroglyphs.
The accuracy on the other hieroglyphs was so bad, however, that they actually refused to tell us what it was. I can only imagine the depths of fail they plumbed, and quite possibly, we will never know, but evidently this approach sucked.
Enter Google's new framework, AutoML.
It's worth saying that AutoML uses exactly the same transfer learning approach: it starts from a high quality general image recogniser and trains a final layer on a small dataset. However, it automatically tunes the hyperparameters (a fancy word for "control dials") in a way that is difficult to achieve with a hand-optimised transfer learning approach.
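I don't know what search strategy AutoML uses internally, so the sketch below substitutes the simplest possible one: an exhaustive grid search. It trains a toy one-parameter model with each combination of two "dials" (learning rate and number of steps) and keeps whichever does best on held-out data. The model, the data and the grid values are all invented for illustration.

```python
from itertools import product


def train(lr, steps, xs, ys):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def validation_loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(grid, train_set, val_set):
    """Try every combination of settings and keep the one with the
    lowest loss on the validation set."""
    best = None
    for lr, steps in product(grid["lr"], grid["steps"]):
        w = train(lr, steps, *train_set)
        loss = validation_loss(w, *val_set)
        if best is None or loss < best["loss"]:
            best = {"lr": lr, "steps": steps, "loss": loss}
    return best


train_set = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
val_set = ([4.0], [8.0])
grid = {"lr": [0.001, 0.01, 0.1, 0.3], "steps": [5, 50]}
best = grid_search(grid, train_set, val_set)
print(best["lr"], best["steps"])
```

Even in this toy, one of the learning rates is too small to converge in time and another is large enough to diverge, which is the sort of thing that makes hand-tuning tedious.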
This approach gave 77% accuracy over the 800 glyphs in use. That's pretty cool for such a hard problem.
The tool is now deployed and available for Egyptologists to use.
The first step is an automatic edge detector, which is a fairly conventional system using brightness differentials. The Egyptologist then has an opportunity to clean up the edges and smooth out any wrinkles. It's slightly ironic that this (which is a relatively well-trodden area of image analysis) is manual, but it's also easier for a human to guess what's going on from context.
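A brightness-differential edge detector can be surprisingly little code. The following is a minimal sketch of the general idea, not the tool's actual implementation: it marks any pixel whose brightness differs from its right-hand or lower neighbour by more than a threshold. The function name, the threshold and the sample image are my own.

```python
def detect_edges(image, threshold=50):
    """Mark pixels whose brightness differs sharply from the pixel
    to the right or the pixel below.

    image: list of rows of greyscale values (0 to 255).
    Returns a grid of booleans of the same size.
    """
    height, width = len(image), len(image[0])
    edges = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            right = abs(image[r][c + 1] - image[r][c]) if c + 1 < width else 0
            below = abs(image[r + 1][c] - image[r][c]) if r + 1 < height else 0
            edges[r][c] = max(right, below) > threshold
    return edges


# Dark stone on the left, a brighter region on the right.
image = [[10, 10, 200, 200] for _ in range(3)]
for row in detect_edges(image):
    print("".join("#" if e else "." for e in row))
# .#..
# .#..
# .#..
```

On a weathered inscription, cracks and shadows produce brightness jumps too, which is why a human pass to clean up the outlines is so useful.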
The human-modified outlines are then fed into the model one-by-one, and predictions for each symbol are provided, with a probability attached to each. Very frequently, the system will identify the correct glyph in its top three guesses, if not in its first one.
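The "top three guesses" idea is easy to sketch. Below, raw model scores for one outline are turned into probabilities with a softmax and the most likely signs are returned; a second helper measures how often the right answer is somewhere in the top k. The scores are invented, and the keys are Gardiner codes used purely as labels.

```python
import math


def top_k(scores, k=3):
    """Turn raw scores into probabilities and return the k most
    likely signs as (sign, probability) pairs, best first."""
    top = max(scores.values())
    exps = {sign: math.exp(s - top) for sign, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(sign, value / total) for sign, value in ranked[:k]]


def top_k_accuracy(all_scores, true_signs, k=3):
    """Fraction of examples whose true sign is among the top k guesses."""
    hits = sum(
        truth in [sign for sign, _ in top_k(scores, k)]
        for scores, truth in zip(all_scores, true_signs)
    )
    return hits / len(true_signs)


# Invented scores for a single outline.
scores = {"G1": 2.0, "G4": 1.6, "G5": 0.3, "A1": -1.0}
for sign, prob in top_k(scores):
    print(f"{sign}: {prob:.2f}")
```

Top-k accuracy is always at least as high as plain accuracy, which is why a shortlist can be useful to an expert even when the first guess is often wrong.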
After that, it's up to the Egyptologists again — a perfect example of human and machine working together.
One thing that fascinated me was that hieroglyphs' meanings are so conditional on their surroundings. This seems like a golden problem for machine translation type systems to solve. These commonly use sequence models such as LSTMs, which infer the meaning of each element from the sequence around it; this is, essentially, the problem being solved here.
A secondary advantage of modelling sequences is that it may be possible to identify an ambiguous symbol from its context. This, again, is exactly the type of problem where LSTMs excel. I would love to see this tool augmented in this way, and doubtless this is on the roadmap.
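An LSTM is too much to fit in a few lines, so the sketch below swaps in a far simpler stand-in, a bigram count table, just to show the principle of letting context break a tie. It multiplies the image classifier's probabilities by how often each candidate follows the previous sign, then renormalises. The probabilities and counts are invented; the codes are two bird signs from Gardiner's category G and one preceding sign.

```python
def rescore_with_context(classifier_probs, previous_sign, bigram_counts):
    """Combine image-based probabilities with how often each candidate
    sign follows the previous sign in some reference corpus."""
    candidates = list(classifier_probs)
    # Add-one smoothing so unseen pairs are unlikely rather than impossible.
    follow = {s: bigram_counts.get((previous_sign, s), 0) + 1
              for s in candidates}
    follow_total = sum(follow.values())
    combined = {s: classifier_probs[s] * follow[s] / follow_total
                for s in candidates}
    norm = sum(combined.values())
    return {s: p / norm for s, p in combined.items()}


# The image alone slightly favours G4, but in this made-up corpus
# G1 follows D21 far more often.
classifier_probs = {"G1": 0.45, "G4": 0.55}
bigram_counts = {("D21", "G1"): 8, ("D21", "G4"): 1}
rescored = rescore_with_context(classifier_probs, "D21", bigram_counts)
print(max(rescored, key=rescored.get))
# G1
```

A sequence model would do the same job with much longer context and without a hand-built table, but the way the two sources of evidence combine is similar in spirit.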
I think this is a beautiful and imaginative use of machine learning that takes cutting-edge computing techniques and applies them to some of the oldest history we have.
One of the reasons I love this so much is that ancient Egyptian society, like many societies in the region, was a remarkably complex and sophisticated one. The library of Alexandria, before its tragic destruction, was renowned as a centre of scholarship and excellence in the ancient world. Those few texts that survive give us a glimpse of a rich, cosmopolitan and brilliant society. It seems particularly fitting that our most advanced thinking can bring us closer to theirs.