Enigma is a computer program which automatically generates cryptic crossword clues for any given input. I built the program as part of my PhD Thesis as a means of exploring the computer generation of creative language.

enigmaA valid cryptic clue must appear to make sense as a fragment of natural language, but it must also be possible for the reader to interpret it using the syntax and semantics of crossword convention and solve the puzzle that it sets. It should also have a fluent surface reading that conflicts with the crossword clue interpretation of the same text.

Consider, for example, the following clue for the word THESIS: Article with spies’ view (6) This clue sets a puzzle in which the (the definite article) must be combined with SIS (Special Intelligence Service) to create the word thesis (a view), but the reader is distracted by the apparent meaning and structure of the natural language fragment when attempting to solve it.

Sample Clues

Some sample clues generated by enigma.

  • Food store (4) (keep)
  • Still wild lionesses (9) (noiseless)
  • Chopped mace is top (4) (acme)
  • Save to stack (5) (hoard)
  • Drain fresh ewers (5) (sewer)
  • Slip plate above boiled peas (5) (lapse)
  • Twig that coins are lost (5) (scion)

How enigma works

For the clues to work they must have a valid puzzle reading and a fluent surface reading, in the words of Jonathan Crowther (Azed) each clue must look like a plausible “piece of English prose”. Because the semantic representation of the puzzle reading has no bearing on the semantics of the surface reading, the only linkage between the two layers of meaning is the text of the clue itself. There is therefore a search space of possible clue texts each of which is a valid cryptic clue puzzle representing the solution word (known as the light), and a very small fraction of the texts in that search space appear to be pieces of English prose; the great majority of them appear to be gibberish.

The system cannot generate all of the texts in the search space and then use Natural Language Processing tools to look for texts that are not gibberish since the search space is very large (typically containing between 1010 and 1018 possible texts). Instead the system uses syntactic and semantic constraints on the surface text as a heuristic to reduce the size of the search space as the texts are generated. This approach closely mirrors my intuition of the creative process through which cryptic clues are written, and domain expert commentary on clue-setting, for example in Ximenes on the Art of the Crossword by MacNutt.

The syntactic constraints are enforced by a chunk-based generator which constructs chunks of text to represent each of the elements of the puzzle and annotates each chunk with attachment sites which represent the syntactic relationships in which the chunk can participate. Each element of the puzzle, such as an anagram or a word written backwards, might be expressed by a large number of chunks but when the system attempts to merge these with the adjacent element of the puzzle many are discarded as the attachment sites do not fit. A more detailed overview of the process is given in the following paper:
Hardcastle, D. (2007). Cryptic Crossword Clues: Generating Text with a Hidden Meaning 11th European Workshop on Natural Language Generation, Schloss Dagstuhl, Germany. PDF

Since the system has no notion of the meaning of the surface text these syntactic contraints must also be underwritten with semantic selectional constraints, to prevent the system from generating clues with texts of the colourless green ideas variety. The system uses a variety of knowledge sources to apply these semantic constraints, the two largest of these being a check on the domain/range of lexical verbs and the domain of adjectives, and a check on the shared content of the words being used in the clue. The first of these checks is applied using a collocational semantic lexicon inferred by running a dependency parser over the British National Corpus, applying a coarse-grained word sense disambiguation algorithm to map the dependency domains into WordNet and then making cautious generalizations using the WordNet’s hyponymy structure. This process is described in:
Hardcastle, D. (2007). Generalizing Syntactic Collocates for Creative Language Generation Using Corpora in NLG and Machine Translation at MT Summit XI, Copenhagen. PDF
The checks on shared content are performed using a word-distance metric which I called the Russian Doll algorithm since it uses word distance data from a set of concentric windows over the corpus. The statistical measure applied is very similar to Church and Hanks’ MI-score, but the measure is novel both in its use of very collocational information from very wide windows (up to 1,000 words) and in the use of multiple concentric windows.
Hardcastle, D. (2005). An examination of word association scoring using distributional analysis in the British National Corpus: what is an interesting score and what is a useful system? Paper presented at Corpus Linguistics 2005, Birmingham. PDF
A key feature of both measures is that they address the ‘typicality trap’ to which many distributional measures in the corpus linguistics literature are prone. This means that enigma is supplied not just with information about prototypical domain information such as red fire engines or prototypical collocates such as doctor-hospital, but with information about what represents plausible or expected usage in English prose.


I ran a Turing-style test evaluation of the system in which clues generated by and selected by enigma were presented alongside a real clue for the same light from a newspaper and the subjects have to decide which is the real clue. Nearly 60 people participated in the experiment, on average subjects correctly identified the human-authored clue 70% of the time. The upper bound for the figure is 50% (if the generated and human-authored clues were indistinguishable) and the lower bound is 100% (if it is blindingly obvious which is which). The evaluation is now closed as I have submitted my thesis, the questions can be viewed here with the generated clues highlighted.

I also gave a list of 50 generated clues to a group of experts comprising well-known compilers, editors and commentators on cryptic crosswords who kindly agreed to look over the clues and provide some feedback. As a whole they were critical of the quality of the clues, although they all thought that some of the clues were at least passable, and in some cases good enough for publication. Overall the generated clues fell short where the system lost track of the larger picture, indeed the experts rated many of the longer clues as simply nonsensical. All of the experts pointed out that the clues were rather stale and amateurish, lacking inventiveness, wit and humour.

While the semantic selectional constraints that the system was equipped with performed relatively well at a clause and sub-clause level, the system had no strategic planning component to enable it to organise the presentation of the surface content and so the apparent meaning would often evaporate as clue length increased. It is not possible to provide the system with a conceptualisaton framework covering a large enough domain to address this problem.