Posted on • Originally published atblog.ianturton.com on
Finding anagrams of place names (in GB)
A little while agoAlasdair Rae asked if any one had combined an anagram engine with a list of place names.
Well, no one stepped forward so I thought it could be a fun project. And, it turns out it is quite fun though I got to think about data structures rather more than geography, but that is probably good for me.
I made the assumption that Alasdair was probably not interested in just permutations of letters but wanted actual words (such as would be used in a crossword clue). I also limited my search to single word anagrams as I can’t see a simple solution to finding multi word solutions.
First I stuffed the Ordnance Survey’s OpenNames data set into PostGIS (as who wants to be scanning hundreds of little csv files).
I then set up aGeoTool’sPostGIS datastore and grabbed the populated places.
Map<String, Object> params = new HashMap<String, Object>(); params.put(PostgisNGDataStoreFactory.DBTYPE.key, PostgisNGDataStoreFactory.DBTYPE.sample); params.put(PostgisNGDataStoreFactory.USER.key, "username"); params.put(PostgisNGDataStoreFactory.PASSWD.key, "password"); params.put(PostgisNGDataStoreFactory.SCHEMA.key, "opennames"); params.put(PostgisNGDataStoreFactory.DATABASE.key, "osdata"); params.put(PostgisNGDataStoreFactory.HOST.key, "127.0.0.1"); params.put(PostgisNGDataStoreFactory.PORT.key, "5432"); DataStore ds = DataStoreFinder.getDataStore(params); if (ds == null) { throw new RuntimeException("No datastore"); } SimpleFeatureSource fs = ds.getFeatureSource("opennames"); SimpleFeatureCollection features = fs.getFeatures(CQL.toFilter("type = 'populatedPlace'"));
I tried a naive approach of recursively finding every anagram possible from the name and looking each one up in aHashMap
of English words. Oddly, this took a long time so I thought (and Googled) some more and came up with the much more efficient way of sorting the letters in a word and using that as a key to all words that contained those letters. Then I could sort each place name’s letters and do a single lookup to find all the possible words that could be made with those letters. That speeded things up nicely.
To build the lookup table I made use of Google’sHashMultimap
(fromGuava) which allows you to create aMap
ofCollections
keyed on aString
.
private Map<String, Collection<String>> dict; public AnagramLookup() throws FileNotFoundException, IOException { //change this to point to your dictionary (one word per line) File f = new File("/usr/share/dict/british-english"); HashMultimap<String, String> indexedDictionary = HashMultimap.create(); try (BufferedReader buf = new BufferedReader(new FileReader(f))) { String line; // read each word in the dictionary while ((line = buf.readLine()) != null) { //strip out non letters String word = line.toLowerCase().replaceAll("\\W", ""); //store the word against the sorted key indexedDictionary.put(sort(word), word); } } dict = indexedDictionary.asMap(); }
Then all that is left to do is to iterate each populated place, grab it’s name and then remove all the non-letters and sort it’s letters and look the anagrams in theHashMap
. The final trick is to remove the name itself if it appears in the list of anagrams (i.e. the name itself is an English word).
try (SimpleFeatureIterator itr = features.features()) { while (itr.hasNext()) { SimpleFeature f = itr.next(); String name = (String) f.getAttribute("name1"); current = name.toLowerCase().replaceAll("\\W", ""); Collection<String> anagrams = getAnagrams(current); if(anagrams!=null && !anagrams.isEmpty()) { //remove the name itself if it happens to be a word anagrams.remove(current); if(!anagrams.isEmpty()) { results.put(name, new TreeSet<String>(anagrams)); } } } }
Results
It turns out that there are 6 11 letter anagrams for the list of GB place names.
- Balnadelson - belladonnas
- Fortis Green - reforesting
- Gilling East - legislating
- Green Plains - spenglerian
- Morningside - modernising
- Sharrington - harringtons
- Stone Corner - cornerstone
ASpenglerian is “of or relating to the theory of world history developed by Oswald Spengler which holds that all major cultures undergo similar cyclical developments from birth to maturity to decay”. While aHarrington is “a man’s short lightweight jacket with a collar and a zipped front.”
Other highlights for crossword setters include Aimes Green as menageries and Westlinton as tinseltown.
I have posted thefull list of anagrams and thecode to generate the list.
See thisfollow up for world names.
Top comments(0)
For further actions, you may consider blocking this person and/orreporting abuse