Introduction
This post is entirely off-topic, and the result of feverish thinking while running a high temperature as a result of Covid-19. I think it’s quite fun, though, so if you want a complete distraction from “real” work, and hardware details, read on.
As some of you may know, Google has just announced that they are working on a language to succeed C++, which they are calling “Carbon”[1]. That caused some Twitter discussion on the choice of the name, which prompted me to tweet “Plus any chemist will immediately go ‘Carbon, that's just C, right?’”.
Later that night, while slumped out, but not sleeping, with a Covid-19 induced fever, that tweet came back to haunt me as the more general idea that you can write “carbon” as C, Ar, B, O, N (Carbon, Argon, Boron, Oxygen, Nitrogen), or Ca, Rb, O, N (Calcium, Rubidium, Oxygen, Nitrogen). That leads to further questions about such “elemental encodings”, which we’ll discuss below.
What is an “Elemental Encoding”?
It’s a term I just made up for the idea of encoding text using the abbreviations used for chemical elements. I’m clearly not the first person to think of this, though: Google found me this, which mentions that effectively the same idea was used in the titles of Breaking Bad!
How do we Look for Them?
This is an obviously recursive problem. First you look to see if you can encode the initial one or two letters of the word, and, if you can, then you recurse to look for a solution for the rest of it.
I’ll write this in Python 3, since it’s easy to write, and this is not code that needs to run fast!
First we need a list of the elements and their encodings, which we can snarf from the Wikipedia list of Chemical elements. I put that into a Python dictionary, like this :-
elements = {
    "H": "Hydrogen",
    "He": "Helium",
    "Li": "Lithium",
    ... etc ...
    "Lv": "Livermorium",
    "Ts": "Tennessine",
    "Og": "Oganesson",
}
Using a Python dictionary lets us check whether a given abbreviation is present and also retrieve information about the matching entry. If we only wanted to show abbreviations in our output, a Python set would suffice, since a set can tell us whether a given abbreviation exists. I want to be able to show whole element names, though, so I need a dictionary to hold the full name associated with each abbreviation. Since we’re using a dictionary, we could extend this to return more information about each element, such as a tuple with the name and atomic number, which would make it easy to encode into a numeric list.
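As a rough illustration of that last idea, here is a minimal sketch that maps each symbol to a (name, atomic number) tuple and turns an encoding into a numeric list. The names `element_info` and `to_atomic_numbers` are made up for this example, and only the handful of elements needed for “carbon” are included.

```python
# Hypothetical richer table: symbol -> (name, atomic number).
# Only the elements needed for the "carbon" example are listed.
element_info = {
    "B": ("Boron", 5),
    "C": ("Carbon", 6),
    "N": ("Nitrogen", 7),
    "O": ("Oxygen", 8),
    "Ar": ("Argon", 18),
    "Ca": ("Calcium", 20),
    "Rb": ("Rubidium", 37),
}


def to_atomic_numbers(encoding):
    """Turn a comma-separated symbol encoding into a list of atomic numbers."""
    return [element_info[symbol][1] for symbol in encoding.split(",")]


print(to_atomic_numbers("C,Ar,B,O,N"))  # [6, 18, 5, 8, 7]
print(to_atomic_numbers("Ca,Rb,O,N"))   # [20, 37, 8, 7]
```

The rest of this post sticks with the plain `elements` dictionary of names.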
Given that initialised dictionary, the recursive meat of the code looks like this :-
def attempt(useNames, text, prefixLen):
    """Try a prefix of the given length
    useNames if True requests use of the whole element
             name,
    text is the text to encode
    prefixLen is the length of the prefix to attempt
              to replace.
    """
    numChars = len(text)
    if prefixLen > numChars:
        return []
    potentialElement = text[0:prefixLen].capitalize()
    elementName = elements.get(potentialElement, None)
    if not elementName:
        # Failed to find a matching element abbreviation
        return []
    tag = elementName if useNames else potentialElement
    if numChars == prefixLen:
        # We're at the end, no need to do anything more,
        return [tag]
    # There's still text to encode, so recurse on that.
    tailSolutions = encode(useNames, text[prefixLen:])
    return [tag + "," + \
            soln for soln in tailSolutions] if \
            tailSolutions else []


def encode(useNames, text):
    """Return all possible encodings of the given
    text into element symbols, returning a list of
    encodings which are either the element abbreviations,
    or their names.
    """
    results = []
    if len(text) > 1:
        # First try for a two letter element
        results = attempt(useNames, text, 2)
    # Now try for a single letter solution
    results += attempt(useNames, text, 1)
    return results
There are a couple of fun things to notice in there:-
The fact that Python has the useful str.capitalize() method, which capitalises the first letter and ensures all the others are in lower case. Since the element abbreviations all start with a capital, that does what we need to do before looking up a potential abbreviation.
The use of the dict.get() method to look up the putative abbreviation but return None, rather than throwing an exception, if there is no such abbreviation.
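To see those two methods working together, here is a small sketch using a cut-down, two-entry dictionary (an assumption for brevity; the real one holds every element). The helper name `look_up` is made up for this example.

```python
# Cut-down dictionary, just enough to show the lookups.
elements = {"C": "Carbon", "Ca": "Calcium"}


def look_up(raw):
    """Normalise the case of a candidate abbreviation, then look it up."""
    symbol = raw.capitalize()    # "ca", "CA" and "cA" all become "Ca"
    return elements.get(symbol)  # None, not a KeyError, if it is missing


for raw in ["ca", "CA", "cA", "c", "xx"]:
    print(raw, "->", look_up(raw))
```

Because a missing key gives None, which is falsy, the `if not elementName:` test in the code above is all that is needed to detect a failed match.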
We can then wrap that up in some more, uninteresting, code to make this a command which we can execute, and start to play with. (The whole code is here, if you want it.)
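The linked script is not reproduced here, but as a guess at the reporting part of such a wrapper, here is a minimal sketch that formats one line per word in the style of the output shown in the Results section. The function name `describe` is hypothetical, and it assumes the solutions arrive as a list of comma-separated strings, as encode() returns them.

```python
def describe(name, solutions):
    """Format one result line: a count and the solutions, or 'no encoding'."""
    if not solutions:
        return f"{name}: no encoding"
    count = len(solutions)
    noun = "solution" if count == 1 else "solutions"
    return f"{name}: {count} {noun} => " + "; ".join(solutions)


print(describe("Tin", ["Titanium,Nitrogen"]))
print(describe("Argon", []))
```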
Results
Element Names
Since I started all this with questions about the names of elements, let’s answer them. As the script already knows the names of the elements, I made its default action, when it is given no arguments, to encode all the element names.
Like this:-
$ python3 elemental.py
Actinium: no encoding
Aluminium: no encoding
Americium: no encoding
Antimony: no encoding
Argon: no encoding
Arsenic: 2 solutions => Argon,Selenium,Nickel,Carbon; Argon,Selenium,Nitrogen,Iodine,Carbon
Astatine: 1 solution => Arsenic,Tantalum,Titanium,Neon
Barium: no encoding
...
How Many Elements Can be Elementally Encoded?
$ python3 elemental.py | grep -c solution
15
What Are Those Encodings?
$ python3 elemental.py | grep solution
Arsenic: 2 solutions => Argon,Selenium,Nickel,Carbon; Argon,Selenium,Nitrogen,Iodine,Carbon
Astatine: 1 solution => Arsenic,Tantalum,Titanium,Neon
Bismuth: 2 solutions => Bismuth,Samarium,Uranium,Thorium; Boron,Iodine,Samarium,Uranium,Thorium
Carbon: 2 solutions => Calcium,Rubidium,Oxygen,Nitrogen; Carbon,Argon,Boron,Oxygen,Nitrogen
Copper: 2 solutions => Cobalt,Phosphorus,Phosphorus,Erbium; Carbon,Oxygen,Phosphorus,Phosphorus,Erbium
Iron: 1 solution => Iridium,Oxygen,Nitrogen
Krypton: 1 solution => Krypton,Yttrium,Platinum,Oxygen,Nitrogen
Neon: 1 solution => Neon,Oxygen,Nitrogen
Oganesson: 2 solutions => Oxygen,Gallium,Neon,Sulfur,Sulfur,Oxygen,Nitrogen; Oxygen,Gallium,Nitrogen,Einsteinium,Sulfur,Oxygen,Nitrogen
Phosphorus: 6 solutions => Phosphorus,Holmium,Sulfur,Phosphorus,Holmium,Ruthenium,Sulfur; Phosphorus,Holmium,Sulfur,Phosphorus,Hydrogen,Oxygen,Ruthenium,Sulfur; Phosphorus,Hydrogen,Osmium,Phosphorus,Holmium,Ruthenium,Sulfur; Phosphorus,Hydrogen,Osmium,Phosphorus,Hydrogen,Oxygen,Ruthenium,Sulfur; Phosphorus,Hydrogen,Oxygen,Sulfur,Phosphorus,Holmium,Ruthenium,Sulfur; Phosphorus,Hydrogen,Oxygen,Sulfur,Phosphorus,Hydrogen,Oxygen,Ruthenium,Sulfur
Silicon: 4 solutions => Silicon,Lithium,Cobalt,Nitrogen; Silicon,Lithium,Carbon,Oxygen,Nitrogen; Sulfur,Iodine,Lithium,Cobalt,Nitrogen; Sulfur,Iodine,Lithium,Carbon,Oxygen,Nitrogen
Silver: 2 solutions => Silicon,Livermorium,Erbium; Sulfur,Iodine,Livermorium,Erbium
Tennessine: 4 solutions => Tellurium,Nitrogen,Neon,Sulfur,Silicon,Neon; Tellurium,Nitrogen,Neon,Sulfur,Sulfur,Iodine,Neon; Tellurium,Nitrogen,Nitrogen,Einsteinium,Silicon,Neon; Tellurium,Nitrogen,Nitrogen,Einsteinium,Sulfur,Iodine,Neon
Tin: 1 solution => Titanium,Nitrogen
Xenon: 2 solutions => Xenon,Nobelium,Nitrogen; Xenon,Nitrogen,Oxygen,Nitrogen
$
So the “most elementally encodable element” is Phosphorus, which has six distinct encodings; two elements (Silicon and Tennessine) have 4, seven elements (Arsenic, Bismuth, Carbon, Copper, Oganesson, Silver and Xenon) have 2, and five elements (Astatine, Iron, Krypton, Neon, and Tin) have only a single encoding.[2]
Encoding Other Things
Obviously you can try any words you like, so I tried encoding my name, to reach the conclusion that none of my names are encodable!
One could even develop a whole new writing constraint (“Elemental writing”) in the spirit of Oulipo or Anthony Etherin’s work using only words that are elementally encodable.
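For anyone tempted by that constraint, here is a minimal sketch of a checker that only answers “is this word encodable?” without listing the encodings. It is a variation on the recursive code above, not the script itself: the names `SYMBOLS`, `is_encodable` and `encodable_words` are invented for this example, the symbols are held in a set since no names are needed, and it assumes words are separated by whitespace and carry no punctuation.

```python
from functools import lru_cache

# All 118 element symbols, in atomic-number order.
SYMBOLS = frozenset("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I
Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt
Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr
Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())


@lru_cache(maxsize=None)
def is_encodable(word):
    """True if the whole word can be split into element symbols."""
    if not word:
        # Nothing left to encode, so everything before this matched.
        return True
    return any(
        word[:n].capitalize() in SYMBOLS and is_encodable(word[n:])
        for n in (1, 2)
        if n <= len(word)
    )


def encodable_words(text):
    """Keep only the whitespace-separated words that are encodable."""
    return [word for word in text.split() if is_encodable(word)]


print(encodable_words("carbon argon neon iron barium tin"))
```

The cache means each distinct suffix is examined only once, which hardly matters for single words but keeps things tidy if you feed it a whole text.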
What Did We Learn?
Probably not much, but perhaps these are conclusions one can draw.
Python is a good language for small text-processing pieces of code like that needed here.
You can have weird ideas (and ask weird questions) when you have a fever.
You can’t stop programmers writing code!
You should do all you can to avoid Covid-19. Get vaccinated, wear a mask, take tests, and isolate to protect others.
And, finally : Beryllium Lutetium,Carbon,Potassium,Yttrium :-)
I use UK English element name spellings (as, it seems, does that Wikipedia page), so I checked “Aluminium”, not “Aluminum”, though it makes no difference. Neither spelling is elementally encodable.