LeetCode 726. Number of Atoms
Given a chemical formula (given as a string), return the count of each atom.
An atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.
1 or more digits representing the count of that element may follow if the count is greater than 1. If the count is 1, no digits will follow. For example,
H2OandH2O2are possible, butH1O2is impossible.Two formulas concatenated together produce another formula. For example,
H2O2He3Mg4is also a formula.A formula placed in parentheses, and a count (optionally added) is also a formula. For example,
(H2O2)and(H2O2)3are formulas.Given a formula, output the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.
Formula =
K4(ON(SO3)2)2Output:
K4N2O14S4
Solution:
There are three cases then needs to be handled:
In case it is
)then next character can be either(or a digit. In both cases we multiply by either 1 or the digit. Before multiplying, add all the elements till(is encountered in stack.In case of
(then just add to stack.In case of any other character then just add to stack in the form of dictionary with its count added as value and element as key.
In the end just add all elements in the stack and sort it.
import reclass Solution: def countOfAtoms(self, formula): def add_up(formulas): ''' add up all the elements provided in a list''' result = {} for i in range(len(formulas)): result = {f: formulas[i].get(f, 0) + result.get(f, 0) for f in set(result).union(formulas[i])} return result def mul_up(formula, factor): '''multiply the formula with the factor''' result = {} for k, v in formula.iteritems(): result[k] = v*factor return result def get_matches(regex, string): return re.match(regex, string) formula = "(" + formula + ")" stack = [] i = 0 while i < len(formula): c = formula[i] if c == "(": stack.append(c) i += 1 continue elif (c == ")" and i == len(formula)-1): cc = stack.pop() list_add_open_element = [] while cc != "(": list_add_open_element.append(cc) cc = stack.pop() stack.append(add_up(list_add_open_element)) i += 1 continue elif (c == ")" and (i+1 < len(formula) and formula[i+1].isdigit())) or (c == ")" and (i+1 < len(formula) and formula[i+1] == "(")): cc = stack.pop() list_add_open_element = [] #list to add all element up to ( while cc != "(": list_add_open_element.append(cc) cc = stack.pop() r = add_up(list_add_open_element) j = i+1 factor = "" while j < len(formula) and formula[j].isdigit(): factor += formula[j] j += 1 if not factor: factor = 1 i += 1 else: i = j cc = mul_up(r, int(factor)) stack.append(cc) continue element = get_matches(ur"([A-Z][a-z]?\d*)", formula[i:]) key = get_matches(ur"([A-Z][a-z]?)", element.group()) value = re.search(ur"([0-9]+)", element.group()) stack.append({key.group():(1 if not value else int(value.group()))}) i += len(key.group()) + (len(value.group()) if value else 0) result = [(k, v) for k, v in stack[0].iteritems()] result.sort() return "".join(str(i) + (str(j) if int(j) > 1 else "") for i,j in result)1 Answer1
You have a couple of good ideas:
- Using regular expressions to assist with the parsing
- Splitting out some of the code into smaller functions
Unfortunately, those ideas were ineffectively applied, such that the main code is still very complicated and difficult to follow.
Iterating through the formula character by character (using yourwhile i < len(formula) loop) defeats the purpose of using regular expressions. Furthermore, you should not need tests likec == "(",c == ")", and.isdigit().
Rather, the parsing should be mainly driven byre.finditer(), using one regular expression that is constructed to classify everything that it can encounter:
- atomic element (possibly followed by a number)
- opening parenthesis
- closing parenthesis (possibly followed by a number)
Each of those tokens should have a named capture group to help you figure out what was matched.
Strictly speaking, your regex"([A-Z][a-z]?)" makes an assumption that does not meet the specification:
All atom names consist of lowercase letters, except for the first character which is uppercase.
Three-letter element abbreviations such asUue are possible.
Finally, youradd_up() function did not need to be written — you've just reinvented the addition method ofcollections.Counter.
Suggested solution
Unfortunately, LeetCode expects the solution to be packaged inside a weirdSolution class that is not really a class, but a namespace. (It calls thecountOfAtoms() "method" even though there is no meaningful constructor.) I've decided to tweak it into a@classmethod instead.
from collections import Counterimport reclass Solution(object): RE = re.compile( r'(?P<atom>[A-Z][a-z]*)(?P<atom_count>\d*)|' r'(?P<new_group>\()|' r'\)(?P<group_count>\d*)|' r'(?P<UNEXPECTED_CHARACTER_IN_FORMULA>.+)' ) @classmethod def atom_count(cls, stack, atom, atom_count='', **_): """Handle an atom with an optional count, e.g. H or Mg2""" stack[-1][atom] += (1 if atom_count == '' else int(atom_count)) @classmethod def new_group(cls, stack, **_): """Handle an opening parenthesis""" stack.append(Counter()) @classmethod def group_count(cls, stack, group_count='', **_): """Handle a closing parenthesis with an optional group count""" group_count = 1 if group_count == '' else int(group_count) group = stack.pop() for atom in group: group[atom] *= group_count stack[-1] += group @classmethod def countOfAtoms(cls, formula): stack = [] cls.new_group(stack) for m in cls.RE.finditer(formula): getattr(cls, m.lastgroup)(stack, **m.groupdict()) return ''.join( atom + (str(count) if count > 1 else '') for atom, count in sorted(stack.pop().items()) )You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

