[Advent of Code 2021] Day 14 | Extended Polymerization


Part 1

The incredible pressures at this depth are starting to put a strain on your submarine. The submarine has polymerization equipment that would produce suitable materials to reinforce the submarine, and the nearby volcanically-active caves should even have the necessary input elements in sufficient quantities.

The submarine manual contains instructions for finding the optimal polymer formula; specifically, it offers a polymer template and a list of pair insertion rules (your puzzle input). You just need to work out what polymer would result after repeating the pair insertion process a few times.

For example:

NNCB

CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C

The first line is the polymer template - this is the starting point of the process.

The following section defines the pair insertion rules. A rule like AB -> C means that when elements A and B are immediately adjacent, element C should be inserted between them. These insertions all happen simultaneously.

So, starting with the polymer template NNCB, the first step simultaneously considers all three pairs:

  • The first pair (NN) matches the rule NN -> C, so element _C_ is inserted between the first N and the second N.
  • The second pair (NC) matches the rule NC -> B, so element _B_ is inserted between the N and the C.
  • The third pair (CB) matches the rule CB -> H, so element _H_ is inserted between the C and the B.

Note that these pairs overlap: the second element of one pair is the first element of the next pair. Also, because all pairs are considered simultaneously, inserted elements are not considered to be part of a pair until the next step.
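The overlapping pairs can be extracted with a simple slice; a minimal sketch on the example template:

```python
template = "NNCB"

# Every character except the last starts exactly one pair,
# so consecutive pairs overlap by one character.
pairs = [template[i:i+2] for i in range(len(template) - 1)]
print(pairs)  # → ['NN', 'NC', 'CB']
```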

After the first step of this process, the polymer becomes N_C_N_B_C_H_B.

Here are the results of a few steps using the above rules:

Template:     NNCB
After step 1: NCNBCHB
After step 2: NBCCNBBBCBHCB
After step 3: NBBBCNCCNBBNBNBBCHBHHBCHB
After step 4: NBBNBNBBCCNBCNCCNBBNBBNBBBNBBNBBCBHCBHHNHCBBCBHCB

This polymer grows quickly. After step 5, it has length 97; after step 10, it has length 3073. After step 10, B occurs 1749 times, C occurs 298 times, H occurs 161 times, and N occurs 865 times; taking the quantity of the most common element (B, 1749) and subtracting the quantity of the least common element (H, 161) produces 1749 - 161 = _1588_.

Apply 10 steps of pair insertion to the polymer template and find the most and least common elements in the result. What do you get if you take the quantity of the most common element and subtract the quantity of the least common element?


Proposed solution: create an insertion map for the polymers from the input data, take the initial template and apply 10 steps of the process; generate a new string each round with the inserted values

Time complexity: O(2^n) where n is the number of steps in the process

Space complexity: O(2^n)

Each step of the process roughly doubles the length of the string, so both time and space grow exponentially with the number of steps.
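More precisely, each step inserts one character between every adjacent pair, so the length follows the recurrence len' = 2 * len - 1, which solves to (L - 1) * 2^k + 1 after k steps. A quick check against the lengths quoted in the example:

```python
def length_after(initial_len, steps):
    # Each step inserts one character per adjacent pair,
    # so the length recurrence is len' = 2 * len - 1,
    # with closed form (initial_len - 1) * 2**steps + 1.
    return (initial_len - 1) * 2 ** steps + 1

# Template NNCB has length 4; compare with the example's step lengths.
print([length_after(4, k) for k in (1, 2, 3, 4, 5, 10)])
# → [7, 13, 25, 49, 97, 3073]
```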

#!/usr/bin/env python3
import sys

if len(sys.argv) != 2:
    print("Usage: {} <input file>".format(sys.argv[0]))
    sys.exit(1)

file_input = open(sys.argv[1], "r").read().strip().split("\n")

insertion_map = {}
for line in file_input[2:]:
    if line == "":
        continue
    template, insertion = line.split()[0], line.split()[-1]
    insertion_map[template] = insertion

start = file_input[0]
for i in range(10):
    result = ""
    for j in range(len(start) - 1):
        template = start[j:j+2]
        if j == 0:
            result += template[0]
        result += insertion_map[template] + template[1]
    start = result

freq_count = {char:start.count(char) for char in set(start)}

least = min(freq_count.values())
most = max(freq_count.values())
print("Most - Least common = {}".format(most - least))
Exponential growth oh no

This works fine mainly because the number of steps is relatively small and 2^10 is only 1024. The first block populates the insertion dictionary, mapping each input pair to its respective output character.

insertion_map = {}
for line in file_input[2:]:
    if line == "":
        continue
    template, insertion = line.split()[0], line.split()[-1]
    insertion_map[template] = insertion
Data structure setup

The next block iterates through the steps; at the start of each step (when j == 0), the first character of the string is added to the result. Then, for every pair in the current polymer, the insertion character is appended followed by the second character of the pair.

start = file_input[0]
for i in range(10):
    result = ""
    for j in range(len(start) - 1):
        template = start[j:j+2]
        if j == 0:
            result += template[0]
        result += insertion_map[template] + template[1]
    start = result

freq_count = {char:start.count(char) for char in set(start)}

least = min(freq_count.values())
most = max(freq_count.values())
This might get out of control...

Calculate the least and most frequent characters in the final string.
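As an aside, the same frequency count can be written with collections.Counter, which makes a single pass over the string instead of one str.count() call per distinct character; a small sketch on the step-1 polymer from the example:

```python
from collections import Counter

# Counter tallies every character in one pass.
freq_count = Counter("NCNBCHB")  # N:2, C:2, B:2, H:1
most, least = max(freq_count.values()), min(freq_count.values())
print(most - least)  # → 1
```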

❯ python3 solution14.py input14
Most - Least common = 2937

Part 2

The resulting polymer isn't nearly strong enough to reinforce the submarine. You'll need to run more steps of the pair insertion process; a total of 40 steps should do it.

In the above example, the most common element is B (occurring 2192039569602 times) and the least common element is H (occurring 3849876073 times); subtracting these produces _2188189693529_.

Apply 40 steps of pair insertion to the polymer template and find the most and least common elements in the result. What do you get if you take the quantity of the most common element and subtract the quantity of the least common element?


Proposed solution: the previous solution will not work as we might encounter the heat death of the universe before the calculation is complete – or just run out of memory; however, similar to what I did in day 6, we can keep track of the individual states and their frequencies instead of the entire string; each single two-letter state will map to two individual two-letter states as output for a single step

Time complexity: O(n)

Space complexity: O(1)

The number of unique characters that can appear in the polymer has some impact on space and time, but it is fixed by the problem definition rather than a scaling factor, so we treat it as constant.
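To make the state idea concrete before the full solution: on the example template NNCB, the initial pair-frequency dictionary is built like this (a minimal sketch of the setup step):

```python
start = "NNCB"
templates = {}
# Count every overlapping pair in the starting template.
for j in range(len(start) - 1):
    pair = start[j:j+2]
    templates[pair] = templates.get(pair, 0) + 1
print(templates)  # → {'NN': 1, 'NC': 1, 'CB': 1}
```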

#!/usr/bin/env python3
import sys

if len(sys.argv) != 2:
    print("Usage: {} <input file>".format(sys.argv[0]))
    sys.exit(1)

file_input = open(sys.argv[1], "r").read().strip().split("\n")

insertion_map = {}
for line in file_input[2:]:
    if line == "":
        continue
    template, insertion = line.split()[0], line.split()[-1]
    insertion_map[template] = insertion

start = file_input[0]
templates = {}
for j in range(len(start)-1):
    template = start[j:j+2]
    templates[template] = templates.get(template, 0) + 1

freq_count = {char:start.count(char) for char in set(start)}
for i in range(40):
    for template, freq in dict(templates).items():
        templates[template] -= freq
        insertion = insertion_map[template]
        first, second = template[0] + insertion, insertion + template[1]
        templates[first] = templates.get(first, 0) + freq
        templates[second] = templates.get(second, 0) + freq
        freq_count[insertion] = freq_count.get(insertion, 0) + freq

least = min(freq_count.values())
most = max(freq_count.values())
print("Most - Least common = {}".format(most - least))
From exponential to linear and constant space

Our new templates dictionary maps each pair of adjacent characters in the polymer to its frequency. Each step, these frequencies are transferred to the pairs each one produces. For example, if the pair "AB" appears 30 times in the current polymer, "AB" maps to "C" for the insertion character, and no other pair produces "AB" this step, then the templates dictionary would contain the following:

{
  ...
  "AB": 0,
  "AC": 30,
  "CB": 30,
  ...
}
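Putting it together, the pair-frequency approach can be checked against the example from the puzzle text, which states that 40 steps on NNCB should give 2188189693529 (this sketch inlines the example rules instead of reading a file):

```python
# The sixteen pair insertion rules from the example.
rules_text = """CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C"""

insertion_map = dict(line.split(" -> ") for line in rules_text.splitlines())

start = "NNCB"
templates = {}
for j in range(len(start) - 1):
    pair = start[j:j+2]
    templates[pair] = templates.get(pair, 0) + 1

# Character counts start from the template; each insertion adds one.
freq_count = {char: start.count(char) for char in set(start)}
for _ in range(40):
    # Iterate over a snapshot so pairs produced this step are not
    # expanded again until the next step.
    for pair, freq in dict(templates).items():
        templates[pair] -= freq
        insertion = insertion_map[pair]
        first, second = pair[0] + insertion, insertion + pair[1]
        templates[first] = templates.get(first, 0) + freq
        templates[second] = templates.get(second, 0) + freq
        freq_count[insertion] = freq_count.get(insertion, 0) + freq

print(max(freq_count.values()) - min(freq_count.values()))
# → 2188189693529
```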

This solution is very similar to the other exponential growth problem in Advent of Code 2021 Day 6 – Lanternfish. Instead of keeping track of every copy of the objects, the number of distinct states is limited, so it's more efficient to track those states and their respective frequencies. This only works because each state deterministically produces the same next states.

❯ python3 solution14.py input14
Most - Least common = 3390034818249
It computes!