Cryptopals Challenges Part 1: Repeated XOR, AES in ECB and more

This post is about the Cryptopals challenges, a collection of 48 cryptography challenges and my solution to them.

I’ve been looking for something to do over the weekends and came across this Reddit post from 3 years ago, asking for crypto challenges. The comments were filled with links to CTFs, wargames, and challenge sets. I started off with the top of the list.

The first module contains pretty trivial excercises but I found some to be helpful for later challenges. It’s also a good idea to build a foundation, I recommend Crypto101.

By no means am I a cryptography expert, it’s just a hobby, so take everything with a grain of salt. :)

I won’t post the flags, only my solutions.

Spoilers ahead!

Hex to base64

Many of the challenges in the first set can be done using tools like online converters/crackers but you are encouraged to write code.

The solution is trivial in python:

from base64 import b64encode

hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

string = bytes.fromhex(hex)

print(b64encode(string))

Fixed XOR

As straightforward as it can get.

t1 = bytes.fromhex("1c0111001f010100061a024b53535009181c")
t2 = bytes.fromhex("686974207468652062756c6c277320657965")

xor = bytes(a ^ b for a, b in zip(t1, t2))
print(xor.hex())

Single byte-xor

I found that the code used in this challenge is pretty important. The statement hints to use frequency analysis, so I hacked up some spaghetti chi-squared testing god knows what. Needless to say I find the character frequency method to be quite inaccurate and tedious, so I simply looked for the message with only printable characters. Regex is my crude yet effective friend. c:

import re

t = bytes.fromhex("1b37373331363f78151b7f2b783431333d78397828372d363c78373e\
783a393b3736")
regex = re.compile(r"[^a-zA-Z\.?!:\n' ]")

for key in range(0x00, 0xff):
	xor = bytes(a ^ key for a in t)
	try:
		string = xor.decode('utf-8')
	except UnicodeDecodeError:
		continue
	if not regex.findall(string):
		print("Found it:", string)

Loads of single byte-xor

Just in case you did the last one by hand, don’t. The code from #3 proved to be useful for this challenge. I downloaded the file and saved it in the same directory as my code then read it into python with:

with open('4.txt', 'r') as f:
	texts = f.read().splitlines()

Using the same regex I looped through to find the plaintext.

for th in texts:
    t = bytes.fromhex(th)
    for key in range(0x00, 0xff):
        xor = bytes(a ^ key for a in t)
        try:
            string = xor.decode('utf-8')
        except UnicodeDecodeError:
            continue
        if not regex.findall(string):
            print("Found it:", string)

What’s this? Vigenere!?

I implemented this challenge with numpy thinking it would be run faster, haven’t checked but hopefully I’m right. First I’ll import numpy with the ceiling function for a later calculation.

import numpy as np
from math import ceil

To convert the bytes into numpy arrays properly it needs to be made into a bytearray and dtype has to be an 8 bit unsigned integer.

t1 = bytearray(b"""Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal""")
blocks = np.array(t1, dtype="uint8")

key = bytearray(b"ICE")
key = np.array(key, dtype="uint8")

The cipher text is then made into a (n, 3) array (if it’s not a multiple of 3 then the columns in the last row will be padded) and xor-ed with the key.

n = len(blocks)

blocks.resize((ceil(n / len(key)), len(key)))

xor = np.bitwise_xor(blocks, key)
print(xor.tobytes()[:n].hex())

First Crack

The statement walks you through the process but you’re expected to write the code. It tells you that you need to first guess the key length using something called the edit distance. I had an intuition about cracking Vigenere but had a hard time comprehending how this method would work. The explanation I found was that the edit distance of 2 random set of bits would generally be greater than those not uniformly random.

I used the following function to compute the edit distance:

Calculate edit distance ✓

def hamming_distance(x: bytes, y: bytes) -> int:
    diff, xor = 0, int.from_bytes((a ^ b for a, b in zip(x, y)), 'big')
    while xor:
        diff += 1
        xor &= xor - 1
    return diff

We will have to break the bytes into a range of sizes and compute a normalized edit distance. The statement says that the correct key size will give the smallest normalized edit distance, but I found that it wasn’t always the case. It was after I finding the flag that I got this to work more reliably.

I used the numpy method from #4 with the instructions on the statement to crack the ciphertext. As I couldn’t find the correct length, it was bruteforced.

def crack(cipher_data: bytes, key_size: int) -> (bytes, bytes):
    fits = [(999999, 0)] * key_size
    
    # Preprocess cipher data into blocks
    cipher_bytes = bytearray(cipher_data)
    blocks = np.array(cipher_bytes, dtype='uint8')
    blocks.resize((ceil(len(blocks) / key_size), key_size))
    
    # Transpose blocks
    blocks_tp = np.transpose(blocks)
    for i in range(0x0, 0xff):
        xor = np.bitwise_xor(blocks_tp,
                             np.array([i] * blocks_tp.shape[1], dtype='uint8'))
        for j, row in enumerate(xor):
            try:
                string = row.tobytes().decode('utf-8')
                regex = findall(r"[^a-zA-Z\.?!\n': ]", string)
                if len(regex) < fits[j][0]:
                    fits[j] = (len(regex), i)
            except ValueError:
                pass
    key = np.array(fits, dtype='uint8')
    decode = np.bitwise_xor(blocks, key[:,1])
    return key.tobytes(), decode.tobytes()

After cracking I realised that determining the key size would be more accurate if I averaged more edit distances. Eventually I got the key size and the closure I needed.

def key_size(cipher_data: bytes, start=2, end=40) -> list:
    '''
    Determines the key size for the given ciphertext by finding the
    smallest normalized Hamming distance
    '''
    candidates_stack = []
    for size in range(start, end):
        blocks = [cipher_data[i: i + size] for i in range(0, len(cipher_data), size)]
        ed = 0
        i = 0
        for a, b in zip(blocks, blocks[1:]):
            try:
                ed += hamming_distance(a, b)
                i += 1
            except AssertionError:
                break
        normalize = ed / (i * size)
        candidates_stack.append((size, normalize))
    candidates_stack.sort(key=lambda x: x[1])
    return candidates_stack

AES in the mode that shall not be named

me: import more modules than China importing planes.

Ya no. We only need the cryptography module. I’ve read about this mode in a book, it’s not a particular good mode when the section mentioning it has “naive” in its title. In ECB mode, the message is divided into blocks and encrypted individually without any additional actions. The issue here is that identical blocks will always map to the same output block, which we will see in #8.

The Base64 file had to be converted, I did so with OpenSSL.

openssl enc -d -in 7.txt -out challenge_7_data.bin -base64

Then this snippet gave me the result:

from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import ECB
from cryptography.hazmat.backends import default_backend

# Setup parameters for Cipher
backend = default_backend()
key = b'YELLOW SUBMARINE'

# Read ciphertext bytes
with open('challenge_7_data.bin', 'rb') as f:
    ciphertext = f.read()

# Setup cipher instance
cipher = Cipher(AES(key), ECB(), backend)
dcrpt = cipher.decryptor()

plaintext = dcrpt.update(ciphertext) + dcrpt.finalize()
print(plaintext.decode('utf-8'))

Graduating from bootcamp?

The final challenge for the 1st set is about spotting a ciphertext encrypted in ECB. Using the intuition that in ECB mode, blocks will be duplicated, the solution seemed apparent to me. My approach is to simply find an inconsistent number of unique blocks.

I saved the file into the same directory as my code then wrote the solution.

with open('challenge_8.txt', 'r') as f:
    ciphertexts = map(bytes.fromhex, f.read().splitlines())

block_size = 16

for ciphertext in ciphertexts:
    n = len(ciphertext)
    blocks = set(ciphertext[i:i + block_size] for i in range(0, n, block_size))
    if len(blocks) < n / block_size:
        print("Found: ", ciphertext)

Closing thoughts

The challenges are pretty straightforward, mostly about implementation so far. For someone who hasn’t taken a course on cryptography, I think it’s a great way to learn. I’ll continue with a writeup on the next module and maybe implement the solutions in another language (I’m thinking Go), in the future.

nankeen