This post is about the Cryptopals challenges, a collection of 48 cryptography challenges, and my solutions to the first set.
I’ve been looking for something to do over the weekends and came across this Reddit post from 3 years ago, asking for crypto challenges. The comments were filled with links to CTFs, wargames, and challenge sets. I started off with the top of the list.
The first module contains pretty trivial exercises, but I found some of them helpful for later challenges. It’s also a good idea to build a foundation first; I recommend Crypto101.
By no means am I a cryptography expert; it’s just a hobby, so take everything with a grain of salt. :)
I won’t post the flags, only my solutions.
Many of the challenges in the first set can be done using tools like online converters/crackers but you are encouraged to write code.
The solution is trivial in python:
from base64 import b64encode

hex_string = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"
string = bytes.fromhex(hex_string)
print(b64encode(string))
As straightforward as it can get.
t1 = bytes.fromhex("1c0111001f010100061a024b53535009181c")
t2 = bytes.fromhex("686974207468652062756c6c277320657965")
xor = bytes(a ^ b for a, b in zip(t1, t2))
print(xor.hex())
The code from this challenge turned out to be pretty important later on. The statement hints at using frequency analysis, so I hacked up some spaghetti chi-squared testing god knows what. Needless to say, I found the character frequency method quite inaccurate and tedious, so I simply looked for the message made up only of letters, spaces, and basic punctuation. Regex is my crude yet effective friend. c:
import re

t = bytes.fromhex("1b37373331363f78151b7f2b783431333d78397828372d363c78373e"
                  "783a393b3736")
# Anything that is not a letter, a space, or basic punctuation counts as a miss
regex = re.compile(r"[^a-zA-Z\.?!:\n' ]")
# range(0x100) covers every possible key byte, including 0xff
for key in range(0x100):
    xor = bytes(a ^ key for a in t)
    try:
        string = xor.decode('utf-8')
    except UnicodeDecodeError:
        continue
    if not regex.findall(string):
        print("Found it:", string)
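For comparison, here is a minimal sketch of the frequency idea the statement hints at. It is not my chi-squared attempt and not what I ended up using: it is a plain weighted sum, the frequency table holds approximate values, and the names `FREQ`, `english_score` and `best_single_byte_key` are made up for this example.

```python
# Rough relative frequencies (percent) of common English characters.
# Approximate values, only good enough for ranking candidates.
FREQ = {
    ' ': 13.0, 'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0,
    'n': 6.7, 's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'u': 2.8,
}


def english_score(data: bytes) -> float:
    """Higher means more English-like."""
    score = 0.0
    for b in data:
        ch = chr(b).lower()
        if ch in FREQ:
            score += FREQ[ch]
        elif b != 0x0a and not 0x20 <= b < 0x7f:
            # Penalise control characters and non-ASCII bytes
            score -= 10.0
    return score


def best_single_byte_key(ciphertext: bytes) -> int:
    """Try all 256 keys and keep the one with the best-scoring plaintext."""
    return max(range(256),
               key=lambda k: english_score(bytes(b ^ k for b in ciphertext)))


# Usage on a made-up sample (not the challenge data)
sample = b"the quick brown fox jumps over the lazy dog"
encrypted = bytes(b ^ 0x5a for b in sample)
print(hex(best_single_byte_key(encrypted)))  # 0x5a
```

Most of the signal comes from the spaces: a wrong key almost never turns the most common ciphertext byte into a space while keeping everything else printable.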
Just in case you did the last one by hand, don’t. The code from #3 proved to be useful for this challenge. I downloaded the file and saved it in the same directory as my code then read it into python with:
with open('4.txt', 'r') as f:
    texts = f.read().splitlines()
Using the same regex I looped through to find the plaintext.
for th in texts:
    t = bytes.fromhex(th)
    # range(0x100) covers every possible key byte, including 0xff
    for key in range(0x100):
        xor = bytes(a ^ key for a in t)
        try:
            string = xor.decode('utf-8')
        except UnicodeDecodeError:
            continue
        if not regex.findall(string):
            print("Found it:", string)
I implemented this challenge with numpy, thinking it would run faster; I haven’t checked, but hopefully I’m right. First I import numpy, along with the ceiling function for a later calculation.
import numpy as np
from math import ceil
To convert the bytes into numpy arrays properly it needs to be made into a bytearray and dtype has to be an 8 bit unsigned integer.
# The plaintext spans two lines; the newline is part of the message
t1 = bytearray(b"""Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal""")
blocks = np.array(t1, dtype="uint8")
key = bytearray(b"ICE")
key = np.array(key, dtype="uint8")
The text is then resized in place into an (n, 3) array (if its length is not a multiple of 3, the remaining columns in the last row are padded with zeros) and xor-ed with the key, which numpy broadcasts across every row. The padding is sliced off again before printing.
n = len(blocks)
blocks.resize((ceil(n / len(key)), len(key)))
xor = np.bitwise_xor(blocks, key)
print(xor.tobytes()[:n].hex())
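As a cross-check for the numpy version, repeating-key XOR can also be sketched in pure Python with `itertools.cycle`. The function name `repeating_key_xor` is just a label chosen for this example.

```python
from itertools import cycle


def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plaintext = (b"Burning 'em, if you ain't quick and nimble\n"
             b"I go crazy when I hear a cymbal")
ciphertext = repeating_key_xor(plaintext, b"ICE")
print(ciphertext.hex())

# XOR is its own inverse, so the same call decrypts
assert repeating_key_xor(ciphertext, b"ICE") == plaintext
```

No padding is needed here, because `zip` stops as soon as the data runs out.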
The statement walks you through the process, but you’re expected to write the code. It tells you to first guess the key length using something called the edit distance. I had an intuition about cracking Vigenère but had a hard time comprehending how this method would work. The explanation I found was that the edit distance between two uniformly random sets of bits is generally greater than the distance between two that are not uniformly random. When the blocks are cut at the correct key length, the key cancels out in the XOR and what is left is the distance between two chunks of plaintext, which is comparatively small.
I used the following function to compute the edit distance:
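def hamming_distance(x: bytes, y: bytes) -> int:
    # Only equal-length inputs make sense; zip() would silently truncate
    assert len(x) == len(y)
    diff, xor = 0, int.from_bytes(bytes(a ^ b for a, b in zip(x, y)), 'big')
    # Each iteration clears the lowest set bit, so the loop counts set bits
    while xor:
        diff += 1
        xor &= xor - 1
    return diff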
We have to break the bytes into blocks for a range of key sizes and compute a normalized edit distance for each. The statement says that the correct key size will give the smallest normalized edit distance, but I found that this wasn’t always the case. It was only after finding the flag that I got this to work more reliably.
I used the numpy method from #5 with the instructions in the statement to crack the ciphertext. Since I couldn’t pin down the correct key length at first, I brute-forced it.
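To make the idea concrete, here is a rough sketch of the simplest variant the statement describes: compare only the first two blocks for each candidate size. The helper names `hamming` and `naive_key_size_scores` are made up for this example, and the popcount is done with `bin()` instead of the bit trick above.

```python
def hamming(x: bytes, y: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


def naive_key_size_scores(data: bytes, start: int = 2, end: int = 40) -> list:
    """Return (normalized distance, key size) pairs, best candidate first."""
    scores = []
    for size in range(start, end + 1):
        first, second = data[:size], data[size:2 * size]
        if len(second) < size:
            break  # not enough data for two full blocks
        scores.append((hamming(first, second) / size, size))
    return sorted(scores)


# Sanity check with the pair of strings given in the challenge statement
print(hamming(b"this is a test", b"wokka wokka!!!"))  # 37
```

With only two blocks per size, a few unlucky bytes are enough to push a wrong size to the top, which would explain the unreliable results.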
from re import findall

def crack(cipher_data: bytes, key_size: int) -> (bytes, bytes):
    # One (number of unexpected characters, key byte) pair per key position
    fits = [(999999, 0)] * key_size
    # Preprocess cipher data into blocks of key_size bytes (zero-padded)
    cipher_bytes = bytearray(cipher_data)
    blocks = np.array(cipher_bytes, dtype='uint8')
    blocks.resize((ceil(len(blocks) / key_size), key_size))
    # Transpose blocks so that row j holds every byte xor-ed with key byte j
    blocks_tp = np.transpose(blocks)
    for i in range(0x100):
        xor = np.bitwise_xor(blocks_tp, np.uint8(i))
        for j, row in enumerate(xor):
            try:
                string = row.tobytes().decode('utf-8')
            except ValueError:
                continue
            misses = findall(r"[^a-zA-Z\.?!\n': ]", string)
            if len(misses) < fits[j][0]:
                fits[j] = (len(misses), i)
    # Keep only the best key byte for each position
    key = np.array([k for _, k in fits], dtype='uint8')
    decode = np.bitwise_xor(blocks, key)
    # Slice off the zero padding added by resize
    return key.tobytes(), decode.tobytes()[:len(cipher_data)]
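The transposition step is easier to picture without numpy. In this small sketch (the name `transpose_blocks` is only for illustration), slice `i` collects every byte that was XOR-ed with key byte `i`, so each slice becomes its own single-byte XOR problem as in #3.

```python
def transpose_blocks(data: bytes, key_size: int) -> list:
    """Group the bytes by the key position they were encrypted with."""
    return [data[i::key_size] for i in range(key_size)]


print(transpose_blocks(b"abcdefgh", 3))  # [b'adg', b'beh', b'cf']
```

Unlike the resized numpy array, the slices are not padded, so the last ones may be one byte shorter.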
After cracking I realised that determining the key size would be more accurate if I averaged more edit distances. Eventually I got the key size and the closure I needed.
def key_size(cipher_data: bytes, start=2, end=40) -> list:
    '''
    Determines the key size for the given ciphertext by finding the
    smallest normalized Hamming distance
    '''
    candidates_stack = []
    for size in range(start, end):
        blocks = [cipher_data[i: i + size] for i in range(0, len(cipher_data), size)]
        ed = 0
        i = 0
        # Average over every pair of neighbouring blocks, not just the first two
        for a, b in zip(blocks, blocks[1:]):
            try:
                ed += hamming_distance(a, b)
                i += 1
            except AssertionError:
                break
        if i == 0:
            continue  # not enough data for this key size
        normalize = ed / (i * size)
        candidates_stack.append((size, normalize))
    # Smallest normalized distance first
    candidates_stack.sort(key=lambda x: x[1])
    return candidates_stack
me: import more modules than China importing planes.
Ya, no. We only need the cryptography module. I’ve read about this mode in a book; when the section covering it has “naive” in its title, you know it’s not a particularly good mode. In ECB mode, the message is divided into blocks and each block is encrypted independently with the same key, without any chaining or randomization. The issue is that identical plaintext blocks always map to identical ciphertext blocks, which we will see in #8.
The Base64 file had to be converted, I did so with OpenSSL.
openssl enc -d -in 7.txt -out challenge_7_data.bin -base64
Then this snippet gave me the result:
from cryptography.hazmat.primitives.ciphers import Cipher
from cryptography.hazmat.primitives.ciphers.algorithms import AES
from cryptography.hazmat.primitives.ciphers.modes import ECB
from cryptography.hazmat.backends import default_backend

# Setup parameters for Cipher
backend = default_backend()
key = b'YELLOW SUBMARINE'

# Read ciphertext bytes
with open('challenge_7_data.bin', 'rb') as f:
    ciphertext = f.read()

# Setup cipher instance
cipher = Cipher(AES(key), ECB(), backend)
dcrpt = cipher.decryptor()
plaintext = dcrpt.update(ciphertext) + dcrpt.finalize()
print(plaintext.decode('utf-8'))
The final challenge for the 1st set is about spotting a ciphertext encrypted in ECB. Since identical plaintext blocks produce identical ciphertext blocks in ECB mode, the solution seemed apparent to me. My approach is to simply look for the ciphertext that has fewer unique 16-byte blocks than blocks in total.
I saved the file into the same directory as my code then wrote the solution.
with open('challenge_8.txt', 'r') as f:
    ciphertexts = map(bytes.fromhex, f.read().splitlines())

block_size = 16
for ciphertext in ciphertexts:
    n = len(ciphertext)
    blocks = set(ciphertext[i:i + block_size] for i in range(0, n, block_size))
    if len(blocks) < n / block_size:
        print("Found: ", ciphertext)
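If more than one line had repeats, counting them would let you rank the candidates instead of printing every hit. A small sketch of that idea follows; the helper name `repeated_blocks` is made up, and the byte strings are synthetic stand-ins rather than the challenge data.

```python
def repeated_blocks(ciphertext: bytes, block_size: int = 16) -> int:
    """Count how many blocks duplicate an earlier block."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext), block_size)]
    return len(blocks) - len(set(blocks))


# Synthetic stand-ins: ten distinct blocks vs. one block appearing three times
no_repeats = bytes(range(160))
block = b"A" * 16
with_repeats = block + bytes(range(32)) + block + bytes(range(32, 48)) + block

candidates = [no_repeats, with_repeats]
most_suspicious = max(candidates, key=repeated_blocks)
print(repeated_blocks(no_repeats), repeated_blocks(with_repeats))  # 0 2
```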
The challenges are pretty straightforward so far, mostly about implementation. For someone who hasn’t taken a course on cryptography, I think it’s a great way to learn. I’ll continue with a writeup on the next module and maybe implement the solutions in another language (I’m thinking Go) in the future.