Crypto Cocktail Party : Blockchain Pt.1

Know just enough about crypto to show off at a cocktail party

Seungwoo Kim
Web3 Surfers

--

Every coin listed today claims to be changing the world. Every Web3 service on the market calls itself a prophet of the new age. To critically assess these claims and understand the bigger picture, it is essential to have at least a cocktail-party knowledge of Web3 and its various components.

And where better to start than that which enabled it all — blockchains?

The Introduction

Nothing really beats Wikipedia’s definition of a blockchain.

A blockchain is a growing list of records, called blocks, that are securely linked together using cryptography.

It’s basically a way of recording transactions in a (comparatively) immutable fashion. The key features of a blockchain are security and decentralization. Web3 products and services are built on top of blockchain technology, using it for anything and everything, ranging from simply recording transactions to automatically executing code.

Without any prior knowledge of what a blockchain is, all of the above is basically just a word dump. I just wanted to put it out there as a reference point. In this article, we’ll focus more on the security aspect of blockchains, and tackle the decentralization part in future articles.

The Analogy: an Omniscient Spreadsheet

Imagine you work at a bank, where your sole job is to keep track of every transaction that ever occurred and record it into a giant spreadsheet.

Now, this is obviously not how a bank works at all, but the idea is the same: some sort of record exists to track transactions in order to validate history, modify balances, and verify future purchases. With this disclaimer in mind, let’s proceed with our analogy.

We can expect this all-knowing spreadsheet to look something like this:

Simon originally paid $3,000 for his mocha latte

John sends Simon $10,000. Simon then promptly buys a $3,000 drink at Starbuckz, an imaginary knock-off coffee brand. The flow of cash and goods is evident — all is well.

Let’s imagine that Simon comes to his senses and realizes that maybe he didn’t get the best bang for his buck with the $3,000 mocha latte. He goes to the bank and claims that such an exchange never happened, and therefore his account should have $3,000 more. Well, too bad for Simon, because that record is written in stone in your spreadsheet.

Simon gets another idea. What’s $3,000 if you can get another $10,000 from John? He goes to the bank and declares that John never sent him the $10,000 as per their previous contract. Well, too bad for Simon, again, because that record is also written in stone in your spreadsheet.

So far, things seem pretty good.

The Problem

Simon’s not happy. With his remaining $7,000, he decides to hire the best hacker in town. Simon asks her to hack into the bank and modify his coffee price within the omniscient spreadsheet. The hacker obliges.

You come to work the next day and open up the spreadsheet:

The spreadsheet now says that Simon only paid $5 for his mocha latte. What an abomination.

Well, this is awkward. You know for a fact that Simon purchased a $3,000 drink. The people at Starbuckz know that Simon purchased a $3,000 drink. The only problem is that your single source of truth, the all-knowing spreadsheet, is dead set that Simon only paid $5. Now, Simon comes strolling in, claiming that his balance is $2,995 short. There is no way to call him out.

Or, let’s say, in an alternate universe, Simon doesn’t hire a hacker and just decides to move on. A few years pass, and your spreadsheet keeps growing and growing. Then, something terrible happens. Maybe you were sloppy with pressing the save button. Or maybe your bank’s system is malfunctioning. Or maybe it was a coordinated attack by North Korea. Who knows? One day, you open up your sheet, and everything is gone. Or everything is wrong. Or every name is replaced with Kim Jong-Un. Something happened and your sheet is past the point of no return. You’re fired, the bank files for bankruptcy, Starbuckz follows soon after, the world spirals into anarchy, and Simon finally gets justice for his overpriced mocha latte.

What then? We’re clearly in need of a better system.

Solution # 1 : Hash Functions

For starters, here’s the most glaring problem with your spreadsheet. There’s really nothing preventing somebody like Simon’s hacker from rewriting history. Anybody with access to the database can simply scroll up, and change, delete, or add a row. This is far from ideal.

Enter hash functions.

A hash function, according to Wikipedia, “is any function that can be used to map data of arbitrary size to fixed-size values,” a.k.a. hashes. It is important to note that everything, from text to images to videos, can be represented as data. Therefore, a hash function is simply a black box where you can put in anything, from your favorite TikTok to Dostoevsky’s Crime and Punishment, and get back a value of fixed size.

There are many hash functions out there, but one of the most popular, and the one Bitcoin relies on, is SHA-256, a member of the Secure Hash Algorithm family that produces 256-bit hashes. Here’s an example of SHA-256 working its magic (using this online hash tool).

hello world => b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
on an exceptionally hot evening... => 5d3c04352b4c6f318811675672e2423ca2861601c7ed1d5e937e11e74f8f37a0

FYI, the second input is the entirety of Crime and Punishment.
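If you’d rather not trust an online tool, here is a minimal sketch of the same experiment using Python’s built-in hashlib module. The helper name sha256_hex is made up for illustration, and since I can’t paste an entire novel here, the long input is just a stand-in (the same phrase repeated until it is about a megabyte).

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a string of hex characters."""
    return hashlib.sha256(data).hexdigest()

short_input = b"hello world"
long_input = b"hello world" * 100_000  # ~1.1 MB stand-in for a long text

print(sha256_hex(short_input))
print(sha256_hex(long_input))

# Both hashes have the same length, no matter how big the input was.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))  # 64 64
```

64 hex characters at 4 bits each is where the “256” in SHA-256 comes from.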

The utility of hash functions is enormous. We now have a way of representing any data, regardless of size, as a fixed-size value. For practical purposes, we can think of a piece of data’s hash value as its fingerprint. It identifies the data in a concise and efficient way, and, for a good hash function, almost always uniquely. Using hashes, we can easily compare, contrast, and verify data without having to go through the data in its entirety. As we will see, this is an extremely useful concept for constructing immutable ledgers.

There are a few defining features of a good hash function that are worth mentioning.

Rule # 1 : One-way

You should be able to easily obtain a hash value given a piece of data by simply putting it into the hash function. This makes verifying whether a certain hash belongs to a piece of data a trivial task.

On the other hand, it should be near-impossible to retrieve the inputted data given just its hash value.

Given the data, you can get the hash. Given the hash, you (most likely) can’t get the data

To stretch the fingerprint analogy a bit, it would be very easy to check if a fingerprint belongs to a particular person if you can just take the fingerprint of that person. However, if all you have is a fingerprint (and you do not have access to a massive fingerprint database for bidirectional mapping), it would be very difficult to figure out the person to whom it belongs.
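To make the asymmetry concrete, here is a rough sketch in Python. The function names matches and guess_input are my own, and the list of guesses plays the role of the fingerprint database: going forward takes a single hash call, while going backward is nothing smarter than trying candidates until one fits.

```python
import hashlib
from typing import Iterable, Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def matches(candidate: str, target_hash: str) -> bool:
    """Forward direction: one hash call and one comparison."""
    return sha256_hex(candidate) == target_hash

def guess_input(target_hash: str, guesses: Iterable[str]) -> Optional[str]:
    """Reverse direction: no shortcut, just try the candidates one by one."""
    for guess in guesses:
        if matches(guess, target_hash):
            return guess
    return None

target = sha256_hex("hello world")

print(matches("hello world", target))                           # True
print(guess_input(target, ["hello", "world", "hello world"]))   # hello world
print(guess_input(target, ["hello", "world"]))                  # None
```

If the original input is not among the guesses, the search comes back empty-handed, and the space of possible inputs is far too large to try them all.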

Rule # 2 : Deterministic

Inputting the same data to the same hash function should result in the same hash, every single time. This adds to the “easy to verify” feature of hashes.

Randomness in a hash function would defeat the whole purpose. If a fingerprint belonged to me only about 80% of the time, that probably wouldn’t fly in court.
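A quick way to convince yourself, sketched with Python’s hashlib (the sample sentence is just an example): hash the same bytes twice, then once more in two separate pieces. Only the content matters, so all three results agree.

```python
import hashlib

data = b"John sends Simon $10,000"

# Hash the same bytes twice, using two independent hasher objects.
first = hashlib.sha256(data).hexdigest()
second = hashlib.sha256(data).hexdigest()

# Feed the same bytes in two chunks instead of all at once.
chunked = hashlib.sha256()
chunked.update(data[:10])
chunked.update(data[10:])

print(first == second == chunked.hexdigest())  # True
```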

Rule # 3 : Lack of Collisions

A good hash function should have very few collisions. A collision is when two distinct pieces of data result in the same hash. With any hash function, there is always a chance that this will happen — there can be two people with the same fingerprint — but the probability of this should be very low.

If collisions were common, using fingerprints would be as efficient as using eye colors to determine the criminal. Unless you’re some kind of anime character, there are bound to be millions of other people who share the same eye color.
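To see why the size of the hash matters, here is a toy experiment in Python. It is a sketch under a deliberately unfair assumption: instead of the full SHA-256 hash, we keep only its first 4 hex characters (16 bits), which turns it into an “eye color” style identifier. The names short_hash and find_collision, and the inputs "input-0", "input-1", and so on, are made up for illustration.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple

def short_hash(text: str, hex_chars: int) -> str:
    """Keep only the first `hex_chars` hex characters of the SHA-256 hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:hex_chars]

def find_collision(hex_chars: int) -> Tuple[str, str, str]:
    """Hash "input-0", "input-1", ... until two inputs share a short hash."""
    seen: Dict[str, str] = {}
    for i in count():
        text = f"input-{i}"
        h = short_hash(text, hex_chars)
        if h in seen:
            return seen[h], text, h
        seen[h] = text

first, second, shared = find_collision(hex_chars=4)
print(f"{first!r} and {second!r} both map to {shared}")
```

With only 65,536 possible values, a collision is guaranteed after at most 65,537 inputs and usually shows up after a few hundred. Nobody has publicly demonstrated a collision for the full 256-bit SHA-256 hash.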

Rule # 4 : The Avalanche Effect

A good hash function should be subject to the avalanche effect. That is, a small modification to the original data should result in an unrecognizable transformation of its hash value. For example:

hello world => b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
hello vorld => dc834355d45cfc75d1ce2251c56fc28e3b4ae00adb182757b1683fb21fa380a9

Just changing the “w” to a “v” results in a vastly different hash. If this were not the case, and there were some recognizable pattern between modifications of the original data and their resulting hash values, it would be much easier to reverse-engineer the input from the output. This would violate the one-way principle above.
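One way to put a number on “vastly different” is to count how many of the 256 bits flip between the two hashes. The sketch below does that in Python; the helper names sha256_int and differing_bits are my own.

```python
import hashlib

def sha256_int(text: str) -> int:
    """Return the SHA-256 hash of `text` as a 256-bit integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the two hashes disagree."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

changed = differing_bits("hello world", "hello vorld")
print(f"{changed} of 256 bits differ")
```

For a good hash function, the count should land somewhere near half of the bits, which is about what you would expect from two completely unrelated hashes.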

In summary, a hash value is a fixed-size, (mostly) unique identifier for any piece of data. It is easy to generate and verify, but very difficult to reproduce the source data from it.

Hashes as Links, or Chains

How can we use hashes to improve the security of our spreadsheet?

For every block of data, we can include the hash of the data that directly precedes it. By data, I mean everything — time, sender, receiver, you name it.

Note that the previous hash of the second row equals the hash of the first row’s data
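Here is a minimal sketch of that idea in Python. Everything about the layout is an assumption made for illustration: rows are plain dictionaries, a row is hashed by serializing all of its fields as JSON, and the very first row, which has nothing before it, stores a placeholder of 64 zeros.

```python
import hashlib
import json

FIRST_ROW_PREV_HASH = "0" * 64  # assumed placeholder; the first row has no predecessor

def hash_row(row: dict) -> str:
    """Hash every field of a row, including the previous hash it stores."""
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_row(ledger: list, sender: str, receiver: str, amount: int) -> None:
    """Add a row that carries the hash of the row directly before it."""
    prev_hash = hash_row(ledger[-1]) if ledger else FIRST_ROW_PREV_HASH
    ledger.append({
        "sender": sender,
        "receiver": receiver,
        "amount": amount,
        "prev_hash": prev_hash,
    })

ledger = []
append_row(ledger, "John", "Simon", 10_000)
append_row(ledger, "Simon", "Starbuckz", 3_000)

# The second row's previous hash equals the hash of the first row's data.
print(ledger[1]["prev_hash"] == hash_row(ledger[0]))  # True
```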

Now, what would happen if the hacker tried to modify a past transaction in our spreadsheet? She would still be able to do that, but not without consequences. If the hacker modified the data, its hash value would be entirely different from what it was originally (the avalanche effect). However, in our enhanced spreadsheet design, the next block of data would still contain the original hash value of the previous block, i.e. the one just modified. These won’t match, and we can quickly identify that something is wrong.

Modifying the second block changes its hash, which no longer matches the previous hash stored in the third block
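The check itself is short. Below is a sketch in Python under the same kind of assumptions as before: rows are dictionaries hashed via JSON, the last two transactions (Alice, Bob, Carol) are hypothetical filler, and first_broken_link is a made-up helper that returns the position of the first row whose stored previous hash no longer matches. Positions count from 0, so the article’s second block is row 1 and the third block is row 2.

```python
import hashlib
import json
from typing import Optional

def hash_row(row: dict) -> str:
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_ledger(transfers: list) -> list:
    """Chain the transfers together, each row storing the previous row's hash."""
    ledger = []
    for sender, receiver, amount in transfers:
        prev_hash = hash_row(ledger[-1]) if ledger else "0" * 64
        ledger.append({"sender": sender, "receiver": receiver,
                       "amount": amount, "prev_hash": prev_hash})
    return ledger

def first_broken_link(ledger: list) -> Optional[int]:
    """Return the position of the first row whose prev_hash is stale, if any."""
    for i in range(1, len(ledger)):
        if ledger[i]["prev_hash"] != hash_row(ledger[i - 1]):
            return i
    return None

ledger = build_ledger([
    ("John", "Simon", 10_000),
    ("Simon", "Starbuckz", 3_000),
    ("Alice", "Bob", 20),    # hypothetical later transactions
    ("Bob", "Carol", 7),
])
print(first_broken_link(ledger))  # None: every link checks out

ledger[1]["amount"] = 5           # the hacker rewrites the coffee price
print(first_broken_link(ledger))  # 2: the third block exposes the edit
```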

The hacker could quickly go and fix the next block of data (block # 3), such that its previous hash now reflects the modified version. However, this would mean modifying the data of block # 3 (specifically, its previous hash value), which would completely change the hash of that block. Now, its next block (block # 4), which holds the original hash of block # 3, would show a mismatch, indicating that something is wrong.

This process goes on for as long as the chain. The hacker will have to modify the entire chain, starting from the data she wanted to modify in the first place. The further back that point is, and the longer the chain, the more difficult it would be for the hacker.

Fixing block # 3 would break block # 4, fixing block # 4 would break block # 5, so on and so forth…
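The domino effect can be replayed in a few lines of Python. This is a sketch with the same assumed layout (dictionary rows hashed via JSON, hypothetical filler transactions after the coffee purchase, made-up helper names). The hacker edits one row, then patches whichever link is currently broken, over and over, and we record every row she is forced to touch. Positions count from 0.

```python
import hashlib
import json
from typing import Optional

def hash_row(row: dict) -> str:
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_ledger(transfers: list) -> list:
    ledger = []
    for sender, receiver, amount in transfers:
        prev_hash = hash_row(ledger[-1]) if ledger else "0" * 64
        ledger.append({"sender": sender, "receiver": receiver,
                       "amount": amount, "prev_hash": prev_hash})
    return ledger

def first_broken_link(ledger: list) -> Optional[int]:
    for i in range(1, len(ledger)):
        if ledger[i]["prev_hash"] != hash_row(ledger[i - 1]):
            return i
    return None

ledger = build_ledger([
    ("John", "Simon", 10_000),
    ("Simon", "Starbuckz", 3_000),
    ("Alice", "Bob", 20),    # hypothetical later transactions
    ("Bob", "Carol", 7),
    ("Carol", "Dave", 42),
])

ledger[1]["amount"] = 5  # the original edit

patched = []
broken = first_broken_link(ledger)
while broken is not None:
    # Patching this row changes its own hash, which breaks the row after it.
    ledger[broken]["prev_hash"] = hash_row(ledger[broken - 1])
    patched.append(broken)
    broken = first_broken_link(ledger)

print(patched)  # [2, 3, 4]: every row after the edited one had to be rewritten
```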

The hacker would have to make all the modifications before a new block of data (transaction) is added. Otherwise, the list of blocks that she needs to modify would just keep growing and growing. If this were a busy network where many transactions occur in short periods of time, the hacker would have a very, very long night ahead.

Are we done? Hint: no

I don’t know about you, but the image above looks an awful lot like a chain of blocks, or, maybe, a blockchain.

Unfortunately, we’re not quite there yet.

The hacker could be a truly dedicated craftswoman. She may be so against overpriced coffee that she decides to take it upon herself to modify the entire series of transactions starting from Simon’s $3,000 purchase. It’s risky, but it’s possible. After all, she just needs to modify a single database, one to which she already has access. If she’s fast enough to do it before anybody notices, she might just be able to pull it off.
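To see how thin the protection still is, here is a sketch of that worst case in Python, again with an assumed layout (dictionary rows hashed via JSON, hypothetical filler transactions, made-up helper names rewrite_history and is_consistent). The hacker changes the price and re-links every later row in one go. Our consistency check, which only compares neighboring rows within the same database, comes back clean.

```python
import hashlib
import json

def hash_row(row: dict) -> str:
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_ledger(transfers: list) -> list:
    ledger = []
    for sender, receiver, amount in transfers:
        prev_hash = hash_row(ledger[-1]) if ledger else "0" * 64
        ledger.append({"sender": sender, "receiver": receiver,
                       "amount": amount, "prev_hash": prev_hash})
    return ledger

def is_consistent(ledger: list) -> bool:
    """True if every row stores the current hash of the row before it."""
    return all(ledger[i]["prev_hash"] == hash_row(ledger[i - 1])
               for i in range(1, len(ledger)))

def rewrite_history(ledger: list, position: int, new_amount: int) -> None:
    """Change one row, then re-link every row that comes after it."""
    ledger[position]["amount"] = new_amount
    for i in range(position + 1, len(ledger)):
        ledger[i]["prev_hash"] = hash_row(ledger[i - 1])

ledger = build_ledger([
    ("John", "Simon", 10_000),
    ("Simon", "Starbuckz", 3_000),
    ("Alice", "Bob", 20),    # hypothetical later transactions
    ("Bob", "Carol", 7),
])

rewrite_history(ledger, position=1, new_amount=5)

print(ledger[1]["amount"])    # 5
print(is_consistent(ledger))  # True: the forged chain looks perfectly healthy
```

Re-linking a handful of rows takes a computer a fraction of a second, so the chain of hashes alone only slows the hacker down if nothing else stands in her way.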

So what’s next? How can you, the super rookie of the spreadsheet industry, further enhance your database to fend off miscreants like Simon and his hacker?

Stay tuned for the next article to find out!
