Hash Functions: A Developer's Guide
What hashing actually is, why the properties matter, which algorithms are safe and which aren't, and how to use them correctly in real code — including the critical distinction between cryptographic hashes and password hashing.
What Hashing Is
A hash function takes an input of any size and produces a fixed-size output — the hash, digest, or checksum. Think of it as a fingerprint for data.
SHA-256("Hello, world!")
= "315f5bdb76d078c43b8ac0064e4a0164612b1fce77c869345bfc94c75894edd3"
SHA-256("Hello, world?") // one character different
= "74a3435c33e3c7e41e3b55f92b59fbc62e19b68dbe80c6c717efaa5b5d1c6b18"
SHA-256("") // empty string still produces a hash
= "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
The output length is always the same regardless of input size: SHA-256 always produces 64 hex characters (256 bits), whether you hash a single byte or a 10GB file.
Crucially, hashing is one-way. Given the hash output, you cannot reconstruct the input. This is different from encryption, which is designed to be reversed with a key. You hash things you never need to retrieve; you encrypt things you do.
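The fixed-length property is easy to check with Python's standard `hashlib` module. This is a minimal sketch; the helper name `sha256_hex` is chosen here for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of 0 bytes, 13 bytes and 1,000,000 bytes all give 64 hex characters.
for sample in (b"", b"Hello, world!", b"x" * 1_000_000):
    digest = sha256_hex(sample)
    print(len(sample), len(digest), digest[:16] + "...")
```

Note that the function takes bytes, not text: a string has to be encoded (for example as UTF-8) before it can be hashed, and a different encoding gives a different digest.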
The Four Properties That Matter
Cryptographic hash functions need to satisfy specific mathematical properties to be useful for security. Not all hashes have all of these — and which ones you need depends on your use case.
1. Deterministic
Same input, always same output. No randomness. This is how verification works — if you hash a file today and again tomorrow and get the same result, the file hasn't changed.
2. Avalanche Effect
A single bit change in the input should flip approximately half the bits in the output. This is not obvious behavior — you might expect a small change to produce a similar output. The opposite is true of good hash functions.
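One way to see the avalanche effect is to flip a single input bit and count how many of the 256 output bits change. The sketch below does that with `hashlib`; the function name `bit_difference` and the sample input are illustrative.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"The quick brown fox"
# Flip only the lowest bit of the first byte ("T" becomes "U").
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(bit_difference(original, flipped), "of 256 output bits differ")
```

For a well-behaved hash the count should land near 128, with some spread from run to run of different inputs.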
SHA-256("The quick brown fox jumps over the lazy dog")
= d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
SHA-256("The quick brown fox jumps over the lazy dog.") // one trailing period added
= ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c
// The two digests share no visible pattern: the output tells you nothing about how similar the inputs were
3. Pre-image Resistance
Given a hash, it should be computationally infeasible to find any input that produces it. This is the "one-way" property. If you see the hash 315f5bdb..., you can't work backwards to find the input was "Hello, world!" — except by brute force (trying every possible input).
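The brute-force caveat matters whenever the input space is small. As a sketch, assume the hashed value is known to be a 4-digit PIN (a hypothetical scenario): only 10,000 candidates exist, so trying them all is instant even though SHA-256 itself is not reversed.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Hash a text string (UTF-8 encoded) and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force_pin(target_hex: str) -> Optional[str]:
    """Try every 4-digit PIN; return the one whose hash matches, else None."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None


target = sha256_hex("4831")      # pretend only this hash is known
print(brute_force_pin(target))   # recovers the PIN by exhaustive search
```

Pre-image resistance protects inputs drawn from a large, unpredictable space. It does nothing for inputs an attacker can simply enumerate.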
4. Collision Resistance
It should be infeasible to find two different inputs that produce the same hash. Since hash functions map an unbounded set of inputs to a fixed-size output, collisions must exist by the pigeonhole principle, but finding them should require astronomical computation.
// This should never happen in practice for a good hash:
hash("input A") === hash("input B") // where A !== B
// For MD5 and SHA-1, this has been done:
// MD5 collisions can be crafted in seconds; SHA-1 collisions still take substantial compute
The difference between "mathematically must exist" and "practically findable" is what separates broken from secure.
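Output size is what makes collisions practically findable or not. The sketch below deliberately truncates SHA-256 to 24 bits (an artificial weakening for illustration, not something the article's algorithms do) and finds a collision by remembering every truncated digest it has seen. By the birthday bound this usually takes a few thousand attempts rather than millions.

```python
import hashlib
from typing import Dict, Tuple


def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weak 24-bit hash by default."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> Tuple[bytes, bytes, int]:
    """Hash the inputs b"0", b"1", b"2", ... until two truncated digests match."""
    seen: Dict[bytes, bytes] = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


first, second, attempts = find_collision()
print(first, second, "collide after", attempts, "attempts")
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why untruncated SHA-256 is considered collision resistant.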
MD5 and Why It's Broken
MD5 (Message Digest 5) was designed by Ron Rivest in 1991. For a decade it was the go-to hash. Then the cracks started.
In 2004, researchers demonstrated the first practical MD5 collision: two different inputs producing the same 128-bit hash. In 2008, a team used a cluster of about 200 PlayStation 3 consoles to create a rogue CA certificate that browsers would trust, by exploiting MD5 collisions. In 2012, the Flame malware forged a Microsoft code-signing certificate using an MD5 collision.
| Property | Value |
|---|---|
| Output size | 128 bits (32 hex chars) |
| Speed | ~500 MB/s per core (very fast) |
| Collision resistance | Broken. Collisions findable in seconds. |
| Pre-image resistance | Theoretically weakened but not yet broken in practice |
When MD5 is still fine: Non-security checksums where you're detecting accidental corruption (not tampering). Git used SHA-1 for content hashing for years — not for security, but for data integrity and deduplication. MD5 serves this use case and is faster than SHA-256.
When MD5 is never acceptable: Passwords. Digital signatures. File authenticity verification where tampering is a concern. TLS certificates. Anywhere an adversary might craft inputs.
SHA-1, SHA-256, SHA-512
SHA-1: Deprecated
SHA-1 produces a 160-bit (40 hex char) hash. For years it was the standard for TLS certificates, code signing, and Git. In 2017, researchers from Google and CWI Amsterdam demonstrated the "SHAttered" attack, the first real-world SHA-1 collision, producing two different PDF files with identical SHA-1 hashes. The computation was estimated to cost on the order of $100k in rented cloud compute.
SHA-1 is now formally deprecated for security use. Most browsers reject SHA-1 certificates. Git is migrating to SHA-256. Legacy compatibility is the only remaining reason to encounter it.
SHA-256: The Current Standard
SHA-256 is part of the SHA-2 family, published by NIST in 2001. It produces a 256-bit (64 hex char) hash. No known practical attacks exist. It's used in TLS, Bitcoin's proof-of-work, code signing, and most modern security applications.
| Property | Value |
|---|---|
| Output size | 256 bits (64 hex chars) |
| Security level | 128-bit (considered unbreakable with current computing) |
| Speed | ~200 MB/s per core |
| Use for | File integrity, digital signatures, HMAC, general security |
SHA-512: More Margin
SHA-512 produces 512 bits (128 hex chars). The longer output does not buy meaningful practical security over SHA-256, whose 128-bit security level is already beyond realistic attack. The practical reason to prefer it is that SHA-512 operates on 64-bit words and often runs faster than SHA-256 on 64-bit CPUs that lack dedicated SHA-256 instructions, despite the larger output. It also has a larger security margin if you're paranoid about future advances.
| Property | Value |
|---|---|
| Output size | 512 bits (128 hex chars) |
| Security level | 256-bit (maximum practical security) |
| Speed | ~300 MB/s per core on 64-bit (faster than SHA-256 in software) |
| Use for | Extra security margin, throughput on 64-bit systems |
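Throughput figures like the ones in these tables depend heavily on the CPU and on whether it has hardware SHA instructions, so it is worth measuring on your own machine. Below is a rough sketch using only the standard library; the function name `throughput_mb_per_s` and the 16 MB buffer size are arbitrary choices, and a single timed run like this is only a ballpark figure.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, size_mb: int = 16) -> float:
    """Hash size_mb megabytes of zeros once and return the rate in MB/s."""
    data = b"\x00" * (size_mb * 1024 * 1024)
    start = time.perf_counter()
    hashlib.new(algorithm, data).digest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


for name in ("md5", "sha1", "sha256", "sha512"):
    print(f"{name}: {throughput_mb_per_s(name):.0f} MB/s")
```

On CPUs with SHA extensions, SHA-256 may come out well ahead of SHA-512, which reverses the ordering shown above.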
SHA-3: Algorithm Diversity
SHA-3 uses an entirely different construction (Keccak sponge function) from SHA-2. It's not "better" in any measurable way for current use — SHA-256 has no known weaknesses. The value is algorithmic diversity: if a catastrophic flaw were found in SHA-2, SHA-3 would be unaffected. Organizations with very long-term security needs (cryptographic protocols that need to outlast hardware advances) may prefer SHA-3.
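From a developer's point of view, switching between the two families is usually a one-line change. In Python's `hashlib`, for example, SHA-3 is exposed through the same interface as SHA-2:

```python
import hashlib

message = b"Hello, world!"

sha2_digest = hashlib.sha256(message).hexdigest()    # SHA-2 family
sha3_digest = hashlib.sha3_256(message).hexdigest()  # SHA-3 family (Keccak)

# Same output length, completely unrelated values.
print("SHA-256:  ", sha2_digest)
print("SHA3-256: ", sha3_digest)
```

The two digests are not interchangeable: a value stored as SHA-256 cannot be verified with SHA3-256, so the algorithm name needs to be recorded alongside the hash.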
Comparison Table
| Algorithm | Output | Status | When to use |
|---|---|---|---|
| MD5 | 128 bit | Broken | Non-security checksums only |
| SHA-1 | 160 bit | Deprecated | Legacy compatibility only |
| SHA-256 | 256 bit | Current standard | General security, signatures, integrity |
| SHA-512 | 512 bit | Strong | High security, 64-bit systems |
| SHA-3-256 | 256 bit | Alternative | Algorithm diversity, long-term |
HMAC: Authenticated Hashing
A plain hash verifies data integrity — it tells you the data hasn't changed. But it doesn't tell you who created the hash. HMAC (Hash-based Message Authentication Code) adds a secret key to the hash, turning it into an authentication mechanism.
HMAC(key, message) = H((key XOR opad) || H((key XOR ipad) || message))
// In practice: only a party who knows the key can produce the correct HMAC
// Verifying: re-compute HMAC with your key, compare with constant-time comparison
HMAC is how HS256-signed JWTs work (HS256 = HMAC-SHA256), how API authentication tokens work, and how many session tokens are validated. The key insight: without knowing the key, you can't forge a valid HMAC even if you know the message and the hash algorithm.
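The HMAC construction above can be written out by hand to see what the library does. This is an educational sketch for SHA-256 only (the name `hmac_sha256_manual` is illustrative); in real code, use the `hmac` module rather than a hand-rolled version.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 built directly from the definition."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    ipad_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    opad_key = bytes(b ^ 0x5C for b in key)  # key XOR opad

    inner = hashlib.sha256(ipad_key + message).digest()
    return hashlib.sha256(opad_key + inner).digest()


key, message = b"secret-key", b"user:12345:admin:false"
manual = hmac_sha256_manual(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
print(hmac.compare_digest(manual, library))
```

The nested structure is what distinguishes HMAC from simply hashing the key and message together, which is vulnerable to length-extension attacks with SHA-2.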
// Node.js
const crypto = require('crypto');

function createHMAC(key, message) {
  return crypto.createHmac('sha256', key)
    .update(message)
    .digest('hex');
}

function verifyHMAC(key, message, providedHMAC) {
  const expected = Buffer.from(createHMAC(key, message), 'hex');
  const provided = Buffer.from(providedHMAC, 'hex');
  // timingSafeEqual throws if the lengths differ, so reject mismatched lengths first
  if (expected.length !== provided.length) return false;
  // IMPORTANT: use timingSafeEqual to prevent timing attacks
  return crypto.timingSafeEqual(expected, provided);
}

const token = createHMAC('secret-key', 'user:12345:admin:false');
// 64-character hex string; changes completely if any part of message or key changes
# Python
import hmac, hashlib

def create_hmac(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()

def verify_hmac(key: bytes, message: str, provided: str) -> bool:
    expected = create_hmac(key, message)
    # hmac.compare_digest is timing-safe
    return hmac.compare_digest(expected, provided)
Why the comparison function matters: an ordinary string comparison (==) returns early when it finds the first mismatch, leaking information about where the strings differ. crypto.timingSafeEqual() in Node.js and hmac.compare_digest() in Python take time that does not depend on where the inputs differ.
bcrypt and Argon2 for Passwords
This is the most important distinction in this article: cryptographic hash functions (SHA-256, etc.) are the wrong tool for hashing passwords.
They're too fast. A single modern GPU can compute billions of SHA-256 hashes per second. At 10 billion guesses per second, an attacker with a stolen password database can try every 10-character lowercase password (26^10, about 1.4 × 10^14 candidates) in roughly four hours on one GPU, and a GPU farm shrinks larger search spaces accordingly. SHA-256 also has no built-in salt and no cost parameter you can tune as hardware improves.
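To put rough numbers on "too fast", the search time is simply the number of candidate passwords divided by the guess rate. The sketch below assumes a rate of 10 billion SHA-256 guesses per second for one GPU. That figure is an assumption for illustration, not a benchmark.

```python
def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every password of the given length."""
    return alphabet_size ** length / guesses_per_second


ASSUMED_GPU_RATE = 1e10  # assumption: 10 billion SHA-256 guesses per second

# 10 lowercase letters: 26^10 candidates.
hours = seconds_to_exhaust(26, 10, ASSUMED_GPU_RATE) / 3600
print(f"10 lowercase letters: {hours:.1f} hours")

# 10 printable ASCII characters: 95^10 candidates.
years = seconds_to_exhaust(95, 10, ASSUMED_GPU_RATE) / (3600 * 24 * 365)
print(f"10 printable ASCII characters: {years:.0f} years")
```

Real attacks rarely enumerate the full space: dictionaries and leaked-password lists find most human-chosen passwords long before exhaustive search would.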
Password hashing algorithms are deliberately different: they are slow by design and include built-in salting and a tunable work factor, and newer ones such as Argon2 are also memory-hard.
bcrypt
Designed in 1999, bcrypt remains a solid choice. The cost factor (typically 10–14) controls slowness: cost factor 12 means 2^12 = 4096 iterations of its expensive Blowfish-based key setup. At cost 12, hashing takes roughly 200–500ms on a typical server. At about 300ms per guess, testing a million candidate passwords against a single hash costs roughly 3.5 days of compute on comparable hardware, versus about a millisecond or less with SHA-256 on a GPU.
// Node.js
const bcrypt = require('bcrypt');
const COST_FACTOR = 12; // Tune so hashing takes ~300ms on your hardware

async function hashPassword(password) {
  return bcrypt.hash(password, COST_FACTOR);
  // Returns something like:
  // "$2b$12$KIjJ5R3Kcz8M4Q2qP7N8OuJvLqZxY3B8M..." (includes salt)
}

async function verifyPassword(password, storedHash) {
  return bcrypt.compare(password, storedHash);
  // Extracts the salt from the stored hash automatically
}
# Python
import bcrypt

def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))

def verify_password(password: str, hashed: bytes) -> bool:
    return bcrypt.checkpw(password.encode(), hashed)
Argon2
Argon2 won the Password Hashing Competition in 2015 and is the current recommendation for new systems. It's memory-hard: the algorithm requires a configurable amount of RAM, which limits how effectively attackers can parallelize on GPUs (GPU memory is limited and shared across cores). Use argon2id — the hybrid variant resistant to both time-space tradeoff attacks and side-channel attacks.
// Node.js
const argon2 = require('argon2');
const ARGON2_OPTIONS = {
  type: argon2.argon2id,
  memoryCost: 65536, // 64 MiB of RAM per hash; limits how many an attacker can run in parallel
  timeCost: 3, // iterations
  parallelism: 4,
};

async function hashPassword(password) {
  return argon2.hash(password, ARGON2_OPTIONS);
}

async function verifyPassword(password, storedHash) {
  return argon2.verify(storedHash, password);
}
# Python (using argon2-cffi)
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher(
    time_cost=3,
    memory_cost=65536,  # 64 MiB
    parallelism=4,
)

def hash_password(password: str) -> str:
    return ph.hash(password)

def verify_password(password: str, stored_hash: str) -> bool:
    try:
        return ph.verify(stored_hash, password)
    except VerifyMismatchError:
        # Wrong password; other errors (e.g. a corrupt hash) are left to propagate
        return False
Decision Guide
| Situation | Use |
|---|---|
| New system, no constraints | Argon2id |
| Need broad language/framework support | bcrypt |
| NIST compliance required (FIPS 140) | PBKDF2-SHA256 with 600,000+ iterations (OWASP's current guidance) |
| File integrity checksums | SHA-256 |
| API authentication tokens | HMAC-SHA256 |
| Passwords — never use these | SHA-256, MD5, or any unsalted hash |
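The PBKDF2 row is the one option in this table that needs no third-party package in Python, since `hashlib.pbkdf2_hmac` is in the standard library. The sketch below shows the two ingredients every password hash needs: a random per-password salt and a tunable work factor. The storage format (`scheme$iterations$salt$hash`) and the default iteration count are assumptions made for this example; set the count from current guidance and your own latency budget.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to current guidance and your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a salted PBKDF2-HMAC-SHA256 hash and pack it into one string."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical storage format: scheme$iterations$salt_hex$hash_hex
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash using the stored salt and iteration count."""
    _scheme, iterations, salt_hex, derived_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(derived_hex))
```

Storing the iteration count next to the hash is what lets you raise the work factor later: old hashes still verify with their original count and can be re-hashed at the new one on the user's next login.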
Checksums and File Integrity
The most common non-password use of hashing: verifying that a file you downloaded or transferred hasn't been corrupted or tampered with.
# On Linux: hash a file
sha256sum ubuntu-24.04-desktop-amd64.iso
# Outputs: a1b2c3d4...  ubuntu-24.04-desktop-amd64.iso
# Verify against published checksum (format: hash, two spaces, filename)
echo "a1b2c3d4...  ubuntu-24.04-desktop-amd64.iso" | sha256sum --check
# ubuntu-24.04-desktop-amd64.iso: OK
# On macOS
shasum -a 256 filename.iso
# Windows PowerShell
Get-FileHash .\filename.iso -Algorithm SHA256
A 10GB file should be hashed in chunks rather than loaded into memory all at once:
# Python: streaming file hash
import hashlib

def hash_file(filepath, algorithm='sha256'):
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        # Read 64 KiB at a time until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

checksum = hash_file('/path/to/large-file.iso')
print(checksum)
// Node.js: streaming file hash
const crypto = require('crypto');
const fs = require('fs');

function hashFile(filepath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    const stream = fs.createReadStream(filepath);
    stream.on('data', chunk => hash.update(chunk));
    stream.on('end', () => resolve(hash.digest('hex')));
    stream.on('error', reject);
  });
}
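Computing the hash is only half of verification; the other half is comparing it against the checksum the publisher lists. Here is a minimal sketch of that step, with hypothetical helper names `sha256_file` and `verify_download`:

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the published checksum."""
    actual = sha256_file(path)
    # Normalize case and whitespace: published checksums are often uppercase
    # or copied with a trailing newline.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Usage (the path and checksum are placeholders):
# ok = verify_download("/path/to/large-file.iso", "<checksum from the vendor's site>")
```

A matching checksum only proves the file matches what the checksum's publisher hashed. If the checksum is served from the same compromised server as the file, it detects corruption but not tampering.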
Hash Tables vs Cryptographic Hashes
The word "hash" in "hash table" is related but different. Data structure hash functions optimize for speed and distribution, not security. They intentionally make no cryptographic guarantees and are often reversible or predictable.
| Property | Cryptographic Hash | Hash Table Function |
|---|---|---|
| Speed | Fast, but slower than non-cryptographic hashes (password hashes such as bcrypt are deliberately slow) | Extremely fast (key goal) |
| Collision resistance | Cryptographically strong | Minimized but not cryptographic |
| Reversibility | One-way by design | Doesn't matter |
| Distribution | Good, but not the focus | Critical — uniform distribution prevents clustering |
| Examples | SHA-256, bcrypt, Argon2 | FNV, MurmurHash, xxHash, CityHash |
Using a cryptographic hash like SHA-256 as your hash table function is wasteful: it is typically many times slower than MurmurHash, with no security benefit in this context. Using a hash table function like MurmurHash for password storage is catastrophic: it is trivially reversible and runs billions of times per second.
Right tool, right job.
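To make the contrast concrete, here is a sketch of FNV-1a (32-bit), one of the hash table functions listed in the table, together with the bucket selection a hash table would do with it. The helper `bucket_for` and the sample keys are illustrative.

```python
from collections import Counter

FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the state to 32 bits
    return h


def bucket_for(key: str, n_buckets: int) -> int:
    """Map a key to a bucket index, as a hash table would."""
    return fnv1a_32(key.encode("utf-8")) % n_buckets


# Spread 8000 sample keys over 8 buckets and show how many land in each.
counts = Counter(bucket_for(f"user:{i}", 8) for i in range(8000))
print(sorted(counts.items()))
```

The whole function is one XOR and one multiply per byte, which is why it is fast. With only 32 bits of output and no resistance to deliberately chosen inputs, it is unsuitable for anything an adversary can influence.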
Code Examples
Browser JavaScript (Web Crypto API)
// SHA-256 in the browser, no external libraries
async function sha256(message) {
  const encoded = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', encoded);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

// SHA-512
async function sha512(message) {
  const encoded = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-512', encoded);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

// Usage (top-level await requires a module script)
const hash = await sha256('Hello, world!');
console.log(hash);
// "315f5bdb76d078c43b8ac0064e4a016..."
Node.js
const crypto = require('crypto');
const fs = require('fs');

// String hashing
const hash = crypto.createHash('sha256').update('Hello').digest('hex');

// File hashing (streaming); filepath is the path of the file you want to hash
const fileHash = crypto.createHash('sha256');
fs.createReadStream(filepath).pipe(fileHash);
fileHash.on('finish', () => console.log(fileHash.read().toString('hex')));

// HMAC
const hmac = crypto.createHmac('sha256', 'secret')
  .update('message')
  .digest('hex');
Python
import hashlib, hmac

# Basic hashing
sha256 = hashlib.sha256(b"Hello").hexdigest()
sha512 = hashlib.sha512(b"Hello").hexdigest()
md5 = hashlib.md5(b"Hello").hexdigest()  # checksum use only

# All available algorithms
print(hashlib.algorithms_available)

# HMAC
mac = hmac.new(b"secret", b"message", hashlib.sha256).hexdigest()

# Timing-safe comparison; provided_mac stands in for the value received from the other party
provided_mac = mac
hmac.compare_digest(mac, provided_mac)  # True/False
Quick Reference
Which hash to use?
- General security, signatures: SHA-256
- Maximum security margin: SHA-512
- Passwords: Argon2id or bcrypt
- API auth tokens: HMAC-SHA256
- Non-security checksums: MD5 (fast)
- Hash tables / data structures: MurmurHash, xxHash
Never use for security
- MD5 (collision attacks demonstrated 2004)
- SHA-1 (collision attack demonstrated 2017)
- CRC32 / CRC16 (not cryptographic)
- SHA-256 for passwords (too fast)
- Any unsalted hash for passwords