Password lists for real hackers

Look, if your cracking workflow starts and ends with rockyou.txt, this post is for you. That list came from a 2009 breach of a social gaming site. The users had zero password requirements. You are not auditing a social gaming site from 2009.

Let's talk about what actually works.

Why rockyou.txt Fails on Corporate Targets

[Figure: password coverage curve]

rockyou.txt is a consumer list. No complexity rules, no lockout policies, no IT department breathing down anyone's neck. Passwords like jessica, iloveyou, 123456. Cute.

In an Active Directory environment with a real password policy? That list is mostly noise.

The ignis-sec PWDB project crunched 1 billion leaked credentials and the stats are not kind to rockyou fans:

Top 1,000 passwords cover just 6.6% of accounts
Top 1 million gets you to 36.28%
Top 10 million reaches 54%
Average password length: 9.48 characters
Only 7.08% of passwords have special characters
15% are all-lowercase

And the kicker: of the 14.3 million most common PWDB passwords, 11.5 million aren't in rockyou.txt at all. That's an 80% miss rate. Let that sink in.

The occurrence.100K.txt file in the ignis-sec repo makes this even clearer. Password popularity drops off a cliff after the top few thousand. List quality beats list size every time.
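If you want to see that cliff in your own data, a minimal sketch like the one below computes cumulative coverage at a few cutoffs. It assumes you've already parsed a frequency-sorted list into plain occurrence counts. The function name, the toy counts, and the corpus size are all made up for illustration, and nothing here assumes the actual layout of occurrence.100K.txt.

```python
from itertools import accumulate

def coverage_at(counts, total_accounts, cutoffs):
    """Return {cutoff: fraction of accounts covered by the top `cutoff` passwords}.

    counts: occurrence counts sorted from most to least common.
    total_accounts: size of the whole corpus, not just the listed passwords.
    """
    running = list(accumulate(counts))  # running[i] = accounts covered by the top i+1
    result = {}
    for n in cutoffs:
        if n <= 0 or not running:
            result[n] = 0.0
            continue
        covered = running[min(n, len(running)) - 1]
        result[n] = covered / total_accounts
    return result

# Toy numbers, purely illustrative: a steep head and a long flat tail.
toy_counts = [500, 300, 100, 40, 20, 10, 5, 5, 5, 5]
curve = coverage_at(toy_counts, total_accounts=2000, cutoffs=[1, 3, 10])
for n, frac in curve.items():
    print(f"top {n:>2}: {frac:.1%}")
```

On the toy data the first password alone covers a quarter of the accounts, while the last seven entries together add less than five points. Same shape as the real curve, just smaller.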

What Users Actually Do Under a Password Policy

[Figure: password transformation diagram]

A typical corporate policy goes something like this:

8 characters minimum
One uppercase letter
One digit
No dictionary words (everyone ignores this one)
Can't reuse last 10 passwords

So what do humans do? They take a word they know, slap a capital on the front, and stick a number or ! on the end. Every time.

word -> Word1
word -> Word123
word -> Word!
word -> Word2024
word -> W0rd1!

The base words are seasons, months, sports teams, company names, kids' names. Stuff that's already in your wordlist. The transformed versions are not. That's the gap. Rules engines exist to close it.
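To make the gap concrete, here is a small sketch that bends one base word into the five shapes listed above. The function name and the default year are hypothetical; a real run would leave this to the rules engine instead of Python.

```python
def policy_variants(word, year=2024):
    """Return the common policy-compliant bends of one base word."""
    cap = word.capitalize()        # first letter upper, rest lower
    leet = cap.replace("o", "0")   # only the o -> 0 swap, as in W0rd1!
    return [
        cap + "1",
        cap + "123",
        cap + "!",
        cap + str(year),
        leet + "1!",
    ]

print(policy_variants("word"))
# ['Word1', 'Word123', 'Word!', 'Word2024', 'W0rd1!']
```

One base word in, five candidates out, and none of the five has to be stored in the wordlist.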

Wordlists That Are Actually Worth Your Time

For Hash Cracking

ignis-10M.txt (ignis-sec/Pwdb-Public) Frequency-sorted from 1 billion real credentials. Far better modern coverage than rockyou. The 10M version is the sweet spot: big enough to matter, small enough to run fast with rules.

Kaonashi (~2.35 GB) Debuted at RootedCON 2019. Comes with its own kaonashi.rule tuned to the list's character distribution. Very solid for NTLM/NTLMv2 from internal engagements.

CrackStation (~4.2 GB) Huge and slow. Pull it out when the other lists have failed and the target is worth the GPU time.

SecLists Passwords/ subtree (danielmiessler/SecLists) The toolkit that never gets old. The Common-Credentials/ and Leaked-Databases/ folders are well-maintained. The Default-Credentials/ folder still wrecks network gear regularly.

Language-specific lists If the engagement is in the Netherlands, Germany, France, or anywhere non-English: use a local list. dutch.txt will outperform rockyou.txt on a Dutch AD environment and it's a fraction of the size.

For Spraying (Hydra / CME / GoSpray)

Spraying means working within lockout limits, usually 3-5 attempts per window. Bigger lists are useless here.

Build a spray list of 50-200 entries max from:

Season + year: Winter2024, Lente2025
Company name + number: Acme1!, AcmeCorp2024
Generic IT onboarding classics: Welcome1, Welcome1!, ChangeMe1!
Month + year: January2024, Jan2024!
The target's own product or brand names

The onboarding classics show up in public lists, but the company-, product-, and date-specific entries mostly don't. You build those from OSINT. CUPP can help automate the generation once you've gathered the raw intel.
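A rough sketch of that assembly step, assuming you've already collected the company name and the years that matter. The function name, the English season and month names, and the suffix choices are illustrative assumptions; swap in local-language tokens (Lente, Zomer, ...) and product names for a real engagement.

```python
from itertools import product

SEASONS = ["Winter", "Spring", "Summer", "Autumn"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def build_spray_list(company, years, limit=200):
    """Combine OSINT-derived tokens with years and common suffixes."""
    # Onboarding classics and company-specific guesses go first so the
    # limit never trims them.
    candidates = ["Welcome1", "Welcome1!", "ChangeMe1!",
                  f"{company}1!", f"{company}123"]
    for base, year in product([company] + SEASONS + MONTHS, years):
        candidates.append(f"{base}{year}")
        candidates.append(f"{base}{year}!")
    # dict.fromkeys de-duplicates while keeping insertion order
    return list(dict.fromkeys(candidates))[:limit]

spray = build_spray_list("Acme", [2024, 2025])
print(len(spray), spray[:8])
```

Keeping insertion order matters here: with a tight attempt budget, the entries you believe in most should be the ones that get tried first.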

Pre-filtering Lists to Match the Policy

Before running rules, filter your base list down to candidates that already survive the target policy. Smaller list, less wasted time.

Quick grep filters

# Capital start, ends in ! or digit, 8+ chars
grep -E '^[A-Z].{6,}[!@#$0-9]$' ignis-10M.txt > corporate-filtered.txt

# Capital first, lowercase body, digit ending
grep -E '^[A-Z][a-z]{5,}[0-9]+$' ignis-10M.txt > cap-lower-digit.txt

# Common corporate suffixes
grep -E '^[A-Z][a-z]+(1|1!|123|2024|2025|!)$' ignis-10M.txt > likely-corporate.txt

Python for more control

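import re

# Policy: at least one uppercase, one lowercase, one digit, 8+ characters
policy = re.compile(r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$')

# Leaked lists often contain invalid UTF-8, so skip undecodable bytes instead of crashing
with open('ignis-10M.txt', encoding='utf-8', errors='ignore') as f_in, \
     open('policy-compliant.txt', 'w', encoding='utf-8') as f_out:
    for line in f_in:
        # Strip only the line ending; leading/trailing spaces can be part of a password
        pw = line.rstrip('\r\n')
        if policy.match(pw):
            f_out.write(pw + '\n')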

From a 10M list you'll typically get 200K-500K policy-compliant candidates. That's what you want running through your rules, not 10 million entries that'll never crack anything.
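Some policies count character classes instead of naming specific ones. The sketch below assumes a "minimum length 8, at least three of four classes" rule, which is an assumption about the target and not something from the policy listed earlier; the function names and thresholds are hypothetical and meant to be adjusted.

```python
def char_classes(pw):
    """Count how many of the four classes (upper, lower, digit, other) appear."""
    return sum([
        any(c.isupper() for c in pw),
        any(c.islower() for c in pw),
        any(c.isdigit() for c in pw),
        any(not c.isalnum() for c in pw),
    ])

def meets_policy(pw, min_length=8, min_classes=3):
    """True if the candidate is long enough and mixes enough character classes."""
    return len(pw) >= min_length and char_classes(pw) >= min_classes

sample = ["password", "Password1", "Winter2024!", "Abc1!", "SUMMER2024"]
print([pw for pw in sample if meets_policy(pw)])
# ['Password1', 'Winter2024!']
```

Counting classes keeps the filter readable when the thresholds change, which is harder to do with a single lookahead regex.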

Rules: The Part Everyone Skips

Keep your base wordlists in plain lowercase. Let the rules engine do the transformations at runtime. One million base words plus a solid ruleset generates billions of candidates without storing any of them.

Hashcat rule basics

Rule	What it does	Example
`c`	Capitalize first letter	`password` -> `Password`
`u`	All uppercase	`password` -> `PASSWORD`
`l`	All lowercase	`PASSWORD` -> `password`
`$X`	Append character X	`$!`: `password` -> `password!`
`^X`	Prepend character X	`^!`: `password` -> `!password`
`sXY`	Replace every X with Y	`sa@`: `password` -> `p@ssword`
`r`	Reverse	`password` -> `drowssap`

Each line in a rule file = one transformation applied to every word. Multiple commands on one line run in sequence.
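If the left-to-right behaviour isn't obvious, a toy interpreter helps. This is a sketch covering only the functions in the table above plus the `:` no-op; it is not hashcat's engine, and `apply_rule` is a hypothetical name. It does no validation, so a rule that ends on a bare `$`, `^`, or `s` will raise an IndexError.

```python
def apply_rule(rule, word):
    """Apply one hashcat-style rule line to a word, left to right.

    Supports only: ':' 'c' 'u' 'l' 'r' '$X' '^X' 'sXY'.
    """
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in " :":          # spaces separate functions, ':' is a no-op
            i += 1
        elif op == "c":         # capitalize first letter, lowercase the rest
            word = word[:1].upper() + word[1:].lower()
            i += 1
        elif op == "u":
            word = word.upper()
            i += 1
        elif op == "l":
            word = word.lower()
            i += 1
        elif op == "r":
            word = word[::-1]
            i += 1
        elif op == "$":         # append the next character
            word = word + rule[i + 1]
            i += 2
        elif op == "^":         # prepend the next character
            word = rule[i + 1] + word
            i += 2
        elif op == "s":         # replace every X with Y
            word = word.replace(rule[i + 1], rule[i + 2])
            i += 3
        else:
            raise ValueError(f"unsupported rule function: {op!r}")
    return word

for rule in [":", "c $1", "c $2 $0 $2 $4", "c so0 sa@ $!", "r"]:
    print(f"{rule!r:>18} -> {apply_rule(rule, 'password')}")
```

Note that `^1 ^2` yields `21password`, not `12password`: each prepend lands in front of whatever is already there.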

Here's a practical corporate rule file:

# corporate.rule

# No change (baseline)
:

# Capital + digit
c $1
c $2
c $1 $2 $3

# Capital + year
c $2 $0 $2 $4
c $2 $0 $2 $5

# Capital + !
c $!
c $! $1
c $1 $!

# Leet substitutions
c so0
c sa@
c se3
c si1

# Leet + !
c so0 sa@ $!
c si1 se3 $!

# 123! and year + ! (very common in AD)
c $1 $2 $3 $!
c $2 $0 $2 $3 $!
c $2 $0 $2 $4 $!

Run it:

hashcat -m 1000 hashes.ntlm wordlist.txt -r corporate.rule -O

Rulesets worth knowing (most ship in hashcat's rules/ directory; OneRuleToRuleThemAll.rule is a separate download):

best64.rule - start here, always
OneRuleToRuleThemAll.rule - thorough but slow, good for overnight runs
d3adhob0.rule - tuned for corporate AD patterns
InsidePro-PasswordsPro.rule - broad substitution coverage
toggles5.rule - systematic case permutations

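Stack multiple rule files in one run. Hashcat chains every rule in the first file with every rule in the second, so the candidate count multiplies rather than adds: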

hashcat -m 1000 hashes.ntlm wordlist.txt \
  -r best64.rule \
  -r corporate.rule \
  -O --status
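Because hashcat chains every rule in one -r file with every rule in the next, it's worth estimating the candidate count before you hit enter. A back-of-the-envelope sketch; the function names are hypothetical, and the 64 is a stand-in for whatever your first rule file actually contains.

```python
from math import prod

def count_rules(rule_text):
    """Count rule lines, skipping blank lines and # comments."""
    return sum(
        1 for line in rule_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )

def stacked_candidates(word_count, *rule_counts):
    """Candidates generated when each -r file is combined with the others."""
    return word_count * prod(rule_counts)

demo_rules = """# demo.rule
:
c $1
c $!
"""
n = count_rules(demo_rules)  # 3 rules, the comment line is skipped
print(f"{stacked_candidates(1_000_000, 64, n):,} candidates")
```

A million words against a 64-rule file stacked on this three-rule file is already 192 million candidates. Stack two large rulesets and an overnight run can quietly turn into a week.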

John the Ripper

Rules go in john.conf under a named section:

[List.Rules:Corporate]
:
c
c Az"[0-9]"
c Az"2024"
c Az"2025"
c Az"!"
c Az"1!"
c Az"123!"
so0 c
sa@ c
c so0 $! $1

Run it:

john --wordlist=wordlist.txt --rules=Corporate hashes.txt

Sanity check what your rules are producing before you commit GPU hours:

john --wordlist=wordlist.txt --rules=Corporate --stdout | head -100

Genuinely underused. Always do this.

Building a spray list from rules

For Hydra/CME you need a static list. Generate it first:

# From hashcat (--stdout takes no hash file or hash mode)
hashcat --stdout wordlist.txt -r corporate.rule > spray-candidates.txt

# Or from John
john --wordlist=wordlist.txt --rules=Corporate --stdout > spray-candidates.txt

# Dedup without re-sorting (sort -u would alphabetize and throw away the list's order), then trim
awk '!seen[$0]++' spray-candidates.txt | head -100 > final-spray.txt

# Spray via Hydra (-u loops over users for each password, instead of hammering one user with every password)
hydra -u -L users.txt -P final-spray.txt smb://10.10.10.1

# Or CME (better for AD, handles lockout awareness)
crackmapexec smb 10.10.10.0/24 -u users.txt -p final-spray.txt --continue-on-success

Check lockout policy before you spray. net accounts /domain or pull it from BloodHound. One password across all users, wait out the window, repeat. CME's --continue-on-success keeps the spray going after the first valid login instead of stopping there, so you collect every hit in one pass.
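The "wait out the window" part is why spray lists have to stay tiny. A quick sketch of the arithmetic, assuming you stay a couple of attempts under the lockout threshold to leave room for users' own typos. The function name, the safety margin, and the example numbers are assumptions, not values from any real policy.

```python
import math

def spray_plan(num_passwords, lockout_threshold, window_minutes, safety_margin=2):
    """Estimate windows and waiting time for a spray that stays under lockout.

    Returns (attempts_per_window, windows, total_wait_minutes).
    """
    attempts_per_window = max(1, lockout_threshold - safety_margin)
    windows = math.ceil(num_passwords / attempts_per_window)
    # No waiting is needed after the final window.
    total_wait = max(0, windows - 1) * window_minutes
    return attempts_per_window, windows, total_wait

per_window, windows, minutes = spray_plan(100, lockout_threshold=5, window_minutes=30)
print(f"{per_window} passwords per window, {windows} windows, ~{minutes / 60:.1f} h of waiting")
```

With a threshold of 5 and a 30-minute window, a 100-entry list means 34 windows and about 16.5 hours of waiting. Every entry has to earn its slot.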

The Cracking Workflow

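Got hashes from a DC or Responder capture? Run these stages in order. The commands below use -m 1000 (NTLM, as dumped from a DC). Responder captures are NetNTLMv2, so swap in -m 5600 for those.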

Stage 1 - Quick wins (minutes)

hashcat -m 1000 hashes.txt ignis-10M.txt -O --status

Stage 2 - Corporate transforms (hours)

hashcat -m 1000 hashes.txt ignis-10M.txt -r best64.rule -O
hashcat -m 1000 hashes.txt ignis-10M.txt -r corporate.rule -O

Stage 3 - Bigger lists, targeted rules (overnight)

hashcat -m 1000 hashes.txt kaonashi.txt -r kaonashi.rule -O
hashcat -m 1000 hashes.txt crackstation.txt -r best64.rule -O

Stage 4 - Mask attacks for known policy shapes (weekend)

# Capital + 6 lowercase + 2 digits (quote the mask so the shell doesn't glob the ?)
hashcat -m 1000 hashes.txt -a 3 '?u?l?l?l?l?l?l?d?d'

# Capital + 6 lowercase + digit + special
hashcat -m 1000 hashes.txt -a 3 '?u?l?l?l?l?l?l?d?s'
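Before committing to a mask, it helps to know how big it is. This sketch multiplies out the sizes of hashcat's built-in charsets for a mask string. It ignores custom charsets (?1 to ?4) and ?h/?H/?b, and the function name is hypothetical.

```python
# Sizes of hashcat's built-in charsets; "?" covers the ?? escape for a literal ?
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "?": 1}

def mask_keyspace(mask):
    """Number of candidates a hashcat mask generates (built-in charsets only)."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            total *= CHARSET_SIZES[mask[i + 1]]
            i += 2
        else:
            i += 1  # a literal character contributes a factor of 1
    return total

for mask in ["?u?l?l?l?l?l?l?d?d", "?u?l?l?l?l?l?l?d?s"]:
    print(mask, f"{mask_keyspace(mask):,}")
```

Divide the result by the speed you get from your own hashcat -b benchmark for the hash mode in question, and you have a runtime estimate before burning any GPU hours.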

The Point

Password cracking is a vocabulary problem, not a compute problem. People don't invent passwords. They pick a word and bend it to fit the policy. Your job is to model that bending.

Good base list + smart rules beats a giant static wordlist every time.