The future of password cracking: Passphrase cracking

November 20, 2025

(Note: Very much work in progress and much more research and hash cracking is needed to answer some of the questions posed below)

Welcome to the next instalment of The Art of Password Cracking (with science!).

Anyone starting a career in cyber security soon realises that often the weakest link in any system turns out to be weak passwords. They will often find themselves sounding like a broken record with the following refrain: “We found weak passwords and were able to access X, Y and Z, please increase your password complexity.”

Anyone taking up this advice to increase their security has two choices: 1) either go the way of randomly generated complex passwords using password managers (or post-it notes!) or 2) come up with increasingly complex yet still memorable passwords in the form of passphrases. Adopting passphrases in place of passwords has been advocated for a while now, for example by NIST [1], the Canadian government [2] and the already 12-year-old xkcd comic [3].

  1. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63b.pdf
  2. https://www.cyber.gc.ca/en/guidance/best-practices-passphrases-and-passwords-itsap30032
  3. https://www.explainxkcd.com/wiki/index.php/936:_Password_Strength

Those encouraging the use of passphrases often indicate that it pays to select at least 4 words, and at random if at all possible.

From an attacker’s point of view, there are indeed serious challenges in discovering a user’s passphrase if care is taken to choose a good one. When joining words the following considerations quickly cause the number of permutations that could be tried to balloon exponentially out of control:

1) What (type of) wordlist will we use as a basis for our attacks?

If we use a frequency-based wordlist (the most commonly used words are tried first), then what source will we use? A commonly referred to resource has graded words according to frequency in different languages based on their occurrence in subtitles. Given that the published data is from 2018, would there be mileage in updating this manually? Would another source be better, say Wikipedia dumps, or news reports perhaps?

2) Do we use phrases themselves as a source?

We could again generate phrases from media subtitles, Wikipedia dumps or copyright-free books, so that we can catch passwords such as “Sayhellotomylittlefriend01”, “Citationneeded!” or “Tobeornottobe01!”

3) What permutations between word boundaries do we test for?

Rule files tend to add characters to the start or end of a candidate password and, in a way, assume that a single word is being modified. However, since we’re concatenating multiple words, and a rule file does not know where the boundaries between words are, we need to implement basic forms of reasonable permutations in the passphrase generator itself. Testing for lowercase vs capitalised words seems a decent minimum, but even doing that for 3 words increases the time to attack any list by a factor of 2^3 = 8. Now consider that we may also want to test for spaces, a number or hyphenation between words, and we could balloon our attack time by at least a factor of 10^3 = 1000.
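To make the capitalisation factor concrete, here is a minimal Python sketch of a generator that emits every lowercase/capitalised combination of a set of words, optionally joined by a separator. The function name `passphrase_variants` and the sample words are illustrative assumptions, not the generator used for the runs described in this article.

```python
from itertools import product

def passphrase_variants(words, separators=("",)):
    """Yield every lowercase/capitalised combination of the words, joined by each separator."""
    # Each word contributes two forms, so n words give 2**n case patterns.
    forms = [(w.lower(), w.capitalize()) for w in words]
    for sep in separators:
        for combo in product(*forms):
            yield sep.join(combo)

# Hypothetical three-word phrase, no separators: 2**3 candidates.
variants = list(passphrase_variants(["correct", "horse", "battery"]))
print(len(variants))
```

Passing more separators, for example `separators=("", " ", "-")`, multiplies the number of candidates again, which is how the attack time grows so quickly.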

4) Do we use any ruleset files along with our generated passphrases?

Some tests indicate that it’s difficult to pump out passphrases fast enough for hashcat to digest them at the speed it’s capable of hashing NTLM hashes for example. Given an example of a limit of generating 3M phrases/s, and a capability of 3 GH/s (gigahashes), it would pay to have a rule file of at least 1000 rules to make up for the lack of phrases generated and not waste any unused CPU/GPU cycles during the attack. If that’s the case, one might as well incorporate the very useful OneRuleToRuleThemStill, which comes in at just above 50,000 rules. Perhaps further research in thinning out the ruleset would pay off in this respect.
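The rule count quoted above follows from dividing the hashing rate by the candidate generation rate. A small sketch of that arithmetic, with the hypothetical helper name `rules_needed` and the example rates from the paragraph above:

```python
def rules_needed(hash_rate_per_s, candidate_rate_per_s):
    """Smallest number of rules per candidate that keeps the hasher fully busy."""
    # Ceiling division: every generated phrase must fan out into this many hashes.
    return -(-hash_rate_per_s // candidate_rate_per_s)

# 3 GH/s of hashing capacity fed by 3M phrases/s.
print(rules_needed(3_000_000_000, 3_000_000))
```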

5) Would there be any mileage in determining common ways in which sentences or phrases are constructed, so that one could build a wordlist based not on overall frequency, but on the frequency of each word at a given position in a sentence? This would then allow human sentences to be tried before Yoda sentences (“IAmPowerful” vs “PowerfulIAm”).
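One way to explore the position-based idea is to count how often each word appears as the first, second or third word of a phrase, and to build one ordered wordlist per position. The sketch below is only an assumption of how this could be done; the function names and the three sample phrases are hypothetical and no such analysis has been run for this article.

```python
from collections import Counter

def positional_frequencies(phrases, max_positions=3):
    """Count word occurrences separately for each position in a phrase."""
    counters = [Counter() for _ in range(max_positions)]
    for phrase in phrases:
        for pos, word in enumerate(phrase.lower().split()[:max_positions]):
            counters[pos][word] += 1
    return counters

def positional_wordlists(phrases, max_positions=3):
    """Return one wordlist per position, most frequent word first."""
    return [[word for word, _ in counter.most_common()]
            for counter in positional_frequencies(phrases, max_positions)]

# Hypothetical sample corpus.
sample = ["I am powerful", "I am here", "you are powerful"]
lists = positional_wordlists(sample)
print(lists[0])  # words ordered by how often they start a phrase
```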

Let’s discuss and try to make some observations on password cracking runs done so far. We’ve used an export from a large client company, and generated passphrases using the subtitles-based word frequency as generated by the research done here [4] which used the data at OpenSubtitles [5].

4. https://opus.nlpl.eu/OpenSubtitles2018.php
5. http://www.opensubtitles.org/

We don’t use the wordlist as is, but start our list with as many context-specific words as we can come up with, such as the names of the company and its subsidiaries in various full and abbreviated forms. We then append the frequency wordlist and start generating passphrases. Analysing the results is easier with hashcat’s rule debugging log enabled, as it shows the source word, the rule that caused a password crack hit and the resulting actual password. Looking through this log, it appears some rules mangle the original word beyond recognition.
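The wordlist preparation described above can be sketched as follows: context words go first, the frequency-ordered words follow, and case-insensitive duplicates are dropped so no slot is wasted. The function name `build_wordlist` and the company names are made up for illustration.

```python
def build_wordlist(context_words, frequency_words):
    """Put context-specific words first, then the frequency list, dropping duplicates."""
    seen = set()
    combined = []
    for word in list(context_words) + list(frequency_words):
        key = word.lower()
        # Keep only the first occurrence of each word, ignoring case and empty entries.
        if key and key not in seen:
            seen.add(key)
            combined.append(word)
    return combined

# Hypothetical company names followed by a tiny frequency list.
wordlist = build_wordlist(["Acme", "ACME", "acmecorp"], ["the", "acme", "love"])
print(wordlist)
```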

Using a 3D visual aid (Three.js [6]) we will create a plot where each axis represents the frequency-based wordlist used for a particular cracking attempt. We will plot two source word combinations in yellow along the X/Y axis, and three source word combinations in white within the cube shown. The most frequent actual word beyond a, the, an, etc was the word “love” and any password containing it has been coloured pink. The second most encountered word is “welcome” and a password containing it is coloured green. The blue boxes encompass the custom keywords which were prefixed to the list, hence any 2D word present in one of the tight bars or a 3D point present in any blue bar indicates it includes a custom keyword as part of its source words. Finally, words which could not be reliably traced back to a specific set of words are coloured increasingly darker grey, according to how many letters (up to 4) were still unaccounted for once 3 words had been found.

From running a three-word concatenation of the medium-sized wordlist containing 3k words, the following observations can be made:

  • Due to the caveat mentioned below of 1 or 2-letter words being present in the list, a large number of uncertain guesses are found, resulting in long grey strings spread throughout the 3D space.

From running a two-word concatenation of the largest frequency order wordlist containing 50k words, the following observations can be made:

  • 3D points (other than yellow) seem to be a result of hits found using smaller dictionary runs, or results from regular wordlist cracks, as this passphrase cracking run was limited to 2 words only due to its size, which already took nearly 2 days of cracking time.

  • There is a lattice pattern visible near the origin in the yellow plane, indicating that there are a lot of words that seem to be popular both as start and as end words.

Further analysis of the results has been done using some 3D visualisations to help explain certain things.

Several cracking attempts were done with dictionaries of varying lengths, ordered by word frequency, from the source quoted above. For the smaller sizes, up to 3 words were attempted, while for the largest size, only a 2-word combination could be attempted as this by itself already took the better part of 2 weeks. All results were then plotted in 3D, with each axis representing the same dictionary. Due to the inherent bottlenecks present in the passphrase generation, the rules file was activated during cracking as well. Unfortunately, this tends to add and/or remove characters at certain places in passwords, which sometimes makes it difficult to find the original words comprising the final password.

An attempt was made to trace most of the successful passphrase cracks using a script that tries to find the best match between each password found and the dictionary used. This seems to work around 80% of the time, so it’s certainly not perfect. It would, in theory, be possible to reconstruct the exact component words of a passphrase by analysing the rules that ultimately generated the crack, as given by hashcat’s rules debugging, but for some reason in my setup this file is only sometimes generated, even though I always specify that it should be created. This may need further debugging on my setup.
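The matching script itself is not shown here, but the basic idea of recovering component words can be sketched as a search for a split of the password into at most three dictionary words. The version below is a simplified assumption: it only accepts exact, case-insensitive splits and prefers longer words first, so it returns nothing when a rule has added or removed characters, which is exactly the situation that makes the real analysis unreliable.

```python
from functools import lru_cache

def segment(password, dictionary, max_words=3):
    """Return up to max_words dictionary words that concatenate to the password, or None."""
    words = {w.lower() for w in dictionary}
    target = password.lower()

    @lru_cache(maxsize=None)
    def solve(start, remaining):
        if start == len(target):
            return ()
        if remaining == 0:
            return None
        # Try the longest candidate first so "welcome" is preferred over "we".
        for end in range(len(target), start, -1):
            if target[start:end] in words:
                rest = solve(end, remaining - 1)
                if rest is not None:
                    return (target[start:end],) + rest
        return None

    result = solve(0, max_words)
    return list(result) if result is not None else None

# Hypothetical dictionary and passwords.
print(segment("WelcomeLove", ["we", "come", "welcome", "love"]))
print(segment("Password01", ["password"]))  # digits added by a rule are not accounted for
```

A more forgiving variant could strip digits and symbols first, or allow a few unmatched characters, at the cost of more ambiguous guesses.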

For now, let’s get to the good stuff. As a general guide to what can be seen in the videos below, here is how they are constructed.

The cracking attempts were made using a recent AD NTDS dump of a large client’s AD environment that was the subject of a test by BTL. This comprised nearly 200k unique hashes, although these included a good number of historical hashes for all users. This only helped the cracking attempts though, as the ruleset has more chance of catching any of the passwords in the set “MyPassword01, Mypassword02, …” that a particular user may use over time.

In the 3D plots, the x-axis looks red, but that is only because passwords found to be based on a single word have been plotted in red, with their y and z coordinates left at zero. As there seem to be few gaps, if any, in this red line, the first observation is that nearly all dictionary words seem to have been used. Where passwords were found to contain 2 base words, these are plotted in yellow with their z coordinate at zero, hence they form a plane above the red x-axis.

We will further describe some features highlighted in the video:

  • #1: Several planes can be seen, in a corn row fashion. This indicates very popular words that seem to have been used a lot as the third word for many passwords. Some preliminary research indicates …

  • #2: The same thing can be seen in the yellow plane, which exhibits a lattice type of structure near the origin. This indicates there are several sets of words that seem to be popular both as first and second words.
  • #3: A diagonal line seen in the yellow plane going up to the corner indicates passwords consisting of a single word repeated. Some rules in the ruleset perform such basic manipulations themselves, so these are not necessarily all results of the passphrase cracking attempts, as a single-word attack together with the ruleset had already been run previously.
  • #5: Each dictionary file was prefixed with some form of the company name, some of their subsidiaries (whose AD domains have been merged/imported into the main domain), some supplier/support company names that have accounts in the company AD, as well as some abbreviations and acronyms associated with the aforementioned. This amounts to about 50-odd words. All passwords that contained any of these 50 words have been coloured blue. As can be seen in the main video, as well as in the other videos, they comprise a significant share of all passwords found, and hence such efforts are always recommended in any serious full AD password audit. One fault of this graph is that it only counts each set of passphrase components once, whereas if multiple passwords were found for the same set, the ball could be drawn larger in the graph. This would require additional processing, as it should only happen if the additional matches were against a different user, rather than the same user’s historical hashes.
  • #4: Besides the word password and company names etc, the next 2 commonly used words appear to be “welcome” and “love”. Since some passwords appear to be a window into the user’s soul, this seems to be a good thing. These 2 words have been coloured green and pinkish/purple in the graph just to illustrate their number compared to the total. Especially in the 1k video, it can be observed how the green dots all occupy 3 different slices across the cube.
  • #6: Some users did not just create a password by repeating a word twice, but thrice, and hence a diagonal line goes from the origin out to the far opposite corner of the cube.
  • #7: In the 200k video’s yellow plane, a sort of parabolic curve bounding the yellow dots can be seen. This would indicate that the set of words most commonly used as the first word in a passphrase is much smaller, and much more closely related to word frequency, than is the case for the second word of a passphrase, which appears to come from a much broader and more spread out selection of the dictionary.

We hope you found the analysis entertaining and enlightening and perhaps it has given you some ideas to try in your password-cracking attempts. Happy hashcat-ing!

Caveats encountered so far:

  • Some longer words are not part of the frequency wordlist, yet are sometimes cracked by chance due to being composed of smaller words
  • When checking the output of the passphrase analysis script (which is supposed to recover/guess the source words that caused a match), it can often be seen that hits could have been made using non-word entries in the lists: seemingly random one or two-letter words that I don’t recognise or see a reason for including. Perhaps these cover cases where acronyms or abbreviations are used in titles and on other occasions.
  • The meaning of numbers in passphrases can be ambiguous. Is the number 10 meant to indicate some counter that was previously at 9? Or does it represent ‘to’? The same ambiguity is there for the number ‘2’ itself. Again, inconclusive or at least slightly inaccurate or unintended results abound.
  • Some rules in popular rule files delete characters as well, resulting in a cracked password but losing the information about which words comprised the original password.

It seems very much out of reach to attempt passphrase cracking beyond 3 words in length. Even with a minimal dictionary of 1000 words and a hard limit of generating phrases at 3M/s, a minimal 4-word sentence attack with capitalisation of words as the only mutation would take 2^4 * 1000^4 / 3M ≈ 5.3 million seconds, or roughly 2 months.
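The estimate above is easy to reproduce and to vary for other dictionary sizes. A minimal sketch, with a hypothetical helper name and the figures taken from the paragraph above:

```python
def attack_seconds(wordlist_size, words_per_phrase, variants_per_word, candidates_per_second):
    """Time to exhaust every phrase when each word can appear in a fixed number of variants."""
    keyspace = (variants_per_word * wordlist_size) ** words_per_phrase
    return keyspace / candidates_per_second

# 1000 words, 4 words per phrase, lowercase/capitalised only, 3M phrases/s.
seconds = attack_seconds(1000, 4, 2, 3_000_000)
print(f"{seconds / 86400:.1f} days")  # about 61.7 days
```

Dropping to 3 words per phrase with the same settings brings the figure down to well under an hour, which is why the runs described here stop at 3 words.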
