From Tiny Python Projects by Ken Youens-Clark

This article shows how to make memorable and secure passwords with Python!




It’s not easy to create passwords that are both difficult to guess and easy to remember. An XKCD comic (https://xkcd.com/936/) describes an algorithm that provides both security and recall by suggesting that a password be composed of “four random common words.” For instance, the comic suggests that the password composed of the words “correct,” “horse,” “battery,” and “staple” provides “~44 bits of entropy,” which would take a computer around 550 years to guess at 1,000 guesses per second.
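The comic’s arithmetic is easy to check. Here is a minimal sketch, assuming (as the comic does) that each word is drawn independently and uniformly from a list of 2,048 common words, so each word adds 11 bits. The names entropy_bits and years_to_guess are illustrative and are not part of password.py:

```python
import math


def entropy_bits(pool_size: int, num_words: int) -> float:
    """Bits of entropy for num_words words drawn uniformly from pool_size words.

    Treats each pick as independent (with replacement), as the comic does.
    """
    return num_words * math.log2(pool_size)


def years_to_guess(bits: float, guesses_per_second: float = 1_000) -> float:
    """Years needed to try all 2**bits possibilities at the given rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365.25 * 24 * 60 * 60)


bits = entropy_bits(2_048, 4)
print(bits)                         # 44.0
print(round(years_to_guess(bits)))  # 557, i.e. "around 550 years"
```

Because the program we’re about to write uses random.sample, which never repeats a word within one password, the true figure is very slightly lower than this with-replacement estimate.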

We’re going to write a program called password.py that generates these passwords by randomly combining words from some input files. Many computers have a file that lists thousands of English words, each on a separate line. On most of my systems, I can find this at /usr/share/dict/words, and it contains over 235,000 words! As the file can vary by system, I’ve added a version to the repo so that we can use the same file. This file is a little large, so I’ve compressed it to inputs/words.txt.zip. You should unzip it before using it:

 
 $ unzip inputs/words.txt.zip
  

Now we should both have the same inputs/words.txt file so that this is reproducible for you:

 
 $ ./password.py ../inputs/words.txt --seed 14
 CrotalLeavesMeeredLogy
 NatalBurrelTizzyOddman
 UnbornSignerShodDehort
  

Well, OK, maybe those aren’t going to be the easiest to remember. Perhaps instead we should be a bit more judicious about the source of our words. The passwords above were created from the default word dictionary /usr/share/dict/words on my system (which I’ve included in the GitHub repo as inputs/words.txt.zip). This dictionary lists over 235,000 words from the English language, while the average speaker uses a small fraction of that, somewhere between 20,000 and 40,000 words.

We can generate more memorable words by drawing from a piece of English text such as the US Constitution. Note that to use a piece of input text in this way, we need to remove all punctuation. We also ignore words shorter than three or longer than six characters (the default limits, which you can change with --min_word_len and --max_word_len):

 
 $ ./password.py --seed 8 ../inputs/const.txt
 DulyHasHeadsCases
 DebtSevenAnswerBest
 ChosenEmitTitleMost
  

Another strategy for generating memorable words could be to limit the pool of words to more interesting parts of speech like nouns, verbs, and adjectives taken from texts like novels or poetry. I’ve included a program I wrote called harvest.py that uses a Python Natural Language Processing library called “spaCy” (https://spacy.io) to extract those parts of speech into files that we can use as input to our program. I ran the harvest.py program on some texts and placed the outputs into directories in the GitHub repo.

Here’s the output from the nouns from the US Constitution:

 
 $ ./password.py --seed 5 const/nouns.txt
 TaxFourthYearList
 TrialYearThingPerson
 AidOrdainFifthThing
  

Here are passwords generated from The Scarlet Letter by Nathaniel Hawthorne:

 
 $ ./password.py --seed 1 scarlet/verbs.txt
 CrySpeakBringHold
 CouldSeeReplyRun
 WearMeanGazeCast
  

And here are some generated from William Shakespeare’s sonnets:

 
 $ ./password.py --seed 2 sonnets/adjs.txt
 BoldCostlyColdPale
 FineMaskedKeenGreen
 BarrenWiltFemaleSeldom
  

If this isn’t a strong enough password, we also provide a --l33t flag to further obfuscate the text by:

  1. Passing the generated password through the ransom.py algorithm
  2. Substituting various characters using a given table
  3. Adding a randomly selected punctuation character to the end

Here’s what the Shakespearean passwords look like with this encoding:

 
 $ ./password.py --seed 2 sonnets/adjs.txt --l33t
 B0LDco5TLYColdp@l3,
 f1n3M45K3dK3eNGR33N[
 B4rReNW1LTFeM4l3seldoM/
  

In this exercise, you’ll:

  • Take a list of one or more input files as positional arguments.
  • Use a regular expression to remove non-word characters.
  • Filter words by some minimum length requirement.
  • Use sets to create unique lists.
  • Generate some given number of passwords by combining some given number of randomly selected words.
  • Optionally encode text using a combination of algorithms we’ve previously written.

Writing password.py

Our program is called password.py and creates some --num number of passwords (default 3), each created by randomly choosing some --num_words (default 4) from a unique set of words from one or more input files. As it uses the random module, the program also accepts a random --seed argument. After removing any non-word characters, the words from the input files must be at least --min_word_len characters long (default 3) and at most --max_word_len characters long (default 6).

As always, your first priority is to sort out the inputs to your program. Don’t move ahead until your program can produce this usage with the -h or --help flags and can pass the first seven tests:

 
 $ ./password.py -h
 usage: password.py [-h] [-n num_passwords] [-w num_words] [-m mininum]
                    [-x maximumm] [-s seed] [-l]
                    FILE [FILE ...]
  
 Password maker
  
 positional arguments:
   FILE                  Input file(s)
  
 optional arguments:
   -h, --help            show this help message and exit
   -n num_passwords, --num num_passwords
                         Number of passwords to generate (default: 3)
   -w num_words, --num_words num_words
                         Number of words to use for password (default: 4)
   -m mininum, --min_word_len mininum
                         Minimum word length (default: 3)
   -x maximumm, --max_word_len maximumm
                         Maximum word length (default: 6)
   -s seed, --seed seed  Random seed (default: None)
   -l, --l33t            Obfuscate letters (default: False)
  

The words from the input files are title-cased (first letter uppercase, the rest lowercased) which we can achieve using the str.title() method. This makes it easier to see and remember the individual words in the output. Note that we can vary the number of words included in each password as well as the number of passwords generated:

 
 $ ./password.py --num 2 --num_words 3 --seed 9 sonnets/*
 QueenThenceMasked
 GullDeemdEven
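A quick check of str.title() also shows why the program removes non-word characters before title-casing: title() starts a new “word” after any non-letter, so an apostrophe left inside a contraction produces an odd capital. This is a small illustration, not code from password.py:

```python
# str.title() uppercases the first letter of each run of letters
# and lowercases the rest of that run.
print("queen".title())    # Queen
print("MASKED".title())   # Masked

# Any non-letter starts a new run, so a leftover apostrophe
# capitalizes the letter after it.
print("don't".title())    # Don'T

# Removing the apostrophe first avoids the problem.
print("dont".title())     # Dont
```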
  

The --min_word_len argument helps to filter out shorter, less interesting words like “a,” “an,” and “of.” If you increase this value (and raise --max_word_len to allow longer words), the passwords change quite drastically:

 
 $ ./password.py -n 2 -w 3 -s 9 -m 10 -x 20 sonnets/*
 PerspectiveSuccessionIntelligence
 DistillationConscienceCountenance
  

The --l33t flag is a nod to “leet”-speak, where 31337 H4X0R means “ELITE HACKER”[1]. When this flag is present, we’ll encode each of the passwords, first by passing the password through the ransom algorithm we wrote:

 
 $ ./ransom.py MessengerRevolutionImportune
 MesSENGeRReVolUtIonImpoRtune
  

Then we’ll use the following substitution table to substitute characters:

 
 a => @
 A => 4
 O => 0
 t => +
 E => 3
 I => 1
 S => 5
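One way to apply a table like this in Python is with str.maketrans and str.translate. The sketch below uses the ransom output shown above plus the string 'DOLlaRs' (a ransom-style string used in this exercise’s unit tests); the constant name L33T_TABLE is only illustrative:

```python
# Map each character on the left of the table to its replacement.
L33T_TABLE = str.maketrans({
    'a': '@', 'A': '4', 'O': '0', 't': '+', 'E': '3', 'I': '1', 'S': '5'
})

# Characters that aren't keys in the table pass through unchanged,
# so lowercase 'o', 's', and 'e' are left alone.
print('DOLlaRs'.translate(L33T_TABLE))                       # D0Ll@Rs
print('MesSENGeRReVolUtIonImpoRtune'.translate(L33T_TABLE))  # Mes53NGeRReVolU+1on1mpoR+une
```

Note that the table is case-sensitive: only the exact characters listed are replaced, which is why running ransom first changes which substitutions happen.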
  

To cap it off, we’ll use random.choice to select one character from string.punctuation to add to the end:

 
 $ ./password.py --num 2 --num_words 3 --seed 9 --min_word_len 10 --max_word_len 20 sonnets/* --l33t
 p3RsPeC+1Vesucces5i0niN+3lL1Genc3$
 D1s+iLl@+ioNconsc1eNc3coun+eN@Nce^
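The punctuation step can be tried on its own. In this sketch the seed value 9 and the sample password are arbitrary; seeding first makes the choice repeatable:

```python
import random
import string

# string.punctuation holds the 32 ASCII punctuation characters.
print(len(string.punctuation))  # 32

random.seed(9)
password = 'D0Ll@Rs'
suffix = random.choice(string.punctuation)
print(password + suffix)

# Re-seeding with the same value reproduces the same choice.
random.seed(9)
print(random.choice(string.punctuation) == suffix)  # True
```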
  

Here’s the string diagram to summarize the inputs:



Creating a unique list of words

Let’s start off by making our program print the name of each input file:

 
 def main():
     args = get_args()
     random.seed(args.seed)   
  
     for fh in args.file:     
         print(fh.name)       
  

Always set random.seed right away as it globally affects all actions by the random module.

Iterate through the file arguments.

Print the name of the file.

We can run it on the words file:

 
 $ ./password.py ../inputs/words.txt
 ../inputs/words.txt
  

Or with some of the other inputs:

 
 $ ./password.py scarlet/*
 scarlet/adjs.txt
 scarlet/nouns.txt
 scarlet/verbs.txt
  

Our first goal is to create a unique list of words we can use for sampling. The elements in a list don’t have to be unique, so a plain list won’t do. The keys of a dictionary, however, are unique, so a dict is a possibility:

 
 def main():
     args = get_args()
     random.seed(args.seed)
     words = {}                                   
  
     for fh in args.file:                         
         for line in fh:                          
             for word in line.lower().split():    
                 words[word] = 1                  
  
     print(words)
  

Create an empty dict to hold the words.

Iterate through the files.

Iterate through the lines of the file.

Lowercase the line and split it on spaces into words.

Set the key words[word] equal to 1 to indicate we saw it. We’re only using a dict to get the unique keys. We don’t care about the values, and you could use whatever value you like.
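To see the dict-as-a-set trick on its own, here’s a small sketch on the opening words of the Constitution (with the trailing comma left off for now). Since Python 3.7, dicts also preserve insertion order, so the keys come back in the order each word was first seen:

```python
text = 'We the People of the United States'

words = {}
for word in text.lower().split():
    words[word] = 1  # the value doesn't matter; only the key is kept

# "the" appears twice in the text but only once as a key.
print(list(words))
# ['we', 'the', 'people', 'of', 'united', 'states']
```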

If you run this on the US Constitution, you should see a fairly large list of words (some output elided here):

 
 $ ./password.py ../inputs/const.txt
 {'we': 1, 'the': 1, 'people': 1, 'of': 1, 'united': 1, 'states,': 1, ...}
  

One problem I can spot is that the word 'states,' has a comma attached to it. If we try the first bit of text from the Constitution in the REPL, we can see the problem:

 
>>> 'We the People of the United States,'.lower().split()
['we', 'the', 'people', 'of', 'the', 'united', 'states,']
 

How can we get rid of punctuation?

Cleaning the text

We’ve seen several times that splitting on spaces leaves punctuation, but splitting on non-word characters can break contracted words like “Don’t” in two. I’d like to create a function that cleans a word. First, I’ll imagine the test for it. Note that in this exercise, I’ll put all my unit tests into a file called unit.py which I can run with pytest -xv unit.py.

Here’s the test for our clean function:

 
 def test_clean():
     assert clean('') == ''                
     assert clean("states,") == 'states'   
     assert clean("Don't") == 'Dont'       
  

It’s always good to test your functions on empty input to make sure they do something sane.

The function should remove punctuation at the end of a string.

The function shouldn’t split a contracted word in two.

I would like to apply this to all the elements returned by splitting each line into words, and map is a fine way to do this. We often use a lambda when writing map.



Notice that I don’t need to write a lambda for the map because the clean function expects a single argument.
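Both forms are shown side by side in this sketch. The clean function here is a deliberately simplified stand-in (an assumption for illustration, not the function the test expects): it only strips punctuation from the ends of a word, so it would leave “Don't” intact and fail test_clean. That way you can still write your own version:

```python
import string


def clean(word):
    """Stand-in for this sketch only: strip punctuation from both ends."""
    return word.strip(string.punctuation)


line = 'We the People of the United States,'

# A lambda that just passes its argument along to clean...
with_lambda = list(map(lambda word: clean(word), line.lower().split()))

# ...does the same thing as handing clean to map directly.
without_lambda = list(map(clean, line.lower().split()))

print(with_lambda == without_lambda)  # True
print(without_lambda)
# ['we', 'the', 'people', 'of', 'the', 'united', 'states']
```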



See how it integrates with the code:

 
 def main():
     args = get_args()
     random.seed(args.seed)
     words = {}
  
     for fh in args.file:
         for line in fh:
             for word in map(clean, line.lower().split()):     
                 words[word] = 1
  
     print(words)
  

Use map to apply the clean function to the results of splitting the line on spaces. No lambda is required because clean expects a single argument.

If I run this on the US Constitution again, I see that 'states' has been fixed:

 
 $ ./password.py ../inputs/const.txt
 {'we': 1, 'the': 1, 'people': 1, 'of': 1, 'united': 1, 'states': 1, ...}
  

I’ll leave it to you to write the clean function which satisfies the test.

Using a set

A better data structure than a dict to use for our purposes is called a set, and you can think of it like a unique list or the keys of a dict. Here’s how we could change our code to use a set to keep track of unique words:

 
 def main():
     args = get_args()
     random.seed(args.seed)
     words = set()                                           
  
     for fh in args.file:
         for line in fh:
             for word in map(clean, line.lower().split()):
                 words.add(word)                             
  
     print(words)
  

Use the set function to create an empty set.

Use set.add to add a value to a set.

If you run this code now, you’ll see slightly different output. Python shows the data structure in curly brackets ({}), which might make you think of a dict, but the contents look more like a list:

 
 $ ./password.py ../inputs/const.txt
 {'', 'impartial', 'imposed', 'jared', 'levying', ...}
  

We’re using sets here only for the fact that they easily allow us to keep a unique list of words, but sets are much more powerful than this. For instance, you can find the shared values between two lists by using the set.intersection method:

 
 >>> nums1 = set(range(1, 10))
 >>> nums2 = set(range(5, 15))
 >>> nums1.intersection(nums2)
 {5, 6, 7, 8, 9}
  

You can read help(set) in the REPL or the documentation online to learn about all the amazing things you can do with sets.
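For instance, here are a few more of the operations that help(set) describes, each alongside its operator form. This sketch rebuilds nums1 and nums2 from the example above:

```python
nums1 = set(range(1, 10))   # {1, ..., 9}
nums2 = set(range(5, 15))   # {5, ..., 14}

# Everything in either set.
assert nums1.union(nums2) == nums1 | nums2 == set(range(1, 15))

# Values in nums1 that are not in nums2.
assert nums1.difference(nums2) == nums1 - nums2 == {1, 2, 3, 4}

# Values in exactly one of the two sets.
assert nums1.symmetric_difference(nums2) == nums1 ^ nums2 == {1, 2, 3, 4, 10, 11, 12, 13, 14}

# The intersection shown above also has an operator form.
assert nums1.intersection(nums2) == nums1 & nums2 == {5, 6, 7, 8, 9}
```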

Filtering the words

If we look again at the output, we’ll see that the empty string is the first element:

 
 $ ./password.py ../inputs/const.txt
 {'', 'impartial', 'imposed', 'jared', 'levying', ...}
  

We need a way to filter out unwanted values like strings which are too short. In the “Rhymer” exercise, we looked at the filter function which is a higher-order function that takes two arguments:

  1. A function that accepts one element and returns True if the element should be kept or False if the element should be excluded.
  2. Some “iterable” (like a list or map) that produces a sequence of elements to be filtered.

In our case, we want to accept only words whose length is between the --min_word_len and --max_word_len arguments, inclusive. In the REPL, I can use a lambda to create an anonymous function that accepts a word and checks that its length is between min_word_len and max_word_len. The result of that comparison is either True or False. With the values below, only words of three to six characters are allowed through, which removes the empty string and short words like the English articles “a” and “an.” Remember that filter is lazy, so I coerce it using the list function in the REPL to see the output:

 
 >>> shorter = ['', 'a', 'an', 'the', 'this']
 >>> min_word_len = 3
 >>> max_word_len = 6
 >>> list(filter(lambda word: min_word_len <= len(word) <= max_word_len, shorter))
 ['the', 'this']
  

It will also remove longer words:

 
 >>> longer = ['that', 'other', 'egalitarian', 'disequilibrium']
 >>> list(filter(lambda word: min_word_len <= len(word) <= max_word_len, longer))
 ['that', 'other']
  

One way we could incorporate filter() is to create a word_len() function that encapsulates the lambda above. Note that I defined it inside main() in order to create a closure, because I want to include the values of args.min_word_len and args.max_word_len:

 
 def main():
     args = get_args()
     random.seed(args.seed)
     words = set()
  
     def word_len(word):                                                     
         return args.min_word_len <= len(word) <= args.max_word_len
  
     for fh in args.file:
         for line in fh:
             for word in filter(word_len, map(clean, line.lower().split())): 
                 words.add(word)
  
     print(words)
  

This function will return True if the length of the given word is in the allowed range.

We can use word_len (without the parentheses!) as the function argument to filter().
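Defining word_len inside main works because the inner function can see args. Another way to build the same kind of closure, sketched here with a hypothetical helper named make_word_len, is a small factory function that returns it; a side benefit is that the predicate becomes easy to unit-test outside main:

```python
def make_word_len(min_word_len, max_word_len):
    """Return a predicate that keeps words whose length is within range."""

    def word_len(word):
        # min_word_len and max_word_len are captured from the enclosing call.
        return min_word_len <= len(word) <= max_word_len

    return word_len


word_len = make_word_len(3, 6)
words = ['', 'a', 'an', 'the', 'this', 'other', 'egalitarian']
print(list(filter(word_len, words)))  # ['the', 'this', 'other']
```

Inside main, this would read word_len = make_word_len(args.min_word_len, args.max_word_len).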

We can again try our program to see what it produces:

 
 $ ./password.py ../inputs/const.txt
 {'measures', 'richard', 'deprived', 'equal', ...}
  

Try it on multiple inputs such as all the nouns, adjectives, and verbs from The Scarlet Letter:

 
 $ ./password.py scarlet/*
 {'walk', 'lose', 'could', 'law', ...}
  

Titlecasing the words

We used the line.lower() method to lowercase all the input, but the passwords we generate need each word to be in “Title Case,” where the first letter is uppercase and the rest of the word is lowercase. Can you figure out how to change the program to produce this output?

 
 $ ./password.py scarlet/*
 {'Dark', 'Sinful', 'Life', 'Native', ...}
  

Now we have a way to process any number of files to produce a unique list of title-cased words that have had non-word characters removed and have been filtered to remove the ones that are too short or too long.

Sampling and making a password

We’re going to use the random.sample() function to randomly choose some --num_words number of words from our set to create a hard-to-guess yet memorable password. We’ve talked before about the importance of using a random seed to test that our “random” selections are reproducible. It’s also quite important that the items from which we sample always be ordered in the same way so that the same selections are made. If we use the sorted() function on a set, we get back a sorted list, which is perfect for using with random.sample(). I can add these lines to the code from before:

 
 words = sorted(words)
 print(random.sample(words, args.num_words))
  

Now when I run it with The Scarlet Letter input, I will get a list of words that might make an interesting password:

 
 $ ./password.py scarlet/*
 ['Lose', 'Figure', 'Heart', 'Bad']
  

The result of random.sample() is a list that you can join on the empty string in order to make a new password:

 
 >>> ''.join(random.sample(words, num_words))
 'TokenBeholdMarketBegin'
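A small experiment shows why both the seed and the sorting matter. The word set here is made up for the demonstration, and make_password is an illustrative helper rather than part of the program. With the same seed and the same ordered list, random.sample makes the same picks every time:

```python
import random

# A made-up set of title-cased words for the demonstration.
words = {'Bad', 'Dark', 'Figure', 'Heart', 'Life', 'Lose', 'Native', 'Sinful'}


def make_password(seed, num_words=4):
    """Seed the generator, then sample from a sorted copy of the set."""
    random.seed(seed)
    # sorted() gives a list whose order doesn't depend on how the set
    # happens to store its strings in this particular run.
    return ''.join(random.sample(sorted(words), num_words))


print(make_password(1) == make_password(1))  # True
```

Passing the set itself to random.sample isn’t an option on recent Python versions anyway: since Python 3.11 the population must be a sequence, so a set raises a TypeError.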
  

You will need to create args.num of passwords. How will you do that?

l33t-ify

The last piece of our program is to create a l33t function which obfuscates the password. The first step is to convert it with the same algorithm we wrote for ransom.py. I’m going to create a ransom function for this, and here’s the test which is in unit.py. I’ll leave it to you to create the function that satisfies this test[2]:

 
 def test_ransom():
     state = random.getstate()
     random.seed(1)
     assert (ransom('Money') == 'moNeY')
     assert (ransom('Dollars') == 'DOLlaRs')
     random.setstate(state)
  

Save the current global state.

Set the random.seed() to a known value for the test.

Restore the state.
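The save, seed, and restore pattern in test_ransom could also be packaged as a context manager. This is only a sketch; seeded is a hypothetical helper, not part of password.py or the random module:

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded(seed):
    """Seed the global random generator for the duration of a with-block."""
    state = random.getstate()   # save the current global state
    random.seed(seed)           # start from a known value
    try:
        yield
    finally:
        random.setstate(state)  # restore it, even if an assertion fails


with seeded(1):
    first = [random.random() for _ in range(3)]
with seeded(1):
    second = [random.random() for _ in range(3)]
print(first == second)  # True
```

A test could then wrap its assertions in with seeded(1): instead of repeating the getstate and setstate calls in every test function.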

Next, I substitute some of the characters according to the following table:

 
 a => @
 A => 4
 O => 0
 t => +
 E => 3
 I => 1
 S => 5
  

I wrote a l33t function which combines the ransom with the substitution above and finally adds a punctuation character by appending random.choice(string.punctuation). Here’s the test_l33t function you can use to write your function:

 
 def test_l33t():
     state = random.getstate()
     random.seed(1)
     assert l33t('Money') == 'moNeY{'
     assert l33t('Dollars') == 'D0ll4r5`'
     random.setstate(state)
  

Putting it all together

Without giving away the ending, I’d like to say that you need to be careful about the order of operations that include the random module. My first implementation printed different passwords given the same seed when I used the --l33t flag. Here was the output for plain passwords:

 
 $ ./password.py -s 1 -w 2 sonnets/*
 EagerCarcanet
 LilyDial
 WantTempest
  

I expected the same passwords, only encoded. Here’s what my program produced instead:

 
 $ ./password.py -s 1 -w 2 sonnets/* --l33t
 3@G3RC@rC@N3+{
 m4dnes5iNcoN5+4n+|
 MouTh45s15T4nCe^
  

The first password looks OK, but what are those other two? I modified my code to print both the original password and the l33ted one:

 
 $ ./password.py -s 1 -w 2 sonnets/* --l33t
 3@G3RC@rC@N3+{ (EagerCarcanet)
 m4dnes5iNcoN5+4n+| (MadnessInconstant)
 MouTh45s15T4nCe^ (MouthAssistance)
  

The random module uses a global state to make each of its “random” choices. In my first implementation, I passed each new password to the l33t function immediately after choosing it. Because the l33t function also uses random functions, it altered the state before the next password was chosen. The solution was to first generate all the passwords and then to l33t them, if necessary.
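The effect can be reproduced without the rest of the program. In this sketch, fake_l33t is a hypothetical stand-in that, like the real l33t, draws from the random module, and the word list is made up:

```python
import random
import string

# A made-up, already sorted word list for the demonstration.
WORDS = ['Carcanet', 'Dial', 'Eager', 'Lily', 'Madness', 'Tempest', 'Want']


def fake_l33t(text):
    """Stand-in for l33t: like the real one, it consumes random numbers."""
    return text + random.choice(string.punctuation)


def plain(seed, num=3):
    """Sample num two-word passwords with no obfuscation at all."""
    random.seed(seed)
    return [''.join(random.sample(WORDS, 2)) for _ in range(num)]


def interleaved(seed, num=3):
    """Buggy order: obfuscate each password right after sampling it."""
    random.seed(seed)
    passwords = []
    for _ in range(num):
        password = ''.join(random.sample(WORDS, 2))
        passwords.append(password)
        fake_l33t(password)  # advances the global random state
    return passwords


def separated(seed, num=3):
    """Fixed order: sample every password first, obfuscate afterwards."""
    random.seed(seed)
    passwords = [''.join(random.sample(WORDS, 2)) for _ in range(num)]
    return passwords, [fake_l33t(p) for p in passwords]


# The fixed version always samples the same passwords as the plain run.
print(separated(1)[0] == plain(1))       # True
# The buggy version is only guaranteed to agree on the first password,
# sampled before any obfuscation has touched the random state.
print(interleaved(1)[0] == plain(1)[0])  # True
print(interleaved(1), plain(1))          # later entries typically differ
```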

Those are all the pieces you should need to write your program. You have the unit tests to help you verify the functions, and you have the integration tests to ensure your program works as a whole. This is the last program; give it your best shot before looking at the solution!

Solution

 
 #!/usr/bin/env python3
 """Password maker, https://xkcd.com/936/"""
  
 import argparse
 import random
 import re
 import string
  
  
 # --------------------------------------------------
 def get_args():
     """Get command-line arguments"""
  
     parser = argparse.ArgumentParser(
         description='Password maker',
         formatter_class=argparse.ArgumentDefaultsHelpFormatter)
  
     parser.add_argument('file',
                         metavar='FILE',
                         type=argparse.FileType('r'),
                         nargs='+',
                         help='Input file(s)')
  
     parser.add_argument('-n',
                         '--num',
                         metavar='num_passwords',
                         type=int,
                         default=3,
                         help='Number of passwords to generate')
  
     parser.add_argument('-w',
                         '--num_words',
                         metavar='num_words',
                         type=int,
                         default=4,
                         help='Number of words to use for password')
  
     parser.add_argument('-m',
                         '--min_word_len',
                         metavar='mininum',
                         type=int,
                         default=3,
                         help='Minimum word length')
  
     parser.add_argument('-x',
                         '--max_word_len',
                         metavar='maximumm',
                         type=int,
                         default=6,
                         help='Maximum word length')
  
     parser.add_argument('-s',
                         '--seed',
                         metavar='seed',
                         type=int,
                         help='Random seed')
  
     parser.add_argument('-l',
                         '--l33t',
                         action='store_true',
                         help='Obfuscate letters')
  
     return parser.parse_args()
  
  
 # --------------------------------------------------
 def main():
     args = get_args()
     random.seed(args.seed)                                                        
     words = set()                                                                 
  
     def word_len(word):                                                           
         return args.min_word_len <= len(word) <= args.max_word_len
  
     for fh in args.file:                                                          
         for line in fh:                                                           
             for word in filter(word_len, map(clean, line.lower().split())):       
                 words.add(word.title())                                           
  
     words = sorted(words)                                                         
     passwords = [                                                                 
         ''.join(random.sample(words, args.num_words)) for _ in range(args.num)
     ]
  
     if args.l33t:                                                                 
         passwords = map(l33t, passwords)                                          
  
     print('\n'.join(passwords))                                                   
  
  
 # --------------------------------------------------
 def clean(word):                                                                  
     """Remove non-word characters from word"""
  
     return re.sub('[^a-zA-Z]', '', word)                                          
  
  
 # --------------------------------------------------
 def l33t(text):                                                                   
     """l33t"""
  
     text = ransom(text)                                                           
     xform = str.maketrans({                                                       
         'a': '@', 'A': '4', 'O': '0', 't': '+', 'E': '3', 'I': '1', 'S': '5'
     })
     return text.translate(xform) + random.choice(string.punctuation)              
  
  
 # --------------------------------------------------
 def ransom(text):                                                                 
     """Randomly choose an upper or lowercase letter to return"""
  
     return ''.join(                                                               
         map(lambda c: c.upper() if random.choice([0, 1]) else c.lower(), text))
  
  
 # --------------------------------------------------
 if __name__ == '__main__':
     main()
  

Set the random.seed to the given value or the default None which is the same as not setting the seed.

Create an empty set to hold all the unique words we’ll extract from the texts.

Iterate through each open file handle.

Iterate through each line of text in the file handle.

Iterate through each word generated by splitting the line on whitespace, removing non-word characters with the clean function, and filtering for words whose length is between the given minimum and maximum.

Titlecase the word before adding it to the set.

Use the sorted function to order words into a new list.

Use a list comprehension to create the passwords.

Use a for clause with a range to create the correct number of passwords. Because I don’t need the value from range, I can use the _ to ignore the value.

Make a new password by joining a random sampling of words on the empty string.

Now that all the passwords have been created, it’s safe to call the l33t function if required. If we had used it while creating the passwords, it would’ve altered the global state of the random module and we’d have gotten different passwords.

Print the passwords, one per line: obfuscated if the --l33t flag is present, otherwise as-is.

Define a function to “clean” a word.

Use a regular expression to substitute the empty string for anything which isn’t an English alphabet character.

Define a function to l33t a word.

First use the ransom function to randomly capitalize letters.

Make a translation table/dict for character substitutions.

Use the str.translate method to perform the substitutions, and append a random piece of punctuation.

Define a function for the ransom algorithm.

Return a new string created by randomly upper- or lowercasing each letter in a word.

Discussion

Well, that was it. The last exercise! I hope you found it challenging and fun. Let’s break it down a bit. Nothing new was in get_args; let’s start with the auxiliary functions:

Cleaning the text

I chose to use a regular expression to remove any characters that are outside the set of lowercase and uppercase English characters:

 
 def clean(word):
     """Remove non-word characters from word"""
     return re.sub('[^a-zA-Z]', '', word)                  
  

The re.sub function substitutes any text matching the pattern (the first argument) found in the given text (the third argument) with the value given by the second argument.

Recall from the “Gematria” exercise that we can write the character class [a-zA-Z] to define the characters in the ASCII table bounded by those two ranges. We can then negate or complement that class by placing a caret ^ as the first character inside it, so [^a-zA-Z] can be read as “any character not matching a to z or A to Z.”

It’s perhaps easier to see it in action in the REPL. In this example, only the letters “AbCd” are left from the text “A1b*C!d4”:

 
 >>> import re
 >>> re.sub('[^a-zA-Z]', '', 'A1b*C!d4')
 'AbCd'
  

If the only goal were to match ASCII letters, it’s possible to solve it by looking for membership in string.ascii_letters:

 
 >>> import string
 >>> text = 'A1b*C!d4'
 >>> [c for c in text if c in string.ascii_letters]
 ['A', 'b', 'C', 'd']
  

It honestly seems like more effort to me. Besides, if the function needed to be changed to allow, say, numbers and a few specific pieces of punctuation, then the regular expression version becomes significantly easier to write and maintain.
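As an illustration of that last point, here’s a hypothetical variation (clean_keep is an illustrative name, not from the program) that also keeps digits, hyphens, and apostrophes. Only the character class changes:

```python
import re


def clean_keep(word):
    """Remove everything except ASCII letters, digits, hyphens, and apostrophes."""
    # Inside a character class, a '-' placed last is a literal hyphen.
    return re.sub("[^a-zA-Z0-9'-]", '', word)


print(clean_keep("Don't!"))        # Don't
print(clean_keep('well-known,'))   # well-known
print(clean_keep('A1b*C!d4'))      # A1bCd4
```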

A king’s ransom

The ransom function was taken straight from the ransom.py program, and there isn’t too much to say about it except, hey, look how far we’ve come! What was an entire article is now a single line in a much longer and more complicated program:

 
 def ransom(text):
     """Randomly choose an upper or lowercase letter to return"""
     return ''.join(                                                               
         map(lambda c: c.upper() if random.choice([0, 1]) else c.lower(), text))   
  

Use map to iterate through each character in the text and select either the upper- or lowercase version of the character based on a “coin” toss using random.choice to select between a “truthy” value (1) or a “falsey” value (0).

Join the characters produced by map on the empty string to create a new str.

How to l33t

The l33t function builds on the ransom and then adds a text substitution. I like the str.translate version of that program, and I used it again here:

 
 def l33t(text):
     """l33t"""
     text = ransom(text)                                              
     xform = str.maketrans({                                          
         'a': '@', 'A': '4', 'O': '0', 't': '+', 'E': '3', 'I': '1', 'S': '5'
     })
     return text.translate(xform) + random.choice(string.punctuation) 
  

First randomly capitalize the given text.

Make a translation table from the given dict, which describes how to change one character into another. Any characters not listed in the keys of this dict are left unchanged.

Use the str.translate method to make all the character substitutions. Use random.choice to select one additional character from string.punctuation to append to the end.
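str.maketrans also accepts two strings of equal length, pairing characters by position. This sketch builds the substitution table both ways and applies each to one of the passwords from earlier. The two tables aren’t identical dicts (the two-string form stores code points as values, while the dict form keeps the replacement strings), but they translate the same way:

```python
# Dict form, as in l33t.
from_dict = str.maketrans({
    'a': '@', 'A': '4', 'O': '0', 't': '+', 'E': '3', 'I': '1', 'S': '5'
})

# Two-string form: the i-th character of the first string maps to the
# i-th character of the second.
from_strings = str.maketrans('aAOtEIS', '@40+315')

print('MouthAssistance'.translate(from_dict))     # Mou+h4ssis+@nce
print('MouthAssistance'.translate(from_strings))  # Mou+h4ssis+@nce
```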

Processing the files

Now let’s apply these functions to the text. First, we need to create a unique set of all the words in our input files. I wrote this bit of code both with an eye on performance and for style:

 
 words = set()
 for fh in args.file:
     for line in fh:
         for word in filter(word_len, map(clean, line.lower().split())):
             words.add(word.title())
  

Iterate through each open file handle.

Read the file handle line-by-line with a for loop, not with a method like fh.read() which will read the entire contents of the file at once.

Reading this line actually requires starting at the end, where we split line.lower() on whitespace. Each word from str.split() goes into clean(), and each cleaned word must then pass through the filter() function.

Titlecase the word before adding it to the set.

Here’s a diagram of that for line:



  1. line.lower() will return a lowercase version of line.
  2. The str.split() method will break the text on whitespace to return words.
  3. Each word is fed into the clean() function to remove any character that is not in the English alphabet.
  4. The cleaned words are filtered by the word_len() function.
  5. The resulting word has been transformed, cleaned, and filtered.
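The same steps can be traced on a single line of text by materializing each lazy stage with list(). The clean function below mirrors the one in the solution; word_len hard-codes the default limits of 3 and 6 for illustration instead of reading them from args:

```python
import re


def clean(word):
    """Remove non-word characters from word (as in the solution)."""
    return re.sub('[^a-zA-Z]', '', word)


def word_len(word):
    # Default --min_word_len and --max_word_len, hard-coded for this sketch.
    return 3 <= len(word) <= 6


line = 'We the People of the United States, in Order to form'

step1 = line.lower()                       # 1. lowercase
step2 = step1.split()                      # 2. split on whitespace
step3 = list(map(clean, step2))            # 3. clean each word
step4 = list(filter(word_len, step3))      # 4. keep words of length 3 to 6
titled = [word.title() for word in step4]  # then title-case each one

print(titled)
# ['The', 'People', 'The', 'United', 'States', 'Order', 'Form']
```

The duplicate 'The' is harmless: the program adds each word to a set, which keeps only one copy.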

If you don’t like the map and filter functions, you could rewrite the code in a more traditional way:

 
 words = set()
 for fh in args.file:
     for line in fh:
         for word in line.lower().split():
             word = clean(word)
             if args.min_word_len <= len(word) <= args.max_word_len:
                 words.add(word.title())
  

Iterate through each open file handle.

Iterate through each line of the file handle.

Iterate through each “word” from splitting the lowercased line on spaces.

Clean the word up.

If the word’s length is within the allowed range,

Then add the titlecased word to the set.

Whichever way you choose to process the files, at this point you should have a complete set of all the unique, titlecased words from the input files.

Sampling and creating the passwords

As noted above, it’s vital to sort the words for our tests to verify that we’re making consistent choices. If you only wanted random choices and didn’t care about testing, you wouldn’t need to worry about sorting – but then you’d also be a morally deficient person for not testing – perish the thought! I chose to use the sorted function as there’s no other way to sort a set:

 
 words = sorted(words)   
  

There’s no set.sort() method because a set has no order that you can control; Python arranges a set’s elements internally according to their hash values. Calling sorted() on a set creates a new, sorted list.
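Here is a quick check of these claims, along with why sorting matters for reproducible sampling. Python randomizes string hashing per process (unless the PYTHONHASHSEED environment variable is set), so the iteration order of a set of strings can change from one run to the next even when the random seed is the same. Sorting first hands random.sample() the same sequence every time. Since Python 3.11, random.sample() also raises a TypeError if it is given a set directly. The four words below are just sample data.

```python
import random

words = {'Staple', 'Horse', 'Correct', 'Battery'}

print(hasattr(words, 'sort'))  # False: sets have no sort() method
ordered = sorted(words)        # sorted() accepts any iterable and returns a new list
print(ordered)                 # ['Battery', 'Correct', 'Horse', 'Staple']

# With the sorted list as the population, reseeding reproduces the same sample.
random.seed(14)
first = random.sample(ordered, 2)
random.seed(14)
second = random.sample(ordered, 2)
print(first == second)         # True
```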

We need to create some given number of passwords, and I thought it might be easiest to use a for loop with a range. In my code, I used for _ in range(…) because I don’t need to know the value each time through the loop. The _ is a way to indicate that you’re ignoring the value. It’s fine to say for i in range(…) if you want, but some linters might complain if they see that your code declares the variable i but never uses it. That could legitimately be a bug, and it’s best to use the _ to show that you mean to ignore this value.

Here’s the first way I wrote the code, which led to the bug I mentioned in the discussion: different passwords were chosen even when I used the same random seed. Can you spot the bug?

 
 for _ in range(args.num): 
     password = ''.join(random.sample(words, args.num_words)) 
     print(l33t(password) if args.l33t else password)         
  

Iterate args.num times, once for each password to create.

Each password is based on a random sample from our words, taking the number of words given in args.num_words. The random.sample() function returns a list of words that we join on the empty string to create a new string.

If the args.l33t flag is True, then we’ll print the l33t version of the password; otherwise, we’ll print the password as-is. This is the bug! Because l33t() itself uses the random module, calling it here advances the module’s global state, so the next call to random.sample() gets a different sample than it would have without the l33t flag.
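The mechanism behind this bug is easy to demonstrate in isolation. In this sketch, random_case() is a hypothetical stand-in for l33t(); all that matters is that it, too, draws from the random module. The word list and the buggy_passwords() helper are likewise made up for the example.

```python
import random

WORDS = ['Battery', 'Correct', 'Horse', 'Natal', 'Signer', 'Staple', 'Tizzy', 'Unborn']


def random_case(text: str) -> str:
    """Hypothetical stand-in for l33t(): randomly upper- or lowercases each letter."""
    return ''.join(random.choice((c.lower(), c.upper())) for c in text)


def buggy_passwords(num: int, transform=None) -> list:
    """Mimics the first version: sampling and transforming are interleaved."""
    random.seed(14)
    passwords = []
    for _ in range(num):
        password = ''.join(random.sample(WORDS, 2))
        passwords.append(transform(password) if transform else password)
    return passwords


# Any draw, including one inside the transform, advances the shared generator state.
random.seed(14)
before = random.getstate()
random_case('CorrectHorse')
changed = random.getstate() != before
print(changed)  # True

plain = buggy_passwords(3)
mixed = buggy_passwords(3, random_case)
# Only the first password is guaranteed to match (ignoring case);
# every later sample is drawn from a shifted generator state.
print(plain[0].lower() == mixed[0].lower())  # True
```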

The solution is to separate the concerns of generating the passwords and possibly modifying them:

 
 passwords = [                                                             
     ''.join(random.sample(words, args.num_words)) for _ in range(args.num)
 ]
  
 if args.l33t:                                                             
     passwords = map(l33t, passwords)
  
 print('\n'.join(passwords))                                               
  

Use a list comprehension to iterate through range(args.num) and generate the correct number of passwords.

If the args.l33t flag is True, then use the l33t() function to modify the passwords.

Print the passwords joined on newlines.
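A small, self-contained version of this fix shows that the underlying words no longer depend on the l33t flag. As before, random_case() is a hypothetical stand-in for l33t() that uses the random module, and make_passwords() and the word list are illustrative names, not the program’s own.

```python
import random

WORDS = ['Battery', 'Correct', 'Horse', 'Natal', 'Signer', 'Staple', 'Tizzy', 'Unborn']


def random_case(text: str) -> str:
    """Hypothetical stand-in for l33t(): randomly upper- or lowercases each letter."""
    return ''.join(random.choice((c.lower(), c.upper())) for c in text)


def make_passwords(num: int, num_words: int, leet: bool, seed: int = 14) -> list:
    random.seed(seed)
    # Draw every sample first, so the sequence of sampling calls never changes.
    passwords = [''.join(random.sample(WORDS, num_words)) for _ in range(num)]
    # Only afterward apply the optional transform and its extra random draws.
    if leet:
        passwords = map(random_case, passwords)
    return list(passwords)


plain = make_passwords(3, 2, leet=False)
leet = make_passwords(3, 2, leet=True)
# Ignoring case, the transformed passwords contain the same words as the plain ones.
print([p.lower() for p in plain] == [p.lower() for p in leet])  # True
```

Because map() is lazy, random_case() doesn’t actually run until list() (or, in the code above, '\n'.join()) consumes the map object, but either way every transform happens after all of the sampling is done.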

I’ll leave you with the following thought:

Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else. – Eagleson’s Law

Review

This exercise kind of has it all: validating user input, reading files, using a new data structure (the set), higher-order functions with map() and filter(), random values, and lots of functions and tests! I hope you enjoyed programming it, and maybe you’ll even use the program to generate your new passwords. Be sure to share those passwords with your author, like the ones to your bank account and favorite shopping sites!

Going Further

  • The substitution part of the l33t() function changes every available character, which perhaps makes the password too difficult to remember. It might be better to modify only about 10% of the password.
  • Create programs that combine other skills you’ve learned. For example, maybe a lyrics generator that randomly selects lines from a file of songs by your favorite bands, then encodes the text with the “Kentucky Friar,” then changes all the vowels to one vowel with “Apples and Bananas,” and then SHOUTS IT OUT with “The Howler”?
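For the first suggestion, one possible approach is to pick a random subset of only the substitutable positions. In this sketch, partial_l33t() and its SUBS table are hypothetical; the program’s own l33t() may use a different substitution table, and 10% is simply the fraction suggested above.

```python
import random

# Hypothetical substitution table for this sketch.
SUBS = {'a': '4', 'e': '3', 'i': '1', 'o': '0', 's': '5', 't': '7'}


def partial_l33t(text: str, fraction: float = 0.10) -> str:
    """Substitute about `fraction` of the characters in text (at least one when possible)."""
    # Only positions holding a character with a substitution are candidates.
    eligible = [i for i, c in enumerate(text) if c.lower() in SUBS]
    count = min(len(eligible), max(1, round(len(text) * fraction)))
    chars = list(text)
    for i in random.sample(eligible, count):
        chars[i] = SUBS[chars[i].lower()]
    return ''.join(chars)


random.seed(1)
original = 'CorrectHorseBatteryStapleTizzy'  # 30 characters
changed = partial_l33t(original)
print(changed)
print(sum(a != b for a, b in zip(original, changed)))  # 3
```

Sampling positions without replacement guarantees that exactly count distinct characters change, and restricting the pool to eligible positions keeps the 10% budget from being spent on letters that have no substitution.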

Congratulations, you are now 733+ HAX0R!



[1] See the Wikipedia page https://en.wikipedia.org/wiki/Leet or the Cryptii translator https://cryptii.com/

[2] You can run pytest -xv unit.py to run the unit tests. The unit.py file imports the various functions from your password.py file in order to test them. Open unit.py and inspect it to understand how this happens!