From Tiny Python Projects by Ken Youens-Clark

Everyone loves Mad Libs! And everyone loves Python. This article shows you how to have fun with both and learn some programming skills along the way.

When I was a wee lad, we used to play at Mad Libs for hours and hours. This was before computers, mind you, before televisions or radio or even paper! No, scratch that, we had paper. Anyway, the point is we only had Mad Libs to play, and we loved it! And now you must play!

We’ll write a program that reads a file given as a positional argument and finds all the placeholders noted in angle brackets like <verb> or <adjective>. For each placeholder, we’ll prompt the user for the part of speech being requested like “Give me a verb” and “Give me an adjective.” (Notice that you’ll need to use the correct article.) Each value from the user replaces the placeholder in the text, so if the user says “drive” for “verb,” then <verb> in the text is replaced with drive. When all the placeholders have been replaced with inputs from the user, print out the new text.

For instance, here’s a version of the “fox” text:

 $ cat inputs/fox.txt
 The quick <adjective> <noun> jumps <preposition> the lazy <noun>.

When the program runs with this file as the input, it asks for each of the placeholders and then prints the silliness:

 $ ./ inputs/fox.txt
 Give me an adjective: surly
 Give me a noun: car
 Give me a preposition: under
 Give me a noun: bicycle
 The quick surly car jumps under the lazy bicycle.

By default, this is an interactive program that uses the input prompt to ask the user for their answers, but, for testing purposes, you have an option for -i or --inputs and the test suite can pass in all the answers and bypass the interactive input calls:

 $ ./ inputs/fox.txt -i surly car under bicycle
 The quick surly car jumps under the lazy bicycle.

In this exercise, you will:

  • Learn about greedy matching
  • Use re.findall to find all matches for a regex
  • Use re.sub to substitute found patterns with new text
  • Explore ways to write the program without using regular expressions


To start off, create the program in mad_libs/; one way is to copy the template from template/. You should define the positional file argument as type=argparse.FileType('r'). The -i or --inputs option should use nargs='*' to define a list of zero or more str values.

First modify your program until it produces the following when given no arguments or the -h or --help flag:

 $ ./ -h
 usage: [-h] [-i [input [input ...]]] FILE
 Mad Libs
 positional arguments:
   FILE                  Input file
 optional arguments:
   -h, --help            show this help message and exit
   -i [input [input ...]], --inputs [input [input ...]]
                         Inputs (for testing) (default: None)

If the given file argument doesn’t exist, the program should error out:

 $ ./ blargh
 usage: [-h] [-i [str [str ...]]] FILE
 error: argument FILE: can't open 'blargh': \
 [Errno 2] No such file or directory: 'blargh'

If the text of the file contains no <> placeholders, it should print a message and exit with an error value. Note this doesn’t need to print a usage, and you don’t have to use parser.error as in previous exercises:

 $ cat no_blanks.txt
 This text has no placeholders.
 $ ./ no_blanks.txt
 "no_blanks.txt" has no placeholders.

Here’s a string diagram to help you visualize the program:

Using regular expressions to find the pointy bits

The first thing we need to do is read the input file:

 >>> text = open('inputs/fox.txt').read().rstrip()
 >>> text
 'The quick <adjective> <noun> jumps <preposition> the lazy <noun>.'

We need to find all the <…> bits; let’s use a regular expression. We can find a literal < character:

 >>> import re
 >>> re.search('<', text)
 <re.Match object; span=(10, 11), match='<'>

Now let’s find that bracket’s mate. The . means “anything,” and we can add a + after it to mean “one or more”. I’ll capture the match to make it easier to see:

 >>> match = re.search('(<.+>)', text)
 >>> match.group(1)
 '<adjective> <noun> jumps <preposition> the lazy <noun>'

Hmm, that matched all the way to the last > in the string instead of stopping at the first available >. When we use * (zero or more) or + (one or more), the regex engine is “greedy” on the “or more” part: it matches as many characters as possible. The pattern matches beyond where we want it to, but it is technically matching exactly what we described. Remember that . means anything, and a right angle bracket is anything. The engine consumes as many characters as it can and stops only at the last right angle bracket, which is why this behavior is called “greedy.”

We can make the regex “non-greedy” by changing + to +?:

 >>> re.search('<.+?>', text)
 <re.Match object; span=(10, 21), match='<adjective>'>

Rather than using . for “anything,” it’s more accurate to say that we want to match one or more of anything that is neither of the angle brackets. The character class [<>] matches either bracket. We can negate (or complement) the class by putting a caret (^) as the first character, giving us [^<>]. This matches anything that isn’t a left or right angle bracket:

 >>> re.search('<[^<>]+>', text)
 <re.Match object; span=(10, 21), match='<adjective>'>

Why do we have both brackets inside the negated class? Wouldn’t the right bracket be enough? Well, I’m guarding against unbalanced brackets. With only the right bracket, it matches this text:

 >>> re.search('<[^>]+>', 'foo <<bar> baz')
 <re.Match object; span=(4, 10), match='<<bar>'>

But with both brackets in the negated class, it finds the correct, balanced pair:

 >>> re.search('<[^<>]+>', 'foo <<bar> baz')
 <re.Match object; span=(5, 10), match='<bar>'>
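It’s worth noting that the non-greedy quantifier alone doesn’t protect against the unbalanced case. The following sketch runs three patterns against the same sample string to compare them. The names sample and first_match are made up for this illustration.

```python
import re

sample = 'foo <<bar> baz'


def first_match(pattern, text):
    """Return the text of the first match, or None (hypothetical helper)."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Greedy: "." happily matches the second "<".
greedy = first_match('<.+>', sample)

# Non-greedy: stops at the first ">", but still starts at the first "<".
non_greedy = first_match('<.+?>', sample)

# Negated class: no brackets allowed inside, so only the balanced pair qualifies.
negated = first_match('<[^<>]+>', sample)

print(greedy, non_greedy, negated)
```

Only the negated class skips the stray left bracket; the other two patterns both return the unbalanced text.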

We’ll add two sets of parentheses (), one to capture the entire placeholder pattern and another for the string inside the <>:

 >>> match = re.search('(<([^<>]+)>)', text)
 >>> match.groups()
 ('<adjective>', 'adjective')

A handy function called re.findall returns all matching text groups as a list of tuple values:

 >>> from pprint import pprint
 >>> matches = re.findall('(<([^<>]+)>)', text)
 >>> pprint(matches)
 [('<adjective>', 'adjective'),
  ('<noun>', 'noun'),
  ('<preposition>', 'preposition'),
  ('<noun>', 'noun')]

Note that the capture groups are returned in the order of their opening parentheses, so the entire placeholder is the first member of each tuple and the contained text is the second. We can iterate over this list, unpacking each tuple into variables:

 >>> for placeholder, name in matches:
 ...     print(f'Give me {name}')
 Give me adjective
 Give me noun
 Give me preposition
 Give me noun

Figure 1. Because the list contains 2-tuples, we can unpack them into two variables in the for loop.
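The shape of what re.findall returns depends on how many capture groups the pattern has, which is easy to trip over. Here is a small sketch using a shortened sample sentence (the variable names are only for illustration):

```python
import re

text = 'The quick <adjective> <noun> jumps.'

# No capture groups: a list of the whole matches.
no_groups = re.findall('<[^<>]+>', text)

# One capture group: a list of that group's text.
one_group = re.findall('<([^<>]+)>', text)

# Two capture groups: a list of 2-tuples, one member per group.
two_groups = re.findall('(<([^<>]+)>)', text)

print(no_groups)
print(one_group)
print(two_groups)
```

We use the two-group form here because we want both the full placeholder (to replace it) and the name inside (to build the prompt).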

You should insert the correct article (“a” or “an”, see the “Crow’s Nest” exercise) to use as the prompt for input.
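One way to pick the article, sketched here with a hypothetical helper called article_for, is to check whether the first letter of the name is a vowel. This assumes the name is never empty, which holds for our pattern because it requires at least one character between the brackets.

```python
def article_for(word):
    """Return 'an' if word starts with a vowel, otherwise 'a' (hypothetical helper)."""
    # Lowercase the first letter so "Adjective" and "adjective" behave the same.
    return 'an' if word[0].lower() in 'aeiou' else 'a'


prompts = [f'Give me {article_for(name)} {name}: '
           for name in ['adjective', 'noun', 'preposition']]

for prompt in prompts:
    print(prompt)
```

This is a spelling-based rule, so it won’t handle words whose sound differs from their first letter, but it’s enough for the parts of speech used in these inputs.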

Halting and printing errors

If you find no placeholders in the text, you need to print an error message. It’s common to print error messages to STDERR (standard error), and the print function allows us to specify a file argument. We’ll use sys.stderr, which is like an already open file handle (no need to open it):

 print('This is an error!', file=sys.stderr)

If there are no placeholders, then we should exit the program with an error value to indicate to the operating system that our program failed to run properly. In the Unix world, the normal exit value is 0 as in “zero errors,” so we need to exit with some int value that isn’t 0. I always use 1:

 sys.exit(1)

One of the tests checks if your program can detect missing placeholders and if your program exits correctly.
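Putting the two pieces together, a check for missing placeholders might look something like the sketch below. The function name require_placeholders and its parameters are hypothetical; your program can just as well do this inline.

```python
import sys


def require_placeholders(blanks, filename):
    """Exit with status 1 if blanks is empty (hypothetical helper)."""
    if not blanks:
        # The message goes to STDERR so it doesn't mix with normal output.
        print(f'"{filename}" has no placeholders.', file=sys.stderr)
        sys.exit(1)
    return blanks
```

Note that sys.exit works by raising SystemExit, so a test can catch that exception and inspect its code attribute instead of letting the interpreter stop.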

Getting the values

For each one of those parts of speech, you need a value that comes either from the --inputs argument or directly from the user. If we have nothing for --inputs, then you can use the input function to get some answer from the user. The function takes a str value to use as a prompt:

 >>> value = input('Give me an adjective: ')
 Give me an adjective: blue

And returns a str value of whatever the user typed before hitting the Return key:

 >>> value
 'blue'

If you have values for the inputs, use those and don’t bother with the input function. Assume that you’re always given the correct number of inputs for the number of placeholders in the text.

The inputs are provided in the same order as the placeholders they replace.

Assume this:

 >>> inputs = ['surly', 'car', 'under', 'bicycle']

You need to remove and return the first string, “surly,” from inputs. The list.pop method is what you need, but it wants to remove the last element by default:

 >>> inputs.pop()
 'bicycle'

The list.pop method takes an optional argument to indicate the index of the element you want to remove. Can you figure out how to make that work?
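If you’d rather sidestep that question, one alternative (not the approach this article uses) is collections.deque from the standard library, which has a popleft method for removing items from the front. A minimal sketch:

```python
from collections import deque

inputs = ['surly', 'car', 'under', 'bicycle']

# Copy the list into a double-ended queue; the original list is left alone.
queue = deque(inputs)

first = queue.popleft()    # removes and returns the leftmost item
second = queue.popleft()
remaining = list(queue)

print(first, second, remaining)
```

For a handful of answers the difference from a plain list is negligible, so treat this as a matter of taste.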

Substituting the text

When you have values for each of the placeholders, you need to substitute them into the text. I suggest you look into the re.sub function, which replaces text matching a given regular expression with some given text. I recommend you read help(re.sub):

 sub(pattern, repl, string, count=0, flags=0)
     Return the string obtained by replacing the leftmost
     non-overlapping occurrences of the pattern in string by the
     replacement repl.

I don’t want to give away the ending, but you need to use a pattern similar to the one above to replace each <placeholder> with each value.

Note that it’s not a requirement to use the re functions to solve this. I challenge you, in fact, to try writing a manual solution that doesn’t use the re module at all! Now go write the program and use the tests to guide you!


 #!/usr/bin/env python3
 """Mad Libs"""
 import argparse
 import re
 import sys

 # --------------------------------------------------
 def get_args():
     """Get command-line arguments"""
     parser = argparse.ArgumentParser(
         description='Mad Libs',
         formatter_class=argparse.ArgumentDefaultsHelpFormatter)
     parser.add_argument('file',
                         metavar='FILE',
                         type=argparse.FileType('r'),
                         help='Input file')
     parser.add_argument('-i',
                         '--inputs',
                         help='Inputs (for testing)',
                         metavar='input',
                         type=str,
                         nargs='*')
     return parser.parse_args()

 # --------------------------------------------------
 def main():
     """Make a jazz noise here"""
     args = get_args()
     inputs = args.inputs
     text = args.file.read().rstrip()
     blanks = re.findall('(<([^<>]+)>)', text)
     if not blanks:
         print(f'"{args.file.name}" has no placeholders.', file=sys.stderr)
         sys.exit(1)
     tmpl = 'Give me {} {}: '
     for placeholder, pos in blanks:
         article = 'an' if pos.lower()[0] in 'aeiou' else 'a'
         answer = inputs.pop(0) if inputs else input(tmpl.format(article, pos))
         # count=1 replaces only the first occurrence of this placeholder
         text = re.sub(placeholder, answer, text, count=1)
     print(text)

 # --------------------------------------------------
 if __name__ == '__main__':
     main()

The file argument should be a readable file.

The --inputs option may have zero or more strings.

Read the input file, stripping off the trailing newline.

Use a regex to find all the matches for a left angle bracket followed by one or more of anything which isn’t a left or right angle bracket followed by a right angle bracket. Use two capture groups to capture the entire expression and the text inside the brackets.

If there are no placeholders….

Print a message to STDERR that the given file name contains no placeholders.

Exit the program with a non-zero status to indicate an error to the operating system.

Create a string template for the prompt to ask for input from the user.

Iterate through the blanks, unpacking each tuple into variables.

Choose the correct article based on the first letter of the name of the part of speech (pos), “an” for those starting with a vowel and “a” otherwise.

If there are inputs, remove the first one for the answer, otherwise use the input to prompt the user for a value.

Replace the current placeholder text with the answer from the user. Use count=1 to ensure that only the first value is replaced. Overwrite the existing value of text to replace all the placeholders by the end of the loop.

Print the resulting text to STDOUT.


Defining the arguments

If you define the file with type=argparse.FileType('r'), then argparse verifies that the value is a readable file, prints an error and usage if it isn’t, and opens it for you. Quite the time saver. I also define --inputs with nargs='*' to get any number of strings as a list. If nothing is provided, the default value is None; be sure you don’t assume it’s a list and try doing list operations on a None.
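To make the None caveat concrete, here is a stripped-down sketch that defines only the --inputs option (the file argument is left out so the example runs without any input file) and parses three different argument lists. The variable names are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(description='nargs demo')
parser.add_argument('-i', '--inputs', metavar='input', nargs='*',
                    help='Inputs (for testing)')

# Option not given at all: the default applies, which is None.
absent = parser.parse_args([]).inputs

# Option given with no values: an empty list.
empty = parser.parse_args(['-i']).inputs

# Option given with values: a list of strings.
given = parser.parse_args(['-i', 'surly', 'car']).inputs

# "or []" turns None into an empty list so list operations are always safe.
safe = absent or []

print(absent, empty, given, safe)
```

Both None and an empty list are “falsey,” which is why a test like “if inputs” covers either case before calling a list method.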

Substituting with regular expressions

A subtle bug awaits you when you use re.sub. Suppose we replaced the first <adjective> with “blue” and we have this:

 >>> text = 'The quick blue <noun> jumps <preposition> the lazy <noun>.'

Now we want to replace <noun> with “dog,” and try this:

 >>> text = re.sub('<noun>', 'dog', text)

Let’s check on the value of text now:

 >>> text
 'The quick blue dog jumps <preposition> the lazy dog.'

Because there were two instances of the string <noun>, both got replaced with “dog.”

We must use count=1 to ensure that only the first occurrence changes:

 >>> text = 'The quick blue <noun> jumps <preposition> the lazy <noun>.'
 >>> text = re.sub('<noun>', 'dog', text, count=1)
 >>> text
 'The quick blue dog jumps <preposition> the lazy <noun>.'

And now we can keep moving to replace the other placeholders.
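One further caveat, which goes beyond what the tests for this exercise appear to need: re.sub treats its first argument as a pattern and processes backslash escapes in a string replacement. A placeholder containing regex metacharacters, or an answer containing a backslash, could therefore behave unexpectedly. The sketch below assumes you want a purely literal replacement; replace_first is a hypothetical helper, and the sample text is invented.

```python
import re


def replace_first(placeholder, answer, text):
    """Replace the first literal occurrence of placeholder with answer."""
    # re.escape makes characters such as "(" and ")" match literally.
    # A function replacement is inserted as-is, with no backslash processing.
    return re.sub(re.escape(placeholder), lambda match: answer, text, count=1)


text = 'Save it to <path (full)> or <path (full)>.'
result = replace_first('<path (full)>', r'C:\docs', text)

print(result)
```

For a literal replacement like this, the str.replace method with a count of 1 would also work and avoids regular expressions entirely.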

Finding the placeholders without regular expressions

I trust the explanation of the regex solution in the introduction was sufficient. I find that solution fairly elegant, but it’s certainly possible to solve this without using regexes. Here’s how I might solve it manually.

First I need a way to search the text for <…>. I start off by writing a test that helps me imagine what I might give to my function and what I might expect in return for both good and bad values. I decided to return None when the pattern is missing and to return a tuple of (start, stop) indices when the pattern is present:

 def test_find_brackets():
     """Test for finding angle brackets"""
     assert find_brackets('') is None
     assert find_brackets('<>') is None
     assert find_brackets('<x>') == (0, 2)
     assert find_brackets('foo <bar> baz') == (4, 8)

Because there’s no text, it should return None.

Angle brackets lack any text inside, and this should return None.

The pattern should be found at the beginning of a string.

The pattern should be found further into the string.

Now to write the code that satisfies that test. Here’s what I wrote:

 def find_brackets(text):
     """Find angle brackets"""
     start = text.index('<') if '<' in text else -1
     # Look for '>' only from two positions past '<' so an earlier stray '>' is ignored
     stop = text.index('>', start + 2) if start >= 0 and '>' in text[start + 2:] else -1
     return (start, stop) if start >= 0 and stop >= 0 else None

Find the index of the left bracket if one is found in the text.

Find the index of the right bracket if one is found starting two positions after the left.

If both brackets were found, return a tuple of their start and stop positions, otherwise return None.

This function works well enough to pass the given tests, but it’s not quite correct because it returns a region that contains unbalanced brackets:

 >>> text = 'foo <<bar> baz'
 >>> find_brackets(text)
 (4, 9)
 >>> text[4:10]
 '<<bar>'

That may seem unlikely, but I chose angle brackets to make you think of HTML tags like <head> and <img>. HTML is notorious for being incorrect, maybe because it was hand-generated by a human who messed up a tag or because some tool that generated the HTML had a bug. The point is that most web browsers have to be fairly relaxed in parsing HTML, and it’s not unexpected to see a malformed tag like <<head> instead of the correct <head>.

The regex version, on the other hand, specifically guards against matching internal brackets by using the class [^<>] to define text that can’t contain any angle brackets. I could write a version of find_brackets that finds only balanced brackets, but, honestly, it’s not worth it. This function points out that one of the strengths of the regex engine is that it can find a partial match (the first left bracket), see that it’s unable to make a complete match, and start over (at the next left bracket). Writing this is tedious and, frankly, not that interesting.
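For the curious, here is one possible sketch of a balanced variant. It is not the author’s code: the name find_balanced_brackets is hypothetical, and the approach is simply to remember the most recent left bracket and forget it whenever a right bracket arrives too early.

```python
def find_balanced_brackets(text):
    """Return (start, stop) of the first <...> with no brackets inside, or None."""
    start = None
    for i, char in enumerate(text):
        if char == '<':
            # A new left bracket always restarts the candidate match.
            start = i
        elif char == '>':
            # Accept only if there is at least one character between the brackets.
            if start is not None and i - start > 1:
                return (start, i)
            start = None
    return None


print(find_balanced_brackets('foo <<bar> baz'))
```

On the unbalanced sample it returns the same region the regex finds, and it agrees with the four assertions in test_find_brackets above.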

Still, this function works for all the given test inputs. Note that it only returns one set of brackets at a time. This is because I’ll alter the text after I find each set of brackets, which is likely to change the start and stop positions of any following brackets, so it’s best to handle one set at a time.

Here’s how I’d incorporate it into the main function:

 def main():
     args = get_args()
     inputs = args.inputs
     text = args.file.read().rstrip()
     had_placeholders = False
     tmpl = 'Give me {} {}: '
     while True:
         brackets = find_brackets(text)
         if not brackets:
             break
         start, stop = brackets
         placeholder = text[start:stop + 1]
         pos = placeholder[1:-1]
         article = 'an' if pos.lower()[0] in 'aeiou' else 'a'
         answer = inputs.pop(0) if inputs else input(tmpl.format(article, pos))
         text = text[0:start] + answer + text[stop + 1:]
         had_placeholders = True
     if had_placeholders:
         print(text)
     else:
         print(f'"{args.file.name}" has no placeholders.', file=sys.stderr)
         sys.exit(1)

Create a variable to track whether we find placeholders. Assume the worst.

Create a template for the input prompt.

Start an infinite loop. The while continues as long as its condition is “truthy,” which True always is.

Call the find_brackets function with the current value of text.

If the return is None, then this is “falsey.”

If there are no brackets found, use break to exit the infinite while loop.

Now that we know brackets isn’t None, unpack the start and stop values.

Find the entire <placeholder> value by using a string slice with the start and stop values, adding one to the stop to include that index.

The “part of speech” is the bit inside, and this extracts adjective from <adjective>.

Choose the correct article for the part of speech.

Get the answer from the inputs or from an input call.

Overwrite the text using a string slice up to the start, the answer, and then the rest of the text from the stop.

Note that we saw a placeholder.

We exit the loop when we no longer find placeholders. Check if we ever saw one.

If we did see a placeholder, print the new value of the text.

If we never saw a placeholder, print an error message to STDERR.

Exit with a non-zero value to indicate an error.


  • Regular expressions are almost like functions where we describe the patterns we want to find. The regex engine does the work of trying to find the patterns, handling mismatches, and starting over to find the pattern in the text.
  • Regex patterns with * or + are “greedy” in that they match as many characters as possible. Adding a ? after them makes them “not greedy” to match as few characters as possible.
  • The re.findall function returns a list of all the matching strings or capture groups for a given pattern.
  • The re.sub function substitutes a pattern in some text with new text.

Going Further

  • Extend your code to find all the HTML tags enclosed in <…> and </…> in a web page you download from the Internet.
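As a starting point for that exercise, here is a rough sketch that works on an HTML string held in memory rather than a downloaded page. The sample markup and the pattern are assumptions made for illustration, and a regex like this is only an approximation: it doesn’t understand comments or attribute values that contain angle brackets, which is what the standard library’s html.parser module is for.

```python
import re

html = ('<html><head><title>Hi</title></head>'
        '<body><p class="x">Text</p><br/></body></html>')

# An optional slash, a tag name, then anything that isn't an angle bracket.
tag_pattern = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>')

# findall returns (slash, name) tuples because the pattern has two groups.
tags = tag_pattern.findall(html)

opening = [name for slash, name in tags if not slash]
closing = [name for slash, name in tags if slash]

print(opening)
print(closing)
```

Note that the self-closing <br/> is counted as an opening tag here, because its slash comes after the name rather than before it.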
