Regular expressions in Python

Understanding Regular Expressions in Python

A regular expression is a sequence of characters that describes a pattern. The pattern can be used to search for particular strings or words within a larger string or body of text. Python is widely used for scripting and for automating development operations, both of which involve a lot of text processing, so regular expressions come up often. In this tutorial, we’ll see how to get started with regular expressions in Python.

Getting Started with Regular Expressions in Python

To get started with regular expressions in Python, we’ll import re, the regular expression module from Python’s standard library.


import re

Once the library has been imported, we’ll define our regular expression pattern.


pattern = r'^(\d+)$'

Next, try to match the pattern against a test string.

test_string = '100'
# re.match returns a match object on success and None otherwise
result = re.match(pattern, test_string)
if result:
    print('Match success')
else:
    print('No match')

Save the above changes and try running the above Python code. You should have a ‘Match success’ message printed on screen. Now let’s analyze the regular expression.


pattern = r'^(\d+)$'
# ^ matches the start of the string
# The parentheses define a capturing group
# \d matches a single digit; + means one or more of them
# $ matches the end of the string

I hope the comments above clarify the regular expression. Try modifying test_string to include a non-digit character, and the code will print the No match message. Next, we’ll look at a couple of regular expression scenarios and how to handle them in Python.
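As a quick check of the behaviour described above, here is a small sketch that runs the same pattern against a few inputs. The helper name is_all_digits and the sample strings are made up for illustration.

```python
import re

pattern = r'^(\d+)$'

def is_all_digits(text):
    # Hypothetical helper: True when the whole string is one or more digits
    return re.match(pattern, text) is not None

# Sample inputs chosen for illustration only
for sample in ['100', '10a', '']:
    print(repr(sample), is_all_digits(sample))
```

Only the first sample consists entirely of digits, so it is the only one reported as a match. The empty string fails because + requires at least one digit.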

Regular Expression to Find Between Brackets

This is one of the most common scenarios. Suppose we need to extract a number that is enclosed in parentheses. For example, we need to pull 999 out of the following string:


Consumer number is (999) 

Let’s start by writing a regular expression that extracts the number from the string. We’ll search for a pattern where the number is wrapped in an opening and a closing parenthesis, with any text allowed before and after them. Here is how the regular expression looks:


^.*?\((\d+)\).*?$

Breaking down the above regular expression:


^      signifies the start of the string
.*?    signifies any characters before the opening parenthesis, as few as possible
\(     signifies the opening parenthesis
(\d+)  signifies one or more digits, captured as a group
\)     signifies the closing parenthesis
.*?    signifies any characters after the closing parenthesis
$      signifies the end of the string
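The breakdown above can also live inside the code itself. The following is a minimal sketch, assuming you are happy to compile the pattern with the re.VERBOSE flag, which lets a pattern span several lines and carry comments. The variable names are chosen for illustration.

```python
import re

# The same pattern, spread over several lines with re.VERBOSE
pattern = re.compile(r'''
    ^        # start of the string
    .*?      # any characters, as few as possible
    \(       # a literal opening parenthesis
    (\d+)    # capturing group: one or more digits
    \)       # a literal closing parenthesis
    .*?      # any characters, as few as possible
    $        # end of the string
''', re.VERBOSE)

match = pattern.match('Consumer number is (999)')
if match:
    print(match.group(1))
```

In verbose mode, whitespace inside the pattern is ignored and everything after a # is treated as a comment, so the pattern behaves the same as the one-line version.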

Let’s add the above regular expression to Python code and test the regular expression. Modify the Python code as shown:


import re
pattern = r'^.*?\((\d+)\).*?$'
result = re.match(pattern, 'Consumer number is (999)')
if result:
    print('Match success')
    # group(1) holds the digits captured between the parentheses
    print('Consumer number is ' + result.group(1))
else:
    print('No match')

Try running the above code and you should have the following output:


Match success
Consumer number is 999
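Since re.match only matches at the beginning of the string, the pattern has to account for the text in front of the parenthesis. As an alternative sketch, re.search scans the whole string for the first match, so a shorter pattern can do the same job. The variable name text is made up for illustration.

```python
import re

text = 'Consumer number is (999)'

# re.search looks for the pattern anywhere in the string,
# so no leading or trailing wildcard is needed
result = re.search(r'\((\d+)\)', text)
if result:
    print('Consumer number is ' + result.group(1))
else:
    print('No match')
```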

Finding Patterns inside Files Using findall

Sometimes you have a large amount of text to search. Suppose you have a file that needs to be scanned for a particular regular expression pattern. In such cases, findall comes to the rescue, since it returns every match in the text you pass it. All you have to do is open the file in read mode and apply findall to its contents. Have a look at the code:


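import re

pattern = r'^.*?\((\d+)\).*?$'

# Open the file in read mode; the with block closes it when done
with open('sample.txt', 'r') as f:
    # Pass the pattern and the file content to findall.
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    result = re.findall(pattern, f.read(), re.MULTILINE)

print(result)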
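When the pattern has a single capturing group, findall returns a list of the captured strings. Here is a minimal sketch on an in-memory string, a made-up stand-in for the file contents, comparing the anchored pattern (with re.MULTILINE so that ^ and $ apply to each line) against a shorter unanchored one.

```python
import re

# Made-up sample standing in for the contents of sample.txt
sample_text = 'Consumer number is (999)\nno number here\nOrder (42) and (7)\n'

# Anchored pattern: matches whole lines, so it yields one number per line
first_per_line = re.findall(r'^.*?\((\d+)\).*?$', sample_text, re.MULTILINE)
print(first_per_line)  # ['999', '42']

# Unanchored pattern: yields every bracketed number in the text
numbers = re.findall(r'\((\d+)\)', sample_text)
print(numbers)  # ['999', '42', '7']
```

The anchored version consumes each matching line in full, which is why the second number on the last line is skipped. Which form fits better depends on whether you want one result per line or every occurrence.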

Wrapping It Up

In this tutorial, you saw a short introduction to using regular expressions in Python. You saw a few examples: matching a string of digits, extracting a number from between brackets, and searching the contents of a file with findall. Hopefully this tutorial gets you started with regular expressions in Python. Do let us know your thoughts in the comments below.

Have a look at Regexper, an excellent tool for visualizing regular expressions.