Python RegEx


Regular expressions are compact patterns of special characters that are used in system engineering, networking and programming. With regular expressions, a task that would otherwise need many lines of code can be described with a short pattern. For example, Linux tools use regular expressions, and they are also used with the routing protocol BGP in networking. Python supports regular expressions too, where they are commonly called RegEx. In this lesson, we will learn the details of Python RegEx.

 

Basically, a Python RegEx is a sequence of characters that defines a search pattern. To use regular expressions in Python, we need to import a module. This Python RegEx module is the re module.

 

RegEx Functions

 

There are different Python RegEx functions in the re module. In this lesson, we will cover four of them. These are:

  • search
  • findall
  • split
  • sub

 

search function scans a string and returns a match object for the first match, or None if there is no match.

findall function returns a list of all matches.

split function returns a list where the string has been split at each match.

sub function replaces the matches in the string with the given text.

 

Below, we will focus on the details of these functions with examples. After that, we will also learn the metacharacters, sets and special sequences used with Python RegEx.

 



search() Function

 

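search() function scans a string for a pattern and returns a match object for the first match it finds. The start() method of this match object gives the index of the first matching character. If there is no match, search() returns None.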

 

import re
message = "I like Football"
x = re.search("F", message)
print(x.start())

 

The output of this code will be:

 

7
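Because search() returns None when nothing matches, calling start() directly on the result can fail. Below is a minimal sketch of guarding against that, reusing the message from the example above. The helper name first_index is hypothetical and is used only for illustration.

```python
import re

def first_index(pattern, text):
    """Return the index of the first match, or None when there is no match."""
    match = re.search(pattern, text)
    # re.search returns a match object on success and None otherwise
    return match.start() if match else None

message = "I like Football"
print(first_index("F", message))  # 7
print(first_index("Z", message))  # None
```

Checking the result before using it avoids an AttributeError when the pattern is missing.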



 

findall() Function

 

findall() function finds every occurrence of the given pattern in the given string. It returns a list that contains the matched text once for each time it is found.

 

import re
message = "Hagi is a perfect footballer.Hagi is from Romania."
a = re.findall("Hagi", message)
print(a)

 

The output of this code will be:

 

['Hagi', 'Hagi']
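When nothing matches, findall() returns an empty list instead of None, so the result can be tested directly. Here is a small sketch using the same message. The second search word and the variable names are chosen only for illustration.

```python
import re

message = "Hagi is a perfect footballer.Hagi is from Romania."

found = re.findall("Hagi", message)       # pattern occurs twice
missing = re.findall("Popescu", message)  # pattern does not occur

print(len(found))  # 2
print(missing)     # []

# An empty list is falsy, so "not missing" is True when there is no match
if not missing:
    print("No match")
```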

 

 



 

split() Function

 

split() function splits a string at each match of the given pattern and returns the pieces as a list.

 

Below, we create a list of the words in a sentence by using the space character as the split point.

 

import re
message = "I like Football"
x = re.split(r"\s", message)
print(x)

 

The return of this code is like below:

 

['I', 'like', 'Football']


 

To limit the number of splits, we can use an extra parameter, maxsplit. Below, we split the string at most two times, so it is split at the first and second space characters and the rest of the string is kept as the last item.

 

import re
message = "I like Football Basketball Volleyball"
x = re.split(r"\s", message, maxsplit=2)
print(x)

 

The output will be:

 

['I', 'like', 'Football Basketball Volleyball']

 


 

sub() Function

 

sub() function replaces every match of the pattern in the string with the given replacement text.

 

Below, we will find the space characters and replace each of them with a dash (-) character.

 

import re
message = "I like Football"
x = re.sub(r"\s", "-", message)
print(x)

 

The output will be:

 

I-like-Football

 


 

Again, we can use an extra parameter, count, to limit how many matches are replaced. Below, only the first space character is replaced.

 

import re
message = "I like Football"
x = re.sub(r"\s", "-", message, count=1)
print(x)

 

The output will be:

 

I-like Football

 


 

RegEx Metacharacters

 

Besides functions, there are metacharacters used with Python RegEx that have specific meanings. These are given below:

 

  • []     Used for a set of characters.
  • \      Used to signal a special sequence, or to escape a special character.
  • .      Used for any character except a newline.
  • ^      Used to match at the start of the string.
  • $      Used to match at the end of the string.
  • *      Shows zero or more occurrences.
  • +      Shows one or more occurrences.
  • {}     Shows exactly the specified number of occurrences.
  • |      Shows either or.
  • ()     Captures and groups.
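The metacharacters above can be tried out with findall(). The following is only an illustrative sketch: the sample texts and variable names are made up for this example.

```python
import re

text = "cat hat bat 2024 cats"

set_match = re.findall(r"[ch]at", text)      # [] : "c" or "h", then "at"
any_char = re.findall(r".at", text)          # .  : any character, then "at"
starts = re.findall(r"^cat", text)           # ^  : "cat" at the start
ends = re.findall(r"cats$", text)            # $  : "cats" at the end
four_digits = re.findall(r"\d{4}", text)     # {} : exactly four digits
either = re.findall(r"cat|bat", text)        # |  : "cat" or "bat"
group = re.findall(r"(c|h)at", text)         # () : returns only the captured group
zero_or_more = re.findall(r"ca*t", "ct cat caat")  # * : zero or more "a"
one_or_more = re.findall(r"ca+t", "ct cat caat")   # + : one or more "a"

print(set_match)     # ['cat', 'hat', 'cat']
print(any_char)      # ['cat', 'hat', 'bat', 'cat']
print(starts)        # ['cat']
print(ends)          # ['cats']
print(four_digits)   # ['2024']
print(either)        # ['cat', 'bat', 'cat']
print(group)         # ['c', 'h', 'c']
print(zero_or_more)  # ['ct', 'cat', 'caat']
print(one_or_more)   # ['cat', 'caat']
```

Note that when the pattern contains a capturing group, findall() returns the captured part instead of the whole match.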



 

Sets used With Python RegEx

 

Sets used with Python RegEx are the specific statements written in square brackets ( [] ). The meaning of a set depends on the characters used inside the square brackets. So, what are these set statements used with RegEx? Below, you can find some of them and their meanings:

 

 

[xyz]      Returns the characters that match any of these characters (x, y, z).

 

This can also be done with numbers, like [135]. Here, we check for 1, 3 and 5 in the message.

 

In the example below, we will check for the “l” and “y” characters in the message, and the code will return the characters that match any of them.

 

import re
message = "Hello, How are you?"
a = re.findall("[ly]", message)
print(a)

 

The return of this code will be:

 

['l', 'l', 'y']

 


 

[a-x]      Returns the characters that match any character between a and x alphabetically.

 

This can also be done with numbers, like [1-5]. Here, we check if the message includes any of the digits from 1 to 5.

 

Below, we will find the characters between “a” and “f” in the message.

 

import re
message = "abcdefghijklmn"
a = re.findall("[a-f]", message)
print(a)

 

The output of this code will be:

 

['a', 'b', 'c', 'd', 'e', 'f']


 

[^xyz]     Returns the characters other than any of these characters (x, y, z).

 

Below, we will check the message and return the characters that are different from the specified characters.

 

import re
message = "abcdefg"
a = re.findall("[^cdg]", message)
print(a)

 

The output will be like below:

 

['a', 'b', 'e', 'f']


 

[0-9][0-9]            Returns any two-digit match between 00 and 99.

 

We can do this with three or more digits like [0-9][0-9][0-9] or [0-9][0-9][0-9][0-9] etc.

 

Below, we will check an address and return the two-digit numbers in it, if there are any.

 

import re
message = "Houston Street No:15 Room:23"
a = re.findall("[0-9][0-9]", message)
print(a)

 

The output will be:

 

['15', '23']
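One point worth noting is that [0-9][0-9] also matches the first two digits of a longer number. The sketch below shows this with a slightly changed, made-up address that contains a three-digit room number, and then uses the three-digit form of the set.

```python
import re

# Hypothetical address with a three-digit room number
message = "Houston Street No:15 Room:234"

two_digits = re.findall("[0-9][0-9]", message)
three_digits = re.findall("[0-9][0-9][0-9]", message)

print(two_digits)    # ['15', '23']  ("23" is the first part of "234")
print(three_digits)  # ['234']
```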

 


 

[a-zA-Z]               Returns any letter from a to z, in both lower and upper case, in a string.

 

import re
message = "No:15 Room:23"
a = re.findall("[a-zA-Z]", message)
print(a)

 

The output is:

 

['N', 'o', 'R', 'o', 'o', 'm']



 

Special Sequences used With Python RegEx

 

Special sequences are written with the help of the “\” (backslash) character. After this sign, a specific lower or upper case letter is used. Below, you can find the special sequences and their meanings.

 

\A          It is used to check if the specified characters are at the beginning of a string.

 

\b          It is used to check if the specified characters are at the beginning or at the end of a word.

 

\B          It is used to check if the specified characters are present, but not at the beginning or at the end of a word.

 

\d          It is used to get the digits in a string.

 

\D          It is used to get the characters other than digits in a string.

 

\s          It is used to get the white spaces in a string.

 

\S          It is used to get the characters other than white spaces in a string.

 

\w          It is used to get the word characters (letters, digits and underscore) in a string.

 

\W          It is used to get the characters other than word characters in a string.

 

\Z          It is used to check if the specified characters are at the end of a string.
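The special sequences above can be tested with short patterns. The following is a sketch only: the sample strings and variable names are made up for illustration. The patterns are written as raw strings (with the r prefix) so that Python passes the backslash to the re module unchanged.

```python
import re

message = "Room 23 is ready"

at_start = re.findall(r"\ARoom", message)   # "Room" at the beginning of the string
digits = re.findall(r"\d", message)         # every digit
non_digits = re.findall(r"\D", "No:15")     # everything except digits
spaces = re.findall(r"\s", message)         # every white space
non_spaces = re.findall(r"\S", "a b")       # everything except white space
word_chars = re.findall(r"\w", "a_1!")      # letters, digits and underscore
non_word = re.findall(r"\W", "a_1!")        # everything except word characters
at_end = re.findall(r"ready\Z", message)    # "ready" at the end of the string

# \b matches "re" at the start of a word, \B matches "re" inside a word
word_start = re.search(r"\bre", "are ready").start()
inside_word = re.search(r"\Bre", "are ready").start()

print(at_start)     # ['Room']
print(digits)       # ['2', '3']
print(non_digits)   # ['N', 'o', ':']
print(spaces)       # [' ', ' ', ' ']
print(non_spaces)   # ['a', 'b']
print(word_chars)   # ['a', '_', '1']
print(non_word)     # ['!']
print(at_end)       # ['ready']
print(word_start)   # 4 (the "re" of "ready")
print(inside_word)  # 1 (the "re" inside "are")
```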

 



 

Last Word on Regular Expressions in Python

 

In this lesson, we have learned how regular expressions are used in Python programming. In other words, we have practiced Python RegEx statements. As we have discussed above, Python RegEx can be used with the help of different functions, metacharacters, sets and special sequences. You can improve your Python Regular Expression skills by using these expressions more in your code.

 

 

 

 

 

 
