Regular expressions

Introduction

Regular expressions (regex for short) are used in many Osirium PAM templates. This section shows some of their common uses. You only need to understand a few regex metacharacters to follow them.

RegEx 101

From Wikipedia (http://en.wikipedia.org/wiki/Regular_expression):

Each character in a regular expression is either understood to be a metacharacter with its special meaning, or a regular character with its literal meaning. Together, they can be used to identify textual material of a given pattern or process a number of instances of it that can vary from a precise equality to a very general similarity of the pattern. The pattern sequence itself is an expression that is a statement in a language designed specifically to represent prescribed targets in the most concise and flexible way to direct the automation of text processing of general text files, specific textual forms, or of random input strings.

Common metacharacters

The following is a small subset of the many metacharacters available within regular expressions.

However, these cover most of the metacharacters that you'll find used within Osirium PAM.

Metacharacter(s)	Meaning
^	Start of line.
$	End of line.
+	1 or more of something.
*	0 or more of something.
{5}	Match exactly 5 of something.
{2,5}	Match between 2 and 5 of something.
.	Any character.
.+	1 or more of any characters.
.*	0 or more of any characters.
\.	Match the . character.
\+	Match the + character.
\s	Whitespace.
\d	Any digit 0-9.
\w	Any word character: a-z, A-Z, 0-9 and underscore.
[ ]	Match any one character in the set, e.g. [ABC] matches A or B or C.
|	Either of two alternatives, e.g. cent(er|re) matches center or centre.
()	Capture Group.
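
As a rough illustration, a few of these metacharacters can be tried out with Python's standard `re` module. This is only a sketch: the sample strings and variable names are made up for the example and do not come from any Osirium PAM template.

```python
import re

# {2,5}: between 2 and 5 digits. fullmatch requires the whole string to match.
bounded = [s for s in ["1", "12", "12345", "123456"] if re.fullmatch(r"\d{2,5}", s)]

# Alternation inside a group: either spelling is accepted.
spellings = [w for w in ["center", "centre", "centar"] if re.fullmatch(r"cent(er|re)", w)]

# \. matches only a literal dot, whereas a bare . matches any character.
literal_dot = bool(re.fullmatch(r"10\.1", "10x1"))
any_char = bool(re.fullmatch(r"10.1", "10x1"))

# [ABC] matches exactly one character out of A, B or C.
set_hits = re.findall(r"[ABC]", "ABCD")

print(bounded)      # ['12', '12345']
print(spellings)    # ['center', 'centre']
print(literal_dot)  # False
print(any_char)     # True
print(set_hits)     # ['A', 'B', 'C']
```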

The most common use of a regex is to extract some data from the response of a device. This could be reading the version, or extracting a username for example. This is done using Capturing Groups.

Capturing groups

Capturing Groups allow you to define a regular expression and say that you want a specific part of the match to be pulled out and held as a separate substring. For example, we might want to read the version number from a device, but we don't want the other text around it.

Getting the device to tell us its version might look like this::

   [admin@f5-ltm-ve:Active] ~ # tmsh show sys version

   Sys::Version
   Main Package
     Product  BIG-IP
     Version  10.1.0
     Build    3341.1084
     Edition  Final
     Date     Sat Feb  6 01:05:54 PST 2010

   [admin@f5-ltm-ve:Active] ~ #

The bit we are after is the 10 on the Version line. The regex to do this would be::

   ^\s*Version:\s+(\d+)\..*$

This breaks down to:

Part of Regex	Meaning
^	Start of line.
\s*	0 or more whitespace characters. [#fn1]_
Version:	The literal text "Version:"
\s+	1 or more whitespace characters.
(\d+)	Capture 1 or more digits 0-9.
\.	The literal . character, but don't include it in the capturing group as it's outside the capture group brackets.
.*	Any number of any characters.
$	End of line.
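
A minimal Python sketch of this breakdown is shown here. The helper name `major_version` is hypothetical, and the sample line is an assumption: it includes the colon that the pattern expects, which the tmsh output above does not show.

```python
import re

# The pattern from the breakdown above, written out in full.
VERSION_RE = re.compile(r"^\s*Version:\s+(\d+)\..*$")

def major_version(line):
    """Return the text held by capture group 1, or None if the line does not match."""
    match = VERSION_RE.match(line)
    return match.group(1) if match else None

# Hypothetical response lines.
print(major_version("  Version: 10.1.0"))     # 10
print(major_version("  Build    3341.1084"))  # None
```

Only the digits inside the brackets are returned. The dot and everything after it are matched but not captured.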

More examples

Some more examples:

Regex	Input string	Captured
^.*(Example).*$	This is a RegEx Example String	Example
^Version:\s+(\d+\.\d+).*$	Version: 10.45	10.45
^[\w\s]+\sa\s(\w+).*$	A little bit of text goes a long way	long
^[\d\s]{12}(\d).*$	1 2 3 4 5 6 7 8 9 10	7
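
The four example patterns above, with their `*` quantifiers written out, can be checked with a short Python sketch. The list name `EXAMPLES` is just for illustration.

```python
import re

# (pattern, input string) pairs for the four examples.
EXAMPLES = [
    (r"^.*(Example).*$", "This is a RegEx Example String"),
    (r"^Version:\s+(\d+\.\d+).*$", "Version: 10.45"),
    (r"^[\w\s]+\sa\s(\w+).*$", "A little bit of text goes a long way"),
    (r"^[\d\s]{12}(\d).*$", "1 2 3 4 5 6 7 8 9 10"),
]

# group(1) is whatever the first (and only) capture group matched.
captured = [re.match(pattern, text).group(1) for pattern, text in EXAMPLES]
print(captured)  # ['Example', '10.45', 'long', '7']
```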

Useful regex websites

RegExPlanet: A good website to 'fiddle' with regular expressions. It lets you pick the regular expression engine (make sure you pick Python, as that is what Osirium PAM uses to parse its templates).
Regexper: A great website for visualising what happens as you step through a regular expression.
Regexr: Another good fiddle website, which lists the various metacharacters and lets you double-click to include them in your expression.

Note

One very important point to note is that Osirium PAM strips ALL leading and trailing whitespace from device responses, so the \s* in the example isn't actually needed. Because * means 0 or more, the regex shown still works.
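
This can be sketched in Python as follows. Here `str.strip()` is only a stand-in for whatever stripping Osirium PAM performs internally, and the sample line is hypothetical.

```python
import re

PATTERN = re.compile(r"^\s*Version:\s+(\d+)\..*$")

raw_line = "     Version: 10.1.0   "  # hypothetical line with surrounding whitespace
stripped_line = raw_line.strip()      # assumption: mimics the stripping described above

# \s* consumes the leading spaces in the raw line and matches nothing in the stripped one.
raw_capture = PATTERN.match(raw_line).group(1)
stripped_capture = PATTERN.match(stripped_line).group(1)
print(raw_capture, stripped_capture)  # 10 10
```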