Regular expressions
Introduction
Regular expressions (RegEx for short) are used within many Osirium PAM templates. This section shows some of their common uses, and shows that you only need to understand a few regex metacharacters to work with them.
RegEx 101
From Wikipedia (http://en.wikipedia.org/wiki/Regular_expression):
Each character in a regular expression is either understood to be a metacharacter with its special meaning, or a regular character with its literal meaning. Together, they can be used to identify textual material of a given pattern or process a number of instances of it that can vary from a precise equality to a very general similarity of the pattern. The pattern sequence itself is an expression that is a statement in a language designed specifically to represent prescribed targets in the most concise and flexible way to direct the automation of text processing of general text files, specific textual forms, or of random input strings.
Common metacharacters
The following is a small subset of the many metacharacters available within regular expressions.
However, it covers most of the metacharacters that you'll find used within Osirium PAM.
Metacharacter(s) | Meaning |
---|---|
^ | Start of line. |
$ | End of line. |
+ | 1 or more of something. |
* | 0 or more of something. |
{5} | Match exactly 5 of something. |
{2,5} | Match between 2 and 5 of something. |
. | Any character. |
.+ | 1 or more of any characters. |
.* | 0 or more of any characters. |
\. | Match the . character. |
\+ | Match the + character. |
\s | Whitespace. |
\d | Any digit 0-9. |
\w | Any word character: a-z, A-Z, 0-9 and underscore. |
[ ] | Match any one character in the set, e.g. [ABC] matches A or B or C. |
| | Or (alternation) between two strings, e.g. cent(er|re) matches center or centre. |
() | Capture Group. |
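As a rough illustration, a few of the metacharacters in the table can be tried out with Python's standard `re` module. The helper name `matches` and all of the sample strings are invented for this sketch and are not taken from any Osirium PAM template.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Hypothetical sample strings, chosen only to exercise each metacharacter.
results = {
    "exactly 5 digits": matches(r"^\d{5}$", "12345"),
    "2 to 5 digits": matches(r"^\d{2,5}$", "123456"),   # six digits is too many
    "literal dot": matches(r"1\.5", "1x5"),              # \. only matches "."
    "any character": matches(r"1.5", "1x5"),             # . matches the "x"
    "set": matches(r"^[ABC]+$", "CAB"),
    "alternation": matches(r"^cent(er|re)$", "centre"),
}

for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the "2 to 5 digits" and "literal dot" checks are expected to report False, because the sample strings were picked to fall outside those patterns.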
The most common use of a regex is to extract some data from the response of a device. This could be reading the version, or extracting a username for example. This is done using Capturing Groups.
Capturing groups
Capturing Groups allow you to define a regular expression and say that you want a specific part of the match to be pulled out and held as a separate substring. For example, we might want to read the version number from a device, but we don't want the other text around it.
Asking the device for its version might return several lines of output, one of which starts with Version: followed by the version number.
The bit we might be after is the 10 on the Version... line.
The regex to do this would be:

    ^\s*Version:\s+(\d+)\..*$
This breaks down to:
Part of Regex | Meaning |
---|---|
^ | Start of line. |
\s* | 0 or more whitespace characters (see the Note at the end of this section). |
Version: | The literal text "Version:" |
\s+ | 1 or more whitespace characters. |
(\d+) | Capture 1 or more digits 0-9. |
\. | The literal . character, but don't include it in the capturing group as it's outside the capture group brackets. |
.* | Any number of any characters. |
$ | End of line. |
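The parts in the breakdown above can be assembled into one pattern and tried with Python's `re` module. This is a minimal sketch: the device response, the function name `extract_major_version` and the line-by-line matching are assumptions made for illustration, and may not reflect how Osirium PAM applies the pattern internally.

```python
import re

# Pattern assembled from the parts in the breakdown table.
VERSION_RE = re.compile(r"^\s*Version:\s+(\d+)\..*$")

# Hypothetical device response; real output will differ per device.
response = """\
Device: example-switch
Version: 10.45 (build 7)
Uptime: 3 days
"""


def extract_major_version(text: str):
    """Return the first capture group from the first matching line, or None."""
    for line in text.splitlines():
        match = VERSION_RE.match(line)
        if match:
            return match.group(1)  # group(1) is the part inside the brackets
    return None


major = extract_major_version(response)
print(major)
```

Here `group(1)` holds only the digits before the first dot, because the `\.` and everything after it sit outside the capture group brackets.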
More examples
Some more examples:
Regex | Input string | Captured |
---|---|---|
^.*(Example).*$ | This is a RegEx Example String | Example |
^Version:\s+(\d+\.\d+).*$ | Version: 10.45 | 10.45 |
^[\w\s]+\sa\s(\w+).*$ | A little bit of text goes a long way | long |
^[\d\s]{12}(\d).*$ | 1 2 3 4 5 6 7 8 9 10 | 7 |
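The examples in this table can be checked with a short Python script. In this sketch the first pattern is written with `.*` on either side of the capture group, which is assumed to be the intended form; the variable names are illustrative only.

```python
import re

# (pattern, input string) pairs taken from the table of examples.
examples = [
    (r"^.*(Example).*$", "This is a RegEx Example String"),
    (r"^Version:\s+(\d+\.\d+).*$", "Version: 10.45"),
    (r"^[\w\s]+\sa\s(\w+).*$", "A little bit of text goes a long way"),
    (r"^[\d\s]{12}(\d).*$", "1 2 3 4 5 6 7 8 9 10"),
]

# Collect the contents of the first capture group for each example.
captured = []
for pattern, text in examples:
    match = re.match(pattern, text)
    captured.append(match.group(1) if match else None)

for (pattern, _), value in zip(examples, captured):
    print(f"{pattern} -> {value}")
```

In the last example, `[\d\s]{12}` consumes the first twelve characters (the digits 1 to 6 and the space after each), so the capture group picks up the 7.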
Useful regex websites
- RegExPlanet: This is a good website to 'fiddle' with regular expressions. It allows you to pick the regular expression engine (make sure you pick Python, as that is what Osirium PAM uses to parse its templates!).
- Regexper: This is a great website to help visualise what is happening as you step through a regular expression.
- Regexr: This is another good fiddle website, which lists the various metacharacters and allows you to double-click to include them in your expression.
Note
One very important point to note is that Osirium PAM strips ALL leading and trailing whitespace from device responses. So the \s* in the example isn't actually needed, but because it uses *, which means 0 or more, the regex shown still works.
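The effect described in this note can be sketched in Python. Here `str.strip()` is only a stand-in for whatever stripping Osirium PAM performs, and the raw response string is hypothetical.

```python
import re

WITH_WS = re.compile(r"^\s*Version:\s+(\d+)\..*$")    # pattern with the leading \s*
WITHOUT_WS = re.compile(r"^Version:\s+(\d+)\..*$")    # same pattern without it

raw = "   Version: 10.45  \n"   # hypothetical raw response with extra whitespace
stripped = raw.strip()          # assumption: approximates the stripping PAM does

# On the raw text only the pattern with \s* matches.
raw_with = WITH_WS.match(raw)
raw_without = WITHOUT_WS.match(raw)

# On the stripped text both patterns match and capture the same value.
stripped_with = WITH_WS.match(stripped)
stripped_without = WITHOUT_WS.match(stripped)

print(raw_with is not None, raw_without is not None)
print(stripped_with.group(1), stripped_without.group(1))
```

Once the surrounding whitespace has been removed, both patterns capture the same value, which is why the `\s*` is harmless but unnecessary.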