
kickstart_regex

Overview Back (Groups) Next (Substitution)

Lookaround

Anchors like ^ (line beginning) and $ (line ending) match a position in a string rather than any text: they do not consume characters.

Lookaround groups act similarly, but they let us define exactly what the text next to that position should look like. Lookaround groups do not consume any text; they only check a position. With that we can specify positions like “I expect the string subject to the right” or “I expect something that is not a number to the left”.
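To make "matching a position" concrete, here is a minimal sketch. The sample string and variable names are only illustrative. It compares the span of an anchor match with the span of a lookahead used on its own; in both cases the match is empty, so start and end are equal.

```python
import re

text = "the subject line"

# An anchor matches a position, so the match is empty (start == end).
end_anchor = re.search(r"$", text)
print(end_anchor.span())  # (16, 16)

# A lookaround used on its own also matches an empty string,
# here at the position directly before "subject".
before_subject = re.search(r"(?=subject)", text)
print(before_subject.span())         # (4, 4)
print(repr(before_subject.group()))  # ''
```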

Lookahead

To look ahead we use a lookahead group defined with (?=), with the pattern to look for placed after the = sign. It checks whether that pattern matches immediately “on the right side” of the current RegEx position.

Here is an example:

import re

# will only match Alex if part of Alexander
print(re.search(r"Alex(?=ander)", "My name is Alexander") is not None)

# will not match because `ander` is missing
print(re.search(r"Alex(?=ander)", "My name is Alex") is not None)

So for the first example the RegEx works like this:

- Match capital `A`? Yes, proceed.
- Match `l`? Yes, proceed.
- Match `e`? Yes, proceed.
- Match `x`? Yes, proceed.
- Is there a literal `ander` immediately to the right of that `x`? Yes, so we have a match.
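The steps above can be sketched by hand in plain Python. The helper `match_alex_before_ander` is hypothetical and only mimics what the pattern does at one fixed starting position; the real engine also tries every other starting position in the string.

```python
import re

def match_alex_before_ander(text, pos):
    """Hand-written imitation of Alex(?=ander), tried at index `pos` only."""
    # Steps 1-4: consume the literal characters "Alex".
    if not text.startswith("Alex", pos):
        return None
    end = pos + len("Alex")
    # Step 5: peek at what follows, without moving `end`.
    if not text.startswith("ander", end):
        return None
    return (pos, end)

text = "My name is Alexander"
by_hand = match_alex_before_ander(text, 11)
by_regex = re.search(r"Alex(?=ander)", text).span()
print(by_hand, by_regex)  # (11, 15) (11, 15)

# Without "ander" on the right, the peek in step 5 fails.
print(match_alex_before_ander("My name is Alex", 11))  # None
```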

Lookback

We can do the same in the opposite direction by using a lookback group, which most regex documentation calls a lookbehind. This group is defined with (?<=). As a mnemonic, the < character looks like an arrow pointing to the left.

So if we want to match a number only if there is a dollar sign on the left side, we can do something like this:

import re

# will only match if we have a dollar sign before the number
print(re.search(r"(?<=\$)\d+", "The price is $100") is not None)

# will not match, dollar sign is missing
print(re.search(r"(?<=\$)\d+", "The price is 100€") is not None)

It is important to understand that lookaround groups do not consume any characters. They check whether their pattern matches (behind or ahead of the current position), but the text they check is not part of the match.
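A short way to see this is to print the matched text itself instead of only testing for a match. The sketch below reuses the patterns from this chapter and compares the lookback with an ordinary pattern that consumes the dollar sign; the variable names are only illustrative.

```python
import re

price = "The price is $100"

# The lookback requires "$" on the left but leaves it out of the match.
lookback = re.search(r"(?<=\$)\d+", price)
print(lookback.group())   # 100

# Without lookaround, the dollar sign is consumed and becomes part of the match.
consuming = re.search(r"\$\d+", price)
print(consuming.group())  # $100

# The same holds for the lookahead: "ander" is checked but not matched.
name = re.search(r"Alex(?=ander)", "My name is Alexander")
print(name.group())       # Alex
```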

Negative Lookaround

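So far we have used lookarounds that will match but not consume. Technically speaking these were positive lookaheads and positive lookbacks. We can also define negative lookaheads and negative lookbacks. A negative lookahead succeeds only if its pattern does not match immediately to the right of the current position, and a negative lookback succeeds only if its pattern does not match immediately to the left. Like their positive counterparts, they do not consume any text.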
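Here is a minimal sketch of both negative forms, reusing the earlier examples. The second half shows a common surprise with negative lookbacks, and the character class `[$\d]` is just one possible way to deal with it, not the only one.

```python
import re

# Negative lookahead: match "Alex" only if it is NOT followed by "ander".
plain_name = re.search(r"Alex(?!ander)", "My name is Alex")
long_name = re.search(r"Alex(?!ander)", "My name is Alexander")
print(plain_name is not None)  # True
print(long_name is not None)   # False

# Negative lookback: digits NOT preceded by a dollar sign.
# Careful: the check is made at every position, so the engine simply
# starts one character later, where the character on the left is "1".
naive = re.search(r"(?<!\$)\d+", "The price is $100")
print(naive.group())  # 00

# Also forbidding a digit on the left rules out starting mid-number.
strict_usd = re.search(r"(?<![$\d])\d+", "The price is $100")
strict_eur = re.search(r"(?<![$\d])\d+", "The price is 100€")
print(strict_usd)          # None
print(strict_eur.group())  # 100
```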

The syntax for lookaround groups is:

Lookahead:
Positive lookahead: (?=)
Negative lookahead: (?!)

Lookback:
Positive lookback: (?<=)
Negative lookback: (?<!)
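As a side-by-side comparison, the following sketch applies all four forms to one made-up string with `re.findall`. The sample text is an assumption for illustration, and the word boundary `\b` is added to the negative patterns so that they cannot match only part of a number.

```python
import re

text = "10 USD, 20 EUR, $30, 40"

# Positive lookahead: numbers followed by " USD".
followed_by_usd = re.findall(r"\d+(?= USD)", text)
# Negative lookahead: whole numbers not followed by " USD".
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", text)
# Positive lookback: numbers preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)
# Negative lookback: whole numbers not preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(followed_by_usd)      # ['10']
print(not_followed_by_usd)  # ['20', '30', '40']
print(after_dollar)         # ['30']
print(not_after_dollar)     # ['10', '20', '40']
```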

We will practice these lookarounds in the next chapter.
