Lesson 1 - Basic Syntax and Character Matching
In this lesson, you will learn how to use RegEx as a basic text-searching tool, along with some of the metacharacters it provides for more advanced searches.
Lesson Objectives
- Use RegEx as a basic text searching tool.
- Learn about and use metacharacters to create some more advanced searches.
- Search for ranges of characters using character sets and ranges.
Regular Characters
At its simplest, you can search for regular characters and words using regular expressions, much like other search tools (such as your browser’s Ctrl+F search). Any of the characters A-Z, a-z, and 0-9 match themselves as usual. Keep in mind, however, that searches are case-sensitive.
For the examples below, we’ll search the following text:
Sample Text: This is an example text showcasing regular characters such as A, b, 1, and more.
In the search below, we look for any occurrences of the term “example”.
In this next example search, we look for any occurrences of the letter s. Note that it only highlights the lowercase letters.
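The two searches above can be sketched in code. This is a minimal illustration that assumes Python's standard `re` module rather than whichever tool the lesson's screenshots used; the variable names are made up for the example.

```python
import re

sample = "This is an example text showcasing regular characters such as A, b, 1, and more."

# A plain word matches itself literally.
example_matches = re.findall(r"example", sample)

# Searches are case-sensitive: the pattern "s" only finds lowercase s,
# and the pattern "A" only finds the uppercase A.
lowercase_s = re.findall(r"s", sample)
uppercase_a = re.findall(r"A", sample)

print(example_matches)  # ['example']
print(uppercase_a)      # ['A']
```

Passing the `re.IGNORECASE` flag to `re.findall` would make the same patterns match both cases.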
Symbols
Some symbols, such as ., ?, and *, have a special meaning when using regular expressions. If you simply want to search for literal occurrences of these symbols, precede the symbol with a backslash (\).
Below, we show what happens if you use only . as your regular expression. Instead of searching for all occurrences of ., it will highlight every single character.
If you want to search for occurrences of the . character itself, you’ll need to add a backslash before the period (\.). The same applies to any other special character.
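As a rough sketch of the difference between an unescaped and an escaped period, again assuming Python's `re` module (the variable names are illustrative only):

```python
import re

sample = "This is an example text showcasing regular characters such as A, b, 1, and more."

# Unescaped, the dot is a metacharacter and matches every character in this one-line text.
unescaped = re.findall(r".", sample)

# Escaped with a backslash, it matches only literal periods.
escaped = re.findall(r"\.", sample)

# re.escape() adds the backslashes for you when the search text is not known in advance.
safe_pattern = re.escape("more.")

print(len(unescaped) == len(sample))  # True
print(escaped)                        # ['.']
print(safe_pattern)                   # more\.
```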
Metacharacters
Rather than spelling out exact characters and symbols, you can use metacharacters to match broader classes of characters.
The Dot (.)
The dot (.) matches any single character except a newline. By itself, it can’t do much, since it will just highlight the entire text. However, it becomes extremely useful in some contexts, which you’ll see later in the module.
The example below uses the search query i., which returns all occurrences of the letter i followed by any character.
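A small sketch of the i. query, assuming Python's `re` module. The second string is a made-up example to show that the dot skips newlines by default.

```python
import re

sample = "This is an example text showcasing regular characters such as A, b, 1, and more."

# "i." matches the letter i plus whichever single character follows it.
pairs = re.findall(r"i.", sample)
print(pairs)  # ['is', 'is', 'in']

# By default the dot does not match a newline, so the first "i" below
# (which is followed by "\n") is not part of any match.
two_lines = "hi\nhi!"
print(re.findall(r"i.", two_lines))             # ['i!']

# With the re.DOTALL flag, the dot matches newlines as well.
print(re.findall(r"i.", two_lines, re.DOTALL))  # ['i\n', 'i!']
```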
Whitespace
The \s metacharacter matches any whitespace character, which includes spaces, tabs, newlines, and carriage returns. With a simple test string such as this one, it will only highlight the spaces. However, if you were to create a couple of paragraphs, the tabs at the start of each paragraph as well as the newline (enter) characters will also be highlighted.
You can also search for any non-whitespace character using \S.
If you are instead searching for a specific whitespace or control character, you can use…
- \n: Newline
- \t: Tab
- \0: Null character
- \r: Carriage return
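The sketch below shows \s, \S, and two of the specific escapes on a short two-paragraph string. It assumes Python's `re` module, and the test string is invented for the example.

```python
import re

# A hypothetical two-paragraph text: a leading tab, a newline, and another tab.
text = "\tFirst paragraph.\nSecond\tparagraph."

# \s matches every whitespace character, not just spaces.
whitespace = re.findall(r"\s", text)
print(whitespace)  # ['\t', ' ', '\n', '\t']

# \S matches everything that is not whitespace.
non_whitespace = re.findall(r"\S", text)

# Specific escapes match only that one character.
tabs = re.findall(r"\t", text)
newlines = re.findall(r"\n", text)
print(len(tabs), len(newlines))  # 2 1
```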
Digits
The \d metacharacter matches any digit character (0-9).
Like with whitespace, \D searches for any non-digit character.
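A minimal sketch of \d and \D on the sample text, assuming Python's `re` module:

```python
import re

sample = "This is an example text showcasing regular characters such as A, b, 1, and more."

# \d matches digit characters; the sample text contains exactly one.
digits = re.findall(r"\d", sample)
print(digits)  # ['1']

# \D matches every character that is not a digit.
non_digits = re.findall(r"\D", sample)
print(len(non_digits) == len(sample) - 1)  # True
```

One flavor-specific caveat: in Python, \d in a text (str) pattern also matches digits from other scripts unless the `re.ASCII` flag is passed, which restricts it to 0-9.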
Character Sets and Ranges
You can also create your own set of characters to match by listing them inside square brackets; for example, [abc] matches a, b, or c. Do not separate these characters with spaces or commas; otherwise, the RegEx will also match every space and comma (,).
If you want to match any one of several consecutive characters, you can use a character range inside the brackets instead. The range a-d searches for the letters a, b, c, and d. The range a-z similarly searches for every lowercase letter from a to z.
By adding a caret symbol (^) at the start of your character set, the search will look for any character that is not in the set. The example below searches for all characters that are not a, b, c, or d.
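Sets, ranges, and negated sets can be compared side by side in a short sketch. It assumes Python's `re` module, and the specific sets chosen here are only illustrations.

```python
import re

sample = "This is an example text showcasing regular characters such as A, b, 1, and more."

# A character set matches any one of the listed characters.
set_matches = re.findall(r"[abc]", sample)

# A range is shorthand for listing every character between the two endpoints.
range_matches = re.findall(r"[a-d]", sample)

# A leading caret negates the set: match anything that is NOT a, b, c, or d.
negated = re.findall(r"[^a-d]", sample)

# Spaces and commas inside the brackets become part of the set,
# so this pattern also matches every space and comma in the text.
with_separators = re.findall(r"[a, b]", sample)
print("," in with_separators, " " in with_separators)  # True True
```

Because the sample text is a single line, every character lands in exactly one of `range_matches` or `negated`.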
Key Points / Summary
- You can use RegEx as a basic text search tool if you’re just looking for words.
- Metacharacters can help you create more advanced queries.
- Character sets and ranges can help you look for ranges of characters without having to specify each character separately.