Regular Expressions
A regular expression (regex) is a text string that describes a search pattern. You can visit https://www.regular-expressions.info/tutorial.html to learn more about regular expressions. Here, I will teach you only the minimum you need for this course.
Literal Characters
Consider the sentence "This is a test." The most basic regular expression consists of a single literal character, such as i. It matches every occurrence of the letter i in the sentence: the i in "This" and the i in "is". Note that regex engines are case sensitive by default.
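As a quick check of the idea above, here is a minimal sketch that uses grep with the -o switch (print each match on its own line) on the same sentence. The variable names are my own and only serve the illustration.

```shell
# Sample sentence from the text above.
sentence='This is a test.'

# -o prints every match on its own line, so each hit for the regex "i" becomes one line.
matches=$(printf '%s\n' "$sentence" | grep -o 'i')
printf '%s\n' "$matches"

# Case sensitivity: "t" matches only the lowercase letters, not the "T" in "This".
lower_t=$(printf '%s\n' "$sentence" | grep -o 't' | wc -l | tr -d ' ')
printf 'lowercase t matches: %s\n' "$lower_t"
```

The first command prints one line per i, and the count for t excludes the capital T.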
Special Characters
We need some characters for special use. Why? Because we want to do more than search for literal characters. Here are the special characters: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called “metacharacters”. If you want to use any of these characters as a literal in a regex, you need to escape it with a backslash. For example, how do you match the . in "This is a test."? The correct regex is \. (a backslash followed by a dot).
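To see the difference escaping makes, the following sketch counts the matches of an unescaped and an escaped dot in the sample sentence. It assumes a grep that supports -o (GNU and BSD grep both do); the variable names are illustrative.

```shell
sentence='This is a test.'

# Unescaped dot: matches any single character, so every character of the line is a hit.
any_count=$(printf '%s\n' "$sentence" | grep -o '.' | wc -l | tr -d ' ')

# Escaped dot: matches only the literal period at the end of the sentence.
dot_count=$(printf '%s\n' "$sentence" | grep -o '\.' | wc -l | tr -d ' ')

printf 'unescaped: %s matches, escaped: %s match\n' "$any_count" "$dot_count"
```

Single quotes around the pattern keep the shell from touching the backslash before grep sees it.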
Quick-Start Table
| Character | Description | Example |
|---|---|---|
| . | Matches any single character | a. matches ab and a1, but not a on its own |
| * | Matches zero or more of the preceding character or group | ab* matches a, ab, and abbb |
| + | Matches one or more of the preceding character or group | ab+ matches ab and abbb, but not a |
| ? | Matches zero or one instance of the preceding character or group | ab?c matches ac and abc, but not abbc |
| \ | Escapes a special character | \\ matches \ |
| \| | Or (alternation) | 2\|100 matches 2 or 100 |
| [] | Matches any one of the characters within the brackets | [bc]ar matches bar and car, but not far |
| [m-n] | Matches one of the characters in the range from m to n | [5-8] matches 5, 6, 7, or 8 |
| ( ) | Groups part of a pattern and captures the match | (b\|c)ar matches bar and car |
| \n | Matches a newline. Unix text files end each line with a single \n; Windows text files normally break lines with a \r\n pair | ab\ncd matches ab, a line break, then cd |
| ^ | Indicates the start of a string | ^f matches foo |
| $ | Indicates the end of a string | o$ matches foo |
| \s | In most engines, matches a whitespace character (space, tab, \n, \r) | Hello\sWorld! matches Hello World! |
| [[:digit:]]{m} | Matches exactly m digits ([[:alnum:]] would also accept letters) | ^[[:digit:]]{2}$ matches 10 and 15, but not 5 or 100 |
I found this reference quite interesting, should you need to learn more about regex.
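The quantifiers and brackets are easiest to understand by trying them. Below is a small sketch built around a hypothetical helper, whole_match, that I made up for this purpose: it prints yes when the entire string matches the pattern and no otherwise. It relies on grep -E (extended syntax), -x (the match must cover the whole line), and -q (quiet).

```shell
# Hypothetical helper: whole_match REGEX STRING
# Prints "yes" if the whole STRING matches REGEX, otherwise "no".
whole_match() {
    if printf '%s\n' "$2" | grep -Eqx -- "$1"; then
        echo yes
    else
        echo no
    fi
}

whole_match 'ab*' 'a'                # * allows zero b's
whole_match 'ab+' 'a'                # + needs at least one b
whole_match 'ab?c' 'abbc'            # ? allows at most one b
whole_match '[bc]ar' 'car'           # one character out of b or c, then ar
whole_match '[[:digit:]]{2}' '100'   # exactly two digits, and 100 has three
```

Without -x, grep reports a line as soon as any part of it matches, which is why an unanchored pattern such as [[:digit:]]{2} would still find the 10 inside 100.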
What is a text file?
A text file is a normal file that contains human-readable text. The other kind of file, a binary file, is meant to be interpreted by programs rather than read directly by people.
grep
grep searches through files and folders and prints the lines that match a pattern. Some of the most commonly used switches (options) are -r for recursive searching and -i for case-insensitive matching.
┌──(kali㉿kali-1)-[~]
└─$ ls -ltrha /etc/systemd/ | grep conf
-rw-r--r-- 1 root root 931 Jan 18 06:35 sleep.conf
-rw-r--r-- 1 root root 670 Jan 18 06:35 pstore.conf
-rw-r--r-- 1 root root 846 Jan 18 06:35 networkd.conf
-rw-r--r-- 1 root root 1.4K Jan 26 18:35 user.conf
-rw-r--r-- 1 root root 841 Jan 26 18:35 timesyncd.conf
-rw-r--r-- 1 root root 2.0K Jan 26 18:35 system.conf
-rw-r--r-- 1 root root 1.4K Jan 26 18:35 resolved.conf
-rw-r--r-- 1 root root 1.4K Jan 26 18:35 logind.conf
-rw-r--r-- 1 root root 1.3K Jan 26 18:35 journald.conf
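The -r and -i switches can be tried safely on a throwaway directory. The sketch below assumes mktemp -d is available (it is on Linux and macOS); the file names and contents are invented for the demonstration. It also uses -l, which prints only the names of the files that contain a match.

```shell
# Build a temporary directory tree to search (all names here are made up).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'Error: disk full\n'  > "$demo_dir/a.log"
printf 'no problems here\n'  > "$demo_dir/b.log"
printf 'minor error found\n' > "$demo_dir/sub/c.log"

# -r descends into sub/, -i treats "Error" and "error" alike, -l lists file names only.
hits=$(grep -ril 'error' "$demo_dir" | sort)
printf '%s\n' "$hits"

# Clean up the temporary tree.
rm -r "$demo_dir"
```

Only a.log and sub/c.log are listed; b.log contains no form of the word.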
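grep already treats its pattern as a (basic) regular expression; the -E switch enables extended regular expressions, where |, +, ?, ( ), and { } act as metacharacters without needing a backslash. For example, consider this tree.txt file: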
┌──(kali㉿kali-1)-[~/Documents]
└─$ cat tree.txt
.
├── EBOOKS
│ ├── ebook_100.pdf
│ ├── ebook_101.pdf
│ ├── ebook_102.pdf
│ ├── ebook_10.pdf
│ ├── ebook_11.pdf
│ ├── ebook_12.pdf
│ ├── ebook_1.pdf
│ ├── ebook_2.pdf
│ ├── ebook_3.pdf
│ ├── ebook_4.pdf
│ └── ebook_5.pdf
├── NOTES
│ ├── note_1
│ ├── note_10
│ ├── note_100
│ ├── note_2
│ ├── note_3
│ ├── note_4
│ ├── note_9
│ └── note_90
└── tree.txt
2 directories, 20 files
Now, we want to find note_2 and note_100 in the file above.
┌──(kali㉿kali-1)-[~/Documents]
└─$ cat tree.txt | grep -E "(note_(2|100))"
│ ├── note_100
│ ├── note_2
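One caveat with this pattern is that it is not anchored, so it would also report longer names that merely start with note_2 or note_100. The sketch below illustrates this with a made-up list; note_20 and note_1000 are hypothetical entries that do not exist in tree.txt.

```shell
# Sample names; note_20 and note_1000 are hypothetical additions to show the pitfall.
names='note_1
note_2
note_20
note_100
note_1000'

# Unanchored: also reports note_20 and note_1000, because the pattern matches their prefix.
loose=$(printf '%s\n' "$names" | grep -E 'note_(2|100)')

# Anchored with $: the line must end right after the 2 or the 100.
strict=$(printf '%s\n' "$names" | grep -E 'note_(2|100)$')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

For tree.txt both patterns give the same result, since no such longer names are present.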
Given the text file below, write a grep -E command to extract the lines containing an email address. Write another grep -E command to extract the lines containing an IP address:
┌──(kali㉿kali-1)-[~]
└─$ cat list
a.b.c.d
10.1.0.12
127.0.0.1
Hello@world
256.0.1.2
172.20.65.1
test@example.com
me@example.co.uk
not@valid@email.com
Here are my answers; you might come up with other answers too:
┌──(kali㉿kali-1)-[~]
└─$ grep -E "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}$" list
test@example.com
me@example.co.uk
┌──(kali㉿kali-1)-[~]
└─$ grep -E "^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$" list
10.1.0.12
127.0.0.1
172.20.65.1
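The IP address pattern repeats the same octet expression four times. One way to make it easier to read, offered here as a sketch and not as the only solution, is to store the octet in a shell variable and repeat the "dot plus octet" part with {3}. The variable names octet and ip_regex are my own.

```shell
# One octet: 250-255, 200-249, or 0-199 (with an optional leading 0 or 1).
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets separated by literal dots, anchored to the whole line.
# Inside double quotes, \. stays as \. and \$ becomes a plain $ for grep.
ip_regex="^${octet}(\.${octet}){3}\$"

valid=$(printf '%s\n' 10.1.0.12 256.0.1.2 a.b.c.d 172.20.65.1 | grep -E "$ip_regex")
printf '%s\n' "$valid"
```

The line 256.0.1.2 is rejected because no branch of the octet expression accepts 256.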
sed
sed is a very powerful stream editor. It is a complex tool, and I only scratch the surface here.
Let’s create a stream of text and pipe it to sed to replace a string in it.
The command s/regexp/replacement/ attempts to match regexp against the pattern space (the current line). If the match succeeds, the matched portion is replaced with replacement.
┌──(kali㉿kali-1)-[~]
└─$ echo "Hi World." | sed 's/Hi/Hello/'
Hello World.
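Let’s try another example that contains several Hellos. Without a flag, s replaces only the first match on each line; the g flag applies the replacement to all matches of the regexp, not just the first. Matching is case sensitive, so the lowercase hello is left alone.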
┌──(kali㉿kali-1)-[~]
└─$ echo 'Hello World! Who said Hello? I said hello.' | sed 's/Hello/Hi/'
Hi World! Who said Hello? I said hello.
┌──(kali㉿kali-1)-[~]
└─$ echo 'Hello World! Who said Hello? I said hello.' | sed 's/Hello/Hi/g'
Hi World! Who said Hi? I said hello.
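To also catch the lowercase hello, one portable option is a bracket expression such as [Hh], which matches either letter. The sketch below puts the three variants side by side; the variable names are illustrative.

```shell
text='Hello World! Who said Hello? I said hello.'

# No flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/Hello/Hi/')

# g flag: every match of "Hello" is replaced, but the lowercase "hello" is not a match.
all_upper=$(printf '%s\n' "$text" | sed 's/Hello/Hi/g')

# [Hh] is a bracket expression, so "Hello" and "hello" are both replaced.
any_case=$(printf '%s\n' "$text" | sed 's/[Hh]ello/Hi/g')

printf '%s\n%s\n%s\n' "$first_only" "$all_upper" "$any_case"
```

The text is kept in single quotes so that the shell does not try to interpret the ! and ? characters.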
In many Linux configuration files and scripting languages, comment lines start with the hash sign #. The following command removes the comment lines from a text file when printing it to stdout; the file itself is not modified. The d command deletes the pattern space, that is, the current line.
┌──(kali㉿kali-1)-[~]
└─$ sed '/^#/d' /etc/adduser.conf
DSHELL=/bin/bash
DHOME=/home
GROUPHOMES=no
LETTERHOMES=no
SKEL=/etc/skel
FIRST_SYSTEM_UID=100
LAST_SYSTEM_UID=999
FIRST_SYSTEM_GID=100
LAST_SYSTEM_GID=999
FIRST_UID=1000
LAST_UID=59999
FIRST_GID=1000
LAST_GID=59999
USERGROUPS=yes
USERS_GID=100
DIR_MODE=0755
SETGID_HOME=no
QUOTAUSER=""
SKEL_IGNORE_REGEX="dpkg-(old|new|dist|save)"
Here is another example that complements the previous one. It also deletes comment lines that begin with whitespace (\s* matches zero or more whitespace characters in GNU sed) as well as empty lines:
┌──(kali㉿kali-1)-[~]
└─$ sed '/^\s*#/d;/^$/d' /etc/adduser.conf
DSHELL=/bin/bash
DHOME=/home
GROUPHOMES=no
LETTERHOMES=no
SKEL=/etc/skel
FIRST_SYSTEM_UID=100
LAST_SYSTEM_UID=999
FIRST_SYSTEM_GID=100
LAST_SYSTEM_GID=999
FIRST_UID=1000
LAST_UID=59999
FIRST_GID=1000
LAST_GID=59999
USERGROUPS=yes
USERS_GID=100
DIR_MODE=0755
SETGID_HOME=no
QUOTAUSER=""
SKEL_IGNORE_REGEX="dpkg-(old|new|dist|save)"
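If you want to experiment without touching a real system file, the same idea can be tried on a small sample. In this sketch the config text is invented, and I use the POSIX class [[:space:]] in place of \s so that it should also work with sed implementations other than GNU sed. It additionally treats lines that contain only whitespace as empty.

```shell
# A made-up config snippet standing in for /etc/adduser.conf.
config='# default shell
DSHELL=/bin/bash

  # indented comment
DHOME=/home
'

# First expression: delete lines whose first non-blank character is #.
# Second expression: delete lines that are empty or contain only whitespace.
cleaned=$(printf '%s' "$config" | sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d')
printf '%s\n' "$cleaned"
```

Only the two setting lines remain in the output.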
cut
The cut command is simple and handy. It splits each line into fields. We give it the delimiter with the switch -d and select the fields to print with the switch -f.
┌──(kali㉿kali-1)-[~]
└─$ cut -d ":" -f 1 /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
┌──(kali㉿kali-1)-[~]
└─$ echo "Names: Mustermenn, Uve, Klara" | cut -d "," -f 3
Klara
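Two details of cut are worth trying yourself: a field keeps any space that follows the delimiter, and -f accepts a list of fields. The sketch below wraps the results in square brackets to make the leading space visible; the variable names are my own.

```shell
line='Names: Mustermenn, Uve, Klara'

# -f 3 returns the third comma-separated field, including its leading space.
third=$(printf '%s\n' "$line" | cut -d ',' -f 3)

# -f 1,3 selects two fields; cut joins them with the same delimiter.
first_and_third=$(printf '%s\n' "$line" | cut -d ',' -f 1,3)

printf '[%s]\n[%s]\n' "$third" "$first_and_third"
```

cut only accepts a single-character delimiter, so it cannot split on a comma followed by a space.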
awk
AWK is a programming language designed for text processing. Like sed and grep, it is a filter and is a standard feature of most Unix-like operating systems. The most commonly used switch is -F (field separator), and the most commonly used command is print, which prints the selected fields.
┌──(kali㉿kali-1)-[~]
└─$ echo "Names: Mustermenn, Uve, Klara" | awk -F "," '{print $1, $3}'
Names: Mustermenn Klara
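Unlike cut, awk accepts a separator that is longer than one character. As a small sketch, using a comma plus a space as the separator removes the leading spaces from the fields. The built-in variable NF holds the number of fields on the line, and $NF refers to the last field.

```shell
line='Names: Mustermenn, Uve, Klara'

# -F ', ' splits on a comma followed by a space, so the fields carry no leading space.
# print separates its arguments with a single space by default.
result=$(printf '%s\n' "$line" | awk -F ', ' '{print NF, $1, $NF}')
printf '%s\n' "$result"
```

With a plain comma as the separator, the third field would still start with a space, which then shows up in the printed result.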
vi
vi is a text editor (analogous to the Windows text editor Notepad) that is available on virtually every POSIX-compliant system. It is very powerful, although it is difficult to get used to. To edit a file, pass the name of the file as an argument to vi:
┌──(kali㉿kali-1)-[~]
└─$ vi hello_world.txt
Once the file is open, press i to enter insert mode and begin typing. To leave insert mode and go back to command mode, press the Esc key. When you are done, typing :q! quits vi without saving, while typing :wq! saves the file and quits. Here is good documentation should you need to learn more about vi.
Today, in one minute, is my last day in my current company I am working for.