String and Regular Expression

By ukmodak | March 31st 2024 10:36:23 AM

Regular Expressions

Regular expressions, commonly known as regex, are patterns: sequences of characters that describe what to search for in a text string.

A regular expression lets you search for a specific pattern inside another string. You can also replace the matched text with another string, or split a string into multiple chunks. Special characters such as +, - and ^ (they are not arithmetic operators here) let you combine simple rules into complex expressions.

By default, regular expressions are case sensitive.
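The search, replace and split operations described above, and the default case sensitivity, can be sketched with Python's standard re module. Python is used here only as a stand-in. PHP exposes the same ideas through preg_match(), preg_replace() and preg_split(). Both follow Perl-style syntax, but the two engines differ in some details.

```python
import re

text = "Apples and apples are red apples"

# Search: re.search returns a match object for the first hit, or None.
first = re.search(r"apples", text)
print(first.start())  # 11: the capitalised "Apples" at index 0 is skipped

# Replace: re.sub replaces every match with the replacement string.
replaced = re.sub(r"apples", "pears", text)
print(replaced)  # Apples and pears are red pears

# Split: re.split breaks a string wherever the pattern matches.
parts = re.split(r"\s*,\s*", "red, green ,blue")
print(parts)  # ['red', 'green', 'blue']

# Matching is case sensitive unless re.IGNORECASE is passed.
exact = re.findall(r"apples", text)
either_case = re.findall(r"apples", text, re.IGNORECASE)
print(len(exact), len(either_case))  # 2 3
```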

Advantages and Uses of Regular Expressions

Regular expressions are used almost everywhere in modern application programming. Some of their advantages and uses are listed below:

  1. Regular expressions help programmers validate text strings.
  2. They offer a powerful tool for analyzing and searching for patterns, as well as for modifying text strings.
  3. Regex functions provide simple, compact solutions for identifying patterns.
  4. Regexes are helpful for building HTML template systems that recognize tags.
  5. Regexes are widely used for browser detection, form validation, spam filtering, and password strength checking.
  6. They are helpful for validating user input such as email addresses, mobile numbers, and IP addresses.
  7. They help highlight special keywords in a file based on a search result or input.
  8. Metacharacters allow you to create more complex patterns.
You can create complex search patterns by combining a few basic rules with the operators and metacharacters described below.
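Input validation (item 6 above) often combines a regex shape check with a small amount of ordinary code. The sketch below uses Python's re module as a stand-in for PHP's preg_match(). The helper name is_ipv4 is hypothetical.

It only checks dotted-quad IPv4 addresses. It accepts leading zeros such as "01" and does not handle IPv6.

```python
import re

# Shape check only: four groups of 1-3 ASCII digits separated by literal dots.
IPV4_SHAPE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def is_ipv4(address: str) -> bool:
    """Hypothetical validator: True if address is a dotted-quad IPv4 address."""
    match = IPV4_SHAPE.fullmatch(address)  # the whole string must match
    if match is None:
        return False
    # Writing "0 to 255" as a regex is possible but verbose,
    # so each octet is range-checked in code instead.
    return all(int(octet) <= 255 for octet in match.groups())


for candidate in ["192.168.1.10", "256.1.1.1", "10.0.0", "1.2.3.4x"]:
    print(candidate, is_ipv4(candidate))
```

Splitting the work this way keeps the pattern readable. The regex answers "is this shaped like an address?", and the integer check answers "is each part in range?".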

Operators in Regular Expression

Operator Description
^ It indicates the start of string.
$ It indicates the end of the string.
. It denotes any single character (except a newline, by default).
() It shows a group of expressions.
[] It matches any one of the characters inside the brackets, e.g., [abc] means a, b, or c.
[^] It matches any character that is not inside the brackets, e.g., [^xyz] means NOT x, y, or z.
- Inside brackets, it specifies a range, e.g., [a-z] means a through z.
| It is a logical OR operator used between elements, e.g., a|b means either a OR b.
? It indicates zero or one of the preceding character or group.
* It indicates zero or more of the preceding character or group.
+ It indicates one or more of the preceding character or group.
{n} It denotes exactly n of the preceding character or group, e.g., n{3} means exactly three n's.
{n,} It denotes at least n of the preceding character or group, e.g., n{2,} means two or more n's.
{n,m} It denotes at least n but no more than m of the preceding character or group, e.g., n{3,6} means 3 to 6 n's.
\ It denotes the escape character.
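Each operator in the table can be tried out on a small pair of pattern and subject. This sketch uses Python's re module, whose syntax for these operators matches the Perl-style patterns used by PHP's preg_ functions. re.search reports whether the pattern occurs anywhere in the subject.

```python
import re

# (pattern, subject) pairs; the comment gives the expected outcome.
examples = [
    (r"^cat", "catalog"),      # ^ anchors at the start        -> match
    (r"log$", "catalog"),      # $ anchors at the end          -> match
    (r"c.t", "cut"),           # . any single character        -> match
    (r"(ab)+", "ababab"),      # () groups, + repeats the group -> match
    (r"[abc]", "xyz"),         # [] character set              -> no match
    (r"[^xyz]", "xyz"),        # [^] negated set               -> no match
    (r"gr(a|e)y", "grey"),     # | alternation                 -> match
    (r"colou?r", "color"),     # ? zero or one                 -> match
    (r"ab*c", "ac"),           # * zero or more                -> match
    (r"ab+c", "ac"),           # + one or more                 -> no match
    (r"^n{3}$", "nnn"),        # {n} exactly n                 -> match
    (r"^n{2,}$", "nnnnn"),     # {n,} at least n               -> match
    (r"^n{3,6}$", "nn"),       # {n,m} between n and m         -> no match
    (r"3\.14", "3.14"),        # \ escapes the dot             -> match
]

results = [re.search(pattern, subject) is not None for pattern, subject in examples]
for (pattern, subject), found in zip(examples, results):
    print(pattern, subject, found)
```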

Special character class in Regular Expression

Special Character Description
\n It indicates a new line.
\r It indicates a carriage return.
\t It represents a tab.
\v It represents a vertical tab.
\f It represents a form feed.
\xxx It represents the character with octal code xxx, e.g., \101 is "A".
\xhh It denotes the character with hexadecimal code hh, e.g., \x41 is "A".

PHP has offered two sets of regular expression functions:

  • POSIX Regular Expression (deprecated in PHP 5.3.0 and removed in PHP 7.0.0)
  • Perl-Style Regular Expression

POSIX Regular Expression

The structure of a POSIX regular expression is similar to that of a typical arithmetic expression: several operators and elements are combined to form more complex expressions.

The simplest regular expression matches a single character. For example, the pattern "g" matches inside the strings "toggle" and "cage". The following concepts are used in POSIX regular expressions:
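A one-character pattern is easy to confirm. This short sketch uses Python's re module in place of PHP's functions and counts how many times "g" matches in each word.

```python
import re

# The one-character pattern "g" matches each literal g in the subject.
g_counts = {word: len(re.findall(r"g", word)) for word in ["toggle", "cage", "cat"]}
print(g_counts)  # {'toggle': 2, 'cage': 1, 'cat': 0}
```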

Brackets

Brackets [] have a special meaning in regular expressions: they match any one character from the set or range written inside them.

Expression Description
[0-9] It matches any decimal digit 0 to 9.
[a-z] It matches any lowercase character from a to z.
[A-Z] It matches any uppercase character from A to Z.
[a-zA-Z] It matches any letter from a to z in either lowercase or uppercase. (A range such as [a-Z] is not valid, because lowercase a sorts after uppercase Z.)

The ranges above are the most common, but you can use any range you need, such as [0-6] to match any decimal digit from 0 to 6.
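The bracket ranges can be compared on one sample string. This sketch uses Python's re module as a stand-in. It also shows that a reversed range such as [a-Z] is rejected when the pattern is compiled, rather than silently matching "all letters".

```python
import re

sample = "Room 7B, floor 9, code xY3"

digits = re.findall(r"[0-9]", sample)      # ['7', '9', '3']
lower = re.findall(r"[a-z]", sample)       # every lowercase letter, in order
upper = re.findall(r"[A-Z]", sample)       # ['R', 'B', 'Y']
low_digits = re.findall(r"[0-6]", sample)  # ['3']
words = re.findall(r"[a-zA-Z]+", sample)   # ['Room', 'B', 'floor', 'code', 'xY']

# "a" has a higher code point than "Z", so [a-Z] is a bad range.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

print(digits, lower, upper, low_digits, words, reversed_range_ok)
```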

Quantifiers

Special characters specify how often, or where, a single character or a bracketed character sequence occurs, and each has a specific meaning. The symbols +, *, ?, {n,m} and $ are written after the character sequence they apply to, while ^ is written before it.

Expression Description
p+ It matches any string that contains at least one p.
p* It matches any string that contains zero or more p's (so every string matches).
p? It matches any string that contains zero or one p (so every string matches).
p{N} It matches any string that contains a sequence of N p's.
p{2,3} It matches any string that contains a sequence of two or three p's.
p{2,} It matches any string that contains a sequence of at least two p's.
p$ It matches any string that ends with p.
^p It matches any string that starts with p.
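The quantifier table can be checked by running each pattern against a few words and listing the words in which it is found. This sketch uses Python's re.search as a stand-in for a PHP match call. Python match objects are always truthy, even for an empty match, which is why p* and p? select every word.

```python
import re

subjects = ["apple", "pear", "ppp", "hippo", "top", "kiwi"]

patterns = {
    "p+": r"p+",
    "p*": r"p*",
    "p?": r"p?",
    "p{3}": r"p{3}",
    "p{2,3}": r"p{2,3}",
    "p{2,}": r"p{2,}",
    "p$": r"p$",
    "^p": r"^p",
}

# For each pattern, keep the subjects in which it is found anywhere.
hits = {label: [s for s in subjects if re.search(pattern, s)]
        for label, pattern in patterns.items()}

for label, matched in hits.items():
    print(label, matched)
```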

POSIX Function

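Older PHP versions provided the following seven functions for searching strings with POSIX-style regular expressions. All of them were deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so new code should use the Perl-style preg_ functions instead.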

Function Description
ereg() It searches for a pattern inside another string and returns true if the pattern matches, otherwise false.
ereg_replace() It searches for a pattern inside another string and replaces the matching text with the replacement string.
eregi() It works like ereg(), except that the pattern match is case-insensitive.
eregi_replace() It works like ereg_replace(), except that the pattern search is case-insensitive.
split() It divides a string into an array using a regular expression.
spliti() It works like split(), except that the match is case-insensitive.
sql_regcase() It returns a regular expression that matches the given string case-insensitively, e.g., "Foo" becomes "[Ff][Oo][Oo]".
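Since the ereg family no longer exists in current PHP, the sketch below shows the same ideas in Python's re module.

- A case-sensitive search (ereg-like) and a case-insensitive search (eregi-like).
- A case-insensitive replace (eregi_replace-like).
- A small re-creation of sql_regcase(). The Python function sql_regcase here is hypothetical. Like the PHP original it copies non-letters unchanged, so characters such as "." keep their regex meaning.

```python
import re
import string


def sql_regcase(text: str) -> str:
    """Hypothetical Python re-creation of PHP's old sql_regcase().

    Each ASCII letter becomes a bracket expression, e.g. "Foo" -> "[Ff][Oo][Oo]";
    every other character is copied unchanged (it is not escaped).
    """
    parts = []
    for ch in text:
        if ch in string.ascii_letters:
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(ch)
    return "".join(parts)


subject = "Error: FILE not found"

# ereg()-like: case-sensitive search
case_sensitive = re.search(r"file", subject) is not None                   # False
# eregi()-like: the same search with case folding switched on
case_insensitive = re.search(r"file", subject, re.IGNORECASE) is not None  # True
# eregi_replace()-like: case-insensitive replacement (flags must be a keyword)
fixed = re.sub(r"file", "document", subject, flags=re.IGNORECASE)
# The sql_regcase() approach: build case-insensitivity into the pattern itself
built = sql_regcase("file")                                                # '[Ff][Ii][Ll][Ee]'
via_brackets = re.search(built, subject) is not None                       # True

print(case_sensitive, case_insensitive, fixed, built, via_brackets)
```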

Perl-Style Regular Expression

Perl-style regular expressions are very similar to their POSIX counterparts, and most POSIX syntax can also be used with the Perl-style functions. The quantifiers introduced in the POSIX section work in Perl-style regular expressions as well.

Metacharacters

A metacharacter is an alphabetic character preceded by a backslash; the combination has a special meaning.

For example, the \d metacharacter can be used to search for large sums of money: /([\d]+)000/. Here \d matches any single digit, so [\d]+ matches a run of one or more digits, which must be followed by the literal 000.
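The money-sum pattern can be run as written, minus the PHP-style slash delimiters, with Python's re module standing in for preg_match_all(). The sample sentence is invented. Because [\d]+ is greedy, the engine first takes every digit and then backs off just far enough to leave 000 for the rest of the pattern. The captured group is therefore the amount in thousands.

```python
import re

text = "The prize is 25000 and the bonus is 1000000"

# ([\d]+) captures one or more digits; the literal 000 must follow them.
# The + is greedy, so the engine backtracks just far enough to leave 000.
amounts = re.findall(r"([\d]+)000", text)
print(amounts)  # ['25', '1000']
```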

Below is the list of metacharacters that can be used in Perl-style regular expressions:

Character Description
. Matches any single character (except a newline, by default).
\s Matches a whitespace character such as a space, newline, or tab.
\S Matches a non-whitespace character.
\d Matches any digit from 0 to 9.
\D Matches a non-digit character.
\w Matches a word character: a-z, A-Z, 0-9, or _.
\W Matches a non-word character.
[aeiou] It matches any single character in the given set.
[^aeiou] It matches any single character except the given set.
(foo|baz|bar) Matches any of the alternatives specified.
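Each metacharacter in the table is exercised below with re.findall from Python's re module, used as a stand-in for preg_match_all(). One difference from PHP to keep in mind: in Python 3 string patterns, \d, \w and \s also match non-ASCII Unicode digits, letters and spaces unless re.ASCII is given. The short sample inputs here are ASCII only.

```python
import re

# (pattern, subject) pairs; each comment gives the expected re.findall result.
cases = [
    (r"c.t", "cat cot c\nt"),             # ['cat', 'cot']  (. skips the newline)
    (r"\s", "a b\tc\nd"),                 # [' ', '\t', '\n']
    (r"\S+", "a b\tc\nd"),                # ['a', 'b', 'c', 'd']
    (r"\d+", "Order #42, qty 7"),         # ['42', '7']
    (r"\D+", "Order #42, qty 7"),         # ['Order #', ', qty ']
    (r"\w+", "snake_case, x-ray"),        # ['snake_case', 'x', 'ray']
    (r"\W+", "snake_case, x-ray"),        # [', ', '-']
    (r"[aeiou]", "regular"),              # ['e', 'u', 'a']
    (r"[^aeiou]", "regular"),             # ['r', 'g', 'l', 'r']
    (r"(foo|baz|bar)", "foobar bazqux"),  # ['foo', 'bar', 'baz']
]

results = {pattern: re.findall(pattern, subject) for pattern, subject in cases}
for pattern, found in results.items():
    print(pattern, found)
```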

Modifiers

Several modifiers make working with regular expressions much easier, for example by making a search case-insensitive or letting it work across multiple lines.

Below is the list of modifiers used in Perl-style regular expressions. The o, g and cg modifiers come from Perl itself. PHP's preg_ functions reject them as unknown modifiers, and a global search is done with preg_match_all() instead.

Character Description
i Makes the search case-insensitive.
m If the string contains carriage return or newline characters, ^ and $ match at line boundaries instead of only at the start and end of the string.
o Evaluates the expression only once.
s Allows . (dot) to match a newline character.
x Allows whitespace in the expression for clarity.
g Searches globally for all matches.
cg Allows the search to continue even after a global match fails.
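Python's re module has flags that correspond to the i, m, s and x modifiers: re.IGNORECASE, re.MULTILINE, re.DOTALL and re.VERBOSE. The sketch below uses them as a stand-in for PHP pattern modifiers. Like PHP, Python has no g flag: re.findall and re.finditer already return every match. It has no equivalent of o or cg either.

```python
import re

text = "first line\nSecond Line\nthird line"

# i -> re.IGNORECASE: case-insensitive matching
ignore_case = re.findall(r"line", text, re.IGNORECASE)  # ['line', 'Line', 'line']

# m -> re.MULTILINE: ^ also matches right after each newline
first_words = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'Second', 'third']
start_only = re.findall(r"^\w+", text)                  # ['first']

# s -> re.DOTALL: . may also match a newline
no_dotall = re.search(r"first.*third", text) is not None               # False
with_dotall = re.search(r"first.*third", text, re.DOTALL) is not None  # True

# x -> re.VERBOSE: whitespace and # comments inside the pattern are ignored
word_before_line = re.compile(r"""
    (\w+)   # capture a word
    \s+     # followed by whitespace
    line    # and the lowercase word line
""", re.VERBOSE)
verbose_hits = word_before_line.findall(text)           # ['first', 'third']

print(ignore_case, first_words, start_only, no_dotall, with_dotall, verbose_hits)
```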

PHP Regexp Perl Functions

PHP provides the following functions for searching strings with Perl-compatible regular expressions:

Function Description
preg_match() It searches the string for the pattern and returns 1 if the pattern is found, 0 if it is not, or false if an error occurs.
preg_match_all() It finds all occurrences of the pattern in the string and returns the number of full matches.
preg_replace() It replaces every match of the pattern with the replacement string. It is the Perl-compatible counterpart of ereg_replace().
preg_split() It divides a string into an array, using a regular expression as the delimiter. It is the Perl-compatible counterpart of split().
preg_grep() It returns the elements of the input array that match the regular expression pattern.
preg_quote() It escapes regular expression special characters in a string so that the string can be matched literally.
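Three of these functions have no single-call equivalent in the earlier examples: counting all matches, filtering an array and quoting a literal. The sketch below approximates them with Python's re module. These are rough analogues, not the PHP API. Python returns lists and match objects rather than PHP's integer return codes, and re.escape does not take the delimiter argument that preg_quote() accepts.

```python
import re

# preg_match_all() analogue: findall returns every match; len() counts them.
a_count = len(re.findall(r"a", "banana"))                     # 3

# preg_grep() analogue: keep only the array elements the pattern matches.
fruits = ["apple", "banana", "cherry", "avocado"]
a_words = [f for f in fruits if re.search(r"^a", f)]          # ['apple', 'avocado']

# preg_quote() analogue: re.escape escapes the special characters,
# so "$5.00 (approx.)" is matched literally rather than as a pattern.
price_text = "cost: $5.00 (approx.)"
literal = re.escape("$5.00 (approx.)")
found_literally = re.search(literal, price_text) is not None  # True
# Unescaped, "$" is an end-of-string anchor, so the pattern can never match here.
unescaped_found = re.search("$5.00 (approx.)", price_text) is not None  # False

print(a_count, a_words, found_literally, unescaped_found)
```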