Difference between revisions of "Regular Expressions"

From Freephile Wiki

Revision as of 14:31, 27 August 2016

Regex is short for Regular Expression and is a syntax that allows for powerful pattern matching.

One important use of regular expressions is in multi-line, multi-file editing. For example, let's say you have 10,000 files and you want to edit similar (but not identical) occurrences of strings within those files. Regular expressions can help you isolate the target strings and edit precisely the parts you need to change while retaining the parts you need to keep.

Multiline Edits

Most graphical text editors and word processors have a single-line input in the Search/Replace dialog. This is unsuitable for the many editing situations where the string you want to replace spans multiple lines.

Quanta integrates KFileReplace, which you can launch standalone or use within Quanta to do multiline regex-capable search and replace. Since the 'Advanced Search/Replace' dialog does not allow multiline text input, it's a bit awkward to do this from Quanta. However, once the KFileReplace part is open in Quanta, you can edit your search session any way you like including entering multiple line text as the search 'needle'.

To launch KFileReplace standalone, press Alt+F2, type 'KFileReplace', and press Enter.

Single Line

Many editors and utilities are line-based, so even though they support a regular expression syntax for pattern matching, it is only useful when the target does not span more than a single line. This is typically very problematic for code, XML, or HTML, where content almost always comes in multiline "blocks" such as function definitions, nodes, or paragraphs.

Background

Most implementations that concern us use PCRE, the Perl Compatible Regular Expressions library.

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building commercial software.

PCRE was originally written for the Exim MTA, but is now used by many high-profile open source projects, including Apache, PHP, KDE, Postfix, Analog, and Nmap.

Should I use perl, bash, php, awk, sed, ...?

PHP has rich regular expression support, and so does Perl. But when you're at the command line in Bash, what's the best way to quickly search some content with a rich regular expression? Bash itself can be hard to use because of all the quoting and interpolation. Let's look at a couple of examples that search a PHP configuration file for variable assignments. Using perl, it's easy to print only the parenthesized sub-expression:

perl -nle 'print $1 if /\$wgDBuser.*"(.*)"/' ./LocalSettings.php

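With grep -P, \K discards everything matched so far from the reported match, which works like a variable-length look-behind, but it may not be available on older systems. In that case, use a fixed-length look-behind and pull out the quoted value with cut: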

grep -Po '(?<=\$wgDBuser).*"(.*)"' ./LocalSettings.php | cut -d \" -f 2
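Where \K is available, the cut step can be dropped. The following is a minimal sketch, assuming GNU grep built with PCRE support (the -P option); the settings file and its values are hypothetical stand-ins for LocalSettings.php.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for LocalSettings.php, for illustration only.
settings=$(mktemp)
printf '$wgDBname = "wikidb";\n$wgDBuser = "wikiuser";\n' > "$settings"

# Everything before \K must match but is dropped from the reported match,
# so -o prints only the text between the double quotes.
db_user=$(grep -Po '\$wgDBuser\s*=\s*"\K[^"]+' "$settings")
echo "$db_user"

rm -f "$settings"
```

Using `[^"]+` instead of `.*` stops the match at the closing quote, so a trailing comment that contains quotes would not be swallowed.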

Resources

Regex in Javascript
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp
Regex in PHP
http://us3.php.net/manual/en/ref.pcre.php
Limitations and Manual
http://www.pcre.org/pcre.txt
Regex on Windows
http://weitz.de/regex-coach/

See Also