Regexr Desktop

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that defines a search pattern. Usually such patterns are used by string-searching algorithms for 'find' or 'find and replace' operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory. RegExr is an online tool to learn, build, and test regular expressions (RegEx / RegExp). Results update in real time as you type. Roll over a match or expression for details.

Regular Expressions are so cool. Knowledge of regexes will allow you to save the day.

It seems the most popular options were:

  • Kodos (no commits in the past decade, runs off deprecated Qt3)
  • Regexr (desktop version ran on Adobe Air, no longer available)
  • RegExBuddy ($40, maybe it's the only option)
  • Perl 5.10 with use re 'debug' or debugcolor (no clue what this even means)

Definitions

In formal language theory, a regular expression (a.k.a. regex, regexp, or r.e.) is a string that represents a regular (type-3) language.

Huh??

Okay, in many programming languages, a regular expression is a pattern that matches strings or pieces of strings. The set of strings they are capable of matching goes way beyond what regular expressions from language theory can describe.

Basic Examples

Rather than start with technical details, we’ll start with a bunch of examples.

Regex                    Matches any string that
hello                    contains {hello}
gray|grey                contains {gray, grey}
gr(a|e)y                 contains {gray, grey}
gr[ae]y                  contains {gray, grey}
b[aeiou]bble             contains {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot          contains {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r                  contains {color, colour}
rege(x(es)?|xps?)        contains {regex, regexes, regexp, regexps}
go*gle                   contains {ggle, gogle, google, gooogle, goooogle, ...}
go+gle                   contains {gogle, google, gooogle, goooogle, ...}
g(oog)+le                contains {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}                     contains {zzz}
z{3,6}                   contains {zzz, zzzz, zzzzz, zzzzzz}
z{3,}                    contains {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k           contains {Brainf**k, brainf**k}
\d                       contains {0,1,2,3,4,5,6,7,8,9}
\d{5}(-\d{4})?           contains a United States zip code
1\d{10}                  contains an 11-digit string starting with a 1
[2-9]|[12]\d|3[0-6]      contains an integer in the range 2..36 inclusive
Hello\nworld             contains Hello followed by a newline followed by world
mi.....ft                contains a nine-character (sub)string beginning with mi and ending with ft (Note: depending on context, the dot stands either for “any character at all” or “any character except a newline”.) Each dot is allowed to match a different character, so both microsoft and minecraft will match.
\d+(\.\d\d)?             contains a nonnegative integer or a floating point number with exactly two digits after the decimal point.
[^i*&2@]                 contains any character other than an i, asterisk, ampersand, 2, or at-sign.
//[^\r\n]*[\r\n]         contains a Java or C# slash-slash comment
^dog                     begins with 'dog'
dog$                     ends with 'dog'
^dog$                    is exactly 'dog'
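
A few of these rows can be checked with a short sketch. Python is an assumption here (the examples themselves are engine-neutral), and the helper name contains is made up for illustration. It uses re.search because most rows describe strings that contain a match somewhere.

```python
import re

def contains(pattern, text):
    """Return True if `text` contains a match for `pattern` anywhere."""
    return re.search(pattern, text) is not None

# Alternation and character classes
print(contains(r"gr(a|e)y", "a grey cat"))     # True
print(contains(r"gr[ae]y", "a groy cat"))      # False

# Optional and repeated pieces
print(contains(r"colou?r", "colour chart"))    # True
print(contains(r"go+gle", "ggle"))             # False: + needs at least one o
print(contains(r"go*gle", "ggle"))             # True: * allows zero o's

# Anchors turn "contains" into "begins with" or "is exactly"
print(contains(r"^dog", "dogma"))              # True
print(contains(r"^dog$", "dogma"))             # False
```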

Notation

There are many different syntaxes for regular expressions, but in general you will see that:

  • Most characters stand for themselves
  • Certain characters, called metacharacters, have special meaning and must be escaped (usually with \) if you want to use them as characters. In most syntaxes the metacharacters are: ( ) [ ] { } ^ $ . \ ? * + |
  • Within square brackets, you only have to escape (1) an initial ^, (2) a non-initial or non-final -, (3) a non-initial ], and (4) a \.
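
As a rough illustration of escaping, here is a small sketch in Python (an assumed choice of language; the sample strings are invented). It shows a dot acting as a metacharacter, the same dot escaped, and the standard re.escape function, which escapes a whole literal string.

```python
import re

# An unescaped dot is a metacharacter: it matches any character.
print(re.search(r"3.14", "3x14") is not None)     # True
# Escaped, it only matches a literal dot.
print(re.search(r"3\.14", "3x14") is not None)    # False
print(re.search(r"3\.14", "3.14") is not None)    # True

# re.escape() escapes the metacharacters in a literal string for you.
literal = "a+b(c)"
pattern = re.escape(literal)
print(re.fullmatch(pattern, literal) is not None)  # True

# Inside square brackets most metacharacters lose their special meaning.
print(re.findall(r"[.*+?]", "a.b*c+d?"))           # ['.', '*', '+', '?']
```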

Using Regular Expressions

Many languages allow programmers to define regexes and then use them to:

  • Validate that a piece of text (or a portion of that text) matches some pattern
  • Find fragments of some text that match some pattern
  • Extract fragments of some text
  • Replace fragments of text with other text

Generally a regex is first compiled into some internal form that can be used for super fast validation, extraction, and replacing. Sometimes there is an explicit compile function or method, and sometimes special syntax is used to compile, such as the very common form /.../.
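
Python is one of the languages with an explicit compile function rather than a /.../ literal. A minimal sketch, assuming Python and using an invented helper name (find_zip) and sample text:

```python
import re

# Compile once; the resulting pattern object can be reused many times.
ZIP_CODE = re.compile(r"\d{5}(-\d{4})?")

def find_zip(text):
    """Return the first ZIP-code-like substring in `text`, or None."""
    m = ZIP_CODE.search(text)
    return m.group(0) if m else None

print(find_zip("Los Angeles, CA 90045-2659"))   # 90045-2659
print(find_zip("no digits here"))               # None
```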

Validation

Example: find 'color' or 'colour' in a given string.

If you want to know if an entire string matches a pattern, define the pattern with ^ and $, or with \A and \Z. In Java, you can call matches() instead of find().
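
A sketch of the same distinction in Python (an assumption; the text above mentions Java). Python's search() plays roughly the role of Java's find(), and fullmatch() the role of matches(). The sample strings are invented.

```python
import re

COLOR = re.compile(r"colou?r")

# search() asks: does the text contain a match anywhere?
print(COLOR.search("What colour is it?") is not None)      # True

# fullmatch() asks: is the entire string a match?
print(COLOR.fullmatch("What colour is it?") is not None)   # False
print(COLOR.fullmatch("color") is not None)                # True

# Anchoring the pattern itself gives a whole-string test with search().
print(re.search(r"\Acolou?r\Z", "colour") is not None)     # True
```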

Extraction

After doing a match against a pattern, most regex engines will return you a bundle of information, including such things as:

  • the part of the text that matched the pattern
  • the index within the string where the match begins
  • each part of the text matching the parenthesized portions within the pattern
  • (sometimes) the text before the matched text
  • (sometimes) the text after the matched text
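
To make that bundle of information concrete, here is a minimal sketch in Python (an assumed language; the phone-number text is invented). Python's match object does not expose the text before and after the match directly, so the sketch slices the original string instead.

```python
import re

text = "Call 310-555-0123 after noon"
m = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)

print(m.group(0))         # 310-555-0123  (the part that matched)
print(m.start())          # 5             (index where the match begins)
print(m.groups())         # ('310', '555', '0123')  (the parenthesized parts)
print(text[:m.start()])   # 'Call '       (text before the match)
print(text[m.end():])     # ' after noon' (text after the match)
```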

Example in Ruby:

The same thing in JavaScript:

Note how in JavaScript, the match result object looks like an array and an object.

The so-called group numbers are found by counting the left-parentheses in the pattern:

TODO PICTURE GOES HERE
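
Until a picture is available, a small Python sketch (assumed language, invented date string) can show the counting rule for nested groups: group 1 starts at the first left parenthesis, group 2 at the second, and so on.

```python
import re

#              1 2         3           4
#              ( (         (           (
m = re.search(r"((\d{4})-(\d{2}))-(\d{2})", "2024-03-15")

print(m.group(0))   # 2024-03-15  (group 0 is always the whole match)
print(m.group(1))   # 2024-03     (outer group: opened by the 1st left paren)
print(m.group(2))   # 2024
print(m.group(3))   # 03
print(m.group(4))   # 15
```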

Sometimes you need parentheses only for precedence purposes and you don’t want to incur the cost of extracting a group. We have non-capturing groups for this purpose.
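
A non-capturing group is usually written (?:...). The sketch below, in Python (an assumption, as is the sample word), compares a capturing and a non-capturing version of one of the earlier example patterns.

```python
import re

text = "regexps"
capturing = re.fullmatch(r"rege(x(es)?|xps?)", text)
non_capturing = re.fullmatch(r"rege(?:x(?:es)?|xps?)", text)

# Both versions match the same strings...
print(capturing.group(0))       # regexps
print(non_capturing.group(0))   # regexps

# ...but only the capturing version records groups.
print(capturing.groups())       # ('xps', None)
print(non_capturing.groups())   # ()
```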

Ruby:

JavaScript:

Java

Substitution

Many languages have replace or replaceAll methods that replace the parts of a string that match a regex. Sometimes you will see a g flag on a regex instead of a replaceAll function.
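
As one illustration, Python (an assumed choice) has neither a g flag nor a replaceAll method: its re.sub function replaces every match unless told otherwise. The sample text is invented.

```python
import re

text = "The colour of the color wheel"

# re.sub replaces every match by default.
print(re.sub(r"colou?r", "hue", text))            # The hue of the hue wheel

# count=1 replaces only the first match.
print(re.sub(r"colou?r", "hue", text, count=1))   # The hue of the color wheel

# Captured groups can be referenced in the replacement text.
print(re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20"))  # 20-10
```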

Components of Regexes

Character Classes

  • Square brackets [ ] — means exactly one character
  • A leading ^ negates, a non-leading, non-terminal - defines a range:
  • If you have a ] in your set, put it first. Use \ to escape.
  • Java allows crazy extensions:
  • Other ways to say exactly one character from a set are:
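
The examples that belonged to these bullets are missing here, so the following is only a sketch of the general ideas, written in Python (an assumption) with invented sample strings. It does not show the Java-specific extensions, which Python's re module does not support.

```python
import re

# A set, a negated set, and a set built from two ranges
print(re.findall(r"[aeiou]", "regex"))       # ['e', 'e']
print(re.findall(r"[^aeiou]", "regex"))      # ['r', 'g', 'x']
print(re.findall(r"[a-cx-z]", "abcdwxyz"))   # ['a', 'b', 'c', 'x', 'y', 'z']

# A ] placed first in the set is taken literally; so is a - placed last.
print(re.findall(r"[]-]", "a]b-c"))          # [']', '-']

# Shorthand classes: \d is a digit, \s is a whitespace character.
print(re.findall(r"\d", "a1 b22"))           # ['1', '2', '2']
print(re.findall(r"\s", "a1 b22"))           # [' ']
```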

Groups

Defined above, in the section on extraction.

Quantifiers

Generally, 18 types:

                             Eager    Reluctant  Possessive
Zero or one                  ?        ??         ?+
Zero or more                 *        *?         *+
One or more                  +        +?         ++
m times                      {m}      {m}?       {m}+
At least m times             {m,}     {m,}?      {m,}+
At least m, at most n times  {m,n}    {m,n}?     {m,n}+

Eager (Greedy and Generous) — match as much as possible, but give back

Possessive — match as much as possible, but do NOT give back

Reluctant — match as little as possible
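
The difference between eager and reluctant matching is easiest to see side by side. A minimal sketch in Python (an assumed language, with invented sample text); possessive quantifiers are left out because Python's re module only accepts them from version 3.11 on.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Eager: .* grabs as much as it can, then gives back just enough
# for the final > to match.
print(re.search(r"<.*>", text).group(0))    # <b>bold</b> and <i>italic</i>

# Reluctant: .*? grabs as little as it can.
print(re.search(r"<.*?>", text).group(0))   # <b>

# The same contrast with a counted quantifier
print(re.search(r"z{3,6}", "zzzzzzzz").group(0))    # zzzzzz
print(re.search(r"z{3,6}?", "zzzzzzzz").group(0))   # zzz
```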

Backreferences

Things captured can be used later:
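
A common illustration is finding accidentally doubled words. The sketch below is in Python (an assumption) with invented sample sentences; \1 refers back to whatever group 1 captured.

```python
import re

# \1 must match the same text that group 1 captured.
doubled = re.compile(r"\b(\w+) \1\b")

m = doubled.search("Paris in the the spring")
print(m.group(0))   # the the
print(m.group(1))   # the

print(doubled.search("this is fine"))   # None
```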

Anchors, Boundaries, Delimiters

Some regex tokens do not consume characters! They just assert the matching engine is at a particular place, so to speak.

  • ^: Beginning of string (or line, depending on the mode)
  • $: End of string (or line, depending on the mode)
  • \A: Beginning of string
  • \z: End of string
  • \Z: Varies a lot depending on the engine, so be careful with it
  • \b: Word boundary
  • \B: Not a word boundary
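
A short sketch of a few of these assertions in Python (an assumed language; the sample text is invented). Note that each anchor only checks a position, so the matched text never includes anything for the anchor itself.

```python
import re

text = "cat\nconcatenate\ncat"

# \b asserts a word boundary without consuming any characters.
print(re.findall(r"\bcat\b", text))   # ['cat', 'cat']
# \B asserts the opposite: only the cat inside concatenate qualifies.
print(re.findall(r"\Bcat\B", text))   # ['cat']

# By default ^ anchors only at the start of the string...
print(len(re.findall(r"^c", text)))                  # 1
# ...but in multiline mode it anchors at the start of every line.
print(len(re.findall(r"^c", text, re.MULTILINE)))    # 3

# \A always means the start of the string, whatever the mode.
print(len(re.findall(r"\Ac", text, re.MULTILINE)))   # 1
```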

Read more about these at Rexegg.

Also, the lookarounds (up next!) don’t consume any characters either!

Lookarounds

  • Lookarounds do not consume anything
  • Even though they have parens, they do not capture
  • Positive Lookahead, (?=...): Matches only if followed by something

    For example, Hillary(?=\s+Clinton) matches the Hillary in Hillary Clinton but not the Hillary in Hillary Makasa.

  • Negative Lookahead, (?!...): Matches only if not followed by something
  • Positive Lookbehind, (?<=...): Matches only if preceded by something
  • Negative Lookbehind, (?<!...): Matches only if not preceded by something
  • Lookarounds show up in search and replace applications
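
The four kinds of lookaround can be sketched in Python (an assumption) using the common syntax (?=...), (?!...), (?<=...), and (?<!...). The patterns and sample strings below are illustrative, not taken from the original examples.

```python
import re

text = "Hillary Clinton met Hillary Makasa"

# Positive lookahead: Hillary only if followed by whitespace and Clinton.
print([m.start() for m in re.finditer(r"Hillary(?=\s+Clinton)", text)])   # [0]
# Negative lookahead: Hillary only if NOT followed by that.
print([m.start() for m in re.finditer(r"Hillary(?!\s+Clinton)", text)])   # [20]

# The lookahead consumes nothing: the match itself is just Hillary.
print(re.search(r"Hillary(?=\s+Clinton)", text).group(0))   # Hillary

prices = "cost $40, weight 75kg"
# Positive lookbehind: digits preceded by a dollar sign.
print(re.findall(r"(?<=\$)\d+", prices))      # ['40']
# Negative lookbehind: digits not preceded by a dollar sign or a digit.
print(re.findall(r"(?<![$\d])\d+", prices))   # ['75']
```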

Note: Read this awesome article on lookarounds.

Modifiers

A modifier affects the way the rest of the regex is interpreted. Not every language supports all of the modifiers below. For example, JavaScript originally supported only i, g, and m (newer versions of the language add more, such as s, u, and y).

Modifier  Meaning
g         global
i         ignore case
m         multiple line
s         single line (DOTALL): means that the dot matches any character at all. Without this modifier, the dot matches any character except a newline.
x         ignore whitespace in the pattern
d         Unix line mode: considers only U+000A as a line separator, rather than U+000D or the U+000D/U+000A combo or even U+2028.
u         Unicode case: in this mode the case-insensitive modifier respects Unicode cases; outside of this mode that modifier only consolidates cases of US-ASCII characters.
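
As a sketch of how some of these modifiers look in practice, here is a Python version (an assumed language; Python spells the modifiers as flags such as re.IGNORECASE, and has no g flag because findall and sub are already global). The sample strings are invented.

```python
import re

# i: ignore case
print(re.findall(r"dog", "Dog dog DOG", re.IGNORECASE))   # ['Dog', 'dog', 'DOG']

# m: ^ and $ match at every line, not just at the ends of the string
print(re.findall(r"^\w+", "one\ntwo", re.MULTILINE))      # ['one', 'two']

# s: the dot also matches a newline
print(re.search(r"a.b", "a\nb") is not None)              # False
print(re.search(r"a.b", "a\nb", re.DOTALL) is not None)   # True

# x: whitespace and comments in the pattern are ignored
zip_code = re.compile(r"""
    \d{5}        # five digits
    (-\d{4})?    # optional four-digit extension
""", re.VERBOSE)
print(zip_code.fullmatch("90045-2659") is not None)       # True

# Flags can also be written inline at the start of the pattern.
print(re.findall(r"(?i)dog", "Dog dog"))                  # ['Dog', 'dog']
```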

Performance Pitfalls

You should know some things about how your regex engine works, since two 'equivalent' regexes can have drastic differences in processing speed.

  • It is possible to write regexes that take exponential time to match, but you pretty much have to TRY to make one (they’re pathological)
  • It is more common to accidentally create regexes that run in quadratic time
  • Main types of problems
    • Recompilation (from forgetting to compile regexes used multiple times)
    • Dot-star in the Middle (which causes backtracking)
      • Solution 1: Use negated character class
      • Solution 2: Use reluctant quantifiers
    • Nested Repetition
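
The dot-star problem and its two solutions can be sketched as follows, in Python (an assumption) with an invented input line. The sketch shows what each pattern matches rather than measuring time; actual speed differences depend on the engine and the input.

```python
import re

line = 'name="regex" kind="tool"'

# Dot-star in the middle: .* runs to the end of the line, then backtracks
# to the LAST quote, so the engine does extra work and the match is too long.
print(re.search(r'".*"', line).group(0))      # "regex" kind="tool"

# Solution 1: a negated character class cannot run past the closing quote.
print(re.search(r'"[^"]*"', line).group(0))   # "regex"

# Solution 2: a reluctant quantifier stops at the first closing quote.
print(re.search(r'".*?"', line).group(0))     # "regex"

# Avoiding recompilation: compile once outside the loop, reuse inside it.
QUOTED = re.compile(r'"[^"]*"')
lines = [line, "no quotes here", 'x="y"']
print([bool(QUOTED.search(s)) for s in lines])   # [True, False, True]
```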

Performance Tips

Always do the following:

  • Use non-capturing groups when you need parentheses but not capture.
  • If the regex is very complex, do a quick spot-check before attempting a match, e.g.
    • Does an email address contain '@'?
  • Present the most likely alternative(s) first, e.g.
    • black|white|blue|red|green|metallic seaweed
  • Reduce the amount of looping the engine has to do
    • \d\d\d\d\d is faster than \d{5}
    • aaaa+ is faster than a{4,}
  • Avoid obvious backtracking, e.g.
    • Mr|Ms|Mrs should be M(?:rs?|s)
    • Good morning|Good evening should be Good (?:morning|evening)
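
A rewritten pattern should still accept exactly the same strings, which is easy to check. The sketch below is in Python (an assumption); the names original, factored, EMAIL, and looks_like_email are hypothetical, and the email pattern is deliberately simplistic. It makes no claim about how much faster the factored form is in any particular engine.

```python
import re

original = re.compile(r"Mr|Ms|Mrs")
factored = re.compile(r"M(?:rs?|s)")

# Both forms accept and reject the same titles.
titles = ["Mr", "Ms", "Mrs", "Mx", "M"]
print([bool(original.fullmatch(t)) for t in titles])   # [True, True, True, False, False]
print([bool(factored.fullmatch(t)) for t in titles])   # [True, True, True, False, False]

# A quick spot-check before running a more expensive pattern
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # simplistic, for illustration only

def looks_like_email(s):
    """Cheap substring test first; run the regex only if it passes."""
    return "@" in s and EMAIL.fullmatch(s) is not None

print(looks_like_email("a@b.co"))       # True
print(looks_like_email("no-at-sign"))   # False
```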

Miscellaneous Language-Specific Notes

A few things that are good to know:

  • Java’s built-in support for regexes exceeds that of many languages
  • Especially good for Unicode and for character classes (has more than Perl)
  • Syntax is more cumbersome (string literal support weak, no operator for matching..) — live with it!
  • Perl has nice regexes, too; it even allows you to embed code inside them.
  • Great Perl documentation at the perldoc pages perlrequick, perlretut, perlre, and perlreref.
  • JavaScript seems to have less extensive support than other languages, but I think this is changing.
  • Python puts regex functions in a module (re).
  • Python docs are here.

Study and Practice

Here are some good sources:

  • RegExr (Awesome online tool for JavaScript and PCRE regexes)
  • Rubular (Ruby Online Regex Tester)
  • JavaScript Regular Expressions (at Mozilla Developer Center)
  • Regexpal (online JavaScript regex tester)