TWB
v2014.06.05.11.08.57

cheatsheet

Regular Expressions Character Classes

\c      Control character
\s      White space
\S      Not white space
\d      Digit
\D      Not digit
\w      Word
\W      Not word
\x      Hexadecimal digit
\O      Octal digit
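A minimal sketch of the shorthand classes in use. Python's `re` module is an assumption here (the cheatsheet is flavor-neutral), and the sample text is made up. Only \d, \w and \S are shown, since support for the other classes varies by flavor.

```python
import re

text = "Order 66\tshipped on 2014-06-05"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters (letters, digits, _)
chunks = re.findall(r"\S+", text)   # runs of anything that is not white space

print(digits)
print(words)
print(chunks)
```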

Regular Expressions Quantifiers

*       0 or more
+       1 or more
?       0 or 1
{3}     Exactly 3
{3,}    3 or more
{3,5}   3, 4 or 5

non greedy: .*?
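The difference between a greedy and a non-greedy quantifier is easiest to see side by side. A small sketch, assuming Python's `re` module and a made-up HTML fragment:

```python
import re

html = "<b>one</b> and <b>two</b>"

greedy = re.findall(r"<b>.*</b>", html)   # .* grabs as much as possible
lazy = re.findall(r"<b>.*?</b>", html)    # .*? stops at the first </b>

# {3,5} accepts 3, 4 or 5 repetitions and nothing else
ok = re.fullmatch(r"\d{3,5}", "1234") is not None
too_short = re.fullmatch(r"\d{3,5}", "12") is not None

print(greedy)
print(lazy)
print(ok, too_short)
```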

Regular Expressions Groups and Ranges

.       Any character except new line (\n)
(a|b)   a or b
(...)   Group
(?:...) Passive (non-capturing) group
[abc]   Range (a or b or c)
[^abc]  Not a or b or c
[a-q]   Letter from a to q
[A-Q]   Upper case letter from A to Q
[0-7]   Digit from 0 to 7
\n      Backreference to nth group/subpattern (\1, \2, ...)

Ranges are inclusive.
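A short sketch of capturing groups, passive groups, ranges and backreferences, assuming Python's `re` module. The sample strings are invented for illustration.

```python
import re

# (?:...) groups without capturing, so the name is group 1
m = re.match(r"(?:Mr|Ms)\. ([A-Z][a-z]+)", "Ms. Lovelace")
name = m.group(1)

# \1 refers back to whatever group 1 matched: finds doubled words
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# [0-7] is an inclusive range: 8 and 9 are not part of it
octal = re.findall(r"[0-7]+", "0755 and 89")

print(name, doubled, octal)
```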

Regular Expressions Anchors

^       Start of string, or start of line
\A      Start of string
$       End of string, or end of line
\Z      End of string
\b      Word boundary
\B      Not word boundary
\<      Start of word
\>      End of word
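How ^ behaves depends on whether multi-line mode is on. A sketch assuming Python's `re` module, with a made-up three-line string. \< and \> are left out because support for them varies by flavor.

```python
import re

text = "cat\nconcat\ncat nap"

start_only = re.findall(r"^cat", text)               # ^ = start of string only
per_line = re.findall(r"^cat", text, re.MULTILINE)   # ^ = start of every line
whole_words = re.findall(r"\bcat\b", text)           # "cat" as a separate word
inside_word = re.findall(r"\Bcat", text)             # "cat" glued to a preceding letter

at_start = re.search(r"\Acat", text) is not None     # \A ignores multi-line mode
at_end = re.search(r"nap\Z", text) is not None

print(len(start_only), len(per_line), len(whole_words), len(inside_word))
```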

Regular Expressions Assertions

?=      Lookahead assertion
?!      Negative lookahead
?<=     Lookbehind assertion
?<!     Negative lookbehind
?>      Once-only Subexpression
?()     Condition [if then]
?()|    Condition [if then else]
?#      Comment
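Lookaround assertions check the surrounding text without consuming it. A sketch assuming Python's `re` module, with invented sample data:

```python
import re

prices = "EUR 100, USD 250, EUR 75"

# (?<=...) lookbehind: digits that are preceded by "EUR "
euros = re.findall(r"(?<=EUR )\d+", prices)

# (?<!...) negative lookbehind: whole numbers NOT preceded by "EUR "
others = re.findall(r"(?<!EUR )\b\d+", prices)

# (?=...) lookahead: names that are followed by ".txt"
txt_names = re.findall(r"\w+(?=\.txt)", "a.txt b.csv notes.txt")

# (?!...) negative lookahead: a "q" that is not followed by "u"
lonely_q = re.findall(r"q(?!u)", "Iraq quit")

print(euros, others, txt_names, lonely_q)
```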

Regular Expressions String Replacement

$n      nth non-passive group
$2      "xyz" in /^(abc(xyz))$/
$1      "xyz" in /^(?:abc)(xyz)$/
$`      Before matched string
$'      After matched string
$+      Last matched group
$&      Entire matched string

Some regex implementations use \ instead of $.
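Python's `re` module is one of the implementations that use \ instead of $. The sketch below assumes Python and made-up input. The slices standing in for $` and $' are a workaround, not built-in replacement tokens.

```python
import re

# \1, \2, \3 play the role of $1, $2, $3
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", "2014-06-05")

# group numbering follows the opening parentheses; passive groups are skipped
nested = re.match(r"^(abc(xyz))$", "abcxyz").group(2)
passive = re.match(r"^(?:abc)(xyz)$", "abcxyz").group(1)

# \g<0> is the entire match, comparable to $&
bracketed = re.sub(r"\w+", r"[\g<0>]", "a bc")

# text before and after the match, comparable to $` and $'
m = re.search(r"=", "key=value")
before, after = m.string[:m.start()], m.string[m.end():]

print(swapped, nested, passive, bracketed, before, after)
```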

Regular Expressions Special Characters

\n      New line
\r      Carriage return
\t      Tab
\v      Vertical tab
\f      Form feed
\xxx    Octal character xxx
\xhh    Hex character hh

Regular Expressions Escape Sequences

\       Escape following character
\Q      Begin literal sequence
\E      End literal sequence
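Not every flavor has \Q...\E. Python's `re` module does not, so the sketch below uses `re.escape` as the closest substitute: it backslash-escapes every metacharacter in a literal string. The sample strings are made up.

```python
import re

literal = "1+1=2 (really?)"

# re.escape produces a pattern that matches the string literally
pattern = re.escape(literal)
found = re.search(pattern, "proof: 1+1=2 (really?)") is not None

# a single metacharacter can be escaped by hand with a backslash
escaped_plus = re.search(r"1\+1", "1+1") is not None
# unescaped, + is a quantifier: "1+1" needs at least two 1s in a row
unescaped_plus = re.search(r"1+1", "1+1") is not None

print(found, escaped_plus, unescaped_plus)
```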

Regular Expressions Common Metacharacters

^ $ \ | . ? + * ( ) [ ] { } < >

Regular Expressions Pattern Modifiers

g       Global match
i       Case-insensitive
m       Multi-line mode (^ and $ match at every line)
s       Treat string as single line (. also matches \n)
x       Allow comments and white space in pattern
e       Evaluate replacement
U       Ungreedy pattern

(?s)PATTERN
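Inline modifiers such as (?s) go at the start of the pattern. A sketch assuming Python's `re` module, which supports i, m, s and x. It has no g, e or U modifier (`findall` and `sub` already work globally). The sample text is invented.

```python
import re

text = "first line\nSecond LINE"

dotall = re.search(r"(?s)first.*LINE", text) is not None   # . also matches \n
no_dotall = re.search(r"first.*LINE", text) is not None    # . stops at \n

ignore_case = re.findall(r"(?i)line", text)
line_starts = re.findall(r"(?m)^\w+", text)

# x: white space and # comments inside the pattern are ignored
verbose = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
year_month = verbose.search("2014-06").groups()

print(dotall, no_dotall, ignore_case, line_starts, year_month)
```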

Snippets

Space and number, space and number with a decimal point, space and "-" at the end of the line

\s\d+$|\s\d+\.?\d+$|\s-$
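A quick check of this pattern, assuming Python's `re` module with multi-line mode so that $ matches at every line end. The price list is made up.

```python
import re

TRAILING = re.compile(r"\s\d+$|\s\d+\.?\d+$|\s-$", re.MULTILINE)

lines = "Apples 12\nPears 3.50\nPlums -\nNo price here"

# one hit per line that ends in a number or a lone "-"
hits = TRAILING.findall(lines)
print(hits)
```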

CSS "display:none" OR "display : none"

display\W*:\W*none

style="color: #f00;"

style="[^"]*"

JSON trailing commas (Sublime Text)

trailing_commas     ",[\W]*?(\]|\})"
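A sketch of the same pattern used outside the editor to repair JSON before parsing, assuming Python's `re` and `json` modules and a made-up input. Because \W also matches quotes, the pattern can touch commas inside string values, so this is a rough cleanup rather than a safe parser.

```python
import json
import re

TRAILING_COMMAS = re.compile(r",[\W]*?(\]|\})")

broken = '{"a": [1, 2, ], "b": 3, }'

# keep the closing bracket (group 1), drop the comma before it
fixed = TRAILING_COMMAS.sub(r"\1", broken)
data = json.loads(fixed)

print(fixed)
print(data)
```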

Mickey but not Mouse

(?m)^(?=.*Mickey)((?!Mouse).)*$
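The pattern combines a lookahead (the line must contain "Mickey") with a negative lookahead that is re-checked at every character (no position may start "Mouse"). A sketch assuming Python's `re` module and invented sample lines:

```python
import re

pattern = re.compile(r"(?m)^(?=.*Mickey)((?!Mouse).)*$")

text = "Mickey Mouse\nMickey and Minnie\nDonald Duck\nMouse trap"

# only lines that contain "Mickey" and never "Mouse" match in full
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```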

"l+a"

\w{1}\+\w{1}

"v2014.06.05.11.08.57"

v[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}
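Once the stamp is found it can be turned into a real timestamp. A sketch assuming Python, and assuming the six fields mean year, month, day, hour, minute and second (the notes do not state this).

```python
import re
from datetime import datetime

VERSION_RE = re.compile(
    r"v[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}"
)

header = "TWB\nv2014.06.05.11.08.57\n"

m = VERSION_RE.search(header)
# assumed field order: year.month.day.hour.minute.second
stamp = datetime.strptime(m.group(0), "v%Y.%m.%d.%H.%M.%S")
print(stamp)
```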

TOBI_EDIT, tobiedit, tobi edit, ...

(?i)tobi.{0,5}(nle|comment|removed|include|requiered)

Spaces; at least 2, no upper limit

[ ]{2,}
[\x20]{2,}

Whitespace (spaces, tabs, line breaks); at least 2, no upper limit

\s{2,}

style attributes from html tags

src:            <p style='margin-bottom:5.95pt;line-height:normal'>
regex find:     <p\s+style=[^>]*>
regex replace:  <p>

Word-Html cleaning

# all span, font, ... tags
src:            <span style='font-size:12.0pt;font-family:"Times New Roman","serif";background:#F3F1F0'>
regex find:     <[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>
regex replace:  ''

HTML, all attributes

src:            <p class=MsoNormal style='margin-bottom:5.95pt;line-height:normal'>
regex find:     <([^>]*)(?:class|lang|style|size|face|align|width|cellspacing|cellpadding|border|[ovwxp]:\w+)=(?:'[^']*'|"[^"]*"|[^\s>]+)([^>]*)>
regex replace:  <$1$2>
note:           removes one attribute per tag per pass; repeat until nothing matches

clean >

src:            <p   >
regex find:     \s*>
regex replace:  >
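The three Word-HTML steps above (drop tags, drop attributes, tidy the leftover space before >) can be chained. A sketch assuming Python's `re` module; the function name `clean_word_html` and the sample input are hypothetical. The attribute pattern strips one attribute per tag per pass, so it is repeated until the text stops changing.

```python
import re

TAGS = re.compile(r"<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>")
ATTRS = re.compile(
    r"<([^>]*)(?:class|lang|style|size|face|align|width|cellspacing|cellpadding"
    r"|border|[ovwxp]:\w+)=(?:'[^']*'|\"[^\"]*\"|[^\s>]+)([^>]*)>"
)
SPACE_BEFORE_GT = re.compile(r"\s*>")


def clean_word_html(html):
    """Hypothetical helper: apply the three find/replace steps in order."""
    html = TAGS.sub("", html)
    while True:
        stripped = ATTRS.sub(r"<\1\2>", html)   # \1\2 is $1$2 in Python syntax
        if stripped == html:                    # nothing left to strip
            break
        html = stripped
    return SPACE_BEFORE_GT.sub(">", html)


src = ("<p class=MsoNormal style='margin-bottom:5.95pt;line-height:normal'>"
       "<span style='font-size:12.0pt'>Hi</span></p>")
cleaned = clean_word_html(src)
print(cleaned)
```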

sanitize String

// replace runs of 2 or more whitespace characters with a single space
$strUnsafeString = preg_replace('/\s{2,}/', ' ', $strUnsafeString);

// remove every character that is not allowed
// allowed characters: [a-z] [A-Z] [0-9] [ ] [_] [-]
$strUnsafeString = preg_replace('/[^\w -]/', '', $strUnsafeString);
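For comparison, a rough Python port of the two steps. The function name `sanitize` is hypothetical, and `re.ASCII` is an assumption added so that \w covers only [a-zA-Z0-9_], as in the list of allowed characters above.

```python
import re


def sanitize(unsafe):
    """Hypothetical port: collapse white space, then drop disallowed characters."""
    # runs of 2 or more white space characters become a single space
    collapsed = re.sub(r"\s{2,}", " ", unsafe)
    # keep only ASCII letters, digits, underscore, space and hyphen
    return re.sub(r"[^\w -]", "", collapsed, flags=re.ASCII)


result = sanitize("Hello,   World!\t\t<script>_ok-1")
print(result)
```

A single tab or line break is not touched by the first step and is then removed by the second, because only the plain space is on the allowed list.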