Wikipedia:AutoWikiBrowser/Regular expression

AutoWikiBrowser 6.2.1.0

Wikipedia help on regular expressions is given below.

Regular expression definitions

Regular expressions
Anchors		Comments
^	Start of string	First character on page
\A	Start of string	First character on page
$	End of string	Last character on page
\Z	End of string	Last character on page
\b	On a word boundary	Between a word character (letter, number or underscore) and a non-word character, or at the start or end of the string
\B	Not on a word boundary	Between two word characters or between two non-word characters

Character Classes		Examples
\w	Any "word" character (letters, digits, underscore)	abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789_
\W	Any character other than "word" characters	$?!#%*@&;:.,+-±=^"`\\|/<>{}[]()~(newline)(tab)(space)
\s	White space character	(space) (tab) (literal new line) (return)
\S	Any character other than white space	abcxyz_ABCXYZ$?!#%@&;:.,+-=^"/<{[(~0123789 (incomplete list)*
\d	Any digit	0123456789
\D	Any character other than digits	abcxyz_ABCXYZ$?!#%@&;:.,+-=^"/<{[(~(newline)(tab)(space) (incomplete list)*
\t	Tab	(tab)
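\cX	Control character	\cM is a carriage return (Ctrl+M)
\xhh	Character with the two-digit hexadecimal code hh	\x41 matches A
\0	Null character; followed by further octal digits, the character with that octal code	\040 matches a space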

Quantifiers		Comments
*	0 or more
+	1 or more
?	0 or 1
{3}	Exactly 3
{3,}	3 or more
{2,4}	2, 3, or 4

Escape Character		Comments
\	Escape Character

Metacharacters (must be escaped)		Comments
Metacharacter	Metacharacter escaped
^	\^
$	\$
(	\(
)	\)
<	\<
.	\.
*	\*
+	\+
?	\?
[	\[
]	\]
{	\{
\	\\
|	\|
>	\>
		Not in this list: `=}#!/%&_:;` (incomplete list)
Special Characters		Comments
\n	Newline

Groups and Ranges Note: Ranges are inclusive		Comments
.	Any character except newline
( . . . )	Capture group (captures anything between the parentheses)	use captured groups with $1 $2 etc.
(abc)	abc (in sequence)	$1 $2 $3 etc. are called backreferences
|	Alternation (matches either the right side or the left)
ab|cd|ef	ab or cd or ef
[def]	d or e or f
[^abc]	Anything (including newline) except a or b or c
[a-q]	Lowercase letter between a and q
[A-Q]	Uppercase letter between A and Q
[0-7]	Digit between 0 and 7
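
The rows above can be tried out with a short sketch. Python's re module is used here as an assumption in place of AWB's .NET engine; for these constructs the two behave alike. The sample strings are invented for illustration.

```python
import re

# Alternation: any one of the listed alternatives.
alternation = re.findall(r"ab|cd|ef", "ab cd ef gh")

# Character class: any single one of the listed characters.
char_class = re.findall(r"[def]", "feed a dog")

# A negated class also matches the newline, unlike the dot.
negated = re.findall(r"[^abc]+", "abc\nxyz")
dot = re.findall(r".+", "one\ntwo")

# Ranges are inclusive at both ends.
lower = re.findall(r"[a-q]", "apqrz")
octal_digits = re.findall(r"[0-7]", "0789")
```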

String matching		Comments
\1, \2, \3, etc.	Match strings in captured groups (...).	(\n[^\n]+)\1 matches identical adjacent lines; $1 will replace with a single copy.
Back references		Comments
(sam) (max) (pete)	$1 - returns sam
(sam) (max) (pete)	$2 - returns max
(sam) (max) (pete)	$3 - returns pete
(A) (B) (C) (D) (E) (F) (G) (H) (I) (J)	$10 - returns J
(A) (B) (C) (D) (E) (F) (G) (H) (I) (J)	${1}0 - returns A0
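
The sketch below replays these rows in Python's re module, which is only a stand-in for AWB's .NET engine. One difference matters here and is an assumption of the example: Python writes replacement references as \1 or \g<1> where AWB uses $1 or ${1}.

```python
import re

# \1 inside the pattern re-matches whatever group 1 captured,
# so this finds a line immediately followed by an identical line.
text = "\nfoo\nfoo\nbar"
deduplicated = re.sub(r"(\n[^\n]+)\1", r"\1", text)

pattern = r"(A) (B) (C) (D) (E) (F) (G) (H) (I) (J)"
letters = "A B C D E F G H I J"

# \g<10> is group ten (AWB: $10); \g<1>0 is group one followed
# by a literal zero (AWB: ${1}0).
tenth = re.sub(pattern, r"\g<10>", letters)
first_then_zero = re.sub(pattern, r"\g<1>0", letters)
```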

Extension notation		Comments
(?:...) non-capturing parens	(?:abc) match and consume, but don't capture, abc
(?=...) positive lookahead, no string consumed	abc(?=xyz) matches abc only if followed by xyz
(?!...) negative lookahead, no string consumed	abc(?!xyz) matches abc except when it's followed by xyz
(?<=...) positive lookbehind, no string consumed	(?<=xyz)abc matches abc only if preceded by xyz
(?<!...) negative lookbehind, no string consumed	(?<!xyz)abc matches abc except when it's preceded by xyz
(?#...) comment	(?#Just a comment in here)
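
A small sketch may make the "no string consumed" wording concrete. It uses Python's re module as an assumed stand-in for AWB's .NET engine (both support these constructs with fixed-width lookbehind). Only abc is replaced in each case; the text that the lookaround inspects is left untouched.

```python
import re

# Lookahead: the xyz after abc is checked but not replaced.
pos_ahead = re.sub(r"abc(?=xyz)", "ABC", "abcxyz abc123")
neg_ahead = re.sub(r"abc(?!xyz)", "ABC", "abcxyz abc123")

# Lookbehind: the xyz before abc is checked but not replaced.
pos_behind = re.sub(r"(?<=xyz)abc", "ABC", "xyzabc 123abc")
neg_behind = re.sub(r"(?<!xyz)abc", "ABC", "xyzabc 123abc")

# Non-capturing group: groups the repetition without creating $1,
# so the digit is the first (and only) captured group.
non_capturing = re.findall(r"(?:abc)+(\d)", "abcabc1")

# An inline comment is ignored by the engine.
with_comment = re.search(r"abc(?#Just a comment in here)def", "abcdef")
```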

Sample Patterns
Regex pattern	Will Match	Comments
([A-Za-z0-9-]+)	1 or more characters which are letters, numbers and hyphens
(\d{1,2}\/\d{1,2}\/\d{4})	Date 3/24/2008 or 03/24/2008 or 24/03/2008
\[\[\d{4}\]\]	4 digit number wiki link [[2008]]
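
The sample patterns can be checked with a quick sketch in Python's re module (assumed here as a substitute for AWB's .NET engine; the sample sentences are invented). Note that the date pattern only checks the shape of the text, so it would also accept an impossible date such as 99/99/2008.

```python
import re

# Letters, digits and hyphens in one run.
slugs = re.findall(r"([A-Za-z0-9-]+)", "well-known ISBN-10!")

# Dates with one or two digit day and month and a four digit year.
dates = re.findall(r"(\d{1,2}\/\d{1,2}\/\d{4})",
                   "3/24/2008, 03/24/2008 and 24/03/2008")

# Wiki links that contain exactly four digits.
year_links = re.findall(r"\[\[\d{4}\]\]", "In [[2008]] and [[20080]]")
```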

Tokens and groups

Tokens and groups are portions of a regular expression which can be followed by a quantifier to modify the number of consecutive matches. A token is a character, special character, character class, or range (e.g. [m-q]). A group is formed by enclosing tokens or other groups within parentheses. All of these can be modified to match a number of times by a quantifier. For example: a?, \n+, \d{4}, [m-r]*, (a?\n+\d{4}[m-r]*|not){3,7}, and ((?:97[89]-?)?(?:\d[ -]?){9}[\dXx]).
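
As a sketch of quantified tokens and groups, the example below tries three of the expressions from the paragraph above in Python's re module (an assumption; AWB uses the .NET engine). The last pattern has the shape of an ISBN; the two ISBN strings are sample inputs chosen for illustration.

```python
import re

# A token with a quantifier: exactly four digits.
year = re.fullmatch(r"\d{4}", "2008")

# A group with a quantifier: the whole group "ab" repeated three times.
repeated = re.fullmatch(r"(ab){3}", "ababab")

# Quantified groups nested inside a capture group.
isbn = re.compile(r"((?:97[89]-?)?(?:\d[ -]?){9}[\dXx])")
isbn13 = isbn.fullmatch("978-0-306-40615-7")
isbn10 = isbn.fullmatch("0-306-40615-2")
too_short = isbn.fullmatch("12345")
```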

Greed and quantifiers

Greed, in the context of regular expressions, describes how many characters are matched (often called "consumed") by a variable-length portion of a regular expression, that is, a token or group followed by a quantifier that allows a variable number of repetitions. If that portion is "greedy", it matches as many characters as possible. If it is not greedy, it matches as few characters as possible.

[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]] elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

\[\[.*\]\]

Will match [[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]]

\[\[.*?\]\]

Will match [[Lorem ipsum]] and [[consectetur adipisicing]]
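
The difference can be reproduced with a minimal sketch in Python's re module, assumed here as a stand-in for AWB's .NET engine, which treats * and *? the same way.

```python
import re

text = ("[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]] elit, "
        "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")

# Greedy: .* runs to the last ]] in the text.
greedy = re.findall(r"\[\[.*\]\]", text)

# Not greedy: .*? stops at the first ]] it can reach.
lazy = re.findall(r"\[\[.*?\]\]", text)
```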

Be careful with a pattern such as (\w)(<ref[^<>]*>.*?</ref>)([,.:;]): its center capture group will span more than one ref element when that is the only way to satisfy the outer conditions, as in:
sed do eiusmod tempor<ref>reference</ref> incididunt ut <ref>reference 2</ref>. labore
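
The following sketch shows the pitfall on that sample text, using Python's re module as an assumed substitute for AWB's .NET engine. The second pattern, named safer here, is not from this page: it is one possible workaround that forbids </ref> inside the captured element by means of a negative lookahead.

```python
import re

text = ("sed do eiusmod tempor<ref>reference</ref> incididunt ut "
        "<ref>reference 2</ref>. labore")

# .*? is lazy, but it still grows until the punctuation condition is met.
risky = re.compile(r"(\w)(<ref[^<>]*>.*?</ref>)([,.:;])")
spanned = risky.search(text).group(2)

# Hypothetical alternative: each step of the body must not start "</ref>".
safer = re.compile(r"(\w)(<ref[^<>]*>(?:(?!</ref>).)*</ref>)([,.:;])")
no_match = safer.search(text)
single = safer.search("tempor<ref>reference</ref>. labore").group(2)
```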

Recursive subgroups

\[\[(Image:[^][|]+)\|([^][]*(\[\[[^][]+\]\][^][]*)*)\]\]
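
A short sketch of what this pattern captures, run in Python's re module (an assumption; AWB uses the .NET engine, and both accept ] as the first character of a class). The image link is an invented sample. The inner group handles links nested one level deep inside the caption.

```python
import re

image_link = re.compile(
    r"\[\[(Image:[^][|]+)\|([^][]*(\[\[[^][]+\]\][^][]*)*)\]\]"
)

text = "[[Image:Example.jpg|thumb|A [[cat]] and a [[dog]] playing]]"
match = image_link.search(text)

file_name = match.group(1)  # everything before the first pipe
caption = match.group(2)    # the rest, including the nested links
```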

Regular expression examples

Regular expression examples
Search for flagicon template and remove
Find:	\{\{\s?[Ff]lagicon\s?\|.*?\}\}
Replace With:	(nothing)
Example of text to search:	{{flagicon|USA}} [[United States]]
Result:	[[United States]]
Comments:
Search for any of three template parameters and replace the value with some new value
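Find:	([\|]\s)(occupation|spouse|notableworks)(\s=\s)([^\|\}]+)(?=\s(\||}}))
Replace With:	$1$2$3new value
Example of text to search:	{{infobox person | name = Steveo | occupation = dancer | nationality = The moon}}
Result:	{{infobox person | name = Steveo | occupation = new value | nationality = The moon}}
Comments:	$1, $2 and $3 preserve the parameter name and the whitespace around it. The lookahead only checks for the whitespace and delimiter that follow and leaves them in place, so $5 must not be added to the replacement. The pattern requires whitespace after the pipe and around the equals sign. A cut-down version could be used to rename template parameters. Further processing could be performed against $4.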
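
Both rules can be tried in a sketch with Python's re module, assumed here in place of AWB's .NET engine (Python writes \1 where AWB writes $1). Two choices in the sketch are assumptions rather than part of the rules as listed: the infobox sample has whitespace around | and =, because the pattern requires it, and the replacement stops at the third group, because the lookahead does not consume the delimiter that follows.

```python
import re

# Rule 1: remove a flagicon template. The space that followed it stays.
flagicon = re.compile(r"\{\{\s?[Ff]lagicon\s?\|.*?\}\}")
without_flag = flagicon.sub("", "{{flagicon|USA}} [[United States]]")

# Rule 2: replace the value of any of three parameters.
param = re.compile(
    r"([\|]\s)(occupation|spouse|notableworks)(\s=\s)([^\|\}]+)(?=\s(\||}}))"
)
infobox = ("{{infobox person | name = Steveo | occupation = dancer "
           "| nationality = The moon}}")
updated = param.sub(r"\1\2\3new value", infobox)
old_value = param.search(infobox).group(4)
```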

Tips and tricks

Using look ahead/behind

Match [url] and [url title]
Regex:   \[*((?:\w+:)?\/\/[^<>\[\]\s"]+) *([^\n\]]+(?=\])|)\]+\s*
$1 will contain the url.
$2 will contain the title without trailing ] or will be empty.
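
A quick check of the two captures, sketched in Python's re module as an assumed stand-in for AWB's .NET engine. The URLs are placeholders.

```python
import re

link = re.compile(r'\[*((?:\w+:)?\/\/[^<>\[\]\s"]+) *([^\n\]]+(?=\])|)\]+\s*')

with_title = link.search("[http://example.com Example site] text")
without_title = link.search("[http://example.com]")

url = with_title.group(1)
title = with_title.group(2)          # title without the trailing ]
bare_url = without_title.group(1)
empty_title = without_title.group(2)  # empty when no title is given
```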

User-made shortcut editing macros

You can make your own shortcut editing macros. When you edit a page, you can enter your short-cut macro keys into the page anywhere you want AWB to act upon them.

For example, while examining a page in the AWB edit box you may find many places that need the same kind of edit: adding {{fact}}, inserting line breaks (<br>), commenting out entire lines (<!-- -->), inserting state names, adding <ref>Insert footnote text here</ref>, or inserting level 2, 3 or even 4 headings. All of these can be done with your own short-cut macro keys.

The process

Create a rule. See Find and replace, Advanced.
Edit your page in the edit box. Insert your short-cut editing macro key(s) anywhere in the page you want AWB to make the change(s) for you.
Re-parse the page. Right click on the edit box and select Re-parse from the context pop up menu. AWB will then re-examine your page with your macro short-cut key(s), find your short-cut key(s) and perform the action you specified in the rule.

A short-cut macro key can have any name, but it is best to make it unique so that it does not interfere with anything else AWB may find and suggest. For that reason, /// followed by a few easy-to-remember lowercase characters works well (lowercase means you do not need the Shift key). You can then enter the short-cut macro keys into the page manually or with the "paste more" function of the edit box context menu. Three '/' characters are used so that AWB does not confuse a macro key with web addresses (URLs) in a page when re-parsing.

Examples:

Create a rule as a regular expression.

User made short-cut editing macros
`///col` Comment out entire line
Short-cut key:	///col
Name:	Comment out entire line
Find:	///col(.*)
Replace With:	<!-- $1 -->
Example before reparsing:	///colThe quick brown fox jumps over the lazy dog
Result after re-parsing:	<!-- The quick brown fox jumps over the lazy dog -->
Comments:
`///br` Insert line feed
Short-cut key:	///br
Name:	Insert line feed
Find:	///br
Replace With:	<br />
Example before reparsing:	Eat some more///br of these soft French buns///br and drink some tea
Result after re-parsing:	Eat some more<br /> of these soft French buns<br /> and drink some tea
Comments:
`///fac` Insert {{fact}} with current date
Short-cut key:	///fac
Name:	Insert {{fact}} with current date
Find:	///fac
Replace With:	{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}
Example before reparsing:	The quick brown fox jumps over the lazy dog///fac
Result after re-parsing:	The quick brown fox jumps over the lazy dog{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}
Comments:
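
What Re-parse does with these three rules can be approximated outside AWB. The sketch below is only a simulation in Python's re module: RULES and reparse are hypothetical names, not part of AWB, and Python writes \1 where AWB writes $1.

```python
import re

# (find, replace) pairs mirroring the three rules in the table above.
RULES = [
    (r"///col(.*)", r"<!-- \1 -->"),
    (r"///br", "<br />"),
    (r"///fac",
     "{{fact|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}"),
]


def reparse(text):
    """Apply every rule in order, as a stand-in for AWB's Re-parse step."""
    for find, replace in RULES:
        text = re.sub(find, replace, text)
    return text


commented = reparse("///colThe quick brown fox jumps over the lazy dog")
broken = reparse("Eat some more///br of these soft French buns///br "
                 "and drink some tea")
tagged = reparse("The quick brown fox jumps over the lazy dog///fac")
```

Because the dot does not match a newline, ///col(.*) comments out the text up to the end of the current line only.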

Token matching

Match inside <ref></ref>
Regex: <ref[^>]*>([^<]|<[^/]|</[^r]|</r[^e]|</re[^f]|</ref[^>])+</ref>

Match inside <ref></ref> using a (?! not match) notation
Regex: <ref[^>]*>([^<]|<(?!/ref>))+</ref>
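
Both ref patterns can be compared on the same text. The sketch uses Python's re module as an assumed substitute for AWB's .NET engine, and the sample sentence is invented. Neither pattern can run past a closing </ref>, so each ref element is matched on its own.

```python
import re

text = 'a<ref>one <i>x</i></ref> b <ref name="n">two</ref>.'

# Spelled-out version: each alternative rules out one more prefix of </ref>.
spelled_out = re.compile(
    r"<ref[^>]*>([^<]|<[^/]|</[^r]|</r[^e]|</re[^f]|</ref[^>])+</ref>"
)

# Same idea with a negative lookahead.
with_lookahead = re.compile(r"<ref[^>]*>([^<]|<(?!/ref>))+</ref>")

refs_a = [m.group(0) for m in spelled_out.finditer(text)]
refs_b = [m.group(0) for m in with_lookahead.finditer(text)]
```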

Match template {{...}} possibly with templates inside it, but no templates inside those
Regex: \{\{([^{]|\{[^{]|\{\{[^{}]+\}\})+\}\}
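
A sketch of the template pattern on an invented sample, again in Python's re module as an assumed stand-in for AWB's .NET engine. The outer template contains one nested template, which is the depth the pattern allows.

```python
import re

template = re.compile(r"\{\{([^{]|\{[^{]|\{\{[^{}]+\}\})+\}\}")

text = "{{cite web|title={{lang|fr|Bonjour}}|year=2008}} tail {{stub}}"

# group(0) is the whole template, including the nested one.
templates = [m.group(0) for m in template.finditer(text)]
```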

Ordinary matching

Match words and spaces
Regex: [\w\s]+

Match non-wiki text
Regex: [^][{}|<>']+

Match bracketed URLs
Regex: \[(https?://[^][<>\s"]+) *((?<= )[^\n\]]*|)\]
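
The three patterns in this section can be exercised with a short sketch in Python's re module (an assumption; AWB uses the .NET engine). The sample strings and URLs are placeholders.

```python
import re

# Words and spaces: punctuation ends a run.
runs = re.findall(r"[\w\s]+", "Hello world, again")

# Non-wiki text: stops at brackets, braces, pipes, angle brackets, quotes.
plain = re.findall(r"[^][{}|<>']+", "plain text [[link]] more")

# Bracketed URLs: group 1 is the URL, group 2 the title (may be empty).
bracketed = re.compile(r'\[(https?://[^][<>\s"]+) *((?<= )[^\n\]]*|)\]')
titled = bracketed.search("[https://example.org Example]")
untitled = bracketed.search("[https://example.org]")
```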

External links for help

http://regexpal.com/
http://www.wellho.net/regex/dotnet.html
http://www.regular-expressions.info/
http://perldoc.perl.org/perlre.html
https://docs.python.org/release/2.5.2/lib/re-syntax.html
RegExr: Online Regular Expression Testing Tool (Requires Adobe Flash Player)
MSDN Regular Expressions
a Ruby regular expression editor
Regular Expressions - User Guide