ruby-changes:66919

https://git.ruby-lang.org/ruby.git/commit/?id=4fc9ddd7b6

From 4fc9ddd7b6af54abf88d702c2e11e97ca7750ce3 Mon Sep 17 00:00:00 2001
From: Jeremy Evans <code@j...>
Date: Tue, 27 Jul 2021 12:30:43 -0700
Subject: Update Capturing and Anchors sections of regexp documentation

Document that only the first 9 numbered capture groups can use the \n
backreference syntax.  Document \0 backreference.  Document \K anchor.

Fixes [Bug #14500]
---
 doc/regexp.rdoc | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/doc/regexp.rdoc b/doc/regexp.rdoc
index 5ec6490..23fe711 100644
--- a/doc/regexp.rdoc
+++ b/doc/regexp.rdoc
@@ -222,13 +222,13 @@ jeopardises the overall match. https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L222
 == Capturing
 
 Parentheses can be used for <i>capturing</i>. The text enclosed by the
-<i>n</i><sup>th</sup> group of parentheses can be subsequently referred to
+<i>n</i>th group of parentheses can be subsequently referred to
 with <i>n</i>. Within a pattern use the <i>backreference</i>
-<tt>\n</tt>; outside of the pattern use
-<tt>MatchData[</tt><i>n</i><tt>]</tt>.
+<tt>\n</tt> (e.g. <tt>\1</tt>); outside of the pattern use
+<tt>MatchData[n]</tt> (e.g. <tt>MatchData[1]</tt>).
 
-'at' is captured by the first group of parentheses, then referred to later
-with <tt>\1</tt>:
+In this example, <tt>'at'</tt> is captured by the first group of
+parentheses, then referred to later with <tt>\1</tt>:
 
     /[csh](..) [csh]\1 in/.match("The cat sat in the hat")
         #=> #<MatchData "cat sat in" 1:"at">
@@ -238,6 +238,21 @@ available with its #[] method: https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L238
 
     /[csh](..) [csh]\1 in/.match("The cat sat in the hat")[1] #=> 'at'
 
+While Ruby supports an arbitrary number of numbered captured groups,
+only groups 1-9 are supported using the <tt>\n</tt> backreference
+syntax.
+
+Ruby also supports <tt>\0</tt> as a special backreference, which
+references the entire matched string.  This is also available at
+<tt>MatchData[0]</tt>.  Note that the <tt>\0</tt> backreference cannot
+be used inside the regexp, as backreferences can only be used after the
+end of the capture group, and the <tt>\0</tt> backreference uses the
+implicit capture group of the entire match.  However, you can use
+this backreference when doing substitution:
+
+  "The cat sat in the hat".gsub(/[csh]at/, '\0s')
+    # => "The cats sats in the hats"
+
 === Named captures
 
 Capture groups can be referred to by name when defined with the
@@ -524,6 +539,17 @@ characters, <i>anchoring</i> the match to a specific position. https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L539
 * <tt>(?<!</tt><i>pat</i><tt>)</tt> - <i>Negative lookbehind</i>
   assertion: ensures that the preceding characters do not match
   <i>pat</i>, but doesn't include those characters in the matched text
+* <tt>\K</tt> - Uses a positive lookbehind of the content preceding
+  <tt>\K</tt> in the regexp.  For example, the following two regexps are
+  almost equivalent:
+
+      /ab\Kc/
+      /(?<=ab)c/
+
+  As are the following two regexps:
+
+      /(a)\K(b)\Kc/
+      /(?<=(?<=(a))(b))c/
 
 If a pattern isn't anchored it can begin at any point in the string:
 
-- 
cgit v1.1
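
The backreference behaviour described in the Capturing hunk can be
checked with a minimal sketch like the one below. The sample string
comes from the patch; the variable names and the final swap example are
illustrative assumptions, not part of the commit.

```ruby
text = "The cat sat in the hat"

# Inside the pattern, \1 must repeat whatever group 1 captured ("at").
m = /[csh](..) [csh]\1 in/.match(text)
whole = m[0]   # the entire match, the same text \0 refers to in a replacement
first = m[1]   # the first capture group

# In a replacement string, \0 stands for the entire match.
plural = text.gsub(/[csh]at/, '\0s')

# In a replacement string, \1 and \2 stand for the numbered groups.
# (Illustrative only: swaps the leading consonant and "at".)
swapped = text.gsub(/([csh])(at)/, '\2\1')

puts whole    # cat sat in
puts first    # at
puts plural   # The cats sats in the hats
puts swapped  # The atc ats in the ath
```

Single-quoted replacement strings are used so that Ruby passes the
backslash sequences through to gsub unchanged.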

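The \K entry in the Anchors hunk can be illustrated the same way. This
is a sketch under assumptions: the sample strings and variable names are
made up for illustration, and the last line only hints at one way the
two forms may differ (a prefix of variable length).

```ruby
text = "abc xc abc"

# With \K, "ab" must precede "c" but is dropped from the reported match.
with_k = text.scan(/ab\Kc/)

# The lookbehind form reports the same matches for this input.
with_lookbehind = text.scan(/(?<=ab)c/)

# In a substitution, only the text matched after \K is replaced.
replaced = text.gsub(/ab\Kc/, 'C')

# \K also works after a variable-length prefix such as a+.
after_run = "aaac"[/a+\Kc/]

p with_k           # ["c", "c"]
p with_lookbehind  # ["c", "c"]
puts replaced      # abC xc abC
puts after_run     # c
```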
