ruby-changes:66919
From: Jeremy <ko1@a...>
Date: Wed, 28 Jul 2021 04:36:31 +0900 (JST)
Subject: [ruby-changes:66919] 4fc9ddd7b6 (master): Update Capturing and Anchors sections of regexp documention
https://git.ruby-lang.org/ruby.git/commit/?id=4fc9ddd7b6

From 4fc9ddd7b6af54abf88d702c2e11e97ca7750ce3 Mon Sep 17 00:00:00 2001
From: Jeremy Evans <code@j...>
Date: Tue, 27 Jul 2021 12:30:43 -0700
Subject: Update Capturing and Anchors sections of regexp documention

Document that only first 9 numbered capture groups can use the \n
backreference syntax. Document \0 backreference. Document \K anchor.

Fixes [Bug #14500]
---
 doc/regexp.rdoc | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/doc/regexp.rdoc b/doc/regexp.rdoc
index 5ec6490..23fe711 100644
--- a/doc/regexp.rdoc
+++ b/doc/regexp.rdoc
@@ -222,13 +222,13 @@ jeopardises the overall match. https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L222
 == Capturing
 
 Parentheses can be used for <i>capturing</i>. The text enclosed by the
-<i>n</i><sup>th</sup> group of parentheses can be subsequently referred to
+<i>n</i>th group of parentheses can be subsequently referred to
 with <i>n</i>. Within a pattern use the <i>backreference</i>
-<tt>\n</tt>; outside of the pattern use
-<tt>MatchData[</tt><i>n</i><tt>]</tt>.
+<tt>\n</tt> (e.g. <tt>\1</tt>); outside of the pattern use
+<tt>MatchData[n]</tt> (e.g. <tt>MatchData[1]</tt>).
 
-'at' is captured by the first group of parentheses, then referred to later
-with <tt>\1</tt>:
+In this example, <tt>'at'</tt> is captured by the first group of
+parentheses, then referred to later with <tt>\1</tt>:
 
     /[csh](..) [csh]\1 in/.match("The cat sat in the hat")
         #=> #<MatchData "cat sat in" 1:"at">
@@ -238,6 +238,21 @@ available with its #[] method: https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L238
 
     /[csh](..) [csh]\1 in/.match("The cat sat in the hat")[1] #=> 'at'
 
+While Ruby supports an arbitrary number of numbered captured groups,
+only groups 1-9 are supported using the <tt>\n</tt> backreference
+syntax.
+
+Ruby also supports <tt>\0</tt> as a special backreference, which
+references the entire matched string. This is also available at
+<tt>MatchData[0]</tt>. Note that the <tt>\0</tt> backreference cannot
+be used inside the regexp, as backreferences can only be used after the
+end of the capture group, and the <tt>\0</tt> backreference uses the
+implicit capture group of the entire match. However, you can use
+this backreference when doing substitution:
+
+    "The cat sat in the hat".gsub(/[csh]at/, '\0s')
+      # => "The cats sats in the hats"
+
 === Named captures
 
 Capture groups can be referred to by name when defined with the
@@ -524,6 +539,17 @@ characters, <i>anchoring</i> the match to a specific position. https://github.com/ruby/ruby/blob/trunk/doc/regexp.rdoc#L539
 * <tt>(?<!</tt><i>pat</i><tt>)</tt> - <i>Negative lookbehind</i>
   assertion: ensures that the preceding characters do not match
   <i>pat</i>, but doesn't include those characters in the matched text
+* <tt>\K</tt> - Uses an positive lookbehind of the content preceding
+  <tt>\K</tt> in the regexp. For example, the following two regexps are
+  almost equivalent:
+
+      /ab\Kc/
+      /(?<=ab)c/
+
+  As are the following two regexps:
+
+      /(a)\K(b)\Kc/
+      /(?<=(?<=(a))(b))c/
 
 If a pattern isn't anchored it can begin at any point in the string:
 
--
cgit v1.1

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
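
For readers who want to try the documented behaviour, here is a minimal sketch that exercises the three additions in this patch: the <tt>\1</tt> backreference inside a pattern, the <tt>\0</tt> backreference in a substitution, and the <tt>\K</tt> anchor next to its lookbehind counterpart. The variable names and the "abc" input are illustrative assumptions and are not part of the commit; the first two inputs are the ones used in the patched documentation.

```ruby
text = "The cat sat in the hat"

# \1 inside the pattern refers back to what the first group captured ("at").
m = /[csh](..) [csh]\1 in/.match(text)
whole_match = m[0]   # the entire match, the same text \0 stands for in a replacement
first_group = m[1]   # the text captured by the first group of parentheses

# \0 in a replacement string stands for the entire matched string.
pluralized = text.gsub(/[csh]at/, '\0s')

# \K keeps the text matched before it out of the reported match, so only
# "c" is replaced here, just as with the positive lookbehind form.
with_keep       = "abc".sub(/ab\Kc/, "X")
with_lookbehind = "abc".sub(/(?<=ab)c/, "X")

puts whole_match, first_group, pluralized, with_keep, with_lookbehind
```

On this input both <tt>\K</tt> and the lookbehind give the same result; the patch calls the two forms only "almost equivalent", so this should not be read as showing that they are interchangeable in general.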