regex 3plumb | 2012-09-05 |
---|
regex - filtering and string substitution based on regex
[regex pattern=REGX]
[regex pattern=REGX pass=mismatch]
[regex pattern=REGX subst=STR]
[regex pattern=REGX subst=STR global=yes pass=match]
               +-------+
0 (stdin) ---->| regex |----> (stdout) 1
               +-------+
Regex takes each record and matches it against a regex pattern. There are two main modes of operation: filtering and substitution (when subst is specified). In filtering mode the records are not altered. In subst mode a backreference-capable substitution of the subst string is done on the matched records once (if global is no) or multiple times (if global is yes) - just like the "g" switch for sed's "s" command.
With or without substitution, whether records are passed or not on match depends on the pass setting:
all     pass all records regardless of whether the pattern matched - can be used only when subst is used; default for subst mode
match   pass all records where pattern matched; default for filtering mode
invert  pass all records where pattern did not match; can be used with filtering mode
Regex syntax is documented in project genregex (TODO: external reference).
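The two modes and the pass settings described above can be sketched in Python. This is only an illustration under stated assumptions: Python's re module stands in for genregex (whose syntax may differ), and the function name regex_node and its parameter names are hypothetical, not part of the actual tool.

```python
import re
from typing import Iterable, Iterator, Optional


def regex_node(records: Iterable[bytes], pattern: bytes,
               subst: Optional[bytes] = None,
               global_: bool = False,
               pass_: Optional[str] = None) -> Iterator[bytes]:
    """Emulate filtering mode (no subst) and subst mode on binary records.

    Hypothetical helper: Python's re is used in place of genregex.
    """
    # Defaults as documented: "all" in subst mode, "match" in filtering mode.
    if pass_ is None:
        pass_ = "all" if subst is not None else "match"
    if pass_ not in ("all", "match", "invert"):
        raise ValueError("unknown pass setting: %r" % pass_)
    if pass_ == "all" and subst is None:
        raise ValueError("pass=all can be used only when subst is used")

    rx = re.compile(pattern)
    for rec in records:
        matched = rx.search(rec) is not None
        if matched and subst is not None:
            # count=0 replaces every match (global), count=1 only the first.
            rec = rx.sub(subst, rec, count=0 if global_ else 1)
        if pass_ == "all" or (pass_ == "match") == matched:
            yield rec
```

For instance, list(regex_node(recs, b"distr", subst=b"DISTR", global_=True)) would pass every record and rewrite each occurrence, while list(regex_node(recs, b"distr")) would only filter.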
Regex handles binary records; the common use case, where each record is a single line ($LSP), is just a special case. The following regex commands have unusual meaning because of binary operation:
^    begin-of-record; after an $LSP, this is beginning of a line as well
$    end-of-line (at end-of-record); replaced with [\r\n]+$; after an $LSP it matches the end of the line as expected
$$   end-of-record; replaced with a single $; always matches the end of the current binary record
Thus ^ can be used for both binary and text. For text streams regex will work properly only after an $LSP, as regex doesn't attempt to do any line splitting; there $ works as expected. For binary streams, $$ works as end-of-record without side effects. It is possible to use regex on text that is not line-split, but ^ and $ will not match the beginning or end of lines.
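The rewriting of $ and $$ described above can be sketched as a small textual pass over the pattern. This is a minimal Python illustration, not the actual implementation: rewrite_anchors is a hypothetical name, leaving backslash-escaped characters untouched is an assumption, and a $ inside a character class such as [$] is not treated specially.

```python
def rewrite_anchors(pattern: str) -> str:
    """Rewrite '$' and '$$' as the text above describes (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Assumption: escaped characters, including "\$", are kept as-is.
            out.append(pattern[i:i + 2])
            i += 2
        elif pattern.startswith("$$", i):
            # "$$" becomes a single "$": end of the current binary record.
            out.append("$")
            i += 2
        elif ch == "$":
            # "$" becomes "[\r\n]+$": end-of-line at end-of-record.
            out.append("[\\r\\n]+$")
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, rewrite_anchors("foo$") yields the pattern text foo[\r\n]+$ and rewrite_anchors("foo$$") yields foo$, while ^ is passed through unchanged.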
eof handling
Default.
blocking/flow control
Default.
buffering
None.
The following script reads each line on stdin, replaces every distr with DISTR, then prints all lines to stdout (sed: "s/distr/DISTR/g"):
env:0 | $LSP | [regex pattern=distr subst=DISTR global=true],sticky=1 | env:1

Same script, replacing only the first distr in each line (sed: "s/distr/DISTR/"):
env:0 | $LSP | [regex pattern=distr subst=DISTR],sticky=1 | env:1

Next script reads each line on stdin, quotes digits, then prints only the matched lines to stdout:
env:0 | $LSP | [regex pattern="[0-9]*" subst='"\1"' global=true pass=match],sticky=1 | env:1