[RndTbl] Wrong time of night for doing regex?

Hartmut W Sager hwsager at marityme.net
Sat Jan 4 05:00:00 CST 2020


This might be the wrong time of night for doing regex (i.e., my mistake),
or my trusty Vedit text editor has a bug in its regex implementation.

Original search string:
^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s\,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s\,]+(19[0-9][0-9])[\s\,]+([0-9][0-9]\:[0-9][0-9]\:[0-9][0-9])\s*$
Replacement string: <Nah, skip it>
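
As a cross-check outside Vedit, the original search string can be fed to Python's `re` module. That is a different engine from Vedit's, so this is only a sketch of what one Perl-style dialect does and not a diagnosis of Vedit. In Python the pattern compiles without error and captures the six fields of the sample line used later in this post. The second part shows that the "or" in `([0-9][0-9]|\s[0-9])` is confined to its group. If Vedit rejects the same string, the cause is presumably some difference in its regex dialect, which I have not verified.

```python
import re

# The original search string from the post, split into adjacent raw strings
# (Python joins them), so the backslashes reach the regex engine untouched.
PATTERN = (
    r"^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s\,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9][0-9]|\s[0-9])[\s\,]+(19[0-9][0-9])[\s\,]+"
    r"([0-9][0-9]\:[0-9][0-9]\:[0-9][0-9])\s*$"
)

# re.compile raises re.error on a syntax error; here it succeeds.
regex = re.compile(PATTERN)

line = "From AncientBBS1 Thu  Jan  2, 1986  20:50:00"
m = regex.match(line)
print(m.groups())
# ('From AncientBBS1', 'Thu', 'Jan', ' 2', '1986', '20:50:00')

# "|" binds more loosely than concatenation but stays inside its group, so
# ([0-9][0-9]|\s[0-9]) means "two digits" OR "one whitespace + one digit".
day = re.compile(r"([0-9][0-9]|\s[0-9])")
print([bool(day.fullmatch(s)) for s in ("02", " 2", "2", "  2")])
# [True, True, False, False]
```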

The above search string gives a syntax error.  I was a bit suspicious of the
operator precedence of the "or" (|) in the ([0-9][0-9]|\s[0-9]) group, so I
simplified the expression step by step to narrow it down.  I finally got down
to:

Search string:
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s\,]+
Replacement string: \1\s0\2\s

The new search works fine (as did some of the previous stepwise simplified
ones), but the replacements are baffling me.
The line
From AncientBBS1 Thu  Jan  2, 1986  20:50:00
gets changed to
From AncientBBS1 Thu   02 1986  20:50:00

I.e., the variable \1 seems to get lost.  In my earlier stepwise-simplified
cases, whenever the search worked at all, several of the variables got lost.
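
For comparison, here is the same simplified search and replace in Python's `re` module (again a different engine, so it proves nothing about Vedit). There `\1` survives. Python rejects `\s` in a replacement string as a bad escape, so literal spaces stand in for it. Note that the Vedit output above is exactly what you would get if `\1` expanded to an empty string while each `\s` became a single space.

```python
import re

# The simplified search string from the post.
search = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s\,]+"

# Literal spaces replace the post's "\s" (Python's re.sub would raise on it):
# \1 = month, then "0" + \2 = zero-padded day, then a space.
replace = r"\1 0\2 "

line = "From AncientBBS1 Thu  Jan  2, 1986  20:50:00"
print(re.sub(search, replace, line))
# From AncientBBS1 Thu  Jan 02 1986  20:50:00
```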

Why am I doing this?  I need to massage some old BBS messages into the
archaic mbox format, whose date format on the "From " line ("Tue Nov 05
19:02:00 1985") is particularly illogical.  Be that as it may, the two
sources of the messages I am processing had further sloppiness in their
dates, courtesy of some ancient BBS bozos.  I have already fixed a lot of
that successfully with regex.
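
A sketch of the end goal, assuming Python's `re` rather than Vedit: match a BBS "From " line with an equivalent of the original pattern (rewritten with named groups; the group names and the `to_mbox_from_line` helper are hypothetical) and emit it in the "Tue Nov 05 19:02:00 1985" layout quoted above, zero-padding the day as that example does. Whether a particular mbox reader wants a zero-padded or space-padded day is an assumption worth checking.

```python
import re

# Equivalent of the original search string, with named groups for readability.
# The group names are illustrative, not from the post.
FROM_LINE = re.compile(
    r"^(?P<sender>From AncientBBS[1-2])\s+"
    r"(?P<wday>Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(?P<mon>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"(?P<day>[0-9][0-9]|\s[0-9])[\s,]+"
    r"(?P<year>19[0-9][0-9])[\s,]+"
    r"(?P<time>[0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$"
)


def to_mbox_from_line(line: str) -> str:
    """Hypothetical helper: reorder a BBS "From " line into the
    "Wdy Mon DD HH:MM:SS YYYY" layout. Non-matching lines are returned as-is.
    """
    m = FROM_LINE.match(line)
    if m is None:
        return line
    day = m["day"].strip().zfill(2)  # " 2" -> "02", "15" stays "15"
    return f'{m["sender"]} {m["wday"]} {m["mon"]} {day} {m["time"]} {m["year"]}'


print(to_mbox_from_line("From AncientBBS1 Thu  Jan  2, 1986  20:50:00"))
# From AncientBBS1 Thu Jan 02 20:50:00 1986
```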

Hartmut W Sager - Tel +1-204-339-8331


More information about the Roundtable mailing list