[RndTbl] Wrong time of night for doing regex?
Hartmut W Sager
hwsager at marityme.net
Sat Jan 4 12:24:35 CST 2020
Hi Dan,
"\s" is a single space, "0" is just "0", and "\1" and "\2" are variables
that reference parts/segments of the search string.
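
For comparison, here is a minimal sketch of the same search and replacement in Python's re module. It is a different engine from Vedit, so it says nothing about how Vedit behaves. Python has no "\s" in replacement strings, so the spaces are written literally, and the sample line assumes single spaces as it appears in the quoted message.

```python
import re

# Simplified search string from the thread: month, single-digit day, separator.
pattern = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s,]+"
)

# Python spelling of the replacement \1\s0\2\s:
# group 1, a space, a literal "0", group 2, a space.
replacement = r"\1 0\2 "

# Sample line as quoted in the thread (spacing is an assumption).
line = "From AncientBBS1 Thu Jan 2, 1986 20:50:00"

fixed = pattern.sub(replacement, line)
print(fixed)
```

In this engine both back-references survive, so the month stays in place and the day gains its leading zero.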
Thanks for the rubular tip-off. Being a classic hard-core programmer, I'm
not used to those kinds of tools, but I might look at rubular. I did figure
out the problem, and in my main reply (to myself), you'll see a detailed
explanation.
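
As a cross-check in another engine, the original search string quoted below can be tried in Python as well. This is only a sketch: the helper name to_mbox_from_line is made up for illustration, the backslashes before "," and ":" are dropped because Python does not need them, and the sample line assumes the day was space-padded ("Jan  2,") in the source file.

```python
import re

# The original six-group search string, split across lines for readability.
FULL = re.compile(
    r"^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s,]+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"([0-9][0-9]|\s[0-9])[\s,]+(19[0-9][0-9])[\s,]+"
    r"([0-9][0-9]:[0-9][0-9]:[0-9][0-9])\s*$"
)


def to_mbox_from_line(line):
    """Hypothetical helper: reorder the date into 'Tue Nov 05 19:02:00 1985' style."""
    m = FULL.match(line)
    if m is None:
        return None
    sender, weekday, month, day, year, clock = m.groups()
    # int() tolerates the leading space captured by the \s[0-9] alternative.
    return f"{sender} {weekday} {month} {int(day):02d} {clock} {year}"


# Assumed sample: two spaces before the single-digit day.
sample = "From AncientBBS1 Thu Jan  2, 1986 20:50:00"
mbox_line = to_mbox_from_line(sample)
print(mbox_line)
```

One thing this shows, at least in Python: the day group only accepts a single digit when two whitespace characters precede it, because \s+ consumes one and \s[0-9] needs another. A line with just one space before the "2" does not match at all.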
Hartmut W Sager - Tel +1-204-339-8331
On Sat, 4 Jan 2020 at 10:59, Dan Martin <dan at martinmedcorp.com> wrote:
> Hi Hartmut
>
> I am not familiar with your replacement syntax \1\s0\2\s
>
> Rubular shows the groups as:
> 1 From AncientBBS1
> 2 Thu
> 3 Jan
> and 3 others
>
> and for the truncated expression:
> 1 Jan
> 2 2
>
> I find rubular a convenient online tool for checking regex
> https://rubular.com/
>
> -Dan
>
> On Sat, Jan 4, 2020 at 10:27 AM Hartmut W Sager <hwsager at marityme.net>
> wrote:
>
>> This might be the wrong time of night for doing regex (i.e., my mistake),
>> or my trusty Vedit text editor has a bug in its regex implementation.
>>
>> Original search string:
>> ^(From AncientBBS[1-2])\s+(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[\s\,]+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9][0-9]|\s[0-9])[\s\,]+(19[0-9][0-9])[\s\,]+([0-9][0-9]\:[0-9][0-9]\:[0-9][0-9])\s*$
>> Replacement string: <Nah, skip it>
>>
>> The above search string gives a syntax error. I am a bit suspicious of
>> the ([0-9][0-9]|\s[0-9]) group re operator precedence of the "or", and
>> proceeded to stepwise simplification to narrow it down. I finally got down
>> to:
>>
>> Search string:
>> (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9])[\s\,]+
>> Replacement string: \1\s0\2\s
>>
>> The new search works fine (as did some of the previous stepwise
>> simplified ones), but the replacements are baffling me.
>> The line
>> From AncientBBS1 Thu Jan 2, 1986 20:50:00
>> gets changed to
>> From AncientBBS1 Thu 02 1986 20:50:00
>>
>> I.e., the variable \1 seems to get lost. In my previous stepwise
>> simplified cases, multiple variables got lost when the search worked at all.
>>
>> Why am I doing this? I need to massage some old BBS messages into the
>> retarded mbox format, whose date format (on the "From " line) of "Tue Nov
>> 05 19:02:00 1985" is particularly illogical. Be that as it may, the two
>> sources of these messages I am processing had further sloppiness in their
>> dates, done by some ancient BBS bozos. I did successfully fix a lot of
>> that already with regex.
>>
>> Hartmut W Sager - Tel +1-204-339-8331
>>
>> _______________________________________________
>> Roundtable mailing list
>> Roundtable at muug.ca
>> https://muug.ca/mailman/listinfo/roundtable
>>