Adam and I were having an offline discussion, and some quick testing shows awk beating sed on this job, roughly 0.41s versus 0.68s wall-clock on a 481,831-line file:<div><br></div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; "><div>

[sean@bob tmp]$ W=/usr/share/dict/words</div><div>[sean@bob tmp]$ (tail -1000 $W; echo output start; cat $W; echo output end; head -1000 $W) &gt; infile</div><div>[sean@bob tmp]$ wc -l infile</div><div>481831 infile</div>

<div>[sean@bob tmp]$ time awk &#39;/output start/,/output end/&#39; &lt; infile &gt; /dev/null</div><div><br></div><div>real    0m0.411s</div><div>user    0m0.393s</div><div>sys     0m0.016s</div><div>[sean@bob tmp]$ time  sed -n &#39;/output start/,/output end/p&#39; &lt; infile &gt; /dev/null</div>

<div><br></div><div>real    0m0.678s</div><div>user    0m0.631s</div><div>sys     0m0.029s</div></span><div><br></div><div>I ran it a bunch more times and the results were similar.  YMMV, benchmarks are lies, etc.</div><div>
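For anyone who wants to repeat the comparison, here is a minimal sketch, assuming bash and a synthetic input built with seq in place of /usr/share/dict/words. The temp file, the run count of five and the line counts are illustrative choices, not what was used for the numbers above. It checks that both commands select the same lines before timing anything, since a speed comparison only means something if the outputs match.<br>

```shell
#!/usr/bin/env bash
# Synthetic stand-in for the dictionary file (assumption: any large
# line-oriented text is good enough for a relative comparison).
infile=$(mktemp)
{ seq 1 1000; echo "output start"; seq 1 200000; echo "output end"; seq 1 1000; } > "$infile"

# Sanity check first: both range expressions should select the same lines.
awk_sum=$(awk '/output start/,/output end/' "$infile" | cksum)
sed_sum=$(sed -n '/output start/,/output end/p' "$infile" | cksum)
if [ "$awk_sum" = "$sed_sum" ]; then same=yes; else same=no; fi
selected=$(awk '/output start/,/output end/' "$infile" | wc -l | tr -d ' ')
echo "identical output: $same ($selected lines selected)"

# Five runs of each command, so one noisy run does not decide the result.
run_awk() { for i in 1 2 3 4 5; do awk '/output start/,/output end/' "$infile"; done > /dev/null; }
run_sed() { for i in 1 2 3 4 5; do sed -n '/output start/,/output end/p' "$infile"; done > /dev/null; }

# In bash, "time" is a keyword and can time a shell function.
time run_awk
time run_sed

rm -f "$infile"
```

In bash the two time lines report totals for five runs each. Other shells may not be able to time a function this way, in which case the loops would need to move into separate scripts.<br>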

<br></div><div>Sean</div><br><div class="gmail_quote">On Wed, Nov 10, 2010 at 11:32 AM, Gilles Detillieux <span dir="ltr">&lt;<a href="mailto:grdetil@scrc.umanitoba.ca">grdetil@scrc.umanitoba.ca</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

I may have misinterpreted the question before.  If you want the &quot;output<br>

start&quot; and &quot;output end&quot; marker lines in the output (which I guess your<br>

grep pipeline would do), then Adam&#39;s sed script will do that.  Mine,<br>

using the &quot;d&quot; commands, will output only the data in between.  The<br>

shortest awk scripts for the two behaviours would be:<br>

<br>

awk &#39;/output start/{s=1};s==1;/output end/{s=0};&#39;<br>

<br>

or<br>

<br>

awk &#39;/output end/{s=0};s==1;/output start/{s=1};&#39;<br>

<br>

The first is a simplification of Adam&#39;s and prints the marker lines,<br>

while the second, using the same statements in the opposite order,<br>

suppresses the markers.  Of perl, awk and sed, I suspect sed is<br>

the most lightweight, and probably the quickest, unless perl can<br>

outperform sed on larger files.  awk has a reputation for being pretty<br>

slow.  I tend to favour sed unless awk or perl makes the job a lot easier.<br>
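<br>
The difference between the two statement orders can be checked on a small sample. This is only a sketch: the sample text is modelled on the layout in the original question, and the variable names are made up for illustration.<br>

```shell
# Sample input modelled on the file layout from the original question.
sample='garbage
output start
good data 1
good data 2
output end
garbage'

# Flag set before the print rule and cleared after it: markers are printed.
with_markers=$(printf '%s\n' "$sample" | awk '/output start/{s=1};s==1;/output end/{s=0}')

# Flag cleared before the print rule and set after it: markers are dropped.
without_markers=$(printf '%s\n' "$sample" | awk '/output end/{s=0};s==1;/output start/{s=1}')

printf 'with markers:\n%s\n\n' "$with_markers"
printf 'without markers:\n%s\n' "$without_markers"
```

The bare s==1 pattern has no action, so awk falls back to its default of printing the line. Whether a marker line is printed depends only on whether the flag changes before or after that rule runs.<br>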

<br>

Gilles<br>

<div class="im"><br>

On 11/10/2010 11:13 AM, Adam Thompson wrote:<br>

&gt; The AWK version is functionally identical, and not very much shorter, or<br>

&gt; any more elegant:<br>

&gt;<br>

&gt;     awk &#39;/output start/ {s=1};{if (s==1) print $0};/output end/ {s=0}&#39;<br>

&gt;<br>

&gt; (the perl version can generally be made that small, too.)<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; I would instead suggest sed(1), since this is precisely what it’s<br>

&gt; designed for:<br>

&gt;<br>

&gt;     sed -n &#39;/output start/,/output end/p&#39; &lt; infile<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; -Adam<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; *From:* <a href="mailto:roundtable-bounces@muug.mb.ca">roundtable-bounces@muug.mb.ca</a><br>

&gt; [mailto:<a href="mailto:roundtable-bounces@muug.mb.ca">roundtable-bounces@muug.mb.ca</a>] *On Behalf Of *Sean Walberg<br>

&gt; *Sent:* Wednesday, November 10, 2010 10:56<br>

&gt; *To:* Continuation of Round Table discussion<br>

&gt; *Subject:* Re: [RndTbl] Command line challenge: trim garbage from start<br>

&gt; and end of a file.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; OTTOMH:<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; perl -n -e &#39;BEGIN {$state = 0} $state = 1 if ($state == 0 and /output<br>

&gt; start/); $state = 2 if ($state == 1 and /output end/)  ; print if<br>

&gt; ($state == 1)&#39; &lt; infile &gt; outfile<br>

&gt;<br>

&gt; I&#39;ll bet there&#39;s a shorter AWK version though.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Sean<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; On Wed, Nov 10, 2010 at 10:51 AM, John Lange &lt;<a href="mailto:john@johnlange.ca">john@johnlange.ca</a><br>

</div><div class="im">&gt; &lt;mailto:<a href="mailto:john@johnlange.ca">john@johnlange.ca</a>&gt;&gt; wrote:<br>

&gt;<br>

&gt; I have files with the following structure:<br>

&gt;<br>

&gt; garbage<br>

&gt; garbage<br>

&gt; garbage<br>

&gt; output start<br>

&gt; .. good data<br>

&gt; .. good data<br>

&gt; .. good data<br>

&gt; .. good data<br>

&gt; output end<br>

&gt; garbage<br>

&gt; garbage<br>

&gt; garbage<br>

&gt;<br>

&gt; How can I extract the good data from the file trimming the garbage<br>

&gt; from the beginning and end?<br>

&gt;<br>

&gt; The following works just fine but it&#39;s dirty because I don&#39;t like the<br>

&gt; fact that I have to pick an arbitrarily large number for the &quot;before&quot;<br>

&gt; and &quot;after&quot; values.<br>

&gt;<br>

&gt; grep -A 999999 &quot;output start&quot; &lt;infile&gt; | grep -B 999999 &quot;output end&quot; &gt;<br>

&gt; newfile<br>

&gt;<br>

&gt; Can anyone come up with something more elegant?<br>

&gt;<br>

&gt; --<br>

&gt; John Lange<br>

</div>&gt; <a href="http://www.johnlange.ca" target="_blank">www.johnlange.ca</a> &lt;<a href="http://www.johnlange.ca" target="_blank">http://www.johnlange.ca</a>&gt;<br>

<div class="im"><br>

--<br>

Gilles R. Detillieux              E-mail: &lt;<a href="mailto:grdetil@scrc.umanitoba.ca">grdetil@scrc.umanitoba.ca</a>&gt;<br>

Spinal Cord Research Centre       WWW:    <a href="http://www.scrc.umanitoba.ca/" target="_blank">http://www.scrc.umanitoba.ca/</a><br>

Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 0J9  (Canada)<br>

_______________________________________________<br>

</div><div><div></div><div class="h5">Roundtable mailing list<br>

<a href="mailto:Roundtable@muug.mb.ca">Roundtable@muug.mb.ca</a><br>

<a href="http://www.muug.mb.ca/mailman/listinfo/roundtable" target="_blank">http://www.muug.mb.ca/mailman/listinfo/roundtable</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br>Sean Walberg &lt;<a href="mailto:sean@ertw.com">sean@ertw.com</a>&gt;    <a href="http://ertw.com/">http://ertw.com/</a><br>

</div>