Read and Analyze Giant Files

Sannyasin Sivakatirswami katir at hindu.org
Thu Nov 7 23:49:00 EST 2002


We are trying to use Rev (or MC) to analyze a website access log that 
is 3 million lines long, a file of 300 MB or more.

A shell script (interpreted) or a Pascal program (compiled) each runs 
in about 2 minutes on this file, but an xTalk script takes a very long 
time, and may even hang forever. Shell and Pascal can read the file one 
line at a time and process each line, but I am not sure how to do that 
in MC.

Here's the code:

shell, 2 lines

#!/bin/sh
fgrep mystic_mouse | wc -l
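
For comparison, the "one line at a time" pattern mentioned above could 
be spelled out in plain sh roughly as follows. This is only a sketch of 
the streaming idea, not the script that was timed: the function name 
count_matches is hypothetical, and a loop like this would likely be far 
slower than fgrep.

```shell
# Hypothetical helper: count the lines on stdin that contain a fixed string.
count_matches() {
  pattern=$1
  count=0
  # Read one line at a time; the extra test keeps a final line
  # that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      # Quoting the pattern makes the match literal, like fgrep.
      *"$pattern"*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}
```

It would be called as "count_matches mystic_mouse < logfile", where 
logfile stands in for the access log. Only one line is held in memory 
at any moment, which is the property the shell and Pascal versions 
share.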

pascal, 16 lines

Program detect;
{$H+}
Var
  buffer    : string;
  result    : integer;
Begin {main}
  buffer := '';
  result := 0;
  While (Not(eof)) Do
  Begin
    Readln(buffer);
    If (pos('mystic_mouse', buffer) > 0) Then
      inc(result);
  End; {file}
  Writeln(result);
End. {program}

metacard, 13 lines

#!/usr/local/bin/mc
on startup
  put empty into the_message
  put 0 into the_counter
  read from stdin until empty
  put it into the_message
  repeat for each line this_line in the_message
    if (this_line contains "mystic_mouse") then
      put the_counter + 1 into the_counter
    end if
  end repeat
  put the_counter
end startup



Is there a more efficient way to do this in Transcript?

To try it out, I cut the log down to half a million lines and got 
these times: shell script, 22 seconds; Pascal, 7 seconds; MetaCard, 
13 minutes and still running.

I would like never to have to say that "MetaCard/Revolution can't do 
this."

Before I tackle it further, I am hoping someone has already invented 
this wheel.



Thanks!
Himalayan Academy Publications
Sannyasin Sivakatirswami
Editor's Assistant/Production Manager
katir at hindu.org
www.HinduismToday.com, www.HimalayanAcademy.com,
www.Gurudeva.org, www.hindu.org