Read and Analyze Giant Files
Sannyasin Sivakatirswami
katir at hindu.org
Thu Nov 7 23:49:00 EST 2002
We are trying to use Rev (or MC) to analyze a web site access log that
is 3 million lines long, a file of 300 MB or more.
If I try a shell script (interpreted) or a Pascal program (compiled), each
runs in about 2 minutes on this file, but an xTalk script takes a very
long time; maybe it hangs forever? Shell and Pascal can read the file
one line at a time and process each line, but I am not sure how to do that in MC.
Here's the code:
shell, 2 lines
#!/bin/sh
fgrep mystic_mouse | wc -l
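
For comparison, here is a minimal sketch of the same count written as an explicit line-at-a-time loop in plain sh. It only illustrates the reading pattern mentioned above (read one line, test it, discard it); the function name count_matches is made up for this example and is not part of the original script.

```shell
#!/bin/sh
# Sketch only: count_matches is a hypothetical helper name.
# Counts the lines on stdin that contain a fixed string, holding only
# one line in memory at a time.
count_matches() {
    pattern=$1
    count=0
    # IFS= and -r keep each line verbatim; the "|| [ -n ... ]" part
    # also handles a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            # Quoting the pattern makes it a literal substring match.
            *"$pattern"*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

It would be called as `count_matches mystic_mouse < access.log`, where access.log stands in for the real log file. A loop like this will generally be much slower than fgrep, so it is meant to show the pattern rather than to replace the two-line script.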
pascal, 16 lines
Program detect;
{$H+}
Var
  buffer : string;
  result : integer;
Begin {main}
  buffer := '';
  result := 0;
  While (Not(eof)) Do
  Begin
    Readln(buffer);
    If (pos('mystic_mouse', buffer) > 0) Then
      inc(result);
  End; {file}
  Writeln(result);
End. {program}
metacard, 13 lines
#!/usr/local/bin/mc
on startup
  put empty into the_message
  put 0 into the_counter
  read from stdin until empty
  put it into the_message
  repeat for each line this_line in the_message
    if (this_line contains "mystic_mouse") then
      put the_counter + 1 into the_counter
    end if
  end repeat
  put the_counter
end startup
Is there a more efficient way in Transcript to do this?
To try it out, I cut the log down to half a million lines and got
these times: shell script, 22 seconds; Pascal, 7 seconds;
MetaCard, 13 minutes and still running.
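
A sketch of how a cut-down sample like this could be produced for repeat timings, assuming the log is an ordinary text file. The function name make_sample and the file names in the note below are placeholders, not the ones actually used.

```shell
#!/bin/sh
# Sketch only: make_sample is a hypothetical helper name.
# make_sample SRC DEST N: write the first N lines of SRC to DEST.
make_sample() {
    head -n "$3" "$1" > "$2"
}
```

For example, `make_sample access.log sample.log 500000` followed by `time ./detect.sh < sample.log` would time the shell version on the first half million lines, where detect.sh is a stand-in name for the two-line script.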
I would like never to have to say that "Metacard/Revolution can't do
this."
Before I tackle it further, I wondered whether someone had already invented
this wheel.
Thanks!
Himalayan Academy Publications
Sannyasin Sivakatirswami
Editor's Assistant/Production Manager
katir at hindu.org
www.HinduismToday.com, www.HimalayanAcademy.com,
www.Gurudeva.org, www.hindu.org