Read and Analyze Giant Files
Pierre Sahores
psahores at easynet.fr
Fri Nov 8 15:30:01 EST 2002
Sannyasin Sivakatirswami wrote:
>
> We are trying to use Rev (or MC) to analyze a web site access log that is 3 million lines long, a 300 meg (or more) file.
>
> If I try a shell script (interpreted) or a Pascal program (compiled), each runs in about 2 minutes on this file, but an xTalk script takes a very long time; maybe it hangs forever? The shell and Pascal versions can read the file one line at a time and process each line, but I'm not sure how to do that in MC.
>
> Here's the code
>
> shell, 2 lines
>
> #!/bin/sh
> fgrep mystic_mouse | wc -l
>
> pascal, 16 lines
>
> Program detect;
> {$H+}
> Var
> buffer : string;
> result : integer;
> Begin {main}
> buffer := '';
> result := 0;
> While (Not(eof)) Do
> Begin
> Readln(buffer);
> If (pos('mystic_mouse', buffer) > 0) Then
> inc(result);
> End; {file}
> Writeln(result);
> End. {program}
>
> metacard, 13 lines
>
> #!/usr/local/bin/mc
> on startup
> put empty into the_message
> put 0 into the_counter
> read from stdin until empty
> put it into the_message
> repeat for each line this_line in the_message
> if (this_line contains "mystic_mouse") then
> put the_counter + 1 into the_counter
> end if
> end repeat
> put the_counter
> end startup
>
> Is there a more efficient way in Transcript to do this?
>
> Well, to try it out, I chopped the log down to half a million lines and got these times: shell script, 22 seconds; Pascal, 7 seconds; MetaCard, 13 minutes and still running.
>
> I would like never to have to say that "Metacard/Revolution can't do this"
>
> Before I tackle it further I was thinking someone had already invented this wheel.
>
> Thanks!
> Himalayan Academy Publications
> Sannyasin Sivakatirswami
> Editor's Assistant/Production Manager
> katir at hindu.org
> www.HinduismToday.com, www.HimalayanAcademy.com,
> www.Gurudeva.org, www.hindu.org
Try something like this:
> on mouseUp
>   -- thefile is assumed to hold the path to the log file
>   open file thefile for read
>   repeat
>     -- read the next block of lines; only this block is held in memory
>     read from file thefile for 100 lines
>     put the result into readstatus
>     ...
>     do what you need with it
>     ...
>     -- the result is "eof" once the end of the file has been reached
>     if readstatus is "eof" then exit repeat
>   end repeat
>   close file thefile
> end mouseUp
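
For the counting task in the original question, the "do what you need with it" step could be filled in roughly as below. This is only a sketch and has not been run against the real log: the handler name countMatches, its thefile parameter, and the block size of 1000 lines are assumptions chosen for illustration, and the search string is the one from the question. It relies on the read command setting the result to "eof" at the end of the file, and on the filter command to keep only the matching lines of each block.

```
on countMatches thefile
  -- thefile: path to the log file (hypothetical parameter)
  put 0 into the_counter
  open file thefile for read
  repeat
    -- read one block of whole lines; 1000 is an arbitrary block size
    read from file thefile for 1000 lines
    -- save the read status before any other command changes the result
    put the result into read_status
    -- keep only the lines of this block that contain the search string
    filter it with "*mystic_mouse*"
    add the number of lines of it to the_counter
    if read_status is "eof" then exit repeat
  end repeat
  close file thefile
  put the_counter
end countMatches
```

Because the file is opened once and each read continues where the previous one stopped, memory use stays at one block at a time. A larger block size means fewer read calls at the cost of more memory per block.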
--
Regards, Pierre Sahores
Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and producing competitive advantage