Read and Analyze Giant Files

Pierre Sahores psahores at easynet.fr
Fri Nov 8 15:30:01 EST 2002


Sannyasin Sivakatirswami wrote:
> 
> We are trying to use Rev (or MC) to analyze a web site access log that is 3 million lines long, a 300 meg (or more) file.
> 
> If I try a shell script (interpreted) or a Pascal program (compiled), each runs in about 2 minutes on this file, but an xTalk script takes a very long time; maybe it hangs forever? Shell and Pascal can read the file one line at a time and process each line, but I am not sure how to do that in MC.
> 
> Here's the code
> 
> shell, 2 lines
> 
> #!/bin/sh
> fgrep mystic_mouse | wc -l
> 
> pascal, 16 lines
> 
> Program detect;
> {$H+}
> Var
>   buffer    : string;
>   result    : integer;
> Begin {main}
>   buffer := '';
>   result := 0;
>   While (Not(eof)) Do
>   Begin
>     Readln(buffer);
>     If (pos('mystic_mouse', buffer) > 0) Then
>       inc(result);
>   End; {file}
>   Writeln(result);
> End. {program}
> 
> metacard, 13 lines
> 
> #!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup
> 
> is there a more efficient way in Transcript to do this?
> 
> Well, to try it out, I chopped the log down to half a million lines and got these times: shell script, 22 seconds; Pascal, 7 seconds; MetaCard, 13 minutes and still running??
> 
> I would like never to have to say that "Metacard/Revolution can't do this"
> 
> Before I tackle it further I was thinking someone had already invented this wheel.
> 
> Thanks!
> Himalayan Academy Publications
> Sannyasin Sivakatirswami
> Editor's Assistant/Production Manager
> katir at hindu.org
> www.HinduismToday.com, www.HimalayanAcademy.com,
> www.Gurudeva.org, www.hindu.org

Try something like this:

> on mouseUp
>   -- thefile is assumed to already hold the path to the log file
>   open file thefile for read
>   repeat
>     -- the file stays open, so each read continues where the previous one stopped
>     read from file thefile for 100 lines
>     -- the result is "eof" once the end of the file has been reached;
>     -- save it now, before later statements change the result
>     put the result into readstatus
>     ...
>     do what you need with it
>     ...
>     if readstatus is "eof" then exit repeat
>   end repeat
>   close file thefile
> end mouseUp
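
The underlying idea is to open the file once and consume it sequentially, so that only a small piece is held in memory at a time. As a rough illustration only, here is the same pattern in shell, the language of the original test script. `count_matches` is a hypothetical helper name, not something from this thread, and the sketch assumes a POSIX `sh`.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a file that contain a fixed string.
# The file is opened once and read one line at a time, so memory use does not
# grow with the size of the log.
count_matches() {
    pattern=$1
    file=$2
    count=0
    # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            # The quoted expansion is matched literally, not as a glob.
            *"$pattern"*) count=$((count + 1)) ;;
        esac
    done < "$file"
    echo "$count"
}
```

Called as `count_matches mystic_mouse access.log` (the file name is only an example), this prints the number of matching lines. For a plain count, piping through `fgrep` as in the original two-line script will be much faster; the loop is only meant to show the read-as-you-go structure.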

-- 
Regards, Pierre Sahores

Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and delivering competitive advantage


