the large file challenge
Pierre Sahores
psahores at easynet.fr
Thu Nov 14 03:29:00 EST 2002
Sadhunathan Nadesan wrote:
>
> Ok, here are the results so far,
>
> bash
> Sun Nov 10 13:01:59 PST 2002
> 17333
> Sun Nov 10 13:03:43 PST 2002
>
> pascal
> Sun Nov 10 13:03:43 PST 2002
> 17333
> Sun Nov 10 13:05:47 PST 2002
>
> andu's metacard
> Sun Nov 10 13:05:47 PST 2002
> 29623
> Sun Nov 10 13:08:10 PST 2002
>
> pierre's metacard
> Sun Nov 10 13:08:10 PST 2002
> 17338
> Sun Nov 10 13:10:21 PST 2002
>
> bruce's metacard
> Sun Nov 10 13:10:21 PST 2002
> 33351
> Sun Nov 10 13:14:59 PST 2002
>
> That would be
>
> bash 1:44
> pascal 2:04
> Andu 2:23
> Pierre 2:11
> Bruce 4:38
>
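The elapsed times above follow from the start and end stamps of each run. As a quick cross-check, here is a minimal Python sketch (Python is used only for illustration; the helper name `elapsed` is hypothetical) that recomputes them from the timestamps quoted in the results:

```python
from datetime import datetime

# Start/end times copied from the results above (all on Sun Nov 10 2002, PST).
runs = {
    "bash": ("13:01:59", "13:03:43"),
    "pascal": ("13:03:43", "13:05:47"),
    "Andu": ("13:05:47", "13:08:10"),
    "Pierre": ("13:08:10", "13:10:21"),
    "Bruce": ("13:10:21", "13:14:59"),
}


def elapsed(start, end):
    """Return the time between two HH:MM:SS stamps as 'M:SS'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    seconds = int(delta.total_seconds())
    return f"{seconds // 60}:{seconds % 60:02d}"


for name, (start, end) in runs.items():
    print(name, elapsed(start, end))
```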
> Now, it is likely that I have become confused and mixed up exactly what
> came from whom. Sorry about that! My apologies if your name is not
> associated with your contribution, or vice versa.
>
> Now, why did we get different counts? I believe the count of 17333 is
> correct. Maybe someone can debug that.
>
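One thing worth checking when the counts disagree is whether every script counts the same thing. A loop over lines (or a filter) counts matching lines, while a repeated offset search counts occurrences, and a line that mentions the pattern twice is counted once by the former and twice by the latter. Whether this explains the numbers above is an assumption that has not been verified against the real log. A minimal Python sketch on made-up sample data (the sample lines and function names are hypothetical):

```python
# Made-up sample: the second line mentions the pattern twice.
sample = (
    "GET /mystic_mouse/index.html\n"
    "GET /mystic_mouse/a.gif ref=/mystic_mouse/\n"
    "GET /other/page.html\n"
)
pattern = "mystic_mouse"


def count_matching_lines(text, pattern):
    """Count lines that contain the pattern at least once."""
    return sum(1 for line in text.splitlines() if pattern in line)


def count_occurrences(text, pattern):
    """Count every non-overlapping occurrence of the pattern."""
    return text.count(pattern)


print(count_matching_lines(sample, pattern))
print(count_occurrences(sample, pattern))
```

The two functions disagree on this sample because of the second line, so the totals are only comparable once all scripts agree on lines versus occurrences.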
> Here's the code
>
> Andu
> ---
> #!/usr/local/bin/mc
>
> on startup
>   put 0 into the_counter
>   put 1 into the_offset
>   put 333491183 into file_size
>   put 30000 into the_increment
>   put "/gig/tmp/log/access_log" into the_file
>   put "mystic_mouse" into pattern
>
>   open file the_file for read
>
>   repeat until (the_offset >= file_size)
>     read from file the_file at the_offset for the_increment
>     put it into the_text
>     repeat for each line this_line in the_text
>       get offset(pattern, this_line)
>       if (it is not 0) then add 1 to the_counter
>     end repeat
>     add the_increment to the_offset
>   end repeat
>
>   put the_counter
> end startup
>
> Pierre
> ------
> #!/usr/local/bin/mc
>
> on startup
>   put 0 into the_counter
>   put 1 into the_offset
>   put 333491183 into file_size
>   put 30000 into the_increment
>   put "/gig/tmp/log/access_log" into the_file
>   put "mystic_mouse" into pattern
>
>   open file the_file for read
>
>   repeat until (the_offset >= file_size)
>     read from file the_file at the_offset for the_increment
>     put filter it with "mystic_mouse" into tempo
>     add the num of lines in tempo to the_counter
>     # put it into the_text
>
>     # repeat until lineoffset("mystic_mouse", the_text) = 0
>     #   if (lineoffset("mystic_mouse", the_text) is not "0") then
>     #     add 1 to the_counter
>     #     delete line 1 to lineoffset("mystic_mouse", the_text) of the_text
>     #   end if
>     # end repeat
>
>     add the_increment to the_offset
>   end repeat
>
>   put the_counter
> end startup
>
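Both scripts above read the file in fixed 30000-character chunks, so a line (or the pattern itself) can be cut in two at a chunk boundary. That may account for small differences in the counts, although this is an assumption and has not been checked against the real log. A minimal Python sketch of one way to avoid the problem, carrying the incomplete last line over to the next chunk (the function names and test string are hypothetical, and this is not the MetaCard code itself):

```python
import io


def count_lines_naive(stream, pattern, chunk_size=30000):
    """Count matching lines chunk by chunk, ignoring chunk boundaries."""
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += sum(1 for line in chunk.split("\n") if pattern in line)
    return count


def count_lines_carry(stream, pattern, chunk_size=30000):
    """Count matching lines, carrying the partial last line forward."""
    count = 0
    carry = ""  # incomplete line left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (carry + chunk).split("\n")
        carry = lines.pop()  # last piece may be a partial line
        count += sum(1 for line in lines if pattern in line)
    if pattern in carry:  # final line without a trailing newline
        count += 1
    return count


text = "x mystic_mouse y\n"
print(count_lines_naive(io.StringIO(text), "mystic_mouse", chunk_size=8))
print(count_lines_carry(io.StringIO(text), "mystic_mouse", chunk_size=8))
```

With an 8-character chunk the pattern is split across two reads, so the naive version misses it while the carry-over version still finds the line.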
> Bruce
> -----
> #!/usr/local/bin/mc
> on startup
>   ## initialize variables: try adjusting numLines
>   put "/gig/tmp/log/access_log" into the_file
>   put $1 into numLines -- called with 10000 as parameter
>   put 0 into counter
>
>   open file the_file
>
>   repeat until (isEOF = TRUE)
>     ## read the specified number of lines, check if we are at the end of the file
>     read from file the_file for numLines lines
>     put it into thisChunk
>     put (the result = "eof") into isEOF
>
>     ## count the number of matches in this chunk
>     put offset("mystic_mouse", thisChunk) into theOffset
>     repeat until (theOffset = 0)
>       add 1 to counter
>       put offset("mystic_mouse", thisChunk, theOffset) into tempOffset
>       if (tempOffset > 0) then add tempOffset to theOffset
>       else put 0 into theOffset
>     end repeat
>
>   end repeat
>
>   close file the_file
>
>   put counter
> end startup
>
> _______________________________________________
> metacard mailing list
> metacard at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/metacard
Aloha,
How does using the filter command compare with the lineoffset
approach? Is it faster or slower?
--
Regards, Pierre Sahores
Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and producing competitive advantage