the large file challenge

Pierre Sahores psahores at easynet.fr
Thu Nov 14 03:29:00 EST 2002


Sadhunathan Nadesan wrote:
> 
> Ok, here are the results so far,
> 
> bash
> Sun Nov 10 13:01:59 PST 2002
> 17333
> Sun Nov 10 13:03:43 PST 2002
> 
> pascal
> Sun Nov 10 13:03:43 PST 2002
> 17333
> Sun Nov 10 13:05:47 PST 2002
> 
> andu's metacard
> Sun Nov 10 13:05:47 PST 2002
> 29623
> Sun Nov 10 13:08:10 PST 2002
> 
> pierre's metacard
> Sun Nov 10 13:08:10 PST 2002
> 17338
> Sun Nov 10 13:10:21 PST 2002
> 
> bruce's metacard
> Sun Nov 10 13:10:21 PST 2002
> 33351
> Sun Nov 10 13:14:59 PST 2002
> 
> That would be
> 
> bash    1:44
> pascal  2:04
> Andu    2:23
> Pierre  2:11
> Bruce   4:38
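
For anyone who wants to re-check the arithmetic, here is a minimal sketch in Python (not one of the languages in the test) that recomputes the elapsed times from the time-of-day stamps quoted above. The helper names are made up for illustration, and it assumes every run starts and ends on the same day.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' time of day to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def elapsed(start: str, end: str) -> str:
    """Return end minus start as 'M:SS' (assumes both stamps are on the same day)."""
    delta = to_seconds(end) - to_seconds(start)
    return f"{delta // 60}:{delta % 60:02d}"


# Start and end stamps copied from the quoted `date` output.
runs = [
    ("bash", "13:01:59", "13:03:43"),
    ("pascal", "13:03:43", "13:05:47"),
    ("Andu", "13:05:47", "13:08:10"),
    ("Pierre", "13:08:10", "13:10:21"),
    ("Bruce", "13:10:21", "13:14:59"),
]

for name, start, end in runs:
    print(f"{name:<8}{elapsed(start, end)}")
```

The printed values agree with the table in the quoted message.
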
> 
> Now, it is likely I have become confused and mixed up exactly what came
> from whom. Sorry about that!  My apologies if your name is not associated
> with your contribution, or vice versa.
> 
> Now, why did we get different counts?  I believe the count of 17333 is
> correct.  Maybe someone can debug that.
> 
> Here's the code
> 
> Andu
> ---
> #!/usr/local/bin/mc
> 
> on startup
>   put 0 into the_counter
>   put 1 into the_offset
>   put 333491183 into file_size
>   put   30000 into the_increment
>   put "/gig/tmp/log/access_log" into the_file
>   put "mystic_mouse" into pattern
> 
>   open file the_file for read
> 
>   repeat until (the_offset >= file_size)
>     read from file the_file at the_offset for the_increment
>     put it into the_text
>     repeat for each line this_line in the_text
>       get offset(pattern, this_line)
>       if (it is not 0) then add 1 to the_counter
>     end repeat
>     add the_increment to the_offset
>   end repeat
> 
>   put the_counter
> end startup
> 
> Pierre
> ------
> #!/usr/local/bin/mc
> 
> on startup
>   put 0 into the_counter
>   put 1 into the_offset
>   put 333491183 into file_size
>   put   30000 into the_increment
>   put "/gig/tmp/log/access_log" into the_file
>   put "mystic_mouse" into pattern
> 
>   open file the_file for read
> 
>   repeat until (the_offset >= file_size)
>     read from file the_file at the_offset for the_increment

>     # filter is a command and its pattern must match the whole line
>     put it into tempo
>     filter tempo with "*mystic_mouse*"
>     add the num of lines in tempo to the_counter

>    # put it into the_text
> 
>    #  repeat until lineoffset("mystic_mouse", the_text) = 0
>    #    if (lineoffset("mystic_mouse", the_text) is not "0") then
>    #      add 1 to the_counter
>    #      delete line 1 to lineoffset("mystic_mouse", the_text) of the_text
>    #    end if
>    #  end repeat
> 
>     add the_increment to the_offset
>   end repeat
> 
>   put the_counter
> end startup
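
One possible reason for a small difference in the counts, offered as a guess and not verified against the real log: both scripts above read fixed blocks of 30000 characters, so a line can be cut in two at a block boundary. The sketch below, in Python with a made-up two-line sample and hypothetical function names, shows that counting each block on its own can count a line twice or miss it, while carrying the unfinished last line over to the next block avoids both. It does not model the exact behaviour of MetaCard's read command.

```python
import io

PATTERN = "mystic_mouse"

# Hypothetical sample: the first line contains the pattern twice.
SAMPLE = "GET /mystic_mouse/a.html ref=/mystic_mouse/\nGET /other.html\n"


def count_fixed_chunks(stream, chunk_size):
    """Count matching lines block by block, ignoring lines cut at a boundary."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += sum(1 for line in chunk.split("\n") if PATTERN in line)
    return total


def count_with_carry(stream, chunk_size):
    """Same count, but hold back the trailing partial line until it is complete."""
    total = 0
    carry = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (carry + chunk).split("\n")
        carry = lines.pop()  # the last piece may be an unfinished line
        total += sum(1 for line in lines if PATTERN in line)
    if PATTERN in carry:  # final line without a trailing newline
        total += 1
    return total


# Boundary falls between the two occurrences: the line is counted twice.
print(count_fixed_chunks(io.StringIO(SAMPLE), 22))  # 2
print(count_with_carry(io.StringIO(SAMPLE), 22))    # 1

# Boundary falls inside every occurrence: the line is missed entirely.
print(count_fixed_chunks(io.StringIO(SAMPLE), 10))  # 0
print(count_with_carry(io.StringIO(SAMPLE), 10))    # 1
```

Whether this accounts for the 17338 versus 17333 difference would need checking against the actual file.
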
> 
> Bruce
> -----
> #!/usr/local/bin/mc
> on startup
>   ## initialize variables: try adjusting numLines
>   put "/gig/tmp/log/access_log" into the_file
>   put $1 into numLines  -- called with 10000 as parameter
>   put 0 into counter
> 
>   open file the_file
> 
>   repeat until (isEOF = TRUE)
>      ## read the specified number of lines, check if we are at the end of the file
>      read from file the_file for numLines lines
>      put it into thisChunk
>      put (the result = "eof") into isEOF
> 
>      ## count the number of matches in this chunk
>      put offset("mystic_mouse", thisChunk) into theOffset
>      repeat until (theOffset = 0)
>         add 1 to counter
>         put offset("mystic_mouse", thisChunk, theOffset) into tempOffset
>         if (tempOffset > 0) then add tempOffset to theOffset
>         else put 0 into theOffset
>      end repeat
> 
>   end repeat
> 
>   close file the_file
> 
>   put counter
> end startup
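
A possible explanation for the larger count here, again only a guess: the offset loop above advances past each hit and searches again, so it counts every occurrence of the string, whereas a grep-style count is per line. If some log lines contain the string more than once (for example in both the request path and the referrer, which is an assumption about this log), the two numbers will differ. A small Python sketch with a made-up three-line sample:

```python
PATTERN = "mystic_mouse"

# Hypothetical log excerpt: the first line contains the pattern twice.
SAMPLE = (
    "GET /mystic_mouse/index.html referer=/mystic_mouse/\n"
    "GET /mystic_mouse/art.html referer=-\n"
    "GET /other/page.html referer=-\n"
)


def count_matching_lines(text: str) -> int:
    """Number of lines that contain the pattern at least once."""
    return sum(1 for line in text.splitlines() if PATTERN in line)


def count_occurrences(text: str) -> int:
    """Number of non-overlapping occurrences of the pattern in the text."""
    return text.count(PATTERN)


print(count_matching_lines(SAMPLE))  # 2
print(count_occurrences(SAMPLE))     # 3
```

Comparing both counts on the real file would show whether this is what happened.
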
> 
> _______________________________________________
> metacard mailing list
> metacard at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/metacard

Aloha,

How does it perform when the filter command is used instead of the
lineoffset one? Faster, slower?
-- 
Regards, Pierre Sahores

Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and producing competitive advantage
