the large file challenge

Yennie at aol.com
Sun Nov 10 19:33:01 EST 2002


All right... I tweaked it a little more outside of email.
If "mystic_mouse" can occur more than once on a line and each line should be counted only once, uncomment this line:
"add offset(return, thisChunk, theOffset) to theOffset"

It makes the search jump to the next line whenever a match is found.
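To show the difference between counting every occurrence and counting each matching line once, here is a minimal Python sketch (not MetaTalk). `count_matches` and its `per_line` flag are hypothetical names used only for illustration. The `per_line=True` branch mimics uncommenting the `add offset(return, ...)` line by jumping past the next newline after each hit.

```python
def count_matches(text: str, needle: str, per_line: bool = False) -> int:
    """Count needle in text with an offset-style search loop.

    With per_line=True the search resumes after the next newline
    following each hit, so every line is counted at most once.
    """
    count = 0
    pos = 0  # index where the next search starts
    while True:
        hit = text.find(needle, pos)
        if hit == -1:
            break  # no more matches
        count += 1
        pos = hit + len(needle)  # resume just past this match
        if per_line:
            nl = text.find("\n", pos)
            if nl == -1:
                break  # the match was on the last line
            pos = nl + 1  # skip the rest of the line


    return count


sample = "mystic_mouse mystic_mouse\nother line\nmystic_mouse\n"
print(count_matches(sample, "mystic_mouse"))                 # 3
print(count_matches(sample, "mystic_mouse", per_line=True))  # 2
```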

This should run faster than my previous attempts:

on startup
   ## initialize variables: try adjusting chunkSize (first argument, in MB)
   put "/gig/tmp/log/access_log" into the_file
   put ($1*1024*1024) into chunkSize ## this is for MB
   put 0 into counter
   put FALSE into isEOF
   
   open file the_file
   
   repeat until (isEOF = TRUE)
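     ## read the next chunkSize characters and check whether we hit the end of the file
     read from file the_file for chunkSize
     put it into thisChunk
     put (the result = "eof") into isEOF
     if not isEOF then
       ## finish the current line so no match is split across two chunks
       read from file the_file until return
       put it after thisChunk
       put (the result = "eof") into isEOF
     end if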
     
     ## count the number of matches in this chunk
     ## theOffset = number of characters of thisChunk already searched
     put 0 into theOffset
     repeat
       get offset("mystic_mouse", thisChunk, theOffset)
       if (it = 0) then exit repeat
       add 1 to counter
       ## move theOffset to the last character of this match (12 chars long)
       put theOffset + it + 11 into theOffset
       ## add offset(return, thisChunk, theOffset) to theOffset
     end repeat
     
   end repeat
   
   close file the_file

   put counter
end startup
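The same chunked approach can be sketched in Python for comparison. This is an illustration under assumptions, not a port of the script above: `count_in_file`, `NEEDLE` and the default chunk size are hypothetical. Each chunk is extended to the end of its line, as `read ... until return` does, so a match can never straddle two chunks. `bytes.count` counts non-overlapping occurrences, which is safe for a needle that cannot overlap itself.

```python
import os
import tempfile

NEEDLE = b"mystic_mouse"


def count_in_file(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Count occurrences of NEEDLE, reading the file chunk by chunk."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            if not chunk.endswith(b"\n"):
                chunk += f.readline()  # finish the current line
            total += chunk.count(NEEDLE)
    return total


# Demo: a tiny chunk size forces matches onto chunk boundaries.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"GET /mystic_mouse\nGET /other\nmystic_mouse x mystic_mouse\n")
    demo_path = tmp.name
print(count_in_file(demo_path, chunk_size=5))  # 3
os.remove(demo_path)
```

Larger chunks mean fewer reads. The extra line read per chunk costs little and removes the boundary miss.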

HTH.
Brian


More information about the metacard mailing list