the large file challenge
Yennie at aol.com
Fri Nov 8 22:55:01 EST 2002
I'm pretty sure the speed problem here comes from reading in the entire
file at once. Unless, of course, you have enough free RAM, but that's hard
to imagine when the files are 300MB+.
How about this instead? You can adjust it to read any given number of lines
at a time. Try it with 10, 1000, 10000, etc. and see what gives you the best
performance! It hasn't been tested, but hopefully it'll run with a tweak or
less.
#!/usr/local/bin/mc
on startup
  ## initialize variables: try adjusting numLines for the best speed
  put "/gig/tmp/log/xaa" into the_file
  put 1000 into numLines
  put 0 into counter
  put FALSE into isEOF
  open file the_file for read
  repeat until (isEOF = TRUE)
    ## read the specified number of lines, then check whether
    ## we have reached the end of the file
    read from file the_file for numLines lines
    put it into thisChunk
    put (the result = "eof") into isEOF
    ## count the matches in this chunk by stepping through it
    ## with offset(), skipping past each match as it is found
    put offset("mystic_mouse", thisChunk) into theOffset
    repeat until (theOffset = 0)
      add 1 to counter
      put offset("mystic_mouse", thisChunk, theOffset) into tempOffset
      if (tempOffset > 0) then add tempOffset to theOffset
      else put 0 into theOffset
    end repeat
  end repeat
  close file the_file
  ## display the total count
  put counter
end startup
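If you want to try squeezing out more speed, here's an untested variation on
the counting step using "repeat for each", which walks each chunk in a single
pass instead of making repeated offset() calls. One caveat: it counts lines
containing the string rather than total occurrences, so it only gives the
same answer if "mystic_mouse" appears at most once per line.

## untested sketch: a drop-in replacement for the offset() loop above;
## counts matching lines in a single pass over thisChunk
repeat for each line thisLine in thisChunk
  if thisLine contains "mystic_mouse" then add 1 to counter
end repeat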
HTH,
Brian