the large file challenge
Sadhunathan Nadesan
sadhu at castandcrew.com
Sun Nov 10 14:04:01 EST 2002
|
| I'm pretty sure the speed problem here comes from reading in the entire
| file. Unless of course you have enough free RAM, but that's hard to
| imagine when the files are 300MB+.
|
| How about this, which you can adjust to read any given number of lines
| at a time. Try it with 10, 1000, 10000, etc., and see what gives you the
| best performance! It hasn't been tested, but hopefully it'll run with a
| tweak or less.
|
| #!/usr/local/bin/mc
| on startup
| ## initialize variables: try adjusting numLines
| put "/gig/tmp/log/xaa" into the_file
| put 1000 into numLines
| put 0 into counter
| put false into isEOF
|
| open file the_file for read
|
| repeat until (isEOF = TRUE)
| ## read the specified number of lines, then check whether we have
| ## reached the end of the file
| read from file the_file for numLines lines
| put it into thisChunk
| put (the result = "eof") into isEOF
|
| ## count the number of matches in this chunk
| ## (note: a match that spans two chunks will be missed)
| put offset("mystic_mouse", thisChunk) into theOffset
| repeat until (theOffset = 0)
| add 1 to counter
| put offset("mystic_mouse", thisChunk, theOffset) into tempOffset
| if (tempOffset > 0) then add tempOffset to theOffset
| else put 0 into theOffset
| end repeat
|
| end repeat
|
| close file the_file
|
| put counter
| end startup
|
| HTH,
| Brian
-------------------
Hey Brian, thanks, excellent.
I tried it with 10, 1000, and 10000, and it got slightly faster (by just a
few seconds) with each increase, so I'll leave it at 10000, compare it
against the other suggested algorithms, and let everyone know the results.
Sadhu
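
For readers outside MetaTalk, the same chunked-scan idea can be sketched in
Python. The path, search string, and chunk size below are illustrative, not
from the original script; this sketch also carries a short tail between
chunks so a match split across a chunk boundary is still found, which the
line-based script above does not attempt:

```python
def count_occurrences(path, needle, chunk_size=1 << 20):
    """Count non-overlapping occurrences of needle in a large file,
    reading fixed-size chunks so the whole file never sits in RAM."""
    count = 0
    tail = ""  # last len(needle)-1 chars of the previous chunk, kept so
               # a match split across a chunk boundary is still found
    with open(path, "r", errors="replace") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            data = tail + chunk
            count += data.count(needle)
            # keep just enough of this chunk's end to complete a match
            tail = data[-(len(needle) - 1):] if len(needle) > 1 else ""
    return count

# e.g. count_occurrences("/gig/tmp/log/xaa", "mystic_mouse")
```

One caveat: because the tail is re-scanned, a needle that can overlap
itself (like "aba") may be counted once extra at a boundary; a string
such as "mystic_mouse" has no such self-overlap, so the count is exact.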
More information about the metacard mailing list