I often need to work with large text files, and opening a huge file can choke even the best text editors. Most of the time I only need a sample of the file to get a picture of what is happening, and these are my go-to utilities.
I’m a big fan of Linux and the GNU utilities, but as a practical matter I use Windows as my primary workstation. One of my first installs on a new build is the GnuWin32 package, available at http://gnuwin32.sourceforge.net/. The library is fairly small and lean, and doesn’t require a complex environment.
Three utilities in the coreutils package are especially useful here: head, tail, and split.
head and tail are simple. To print the first 50 lines of a file:
head -n 50 input.txt
Optionally send the output to a new file:
head -n 50 input.txt > 50lines.txt
tail works the same way. The -n option tells head to grab the first n lines and tail to grab the last n lines; if it is omitted, the default is 10.
split is a little more complicated. Here’s the usage:
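To make the tail side concrete, here is a minimal sketch of the same options in use, plus one common combination of the two tools. The file names and the use of seq to generate a sample file are assumptions for illustration only, and the comments are written for a POSIX-style shell.

```shell
# Build a hypothetical 100-line sample file (one number per line).
seq 1 100 > sample.txt

# Last 5 lines of the file.
tail -n 5 sample.txt > last5.txt

# With no -n, head (like tail) defaults to 10 lines.
head sample.txt > first10.txt

# Lines 51 through 60: take the first 60, then keep the last 10 of those.
head -n 60 sample.txt | tail -n 10 > middle.txt
```

Piping head into tail like this is a cheap way to sample the middle of a file without loading the whole thing into an editor.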
Usage: split [OPTION] [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.

  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -l, --lines=NUMBER      put NUMBER lines per output file
  -NUMBER                 same as -l NUMBER
      --verbose           print a diagnostic to standard error just
                          before each output file is opened
      --help              display this help and exit
      --version           output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
Report bugs to.
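As a quick illustration of the -l option and the PREFIXaa, PREFIXab naming described in the usage text, here is a small sketch. The input file numbers.txt and the prefix part_ are hypothetical names chosen for the example.

```shell
# Hypothetical input: 2500 numbered lines.
seq 1 2500 > numbers.txt

# Write 1000 lines per piece; the pieces are named part_aa, part_ab, part_ac.
split -l 1000 numbers.txt part_

# Show the line count of each piece: 1000, 1000, and the remaining 500.
wc -l part_aa part_ab part_ac
```

The last piece simply holds whatever is left over, so it is usually smaller than the others.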
My task for split was to take a large file and break it up into files no larger than 8MB each. The command I used:
split -C8m input.txt split
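That produces files no larger than 8MB, named splitaa, splitab, and so on. The best part is that -C keeps lines together: each piece ends at a line boundary instead of cutting a line in half, unless a single line is itself longer than the size limit.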
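One way to check a split like this is to confirm the piece sizes and then stitch the pieces back together. The sketch below does that on a small scale, with a 16 KB limit instead of 8MB so it runs instantly. The file names are hypothetical, and cmp is an assumption on my part: it comes from GNU diffutils, which is a separate package from coreutils.

```shell
# Hypothetical input: 20000 short lines (a little over 100 KB).
seq 1 20000 > big.txt

# At most 16 KB of whole lines per piece, written as chunk_aa, chunk_ab, ...
split -C 16k big.txt chunk_

# List the byte count of each piece; none should exceed 16384.
wc -c chunk_*

# Concatenating the pieces in name order should reproduce the input exactly.
cat chunk_* > rejoined.txt
cmp big.txt rejoined.txt && echo "pieces match the original"
```

Because the suffixes sort alphabetically, a plain wildcard is enough to put the pieces back in the right order.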