LISTSERV 16.5 - CODE4LIB Archives

One of our colleagues wrote:

> Instead of “cat file | tail -n +2”, do “tail -n +2 file”.
>  
> Every one of those “$(…)” creates a subshell, with all the attendant overhead.  Many of those, run in parallel, may be causing a traffic jam for resources.  Have you tried reducing the number of processes launched in parallel, to see if overall performance improves?  If that find command returns hundreds of files, you may be overwhelming your system.  8 to 10 parallel processes seems to be the optimum on most VMs I’ve worked on in the recent past with normal amounts of memory and 2 CPUs.
>  
> It may be that disk i/o is your enemy here – that is a kernel process that will put the CPU in a wait state while it waits for the disk to deliver up the data.  Again, especially with disk i/o, less sometimes gives you more.


Using tail -n +x file instead of cat file | tail -n +x drops the extra cat process and the pipe. I'll give that a whirl.
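
For the record, here is a small sketch of the two forms side by side. The file name data.tsv and its contents are made up for illustration and are not from my actual script. The point is only that both forms emit everything from line 2 onward, and the second starts one fewer process per call.

```shell
#!/bin/sh
# Hypothetical sample file: one header line followed by two records.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'id\ttitle\n1\tfoo\n2\tbar\n' > "$tmpdir/data.tsv"

# Original form: cat is an extra process, feeding tail through a pipe.
with_cat=$(cat "$tmpdir/data.tsv" | tail -n +2)

# Suggested form: tail opens the file itself.
without_cat=$(tail -n +2 "$tmpdir/data.tsv")

# Show what the suggested form produced (everything after the header).
printf '%s\n' "$without_cat"
```

Across tens of thousands of files, one saved process per file ought to add up, though I have not measured it.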

Yes, the find command not only finds hundreds of files, it finds tens of thousands of files.
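
One way to cap the number of simultaneous jobs, along the lines of the 8-to-10 suggestion, might look like the sketch below. The corpus directory, the file names, and the per-file job (strip the header line, write the rest to a .out file) are all hypothetical stand-ins for my real processing. It also assumes an xargs that supports -0 and -P, which the GNU and BSD versions do, but which POSIX does not require.

```shell
#!/bin/sh
# Hypothetical corpus: a scratch directory holding a handful of small files.
corpus=$(mktemp -d)
trap 'rm -rf "$corpus"' EXIT
for i in 1 2 3 4 5; do
  printf 'header\nrecord %s\n' "$i" > "$corpus/file$i.txt"
done

# find emits NUL-terminated names so odd file names survive the pipe.
# xargs -n 1 hands one file to each job; -P 8 keeps at most 8 jobs running.
# The sh -c body is a made-up job: drop the header, save the remainder.
find "$corpus" -type f -name '*.txt' -print0 |
  xargs -0 -n 1 -P 8 sh -c 'tail -n +2 "$1" > "$1.out"' sh

# The pipeline returns only after every job has finished, so count results.
done_count=$(find "$corpus" -type f -name '*.out' | wc -l | tr -d ' ')
echo "processed $done_count files"
```

Changing the number after -P makes it easy to try a few values and see where throughput stops improving.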

My shared file system is NFS, and I hear tell NFS is not very good for parallel processing.

Another colleague of ours suggested re-writing it in Perl, Python, Ruby, etc. 

Thanks for the input.

--
Eric Morgan