Monday, July 31, 2017

Parsing large log files quickly

Timegrep is a fantastic utility for parsing through massive log files quickly.  It does a binary search for a time range, based on a specified time format, so it can jump straight to the relevant section of a file instead of reading every line.

The utility is available on GitHub: https://github.com/linux-wizard/timegrep
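To see why that approach is fast, here is a rough Python sketch of the binary-search idea.  This is not timegrep's actual code, and the fixed "YYYY-MM-DD HH:MM:SS" timestamp layout is an assumption made just for illustration (the real script takes a configurable time format): seek into the middle of the file, read the next complete line's timestamp, and keep halving the window until you land on the first line at or after the start time.

#!/usr/bin/env python3
"""Rough sketch of the binary-search idea behind timegrep (not its actual
code).  Instead of reading a huge log line by line, seek to the middle of
the file, read the timestamp of the next complete line, and halve the
search window until the first line at or after the start time is found."""

import sys
from datetime import datetime

# Assumption for this sketch: every line begins with "YYYY-MM-DD HH:MM:SS".
# The real timegrep.py lets you specify the timestamp format instead.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LENGTH = 19

def parse_ts(line):
    """Parse the leading timestamp of a raw (bytes) log line."""
    return datetime.strptime(line[:TS_LENGTH].decode(), TS_FORMAT)

def seek_to_time(f, target):
    """Position f (opened in binary mode) at the first line whose timestamp
    is >= target, assuming timestamps never decrease through the file."""
    f.seek(0, 2)                      # jump to the end to learn the file size
    lo, hi = 0, f.tell()
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid)
        if mid > 0:
            f.readline()              # discard the partial line we landed in
        line = f.readline()           # first complete line after offset mid
        if line and parse_ts(line) < target:
            lo = mid + 1              # matching lines start after this offset
        else:
            hi = mid                  # matching lines start at or before it
    f.seek(lo)
    if lo > 0:
        f.readline()                  # align to the start of the first match

if __name__ == "__main__":
    # Illustrative usage:
    #   python3 sketch.py access.log '2017-07-31 10:05:00' '2017-07-31 10:06:00'
    path, start, end = sys.argv[1], sys.argv[2], sys.argv[3]
    start_ts = datetime.strptime(start, TS_FORMAT)
    end_ts = datetime.strptime(end, TS_FORMAT)
    with open(path, "rb") as f:
        seek_to_time(f, start_ts)
        for raw in f:                 # linear scan only inside the window
            if parse_ts(raw) > end_ts:
                break
            sys.stdout.write(raw.decode())

A handful of seek-and-read steps replaces scanning the whole file, which is why this stays quick even on multi-GB logs.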

Here is an example of how I use it to grep through dozens of log files, each of which can be several GB in size:

This example is for an NGINX server's errors.

find /var/log/nginx/ -type f -name '*.log-20170730' -exec ~/bin/timegrep.py -d 2017-07-29 --start-time=19:30:00 --end-time=19:45:00 '{}' \; | grep '\[error\]' > ./errors-list.txt

Here is another example that pulls some quick stats from Apache, combining timegrep with some piping one-liners borrowed from: https://blog.nexcess.net/2011/01/21/one-liners-for-apache-log-files/

Run this command from /var/log/httpd on a CentOS system:


find . -type f -name '*.access.log' -exec /root/bin/timegrep.py -d 2017-07-31 --start-time=10:05:00 --end-time=10:06:00 '{}' \; | awk '{print $1}' | sort | uniq -c | sort -rn | head -20

This will go through all of the .access.log files in /var/log/httpd, pull every entry logged between 10:05:00 and 10:06:00, and print the 20 client IPs that made the most requests during that minute.

Basically, if you combine timegrep with the find command, you've got yourself some serious log parsing firepower.

Of course, if you've got this quantity of logs to parse through, a tool like Splunk is sometimes more appropriate.  However, since such tools are not always available, the technique above can get you out of a serious bind.
