Monitor changed files on Linux using find command and XML+XSLT
Some time ago I decided to write my own monitor for changed files on my Linux server. I chose XML files for storage, with Bash scripts run from cron doing the monitoring.
The Bash script findnewfiles.sh, which generates an XML file listing the files changed during the last day, looks like this:
#!/bin/bash
echo '<?xml version="1.0" encoding="utf-8"?>'
echo "<?xml-stylesheet type='text/xsl' href='template.xsl'?>"
echo '<files>'
find /var/www/vhosts/ -mtime -1 -print | /var/www/newfiles/findfilter.pl
echo '</files>'
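The `-mtime -1` test selects entries modified less than 24 hours ago. A minimal sketch of that behaviour on a throwaway directory follows; `recent_files` is a hypothetical helper name, the `-type f` test is an addition that makes find skip directories by itself, and `touch -d` assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch: list regular files modified within the last 24 hours.
# recent_files is a hypothetical helper, not part of the original scripts.
recent_files() {
    find "$1" -type f -mtime -1 -print
}

# Demo on a temporary directory (touch -d requires GNU touch).
demo_dir=$(mktemp -d)
touch "$demo_dir/fresh.txt"                  # modified just now
touch -d '3 days ago' "$demo_dir/stale.txt"  # modified three days ago
recent_files "$demo_dir"                     # prints only the path of fresh.txt
rm -rf "$demo_dir"
```

Only fresh.txt is listed, because stale.txt is older than one 24-hour period.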
In addition, I used a filter, findfilter.pl, to exclude log files, directories, and so on.
#!/usr/local/bin/perl -w
use strict;
use warnings;
use POSIX qw(locale_h strftime);
use File::stat;

while (my $filename = <>) {
    chomp($filename);
    # Skip statistics, log and template cache paths, and directories
    if (length($filename)
        && $filename !~ m/webstat(\-ssl)?/
        && $filename !~ m#/statistics/webstat(\-ssl)?/#
        && $filename !~ m#/statistics/ftpstat/#
        && $filename !~ m#/statistics/logs#
        && $filename !~ m#/templates_c$#
        && $filename !~ m#/statistics/(anon_)?ftpstat#
        && ! (-d $filename)
    ) {
        # The file may have disappeared since find listed it
        my $sb = stat($filename) or next;
        # Element names match the ones selected in template.xsl
        print "\t", '<file>', "\n";
        print "\t\t", "<name>", $filename, "</name>\n";
        print "\t\t", "<mtime>", strftime("%a %b %e %H:%M:%S %Y", localtime $sb->mtime), "</mtime>\n";
        print "\t\t", "<owner>", (getpwuid($sb->uid))[0], "</owner>\n";
        print "\t\t", "<group>", (getgrgid($sb->gid))[0], "</group>\n";
        print "\t\t", "<size>", $sb->size, "</size>\n";
        print "\t\t", "<mode>", sprintf("%04o", $sb->mode & 07777), "</mode>\n";
        print "\t", '</file>', "\n";
    }
}
1;
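The filter writes each file name into the XML as-is, so a name containing `&`, `<` or `>` would make the report malformed. As a hedged illustration of the escaping that would avoid this, here is a small shell sketch; `xml_escape` is a hypothetical helper and is not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: escape the characters that are special in XML text.
# The ampersand is replaced first so that the entities produced by the
# later substitutions are not escaped a second time.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

xml_escape 'report & <draft>.txt'   # prints: report &amp; &lt;draft&gt;.txt
echo
```

The same three substitutions could be applied to `$filename` in the Perl filter before it is printed.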
The Bash script startfindnewfiles.sh is the one that cron runs:
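#!/bin/bash
# Stop if the working directory is missing, so no report is written elsewhere
cd /var/www/newfiles/ || exit 1
dd=$(date "+%Y-%m-%d")
./findnewfiles.sh | gzip > "$dd.xml.gz"
# cron mails this output, so the message contains a link to the new report
echo "http://yourserver.com/newfiles/?date=$dd"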
Note: I used gzip compression to save disk space.
To format the output I used an XSL template. For better usability I added the jQuery Tablesorter plugin, which lets you sort the table by clicking a column header. The XSL template source:
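Because the reports are stored compressed, it can be handy to verify one from the command line. The following is a minimal sketch, assuming only the standard gzip tool; `check_report` is a hypothetical helper name and the sample report is written to a temporary directory.

```shell
#!/bin/bash
# Hypothetical helper: test the archive's integrity, then print its first line.
check_report() {
    gzip -t "$1" && gzip -dc "$1" | head -n 1
}

# Demo: write a tiny report the same way the cron script does, then check it.
work_dir=$(mktemp -d)
report="$work_dir/$(date '+%Y-%m-%d').xml.gz"
printf '%s\n' '<?xml version="1.0" encoding="utf-8"?>' '<files>' '</files>' | gzip > "$report"
check_report "$report"   # prints the XML declaration line
rm -rf "$work_dir"
```

`gzip -t` exits with a non-zero status for a truncated or corrupt file, so the helper prints nothing in that case.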
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <link rel="stylesheet" type="text/css" href="themes/style.css" media="screen"/>
  <link rel="stylesheet" type="text/css" href="styles.css" media="screen"/>
</head>
<body>
  <div align="center"><a href="index.php">back</a></div>
  <table id="myTable">
    <thead>
      <tr>
        <th>Filename</th>
        <th>Modify time</th>
        <th>Owner</th>
        <th>Size</th>
        <th>Rights</th>
        <th>Group</th>
      </tr>
    </thead>
    <tbody>
      <xsl:for-each select="files/file">
        <tr>
          <td><xsl:value-of select="name"/></td>
          <td nowrap="nowrap"><xsl:value-of select="mtime"/></td>
          <td nowrap="nowrap"><xsl:value-of select="owner"/></td>
          <td nowrap="nowrap"><xsl:value-of select="size"/></td>
          <td nowrap="nowrap"><xsl:value-of select="mode"/></td>
          <td nowrap="nowrap"><xsl:value-of select="group"/></td>
        </tr>
      </xsl:for-each>
    </tbody>
  </table>
  <div align="center"><a href="index.php">back</a></div>
  <script type="text/javascript" language="javascript" src="js/jquery.js" />
  <script type="text/javascript" language="javascript" src="js/jquery.tablesorter.js" />
  <script type="text/javascript">
    <xsl:comment>
      $(document).ready(function() {
        $("#myTable").tablesorter({ sortList:[[1,1],[0,0]] });
      });
    </xsl:comment>
  </script>
</body>
</html>
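The template uses the simplified stylesheet form, where a literal `<html>` element carrying `xsl:version` acts as the whole stylesheet. To try that form outside PHP, here is a rough sketch that assumes the xsltproc tool from libxslt is installed; `render_demo`, the sample file names, and the sample path are made up for illustration.

```shell
#!/bin/bash
# Sketch: apply a cut-down simplified stylesheet to a sample report.
# Assumes xsltproc (libxslt) is available; all file names are hypothetical.
render_demo() {
    local dir
    dir=$(mktemp -d)
    cat > "$dir/sample.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<files>
  <file><name>/var/www/vhosts/example/index.html</name><size>120</size></file>
</files>
EOF
    cat > "$dir/mini.xsl" <<'EOF'
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <table>
      <xsl:for-each select="files/file">
        <tr>
          <td><xsl:value-of select="name"/></td>
          <td><xsl:value-of select="size"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </body>
</html>
EOF
    # Print the transformed HTML to standard output
    xsltproc "$dir/mini.xsl" "$dir/sample.xml"
    rm -rf "$dir"
}

if command -v xsltproc >/dev/null 2>&1; then
    render_demo
else
    echo "xsltproc not found; skipping demo" >&2
fi
```

Each `file` element becomes one table row, which is the same mapping the full template performs with more columns.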
And finally, the index.php script code:
<?php
// gzdecode() is built in since PHP 5.4, and redeclaring it is a fatal error,
// so this fallback is defined only when the function is missing. A conditional
// function must be declared before its first call, which is why it comes first.
if (!function_exists('gzdecode')) {
    function gzdecode($data)
    {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data, 0, 2), "\x1f\x8b")) {
            return null; // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data, 2, 1)); // Compression method
        $flags  = ord(substr($data, 3, 1)); // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data, 4, 4));
        $mtime = $mtime[1];
        $xfl = substr($data, 8, 1);
        $os  = substr($data, 9, 1);
        $headerlen = 10;
        $extralen = 0;
        $extra = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $extralen = unpack("v", substr($data, $headerlen, 2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                return false; // Invalid format
            }
            $extra = substr($data, $headerlen + 2, $extralen);
            $headerlen += 2 + $extralen;
        }
        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $filenamelen = strpos(substr($data, $headerlen), chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false; // Invalid format
            }
            $filename = substr($data, $headerlen, $filenamelen);
            $headerlen += $filenamelen + 1;
        }
        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false; // Invalid format
            }
            $commentlen = strpos(substr($data, $headerlen), chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false; // Invalid header format
            }
            $comment = substr($data, $headerlen, $commentlen);
            $headerlen += $commentlen + 1;
        }
        $headercrc = "";
        if ($flags & 2) {
            // FHCRC flag: 2 bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false; // Invalid format
            }
            $calccrc = crc32(substr($data, 0, $headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data, $headerlen, 2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                return false; // Bad header CRC
            }
            $headerlen += 2;
        }
        // GZIP FOOTER - these may be negative due to PHP's integer limitations
        $datacrc = unpack("V", substr($data, -8, 4));
        $datacrc = $datacrc[1];
        $isize = unpack("V", substr($data, -4));
        $isize = $isize[1];
        // Perform the decompression:
        $bodylen = $len - $headerlen - 8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            return null;
        }
        if ($method != 8) {
            return false; // Deflate (8) is the only supported compression method
        }
        $body = substr($data, $headerlen, $bodylen);
        $data = gzinflate($body);
        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        // PHP's integer limitations affect strlen() since $isize
        // may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format! Length or CRC doesn't match!
            return false;
        }
        return $data;
    }
}

header('Content-type: text/html; charset=utf-8');
ob_start();

if (!isset($_GET['date']) && !isset($argv[1])) {
    // No date given: list the available reports, newest first
    echo '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Modified files</title>
<link rel="stylesheet" type="text/css" href="styles.css">
</head>
<body><div align="center">';
    if ($handle = opendir(getcwd())) {
        $files = array();
        while (false !== ($file = readdir($handle))) {
            if (preg_match('/^\d{4}\-\d{2}\-\d{2}\.xml\.gz$/', $file)) {
                $files[] = $file;
            }
        }
        closedir($handle);
        natcasesort($files);
        $files = array_reverse($files);
        foreach ($files as $file) {
            $d = preg_replace('/\.xml\.gz$/', '', $file);
            printf('<a href="?date=%s">%s</a><br>', $d, dateToSovok($d));
        }
    }
    echo '</div></body></html>';
} else {
    $date = isset($_GET['date']) ? $_GET['date'] : $argv[1];
    // Accept only YYYY-MM-DD so the parameter cannot point outside this directory
    if (!preg_match('/^\d{4}-\d{2}-\d{2}$/', $date) || !is_file($date . '.xml.gz')) {
        header('Location: index.php');
        die;
    }
    // Load the XML source
    $xml = new DOMDocument;
    $xml->loadXML(gzdecode(file_get_contents($date . '.xml.gz')));
    $xsl = new DOMDocument;
    $xsl->load('template.xsl');
    // Configure the transformer
    $proc = new XSLTProcessor;
    $proc->importStyleSheet($xsl); // attach the xsl rules
    $doc = $proc->transformToDoc($xml);
    echo $doc->saveHTML();
}

// Convert YYYY-MM-DD to DD.MM.YYYY
function dateToSovok($dt)
{
    $pos1 = strpos($dt, '-');
    $pos2 = strrpos($dt, '-');
    $year = substr($dt, 0, $pos1);
    $month = substr($dt, $pos1 + 1, $pos2 - $pos1 - 1);
    $day = substr($dt, $pos2 + 1, strlen($dt));
    return ($day . "." . $month . "." . $year);
}

// Send the buffered page, gzip-compressed when the client accepts it
$content = ob_get_clean();
if (function_exists('gzencode') && ($encoding = checkCanGzip())) {
    header("Content-Encoding: " . $encoding);
    echo gzencode($content . '<!-- gzencoded -->', 6);
} else {
    echo $content . '<!-- without compression -->';
}

/* ------------------------------------------------------------ */
// Return the gzip encoding name the client accepts, or 0 if none
function checkCanGzip()
{
    if (!isset($_SERVER['HTTP_ACCEPT_ENCODING'])) return 0;
    if (strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'x-gzip') !== false) return "x-gzip";
    if (strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false) return "gzip";
    return 0;
}
?>
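The gzdecode() function starts by checking that the data begins with the two gzip magic bytes `1f 8b` defined in RFC 1952. The same check can be sketched in shell for a quick look at a report file; `is_gzip` is a hypothetical helper name, and the demo files are created in a temporary directory.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Demo with one compressed and one plain file.
tmp_dir=$(mktemp -d)
echo '<files/>' | gzip > "$tmp_dir/report.xml.gz"
echo '<files/>' > "$tmp_dir/report.xml"
is_gzip "$tmp_dir/report.xml.gz" && echo "report.xml.gz: gzip data"
is_gzip "$tmp_dir/report.xml" || echo "report.xml: not gzip data"
rm -rf "$tmp_dir"
```

This only inspects the first two bytes; it does not validate the rest of the header or the checksum the way the PHP function does.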
Download all source code: Monitor changed files on Linux using find command and XML+XSLT (28 KB)