Page Watcher

Description

The page watcher utility is a simple perl script that will, when used in conjunction with a scheduling program such as cron, notify when a page that is being watch has been changed.

The script is unique in a couple respects. The first is that it uses a configuration file to identify the pages to be watched. This configuration file allows for default values to be established and to permit overrides of the defaults on a page by page basis. For example. If you want to get notified at work for most updates you can set the default to notify your work email address. But if there is a page that is of interest to someone else, you can override the email notification address for that particular page.

The program is also unique in that it can watch only the links on a page for change instead of the whole page. This is very useful for sites that contain both dynamic content and download links. If you are only interested in the fact that there is a new release of your favorite software and not that the author of the page as updated some text on the page, this feature is very handy.

Another problem with comparing web pages is that often time javascript and the like will change and you probably do not care about that. The "htmlonly" watch type will ignore all of the non html only components of a web page during the comparison.

Some pages use hit counters or will have hidden input tags that are populated on the back end and have a different value on each evocation of the page. This causes the script to register and report a change every time it runs. You can get past this by using the ignore-regex option, specifying a pattern that should be ignored when the compare is done.

Installation

There is no fancy install script here. Just untar the archive and move the page-watch.pl to wherever you keep your local binaries; This is usually either ~/bin or /usr/local/bin. Next move the page-watch.cf file to ~/.page-watch.cf. It is now installed. Not very useful but it is installed.

Now you need to edit the ~/.page-watch.cf file. The file is annotated internally on format and configuration options. It is very straight forward. You are close to being done. The last thing to do is to test it manually by simply running the command. Once it works like you want schedule it in your favorite scheduling program like cron or any of its cousins.

Dependencies

The script makes use of two perl modules that are not installed by default in any distribution that I know of. They are:

WWW::Mechanize;
HTML::Scrubber;
Mail::Sendmail;
You can install them using whatever your favorite way of installing perl modules is. I use:
# perl -MCPAN -e shell
cpan> install WWW::Mechanize
cpan> install HTML::Scrubber
cpan> install Mail::Sendmail

Administrivia

Download Click Here to go to the sourceforge download page.
Support This Project While I am not in this for the money, if you wish to show some appreciation you are welcome to do so. Half of all donations go to support other open source projects.
SourceForge.net Logo Thank you!