Wednesday, 14 January 2015

Building a HATEOAS Hypermedia RESTful Record Store Web Service with Spring


  • This is a very simple example of developing a hypermedia-driven RESTful web service, using Spring HATEOAS
  • A companion project is available to download on github (using Java, Maven and Spring Boot)

HATEOAS? What is it?

  • HATEOAS ("Hypermedia as the Engine of Application State") is an approach to building RESTful web services, where the client can dynamically discover the actions available to it at runtime from the server. All the client should require to get started is an initial URI, and set of standardised media types. Once it has loaded the initial URI, all future application state transitions will be driven by the client selecting from choices provided by the server. 
  • The choices of available actions may be driven by state in time (e.g. different actions available to the user based on access control or stateful data through workflow or previous actions), or by available functionality (e.g. as new functionality becomes available - the client can hook into them with no design changes on its part)

Where did it come from?

  • HATEOAS constraint is an essential part of the "uniform interface" feature of REST, as defined in Roy Fielding's doctoral dissertation (Ray was one of the principal authors of the HTTP specification, and co-founder of the Apache HTTP server project)
  • In his view if an API is not being driven by hypertext, then it cannot be RESTful

The Record Store Spring Example

  • The entry point to our API is :
    • http://localhost:8080/albums/
    • This will basically list all the albums available at our store, along with Stock Levels

  • It provides links to any client of our api telling it the URLs it can use to:
  • View and Albums details:
    • http://localhost:8080/album/{id}

  • View details of the Artist:
    • http://localhost:8080/artist/{id}

  • Purchase a copy of the Album:
    • http://localhost:8080/album/purchase/{id}
    • Note: this action is only displayed next to an album when the stock level is > 0 (an example of how the server will control the transitions available to the client based on current application state)
    • Look at the example below - first call to view album '3' shows it is available to purchase:
      • http://localhost:8080/album/3

      • If you then purchase it, using http://localhost:8080/album/purchase/3, next time you view it you can see there is no longer a purchase option available

The Code

  • This is just a mock service that simulates a service/data access layer
  • It provides some methods to retrieve Album and Artist data, and also to purchase an Album (which for now will just reduce the stock level by 1)

  • Spring web controller for Artist operations

  • Spring web controller for Album operations
  • Note how in here a check is made on stock Levels in determining whether to allow the client to make a purchase


Building and running the example


  • You will require Maven installed and configured (see .... for more details on this)
  • The example project uses Spring Boot to run the example. This basically builds a JAR file and spins up an embedded Tomcat container using a very simple command (so no WAR file is built, and no deployment to an external server is needed). This is great for getting up and running quickly - all you need is Maven and a JDK.


Running with Eclipse STS

  • If you use Eclipse STS - you can just run the project as a Spring Boot App (just right-click the project in the package explorer and click 'Run As .. Spring Boot App'

Running with Maven

  • On the command line, navigate to the project folder root (containing pom.xml), and use 'mvn clean install spring-boot:run'

Either of these will start the Spring Boot deployment, and you should see a screen similar to this:

Changing the Embedded Tomcat Port

  • Just change the 'server.port' setting in

Further Reading


Wednesday, 28 May 2014

Querying XML CLOB Data Directly in Oracle

Oracle 9i introduced a new datatype - XMLType- specifically designed for handling XML naively in the database.

However, we may find we have a database that stores XML directly in a CLOB/NCLOB datatype (either because of legacy data, or because we are supporting multiple database platforms).

It can be useful to query this data directly using the XMLType functions available. This post just shows a very simple example of a query to do this.

(Note: there may be better ways to do this - but this worked for me when I needed an ad-hoc query quickly!)

Our Table

Assume we have a  table called USER, which has the following columns

Our embedded XML (example for Ringo)

Our Query

Say we want to look at account details in the XML - e.g. find which users all have the same sort code. A query like the following:

Will produce output like this. Obviously from here you can use all the standard SQL operators to filter, join etc further to get what you need:

Friday, 14 March 2014

Data Pumper - Migrating Data from MySQL to PostgreSQL

I recently made the decision to migrate from MySQL to PostgreSQL (the reasons why to be covered later). I thought I'd just jot down my notes on a tool I tried which helped with the migration - SQL Workbench/J Data Pumper (no sniggering at the back..).

Migrating the Database

My MySQL database was running on Amazon RDS. I first created a new PostgreSQL instance on Amazon RDS and created the basic Table structures (I'm using JPA/Hibernate in my app - so this was just a case of updating my JPA config and pointing it at the new PostgreSQL instance and let it create all the tables for me) - the benefits of ORM!

Migrating the Data

I then needed to migrate the data over. I didn't have a huge amount of data (in the region of a few hundred/low thousands across a number of tables).

I was initially going to just use mysqldump - possibly to a csv and then just import this into Postgres. However, I had a few issues getting the data out this time (I've used mysqldump before on a semi-regular basis - so not sure exactly what the problem was this time - though it was the first time I'd used it on RDS with MySQL 5.6). As I was moving away from MySQL - I wasn't particularly bothered about getting to the bottom of this - so changed tack and thought I'd give the SQL Workbench Data Pumper a whirl.

It worked really well - and is a great little tool for jobs like this (undoubtedly not the best tool of choice if you had large datasets and remote systems - but for small 'hacky' jobs like this it was ideal).

Just to outline below - this is what I did:


  • Download SQL Workbench/J   
    • I did this on linux - so it was simple extract and run 'chmod +x *.sh' to make the shell scripts executable
    • run 'sqlworkbench' to fire up the app
  • Download the JDBC Database Drivers
  • Configure SQL Workbench with connections to both your databases

  • Try connecting to a database just to confirm it all works as expected

  • Open the 'Data Pumper' (Tools .. Data Pumper)

  • Select Source connection in top left (MySQL in my case), and Target connection in top right (PostgreSQL). Select the Source and Target tables and field mappings, and then click the 'Start' button.

  • Bingo - data transferred! Simple and easy to use. I'd definitely use it again for small jobs like this - useful to have in the toolbox (and supports a wide range of database platforms).

Sunday, 4 August 2013

MongoDB in 3 Easy Steps!

If  you're entirely new to MongoDB - this is just a very gentle introduction that gets you up and running with a sample database in just a few easy steps.

It's really easy to get started. The steps are:

1. Install MongoDB
2. Create a Database
3. Query the Database

1. Install MongoDB

  • The mongoDB site has simple, clear instructions for downloading and installing. Just click on the link below and follow the instructions for your particular operating system (much clearer that trying to explain them again here!):
  • (Make sure you have the have started the 'mongod' server process as explained in these instructions - you should have a mongod server up and running before moving on to Step 2 - like in this screenshot below)

2. Create A Database

  • Now we have a server up and running - the next step is to create a database and pop some data in. For our example we have a very simple Library database, with a single 'book' table (or 'collection' in mongo-speak).
  • Instead of running SQL against mongo - you use Javascript - an example build file is included below (you could enter the commands in the interactive console window - shown later - but for speed and ease of use - you can also run them all together from a script file - which is the approach we use here)
  • Copy the javascript code from the GIST below, and save it to a file called 'library_build_script.js' build script on your machine

  • Open a new console window, navigate to the 'bin' folder in the extracted mongo installation files,  and use the mongo command to load the script - just type:
mongo SCRIPT_LOCATION/library_build_script.js'

  • (Note: if you set the PATH up on your machine, you can of course run the mongo executables from anywhere)
  • this will run the build script to create the database, collection and indexes, and prepopulate with some test data.
  • Note - we are using getSiblingDB() to create and work with a sibling database. This is slightly more complex than the default, but if you create multiple databases, this helps organise them better (as you build databases in a tree like structure). It keeps it easier to manage/view multiple databases on the same server. Have a play around with different options when you get up and running to see what works best for you.


3. Query the Data

  • It's as simple as that to have a populated database up and running on your machine - we can now use the interactive mongo command line tool to query the data
  • Start up the interactive mongo console. Still in the 'bin' folder, type

  • Find All Books Query - list all books in the database

  • Find All Books With Type 'Fantasy' Only

As you can see - it's really easy to get up and running in just a few minutes. Of course there's much more to cover - but this should be a good starting point to start playing.

I'll hopefully be expanding on the Library database in some future posts - exploring more advanced features and functions.


Addendum: Using A GUI Client

You can also use a GUI client to explore your local MongoDB database if you prefer that to using the command line.

Check out UMongo - you can download it for free from the website here:

Just follow the installation instructions for your OS, start it up and create a connection to your local mongod server.

You can then interact with your local mongo instance using the tools in UMongo

Thursday, 1 August 2013

JPA Searching Using Lucene - A Working Example with Spring and DBUnit

Working Example on Github


There's a small, self contained mavenised example project over on Github to accompany this post - check it out here:

Running the Demo

See the README file over on GitHub for details of running the demo. Essentially - it's just running the Unit Tests, with the usual maven build and test results output to the console - example below. This is the result of running the DBUnit test, which inserts Book data into the HSQL database using JPA, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those int he SCI-FI category, containing the word 'Space', and ensuring that any with 'Space' in the title appear before those with 'Space' only in the description.

The Book Entity

Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene:

@Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example.

In addition to this - you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - this is just telling Lucene to give more weight to search term matches that appear in this field (than the same term appearing in the description field).

This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now.

The Book Manager

The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are autowired in by Spring from the application-context.xml. We are just using an in-memory hsql database in this example.


This is the Spring configuration file. You can see in the JPA Entity Manager configuration the key for '' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file (as you may wish to change these for different environments), for example by updating the propertyConfigurer to look for and use a different external properties if it finds one on the file system).

Testing Using DBUnit

In the project is an example of using DBUnit with Spring to test adding and searching against the database using DBUnit to populate the database with test data, exercise the Book Manager search operations and then clean the database down. This is a great way to test database functionality and can be easily integrated into maven and continuous build environments.

Because DBUnit bypasses the standard JPA insertion calls - the data does not get automatically added to the Lucene index. We have a method exposed on the service interface to update the Full Text index 'updateFullTextIndex()' - calling this causes Lucene to update the index with the current data in the database. This can be useful when you are adding search to pre-populated databases to index the  existing content.

The source data for the test is defined in an xml file

Wednesday, 31 July 2013

502 Proxy Error Using CometD, Apache and Camel

We recently hit an issue during testing with Apache returning "502 Proxy Errors" to clients who were connecting via CometD (using Apache as a proxy server in front of a Camel/Jetty CometD server).

<title>502 Proxy Error</title>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/cometdServer/connect">POST&nbsp;/cometdServer/connect</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>

Our Setup

We are using a combination of CometD, Camel (in ActiveMQ) and Apache to broadcast messages from our server to subscribed browser clients.
  • CometD is a scalable HTTP-based event routing bus that uses an Ajax Push technology pattern known as Comet.
  • Comet is a web application model using long-held HTTP requests which allow the web server to ‘PUSH’ data to a browser, without the browser specifically requesting it.
  • We have Camel sitting on an ActiveMQ server. On this we have routes set up to process incoming JMS messages. These messages get processed, and then the output from this process (in the form of a JSON message) is sent to a CometD endpoint defined in the Camel route, for example:

  • This endpoint uses the Camel CometD module to run a Jetty instance on port 9099, which handles the CometD endpoint.
  • The browser clients connect to this endpoint via an Apache server. We do this to make use of Apaches mod_proxy module to act as a proxy to the CometD endpoint, e.g. in proxy.conf:
ProxyPass /cometdServer http://camelserver:9099/cometd

ProxyPassReverse /cometdServer http://camelserver:9099/cometd

  • This allows us to easily ‘PUSH’ messages out to subscribed browser clients to inform them of the results of processing taking place inside our Camel messaging routes (the kind of thing you would typically implement using JMS Topics in a java desktop/swing type solution)

Known Problems with Apache and CometD Long Polling

As a general rule, using Apache as a proxy in front of a CometD long-polling/Bayeux server is not a good idea.  This is due to Apaches ‘thread-per-request’ model – which introduces problems when scaling this solution.

This is described in more detail here:

The problem is caused by the Apache proxy server timing out before the Jetty\Camel server (using the default configuration). By default, Apache Timeouts are set to 2 minutes (as defined globally in httpd.conf).
Over on the Camel\CometD side, the default timeout is set to 4 minutes (240000 milliseconds)

Under certain conditions – if the CometD endpoint holds the connection for longer than 2 minutes, it causes a timeout on the Apache proxy, which then disconnects the CometD connection on the client (the length it holds the connection open seems to vary, but is typically much shorter than this)

Solution 1

Increase the timeout on the Apache Proxy server to be greater than 4 minutes. You can either do this globally, or set it specifically on each ProxyPass configuration (in proxy.conf)

ProxyPass /cometdServer http://camelserver:9099/cometd timeout=250

ProxyPassReverse /cometdServer http://camelserver:9099/cometd timeout=250

Solution 2

Decrease the timeout on the Camel server to be less than 2 minutes, e.g.


Useful Tools for Debugging

I found these tools invaluable in getting to the bottom of what was happening (and digging into the details of the CometD messages)

Chrome Developer Tools

Useful for monitoring the network traffic between the browser and the server – lets you easily see the details of each HTTP interaction (on the ‘Network’ tab). Of particular use here was seeing the Timings – you could see that the request which failed did so at a 2 minutes point

Apache Server-Status Module

Enabling apache mod_status was useful to confirm that the problem was not thread related (as mentioned in the known issues with Apache/CometD above)
This lets you view what the current worker threads are doing on Apache. Instructions on how to enable can be found here:

Friday, 24 May 2013

Analysing REST Web Services with Squid and AWStats


I've got a Java application running on Tomcat on Amazon EC2, which exposes RESTful web services delivering JSON to a variety of mobile apps and web sites.

I'm running a Squid server as a reverse proxy caching in front of the web services, to cache defined content at the HTTP layer.

(I'm also caching in the application layer using Ehcache and Spring - but adding the Squid cache in front of this lets me cache more content that just the JSON, and lets me take advantage of Squids proxy functions to manage back end services easily. It also takes more load off the Tomcat server - which is also running scheduled tasks to update the data in the system.)

This lets me run a pretty mixed environment, which suits my current purposes for delivering speedy content, removes load on my main app server and also doesn't tie me to closely to any particular technical implementation in the back end. There is a complexity overhead, but part of the reason for doing this is to play with the technologies - so thats all part of the fun!

At the moment, all this is running on a single EC2 Ubuntu server instance.

All HTTP traffic comes into the Squid server, running on Port 80. This caches certain html, image, json, php, etc content (as defined in /etc/squid3/squid.conf) - and if it can't find the request in the cache (or has been told not to cache it), will hand off to one of the upstream servers to service the request - Tomcat (on port 8080 - serving my Java web services, and admin App) or Apache (on port 9898 - serving my web sites, PHP/HTML).

So my basic setup is similar to this diagram:

The Problem - How to Analyse Web Service Traffic?

What I want to do is be able to track visitors across all web sites and mobile apps. I can track website visitors fine using Google Analytics. However, the mobile apps access the web services directly - so a client based tracker like Analytics cannot track these requests..

As all requests are routed through Squid - all access data sits in the squid 'access.log'. This is the only place that knows about all the traffic (as this will log all access, whether delivered from the cache, or from the backend servers)

Although Squid provides the cachemgr tool - this is more geared to monitoring cache access, than actual detail of what is delivered, where and to who (which is the kind of info I'm more interested in)


My Solution

Looking at the various log analysis tools available - I decided to give AWStats a try - it's free, seems well documented and commonly used

My goal was to get this AWStats set up on my Ubuntu box to read the Squid logs and provide some nice HTML output - updated on a regular basis so I could monitor usage through the day

So my plan was to end up with something like this:


  • Before I started - I already had these set up and running on my Ubuntu (version 12) box
    • Squid Server (running on port 80)
    • Apache (running on port 9898)


Step 1 : Install AWStats

The first step was just to install AWStats on Ubuntu using these apt commands (note: the last 2 are only needed if you want to track stats by country):

sudo aptitude install awstats 
sudo aptitude install libnet-ip-perl 
sudo aptitude install libgeo-ipfree-perl

Step 2 : Configure Squid Logging

The next step is to change the format of the Squid logs. By default, Squids access.log is principally designed to only log the kind of info useful for logging caching activity (what was requested, when, was it served from the cache..)

An example of this is below
1369117923.612     26 TCP_MISS/200 4204 GET http://.../rest/feeds/v3/feeditem/9932/ - FIRST_UP_PARENT/tomcat application/json

1369117947.802      0 TCP_MEM_HIT/200 12130 GET http://.../rest/feeds/v3/latest/WH/2000-01-01-00-00? - NONE/- application/json

1369118022.040    139 TCP_MISS/200 12929 GET http://.../rest/feeds/v3/search/JAVA/JSF - FIRST_UP_PARENT/tomcat application/json

1369118027.240      0 TCP_MEM_HIT/200 12130 GET http://.../rest/feeds/v3/latest/WH/2000-01-01-00-00? - NONE/- application/json

1369118041.264      0 TCP_MEM_HIT/200 12130 GET http:/.../rest/feeds/v3/latest/WH/2000-01-01-00-00? - NONE/- application/json

1369118062.362      0 TCP_MEM_HIT/200 12130 GET http://.../rest/feeds/v3/latest/WH/2000-01-01-00-00? - NONE/- application/json

1369118062.811      0 TCP_MEM_HIT/200 9397 GET http://.../rest/image/38 - NONE/- image/png

1369118062.830      0 TCP_MEM_HIT/200 6936 GET http://.../rest/image/44 - NONE/- image/png

For AWStats to analyse it, and provide more information - we need to change the logging format to a more Apache style log. Something like this: - - [23/May/2013:14:52:25 +0000] "GET http://.../rest/image/34 HTTP/1.1" 200 5043 "-" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-gb; U8815 Build/HuaweiU8815C02B895) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" TCP_MEM_HIT:NONE - - [23/May/2013:14:53:26 +0000] "GET http://.../rest/feeds/v3/jsonp/feeditem/20112? HTTP/1.1" 200 5277 "-" "Mozilla/5.0 (Linux; U; Android 2.2; en-gb; HTC Desire Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" TCP_MISS:FIRST_UP_PARENT - - [23/May/2013:14:53:39 +0000] "GET http://.../rest/feeds/v3/jsonp/feeditem/20109? HTTP/1.1" 200 7114 "-" "Mozilla/5.0 (Linux; U; Android 2.2; en-gb; HTC Desire Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" TCP_MISS:FIRST_UP_PARENT

To change the /var/log/squid3/acccess.log logging format - add the following lines to /etc/squid3/squid.conf
logformat combined %>a %ui %un [%{%d/%b/%Y:%H:%M:%S +0000}tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh

access_log /var/log/squid3/access.log combined

Stop Squid, backup up and remove the old /var/log/squid3/access.log and restart squid - now all logging from this point on will use this new file format - which AWStats will process
sudo stop squid3
sudo mv /var/log/squid3/access.log /var/log/squid3/access.log.backup
sudo start squid3


Step 3 : Configure AWStats

We need to tell AWStats which log file we want to use, and what the format is.

There are various ways you can configure your domain in AWStats. For simplicity here - I'm just assuming we have one domain and just use the 'out-the-box' awstats.conf configuration (I believe it is common practice to have different conf files for each domain - but we'll just use the default for now).

Find the following lines in '/etc/awstats/awstats.conf' - and change them to these settings (see more details of setting the LogFormat here):


Step 4 : Generating A Report

Now we have AWStats installed and configured, and Squid logging the correct format, we can generate an analysis report. Use the following command to do this:
sudo /usr/lib/cgi-bin/ -config=awstats.conf –update

This should give you aome output similar to:
sudo /usr/lib/cgi-bin/ -config=awstats.conf -update
Create/Update database for config "/etc/awstats/awstats.conf" by AWStats version 7.0 (build 1.971)
From data in log file "/var/log/squid3/access.log"...
Phase 1 : First bypass old records, searching new record...
Direct access after last parsed record (after line 412)
Jumped lines in file: 412
 Found 412 already parsed records.
Parsed lines in file: 53
 Found 25 dropped records,
 Found 0 comments,
 Found 0 blank records,
 Found 0 corrupted records,
 Found 0 old records,
 Found 28 new qualified records.


Step 5 : Configuring Apache/Squid To View Reports

Now we have AWStats installed, and Squid logging the correct format, the next step is to setup Apache and Squid to view the reports.

We want to view the reports on the URL: http://yourdomainname/statistics/

Because all our traffic goes through Squid - we just need to add some directives to '/etc/squid3/squid.conf' to redirect any urls going via '/statistics' directly to the Apache server. We also map '/awstats' urls to the apache server too - to catch the css and js files AWStats references.

# allow access to the - redirect to Apache
acl statisticsAcl url_regex -i (/statistics)
cache deny statisticsAcl
http_access allow statisticsAcl
cache_peer parent 9898 0 no-query originserver name=statisticsPeer
cache_peer_access statisticsPeer allow statisticsAcl

# allow access to the awstats css, js, etc - redirect to Apache
acl awstatsAcl url_regex -i (/awstats)
cache deny awstatsAcl
http_access allow awstatsAcl
cache_peer parent 9898 0 no-query originserver name=awstatsPeer
cache_peer_access awstatsPeer allow awstatsAcl

We then need to configure Apache to work with AWStats. We can easily do this by creating a new file '/etc/apache2/conf.d/statistics' (with the following content):
Alias /awstatsclasses "/usr/share/awstats/lib/"
Alias /awstats-icon/ "/usr/share/awstats/icon/"
Alias /awstatscss "/usr/share/doc/awstats/examples/css"
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
ScriptAlias /statistics/ /usr/lib/cgi-bin/
Options ExecCGI -MultiViews +SymLinksIfOwnerMatch

Last step is just to restart Apache and Squid
sudo /etc/init.d/apache2 restart
sudo stop squid3
sudo start squid3

You should then be able to access your stats on 'http://yourdomainname/statistics/':


Step 6 : Scheduling as a Cron Job

If you want to schedule the stats to be updated (say, every 30 minutes) - just add this to '/etc/crontab'
*/30 * * * * root /usr/lib/cgi-bin/ -config=awstats.config -update > /dev/null

Step 7 : Option - Generate Static Web Pages

So far - everything we do calls the perl script which re-generates the same web pages dynamically with the latest info.

If you want to - you can also generate static web pages. For example - I want to generate static pages - a different set each day (with the year, day and month in the HTML page title).

You can do this with the following command:

sudo /usr/share/awstats/tools/ -awstatsprog=/usr/lib/cgi-bin/ -config=awstats.conf -dir=/var/www/awstats-report -builddate=%YYYY-%MM-%DD

This will create a set of pages in '/var/www/awstats-report/' with the format 'awstats.awstats.conf.130524.html' (note: you will have to create the /var/www/awstats-report/ directory with the correct permissions.
You will have to add a new set of rules to '/etc/squid3/squid.conf' to:

# allow access to the awstats static reports folder - redirect to Apache
acl awstatsReportAcl url_regex -i (/awstats-report)
cache deny awstatsReportAcl
http_access allow awstatsReportAcl
cache_peer parent 9898 0 no-query originserver name=awstatsReportPeer
cache_peer_access awstatsReportPeer allow awstatsReportAcl

Similarly - you can schedule this to run on a regular basis. For example, to run at 55 minutes past the hour, every hour - add this line to '/etc/crontab'
55 * * * * root /usr/share/awstats/tools/ -awstatsprog=/usr/lib/cgi-bin/ -config=awstats.conf -dir=/var/www/awstats-report -builddate=%YYYY-%MM-%DD

You can then access the static pages on the url: 'yourdomainname/awstats-report/awstats.awstats.conf.130523.html' 


Take all the above with a pinch of salt - I'm sure there are better ways to do some of it, it was learning experience for me - and at the end, delivered what I was after.

I think Squid is a great product, especially for a developer like myself who wants to play with different tech and swap it in/and out of my environment. It's proxying abilities allow you to swap implementatations and server locations with relative ease.

And in it's main role - as a reverse proxy web acceleration server - it's been performing excellently, speeding up web service and image serving with never a grumble. Now I have AWStats up and running - I can drill down and get a lot more information about what's going on under the covers!