Download Counter
- Version
0.9.0
Contents
Getting Started
Installation
Before you start
To avoid problems, read this page in full before running the app on a live server.
Dependencies
Download Counter requires Python3. Tested with Python 3.8.10 - slightly earlier versions may work, but are untested.
Permissions
dlcounter.py may either be run by passing the command to python:
$ python3 dlcounter.py <args>
or by making dlcounter.py executable, then run by entering the command path/name and (optional) argments:
$ sudo chmod +x dlcounter.py
$ dlcounter.py <args>
Reading server access logs requires root / admin access. Run dlcounter.py as root / admin. For example, to initialise the database on Linux:
$ sudo ./dlcounter.py -i /var/log/nginx/access.log -v
Installing
To use Download Counter, place dlcounter.py, dlcounter_html.py and dlcounter.cfg into a suitable directory outside of your website. For example, they could be placed in a folder in your home directory:
$ mkdir ~/download_counter
$ mv dlcounter.py ~/download_counter
$ mv dlcounter_html.py ~/download_counter
$ mv dlcounter.cfg ~/download_counter
$ sudo chmod +x ~/download_counter/dlcounter.py
Alternatively, if Download Counter is obtained as an archive file, simply extract the entire package to a convenient location outside of you website. Remember to set file execute permission for dlcounter.py if required.
Files
dlcounter.py is the main app.
dlcounter.cfg contains the configuration settings. (This file must be customised before use.)
dlcounter_html.py provides an HTML template for html output.
downloads.db (created when Download Counter is initialised) is the database that stores the data.
Configuration
Configuration file
The app has a configuration file “dlcounter.cfg” that may be edited with any plain text editor. For example, to edit with nano :
$ cd ~/download_counter
$ nano dlcounter.cfg
Sections
- [ACCESSLOGS]
One or more server logs to analyse.
- [FILEPATH]
The first part of the name of file(s) to be counted from the access log.
- [FILENAMES]
The final part of the name of file(s) to be counted from the access log. Typically this will be one or more file extensions.
- [WEBPAGE]
The HTML file to display download totals. Typically this will be within the website html directory.
- [DATETIME]
Datetime formats for reading access logs and writing HTML.
- datetime_read
Format for reading access logs.
- datetime_write
Format for writing html webpage.
Further details can be found in the Customisation section.
Command Line
The following command line switches are provided:
-d, --docsShow built-in documentation and exit.-h, --helpShow short help and exit.-i, --init’path/to/access.logs’*Initialise database. (See: “Initialising”).-v, --verboseVerbose output.Prints arguments, options, and the database contents to stdout.Along with -D (--debug) this can be useful to check theconfiguration and for debugging.-D, --debugPrints debug information to stdout.Along with -v (--verbose) this can be useful to check theconfiguration and for debugging.-V, --versionShow version and exit.
Initialising
Download Counter is initialised by running dlcounter.py as root/admin with the -i (--init) switch, and the path to to the access logs. It is a good idea to run this manually from the command line with the -i (--init), -D (--debug), and -v (--verbose) options. Note that the path must include the base filename.
All access logs in ‘path/to/access.logs’, (including .gz files), are read. Matching download files are counted regardless of when they were downloaded. This option overrides [ACCESSLOGS] in dlcounter.cfg and should only be used on first run. If the database already exists, the downloads table will be deleted and recreated.
Example
python dlcounter.py -i '/var/log/nginx/access.log'
This example will read all logs:
/var/log/nginx/access.log
/var/log/nginx/access.log.1
/var/log/nginx/access.log.2.gz
/var/log/nginx/access.log.3.gz
…
It does not attempt to read other logs such as error.log
.
Running the App
After initialising the database, and setting up the configuration options, dlcounter.py may be run at any time to update the database and html output. To ensure that all downloads are caught, dlcounter.py should be run immediately prior to log rotation. (Alternatively, logs from the current and previous day could be analysed.)
Tip:
Run the program with the -v (--verbose) and -D (--debug) switches, and redirect output to a text file to check that it is running as expected.
Log Rotation
By default, Ubuntu uses logrotate to rotate logs once per day. Ideally, download counter should be run immediately before the logs are rotated so that all records for the previous 24 hours are analyzed.
How it works
This example describes the default setup for Ubuntu / Nginx.
The the script /etc/cron.daily/logrotate
runs daily and executes
/usr/sbin/logrotate /etc/logrotate.conf
The logrotate.conf file contains the global default settings for logrotate
and includes additional configuration files in /etc/logrotate.d
.
Among the files in /etc/logrotate.d
is the logrotate configuration for
nginx:
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi \
endscript
postrotate
invoke-rc.d nginx rotate >/dev/null 2>&1
endscript
}
The line compress
specifies that the log files are compressed, but
delaycompress
delays the compression of the most recent log until the
next rotation cycle.
The section between prerotate
and endscript
checks for the existence of
the directory /etc/logrotate.d/httpd-prerotate
. If it exists, then
executable scripts within that directory are run before the logs are rotated.
Thus a script can be scheduled to run immediately before rotation by placing
it in the folder /etc/logrotate.d/httpd-prerotate
.
Note
By default, run-parts
requires that script names must consist entirely of
ASCII upper- and lower-case letters, ASCII digits, ASCII underscores, and
ASCII minus-hyphens. In particular, dots are not allowed, so file
extensions must not be used.
Example
If /etc/logrotate.d/httpd-prerotate
does not exist, create it:
$ sudo mkdir /etc/logrotate.d/httpd-prerotate
Create a bash script to run dlcounter.py. Initially we will run with verbose (-v) and debug (-D) options, and redirect the output to a file to check that it is running as expected
$ sudo nano /etc/logrotate.d/httpd-prerotate/dlcounter
Example script:
#!/bin/bash
output="/home/<username>/dlcount.txt"
date +"%Y-%m-%d %H:%M:%S.%N %z" > $output
echo -e "-----------------\n" >> $output
/home/<username>/dlcounter/dlcounter.py -v -D >> $output
And make the script executable:
sudo chmod +x /etc/logrotate.d/httpd-prerotate/dlcounter
Customisation
Before use, the dlcounter.cfg file must be customised to suit your server setup. This file must be in the same directory as dlcounter.py.
Below are descriptions of the sections of the config file, and suggested settings for common setups based on a WordPress installation running on Nginx and Ubuntu Linux.
[ACCESSLOGS]
Specify the log file(s) to analyse.
Typically, for a WordPress site on Nginx, this will be:
/var/log/nginx/access.log
, which contains data since the last log
rotation. Provided that downloadcounter runs immediately before log rotation it
is sufficient to analyze just this one log file once per day. See the
“Log Rotation” section for more information about how to
use downloadcounter with logrotate.
Example
[ACCESSLOGS]
log1 = /var/log/nginx/access.log
Downloads are only counted if their time stamp is more recent than any downloads already counted. If it is necessary to analyze aditional (older) log files, then the files must be listed oldest first. All log files listed here must be plain text files.
Example
[ACCESSLOGS]
log1 = /var/log/nginx/access.log.1
log2 = /var/log/nginx/access.log
[FILEPATH]
The first part of the search string to identify the downloads in the log files.
This version of Download Counter supports only one FILEPATH parameter. Typically the downloads to be counted will have a common file path, which for WordPress sites is in the form:
.../wp-content/uploads/*<year>*/*<month>*/*<filename>*
As “/wp-content/uploads/” is common to all download files, this is used as the first part of the search string when finding downloaded files.
Example
[FILEPATH]
path = /wp-content/uploads/
[FILENAMES]
The last part of the file(s) to search for in the log files. Typically this will be a list of file extensions.
Example
[FILENAMES]
file1 = .zip
file2 = .exe
[WEBPAGE]
The location for html output.
Download Counter generates a web page for viewing the download totals. Typically this will be located within the public html directory of your website. This must be a fully qualified path. If omitted, no html will be generated.
Example
[WEBPAGE]
path = /var/www/html/downloads.html
[DATETIME]
Datetime formats for reading access logs and writing the html webpage. This section has two settings, both of which are required:
datetime_read
Format for reading access logs.
- Default:
%d/%b/%Y:%H:%M:%S %z
The default matches: “01/Jan/2022:23:35:05 +0000”
datetime_write
Format for writing html webpage.
- Default:
%a %d %b %H:%M
The default matches: “Mon 01 Jan 18:35”
API
Download counter
Searches access.log files for successful downloads that match a specified search string. Matching downloads are tallied in an SQLite database, and results output to an html file.
Usage
The main functional parameters are set in dlcounter.cfg. Additional options may be set through command line arguments.
Command line switches
Usage: dlcounter.py [-d] [-h] [-i] [-v] [-D] [-V]
Arguments:
- -d, --docs
Show this documentation and exit.
- -h, --help
Show short help and exit.
- -i, --init string
This option is required if you wish to count downloads in old archived ‘.gz’ files.
All access logs in path, (including .gz files), are read. Matching download files are counted regardless of when they were downloaded, so this option should only be used on first run, (before the database contains data). This option overrides ACCESSLOGS in dlcounter.cfg.
Example
The path string should be entered in the form:
$ python3 dlcounter.py -n '/var/log/nginx/access.log'
To read all logs:
* /var/log/nginx/access.log
* /var/log/nginx/access.log.1
* /var/log/nginx/access.log.2.gz
* /var/log/nginx/access.log.3.gz
* ...
- -v, --verbose
Print commands and database contents to stdout.
- -D, --debug
Print additional debug strings to stdout.
- -V, --version
Show program version and exit.
Configuration file
The configuration file (‘dlcounter.cfg’) must be in the same directory as ‘download_counter.py’.
[ACCESSLOGS] One or more access logs.
Log files must be plain text (not .gz archives). When more than one access.log files specified, files must be in reverse chronological order (process oldest first).
- Default:
log1 = /var/log/nginx/access.log
[FILEPATH] The first part of the download file’s string.
This refers to the string as it appears in the access log. If not supplied, all file names matching the [FILENAMES] option(s) will be counted.
- Default:
path = /wp-content/uploads/
The default option will catch files in any of:
…/website/downloads/2001/
…/website/downloads/2002/
…/website/downloads/…/
[FILENAMES] Download files end of string.
The default options will catch .zip and .exe files that begin with ‘FILEPATH’.
- Default:
file1 = .zip
file2 = .exe
[WEBPAGE] Fully qualified path for html output.
HTML output is disabled if this path is not specified.
- Default:
path = /var/www/html/downloads.html
[DATETIME] Datetime formats for reading access logs and writing HTML.
datetime_read
Format for reading access logs. The default matches: 01/Jan/2022:23:35:05 +0000
- Default:
%d/%b/%Y:%H:%M:%S %z
datetime_write
Format for writing html webpage. The default matches: Mon 01 Jan 18:35
- Default:
%a %d %b %H:%M
Note:
[FILEPATH] and [FILENAMES] options are just strings to search for in the accesslog file(s). Regex is used to search the log file(s) for: “<path-string> any-characters <file-string>”
- dlcounter.check_path(name, file)
Check if file exists.
- Parameters
name (string) – File identifier / file name.
file (string) – File path.
- Returns
True or exit.
- Return type
bool
- dlcounter.db_path()
Path to database file.
- Parameters
None –
- Returns
Fully qualified path to SQLite database file.
- Return type
string
- dlcounter.first_item_in_section(cfg, section)
Return the first item from cfg section.
- Parameters
cfg (configparser.ConfigParser) – The ConfigParser object.
section (string) – Section key.
- Returns
First value from section, or empty string.
- Return type
string
- dlcounter.format_datetime_output(dt_string)
Format dt_string as required for html output.
- Parameters
dt_string (datetime) – datetime object.
- Returns
Reformatted datetime string.
- Return type
string
- dlcounter.get_config()
Return dict of arguments from dlcounter.cfg.
List values are retrieved by
list_section()
. Single values retrieved byfirst_item_in_section()
. Also print parameters from command line and config file when –verbose command line argument is passed.- Parameters
None –
- Returns
Values from configuration file.
- Return type
dict
- dlcounter.get_db_time(con)
Return most recent timestamp from database.
- Parameters
con (connection) – Connection to the database.
- Returns
datetime.min if timestamp not found.
- Return type
datetime
- dlcounter.get_record(record, pattern)
Return (filename, time) from line when start and end of name are found, or None if not found.
- Parameters
record (string) – One line from access log.
pattern (string) – Regex pattern: f’GET {re.escape(start)}.*{re.escape(end)}’
- Returns
(Short filename string, datetime object) if successful,or None.
- Return type
tuple
- dlcounter.get_time(record)
Return timestamp from record.
- Parameters
record (string) –
- Returns
Exit if timestamp not found.
- Return type
datetime
- dlcounter.init_db(logpath, opt)
Initialise database.
Similar to main() but reads all logs that start with ‘logpath’ and does NOT check timestamp before counting. If the database already exists, the old table will be deleted and a new table created.
- Parameters
logpath (string) – Path to access logs.
opt (dict) – Parameters from dlcounter.cfg.
- Return type
None
- dlcounter.list_section(cfg, section)
Return list of values from cfg section.
- Parameters
cfg (configparser.ConfigParser) – The ConfigParser object.
section (string) – Section key.
- Returns
List of zero or more values.
- Return type
list
- dlcounter.log_to_sql(con, file, searchstring, timecheck=None)
Copy download data from log file to database.
Read one log file and update database. Updating is handled by
update_db()
.- Parameters
con (connection) – Connection to the database.
file (_io.TextIOWrapper) – Pointer to access log.
searchstring (string) – Regex pattern for filename in log.
timecheck (datetime) – Initialise if timecheck=None.
- Returns
True when database has been modified.
- Return type
bool
- dlcounter.main(opt)
Count downloads from access logs.
Search for file names in the access logs that match the search criteria, and update the database as necessary. The database is created automatically if it does not exist. Downloads with timestamps older than the last update are ignored.
- Parameters
opt (dict) – Contains string values: acclogs, searchstring, and html_out.
- Return type
None
- dlcounter.print_table()
Print contents of database to stdout.
This function is used only with –verbose option.
- Parameters
None –
- Return type
None
- dlcounter.sql_table(con)
Create ‘downloads’ table if it doesn’t exist.
- Parameters
con (connection) – Connection to the database.
- Return type
None
- dlcounter.time_format(readf='', writef='')
Return the date-time format
Values from config for reading access logs and writing html. Call either time_format.read or time_format.write.
- Parameters
readf (string, default '') – Time format for reading access logs.
writef (string, default '') – Time format for writing html.
- dlcounter.read
- Type
string
- dlcounter.write
- Type
string
- Return type
None
- dlcounter.update_db(con, fname, timestamp)
Update database.
If fname exists in database, update its download total and timestamp, else insert it into the database with a count of 1.
- Parameters
con (connection) – Connection to the database.
fname (string) – Name of the downloaded file.
timestamp (datetime) – Timestamp of download.
- Return type
None
- dlcounter.write_html(con, htmlfile)
Write sql data to web page.
- Parameters
con (connection) – Connection to the database.
htmlfile (string) – Path to html output file.
- Return type
None
Web Page Template
HTML code for generating web page output.
dlcounter.py creates table data between html_top()
and html_bottom()
. This module provides the rest of the HTML for the
download counter web page.
This file may be modified according to need.
- dlcounter_html.html_bottom()
Return end of html page.
- Returns
HTML output.
- Return type
string
- dlcounter_html.html_top(timestamp)
Return beginning of html page.
- Returns
HTML output.
- Return type
string
Download Counter
A standalone Python app to keep a tally of downloads from a website.
This app was developed for use with WordPress / NGINX / Ubuntu, but will probably work on other platforms with minimal alterations. Patches to support other common platforms are welcome.
Rationale
While there are several WordPress modules for managing downloads, in 2022 it seems that they are all large, complex modules aimed at e-commerce. From a security perspective, it is advantageous to avoid presenting a larger attack target than absolutely necessary, hence this small download counter utility.
What it does:
This app scans the server access logs and searches for downloaded files that match the given search criteria. When found, they are logged in an SQLite database, along with the download timestamp and a count of how many times the file has been downloaded.
Finally, the app generates an HTML page to display the database contents.
How it Works
Nginx logs all access traffic in its ‘access.log’. On a daily schedule the logs are rotated by default (Ubuntu / Nginx):
access.log -> access.log.1
access.log.1 -> access.log.2.gz
…
access.log.13.gz -> access.log.14.gz
access.log.14.gz -> deleted.
Each file downloaded from a website on the server has a log entry in the form:
Remote-IP - - [local-time-date] "GET /path/to/file.ext protocol"/
status-code bytes_sent "http-referer" "http-user-agent"
As (by default) the ‘access.log’ file only contains logs for the current day, this app should be run once per day (by a root cron job), immediately before log rotation. See the section “Log Rotation” for how to do this. (Alternatively, both ‘access.log’ and ‘access.log.1’ could be analyzed at any time of day, though this is much less efficient.)
From the log we need to find:
lines containing the file search string
with status code 200
after the last time it was counted
The results are stored in an SQLite database, and written to an HTML file.