Posts Tagged MySQL

Book Review: WordPress Theme Design

wordpress theme design.thumbnail Book Review: Wordpress Theme DesignOver the past few months I’ve had the book “WordPress Theme Design” on my desk ready to be reviewed (it’s another book from Packt publishing). I finally got to it earlier this week. You can read my review over at Amazon. I gave it 3 stars mainly because it wasn’t complete and lacked some of the details I’d like to see in a “guide” book. But I can’t say I didn’t learn anything.

One of the reasons I picked up this title was because I’ve been itching to redesign Tod means Fox. I use WordPress (and like it very much) and have all the tools (and some questionable skills) for a redesign. But don’t worry, any redesign will have the same glorious orange that helps remind me of FoxPro, Firefox, ING, Packt, and Holland! But honestly, It’s been a while and I’m restless. I also want a redesign to be done entirely on my new iMac. If I do it quick enough, it will be my first real work performed on the new machine.

It turns out though that the book wasn’t very helpful. I’m sure it will sit on my shelf until some day in the distant future when its time to “clean up” the office. I’ll likely be on revision n of Tod means Fox, and who knows where WordPress will be. But I suppose that’s the nature of buying technology books.

Very few tech books stand the test of time. Those that do are usually pure reference and theory books and not how-to books.

I keep reference books (like the PHP Cookbook, The Complete Reference: MySQL, and of course the Hacker’s Guide to Visual FoxPro) at arms length. These books I go to often. Then there are the theory books. These are books I keep further away but access often — and usually not in front of the computer. Books like Kimball’s The Data Warehouse Toolkit and Zeldman’s Designing with Web Standards fit into this category.

OK, enough babbling. I’d be interested to hear your thoughts on a redesign, WordPress, time-tested tech books, or the fabulous color orange.

Tags: , , , , ,

No Comments

Web Analysis with VFP: Download, Parse, and Stage

My Web Analysis application is beginning to take shape. At the moment, I have created a little road map document (which I’ll share shortly), defined all my base classes (the entire application will be made available in the coming months), and proved the general concept (see below).

To recap, this project is about building a tool to help me make better business decisions on and using the Internet. While there are a ton of canned products on the market, I feel that I can get what I need with VFP (and get it better). In general, I want to be able to download a variety of data from my server, analyze it, and use that data to make decisions (everything from what IPs to ban to how I might redesign a certain page). See my post “Business Intelligence Through Web Analysis” for more details.

The following zip file contains all the code necessary to prove the concept. It (a) establishes a database, (b) downloads the log files from the FTP server, and (c) parses the file, storing it in a stage directory, which I can then use to create a Dimensional Model for analysis purposes. Here’s the programs:

web_analytics.zip

In the zip, you will find:

  • wa_main.prg (the setup program, check the path and run this once)
  • functions.prg (a place to store my functions, this will become a class soon)
  • download.prg (set your server information up and run, it calls load_logfile(), which lives in functions.prg)
  • ftp/ftp.prg (FTP services class written by Robert Abram many moons ago — still works great, requires wininet.dll)

Now that the concept has been proved, and I can visually analyze my log files, I will start to build an application in VFP9 around it. Stay tuned!

Tags: , , , , , ,

No Comments

Analyze Apache Raw Access Logs in FoxPro

I’ve been increasingly interested in examining the access logs on my server. For the past 6 or 7 years, I’ve used some of the cool little tools provided by cPanel located in their Web/FPT Statistics folder (like Webalizer and Awstats). While nice to look at, I’ve always felt a bit handicapped by the interface and general content. I have my own ideas on what I’d like to see and how I want to see it.

In this and subsequent articles, I’ll build an Apache Raw Access Log analysis application that will (a) download the access logs once or twice a day, (b) dump the data into a database, (c) integrate this data with other information (perhaps some demographics info, error logs, site information, date and times of blog posts, etc) (d) and in the end, build a suite of reports and filters that can be used to identify things like abuse, spam, and positive traffic.

My hope is to do a good enough job with this that, in the end, could end up in the Fox Open Source community if there’s even a remote bit of interest. This should evolve in such a way that any access log from any server would be supported, but in the initial stages, I’ll be using my Raw Access Logs from Apache. For those interested, here are the details of my server configuration:

Operating system: Linux
Apache version: 1.3.37 (Unix)
cPanel Build: 10.9.0-STABLE 9966
cPanel Pro: 1.0 (RC36)

The Log File

I started by looking at the raw files created by Awstats on my server (/tmp/awstats/). There is a ton of information in there which is already parsed and used by the various free analysis tools. But what I really need is a raw file that contains a line for each ‘hit’ to my server (including 200, 404, and 500 messages). The Raw Access Logs are perfect (in cPanel, you can manage them by opening the Raw Log Manager). They contain a line for each hit, some referral information, http status codes, etc.

The only problem with the file is its format (NCSA combined/XLF/ELF)! A single line looks like:

206.17.xxx.xxx – - [23/May/2007:07:59:21 -0500] “GET /Styles/color.css HTTP/1.1″ 304 – “http://www.rabbitstewlounge.com/” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; InfoPath.1)”

Fortunately, FoxPro can handle this sort of thing quite easily. With a log file containing potentially thousands of lines each day, you can see how reading this file without parsing can be nightmarish (not to mention highly inefficient). So, this is where I started. I downloaded today’s access file from my site, and wrote the following code. As this project matures, I’ll be sure to clean this up (hopefully with some reader’s suggestions!) and provide the source in zip format. For now, follow along:

Get Started

First, download your raw access file. Load cPanel and click “Raw Access Logs”. Select the raw access log for your domain and unzip it. In this example, I’ve also changed it’s name from “rabbitstewlounge.com” to “raw_log.txt”.

The Basic Code

I left out some environmental settings, such as SET SAFETY and SET MEMOWIDTH for now. Eventually this will be part of a project, which will set all these things later. Also, as this code is merely used to test the theory, I haven’t included any real error checking. I’d be interested in hearing some ideas on how to tweak this for better speed and reliability.

I did consider writing this using low-level file functions (FOPEN, FGETS, FREAD, etc), but in the end I felt that the format was too wacky and I had better rely on STREXTRACT and SUBSTRING instead.

CREATE DATABASE C:/webstats/webstats
OPEN DATABASE C:/webstats/webstats
CREATE TABLE C:/webstats/webstats.dbf (;
    stat_id integer AUTOINC,;
    ip_address character(15),;
    date_time datetime,;
    gmt_offset character(5),;
    access_method character(4),;
    location varchar(200),;
    http_version character(10),;
    http_code character(3),;
    bytes integer,;
    referrer varchar(200),;
    agent varchar(200),;
    source_file varchar(60))
 
lcData = FILETOSTR("C:/webstats/raw_log.txt")
 
FOR n = 1 TO MEMLINES(lcData)
 
    lcLine = MLINE(lcData,n)
 
    *-- IP address is first.
    lcIP = ALLTRIM(SUBSTR(lcLine,1,AT("- -",lcLine)-1))
 
    *-- date time and GMT offset info. Use STREXTRACT
    *-- to get the data between the first two quotes
    *-- on the line. Then break the datetime string
    *-- into separate lines to use MLINE to build a
    *-- proper datetime string
 
    lcDateTimeGMT = ;
        STRTRAN(STREXTRACT(lcLine,"[","]")," ",CHR(13))
    lcDT = ;
        STRTRAN(STRTRAN(MLINE(lcDateTimeGMT,1),"/",CHR(13)),":",CHR(13))
    IF !EMPTY(lcDT)
        ltDatetime = DATETIME(;
            VAL(MLINE(lcDT,3)),;
            get_monthnum(MLINE(lcDT,2)),;
            VAL(MLINE(lcDT,1)),;
            VAL(MLINE(lcDT,4)),;
            VAL(MLINE(lcDT,5)),;
            VAL(MLINE(lcDT,6)))
    ELSE
        ltDatetime = {}
    ENDIF
 
    lcGMTOffset = MLINE(lcDateTimeGMT,2)
 
    *-- Request data. Just to 'switch-it-up' a bit,
    *-- I'm using GETWORDNUM here because
    *-- I need 3 chunks of data within the
    *-- STREXTRACT, all conveniently separated
    *-- by a space.
 
    lcRequest = STREXTRACT(lcLine,'"','"')
    lcMethod = GETWORDNUM(lcRequest,1)
    lcLocation = GETWORDNUM(lcRequest,2)
    lcHTTPVer = GETWORDNUM(lcRequest,3)
 
    *-- HTTP Code and bytes returned.
    lcHTTPCode = SUBSTR(lcLine, ;
        AT(lcRequest,lcLine)+LEN(lcRequest)+2,3)
    lnBytes = VAL(SUBSTR(lcLine,AT(" " + lcHTTPCode + ;
        " ",lcLine)+LEN(lcHTTPCode)+2,;
        AT('"',lcLine,3) - (AT(" " + lcHTTPCode + " ",lcLine)+;
        LEN(lcHTTPCode)+2)))
 
    *-- referrer
    lcReferer = STREXTRACT(lcLine,'"','"',3)
 
    *-- User's agent
    lcAgent = STREXTRACT(lcLine,'"','"',5)
 
    INSERT INTO webstats (ip_address,date_time,gmt_offset,;
        access_method, location,http_version,http_code,;
        bytes,referrer,agent,source_file ) ;
    VALUES  (lcIP , ltDatetime , lcGMTOffset , lcMethod ,;
        lcLocation , lcHTTPVer ,lcHTTPCode,lnBytes,lcReferer,;
        lcAgent,"raw_log.txt" )
 
NEXT
 
*-- Functions
FUNCTION get_monthnum
LPARAMETERS tcAbbr
 
DIMENSION aMonthList[12]
aMonthList[1] = "JAN"
aMonthList[2] = "FEB"
aMonthList[3] = "MAR"
aMonthList[4] = "APR"
aMonthList[5] = "MAY"
aMonthList[6] = "JUN"
aMonthList[7] = "JUL"
aMonthList[8] = "AUG"
aMonthList[9] = "SEP"
aMonthList[10] = "OCT"
aMonthList[11] = "NOV"
aMonthList[12] = "DEC"
 
RETURN ASCAN(aMonthList,LEFT(UPPER(ALLTRIM(tcAbbr)),3))
ENDFUNC

Next entry, I’ll see about downloading this file automatically (might take some ingenuity) and building in some logic to do so periodically. After that, I’ll write some algorithms to identify potential abuse situations, spam, and increased traffic, setting up email notifications in the process. Stay tuned!

Tags: , , , , , ,

4 Comments