to analyze user access log
Currently there are many software tools for analyzing a web server user log. However, those software packages are extremely expensive and difficult to configure. On the other hand, JMP is relatively affordable and user-friendly. But it is a generic statistical analytical software program rather than a web traffic monitor, and therefore the user must build functions to extract and manipulate information from the log. Also, JMP cannot directly extract Web user log, and therefore all analysis must be performed locally. Nevertheless, JMP is sufficient for basic web traffic troubleshooting and analysis.
JMP is a statistical discovery software released by SAS Institute. It has both Mac and Windows versions. As you know, SAS is the leader of statistical analysis and data warehousing. However, the operation of SAS is mostly syntax-based and this is intimidating for beginners. In addition, the output of SAS is numeric-centric and has difficulties in coping with the trend of data visualization. To remediate these shortcomings, SAS Institute introduced JMP, which has a graphical user interface. Users can easily explore the data in a "what-if" manner. The trade-off of this user-friendiness is the lack of powerful statistical functions and programming flexibility.
Usually a web log is huge. The default allocated memory of JMP (Mac version) may not be able to handle a file with several thousand entries. Therefore, you should increase the memory size of JMP. The procedure is as the following:
- Make sure JMP is not open. If it is open, close it first. Memory cannot be allocated when the application is active.
- Highlight the icon of JMP.
- Select Get Info from File or press Apple-I simultaneously.
- A pop-up window will show you the system info of the application. In the bottom box entitled Memory Requirements, enter 10000 for Preferred size and 6000 for Minimum size. Then close the pop-up window. If you do not have enough RAM in your computer, buy some or steal some from your coworker's computer.
In this write-up I use my File Maker Pro (FMP) web log and JMPas an example. Both FMP and JMP are located in the same hard drive, and therefore I can use JMP to open the log directly. One nice thing about JMP is that the log can be imported without any cleaning effort. I simply open the log without even changing any import options, and surprisingly, the table is as clean as the following. Then you can name the fields accordingly. Once the field is correctly defined and the file is saved, I always can import the user log data properly.
The default imported table is not good enough for analysis, of course. For example, date, month, and year are all in one field, but time and am/pm are separate. Follow the following procedures to extract month from date:
- Double click a blank column to create a new field.
- Name the new field as Month.
- In the dialog box, select numeric as data type or leave it unchanged. If the selected data type does not match the actual data nature, JMP will adjust it for you.
- Define the field width as 2.
- Select Formula for data source.
- Click on OK and the formula box will pop up.
- In the formula box, select the function group Dates.
- Select the function month.
- Select the variable date.
- Select the function Character to date.
- Close the box and the month will be extracted into the new field you just created.
- Use the same method to extract year from date.
Follow the following procedures to concatenate time and am/pm:
- Create a new field named newtime and select the proper data type, data source, and field width.
- In the formula box, select the function Concatenate from Character.
- Select the variables time and ampm, then close the box.
I want to know how many hits are from my own university and how many are from outside. Universities use Class B addresses and therefore the first two chunks of our IP addresses (129.219) are the network addresses and the last two are node addresses. In other words, I can classify the sources of hits by looking at the first two portions of IP numbers. The following procedures are used to extract network addresses from IP numbers:
- Create a new field labelled host and select the proper data type, data source, and field width.
- In the formula box, select the function Substring from the function group Character.
- Select the variable ip and put "7" in length.
- Click on the button constant so that "7" is registered, then click OK.
- Create a new field labeled location and select the proper data type, data source, and field width.
- In the formula box, choose the function If from Conditions.
- In the result box for the first condition at the left, type ASU and click constant.
- For the if statement select the variable host.
- Choose x=y from the function group Comparison.
- Enter 129.219 as the first condition.
- In the result box, for all other conditions (otherwise) type Non-ASU and click constant.
- Close the box.
Now I am ready to conduct analysis. Let's start with something simple. I want to know the number of hits by location (ASU/Non-ASU). To get this result:
- Choose Group/Summary from the pull down menu Tables.
- Select location as Group.
- Select N and % of Total from the pull down menu Stats.
- Click OK. A summary table is presented as the following.
- To see the info in graphics, select Bar/Pie Charts from Graphs.
- To see the hits by location with a break down by month, go back to Tables, in addition to location, put month into Group.
- When a summary table is available, select Bar/Pie Charts from Graphs again.
- In the dialog box put N Rows in Y. It will create bar charts with month as the break down.
To check the number of hits by day, use the following procedures:
- Choose Distribution of Y from Analyze.
- Select date from the variable list
- Click on the button Add
- Click OK
You may be interested in knowing the number of page access of a particular section. For example, if you have a folder name "present," which contains your presentations, you can find out the number of page accesses of "present" and its percentage of total by the following procedures:
- Create a new field called Present. Select Formula.
- In the formula box, choose Contains from Character.
- In the formula choose URL as the variable to be evaluated.
- In the formula, type present as character and press the button constant. The function will check whether the variable URL contains the string "present." If it is true, it will return the value "2," otherwise, "0."
- Choose Group/Summary from Tables.
- Choose Present as Group.
- From Stats choose % of Total.
- A summary table will be presented. Double click the field % of Total and select Fixed Decimal. Enter 2 for decimal positions.
The above summary table tells you that there are 325 page accesses in the section "present," which is 3.31% of the total.
The procedures described here are just for beginners. You will find many other useful features as you become more and more familiar with the software.