
404 Error logs on site


18 replies to this topic

#1
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
Hi,

I have my site set up with Joomla CMS, but there is a teensy problem. I have created several static content pages on Joomla (1 page for each of those errors, e.g. 403, 404, 500 errors) and then I used .htaccess to redirect the errors to those static pages. This gives it a more pro look and it lets me check how many times the error pages have been hit.

I checked today and saw that the 404 page has had 137,000-something hits, more than any article on my site. I know it's been hit that many times, but I have no clue which pages people browsed to in order to get the error. I recently converted to Joomla CMS from static HTML, but I did set up redirects from the old HTML pages to the new pages, so that can't be the problem. And it couldn't have been a site outage, because then the entire site would have gone down, including the 404 page. I have seen no erratic behaviour on the site and no one has contacted me about dead links or anything, and when I checked, everything seemed to be fine.

So, with that said, is there ANY way to check WHICH URL people browsed to that caused the 404 error? That would help figure out if there is a problem. And could it be some sort of automated bot or component on my site that is causing the hits? It seems unlikely to me that a bot is causing it, because then every page would have an abnormal number of hits, not just the 404 page... Also, I do have access to a logs folder with my web host that shows page accesses and so on. Would this be of any use in finding out which pages are giving 404s?

Thanks

P.S. I'll post this on the Joomla forums too...

#2
sarahbau


    InsanelyMac Legend

  • Members
  • 903 posts
  • Gender:Female
  • Location:Raleigh, NC
Actually, every page wouldn't have an abnormal number of hits if it's a bot. When I used to monitor my web server's activity, there were lots of 404s with people (or bots) trying to access /admin, /administration, etc. I think you might be able to enable extra logging options in Apache to show what page they were trying to get to, but I can't remember for sure.

Edit: I just remembered that at one point, I also had a 404 page that was "prettier" than just the standard 404 page. It was a PHP page that would tell them to email the administrator if they thought it was in error, and clicking on the email link would put the page URL in the subject of the email. Anyway, you might be able to come up with a clever way to use php or something to maybe write to a separate log file with the bad URLs.

#3
chris2k


    InsanelyMac Geek

  • Members
  • 147 posts
AFAIK Joomla is only a CMS and not a web server. Access logs are generated by your web server, so that is where to look. I assume you're using Apache, so check this link: http://httpd.apache....s/1.3/logs.html

And yes, it's probably some bot causing that many hits. You will find out when digging through the log files. Most CMSs are vulnerable to something.
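
To make "digging through the log files" a bit more concrete, here is a minimal Python sketch that counts which request paths produced a given status code. It assumes the access log follows the usual Apache layout, where the quoted request line is followed by the status code and the response size. The function name `count_paths_by_status` and the sample lines are made up for illustration, and which status to look for depends on how the error page is wired up (a plain 404, or a 302 if the server redirects to the error page).

```python
import re
from collections import Counter

# Matches the start of a common/combined-format line:
#   ip ident user [time] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_paths_by_status(lines, statuses=("404",)):
    """Count requested paths whose response status is in `statuses`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") in statuses:
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample lines (documentation IP range, invented paths).
SAMPLE_LINES = [
    '192.0.2.10 - - [05/May/2008:03:22:23 -0400] "GET /no/such/page.html HTTP/1.1" 404 269',
    '192.0.2.10 - - [05/May/2008:03:22:24 -0400] "GET /index.php HTTP/1.1" 200 8193',
    '192.0.2.11 - - [05/May/2008:03:22:25 -0400] "GET /no/such/page.html HTTP/1.1" 404 269',
    '192.0.2.11 - - [05/May/2008:03:22:26 -0400] "GET /img/close.gif HTTP/1.1" 302 269',
]

for path, hits in count_paths_by_status(SAMPLE_LINES).most_common():
    print(hits, path)
```

On a real log you would pass an open file object instead of `SAMPLE_LINES`, and possibly `statuses=("404", "302")` if missing pages show up as redirects.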

#4
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
Yeah, I got another 1405 hits in a single day, which is completely unreasonable. I'll look through the logs... Any ideas on what specifically I should be looking for?

#5
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
OK, well I looked through the logs, and the URL to my error page is: http://######.com/in...=...=view&id=53 so I found what was accessing it:

IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:08 -0400] "GET /osx86search/ HTTP/1.1" 200 6380 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/jceutilities-150.js HTTP/1.1" 200 15774 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/pc_includes/ajax.js HTTP/1.1" 200 6947 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/templates/chatter/comment_style.css HTTP/1.1" 200 4150 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/script.js? HTTP/1.1" 200 7962 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/img/blank.gif HTTP/1.1" 200 43 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_r_t.png HTTP/1.1" 200 353 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /index.php?option=com_jomcomment&task=userinfo&no_html=1 HTTP/1.1" 200 8193 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/spacer.png HTTP/1.1" 200 218 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/busy.gif HTTP/1.1" 200 729 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /google-analytics.com/ga.js HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/loading.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/close.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/blank.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/page_go.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/prevlabel.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/nextlabel.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/chart_bar.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_alert.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_voteup.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_votedown.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-arrow.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-shadow.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/mp_header.jpg HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_bg.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_front.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/delicious.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/digg.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/furl.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/yahoo_myweb.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/stumbleupon.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/google_bmarks.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/technorati.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/reddit.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/facebook.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:08 -0400] "GET /osx86search/ HTTP/1.1" 200 6380 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"

And that repeats. I have replaced the IP addresses with "IP.HID.DE.N0" to protect privacy.

#6
chris2k


    InsanelyMac Geek

  • Members
  • 147 posts
Doesn't look like an exploit attempt to me, but I don't know what is causing it either. I checked your site, clicked a few links, and got no 404s. Also, when I enter random stuff into the URL, the last page shows up again. It doesn't redirect me to your 404 page.

You will have to look at the log entry just before the 404 page shows up in the log file. Maybe that helps...

#7
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
OK, I made some more progress. I found out that the same 2 IP addresses were repeatedly accessing the page:

67.142.130.13
66.249.85.67

I went to the ARIN WHOIS search and I typed in the first one (67.142.130.13):

Address:	DirecWAY Network Management Center
Address:	attn: Network Security Manager
City:	   Germantown
StateProv:  MD
PostalCode: 20876
Country:	US

NetRange:   67.142.0.0 - 67.143.255.255 
CIDR:	   67.142.0.0/15 
NetName:	DIRECPC-1BLK
NetHandle:  NET-67-142-0-0-1
Parent:	 NET-67-0-0-0-0
NetType:	Direct Allocation
NameServer: NS1.DIRECPC.COM
NameServer: NS2.DIRECPC.COM
Comment:	
RegDate:	2003-12-12
Updated:	2004-03-04

OrgTechHandle: NSM5-ARIN
OrgTechName:   Network Security Manager 
OrgTechPhone:  +1-301-601-7205
OrgTechEmail:  abuse@hughes.net

Not much I can recognize there, so leave that alone for a sec. I then typed in the second IP (66.249.85.67) and here is the result:

OrgName:	Google Inc. 
OrgID:	  GOGL
Address:	1600 Amphitheatre Parkway
City:	   Mountain View
StateProv:  CA
PostalCode: 94043
Country:	US

NetRange:   66.249.64.0 - 66.249.95.255 
CIDR:	   66.249.64.0/19 
NetName:	GOOGLE
NetHandle:  NET-66-249-64-0-1
Parent:	 NET-66-0-0-0-0
NetType:	Direct Allocation
NameServer: NS1.GOOGLE.COM
NameServer: NS2.GOOGLE.COM
NameServer: NS3.GOOGLE.COM
NameServer: NS4.GOOGLE.COM
Comment:	
RegDate:	2004-03-05
Updated:	2007-04-10

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google Inc. 
OrgTechPhone:  +1-650-318-0200
OrgTechEmail:  arin-contact@google.com

Lookie who it is: Google. I guess the Google search bot has been hammering my 404 page. I've heard about using robots.txt to prevent Google from indexing it, but would that be effective in this case?

Thanks!
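
Once a WHOIS lookup has given you the CIDR blocks, the same check can be scripted so that every log entry is labelled automatically. The following is only a sketch: it uses the two ranges from the WHOIS output above, and the names `KNOWN_RANGES` and `owner_of` are invented for the example.

```python
import ipaddress

# CIDR blocks copied from the WHOIS results above, keyed by NetName.
KNOWN_RANGES = {
    "GOOGLE": ipaddress.ip_network("66.249.64.0/19"),
    "DIRECPC-1BLK": ipaddress.ip_network("67.142.0.0/15"),
}


def owner_of(ip):
    """Return the NetName whose range contains `ip`, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for name, network in KNOWN_RANGES.items():
        if address in network:
            return name
    return None


for ip in ("67.142.130.13", "66.249.85.67"):
    print(ip, owner_of(ip))
```

An address outside both blocks simply returns `None`, which is a cue to run another WHOIS lookup by hand.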

#8
slim2001


    InsanelyMac Protégé

  • Members
  • 50 posts
  • Gender:Male
When I click on a link on the front page it takes me here: http://www.insanelym...howtopic=102313

#9
sarahbau


    InsanelyMac Legend

  • Members
  • 903 posts
  • Gender:Female
  • Location:Raleigh, NC
Using robots.txt stopped Google from trying to go through my php calendar page (it was just following the 'next week' links over and over, which of course never ends).

#10
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male

When i click on a link on the front page it takes me here http://www.insanelym...howtopic=102313


What do you mean? Clicking a link on my site takes you to insanelymac?

sarahbau,

How would I use robots.txt to protect only 1 page in Joomla?

#11
sarahbau


    InsanelyMac Legend

  • Members
  • 903 posts
  • Gender:Female
  • Location:Raleigh, NC

What do you mean? Clicking a link on my site takes you to insanelymac?

sarahbau,

How would I use robots.txt to protect only 1 page in Joomla?

I don't know anything about Joomla, but you should just be able to create the file in your server's root. Here's how mine looks:
User-agent: *
Disallow: /calendar/

That basically just blocks any bot from looking at anything in /calendar/

#12
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
Hm... yeah, I figured that much, but the Joomla URLs are dynamic, so everything comes from index.php, and if I disallowed that in robots.txt it would apply to the whole site. Anyway, I found some info about robots.txt for Joomla and I'll see what I can do.

EDIT: I think I have found the problem. There was a robots.txt file that was supposed to be installed with Joomla that wasn't installed on my setup. I've now uploaded the robots.txt file. I hope this solves the problem, and I will report back on progress :thumbsup_anim:
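
On the question of blocking just one dynamic page: Disallow rules are matched as path prefixes, so a rule can include part of the query string rather than all of index.php. Below is a small sketch that checks this with Python's standard `urllib.robotparser`. The second Disallow line is a hypothetical rule for the error page (id=53, taken from the logs earlier in the thread), `example.com` is a placeholder host, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The /calendar/ rule comes from the post above; the index.php rule is a
# hypothetical addition that targets only the error page (id=53).
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /index.php?option=com_content&task=view&id=53
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URLS = [
    "http://example.com/calendar/next-week",
    "http://example.com/index.php?option=com_content&task=view&id=53",
    "http://example.com/index.php?option=com_content&task=view&id=76&Itemid=48",
]

for url in URLS:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```

Because the match is a prefix, a rule ending in `id=53` would also cover a URL ending in `id=530`, so it is worth testing the rule against a handful of real URLs before relying on it.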

#13
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
OK, well, I sorta had a spark of genius. There is a feature in Google Analytics that will track 404 errors and tell you what page people visited to get the error, which is exactly what I need. So I set up a simple HTML 404 page, inserted the 404 tracking code, and set my .htaccess file to direct to the HTML tracker page instead of the standard Joomla page. This is just temporary, to see what's causing the errors.

There is no data in my Analytics panel yet for the 404, but Google says that it takes 24 hours to update so it should be updated sometime today or tomorrow. I'll see how it goes :)

#14
inimicus


    Drunk and Angry Slurs in 31 Flavors

  • Members
  • 473 posts
  • Gender:Male
  • Location:Sacramento, CA
Well, if you didn't have a robots.txt file on your server, then the bots would get a 404 every time they tried to access it.

If you are redirecting users to your 404 page with .htaccess, you're not going to get many usable results. Since the Google Analytics code is JS (client-side), it's never going to see the input URL, as that's handled server-side before the redirect. All the client is going to see is the referring page to that 404 page. You'll find a lot of valid pages, but probably not many (or any) actual input 404 addresses.

But GA is pretty tricky. I could be wrong...

#15
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
I have the robots.txt file now, but after I put it in, the 404 page was still getting tons of hits. I think you may be right about the Analytics, because the 404 data is not appearing. Do you know of any open source website stats tools (ones that actually reside on your server, like TraceWatch or phpMyVisites) that can track 404 errors?

#16
inimicus


    Drunk and Angry Slurs in 31 Flavors

  • Members
  • 473 posts
  • Gender:Male
  • Location:Sacramento, CA
I don't mess with 404s. It just means more bandwidth and more fuss. So sorry, I dunno of anything. I just print, "not found, mang." when a 404 occurs.


However, you can try some code on your 404 page that might pull the redirect information.

Let's say http://www.######.co...t...9&Itemid=45 is a 404.

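// HTTP_REFERER is optional, so fall back to an empty string when it is missing
$ref = explode('/', isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');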

That will put the referencing URL into an array divided by slashes. The name of the page will be in the last item of the array.

So http://www.######.co...t...9&Itemid=45 becomes...

$ref => Array
[0] => http:
[1] => (empty string, from the "//" after http:)
[2] => www.######.com
[3] => index.php?option=com_content&task=view&id=69&Itemid=45


Then you can pull the page name with...

$name = $ref[count($ref)-1];

And now you know that index.php?option=com_content&task=view&id=69&Itemid=45 is creating a 404.


You could then write the names to a table in your database for review later.


I haven't thoroughly tested this, so it's all speculation. But it's server-side, just as you need.
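
For comparison, here is the same idea sketched in Python, using a URL parser instead of splitting on slashes so the empty element from "//" never comes up. This is an illustration only: `page_name` is an invented helper and `www.example.com` stands in for the hidden domain. One caveat worth checking on your own setup: the Referer header normally names the page that linked to the missing URL, while the missing URL itself usually has to come from the request URI.

```python
from urllib.parse import urlsplit


def page_name(referer):
    """Return the last path segment plus query string, or None if absent."""
    if not referer:
        # The Referer header is optional; many bots send none at all.
        return None
    parts = urlsplit(referer)
    name = parts.path.rsplit("/", 1)[-1]
    if parts.query:
        name += "?" + parts.query
    return name


# Placeholder host; the path and query are the example from the post above.
example = "http://www.example.com/index.php?option=com_content&task=view&id=69&Itemid=45"
print(page_name(example))
```

The returned string could then be written to a log file or database table for later review, as suggested above.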

#17
(MoC)


    InsanelyMacaholic

  • Members
  • 2,653 posts
  • Gender:Male
  • Location:/dev/moc
  • Interests:Plenty.
This could give you some insight...

#18
~pcwiz


    InsanelyMac V.I.P.

  • Retired
  • 5,900 posts
  • Gender:Male
MoC,

Thanks, I was looking at that earlier but the software is commercial :) Thanks for the link though :unsure:

inimicus,

I implemented a tracker script on my 404 page very similar to that one, except it emails me the info. So I did find some of the bots that were causing it. Here are some of the notifications I got:

Requested Page: /MarkAny/Websafer/MaSiteInfo.ini
Referred By: Unknown
Remote Addr: 59.10.114.5 ()
Request URI: /MarkAny/Websafer/MaSiteInfo.ini

For this one, I have never even heard of anything called "/MarkAny/Websafer/MaSiteInfo.ini". I did an IP lookup on it and it belongs to the "Korea Telecom Network Management Center".

Requested Page: /components/com_jo
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 124.120.143.208 ()
Request URI: /components/com_jo

The components directory is disallowed by robots.txt, and com_jo is not the full folder name. The IP belongs to "True Internet" in Thailand, which sounds like some kind of ISP.

Requested Page: /mp_shadow_l_t.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png

The IP belongs to "Telenet operaties N.V." in Belgium. The image it requested does exist, but it is in a different directory. I checked the referring address and everything is fine there.

Requested Page: /arrow.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png

Again, the image exists, but it's not in the root directory as requested.

Requested Page: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(
Referred By: Unknown
Remote Addr: 87.210.64.10 ()
Request URI: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(

I dunno why it was asking for that. The IP belongs to "Versatel Consumer ISP" in the Netherlands.

Anyway, all of these look to me like bot requests and not legit 404s, and they seem to be normal. But the thing is, before, with the Joomla static 404 page, I was getting like 10 hits a minute on the 404, but with the custom script I got like 10 emails in 24 hours. Something weird is going on, but it seems that it is not affecting human users in any way, so that's good. I would still like to find out the reason for this, though. Another thing: I am using a 30-day demo of an app called SortSite that does one-click site analysis for 404 and other errors. I'll post the results here.

EDIT: I did the test and no errors or broken links found ;)
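
If the notification emails pile up, they can be tallied rather than read one by one. The sketch below assumes the notifications are saved as plain text in the four-field format shown above, with each record starting at "Requested Page:". The function name `parse_notifications` is invented, and the sample text reuses three of the notifications from this post.

```python
from collections import Counter

# Three of the notifications quoted above, as they might appear in a text file.
NOTIFICATIONS = """\
Requested Page: /MarkAny/Websafer/MaSiteInfo.ini
Referred By: Unknown
Remote Addr: 59.10.114.5 ()
Request URI: /MarkAny/Websafer/MaSiteInfo.ini

Requested Page: /mp_shadow_l_t.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png

Requested Page: /arrow.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png
"""


def parse_notifications(text):
    """Split the text into one dict per notification."""
    records, current = [], {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank or unrelated line
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Requested Page" and current:
            # A new record starts; store the one collected so far.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


records = parse_notifications(NOTIFICATIONS)
# Keep only the address itself, dropping the trailing "()".
hits_by_ip = Counter(r["Remote Addr"].split()[0] for r in records)
for ip, hits in hits_by_ip.most_common():
    print(hits, ip)
```

Grouping by "Referred By" or "Requested Page" instead works the same way and may show whether a single page is generating most of the bad requests.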

#19
thomblake


    InsanelyMac Protégé

  • Just Joined
  • 1 posts

OK, I made some more progress. I found out that the same 2 IP addresses were repeatedly accessing the page:

67.142.130.13
66.249.85.67

I went to the ARIN WHOIS search and I typed in the first one (67.142.130.13):

Address:	DirecWAY Network Management Center
Address:	attn: Network Security Manager
City:	   Germantown
StateProv:  MD
PostalCode: 20876
Country:	US

NetRange:   67.142.0.0 - 67.143.255.255 
CIDR:	   67.142.0.0/15 
NetName:	DIRECPC-1BLK
NetHandle:  NET-67-142-0-0-1
Parent:	 NET-67-0-0-0-0
NetType:	Direct Allocation
NameServer: NS1.DIRECPC.COM
NameServer: NS2.DIRECPC.COM
Comment:	
RegDate:	2003-12-12
Updated:	2004-03-04

OrgTechHandle: NSM5-ARIN
OrgTechName:   Network Security Manager 
OrgTechPhone:  +1-301-601-7205
OrgTechEmail:  abuse@hughes.net


Regarding this one, it's a common but often mysterious problem with Hughes. I posted a description of it on my blog: http://thomblake.com.../05/hughes-isp/




