Full Version: 404 Error logs on site
InsanelyMac Forum > Discuss and Learn > Internet(s), Servers, and Networks
~pcwiz
Hi,

I have my site set up with Joomla CMS, but there is a teensy problem. I created several static content pages in Joomla (one page for each error, e.g. 403, 404, 500) and then used .htaccess to redirect the errors to those static pages. This gives it a more pro look and lets me check how many times the error pages have been hit.

I checked today and saw that the 404 page has had 137,000-something hits, more than any article on my site. I know it's been hit that many times, but I have no clue what page people browsed to to get the error. I recently converted to Joomla CMS from static HTML, but I did set up redirects for the old HTML pages to redirect to the new pages, so that can't be the problem. And it couldn't have been a site outage because then the entire site would have gone down, including the 404 pages. I have seen no erratic behaviour in the site and no one has contacted me about dead links or anything, and when I checked, everything seemed to be fine.

So, with that said, is there ANY way to check WHICH URL people browsed to that caused the 404 error? That would help figure out if there is a problem. And could it be some sort of automated bot or component on my site that is causing the hits? It seems unlikely to me that a bot is causing it, because then every page would have an abnormal number of hits, not just the 404 page... Also, I do have access to a logs folder with my web host that shows the page access stuff and all. Would this be of any use in finding out which pages are giving 404s?

Thanks

P.S. I'll post this on the Joomla forums too...
sarahbau
Actually, every page wouldn't have an abnormal number of hits if it's a bot. When I used to monitor my web server's activity, there were lots of 404s with people (or bots) trying to access /admin, /administration, etc. I think you might be able to enable extra logging options in Apache to show what page they were trying to get to, but I can't remember for sure.

Edit: I just remembered that at one point, I also had a 404 page that was "prettier" than just the standard 404 page. It was a PHP page that would tell them to email the administrator if they thought it was in error, and clicking on the email link would put the page URL in the subject of the email. Anyway, you might be able to come up with a clever way to use PHP or something to write the bad URLs to a separate log file.
chris2k
AFAIK Joomla is only a CMS and not a web server. The access logs are generated by your web server, not by Joomla. I assume you're using Apache, so check this link: http://httpd.apache.org/docs/1.3/logs.html

And yes, it's probably some bot causing that many hits. You will find out when digging through the log files. Most CMSs are vulnerable to something.
~pcwiz
Yeah, I got another 1405 hits in a single day which is completely unreasonable. I'll look through the logs...Any ideas on what specifically I should be looking for?
~pcwiz
OK, well I looked through the logs, and the URL to my error page is: http://pcwizcomputer.com/index.php?option=...=view&id=53 so I found what was accessing it:

CODE
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:08 -0400] "GET /osx86search/ HTTP/1.1" 200 6380 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/jceutilities-150.js HTTP/1.1" 200 15774 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/pc_includes/ajax.js HTTP/1.1" 200 6947 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/templates/chatter/comment_style.css HTTP/1.1" 200 4150 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/script.js? HTTP/1.1" 200 7962 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/img/blank.gif HTTP/1.1" 200 43 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_r_t.png HTTP/1.1" 200 353 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /index.php?option=com_jomcomment&task=userinfo&no_html=1 HTTP/1.1" 200 8193 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/spacer.png HTTP/1.1" 200 218 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/busy.gif HTTP/1.1" 200 729 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /google-analytics.com/ga.js HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/loading.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/close.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/blank.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/page_go.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/prevlabel.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/nextlabel.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/chart_bar.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_alert.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_voteup.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_votedown.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-arrow.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-shadow.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/mp_header.jpg HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_bg.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_front.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/delicious.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/digg.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/furl.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/yahoo_myweb.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/stumbleupon.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/google_bmarks.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/technorati.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/reddit.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/facebook.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"


And that repeats. I have replaced the IP addresses with "IP.HID.DE.N0" to protect privacy.
chris2k
Doesn't look like an exploit attempt to me, but I don't know what is causing it either. I checked your site, clicked a few links, and got no 404s. Also, when I enter random stuff into the URL, the last page shows up again. It doesn't redirect me to your 404 page.

You will have to look at the log entry just before the 404 page shows up in the log file. Maybe that helps...
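
For digging through the log, here is a minimal Python sketch (not something anyone in this thread ran) that tallies every request whose status is not 200 or 304, grouped by status and path. It assumes the log layout pasted above (client, timestamp, request, status, size, host, referrer, user agent); the names `LINE_RE`, `SAMPLE` and `non_ok_requests` are made up for illustration. If the error page is reached through a redirect, the failed requests may well show up as 302 rather than 404, so the sketch counts both.

```python
import re
from collections import Counter

# Assumed layout, matching the lines pasted in this thread:
# client - - [time] "METHOD path PROTO" status size host "referrer" "agent" ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<host>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Three lines copied from the log excerpt above.
SAMPLE = [
    'IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/busy.gif HTTP/1.1" 200 729 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"',
    'IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/loading.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"',
    'IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/close.gif HTTP/1.1" 302 269 pcwizcomputer.com "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"',
]


def non_ok_requests(lines, ok_statuses=("200", "304")):
    """Count (status, path) pairs for every request that did not succeed."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") not in ok_statuses:
            counts[(match.group("status"), match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    for (status, path), hits in non_ok_requests(SAMPLE).most_common():
        print(status, hits, path)
```

To run it over a real log, pass an open file instead of `SAMPLE`, since the function accepts any iterable of lines. The paths with the highest counts are the ones to chase first.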
~pcwiz
OK, I made some more progress. I found out that the same 2 IP addresses were repeatedly accessing the page:

67.142.130.13
66.249.85.67

I went to the ARIN WHOIS search and I typed in the first one (67.142.130.13):

CODE
Address:    DirecWAY Network Management Center
Address:    attn: Network Security Manager
City:       Germantown
StateProv:  MD
PostalCode: 20876
Country:    US

NetRange:   67.142.0.0 - 67.143.255.255
CIDR:       67.142.0.0/15
NetName:    DIRECPC-1BLK
NetHandle:  NET-67-142-0-0-1
Parent:     NET-67-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.DIRECPC.COM
NameServer: NS2.DIRECPC.COM
Comment:    
RegDate:    2003-12-12
Updated:    2004-03-04

OrgTechHandle: NSM5-ARIN
OrgTechName:   Network Security Manager
OrgTechPhone:  +1-301-601-7205
OrgTechEmail:  abuse@hughes.net


Not much I can recognize there, so leave that alone for a sec. Then I typed in the second IP (66.249.85.67) and here is the result:

CODE
OrgName:    Google Inc.
OrgID:      GOGL
Address:    1600 Amphitheatre Parkway
City:       Mountain View
StateProv:  CA
PostalCode: 94043
Country:    US

NetRange:   66.249.64.0 - 66.249.95.255
CIDR:       66.249.64.0/19
NetName:    GOOGLE
NetHandle:  NET-66-249-64-0-1
Parent:     NET-66-0-0-0-0
NetType:    Direct Allocation
NameServer: NS1.GOOGLE.COM
NameServer: NS2.GOOGLE.COM
NameServer: NS3.GOOGLE.COM
NameServer: NS4.GOOGLE.COM
Comment:    
RegDate:    2004-03-05
Updated:    2007-04-10

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google Inc.
OrgTechPhone:  +1-650-318-0200
OrgTechEmail:  arin-contact@google.com


Lookie who it is. Google. I guess the Google search bot has been hammering my 404 page. I've heard about using robots.txt to prevent Google from indexing a page, but would that be effective in this case?
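
Once the ranges from WHOIS are known, other addresses in the log can be matched against them without a fresh lookup each time. A small Python sketch of that check, using only the two CIDR blocks quoted above; `WHOIS_RANGES` and `owner_of` are illustrative names, not part of any tool mentioned here.

```python
import ipaddress

# CIDR blocks and NetNames copied from the two WHOIS results above.
WHOIS_RANGES = {
    "DIRECPC-1BLK": ipaddress.ip_network("67.142.0.0/15"),
    "GOOGLE": ipaddress.ip_network("66.249.64.0/19"),
}


def owner_of(ip):
    """Return the NetName whose block contains ip, or None if no block matches."""
    address = ipaddress.ip_address(ip)
    for name, network in WHOIS_RANGES.items():
        if address in network:
            return name
    return None


if __name__ == "__main__":
    print(owner_of("67.142.130.13"))  # DIRECPC-1BLK
    print(owner_of("66.249.85.67"))   # GOOGLE
```

An address inside Google's block is a strong hint but not proof that the request came from the search crawler, so treat the result as a starting point.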

Thanks!
slim2001
When I click on a link on the front page it takes me here http://www.insanelymac.com/{ipb.script_url}showtopic=102313
sarahbau
Using robots.txt stopped Google from trying to go through my php calendar page (it was just following the 'next week' links over and over, which of course never ends).
~pcwiz
QUOTE(slim2001 @ May 5 2008, 03:01 PM)
When I click on a link on the front page it takes me here http://www.insanelymac.com/{ipb.script_url}showtopic=102313


What do you mean? Clicking a link on my site takes you to insanelymac?

sarahbau,

How would I use robots.txt to protect only 1 page in Joomla?
sarahbau
QUOTE(~pcwiz @ May 5 2008, 06:29 PM)
What do you mean? Clicking a link on my site takes you to insanelymac?

sarahbau,

How would I use robots.txt to protect only 1 page in Joomla?

I don't know anything about Joomla, but you should just be able to create the file in your server's root. Here's how mine looks:
CODE
User-agent: *
Disallow: /calendar/


That basically just blocks any bot from looking at anything in /calendar/
~pcwiz
Hm..yeah I figured that much, but the Joomla URLs are dynamic so everything comes from index.php, and if I put index.php in robots.txt it would apply to the whole site. Anyway, I found some info about robots.txt for Joomla and I'll see what I can do.

EDIT: I think I have found the problem. There was a robots.txt file that was supposed to be installed with Joomla that wasn't installed on my setup. I've now uploaded the robots.txt file. I hope this solves the problem, and I will report back on progress :)
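
On the worry that a rule for index.php would cover the whole site: Disallow lines are matched as prefixes of the path plus query string, so a rule can target one dynamic URL. Below is a hedged Python sketch using the standard library's `urllib.robotparser` to test such a rule offline. The rule text is an assumption written for this example (it is not the robots.txt that ships with Joomla), and the error-page URL is the one that appears as the referrer in the log above.

```python
from urllib.robotparser import RobotFileParser

# The error page, as it appears in the referrer field of the log above.
ERROR_PAGE = "http://pcwizcomputer.com/index.php?option=com_content&task=view&id=53"

# Hypothetical robots.txt content blocking only that one URL prefix.
RULES = """\
User-agent: *
Disallow: /index.php?option=com_content&task=view&id=53
"""


def allowed(url, rules=RULES, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(ERROR_PAGE))
    print(allowed("http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48"))
```

Because the match is a prefix match, the same rule would also cover a URL ending in id=530, and a crawler is free to ignore robots.txt altogether, so this only shows what a well-behaved bot should do.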
~pcwiz
OK well I sorta had a spark of genius. There is this feature in Google Analytics that will track 404 errors and tell you what page people visited to get the error, which is exactly what I need. So what I did was set up a simple HTML 404 page, insert the 404 tracking code, and set my .htaccess file to direct to the HTML tracker page instead of the standard Joomla page. This is just temporary, to see what's causing the errors.

There is no data in my Analytics panel yet for the 404, but Google says that it takes 24 hours to update so it should be updated sometime today or tomorrow. I'll see how it goes ;)
inimicus
Well, if you didn't have a robots.txt file on your server, then the bots would get a 404 every time they tried to access it.

If you are redirecting users to your 404 page with .htaccess, you're not going to get many usable results. Since Google Analytics code is JS (client-side), it's never going to see the input URL as that's handled server-side pre-redirect. All the client is going to see is the referring page to that 404 page. You'll find a lot of valid pages, but probably not many (or any) actual input 404 addresses.

But GA is pretty tricky. I could be wrong...
~pcwiz
I have the robots.txt file now, but after I put it in, the 404 page was still getting tons of hits. I think you may be right about the Analytics, because the 404 data is not appearing. Do you know of any open source website stats tools (ones that actually reside on your server, like TraceWatch or phpMyVisites) that can track 404 errors?
inimicus
I don't mess with 404s. It just means more bandwidth and more fuss. So sorry, I dunno of anything. I just print, "not found, mang." when a 404 occurs.


However, you can try some code on your 404 page that might pull the redirect information.

Let's say http://www.pcwizcomputer.com/index.php?opt...9&Itemid=45 is a 404.

CODE
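// HTTP_REFERER is sent by the client and may be missing, so fall back to an empty string
$ref = explode('/', isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');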


That will put the referencing URL into an array divided by slashes. The name of the page will be in the last item of the array.

So http://www.pcwizcomputer.com/index.php?opt...9&Itemid=45 becomes...

$ref => Array
[0] => http:
[1] =>
[2] => www.pcwizcomputer.com
[3] => index.php?option=com_content&task=view&id=69&Itemid=45


Then you can pull the page name with...

CODE
$name = $ref[count($ref)-1];


And now you know that index.php?option=com_content&task=view&id=69&Itemid=45 is creating a 404.


You could then write the names to a table in your database for review later.


I haven't thoroughly tested this, so it's all speculation. But it's server-side, just as you need.
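
To make the splitting step concrete, here is the same idea as a small Python sketch (Python only so it can be run outside the web server; the helper names are made up). It shows that splitting on slashes leaves an empty item between "http:" and the host, and that taking the last item breaks as soon as the query string itself contains a slash, which is why parsing the URL is the safer route. Keep in mind that the Referer header normally names the page that linked to the request, so whatever comes out is the referring page.

```python
from urllib.parse import urlsplit

# URL rebuilt from the array items listed above.
URL = "http://www.pcwizcomputer.com/index.php?option=com_content&task=view&id=69&Itemid=45"


def page_by_split(url):
    """Mimic explode('/') and take the last item."""
    return url.split("/")[-1]


def page_by_parse(url):
    """Take path and query from a real URL parser instead of counting slashes."""
    parts = urlsplit(url)
    page = parts.path.lstrip("/")
    return page + "?" + parts.query if parts.query else page


if __name__ == "__main__":
    print(URL.split("/"))          # note the empty string at index 1
    print(page_by_split(URL))
    print(page_by_parse(URL))
    # A query value containing a slash fools the split-based version.
    tricky = "http://www.pcwizcomputer.com/index.php?return=/a/b"
    print(page_by_split(tricky), "vs", page_by_parse(tricky))
```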
(MoC)
This could give you some insight...
~pcwiz
MoC,

Thanks, I was looking at that earlier but the software is commercial :( Thanks for the link though :)

inimicus,

I implemented a tracker script on my 404 page very similar to that one, except it emails me the info. So I did find some bots that were causing it. Here are some of the notifications I got:

CODE
Requested Page: /MarkAny/Websafer/MaSiteInfo.ini
Referred By: Unknown
Remote Addr: 59.10.114.5 ()
Request URI: /MarkAny/Websafer/MaSiteInfo.ini


For this one, I have never even heard of anything called "/MarkAny/Websafer/MaSiteInfo.ini". I did an IP lookup on it and it belongs to the "Korea Telecom Network Management Center".

CODE
Requested Page: /components/com_jo
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 124.120.143.208 ()
Request URI: /components/com_jo


The components directory is set to disallow access by robots.txt, and com_jo is not the full folder path. The IP belongs to "True Internet" in Thailand, which sounds like some kind of ISP.

CODE
Requested Page: /mp_shadow_l_t.png
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png


The IP belongs to "Telenet operaties N.V." in Belgium. The image it requested does exist, but it is in a different directory. I checked the referring address and everything is fine there.

CODE
Requested Page: /arrow.png
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png


Again, the image exists but it's not in the root directory as requested.

CODE
Requested Page: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(
Referred By: Unknown
Remote Addr: 87.210.64.10 ()
Request URI: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(


I dunno why it was asking for that. The IP belongs to "Versatel Consumer ISP" in the Netherlands.

Anyway, all of these look to me like bot requests and not legit 404s, so they seem to be normal. But here's the thing: before, with the Joomla static 404 page, I was getting like 10 hits a minute on the 404, but with the custom script I got like 10 emails in 24 hours. Something weird is going on. It doesn't seem to be affecting human users in any way, so that's good, but I would still like to find out the reason for this. Another thing: I am using a 30-day demo of an app called SortSite that does one-click site analysis for 404 and other errors. I'll post the results here.

EDIT: I did the test and no errors or broken links were found :)
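
With more than a handful of these emails, a quick tally can show patterns that are easy to miss by eye. The following Python sketch is only an illustration: it assumes the notifications are saved as plain text in the "Key: value" layout shown above, separated by blank lines, and `parse_notifications` and `tally` are hypothetical helpers. The sample text is four of the notifications quoted in this post.

```python
from collections import Counter

# Four of the notifications quoted above, in the same layout.
NOTIFICATIONS = """\
Requested Page: /MarkAny/Websafer/MaSiteInfo.ini
Referred By: Unknown
Remote Addr: 59.10.114.5 ()
Request URI: /MarkAny/Websafer/MaSiteInfo.ini

Requested Page: /components/com_jo
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 124.120.143.208 ()
Request URI: /components/com_jo

Requested Page: /mp_shadow_l_t.png
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png

Requested Page: /arrow.png
Referred By: http://pcwizcomputer.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png
"""


def parse_notifications(text):
    """Turn blank-line separated 'Key: value' blocks into a list of dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def tally(records, field):
    """Count how often each value of `field` occurs across the records."""
    return Counter(record.get(field, "Unknown") for record in records)


if __name__ == "__main__":
    records = parse_notifications(NOTIFICATIONS)
    for referrer, hits in tally(records, "Referred By").most_common():
        print(hits, referrer)
```

In this small sample, three of the four requests name the same referring page, which might be worth a second look for relative image or script paths, though four emails are far too few to conclude anything.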