~pcwiz Posted May 5, 2008

Hi,

I have my site set up with Joomla CMS, but there is a teensy problem. I have created several static content pages in Joomla (one page for each error, e.g. 403, 404, 500) and then used .htaccess to redirect the errors to those static pages. This gives the site a more professional look and lets me check how many times the error pages have been hit.

I checked today and saw that the 404 page has had 137,000-something hits, more than any article on my site. I know it's been hit that many times, but I have no clue which pages people browsed to to get the error. I recently converted to Joomla CMS from static HTML, but I did set up redirects from the old HTML pages to the new pages, so that can't be the problem. And it couldn't have been a site outage, because then the entire site would have gone down, including the 404 page. I have seen no erratic behaviour on the site, no one has contacted me about dead links or anything, and when I checked, everything seemed to be fine.

So, with that said, is there ANY way to check WHICH URL people browsed to that caused the 404 error? That would help figure out if there is a problem. And could it be some sort of automated bot or component on my site that is causing the hits? A bot seems unlikely to me, because then every page would have an abnormal number of hits, not just the 404 page...

Also, I do have access to a logs folder with my web host that shows the page access stuff and all. Would this be of any use in finding out which pages are giving 404s?

Thanks

P.S. I'll post this on the Joomla forums too...

Link to comment https://www.insanelymac.com/forum/topic/103308-404-error-logs-on-site/
sarahbau Posted May 5, 2008

Actually, every page wouldn't have an abnormal number of hits if it's a bot. When I used to monitor my web server's activity, there were lots of 404s from people (or bots) trying to access /admin, /administration, etc. I think you might be able to enable extra logging options in Apache to show what page they were trying to get to, but I can't remember for sure.

Edit: I just remembered that at one point I also had a 404 page that was "prettier" than the standard 404 page. It was a PHP page that told visitors to email the administrator if they thought the error was a mistake, and clicking on the email link would put the page URL in the subject of the email. Anyway, you might be able to come up with a clever way to use PHP or something to write the bad URLs to a separate log file.
chris2k Posted May 5, 2008

AFAIK Joomla is only a CMS and not a web server, so the access logs are generated by your web server rather than by Joomla. I assume you're using Apache, so check this link: http://httpd.apache.org/docs/1.3/logs.html

And yes, it's probably some bot causing that many hits. You will find out when you dig through the log files. Most CMSs are vulnerable to something.
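As a rough illustration of what digging through the log files could look like, here is a minimal Python sketch that tallies which requested paths received a 404 status. It assumes the log uses Apache's common or combined format, where the quoted request line is followed by the status code and byte count; the function name `count_404_paths` and the sample addresses (taken from the 192.0.2.0/24 documentation range) are made up for this example.

```python
import re
from collections import Counter

# Matches the quoted request line plus the status and size fields that
# follow it in Apache's common/combined log format (an assumption about
# the log layout, not something confirmed for this host).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_404_paths(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    # Invented sample entries; a real run would iterate over the log file.
    sample = [
        '192.0.2.1 - - [05/May/2008:02:41:07 -0400] "GET /admin HTTP/1.1" 404 209',
        '192.0.2.2 - - [05/May/2008:02:41:08 -0400] "GET /index.php HTTP/1.1" 200 6380',
        '192.0.2.1 - - [05/May/2008:02:41:09 -0400] "GET /admin HTTP/1.1" 404 209',
    ]
    for path, hits in count_404_paths(sample).most_common():
        print(hits, path)
```

If the server redirects errors to a custom page instead of answering with a 404 directly, the missing URLs may show up under a redirect status instead, so the status filter would need adjusting.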
~pcwiz Posted May 5, 2008 Author

Yeah, I got another 1405 hits in a single day, which is completely unreasonable. I'll look through the logs... Any ideas on what specifically I should be looking for?
~pcwiz Posted May 5, 2008 Author

OK, well I looked through the logs, and the URL to my error page is: http://######.com/index.php?option=...=view&id=53 so I found what was accessing it:

IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "71.59.173.52"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:08 -0400] "GET /osx86search/ HTTP/1.1" 200 6380 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "71.59.173.52"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/jceutilities-150.js HTTP/1.1" 200 15774 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/pc_includes/ajax.js HTTP/1.1" 200 6947 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/templates/chatter/comment_style.css HTTP/1.1" 200 4150 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/script.js? HTTP/1.1" 200 7962 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /mambots/system/jceutilities/img/blank.gif HTTP/1.1" 200 43 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/mp_shadow_r_t.png HTTP/1.1" 200 353 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /index.php?option=com_jomcomment&task=userinfo&no_html=1 HTTP/1.1" 200 8193 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /templates/md_macphoria/images/spacer.png HTTP/1.1" 200 218 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /components/com_jomcomment/busy.gif HTTP/1.1" 200 729 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /google-analytics.com/ga.js HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/loading.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/close.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/blank.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/page_go.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/prevlabel.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /img/nextlabel.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/chart_bar.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_alert.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_voteup.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:23 -0400] "GET /images/comments_votedown.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-arrow.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/comment-shadow.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/mp_header.jpg HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_bg.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /smilies/bbcode_front.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/delicious.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/digg.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/furl.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/yahoo_myweb.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/stumbleupon.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/google_bmarks.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/technorati.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/reddit.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:03:22:24 -0400] "GET /images/bookmarks/facebook.gif HTTP/1.1" 302 269 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "CFNetwork/129.22" "-"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/js/embed.js HTTP/1.1" 200 2658 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /mambots/system/jceutilities/css/jceutilities.css HTTP/1.1" 200 2128 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /components/com_jomcomment/style.css HTTP/1.1" 200 6424 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/images/mp_shadow_l_t.png HTTP/1.1" 200 349 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "71.59.173.52"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /images/joomlarss.gif HTTP/1.1" 200 657 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:07 -0400] "GET /templates/md_macphoria/css/template_css.css HTTP/1.1" 200 7632 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"
IP.HID.DE.N0 - - [05/May/2008:02:41:08 -0400] "GET /osx86search/ HTTP/1.1" 200 6380 ######.com "http://######.com/index.php?option=com_content&task=view&id=53" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" "IP.HID.DE.N0"

And that repeats. I have hidden the IP addresses with "IP.HID.DE.N0" to protect privacy.
chris2k Posted May 5, 2008

Doesn't look like an exploit attempt to me, but I don't know what is causing it either. I checked your site, clicked a few links, and got no 404s. Also, when I enter random stuff into the URL, the last page shows up again. It doesn't redirect me to your 404 page.

You will have to look at the log entry just before the 404 page shows up in the log file. Maybe that helps...
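One way to follow that advice is to filter the log for everything that did not come back with a normal status, together with the page that referred the request. The Python sketch below assumes the field order seen in the excerpt posted in this thread (request line, status, byte count, a host-name field, then the quoted referrer); that layout is specific to this host, and the name `non_ok_requests` and the example.com sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt posted in this thread:
#   ... "GET /path HTTP/1.1" STATUS BYTES HOSTNAME "REFERRER" "USER-AGENT" ...
ENTRY = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" '  # quoted request line
    r'(?P<status>\d{3}) \S+ \S+ '                # status, byte count, host name
    r'"(?P<referer>[^"]*)"'                      # quoted referring page
)


def non_ok_requests(lines, ok_statuses=("200", "304")):
    """Count (status, path, referrer) triples whose status is not in ok_statuses."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match and match.group("status") not in ok_statuses:
            key = (match.group("status"), match.group("path"), match.group("referer"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines that mimic the posted format.
    sample = [
        '192.0.2.7 - - [05/May/2008:03:22:23 -0400] "GET /images/joomlarss.gif HTTP/1.1" '
        '200 657 example.com "http://example.com/index.php?id=53" "CFNetwork/129.22" "-"',
        '192.0.2.7 - - [05/May/2008:03:22:23 -0400] "GET /img/loading.gif HTTP/1.1" '
        '302 269 example.com "http://example.com/index.php?id=53" "CFNetwork/129.22" "-"',
    ]
    for (status, path, referer), hits in non_ok_requests(sample).most_common():
        print(hits, status, path, "referred by", referer)
```

In the posted excerpt, the entries with status 302 and a 269-byte body would be the ones such a filter surfaces. Whether those are the .htaccess redirects to the error page is a guess that the log alone does not confirm.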
~pcwiz Posted May 5, 2008 Author

OK, I made some more progress. I found out that the same 2 IP addresses were repeatedly accessing the page:

67.142.130.13
66.249.85.67

I went to the ARIN WHOIS search and typed in the first one (67.142.130.13):

Address: DirecWAY Network Management Center
Address: attn: Network Security Manager
City: Germantown
StateProv: MD
PostalCode: 20876
Country: US
NetRange: 67.142.0.0 - 67.143.255.255
CIDR: 67.142.0.0/15
NetName: DIRECPC-1BLK
NetHandle: NET-67-142-0-0-1
Parent: NET-67-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.DIRECPC.COM
NameServer: NS2.DIRECPC.COM
Comment:
RegDate: 2003-12-12
Updated: 2004-03-04
OrgTechHandle: NSM5-ARIN
OrgTechName: Network Security Manager
OrgTechPhone: +1-301-601-7205
OrgTechEmail: abuse@hughes.net

Not much I can recognize there, so leave that alone for a sec. Then I typed in the second IP (66.249.85.67) and here is the result:

OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
NetRange: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19
NetName: GOOGLE
NetHandle: NET-66-249-64-0-1
Parent: NET-66-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.GOOGLE.COM
NameServer: NS2.GOOGLE.COM
NameServer: NS3.GOOGLE.COM
NameServer: NS4.GOOGLE.COM
Comment:
RegDate: 2004-03-05
Updated: 2007-04-10
OrgTechHandle: ZG39-ARIN
OrgTechName: Google Inc.
OrgTechPhone: +1-650-318-0200
OrgTechEmail: arin-contact@google.com

Lookie who it is. Google. I guess the Google search bot has been hammering my 404 page. I've heard about using robots.txt to prevent Google from indexing it, but would that be effective in this case?

Thanks!
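Once the CIDR blocks from the WHOIS output are known, further addresses from the log can be checked against them without another lookup. This is a small Python sketch using the standard `ipaddress` module; the helper name `in_range` is made up, and the addresses and ranges are the ones quoted in the post above.

```python
import ipaddress


def in_range(ip, cidr):
    """Return True if the address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    # Ranges copied from the WHOIS results quoted in this thread.
    print(in_range("66.249.85.67", "66.249.64.0/19"))   # True
    print(in_range("67.142.130.13", "67.142.0.0/15"))   # True
    print(in_range("67.142.130.13", "66.249.64.0/19"))  # False
```

An address inside a provider's block only shows who owns the network, so this narrows down the source rather than proving the traffic is a search crawler.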
slim2001 Posted May 5, 2008

When I click on a link on the front page it takes me here: http://www.insanelymac.com/{ipb.script_url}showtopic=102313
sarahbau Posted May 5, 2008

Using robots.txt stopped Google from trying to go through my PHP calendar page (it was just following the 'next week' links over and over, which of course never ends).
~pcwiz Posted May 5, 2008 Author

Quoting slim2001: "When i click on a link on the front page it takes me here http://www.insanelymac.com/{ipb.script_url}showtopic=102313"

What do you mean? Clicking a link on my site takes you to insanelymac?

sarahbau, how would I use robots.txt to protect only 1 page in Joomla?
sarahbau Posted May 5, 2008

Quoting ~pcwiz: "sarahbau, How would I use robots.txt to protect only 1 page in Joomla?"

I don't know anything about Joomla, but you should just be able to create the file in your server's root. Here's how mine looks:

User-agent: *
Disallow: /calendar/

That basically just blocks any bot from looking at anything in /calendar/
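Rules like these can be checked locally before uploading them. The following Python sketch feeds the two-line robots.txt from the post above into the standard library's `urllib.robotparser` and asks whether a crawler may fetch a given URL. The helper name `allowed`, the example.com URLs and the "Googlebot" user-agent string are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The rules quoted in the post above.
    rules = "User-agent: *\nDisallow: /calendar/\n"
    print(allowed(rules, "Googlebot", "http://example.com/calendar/2008/05/"))  # False
    print(allowed(rules, "Googlebot", "http://example.com/index.php"))          # True
```

This only shows how a well-behaved parser interprets the file. Crawlers that ignore robots.txt are unaffected by it.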
~pcwiz Posted May 6, 2008 Author

Hm... yeah, I figured that much, but the Joomla URLs are dynamic, so everything comes from index.php, and if I put a rule for it in robots.txt it would apply to the whole site. Anyway, I found some info about robots.txt for Joomla and I'll see what I can do.

EDIT: I think I have found the problem. There was a robots.txt file that was supposed to be installed with Joomla but wasn't installed on my setup. I've now uploaded the robots.txt file. I hope this solves the problem, and I will report back on progress.
~pcwiz Posted May 6, 2008 Author

OK, well I sorta had a spark of genius. There is a feature in Google Analytics that will track 404 errors and tell you what page people visited to get the error, which is exactly what I need. So I set up a simple HTML 404 page, inserted the 404 tracking code, and set my .htaccess file to direct to the HTML tracker page instead of the standard Joomla page. This is just temporary, to see what's causing the errors.

There is no data in my Analytics panel yet for the 404, but Google says that it takes 24 hours to update, so it should be updated sometime today or tomorrow. I'll see how it goes.
inimicus Posted May 6, 2008

Well, if you didn't have a robots.txt file on your server, then the bots would get a 404 every time they tried to access it.

If you are redirecting users to your 404 page with .htaccess, you're not going to get many usable results. Since the Google Analytics code is JS (client-side), it's never going to see the input URL, as that's handled server-side before the redirect. All the client is going to see is the referring page to that 404 page. You'll find a lot of valid pages, but probably not many (or any) actual input 404 addresses.

But GA is pretty tricky. I could be wrong...
~pcwiz Posted May 6, 2008 Author

I have the robots.txt file now, but after I put it in, the 404 page was still getting tons of hits. I think you may be right about Analytics, because the 404 data is not appearing. Do you know of any open source website stats tools (ones that actually reside on your server, like TraceWatch or phpMyVisites) that can track 404 errors?
inimicus Posted May 8, 2008

I don't mess with 404s. It just means more bandwidth and more fuss. So sorry, I dunno of anything. I just print "not found, mang." when a 404 occurs.

However, you can try some code on your 404 page that might pull the redirect information. Let's say http://www.######.com/index.php?opt...9&Itemid=45 is a 404.

// HTTP_REFERER is not always sent, so fall back to an empty string
$ref = explode('/', isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');

That will put the referring URL into an array divided by slashes. The name of the page will be in the last item of the array. So http://www.######.com/index.php?opt...9&Itemid=45 becomes...

$ref => Array
[0] => http:
[1] =>
[2] => www.######.com
[3] => index.php?option=com_content&task=view&id=69&Itemid=45

(The empty item at [1] comes from the double slash after "http:".) Then you can pull the page name with...

// the last array item is the page name plus its query string
$name = $ref[count($ref)-1];

And now you know that index.php?option=com_content&task=view&id=69&Itemid=45 is creating a 404. You could then write the names to a table in your database for review later.

I haven't thoroughly tested this, so it's all speculation. But it's server-side, just as you need.
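For comparison, the same page-name extraction can be sketched in Python with a URL parser instead of a split on slashes, which keeps working when the query string itself contains a slash. This is only an illustration of the splitting step, not the tracker script discussed in the thread; `page_name` and the example.com URL are hypothetical.

```python
from urllib.parse import urlsplit


def page_name(url):
    """Return the last path segment of url, with its query string if present."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    return f"{name}?{parts.query}" if parts.query else name


if __name__ == "__main__":
    # Hypothetical referring URL in the style of the one discussed above.
    url = "http://www.example.com/index.php?option=com_content&task=view&id=69&Itemid=45"

    # A plain split on "/" yields an empty item for the "//" after "http:".
    print(url.split("/"))
    print(page_name(url))
```

Note that the referrer is the page a visitor came from. The URL that was actually requested is a separate server variable, so a tracker would normally want to record both.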
(MoC) Posted May 8, 2008

This could give you some insight...
~pcwiz Posted May 8, 2008 Author

MoC, thanks. I was looking at that earlier, but the software is commercial. Thanks for the link though.

inimicus, I implemented a tracker script on my 404 page very similar to that one, except it emails me the info. So I did find some bots that were causing it. Here are some of the notifications I got:

Requested Page: /MarkAny/Websafer/MaSiteInfo.ini
Referred By: Unknown
Remote Addr: 59.10.114.5 ()
Request URI: /MarkAny/Websafer/MaSiteInfo.ini

I have never even heard of anything called "/MarkAny/Websafer/MaSiteInfo.ini". I did an IP lookup on the address and it belongs to the "Korea Telecom Network Management Center".

Requested Page: /components/com_jo
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 124.120.143.208 ()
Request URI: /components/com_jo

The components directory is set to disallow access in robots.txt, and com_jo is not the full folder path. The IP belongs to "True Internet" in China, which sounds like some kind of ISP.

Requested Page: /mp_shadow_l_t.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png

The IP belongs to "Telenet operaties N.V." in Belgium. The image it requested does exist, but it is in a different directory. I checked the referring address and everything is fine there.

Requested Page: /arrow.png
Referred By: http://######.com/index.php?option=com_content&task=view&id=76&Itemid=48
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png

Again, the image exists, but it's not in the root directory as requested.

Requested Page: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(
Referred By: Unknown
Remote Addr: 87.210.64.10 ()
Request URI: /mambots/system/jceutilities/js/}}}A(this.number).html(E);A(

I dunno why it was asking for that. The IP belongs to "Versatel Consumer ISP" in the Netherlands.

Anyway, all of these look to me like bot requests rather than legit 404s, so they seem to be normal. But here's the thing: before, with the Joomla static 404 page, I was getting something like 10 hits a minute on the 404, but with the custom script I got about 10 emails in 24 hours. Something weird is going on. It doesn't seem to be affecting human users in any way, so that's good, but I would still like to find out the reason for this.

Another thing: I am using a 30-day demo of an app called SortSite that does one-click site analysis for 404s and other errors. I'll post the results here.

EDIT: I did the test and no errors or broken links were found.
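With more than a handful of these emails, it may be easier to tally them than to read them one by one. The Python sketch below parses notifications in the four-field layout shown in the post and counts requests per remote address. It assumes each field sits on its own line in the original email, which the thread does not confirm, and the names `parse_notifications` and `hits_by_address` are invented for this example.

```python
from collections import Counter

# Field labels as they appear in the notifications quoted in this thread.
FIELDS = ("Requested Page", "Referred By", "Remote Addr", "Request URI")


def parse_notifications(text):
    """Split notification text into one dict per record (assumes one field per line)."""
    records, current = [], {}
    for raw in text.splitlines():
        label, sep, value = raw.partition(":")
        label = label.strip()
        if not sep or label not in FIELDS:
            continue  # skip blank lines and anything that is not a known field
        if label == FIELDS[0] and current:
            records.append(current)  # "Requested Page" starts a new record
            current = {}
        current[label] = value.strip()
    if current:
        records.append(current)
    return records


def hits_by_address(records):
    """Count records per remote address, ignoring the trailing '()' placeholder."""
    return Counter(r["Remote Addr"].split()[0] for r in records if r.get("Remote Addr"))


if __name__ == "__main__":
    sample = """Requested Page: /mp_shadow_l_t.png
Referred By: http://example.com/index.php?id=76
Remote Addr: 84.199.23.235 ()
Request URI: /mp_shadow_l_t.png

Requested Page: /arrow.png
Referred By: http://example.com/index.php?id=76
Remote Addr: 84.199.23.235 ()
Request URI: /arrow.png
"""
    print(hits_by_address(parse_notifications(sample)))
```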
thomblake Posted August 6, 2010

Quoting ~pcwiz: "OK, I made some more progress. I found out that the same 2 IP addresses were repeatedly accessing the page: 67.142.130.13 66.249.85.67 I went to the ARIN WHOIS search and I typed in the first one (67.142.130.13):

Address: DirecWAY Network Management Center
Address: attn: Network Security Manager
City: Germantown
StateProv: MD
PostalCode: 20876
Country: US
NetRange: 67.142.0.0 - 67.143.255.255
CIDR: 67.142.0.0/15
NetName: DIRECPC-1BLK
NetHandle: NET-67-142-0-0-1
Parent: NET-67-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.DIRECPC.COM
NameServer: NS2.DIRECPC.COM
Comment:
RegDate: 2003-12-12
Updated: 2004-03-04
OrgTechHandle: NSM5-ARIN
OrgTechName: Network Security Manager
OrgTechPhone: +1-301-601-7205
OrgTechEmail: abuse@hughes.net"

Regarding this one, it's a common but often mysterious problem with Hughes. I posted a description of it on my blog: http://thomblake.com/2010/08/05/hughes-isp/