Topic: [GOOGLE SITEMAPPER TECHNICAL QUERY] WORRYING RESULTS - GOOGLE SITEMAPPER  (Read 7972 times)

sajwal

  • Jr. Member
  • Posts: 61
Hello people,

I used Google Sitemapper to check the links on my site so I could make a sitemap, and the results worry me. When Sitemapper spiders my site it shows thousands of results, and most of them are ADD TO LIGHTBOX pages listed over and over, each URL carrying a run of amp;amp;amp;amp;amp; characters along with a session ID.

I am attaching part of the spidered results. Please let me know whether this counts as spamming and, if it is something to worry about, how I can stop these pages from being spidered.

Please see sitemapper.txt and tell me what it means and what is good or bad about it.
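A likely explanation for the amp;amp;amp; runs (an assumption, not confirmed in this thread): a link URL that already contains the HTML entity &amp; gets HTML-escaped again each time it is reused, for example when the current page address is fed back into a new link, so every round adds another "amp;". The PHP sketch below reproduces that with htmlspecialchars() and shows one way to normalize such URLs when cleaning up a crawl list. The session parameter name sessionid and the example.com URL are hypothetical placeholders; check the real parameter name in your own URLs.

```php
<?php
// Sketch: how re-escaping an already-escaped URL produces "amp;amp;amp;" runs,
// and one way to normalize such URLs for a cleaned-up crawl list.
// Assumption: the session parameter is called "sessionid" (hypothetical).

// HTML-escape a URL $times times in a row, as if it were reused in a new link each time.
function escape_repeatedly(string $url, int $times): string
{
    for ($i = 0; $i < $times; $i++) {
        // '&' becomes '&amp;', and an existing '&amp;' becomes '&amp;amp;'
        $url = htmlspecialchars($url, ENT_QUOTES);
    }
    return $url;
}

// Undo any number of escaping rounds, then drop the session parameter.
// Note: port and fragment are not preserved in this simplified rebuild.
function normalize_url(string $url, string $sessionParam = 'sessionid'): string
{
    // Decode until the string stops changing, so '&amp;amp;amp;' collapses to '&'.
    do {
        $previous = $url;
        $url = htmlspecialchars_decode($url, ENT_QUOTES);
    } while ($url !== $previous);

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url;
    }
    parse_str($parts['query'], $params);
    unset($params[$sessionParam]);

    $base = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '') . ($parts['path'] ?? '');
    return $params ? $base . '?' . http_build_query($params) : $base;
}

$url = 'http://example.com/details.php?image_id=5&action=addtolightbox&sessionid=abc123';
$messy = escape_repeatedly($url, 3);
echo $messy, "\n";                // ...image_id=5&amp;amp;amp;action=...
echo normalize_url($messy), "\n";
```

Each extra escaping round adds one more "amp;" after every "&", which is how the same page can turn up many times under different-looking URLs.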

V@no

  • If you don't tell me what to do, I won't tell you where you should go :)
  • Global Moderator
  • 4images Guru
  • Posts: 17.849
  • mmm PHP...
    • 4images MODs Demo
I guess there is nothing you can do about it, unless you limit the functionality available to search bots on your website... but that would take way too many changes to the code...
Your first three "must do" before you ask a question:
Please do not PM me asking for help unless you've been specifically asked to do so. Such PMs will be deleted without answer. (forum rule #6)
Extension for Firefox/Thunderbird: Master Password+    Back/Forward History Tweaks (restartless)    Cookies Manager+    Fit Images (restartless for Thunderbird)

sajwal
Thank you, V@no, for taking the time to reply.

The only thing I need: please just tell me the code so that action=lightbox pages (as seen in the attachment) are not spidered.

V@no
Let's try this. Assuming you have already installed [MOD] Treat bots as users with less rights (if you haven't, then do at least Step 3 and remove the $site_sess->login($val, "12345"); line from it).

Then in includes/functions.php find:
Code:
  if ($user_info['user_level'] != GUEST) {
    $lightbox_url = $self_url;
Replace with:
Code:
  global $user_bot;
  if ($user_info['user_level'] != GUEST && !$user_bot) {
    $lightbox_url = $self_url;
It should no longer show the lightbox link to bots.
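The change above relies on the $user_bot flag from [MOD] Treat bots as users with less rights; how that MOD actually detects bots is not shown in this thread. Purely as a hypothetical illustration, a flag like that is often derived from the User-Agent header. The function names detect_bot() and show_lightbox_link() and the list of bot signatures are assumptions, not the MOD's code; the second function only mirrors the edited condition (logged in and not a bot).

```php
<?php
// Hypothetical sketch: derive a $user_bot-style flag from the User-Agent header.
// This is NOT the code of "[MOD] Treat bots as users with less rights";
// the signature list is an assumption and far from complete.

function detect_bot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false;
    }
    // Substrings commonly found in crawler User-Agent strings (compared lowercase).
    $signatures = ['googlebot', 'bingbot', 'slurp', 'msnbot', 'crawler', 'spider'];
    $ua = strtolower($userAgent);
    foreach ($signatures as $signature) {
        if (strpos($ua, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Mirrors the edited condition in includes/functions.php:
// only logged-in visitors who are not bots get a lightbox link.
function show_lightbox_link(bool $isGuest, bool $isBot): bool
{
    return !$isGuest && !$isBot;
}

// On a web request this reads the real header; on the command line it is absent.
$user_bot = detect_bot($_SERVER['HTTP_USER_AGENT'] ?? null);
```

User-Agent strings are easy to fake, so a check like this only keeps well-behaved crawlers away from the lightbox link; it is not a security measure.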

sajwal
Thank you very much, V@no :P You made me happy again :P

AntiNSA2

  • Hero Member
  • Posts: 774
  • As long as I can finish my site before I die.
    • http://www.thelifephotography.com
Why not add a Disallow rule for lightbox.php to robots.txt, so Google does not crawl it?
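A minimal robots.txt along those lines might look like the sketch below. robots.txt blocks crawling with Disallow rules; nofollow is a different mechanism, used in robots meta tags and link rel attributes. The sketch assumes 4images sits in the web root and that the add-to-lightbox links contain action=addtolightbox; both are assumptions, so check the exact paths and parameter in your Sitemapper results and adjust.

```text
# robots.txt in the web root (sketch; adjust paths to your installation)
User-agent: *
# The lightbox page itself, including any query string after it
Disallow: /lightbox.php
# Any URL containing action=addtolightbox (assumed parameter value)
Disallow: /*action=addtolightbox
```

Google honors the * wildcard used here, but some other crawlers may ignore it, and robots.txt only asks crawlers to stay away; it does not stop anyone from opening those URLs.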

Anarchology

  • Jr. Member
  • Posts: 60
  • I LULZ too much!
    • Tainted Pix
I have found it easier, and much faster, to use a sitemap program rather than wait for Google to map out your site by itself. You just sit back while the program trawls through your site under your own permissions. The one I used (I think it is the one below) lets you set words/scripts NOT to crawl. Depending on how many images you have and your internet speed, it may take up to a couple of hours to crawl your whole site.

These programs create exportable sitemap files that you can upload to your root directory and then point Google to. Personally, I had Google index 9,000 pages thanks to one of these sitemap programs.

This is one, but I'm not sure it's the one I used. I don't think the trial is restricted in any way, since the trial is all I ever used...
http://download.cnet.com/A1-Sitemap-Generator/3000-10248_4-10496253.html?tag=mncol
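The sitemap files these programs export use the sitemaps.org XML format. As a rough idea of what ends up in that file, here is a minimal PHP sketch that builds one from a list of URLs; the URLs and the script name make_sitemap.php are hypothetical placeholders. Once the output is uploaded to the web root, it can be submitted through Google's webmaster tools (now Search Console) or listed in robots.txt with a line such as "Sitemap: http://example.com/sitemap.xml".

```php
<?php
// Sketch: build a minimal sitemaps.org XML sitemap from a list of URLs.
// The URLs are placeholders; a real list would come from your own crawl.

function build_sitemap(array $urls): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        // Each <loc> is entity-escaped exactly once, so '&' becomes '&amp;'.
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <url><loc>{$loc}</loc></url>\n";
    }
    $xml .= "</urlset>\n";
    return $xml;
}

$urls = [
    'http://example.com/index.php',
    'http://example.com/categories.php?cat_id=1&page=2',
];

// Print the sitemap; redirect the output to a file to save it.
echo build_sitemap($urls);
```

Running it as php make_sitemap.php > sitemap.xml saves the output as a file ready to upload.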

BUT REMEMBER: Look through the sitemap program's features for an option to omit URLs containing certain words, like "admin", "lightbox", "edit", "delete", etc., whichever words or pages you do not want indexed. Doing so also drastically cuts the time it takes to crawl your site. Finally, the more you exclude from the program's crawler, the less bandwidth it will use while crawling.
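That word-exclusion option boils down to a substring test on each URL. A minimal PHP sketch, assuming a plain case-insensitive match; the function name filter_urls(), the word list, and the example URLs are illustrative, not taken from any particular sitemap program.

```php
<?php
// Sketch: drop URLs containing any excluded word before they go into a sitemap.
// Matching is a plain case-insensitive substring test (an assumption about
// how such filters work; real programs may offer patterns or regexes).

function filter_urls(array $urls, array $excludedWords): array
{
    $kept = [];
    foreach ($urls as $url) {
        $lower = strtolower($url);
        $excluded = false;
        foreach ($excludedWords as $word) {
            if (strpos($lower, strtolower($word)) !== false) {
                $excluded = true;
                break;
            }
        }
        if (!$excluded) {
            $kept[] = $url;
        }
    }
    return $kept;
}

$excludedWords = ['admin', 'lightbox', 'edit', 'delete'];
$urls = [
    'http://example.com/details.php?image_id=5',
    'http://example.com/details.php?image_id=5&action=addtolightbox',
    'http://example.com/admin/index.php',
    'http://example.com/member.php?action=editimage&image_id=5', // hypothetical URL
];
print_r(filter_urls($urls, $excludedWords));
```

Plain substring matching is crude: "edit" also matches credits.php, so keep the excluded words as specific as you can.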

EDIT: Forgot to mention the most important part... LOG OUT of your site (especially if you are logged in as admin) while these programs are running. This stops the program from crawling admin or member actions, which could be a security risk!
A personal THANK YOU to all of the great programmers on this site for helping me get my site from something basic to what it is today!

My site: http://taintedpix.com
(warning: some adult content)