Topic: Googlebot is indexing sessionid

Lucifix

  • Hero Member
  • Posts: 710
    • http://www.slo-foto.net
Re: Googlebot is indexing sessionid
« Reply #15 on: September 17, 2008, 12:41:01 PM »
So far, from what I can see, your session.php doesn't work right. If I block cookies for your site, I get a different sessionid after each click; this is wrong.
When I change the user agent to "googlebot", it still attaches a sessionid, which it shouldn't if you made the modifications correctly.

Guess you were right.

I found out that I get a different sessionid because of the different URL structure (I tested it with an original 4images gallery, no modifications made). I tried to find out why the sessionid doesn't work with this URL structure, but without luck.

My gallery is using these URLs:
http://www.slo-foto.net/modules.php?name=Galerija
http://www.slo-foto.net/modules.php?name=Galerija&file=details&image_id=40880
http://www.slo-foto.net/modules.php?name=Galerija&file=hall_of_fame

Does anyone have an idea how sessions.php works?
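The behaviour expected above (no sessionid for Googlebot, and one stable sessionid for visitors with cookies blocked) can be sketched roughly like this. It is a minimal Python illustration, not the actual 4images sessions.php code; `is_bot`, `append_sid` and the `BOT_MARKERS` list are hypothetical names. It also shows one place where a URL structure like `modules.php?name=Galerija` matters: the sessionid has to be joined with `&` when the URL already carries a query string.

```python
from urllib.parse import urlsplit

# Hypothetical list of user-agent substrings treated as crawlers.
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "msnbot")


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent looks like a known crawler (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def append_sid(url: str, sid: str, has_cookie: bool, user_agent: str) -> str:
    """Add sessionid to a URL only for cookieless, non-bot visitors.

    Bots never get a sessionid, and visitors whose session cookie came back
    don't need one. The separator depends on whether the URL already has a
    query string (modules.php?name=Galerija needs '&', not a second '?').
    """
    if has_cookie or is_bot(user_agent):
        return url
    sep = "&" if urlsplit(url).query else "?"
    return f"{url}{sep}sessionid={sid}"


# A cookieless browser on a PHP-Nuke style URL gets '&sessionid=...'.
print(append_sid("http://www.slo-foto.net/modules.php?name=Galerija", "abc", False, "Mozilla/5.0"))
# Googlebot gets the URL unchanged.
print(append_sid("/index.php", "abc", False, "Googlebot/2.1"))
```

If a real implementation always used `?` as the separator, a cookieless visitor would get a URL with two `?` characters, and the server might not parse the sessionid back out; that is only one possible cause to check, not a diagnosis of this gallery.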

V@no

  • If you don't tell me what to do, I won't tell you where you should go :)
  • Global Moderator
  • 4images Guru
  • Posts: 17.849
  • mmm PHP...
    • 4images MODs Demo
Re: Googlebot is indexing sessionid
« Reply #16 on: September 17, 2008, 02:57:02 PM »
Can you send me the following files?
index.php
global.php
includes/constants.php
includes/functions.php
includes/page_footer.php
includes/page_header.php
includes/sessions.php
Your first three "must do" before you ask a question:
Please do not PM me asking for help unless you've been specifically asked to do so. Such PMs will be deleted without answer. (forum rule #6)
Extension for Firefox/Thunderbird: Master Password+    Back/Forward History Tweaks (restartless)    Cookies Manager+    Fit Images (restartless for Thunderbird)

Lucifix
Re: Googlebot is indexing sessionid
« Reply #17 on: September 17, 2008, 03:05:08 PM »
I think you'll need to install the whole PHP-Nuke code, because it won't work with only a few files. If you are willing to install it, you can get it here:
http://www.nukeresources.com/downloadview-details-943-Nuke_7.6_Patched.html

I attached the 4images gallery integrated into PHP-Nuke. I assume this will take you some time, so no hard feelings from me if you don't do it ;)

How important is the sessionid? If it's not that important, maybe I could remove it completely.


V@no
Re: Googlebot is indexing sessionid
« Reply #18 on: September 17, 2008, 03:10:46 PM »
I'll see what I can do in about 9 hours ;)

How important is the sessionid? If it's not that important, maybe I could remove it completely.
Well, if you trust that all your members have cookies enabled, then you shouldn't worry about it. Otherwise, those who have blocked cookies will not be able to log in or even search on your 4images site, and other features that store information in the session will obviously not work.

Basically, a new sessionid means that 4images creates a new (empty) session and doesn't use the data from the previous one.
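The point about a new sessionid meaning a new, empty session can be illustrated with a toy lookup. This Python sketch assumes the session ID is read first from a cookie and then from a `sessionid` URL parameter; `SessionStore` is a hypothetical class, and the in-memory dict only stands in for whatever storage the real code uses.

```python
import secrets


class SessionStore:
    """Toy in-memory session store (hypothetical, for illustration only)."""

    def __init__(self):
        self.sessions: dict[str, dict] = {}

    def resolve(self, cookie_sid, url_sid):
        """Return (sid, data), preferring the cookie, then the URL parameter.

        If neither names a known session, a new empty session is created:
        this is the 'new sessionid' case, where earlier data is not reused.
        """
        for sid in (cookie_sid, url_sid):
            if sid and sid in self.sessions:
                return sid, self.sessions[sid]
        sid = secrets.token_hex(16)
        self.sessions[sid] = {}
        return sid, self.sessions[sid]


store = SessionStore()
sid, data = store.resolve(None, None)   # first visit: new session
data["user_id"] = 42                    # e.g. the visitor logs in

# Cookies blocked, but the sessionid travels in the URL: same session.
same_sid, same_data = store.resolve(None, sid)

# The sessionid is lost from the URL: a fresh, empty session is created.
new_sid, new_data = store.resolve(None, None)
```

This is why a visitor with cookies blocked appears logged out after any click whose link drops or mangles the sessionid.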

V@no
Re: Googlebot is indexing sessionid
« Reply #19 on: September 18, 2008, 01:58:25 AM »
What can I say: I just installed PHP-Nuke, installed the 4images module, finally got them to work together, blocked cookies, and got one sessionid that didn't change when I clicked on links.
So I guess there is something else you've changed, either in PHP-Nuke or in the 4images code...

Lucifix
Re: Googlebot is indexing sessionid
« Reply #20 on: September 18, 2008, 07:51:08 AM »
Thanks. When I get home I'll do a fresh installation of PHP-Nuke on my computer and check it out. Maybe I'll upgrade PHP-Nuke. I didn't think there was a problem with my version of PHP-Nuke.

Thanks again, V@no, for your help!

cpuswe

  • Newbie
  • Posts: 47
Re: Googlebot is indexing sessionid
« Reply #21 on: October 13, 2008, 03:41:59 PM »
You can tell Google not to index URLs containing sessionid via your robots.txt.

This is how mine looks:

User-Agent: *
Allow: /
Disallow: /*?sessionid=
Disallow: /member.php
Disallow: /*.search.htm

http://www.robotstxt.org/
http://www.smart-it-consulting.com/article.htm?node=140&page=46
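To check what a wildcard rule such as `Disallow: /*?sessionid=` actually matches, here is a small Python sketch of Google-style robots.txt matching, as I understand Google's documented behaviour: `*` matches any run of characters, a trailing `$` anchors the end, patterns are matched against the path plus query string from the start, and the longest matching rule wins (Allow wins a tie). The original robots.txt standard has no wildcards, so other crawlers may behave differently. A matching URL is blocked from crawling as a whole; nothing is stripped from it. The function names `rule_to_regex` and `is_allowed` are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if path (including its query string) may be crawled.

    rules holds (directive, pattern) pairs such as ("Disallow", "/*?sessionid=").
    The longest matching pattern wins; on a tie, Allow wins.
    """
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the robots.txt above.
RULES = [
    ("Allow", "/"),
    ("Disallow", "/*?sessionid="),
    ("Disallow", "/member.php"),
    ("Disallow", "/*.search.htm"),
]

print(is_allowed("/index.php?sessionid=abc123", RULES))               # False
print(is_allowed("/details.php?image_id=5&sessionid=abc123", RULES))  # True
```

The second call shows a limitation worth knowing: because the pattern contains `?`, it only matches when sessionid is the first query parameter, so URLs such as `details.php?image_id=5&sessionid=...` (or the PHP-Nuke style `modules.php?name=Galerija&...&sessionid=...`) are not caught. Under the same matching rules, a pattern like `Disallow: /*sessionid=` would catch both forms.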

V@no
Re: Googlebot is indexing sessionid
« Reply #22 on: October 13, 2008, 04:08:35 PM »
You can tell Google not to index URLs containing sessionid via your robots.txt.

That is probably the best way to do it.

Lucifix
Re: Googlebot is indexing sessionid
« Reply #23 on: October 14, 2008, 08:24:52 AM »
Will this disallow the bot from crawling all URLs with a sessionid at the end, or will it only remove ?sessionid= from the end of the URL?

AntiNSA2

  • Hero Member
  • Posts: 774
  • As long as I can finish my site before I die.
    • http://www.thelifephotography.com
Re: Googlebot is indexing sessionid
« Reply #24 on: May 17, 2009, 09:13:10 PM »
Can you tell me what you did? And can anyone say what bad things can result from denying a session ID to robots?

AntiNSA2
Re: Googlebot is indexing sessionid
« Reply #25 on: May 19, 2009, 06:51:59 AM »
@Lucifix

... and do you also use [MOD] Treat bots as users with less rights ...?
... both modifications (by martrix and V@no) work perfectly together for hiding session IDs from Google bots ...!

I am trying to understand... wouldn't this
I stumbled upon an article about the robots.txt file accepting wildcards (or, more exactly, the bots that read robots.txt). I haven't seen any forum post about it, so...

To keep bots from indexing URLs containing sessionid, I have added the following to my robots.txt:

User-Agent: *
Allow: /
Disallow: /*?sessionid=

I have tested this against the Google bot via Google Sitemaps, and the tool reports it as OK.

I am no expert on this and I have only just implemented it, so the impact on indexing is a bit unclear, but for me it's worth a try...

To read more:

http://www.smart-it-consulting.com/article.htm?node=140&page=46
http://www.ysearchblog.com/archives/000372.html
http://www.webmasterstalks.com/seo-4-smf/robots-txt-t1040.0.html

If you don't know what robots.txt is, start here: http://www.robotstxt.org/
be better? If not, why?