4images Forum & Community

4images Issues / Ausgaben => Discussion & Troubleshooting => Topic started by: Lucifix on September 16, 2008, 11:30:50 AM

Title: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 11:30:50 AM
I don't know why, but Googlebot is indexing my pages with the sessionid in the URL.

I found V@no's modification, but it doesn't work, maybe because he wrote it for an older 4images version:
http://www.4homepages.de/forum/index.php?topic=6729.msg59251#msg59251

This line no longer exists in version 1.7.6:
Code:
    if ($this->mode == "get" && !preg_match("/".SESSION_NAME."=/i", $url)) {
You can check my results here:
http://www.google.si/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=sl&q=site%3Awww.slo-foto.net+sessionid&meta=&btnG=Iskanje+Google
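The condition in that PHP line can be sketched in Python as below, assuming (as the line suggests) that links get a sessionid only in "get" mode and only when the URL does not already contain `sessionid=`. The function name `append_session_id` and the `?`/`&` separator handling are hypothetical illustrations, not the actual 4images code.

```python
import re

SESSION_NAME = "sessionid"  # parameter name as it appears in the URLs in this thread


def append_session_id(url: str, session_id: str, mode: str) -> str:
    """Hypothetical helper mirroring the quoted PHP condition; not 4images code."""
    # Only rewrite the link when the session travels in the URL ("get" mode)
    # and the URL does not already carry "sessionid=" (case-insensitive, like /i).
    if mode == "get" and not re.search(re.escape(SESSION_NAME) + "=", url, re.IGNORECASE):
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}{SESSION_NAME}={session_id}"
    return url


print(append_session_id("modules.php?name=Galerija", "abc123", "get"))
# modules.php?name=Galerija&sessionid=abc123
print(append_session_id("index.php", "abc123", "cookie"))
# index.php
```

In "cookie" mode the link is left untouched, which is why visitors with cookies enabled never see a sessionid in their links.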
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 12:34:27 PM
I've updated the post you're referring to. I guess it'll take a few days before you see the effect...
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 12:41:24 PM
Thanks for the help, V@no. I'm aware that Google's cache will take a few days to update, but I'm tracking Google's IPs on my site and I can still see it indexing my pages with session IDs:

Screenshot:
http://img228c.imageshack.us/img228/8021/gcyy8.jpg
Title: Re: Googlebot is indexing sessionid
Post by: mawenzi on September 16, 2008, 01:10:50 PM
@Lucifix

... and are you also using [MOD] Treat bots as users with less rights ... ?
... both modifications (by martrix and V@no) work perfectly together for hiding session IDs from Google bots ... !
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 01:13:26 PM
Quote from: mawenzi
@Lucifix

... and are you also using [MOD] Treat bots as users with less rights ... ?
... both modifications work perfectly together for hiding session IDs from Google bots ... !

V@no wrote that you only have to install the 3rd step, which I did.
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 01:20:48 PM
From what I can see so far, your sessions.php doesn't work right. If I block cookies for your site, I get a different sessionid after each click, which is wrong.
When I change my user agent to "googlebot", it still attaches a sessionid, which it shouldn't if you had applied the modifications correctly.
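A bot check of the kind V@no is testing, where a "googlebot" user agent should never receive a sessionid, can be sketched as follows. The signature list and both function names are hypothetical illustrations, not the code of the mods mentioned in this thread.

```python
# Hypothetical crawler signatures; real mods maintain their own lists.
BOT_SIGNATURES = ("googlebot", "slurp", "msnbot", "bingbot")


def is_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against known crawler names."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)


def should_attach_session_id(user_agent: str, has_session_cookie: bool) -> bool:
    # Bots never get a sessionid in their links; human visitors get one
    # only when no session cookie came back with the request.
    return not is_bot(user_agent) and not has_session_cookie


print(should_attach_session_id("googlebot", False))                # False
print(should_attach_session_id("Mozilla/5.0 Firefox/3.0", False))  # True
```

Changing the browser's user agent to "googlebot", as V@no did, exercises exactly the first branch.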
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 01:23:00 PM
Quote from: V@no
From what I can see so far, your sessions.php doesn't work right. If I block cookies for your site, I get a different sessionid after each click, which is wrong.
When I change my user agent to "googlebot", it still attaches a sessionid, which it shouldn't if you had applied the modifications correctly.

I was thinking the same thing. Is sessions.php associated with any other file (for sessions)? (I don't mean things like the session URL in details.php.)
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 01:25:45 PM
Quote from: Lucifix
Is sessions.php associated with any other file (for sessions)? (I don't mean things like the session URL in details.php.)
Yes.
And if this is the only change you've made in there, I'd suggest you restore the original sessions.php, make sure it's working properly, and then try the modification again ;)
Title: Re: Googlebot is indexing sessionid
Post by: mawenzi on September 16, 2008, 01:33:30 PM
Quote from: Lucifix
V@no wrote that you only have to install the 3rd step, which I did.

... ok ... I see ...
... I use the whole mod by martrix, and so the hiding works perfectly for me ...
... but as V@no already said, restore and try it again ... ;)
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 01:37:38 PM
sessions.php has been restored to the original (with the Ban MOD integrated, but nothing else). Okay, looks like some hard coding is waiting for me :roll:
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 01:42:32 PM
Maybe I'm looking in the wrong place, but it looks like you are using some kind of integration with a "nuke"? I couldn't find a direct link to the 4images installation...
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 01:48:07 PM
You are right, my gallery is integrated into PHP-Nuke. But I don't know how this would affect the problem with session IDs.
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 02:02:09 PM
I have no idea how your integration works, but most integrations require changes in sessions.php...

P.S. I still get a random sessionid after each click.
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 16, 2008, 02:12:11 PM
How can I check for myself whether I get a random sessionid after each click?

It's a little hard to explain how 4images is integrated into PHP-Nuke. There is a tutorial on this forum, but the link to the download site doesn't work anymore:
http://www.4homepages.de/forum/index.php?topic=5547.0

4images is placed in the modules directory and uses only header.php, footer.php, and a few other files from PHP-Nuke. But it uses the same tables as the 4images gallery. As I recall, PHP-Nuke doesn't use any session ID (except for the phpBB integrated into PHP-Nuke).
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 16, 2008, 02:39:55 PM
Quote from: Lucifix
How can I check for myself whether I get a random sessionid after each click?
Disable cookies in your browser. The sessionid is added to links only when no cookies are stored.
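The cookie-versus-URL behaviour described here can be modelled roughly as below: the session ID is taken from the cookie if one exists, otherwise from the `sessionid` URL parameter, and only when neither is present is a new ID generated. This is a simplified, hypothetical model, not the contents of 4images' sessions.php. It shows why, with cookies blocked, any link that loses the `sessionid` parameter produces a fresh ID on the next click.

```python
import secrets
from urllib.parse import parse_qs, urlparse

SESSION_NAME = "sessionid"


def resolve_session(cookies: dict, url: str) -> tuple[str, str]:
    """Return (session_id, mode) for one request (simplified, hypothetical model)."""
    if SESSION_NAME in cookies:
        # Cookie present: the ID travels in the cookie, links stay clean.
        return cookies[SESSION_NAME], "cookie"
    params = parse_qs(urlparse(url).query)
    if SESSION_NAME in params:
        # No cookie, but the link carried the ID: reuse the existing session.
        return params[SESSION_NAME][0], "get"
    # Neither source has an ID: start a brand-new session in URL ("get") mode.
    return secrets.token_hex(16), "get"


print(resolve_session({"sessionid": "abc"}, "index.php"))
# ('abc', 'cookie')
print(resolve_session({}, "modules.php?name=Galerija&sessionid=xyz"))
# ('xyz', 'get')
```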
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 17, 2008, 12:41:01 PM
Quote from: V@no
From what I can see so far, your sessions.php doesn't work right. If I block cookies for your site, I get a different sessionid after each click, which is wrong.
When I change my user agent to "googlebot", it still attaches a sessionid, which it shouldn't if you had applied the modifications correctly.

I guess you were right.

I found out that I get a different sessionid because of the different URL structure (tested with an original 4images gallery, no modifications made). I tried to find out why the sessionid doesn't work with this URL structure, but without luck.

My gallery is using these URLs:
http://www.slo-foto.net/modules.php?name=Galerija
http://www.slo-foto.net/modules.php?name=Galerija&file=details&image_id=40880
http://www.slo-foto.net/modules.php?name=Galerija&file=hall_of_fame

Does anyone have any idea how sessions.php works?
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 17, 2008, 02:57:02 PM
Can you send me the following files?
index.php
global.php
includes/constants.php
includes/functions.php
includes/page_footer.php
includes/page_header.php
includes/sessions.php
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 17, 2008, 03:05:08 PM
I think you'll need to install the whole PHP-Nuke package, because it won't work with only a few files. If you are willing to install it, you can get it here:
http://www.nukeresources.com/downloadview-details-943-Nuke_7.6_Patched.html

I attached the 4images gallery integrated into PHP-Nuke. I assume this will take you some time, so no hard feelings from me if you don't do it ;)

How important is the sessionid? If it's not that important, maybe I could remove it completely.

Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 17, 2008, 03:10:46 PM
I'll see what I can do in about 9 hours ;)

Quote from: Lucifix
How important is the sessionid? If it's not that important, maybe I could remove it completely.
Well, if you trust that all your members have cookies enabled, then you shouldn't worry about it. Otherwise, those who have blocked cookies will not be able to log in, or even search on your 4images site, and other features that store information in the session will obviously not work.

Basically, a new sessionid means 4images creates a new (empty) session and doesn't use the data from the previous one.
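Why a new sessionid loses the login can be illustrated with a toy in-memory store keyed by session ID. The `SessionStore` class is a hypothetical stand-in for whatever storage 4images actually uses; the point is only that data written under one ID is invisible to a request arriving with a different ID.

```python
class SessionStore:
    """Toy in-memory session store keyed by session ID (illustrative only)."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, str]] = {}

    def get(self, session_id: str) -> dict[str, str]:
        # An unknown ID silently receives a fresh, empty session.
        return self._data.setdefault(session_id, {})


store = SessionStore()
store.get("id-first-click")["user_id"] = "42"  # e.g. set when the member logs in
print(store.get("id-first-click"))   # {'user_id': '42'}
print(store.get("id-second-click"))  # {}  (new ID, so the login data is not visible)
```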
Title: Re: Googlebot is indexing sessionid
Post by: V@no on September 18, 2008, 01:58:25 AM
What can I say: I just installed PHP-Nuke, installed the 4images module, finally got them working together, blocked cookies, and I got one sessionid that didn't change when I clicked on links.
So I guess there is something else you've changed, either in PHP-Nuke or in the 4images code...
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on September 18, 2008, 07:51:08 AM
Thanks. When I get home, I'll install a fresh copy of PHP-Nuke on my computer and check it out. Maybe I'll upgrade PHP-Nuke. I didn't think there was a problem with my version of PHP-Nuke.

Thanks again, V@no, for your help!
Title: Re: Googlebot is indexing sessionid
Post by: cpuswe on October 13, 2008, 03:41:59 PM
You can tell Google not to index URLs containing sessionid in your robots.txt.

This is how mine looks:

User-Agent: *
Allow: /
Disallow: /*?sessionid=
Disallow: /member.php
Disallow: /*.search.htm

http://www.robotstxt.org/
http://www.smart-it-consulting.com/article.htm?node=140&page=46
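To see which URLs a rule like `Disallow: /*?sessionid=` actually catches, here is a rough Python model of the wildcard matching Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end, patterns are matched from the start of the path plus query string, and when both an Allow and a Disallow rule match, the longest (most specific) pattern wins, with ties going to Allow. The function names are hypothetical and this is not Google's parser. Note that a URL matching a Disallow rule is simply not crawled; robots.txt cannot strip the parameter from the URL.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path_and_query: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs. Longest match wins; ties go to allow."""
    best_len, allowed = -1, True  # no matching rule means the URL may be crawled
    for kind, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path_and_query):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [("allow", "/"), ("disallow", "/*?sessionid=")]
print(is_allowed("/index.php?sessionid=abc", rules))                  # False
print(is_allowed("/modules.php?name=Galerija&sessionid=abc", rules))  # True
print(is_allowed("/modules.php?name=Galerija", rules))                # True
```

Under this model the second URL stays crawlable because `sessionid` follows `&` rather than `?`; a pattern such as `/*sessionid=` would match it as well.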
Title: Re: Googlebot is indexing sessionid
Post by: V@no on October 13, 2008, 04:08:35 PM
Quote from: cpuswe
You can tell Google not to index URLs containing sessionid in your robots.txt.

That is probably the best way to do it.
Title: Re: Googlebot is indexing sessionid
Post by: Lucifix on October 14, 2008, 08:24:52 AM
Will this disallow bots from crawling all URLs with a session ID at the end, or will it only remove ?sessionid= from the end of the URL?
Title: Re: Googlebot is indexing sessionid
Post by: AntiNSA2 on May 17, 2009, 09:13:10 PM
Can you tell me what you did? And can anyone say what bad things can result from denying a session ID to robots?
Title: Re: Googlebot is indexing sessionid
Post by: AntiNSA2 on May 19, 2009, 06:51:59 AM
Quote from: mawenzi
@Lucifix

... and are you also using [MOD] Treat bots as users with less rights ... ?
... both modifications (by martrix and V@no) work perfectly together for hiding session IDs from Google bots ... !

I am trying to understand... wouldn't this
Code:
I stumbled upon an article about the robots.txt file accepting wildcards (or, more exactly, the bots that read robots.txt). I haven't seen any forum post about it, so...

To keep bots from indexing URLs containing sessionid, I have added the following to my robots.txt:

User-Agent: *
Allow: /
Disallow: /*?sessionid=

I have tested this against Googlebot via Google Sitemaps, and the tool reports it as OK.

I am no expert on this and I have just implemented it, so the impact on indexing is a bit unclear, but for me it's worth a try...

To read more:

http://www.smart-it-consulting.com/article.htm?node=140&page=46
http://www.ysearchblog.com/archives/000372.html
http://www.webmasterstalks.com/seo-4-smf/robots-txt-t1040.0.html

If you don't know what robots.txt is, start here: http://www.robotstxt.org/
be better? If not, why?