UltimateSearch
Re: Issues working with a robots.txt file

Thread Starter: cameron   Started: 05-28-2009 9:54 PM   Replies: 13
  28 May 2009, 9:54 PM
cameron is not online. Last active: 3/16/2010 5:42:48 PM cameron

Top 10 Posts
Joined on 06-19-2008
Melbourne
Posts 7
Issues working with a robots.txt file
Hi, I'm having an issue getting a robots.txt file to work with UltimateSearch 3.2.

I am indexing URLs listed in an XML document (at the moment this is just one URL, the root of the site I am testing on):
            <scanXmlList>
                <scanXml>
                    <filePath>~/SearchResourceList.xml</filePath>
                    <urlXPath>/resourcesToIndex/resource/@url</urlXPath>
                </scanXml>
            </scanXmlList>
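For anyone following along, here is a rough sketch of what that urlXPath would select. The shape of SearchResourceList.xml is an assumption inferred from the XPath (a resourcesToIndex root with resource elements carrying a url attribute); the sample URL and the function name urls_to_index are made up for illustration, and this is plain Python, not anything UltimateSearch itself runs.

```python
import xml.etree.ElementTree as ET

# Assumed shape of ~/SearchResourceList.xml, inferred from the XPath
# /resourcesToIndex/resource/@url. The URL is a placeholder.
SAMPLE = """<resourcesToIndex>
  <resource url="http://www.example.com/" />
</resourcesToIndex>"""


def urls_to_index(xml_text):
    """Return the url attribute of every <resource> under the root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "resourcesToIndex":
        return []
    # ElementTree cannot select attributes with XPath directly, so the
    # elements are selected first and the attribute is read from each one.
    return [r.get("url") for r in root.findall("resource") if r.get("url")]


if __name__ == "__main__":
    for url in urls_to_index(SAMPLE):
        print(url)
```

Each URL returned this way would be a starting point for the indexer, which is why the robots.txt of that site is the one that matters.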
robots.txt and robots META tag handling are both turned on:
            <add key="useRobotsFile" value="true" />
            <add key="useRobotsMeta" value="true" />

I have set the user agent (I originally tried this with the default userAgent setting):
      <add key="userAgent" value="Karamasoft" />
and set up a robots.txt file in the root of the site being indexed with these settings:
User-agent: Karamasoft
Disallow: /showcase/
However, even after a full reindex, the pages in that folder are still being indexed and appear in search results. I've tried both with and without the trailing / in the Disallow statement.

I also tried using a robots META tag INDEX, NOFOLLOW -- this was also ignored.

Am I missing something?





www.areeba.com.au

  
  29 May 2009, 1:42 PM
Karamasoft is not online. Last active: 9/2/2010 12:02:18 PM Karamasoft

Top 10 Posts
Joined on 09-05-2004
Posts 5,325
Re: Issues working with a robots.txt file
The User-agent line in the robots.txt file is disregarded. Have you tried setting Disallow to a full URL prefix such as http://www.yourdomain.com/showcast?
  
  31 May 2009, 5:00 PM
cameron
Sorry, I don't think I quite understand what you are suggesting. 

Are you saying that the robots.txt file on external sites isn't used, and that we need to add rules to the robots.txt file of the local site (the one running Karamasoft), using full URLs of the areas to disallow?





  
  31 May 2009, 7:55 PM
Karamasoft
Sorry for the confusion. The robots.txt file should have been applied. We're just trying to figure out the issue on your machine. Under normal circumstances, the indexer finds the domain name, and the robots file and META tag settings work. You could try changing those settings to see how that affects the behavior. You could also test the provided C# and VB sample apps and compare them to your own website.
  
  31 May 2009, 8:31 PM
cameron
Hi, hopefully I understand you correctly:

I've updated the robots.txt file on the remote site to include its full URL.
User-agent: Karamasoft
Disallow: http://www.example.com/showcast/
Unfortunately it still seems to be ignoring the robots.txt file.

Is there somewhere I can send a test solution that reproduces the issue that you can have a look at?







  
  31 May 2009, 11:21 PM
Karamasoft

Could you change your robots.txt file from:

User-agent: Karamasoft
Disallow: /showcase

# Instructions for all robots
User-agent: *

# Do not access the following directories
Disallow: /bin
Disallow: /aspnet_client

to:

# Instructions for all robots
User-agent: *

# Do not access the following directories
Disallow: /bin
Disallow: /aspnet_client
Disallow: /showcase

 


  
  31 May 2009, 11:42 PM
cameron
Hi, I've made that update now.




  
  31 May 2009, 11:45 PM
Karamasoft
Now try to index again.
  
  31 May 2009, 11:53 PM
cameron
That is working correctly. Is it then just that it isn't set up to process rules for a particular user agent string?




  
  01 Jun 2009, 12:01 AM
Karamasoft

That's correct.
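
To make the confirmed behavior concrete, here is a small model of a reader that only honors the `User-agent: *` group, applied to the two robots.txt files quoted earlier in the thread. This is a sketch for illustration only, not UltimateSearch's actual code; the function names are hypothetical, and the parsing is simplified (for example, it does not handle several User-agent lines sharing one group).

```python
BEFORE = """\
User-agent: Karamasoft
Disallow: /showcase

# Instructions for all robots
User-agent: *

# Do not access the following directories
Disallow: /bin
Disallow: /aspnet_client
"""

AFTER = """\
# Instructions for all robots
User-agent: *

# Do not access the following directories
Disallow: /bin
Disallow: /aspnet_client
Disallow: /showcase
"""


def wildcard_disallows(robots_txt):
    """Collect Disallow prefixes that sit under a `User-agent: *` line only."""
    prefixes = []
    in_wildcard_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Any named agent (such as Karamasoft) switches the group off.
            in_wildcard_group = value == "*"
        elif field == "disallow" and in_wildcard_group and value:
            prefixes.append(value)
    return prefixes


def is_blocked(path, prefixes):
    """A path is blocked when it starts with any collected prefix."""
    return any(path.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        rules = wildcard_disallows(text)
        print(label, rules, is_blocked("/showcase/page.aspx", rules))
```

Under this model the /showcase rule in the first file is never seen, because it sits under the Karamasoft group, which matches what was observed before the rule was moved under `User-agent: *`.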


  
  01 Jun 2009, 12:12 AM
cameron
Any plans to add that sort of functionality?

In our intended use, we want to index some portions of partners' sites for inclusion in a separate specialist directory. In these cases the client sites would want to disallow sections of their sites separately from the general rules used for search engines such as Google. The best way to allow that would be to let a user agent string be specified for processing the robots.txt rules.

Obviously, if only the default * user agent rule is used, then UltimateSearch isn't going to be an effective solution for this system.
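
As an illustration of the per-agent behavior being requested, the sketch below uses Python's standard urllib.robotparser, which applies conventional robots.txt group matching. It shows how a partner site could block /showcase/ for one crawler while leaving it open to general search engines. The agent name Karamasoft comes from the userAgent setting earlier in the thread; the example.com URLs are placeholders, and none of this reflects how UltimateSearch is implemented.

```python
from urllib.robotparser import RobotFileParser

# Example partner-site robots.txt: one group for the directory crawler,
# one general group for everyone else.
ROBOTS_TXT = """\
User-agent: Karamasoft
Disallow: /showcase/

User-agent: *
Disallow: /bin
Disallow: /aspnet_client
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
showcase = "http://www.example.com/showcase/page.html"
binary = "http://www.example.com/bin/site.dll"

if __name__ == "__main__":
    for agent in ("Karamasoft", "Googlebot"):
        print(agent, "showcase:", parser.can_fetch(agent, showcase),
              "bin:", parser.can_fetch(agent, binary))
```

With conventional matching, a crawler that finds a group naming it uses that group instead of the * group, so the Karamasoft agent is kept out of /showcase/ while other crawlers are only kept out of /bin and /aspnet_client.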






  
  01 Jun 2009, 12:58 PM
Karamasoft
We added this issue to our enhancements list. We're planning to release a new build with this feature in four to six weeks.
  
  01 Jun 2009, 4:04 PM
cameron
Excellent news, thanks for your assistance.




  
  13 Jul 2009, 7:48 AM
Karamasoft
Please download the latest build.
  

© 2002-2010 Karamasoft LLC. All rights reserved.