UltimateSearch
Re: blank index file created

Thread Starter: crypton   Started: 01-22-2007 8:18 PM   Replies: 13
 Karamasoft Support Forums » General Discussions » UltimateSearch » Re: blank index file created
22 Jan 2007, 8:18 PM
crypton
Joined on 11-26-2006
Posts 13
blank index file created

I've been messing with this for four hours now and can't figure it out.  The index file is created, but it's empty for all practical purposes (not literally empty, but it contains no indexing info).  It's 498 bytes.  If I delete the file it's immediately recreated.  Delete... delete... delete... it always recreates itself.  Running a search always returns no results, and does so very quickly.

event.log contains the following records:


1/22/2007 8:09:31 PM, Read config file
1/22/2007 8:09:31 PM, Add config to cache
1/22/2007 8:09:31 PM, Crawl using config file
1/22/2007 8:09:31 PM, Write index file
1/22/2007 8:09:31 PM, Add index to cache

Those entries show up every time I delete the index file and it's recreated.  The directory grants Full Control, so it doesn't seem like a permissions error.  I'm at a complete loss here.  Below is my config file:

<ultimateSearch>
  <configuration>
    <appSettings>
      <!-- Starts scanning (crawling and indexing) the files under the following
     directories and continues until it covers all subdirectories underneath.
        If you don't specify anything in scanDirectoryList, scanXmlList or scanUrlList
     it scans the files under the current web application by default.
        Note that if you enter anything in scanDirectoryList you also need to set
     mapPathList below so that it can map to the virtual path to crawl properly.
        For example, you may have a list as below:
    <scanDirectoryList>
     <scanDirectory>c:\inetpub\wwwroot\WebApplication1</scanDirectory>
     <scanDirectory>c:\inetpub\wwwroot\WebApplication2\PublicFolder</scanDirectory>
    </scanDirectoryList>
    -->
      <scanDirectoryList>
        <scanDirectory></scanDirectory>
      </scanDirectoryList>

      <!-- Parses the local XML file specified by "filePath" to extract the urls
     from the elements or attributes specified by "urlXPath".
        You can list one or more website navigation files such as UltimateMenu, UltimatePanel and
     UltimateSitemap source XML files, each one specified in a separate "scanXml" element.
        Note that "urlXPath" is case-sensitive. Also note that you can set "filePath"
     in three different forms.
        For example, you may have a list as below:
    <scanXmlList>
     <scanXml>
      <filePath>http://localhost/WebApplication1/menu.xml</filePath>
      <urlXPath>//@URL</urlXPath>
     </scanXml>
     <scanXml>
      <filePath>C:\inetpub\wwwroot\WebApplication1\panel.xml</filePath>
      <urlXPath>//@URL</urlXPath>
     </scanXml>
     <scanXml>
      <filePath>~/web.sitemap</filePath>
      <urlXPath>//@url</urlXPath>
     </scanXml>
    </scanXmlList>
    -->
      <scanXmlList>
        <scanXml>
          <filePath></filePath>
          <urlXPath></urlXPath>
        </scanXml>
      </scanXmlList>

      <!-- Starts scanning (crawling and indexing) with each of the following urls and continues
     with the urls inside each page until it covers all urls within each domain.
     You can list multiple domains, home pages, sitemap pages, or any other url.
        Note that scanUrl can be set to any URL that opens as a page
     in your browser window. If you set it to a directory like WebApplication2 below
     you should enable default documents on the Documents tab of the IIS settings.
        For example, you may have a list as below:
    <scanUrlList>
     <scanUrl>http://localhost/WebApplication1/WebForm1.aspx</scanUrl>
     <scanUrl>http://localhost/WebApplication2</scanUrl>
     <scanUrl>http://localhost/WebApplication3/Sitemap.aspx</scanUrl>
    </scanUrlList>
    -->
      <scanUrlList>
        <scanUrl>http://timbertech.strata-g.com/</scanUrl>
      </scanUrlList>

      <!-- Urls starting with the following prefixes will be discarded.
        Note that you can also use the robots.txt file to disallow paths, or
     robots meta tags to set noindex and nofollow flags in each page.
        You may visit http://www.robotstxt.org/wc/exclusion-admin.html
     to get more familiar with the robots.txt file and meta tags.
        If you don't specify anything it will exclude the UltimateSearchInclude
     directory under the current web application by default.
        For example, you may have a list as below:
    <excludePathList>
     <excludePath>http://localhost/WebApplication1/UltimateEditorInclude</excludePath>
     <excludePath>http://localhost/WebApplication1/UltimateSpellInclude</excludePath>
     <excludePath>http://localhost/WebApplication1/UltimateSearchInclude</excludePath>
     <excludePath>http://localhost/WebApplication1/WebForm2.aspx</excludePath>
     <excludePath>http://localhost/WebApplication2/HiddenFolder</excludePath>
    </excludePathList>
    -->
      <excludePathList>
        <excludePath>http://timbertech.strata-g.com/App_Code</excludePath>
        <excludePath>http://timbertech.strata-g.com/App_Data</excludePath>
        <excludePath>http://timbertech.strata-g.com/Bin</excludePath>
        <excludePath>http://timbertech.strata-g.com/UltimateSearchInclude</excludePath>
        <excludePath>http://timbertech.strata-g.com/UltimateSpellInclude</excludePath>
        <excludePath>http://timbertech.strata-g.com/images</excludePath>
      </excludePathList>

      <!-- You can exclude a portion of your pages in three different ways:
        1. Use UltimateSearch_IgnoreBegin and UltimateSearch_IgnoreEnd tags
    to exclude everything between these tags from indexing.
        2. Use UltimateSearch_IgnoreTextBegin and UltimateSearch_IgnoreTextEnd tags
    to exclude only the text between these tags from indexing, while following the links.
        3. Use UltimateSearch_IgnoreLinksBegin and UltimateSearch_IgnoreLinksEnd tags
    to exclude only the links between these tags from indexing, while indexing the text.

        See how you can define these ignore tags below:
    -->

      <!-- UltimateSearch_IgnoreBegin -->
      <!-- Everything here will be ignored -->
      <!-- UltimateSearch_IgnoreEnd -->

      <!-- UltimateSearch_IgnoreTextBegin -->
      <!-- Text here will be ignored, but links will be followed -->
      <!-- UltimateSearch_IgnoreTextEnd -->

      <!-- UltimateSearch_IgnoreLinksBegin -->
      <!-- Links here will be ignored, but text will be indexed -->
      <!-- UltimateSearch_IgnoreLinksEnd -->

      <!-- Only the files with the following extensions will be scanned.
    Note that these files must be of text/html type so that they can be crawled
    properly. For non text/html file types you will need to use IFilters
    as explained in the ifilterList and ifilterMapPathList elements below.
    -->
      <includeFileTypeList>
        <includeFileType>asp</includeFileType>
        <includeFileType>aspx</includeFileType>
        <includeFileType>asmx</includeFileType>
        <includeFileType>mspx</includeFileType>
        <includeFileType>htm</includeFileType>
        <includeFileType>html</includeFileType>
        <includeFileType>txt</includeFileType>
      </includeFileTypeList>

      <!-- IFilters are used to open and parse the non text/html file types such as
    pdf, doc, xls, ppt, etc. that are not in the default includeFileTypeList above.
    You need to install the specific IFilter for each file type. You may visit
    http://www.ifilter.org to download the necessary IFilters for free.
        Note that you don't need to install an IFilter for doc, xls, ppt since
    they already exist on Windows server. You only need to add the file extensions
    here in separate ifilter elements.
        Also note that you need to set mapPathList below since it requires
    a physical path or UNC in order to open these file types.
    They can't be crawled as text/html files. So they have to reside on your local network.
        For example, you may have a list as below:
    <ifilterList>
     <ifilter>doc</ifilter>
     <ifilter>xls</ifilter>
     <ifilter>ppt</ifilter>
    </ifilterList>
    -->
      <ifilterList>
        <ifilter></ifilter>
      </ifilterList>

      <!-- Virtual to physical path mappings must be provided if you use scanDirectoryList or ifilterList.
        For example, you may have a list as below:
    <mapPathList>
        <mapPath>
            <virtualPath>http://localhost/WebApplication1</virtualPath>
            <physicalPath>c:\inetpub\wwwroot\WebApplication1</physicalPath>
        </mapPath>
        <mapPath>
            <virtualPath>http://server2.mywebsite.com/WebApplication2</virtualPath>
            <physicalPath>\\server2\d$\inetpub\wwwroot\WebApplication2</physicalPath>
        </mapPath>
    </mapPathList>
    -->
      <mapPathList>
        <mapPath>
          <virtualPath></virtualPath>
          <physicalPath></physicalPath>
        </mapPath>
      </mapPathList>

      <!-- If you don't have full permission on the production (deployment) machine
     to save index file, update config file, write event and search log files, etc.
     you will need to create your index file on your development machine, and then
     copy it onto your production machine.
        First you have to provide the following "devProdMapPathList" so that the generated
     index file updates the urls to point to the actual production machine instead of
     your development machine. After copying the new index file onto the remote machine
     you will also need to update the config file on that machine to set
     "saveIndex", "saveEventLog", and "saveSearchLog" to "false" since you're not
     allowed to write onto that machine.
        Note that this feature requires the development and production machines to be
     in a compatible environment (same .NET Framework version, operating system, etc.)
     in order for the serialization/deserialization of the index file to work properly.
        For example, you may have a list as below:
    <devProdMapPathList>
     <devProdMapPath>
      <devPath>localhost</devPath>
      <prodPath>www.mydomain.com</prodPath>
     </devProdMapPath>
     <devProdMapPath>
      <devPath>myVirtualDir</devPath>
      <prodPath>subdomain.mydomain.com</prodPath>
     </devProdMapPath>
    </devProdMapPathList>
    -->
      <devProdMapPathList>
        <devProdMapPath>
          <devPath>timbertech.strata-g.com</devPath>
          <prodPath>www.timbertech.com</prodPath>
        </devProdMapPath>
      </devProdMapPathList>

      <!-- Following words will not be indexed.
     No need to list words that are shorter than
     "minWordLength" specified in this configuration file. -->
      <stopWordList>
        <stopWord>as</stopWord>
      </stopWordList>

      <!-- Files under the UltimateSearchInclude directory.
     If you want you can move these files to another directory, and change these settings.
    -->
      <add key="indexFile" value="~/UltimateSearchInclude/UltimateSearch.index" />
      <add key="eventLogFile" value="~/UltimateSearchInclude/UltimateSearch.event.log" />
      <add key="searchLogFile" value="~/UltimateSearchInclude/UltimateSearch.search.log" />
      <add key="outputCssFile" value="~/UltimateSearchInclude/UltimateSearch.output.css" />
      <add key="suggestCssFile" value="~/UltimateSearchInclude/UltimateSearch.suggest.css" />
      <add key="suggestScriptFile" value="~/UltimateSearchInclude/UltimateSearch.suggest.js" />
      <add key="suggestWebPage" value="~/UltimateSearchInclude/UltimateSearch.suggest.aspx" />
      <add key="adminWebPage" value="~/UltimateSearchInclude/UltimateSearch.admin.aspx" />

      <!-- If you don't have write permission on your production server
     you may set these flags to false.
    -->
      <add key="saveIndex" value="true" />
      <add key="saveEventLog" value="true" />
      <add key="saveSearchLog" value="true" />

      <!-- You may visit http://www.robotstxt.org/wc/exclusion-admin.html
     to get more familiar with the robots.txt file and meta tags.
    -->
      <add key="useRobotsFile" value="false" />
      <add key="useRobotsMeta" value="false" />

      <!-- If you want to keep the querystrings as part of the indexed urls
     you should set this flag to false.
    -->
      <add key="removeQueryString" value="false" />

      <!-- If you set this flag to false your index file size will be minimized,
     but you won't be able to do partial match search and you cannot display
     text snippets in search results.
    -->
      <add key="saveTextWithIndex" value="true" />

      <!-- If you set this flag to true indexed urls will be case-sensitive,
     i.e. search results may show both http://www.mydomain.com and
     http://www.MyDomain.com if both links exist on your pages.
     This feature is especially useful if the values in querystrings
     need to be case-sensitive.
    -->
      <add key="urlCaseSensitive" value="false" />

      <!-- Index file will not expire by default. If you set it to a non-zero value
     it expires after the specified number of days, and reindexes automatically
     during the next search operation.
    -->
      <add key="reindexFrequencyInDays" value="5" />

      <!-- Maximum number of pages allowed to be indexed. There is no hard limit on this setting.
     You can set it to a larger number if you have enough memory to support it.
    -->
      <add key="maxPageCount" value="5000" />

      <!-- Maximum number of characters to be read from each page. There is no hard limit on this setting.
     You can set it to a greater number if your pages are big and you want to index
     all page content.
        Note that this value needs to be greater than the number of characters displayed
     on a page, because of the HTML tags and hidden text in the source of the page.
     In other words, you should base it on the number of characters
     you see when you view the source of a page.
    -->
      <add key="maxPageLength" value="1000000" />

      <!-- Minimum number of characters allowed in a word to be indexed.
     Words with fewer characters won't be indexed.
    -->
      <add key="minWordLength" value="2" />

      <!-- Maximum number of characters allowed in a word to be indexed.
     Words with more characters won't be indexed.
    -->
      <add key="maxWordLength" value="30" />

      <!-- Score page content differently based on its location.
     If you set score to 0 (zero) then the words in that portion won't be indexed at all.
     For example, you may set scoreUrl to 0 if you don't want the words in urls to be indexed.
    -->
      <add key="scoreUrl" value="16" />
      <add key="scoreTitle" value="8" />
      <add key="scoreKeywords" value="4" />
      <add key="scoreDescription" value="2" />
      <add key="scoreText" value="1" />

      <!-- Proxy settings.
    -->
      <add key="useDefaultProxy" value="true" />
      <add key="proxyAddress" value="" />
      <add key="proxyUsername" value="" />
      <add key="proxyPassword" value="" />
      <add key="proxyDomain" value="" />

      <!-- Network credentials.
    -->
      <add key="useDefaultCredentials" value="true" />
      <add key="networkUsername" value="" />
      <add key="networkPassword" value="" />
      <add key="networkDomain" value="" />

    </appSettings>
  </configuration>
</ultimateSearch>


  
22 Jan 2007, 8:20 PM
crypton
Re: blank index file created
I'll also mention that the same thing happens on both dev (localhost) and stage; the config file I supplied above is from stage.
  
22 Jan 2007, 8:49 PM
Karamasoft
Joined on 09-05-2004
Posts 6,820
Re: blank index file created
Have you checked the admin page? Please open UltimateSearch.admin.aspx in IE to see the indexed pages and words. You may also try setting useDefaultProxy to false in the config file.
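If it helps, the proxy keys are already present near the bottom of the config you posted; with the default proxy disabled they would look like this (the explicit proxy values stay empty unless your network actually requires a proxy):

```xml
<!-- Disable the default proxy lookup. Leave the explicit
     proxy settings empty unless your network requires one. -->
<add key="useDefaultProxy" value="false" />
<add key="proxyAddress" value="" />
<add key="proxyUsername" value="" />
<add key="proxyPassword" value="" />
<add key="proxyDomain" value="" />
```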
  
22 Jan 2007, 9:04 PM
Karamasoft
Re: blank index file created
We tried the same config by setting scanUrl to http://timbertech.strata-g.com in our test environment, and it worked fine. It crawled 78 pages and indexed 69 of them successfully, creating an index file of 641 KB. We sent you a zip file containing the UltimateSearchInclude directory from this test by email.
  
23 Jan 2007, 7:21 AM
crypton
Re: blank index file created
Setting useDefaultProxy to false seems to have fixed this.  What exactly does that setting do?  I had assumed that if I didn't use the default I would be required to specify a proxy to use, which doesn't seem to be the case.

Thanks for the quick response by the way!

  
23 Jan 2007, 2:27 PM
Karamasoft
Re: blank index file created
There is a Proxy property on the HttpWebRequest class, and when the flag is true the control calls GetDefaultProxy to obtain the default proxy. However, that method became obsolete in .NET 2.0. If this change made the difference, you're probably using .NET 1.1.
  
25 Jan 2007, 12:34 PM
crypton
Re: blank index file created
No, I'm definitely using .NET 2.0.  For some reason it worked right after I changed that setting to false, but now it's not working anymore; I'm having the same problem.  Even when I upload an index file I created on another server it doesn't seem to take.  The index file is over 600 KB and works locally, but not when I upload it to the server.  Any other ideas?

-Nathan

  
25 Jan 2007, 12:57 PM
Karamasoft
Re: blank index file created

An index created on one machine may not work on another, since the .NET serialize/deserialize methods are bound to the environment (OS, .NET version, IIS, etc.) in which the file was created. However, if you want to try that route, you should set the save flags (saveIndex, saveEventLog, saveSearchLog) to false in the config file on the destination machine before copying the index file over. You may also need to use devProdMapPathList if the two machines have different directory structures, in case you used ifilterList or scanDirectoryList, which rely on mapPathList for the local directory structure.
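As a sketch of that route, using keys already shown in the config posted above (substitute your own host names), the config on the destination machine would change like this:

```xml
<!-- Production config: don't try to write the index or log files. -->
<add key="saveIndex" value="false" />
<add key="saveEventLog" value="false" />
<add key="saveSearchLog" value="false" />

<!-- Rewrite dev urls in the copied index to the production host. -->
<devProdMapPathList>
  <devProdMapPath>
    <devPath>timbertech.strata-g.com</devPath>
    <prodPath>www.timbertech.com</prodPath>
  </devProdMapPath>
</devProdMapPathList>
```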


  
25 Jan 2007, 1:04 PM
crypton
Re: blank index file created
Thanks for that super fast response!

Setting those to false did allow me to use that index.  However, I'd still like to figure out why the index can't be built on the server.  The content will be changing constantly in the production environment, and the index needs to be rebuilt automatically every few days, so I can't count on this method in the future.  Is there anything else I might try?  I can provide anything you need in the way of config settings, file structure, permissions, etc.

Thanks a bunch!

  
25 Jan 2007, 1:25 PM
Karamasoft
Re: blank index file created
We wish there were a way to figure out what's causing this in your prod environment. We can't reproduce it on our test machines, just as you can't on your dev box. We are using the same control on our own website (Windows 2003, .NET 2.0, IIS 6.0) without any problems. Have you tried the latest version, which we released on 1/18/2007?
  
25 Jan 2007, 1:37 PM
crypton
Re: blank index file created
If I install that demo version, will it overwrite the licensed version I have?  We purchased an Enterprise UI Suite license.  I'm sure you've answered this before; just point me to the post/page.

Thanks!

-Nathan

  
25 Jan 2007, 1:58 PM
Karamasoft
Re: blank index file created
First of all, take a backup of your existing work. Then uninstall the existing version and install the new one. Replace UltimateSearch.dll and UltimateSpell.dll with the new ones. You should also replace the UltimateSearchInclude and UltimateSpellInclude directories, but remember to update the UltimateSearch.config file with your original settings. Then rebuild your app and delete the index file to force a reindex.
  
25 Jan 2007, 3:36 PM
crypton
Re: blank index file created
Perhaps I should have clarified: this doesn't work on dev or stage.  It only works if I point to stage from dev and run it on dev.  If I try to index the site on my local machine I get the same result, a blank index file.  What's most perplexing is that I've indexed another project locally with this tool without incident.  No errors are written to the log or anywhere else; it just doesn't crawl.  I'm thinking maybe it's a DNS/proxy issue, but I have no idea how to debug it.  Would UltimateSearch take a different network path than just typing the URL into my browser?  I'm at a complete loss here, but I really need to get this working.  Should the following setting work locally using VS2005?  Going to this URL in the browser works.

<scanUrlList>
  <scanUrl>http://localhost:3965/Web.UI/Default.aspx</scanUrl>
</scanUrlList>
  
25 Jan 2007, 3:48 PM
Karamasoft
Re: blank index file created
Pointing scanUrl at that url will not work, because that virtual directory is created by VS2005 temporarily at runtime only; when you close VS it disappears. You may need to check with your system admin whether there are any DNS or proxy issues. If there are, you can update the proxy settings accordingly.

Have you tested our sample apps in C# and VB? Do they crawl and index properly? If so, you can either use one of them as your template or create a virtual directory for your app just like ours (if you check IIS, you should see the virtual directories for our sample apps).
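For instance, once the app is hosted in a real IIS virtual directory (the directory name Web.UI below is just an illustration), scanUrl can point at that stable address instead of the temporary VS2005 dev-server port:

```xml
<!-- Hypothetical example: after creating an IIS virtual directory
     named Web.UI, point scanUrl at the stable IIS address rather
     than the temporary VS2005 dev-server port. -->
<scanUrlList>
  <scanUrl>http://localhost/Web.UI/Default.aspx</scanUrl>
</scanUrlList>
```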

  

© 2002-2021 Karamasoft LLC. All rights reserved.