How to install a SharePoint 2010 Complete (Dev) Server without AD


This post details how to install SharePoint 2010 as a “complete” install without an AD, which is very useful to me as a development server (with or without Visual Studio).

This applies to both virtual and physical machines but I always work with VMs because traditionally SharePoint dev environments need to be re-installed once in a while and that’s easier with VMs.

Why would you want this?

  • A development server with all components is likely to resemble your test/QA/production environment a lot more than the alternative standalone install
  • A server with a local install of SharePoint with non-AD accounts can be run with or without an AD domain – you can even run the VM as a domain-joined server disconnected from the actual domain, e.g. at home or on the commute
    • Alternative 1: Make your server an AD server, but that changes all sorts of stuff with user management and will definitely not resemble any of your production servers
    • Alternative 2: Create two VMs, one being the AD server and the other the actual development server, connected on the same VLAN – and waste a lot of resources for (almost) nothing.
  • To eliminate any need for rogue AD servers on your network that some developer accidentally connected directly to the network, running DHCP, DNS etc. Don’t trust your developers or external consultants to care about your network!
  • I want a full SQL Server!

Why would you not want this?

  • This is not a supported development environment from Microsoft – they support installing a so-called standalone development environment without any of the frills. It’s easier and it’s officially supported. It’s even doable on Windows 7.
    • Why anyone would want to develop SharePoint on a Windows 7 machine is beyond me; the runtime environment for your code will always be a Windows Server 2008, so why not develop and test directly on such a box? Surely you develop only in VMs so that you are able to create a clean dev environment easily once in a while…

How to

The procedure is fairly simple except for the final steps. Note that you can (and should) use whatever tools you can to help you out; I’ll point at the promising AutoSPInstaller at CodePlex.

Procedure:

  1. Create / install a Server 2008 (R2) 64 bit with
    1. Visual Studio 2010
    2. SQL Server 2008 (use a local user as service account)
    3. … and whatever other tools you are fond of …
    4. (Remember to sysprep/snapshot it at this stage)
  2. Install SharePoint 2010 with all prerequisites
    1. Scripted or not – do not run the Config Wizard yet (it would result in a “Local accounts should only be used in standalone mode” error)
  3. Create the farm by (trick #1)
    1. Start the SharePoint PowerShell
    2. Create a local service account
    3. Create a farm by running the “New-SPConfigurationDatabase” cmdlet and supplying parameters for the service account, DB name, DB server and passphrase (thanks to Neil ‘The Doc’ Hodgkinson for that) – see the sketch after this list
    4. After it finishes, start the Config Wizard (interactive or not) and configure your server with all components
  4. Configure the farm services as you like
    1. I usually just use the wizard in Central Admin to configure all the Service Applications with some fairly useful values; it works well enough
  5. Enterprise Search doesn’t work – to fix it, see below… (trick #2)
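To make step 3 concrete, here is a minimal sketch of the farm creation, run from the SharePoint 2010 Management Shell. The account name, password and database names are my assumptions – substitute your own:

# Create a local (non-AD) service account – name and password are assumptions
net user spfarm "P@ssw0rd1!" /add

# Create the farm configuration database using that local account
$pwd = ConvertTo-SecureString "P@ssw0rd1!" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential ("$env:COMPUTERNAME\spfarm", $pwd)
New-SPConfigurationDatabase -DatabaseName "SharePoint_Config" `
    -DatabaseServer $env:COMPUTERNAME `
    -AdministrationContentDatabaseName "SharePoint_AdminContent" `
    -Passphrase (ConvertTo-SecureString "My!Passphrase1" -AsPlainText -Force) `
    -FarmCredentials $cred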

The Trouble with Search

Search will fail with a number of errors, and in the search administration the Query Component will remain stuck in the initializing state.

The other bunch of event log errors etc. is listed at the end of this post for the benefit of Google.

As far as I can tell, the problem is that the timer service is trying to set up a network share for every query component where the crawlers can dump their data. It tries to set up that share for a domain account – which in this case happens to be a local user instead – and fails with either an “Access Denied” error or a “System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated”.

The share name it is trying to use is the same as the query role, i.e. “Guid-query-0”, pointing to (if using default locations) “C:\Program Files\Microsoft Office Servers\14.0\Data\Office Server\Applications” with change permissions for the “WSS_WPG” group.
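For illustration only, the share the timer job is trying to create corresponds to something like this (as noted below, creating it yourself does not help):

net share "Guid-query-0"="C:\Program Files\Microsoft Office Servers\14.0\Data\Office Server\Applications" /GRANT:WSS_WPG,CHANGE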

Unfortunately it does not help to just create the share yourself; apparently the query components insist on waiting for the timer job to complete successfully.

The Search Fix

The fix is fairly simple and was almost completed by Gary Lapointe, to whom I owe great thanks for doing most of the hard work in his post on scripting the Enterprise Search installation and the comments below it (thanks to Marco van Wieren).

The fix is simply to create and configure all the enterprise search components from PowerShell as it allows you to set a few more options, specifically the share name for the query components so that you are then allowed to create them yourself.

The script was originally made for configuring search components across an entire farm and is therefore a bit more complicated than it strictly has to be. I left that in while adding support for a single server install as well. Gary’s script was made for beta 2; I’ve fixed a few simple errors/typos, corrected the few API changes between beta 2 and RTM, and finally added the share name support.

The script is quite long and not suitable for pasting into a blog – download it instead.

The script needs a configuration file with something like this:

<Services>
    <EnterpriseSearchService ContactEmail="no-reply@SharePointDev1.com"
                             ConnectionTimeout="60"
                             AcknowledgementTimeout="60"
                             ProxyType="Default"
                             IgnoreSSLWarnings="false"
                             InternetIdentity="Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"
                             IndexLocation="C:\Program Files\Microsoft Office Servers\14.0\Data\Office Server\Applications"
                             PerformanceLevel="PartlyReduced"
                             Account="localhost\saservice"
                             ShareName="SearchShare">

        <EnterpriseSearchServiceApplications>
            <EnterpriseSearchServiceApplication Name="Enterprise Search Service Application"
                                                DatabaseServer="localhost"
                                                DatabaseName="SharePoint_Search"
                                                FailoverDatabaseServer=""
                                                Partitioned="false"
                                                Partitions="1"
                                                SearchServiceApplicationType="Regular">
                <ApplicationPool Name="SharePoint Enterprise Search Application Pool" Account="localhost\saservice" />
                <CrawlServers>
                    <Server Name="localhost" />
                </CrawlServers>
                <QueryServers>
                    <Server Name="localhost" />
                </QueryServers>
                <SearchQueryAndSiteSettingsServers>
                    <Server Name="localhost" />
                </SearchQueryAndSiteSettingsServers>
                <AdminComponent>
                    <Server Name="localhost" />
                    <ApplicationPool Name="SharePoint Enterprise Search Application Pool" Account="localhost\saservice" />
                </AdminComponent>
                <Proxy Name="Enterprise Search Service Application Proxy" Partitioned="false">
                    <ProxyGroup Name="Default" />
                </Proxy>
            </EnterpriseSearchServiceApplication>
        </EnterpriseSearchServiceApplications>
    </EnterpriseSearchService>
</Services>

Remarks:

  • I replace “localhost” with the actual computer name in the script
  • The share name (here “SearchShare”) will be created by the script as well, so whatever you call it doesn’t matter
  • The config file shown can be reused on every machine provided that the local service account “saservice” has been created beforehand – see the sketch below
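Creating that account is a one-liner from an elevated prompt (the password is an assumption – pick your own):

net user saservice "P@ssw0rd1!" /add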

To continue and complete step 5 in the procedure above (sorry for the numbering – WordPress is messing up the HTML):

  1. Start a PowerShell shell (the script will load the SharePoint snapin if it’s not a SharePoint Management Shell)
    1. Load the “SetupEnterpriseSearch.ps1” script (just drag the file into the shell and execute), which will define the required functions
    2. Execute Start-EnterpriseSearch “<path>\searchconfig.xml” – see the sketch after this list
    3. Wait for a few minutes and watch for errors
  2. Go to the Search Administration and verify that your new search topology works
    1. All components should be shown as up and running – in particular, the query component should no longer be stuck in the initializing state

    2. If you configured search in step 4 you will have two search service applications
    3. If you have two you can safely go back to “Manage service applications” and delete the one named “Search Service Application 1” (and its associated databases) – the one created by the script is “Enterprise Search Service Application”
  3. Try it! Go to a local SharePoint site and search for something
    1. Before the fix, search would return a server error (500), so anything other than that can be considered a success
    2. I like to add a few documents and have them show up in the search before I call it a success…
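For reference, the PowerShell part of the procedure above boils down to something like this – the paths are assumptions, point them at wherever you saved the download:

# Load the snapin if this is not a SharePoint Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Dot-source the script to define its functions, then run it against the config file
. "C:\Install\SetupEnterpriseSearch.ps1"
Start-EnterpriseSearch "C:\Install\searchconfig.xml"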

Caveats / Fast Search

I don’t know if it’s fair to call it a caveat, but only Enterprise Search is demonstrated here. FAST Search behaves similarly with respect to the “share trouble” and will probably need the same fix as Enterprise Search. I’ve not found the time or need to poke around with that just yet, but it should be doable in less than a day given the foundation above (for someone skilled in SharePoint and PowerShell).

Conclusions

It works; I’ll use it from now on :-)

… and I hope the nice chaps at AutoSPInstaller will include this fix in their tool.

Scared of being in unsupported land?

  • It’s only your dev server, and it did move a lot closer to production than the standalone dev machine option
  • It also protects your network from rogue AD servers that might potentially kill half your network if you are unlucky

So how well is this tested? Quite well for a single server install and not at all for a farm install (not by me at least). Trust it with the former and test it yourself if you need the latter.

Observed Errors

I got a lot of different errors; here they are for the benefit of Google.

Event log entry after completing the configuration wizard in Central Administration:

Log Name: Application
Source: Microsoft-SharePoint Products-SharePoint Server Search
Date: 11-06-2010 22:20:17
Event ID: 2579
Task Category: Administration
Level: Error
Keywords:
User: SHAREPOINTDEV1\saservice
Computer: SharePointDev1
Description:
Component a61ca0ca-194f-4cf0-bb5c-8ca998178935-query-0 of search application 'Search Service Application' has failed to execute transition sequence 'initialize with empty catalog' due to the following error: System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated.
Parameter name: sddlForm
at System.Security.AccessControl.RawSecurityDescriptor.BinaryFormFromSddlForm(String sddlForm)
at System.Security.AccessControl.RawSecurityDescriptor..ctor(String sddlForm)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateShareSecurityDescriptor(String[] readNames, String[] changeNames, String[] fullControlNames, String& sddl)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateFileShare(String name, String description, String path)
at Microsoft.SharePoint.Administration.SPServer.CreateFileShare(String name, String description, String path)
at Microsoft.Office.Server.Search.Administration.QueryComponent.CreatePropagationShare(QueryComponent component)
at Microsoft.Office.Server.Search.Administration.QueryComponent.ExecuteCurrentStage(). It is now in state Uninitialized.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-SharePoint Products-SharePoint Server Search" Guid="{C8263AFE-83A5-448C-878C-1E5F5D1C4252}" />
<EventID>2579</EventID>
<Version>14</Version>
<Level>2</Level>
<Task>14</Task>
<Opcode>0</Opcode>
<Keywords>0x4000000000000000</Keywords>
<TimeCreated SystemTime="2010-06-11T20:20:17.723875000Z" />
<EventRecordID>3926</EventRecordID>
<Correlation ActivityID="{B1431F7E-1D0C-4CB7-B690-F0F016447FE4}" />
<Execution ProcessID="956" ThreadID="3484" />
<Channel>Application</Channel>
<Computer>SharePointDev1</Computer>
<Security UserID="S-1-5-21-452889701-636363473-2591022535-1012" />
</System>
<EventData>
<Data Name="string0">a61ca0ca-194f-4cf0-bb5c-8ca998178935-query-0</Data>
<Data Name="string1">Search Service Application</Data>
<Data Name="string2">initialize with empty catalog</Data>
<Data Name="string3">System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated.
Parameter name: sddlForm
at System.Security.AccessControl.RawSecurityDescriptor.BinaryFormFromSddlForm(String sddlForm)
at System.Security.AccessControl.RawSecurityDescriptor..ctor(String sddlForm)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateShareSecurityDescriptor(String[] readNames, String[] changeNames, String[] fullControlNames, String&amp; sddl)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateFileShare(String name, String description, String path)
at Microsoft.SharePoint.Administration.SPServer.CreateFileShare(String name, String description, String path)
at Microsoft.Office.Server.Search.Administration.QueryComponent.CreatePropagationShare(QueryComponent component)
at Microsoft.Office.Server.Search.Administration.QueryComponent.ExecuteCurrentStage()</Data>
<Data Name="string4">Uninitialized</Data>
</EventData>
</Event>

And from the ULS log:

06/11/2010 22:20:17.72         OWSTIMER.EXE (0x03BC)         0x0D9C        SharePoint Server Search         Administration         fea9        Critical        Component a61ca0ca-194f-4cf0-bb5c-8ca998178935-query-0 of search application ‘Search Service Application’ has failed to execute transition sequence ‘initialize with empty catalog’ due to the following error: System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated. Parameter name: sddlForm at System.Security.AccessControl.RawSecurityDescriptor.BinaryFormFromSddlForm(String sddlForm) at System.Security.AccessControl.RawSecurityDescriptor..ctor(String sddlForm) at Microsoft.SharePoint.Win32.SPNetApi32.CreateShareSecurityDescriptor(String[] readNames, String[] changeNames, String[] fullControlNames, String& sddl) at Microsoft.SharePoint.Win32.SPNetApi32.CreateFileShare(String name, String description, String path) at Microsoft.S…        b1431f7e-1d0c-4cb7-b690-f0f016447fe4
06/11/2010 22:20:17.72*        OWSTIMER.EXE (0x03BC)         0x0D9C        SharePoint Server Search         Administration         fea9        Critical        …harePoint.Administration.SPServer.CreateFileShare(String name, String description, String path) at Microsoft.Office.Server.Search.Administration.QueryComponent.CreatePropagationShare(QueryComponent component) at Microsoft.Office.Server.Search.Administration.QueryComponent.ExecuteCurrentStage(). It is now in state Uninitialized.        b1431f7e-1d0c-4cb7-b690-f0f016447fe4
06/11/2010 22:20:17.72         OWSTIMER.EXE (0x03BC)         0x0D9C        SharePoint Server         Unified Logging Service         2m1i        Verbose         Adding event 2579 (category: Administration, product: SharePoint Server Search) to spam monitoring list        b1431f7e-1d0c-4cb7-b690-f0f016447fe4
06/11/2010 22:20:17.72         OWSTIMER.EXE (0x03BC)         0x0D9C        SharePoint Server Search         Administration         djs2        Medium         SearchApi (): executing SetQueryComponent(d355048f-d4fa-4f31-88b0-342b5ed48e5c, null, null, null, null, Uninitialized, Uninitialized, null, -1, Failed, null, False, null, null, False, null)        b1431f7e-1d0c-4cb7-b690-f0f016447fe4

And another event log:

Log Name: Application
Source: Microsoft-SharePoint Products-SharePoint Foundation
Date: 12-06-2010 20:40:26
Event ID: 6398
Task Category: Timer
Level: Critical
Keywords:
User: SHAREPOINTDEV1\saservice
Computer: SharePointDev1
Description:
The Execute method of job definition Microsoft.Office.Server.Search.Administration.CrawlReportJobDefinition (ID 9529aace-a679-4fc9-ab8d-325780484cf0) threw an exception. More information is included below.
The search service is not able to connect to the machine that hosts the administration component. Verify that the administration component '3147b99c-8f3a-41e9-a08b-296f930af877' in search application 'Enterprise Search Service Application' is in a good state and try again.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-SharePoint Products-SharePoint Foundation" Guid="{6FB7E0CD-52E7-47DD-997A-241563931FC2}" />
<EventID>6398</EventID>
<Version>14</Version>
<Level>1</Level>
<Task>12</Task>
<Opcode>0</Opcode>
<Keywords>0x4000000000000000</Keywords>
<TimeCreated SystemTime="2010-06-12T18:40:26.553054700Z" />
<EventRecordID>4159</EventRecordID>
<Correlation ActivityID="{6CED0041-2038-43E3-AB79-4DEFBB4216B3}" />
<Execution ProcessID="1324" ThreadID="1532" />
<Channel>Application</Channel>
<Computer>SharePointDev1</Computer>
<Security UserID="S-1-5-21-452889701-636363473-2591022535-1012" />
</System>
<EventData>
<Data Name="string0">Microsoft.Office.Server.Search.Administration.CrawlReportJobDefinition</Data>
<Data Name="string1">9529aace-a679-4fc9-ab8d-325780484cf0</Data>
<Data Name="string2">The search service is not able to connect to the machine that hosts the administration component. Verify that the administration component '3147b99c-8f3a-41e9-a08b-296f930af877' in search application 'Enterprise Search Service Application' is in a good state and try again.</Data>
</EventData>
</Event>

And one for foundation search:

Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          …
Event ID:      6398
Task Category: Timer
Level:         Critical
Keywords:
User:          …
Computer:      …
Description:
The Execute method of job definition Microsoft.Office.Server.Search.Administration.QueryTopologyActivationJobDefinition (ID de8eac2b-57db-4069-896d-747ae4fb35ed) threw an exception. More information is included below.
Topology activation was aborted because of System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated.
Parameter name: sddlForm
at System.Security.AccessControl.RawSecurityDescriptor.BinaryFormFromSddlForm(String sddlForm)
at System.Security.AccessControl.RawSecurityDescriptor..ctor(String sddlForm)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateShareSecurityDescriptor(String[] readNames, String[] changeNames, String[] fullControlNames, String& sddl)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateFileShare(String name, String description, String path)
at Microsoft.SharePoint.Administration.SPServer.CreateFileShare(String name, String description, String path)
at Microsoft.Office.Server.Search.Administration.QueryComponent.CreatePropagationShare(QueryComponent component)
at Microsoft.Office.Server.Search.Administration.QueryComponent.ExecuteCurrentStage().
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-SharePoint Products-SharePoint Foundation" Guid="{6fb7e0ce-52e7-47dd-997a-241563931fc2}" />
<EventID>6398</EventID>
<Version>14</Version>
<Level>1</Level>
<Task>12</Task>
<Opcode>0</Opcode>
<Keywords>0x4000000000000000</Keywords>
<EventRecordID>10895</EventRecordID>
<Correlation ActivityID="{6E239D20-A2CD-45B4-AC87-4477A82558BB}" />
<Execution ProcessID="2016" ThreadID="2288" />
<Channel>Application</Channel>
<Computer>id1314</Computer>
<Security UserID="S-1-5-21-30024279817-590149927-1659320300-1003" />
</System>
<EventData>
<Data Name="string0">Microsoft.Office.Server.Search.Administration.QueryTopologyActivationJobDefinition</Data>
<Data Name="string1">de8eac2b-57db-4069-896d-747ae4fb35ed</Data>
<Data Name="string2">Topology activation was aborted because of System.ArgumentException: The SDDL string contains an invalid sid or a sid that cannot be translated.
Parameter name: sddlForm
at System.Security.AccessControl.RawSecurityDescriptor.BinaryFormFromSddlForm(String sddlForm)
at System.Security.AccessControl.RawSecurityDescriptor..ctor(String sddlForm)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateShareSecurityDescriptor(String[] readNames, String[] changeNames, String[] fullControlNames, String&amp; sddl)
at Microsoft.SharePoint.Win32.SPNetApi32.CreateFileShare(String name, String description, String path)
at Microsoft.SharePoint.Administration.SPServer.CreateFileShare(String name, String description, String path)
at Microsoft.Office.Server.Search.Administration.QueryComponent.CreatePropagationShare(QueryComponent component)
at Microsoft.Office.Server.Search.Administration.QueryComponent.ExecuteCurrentStage().</Data>
</EventData>
</Event>

And finally from Gary’s blog (Marco van Wieren):

Component: 3b609311-67da-4df8-8c12-e597e9228dd3-crawl-0
Details:
The system cannot find the file specified. 0x80070002. Propagation for search application 3b609311-67da-4df8-8c12-e597e9228dd3-crawl-0: failed to communicate with query server 3b609311-67da-4df8-8c12-e597e9228dd3-query-0.

Make the Search Work for You


The SharePoint 2007 (MOSS) search engine is head and shoulders above the one found in SPS 2003, and it’s a breeze to set up. Right? Yes and maybe.

In this iteration of SharePoint I consider the search engine to be very good, and if you spend the time to configure it properly it will work great for your site. But nothing comes for free, and I’ve collected the few issues I ran into configuring search on my farms, which include several local SharePoint collaboration and publishing sites, people search, and external websites with and without SSL.

I’m not too fond of the search webparts and their configuration options, but consider them out of scope for this entry.

Don’t consider this an exhaustive guide to search setup; there are plenty of areas that I don’t cover, e.g. crawl impact rules. I didn’t need them and chances are you don’t either.

Setting up the Indexer Role

I recommend setting up the index server as follows:

  • Behind the firewall so no users can access it directly
  • Hosting the “Windows SharePoint Services Web Application”, i.e. the front-end for all your sites
  • The indexer should use a particular server to index all (local) content, namely itself (set this on the Central Administration / Operations / Services on Server / Office SharePoint Server Search Service page)

That way your indexing does not affect your front-end web servers significantly, only indirectly as you’re still querying the same database.

It works great if you know and accept the following two caveats:

  • The timer service will execute a job that tries to modify your hosts file (%SystemDrive%\windows\system32\drivers\etc\hosts), which is a rare and alarming thing for any application to do. It will add the default access mapping for all your local sites to the hosts file, pointing to one of the local IP addresses (so be sure that your web sites respond on all IP addresses in your IIS manager, or at least the one SharePoint chooses – it will not use 127.0.0.1).

    By default no web application is allowed to do this, so you’ll have to allow it explicitly (have a look in the SharePoint and/or Windows Application log for this error)! Grant your Central Admin service user write/modify access to the “%SystemDrive%\windows\system32\drivers\etc\hosts” file to fix the issue (note: surprisingly it’s not the service user running your timer service) – see the sketch after this list.

    If you are just a little bit paranoid you can remove the access again afterwards. I choose not to, as subsequent changes to access mappings would otherwise require me to fiddle with this again and again.

    One nice thing about this scheme is that any SSL certificates that you utilize on your site will be valid as the hostname will match the certificate hostname, provided that you had a valid one in the first place – just remember to install your certificates on the index server as well.

  • The “Check services enabled in this farm” feature might now report that “Web Front End servers in the server farm are not consistent in the services they run”. Technically your index server is now also a front-end server (though users can’t access it) and therefore things look a bit fishy to SharePoint. Obviously this warning is a false positive and can safely be ignored.
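A quick way to grant that access from an elevated prompt (the account name is an assumption – use your actual Central Admin application pool account):

icacls "%SystemDrive%\Windows\System32\drivers\etc\hosts" /grant "DOMAIN\svcCentralAdmin:(M)"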

Finally, remember to install iFilters for any file types that are not supported out of the box, e.g. PDF. I generally install these filters on all servers to ensure that they will work as expected the day I decide to shuffle the server roles a bit.

Note: You should also add icons for these extra file types in the docicon.xml file. I’ll not dive into this as other people have done so (here).

Accessing Search Configuration: Fixing 403 forbidden (on /ssp/admin/_layouts/searchsspsettings.aspx)

This is a rare issue that you only run into if you use the same farm topology as me; if you haven’t seen this error then jump happily to the next section.

I had to open a premier support case for this issue and spent a number of hours looking at log files and having live sessions with the nice MS guys. At the end of the day it was a security issue that you’ll encounter in the following scenario:

  • Your farm has at least two servers
  • You access (try at least) the SSP search configuration page on a server that does not hold the index role
  • You have different service users for the SSP site and your Central Administration site (a best practice that should always be followed). Note that the service user for your Central Administration site is the same one used for DB access

The cause of the error is that the Search Settings page executes some web service calls to query the index server for indexing status – if the page is hosted on the same server as the index server it will just use the OM and you’ll have no problems at all. I’m talking about all the values that are listed as “Retrieving…” when you first enter the page and then rapidly change to something useful.

The page queries a web service hosted on the “Office Server Web Services” web application on your index server, which is restricted to administrative users. As the SSP site is running as a different service user than Central Administration (and definitely not as any kind of domain or local admin), that call fails. The solution is simply to add your SSP service user to the local SharePoint admin groups WSS_ADMIN_WPG and WSS_RESTRICTED_WPG on the index server, as sketched below.
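From an elevated prompt on the index server (the account name is an assumption – use your actual SSP application pool account):

net localgroup WSS_ADMIN_WPG DOMAIN\svcSSP /add
net localgroup WSS_RESTRICTED_WPG DOMAIN\svcSSP /add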

Setting up SharePoint Content Sources

Any new Shared Services Provider (SSP) that you associate with your farm will come with a default content source named “Local Office SharePoint Server sites” that you can happily use for most of your needs.

Whether you want several content sources for your various sites or just group all your sites into this one doesn’t matter much. It’s basically a question of how much granularity you need in controlling the crawl schedules and the ability to start/stop/reset each site. When you are troubleshooting or testing your search settings it is convenient to split it up into several parts; other than that I don’t see the big need. It’s very easy to change later on when you change your mind one way or the other.

What you need to do here is add the root address of each of your site collections. Be sure to use DNS names that are also part of the access mappings for the sites, and use http:// and https:// as appropriate. If you use SSL sites I recommend that you let the indexer crawl the sites through SSL instead of adding an “internal access mapping” without SSL, i.e. use the same DNS name as your users to simplify things as much as possible.

Remember that the crawl user must have at least read rights on your sites to make them searchable. Don’t worry that your crawl account has access to all areas of the sites; the search results will be trimmed so users only get results from the subset of items/pages/sites that they can access.

Searching MySites and People Search

If you want to be able to search documents on the MySites you also need to add the MySite web application to the list of start addresses – just the root, not the managed path, e.g. use https://mysites.company.com not https://mysites.company.com/personal.

To make people search work you need to add a second entry for the MySites web application with the protocol sps3:// or sps3s:// for SSL, e.g.

  • If you host your MySites on “https://mysites.company.com/personal/user1” using SSL you should add “sps3s://mysites.company.com/” as a start address
  • If you host your MySites on “http://mysites.company.com/personal/user1” add “sps3://mysites.company.com/”

Finally, the very last step is to grant the crawl user read permissions for the MySites. On the SSP main administration site, go to “Personalization services permissions” and add your crawl user with the “Use personal features” permission. You might already have granted this through other groups, e.g. if you enable all users to use and create MySites by granting “Use personal features” and “Create personal site” rights to “NT Authority\Authenticated Users”.

Note: On the Search Settings page the “Default content access account” is what I call “the crawl user”.

Handling SSL and Certificate Errors

If you use SSL for some of your sites, chances are that you are using self-issued certificates for some of your dev and test environments that are not valid. Or perhaps you use the real certificates but with a different DNS name than specified in the cert.

Any of these errors will cause the indexer to stop crawling the site.

To ignore these errors go to Central Administration / Applications / Manage search service / Farm-level search settings and check the “Ignore SSL certificate name warnings”.

Note: If you use a self-issued certificate I’m not sure whether or not you need to add it to the list of certificates that the server trusts, regardless of this switch.

Setting up External Content Sources

Enabling search of external non-SharePoint sources is generally fairly easy; however, I found a lot of special cases that needed some tweaking.

First off, understand that the indexer is not a browser:

  • It does not execute flash content
  • It does not store cookies
  • It does not execute JavaScript.
  • It follows some index/robot rules that your browser doesn’t – usually nothing to worry about; it indicates that the people creating the site actually gave some thought to search engines and made an effort to support them

So you have to test whether or not the site in question is accessible without any of these things. A surprising number of websites render their menu through pure JavaScript, and the crawler might not be able to access much more than the front page and a few links from there.

It’s simple to test: start your browser, disable JavaScript, cookies and Flash, and navigate to the start address that you plan to use for the content source – usually the front page. Can you access the entire site? If not, you might be better off searching for a sitemap page and using that as your start address instead. A quick alternative is sketched below.
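A rough way to approximate the crawler’s view in PowerShell – no cookies, no JavaScript; the URL is a placeholder and the user agent string is just an assumption mimicking the crawler:

# Fetch the raw HTML the crawler would see and dump it to a file for inspection
$wc = New-Object System.Net.WebClient
$wc.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"
$wc.DownloadString("http://www.example.com/") | Out-File crawlview.html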

So with that out of the way, happily create a new content source of type “Web Sites” and add all of your start addresses to it. Now you can start the indexing, afterwards have a look at the crawl log (see below), and then go back and forth and tweak what you need.

Crawl Rules might be needed

If you crawl sites that use query parameters to distinguish pages, e.g. http://wiredispatch.com/news/?id=36938, you need to add a crawl rule to have SharePoint crawl all those pages. I guess about half the CMS systems out there use such a scheme, so you will very likely end up here.

The solution is easy:

  1. Go to Search Settings / Crawl Rule and click “New Crawl Rule”
  2. Enter the pattern for which this rule should be effective. You can just enter http://SiteToBeCrawled.com/* or, if you want a single rule for all those sites out there, use http://* (I do that). The generic approach works fairly well; just be careful with sites that automatically add dynamic session/caching query parameters. In that case the indexer will be very confused and recreate the full index of that site every time it crawls, and you can add another higher-priority rule to limit the behavior for that site.
  3. Enable “Crawl complex URLs (URLs that contain a question mark (?))”

Additional File types might be needed

Some sites use non-default file extensions for their URLs that SharePoint won’t crawl. I’ve only found one example of this, where one of my external sites used xxx.exe at the end of the URL with some query parameters attached. I guess it’s some kind of CGI – who uses that these days anyway?

I suppose likely candidates for unsupported-out-of-the-box extensions would be: exe, dll, cgi, bin, etc.

Don’t go about adding file types to the crawler unless you have to. First add a crawl rule (if needed), perform a full crawl of the content source, and use the log to verify that the pages are still not being crawled.

Then go to Search Settings / File types and add a new one. Just write the extension without the dot, e.g. “exe”.

Note: This is hardly a security risk in my mind, as you are not enabling users to upload exe files to any of your SharePoint sites; you just let the crawler include them in the index.

Troubleshoot the Crawl Logs

After all the setup steps you need to take a long hard look at the crawl logs – or, more likely, you have been doing this all along and are trying to fix the problems spotted there (which turned you towards this blog).

It’s fairly easy. Go to Search Settings / Crawl logs, which will give you a view of the crawl log grouped by start address (regardless of content source).

Look for (and drill down into):

  • Any start address with no, or only a very limited number of, “successfully crawled” documents.
  • Any start address with a number of errors.
  • Warnings are to be expected. Whenever a document/page is no longer found at the start address (i.e. removed) it will be flagged with a warning. When you fiddle around with the search settings there’s bound to be a left-over from your fumbling ;-) Look at the time stamps to verify that it’s nothing current.

Follow appropriate steps above to solve the problem or use a crawl rule to exclude parts of the content if needed.

One annoying error I had several times in my log was “The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly”. That is a rather generic error message, and I found out that it generally covers problems communicating with the server, i.e. the target server responds with an HTTP 5xx “internal server error” code or not at all.

Quite often if I hit that particular page I would see the error myself. For instance, on one site an email contact form was failing because it used the referrer header, which isn’t sent by the indexer (or when you hit the page directly with a browser). If you followed the links on the site it worked fine… Guess that one went through their tests ;-)

If you’re having this problem for local SharePoint sites (and you verified that the page works), remember to test it on the index server, not just the front-end, as the index server is using itself for indexing. You might have forgotten to deploy some resources, or about a billion other things. Enable stack traces on the index server (fiddle with web.config) and fix the actual problem afterwards.

Final Impressions

Barring the issue with forbidden access to the Search Settings page, I must say that I’m pleasantly surprised at the versatility and ease of setup.

One small complaint would be the people search, which requires you to use a strange custom protocol (yes, I know there are others too, e.g. “sts3://” and “sts3s://”, that I haven’t covered).

Other stuff that has been left out:

  • Any mention of the search role. You can host it on the index server (and avoid propagation issues) or on the front-end servers. I generally put it on the index server as I expect that server to be less busy than the rest
  • Any recommendation of crawl schedules; use whatever you find appropriate, but please ensure that you do a full crawl once in a while – don’t trust incremental updates with your life ;-) Keep an eye on CPU utilization to help decide this, and a dozen other things like crawl impact rules, indexing performance settings etc.

As with all things, once you know how, it’s easy to make it work ;-)

Tool for Deployment of SSP search settings


I recently had the dubious honor of transferring search settings from one SSP to another. Going through every managed property, content source, search scope etc. by hand just wasn’t something I looked forward to. On top of that, in the near future I will have to do it again when we deploy another SharePoint site to production.

Searching the net I found a tool created by Sahil Malik that could create the managed properties for me (link), provided that you manually merged some xml dumps of crawled and managed properties. Thanks, Sahil, for that great start – I needed something more, hence this post.

I modified Sahil’s code to suit my additional needs. It took me two full days to complete and test the code, and in the end I guess that about 30% of the code base is Sahil’s original code.

I now have a tool that can import/export content sources, crawled properties, managed properties and (shared) search scopes – and it works!

I designed the import procedures so that they create, or synchronize, the destination SSP search settings from the xml files given, but do not delete anything not in those files, i.e. they will synchronize/create all the managed properties in the input xml file but not touch existing ones that are not mentioned in the file.

Ok, here are the details for the various operation types. The order listed here is the order in which they should be imported in a complete SSP import.

Content Sources

Type, name, schedules, start addresses etc. are all handled. As far as I know that is everything; I’ve not been able to test the Exchange and BDC content sources, but they should work.

If you are transferring settings between two servers you probably want to correct the start addresses, as they are likely wrong. I’ve not tried to do anything fancy with automatic recognition of the local farm address and the like, as the risk of error is too great; I wanted to keep the focus on the SSP settings, not the various sites and their access mappings etc. Sorry about that – you can’t have everything.

There is an option to start – and wait – for a full crawl after the import (“-FullCrawl”). This will allow the indices to be built and crawled properties will automatically be added for the crawled content. This is the “normal” way to create crawled properties.

Currently the program will wait a maximum of two hours for the crawl to complete; it will probably be made configurable in the future (if I need it).

Crawled Properties

It is possible to import as well as export these. I should stress that the import operation should be considered experimental.

Why would you want to import crawled properties? They are usually created by the crawler and are available for use in managed properties immediately afterwards. However, if the content in question has not yet been created (e.g. you are deploying a site to a new farm), or if you don’t want to wait for a full crawl before you create the managed properties, you might want to import them.

I’m not really using this feature myself so I don’t consider my testing to be conclusive enough.

Managed Properties

The code to import and export managed properties is originally from Sahil Malik, though considerably redesigned and bug-fixed. It is now possible to dump all managed properties from one site and import them into another – there is no need to separate the standard system managed properties from your own custom ones (you are welcome to if you want); all can be imported with no changes.

The import will fail if one of the managed properties maps to an unknown crawled property, in which case you might need to either schedule a full crawl to create the crawled properties or import them too.

The “remove excess mappings” option (“-RemoveExcessMappings”) can be used to delete mappings from existing managed properties to crawled properties when those managed properties exist in the input xml file with other mappings, i.e. using this option ensures that the SSP managed properties are exactly the same as those in the xml file after the import.

Search Scopes

The shared search scopes (those defined in the SSP) are fully supported – settings and rules are all transferred. The import will prune the scope rules to match the import xml file.

The import will fail for scopes that use property rules if the managed properties used have not been defined or marked for use in scopes (the “allow this property to be used in scopes” switch; import of the managed property includes this setting).

The “-StartCompilation” option starts a scope compilation after the import but does not wait for it to complete (there’s not much point in waiting for that).

The one thing missing from the scope import is scope display groups. They are used on sites to populate the search scope dropdown (and some of my own search webparts as well) and are quite important for the end-user search experience. You will have to set those yourself, as I limited the scope (sorry for the pun) of the program to the settings stored in the SSP. It should be fairly easy for a site collection administrator to enter them, however. In a similar vein, any site-specific search scopes are not handled; I don’t use that feature at all, so there’s no support. Perhaps a topic for future improvement.

How to use

Usage: sspc.exe -o <operation> <target type> <parameters>

Operation = Export|Import

Target type = ContentSources|CrawledProperties|ManagedProperties|SearchScopes

Parameters = -url <ssp url> -file <input/output file name> [-FullCrawl|-RemoveExcessMappings|-StartCompilation]

Note: all arguments are case-insensitive.

This post is quite long enough as is, so if you want to see the exact xml format needed, download the code and run the export.

Sample Export

SSPC.exe -o export ContentSources -url http://moss:7000/ssp/admin -file output_contentsources.xml

SSPC.exe -o export CrawlProperties -url http://moss:7000/ssp/admin -file output_crawlproperties.xml

SSPC.exe -o export ManagedProperties -url http://moss:7000/ssp/admin -file output_managedproperties.xml

SSPC.exe -o export SearchScopes -url http://moss:7000/ssp/admin -file output_searchscopes.xml

I created a batch file for a full export (excluding crawled properties):

“Export SSP settings.bat” http://moss:7000/ssp/admin

which will create the output files “output_contentsources.xml”, “output_managedproperties.xml” and “output_searchscopes.xml”.

Sample Import

SSPC.exe -o import ContentSources -fullcrawl -url http://moss:7002/ssp/admin -file input_contentsources.xml

SSPC.exe -o import CrawlProperties -url http://moss:7002/ssp/admin -file input_crawlproperties.xml

SSPC.exe -o import ManagedProperties -removeexcessmappings -url http://moss:7002/ssp/admin -file input_managedproperties.xml

SSPC.exe -o import SearchScopes -startcompilation -url http://moss:7002/ssp/admin -file input_searchscopes.xml

The corresponding batch import file:

“Import SSP settings.bat” http://moss:7002/ssp/admin

which assumes the presence of input files “output_contentsources.xml”, “output_managedproperties.xml” and “output_searchscopes.xml” generated above.

Code Design Notes

Sahil Malik named the program SSPC (supposedly short for “Shared Services Provider Property Creation”) and the corresponding project name on the CodePlex site is SSSPPC (“SharePoint Shared Services Search Provider Property Creation”). It’s a mess, and now that I’ve expanded the scope of the program considerably the name is even more misleading.

Just to avoid further confusion I’ve refrained from renaming the program.

Sahil Malik spent some time doing a proper code design for the initial version. I personally think that he did go a bit over the top (sorry Sahil), but I’ve nevertheless retained most of the basic design.

He split up the code in a number of layers (we all love that) where each layer is a different class-library project. I kept that design and therefore the download will contain a number of dll files as well as the actual exe file. Just keep them all in the same directory and all should be well.

Some comments:

  • I did not change the naming of the existing projects (i.e. they are all named “Winsmarts.*” though I did change a lot of the code) but the ones I added are named “Carlsberg.*”
  • I redesigned/recoded the managed property import section, as I simply hate duplicated code, and deleted the duplicated BO classes that were present in the old “MAMC project” (now moved to “Winsmarts.SSPC.ManagedProperies”).
  • The import code is now always present in the same project that performs the export.
  • The managed property import/export is now complete in the sense that it can export and import everything including the system properties. There is no need to sort through it all and find the ones you are responsible for (though it might still be a good idea to sift through and ensure that old test data is removed)
  • I renamed a number of the classes, as some of the BO objects were named like their SharePoint counterparts and the code was quite a bit harder to read than it needed to be.
  • The version number of all (sub)projects has been changed to 1.1.0.0.
  • Error handling is still pretty basic, so you’ll get an exception with a stack trace in the console if anything is amiss

[Updated]

My code changes have now been merged into the main code base at the CodePlex site. These changes break everything in the original code, so you will need to update your xml and script files…

Future Improvements

This is the list of future improvements I’ve noted that might be added if I find the time and need for it.

  • [Updated: Done] The code could be cleaned up somewhat (there shouldn’t be any todos in released code)
  • Perhaps site scopes should be added
  • Scope display groups might be added (requires some connection from the SSP to the sites)
  • It might make sense to add these commands to the list of operations supported by stsadm, which is fairly easy to do (see Andrew Connell’s excellent post for a sample)
  • [Updated: Done] I’m not too fond of the serialization classes – basically the same piece of code is copied four times with minimal changes. I always consider duplicated code to be a bug

Downloads

[Updated]

The code has now been merged into the existing code base at CodePlex, so head over there for the latest download.

Codeplex/SSSPPC

References

Sahil Malik’s original post

The current Codeplex site

A couple of useful MS articles: Creating Content Sources to Crawl Business Data in SharePoint Server 2007 Enterprise Search and Creating and Exposing Search Scopes in SharePoint Server 2007 Enterprise Search
