<?xml version="1.0" encoding="us-ascii"?>
<rss version="2.0" xml:base="http://www.cmswatch.com" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <channel>
      <title>CMS Watch Enterprise Search Feed</title>
      <link>http://www.cmswatch.com</link>
      <description>CMS Watch headlines about Enterprise Search</description>
      <language>en-us</language>
      <lastBuildDate>Thu,  7 Aug 2008 21:09:31 -0400</lastBuildDate>
      <dc:creator>editor@cmswatch.com (Tony Byrne)</dc:creator>
      <dc:rights>Copyright 2005, CMS Watch</dc:rights>
      <dc:publisher>CMS Watch</dc:publisher>
      <image>
         <title>CMS Watch</title>
         <url>http://www.cmswatch.com/images/cmswatch_logo.gif</url>
         <link>http://www.cmswatch.com</link>
         <width>82</width>
         <height>36</height>
         <description>CMS Watch logo</description>
      </image>
      <item>
         <title>Join us in Copenhagen and London for search, IA and more....</title>
         <description>I'm elated to invite you to two seminars and a full-day conference on enterprise search, findability and information architecture, taking place next month in the wonderful cities of Copenhagen and London.
&lt;/p&gt;
&lt;p&gt;
In cooperation with our Denmark-based partner &lt;a href=&quot;http://eng.jboye.dk/&quot;&gt;J. Boye&lt;/a&gt;, I'll be teaching a full-day seminar on &lt;a href=&quot;http://eng.jboye.dk/arrangementer/masterclass_information_architecture_for_findability_and_web_publishing&quot;&gt;Information Architecture for Findability and Web Publishing&lt;/a&gt; on September 11. This seminar distills what I have learned, from my time as a taxonomist and implementer of many a WCMS as well as my more recent years as an analyst (speaking with hundreds of users of CM, DAM and search systems), about what kinds of information structures help, or hinder, the implementation of specific technologies. If you're an information architect, web content manager, portal manager or search project manager, this class is geared towards the kinds of challenges you face.
&lt;/p&gt;
&lt;p&gt;
In London a few days later, together with &lt;a href=&quot;http://jboye08.dk/speakers/claudia_urschbach&quot;&gt;Claudia Urschbach&lt;/a&gt;, Senior Information Architect with the &lt;a href=&quot;http://www.bbc.co.uk/&quot;&gt;BBC&lt;/a&gt;, I'm chairing a &lt;a href=&quot;http://www.damusers.com/otherevents/index.php?eventid=9&quot;&gt;full-day event on enterprise search&lt;/a&gt;. The speaker &lt;a href=&quot;http://www.damusers.com/otherevents/index.php?eventid=9&amp;pageid=28&quot;&gt;lineup&lt;/a&gt; includes &lt;a href=&quot;http://www.cmswatch.com/Analyst/20-Bloem&quot;&gt;Adriaan Bloem&lt;/a&gt;, &lt;a href=&quot;http://www.intranetfocus.com/martinwhite.php&quot;&gt;Martin White&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/Analyst/8-Boye&quot;&gt;Janus Boye&lt;/a&gt;, &lt;a href=&quot;http://www.steptwo.com.au/about/staff/jamesr/index.html&quot;&gt;James Robertson&lt;/a&gt; and others, and we'll spend the day discussing the &lt;a href=&quot;http://www.damusers.com/otherevents/index.php?eventid=9&amp;pageid=28&quot;&gt;current thinking on optimizing search within the enterprise&lt;/a&gt;. Our goal is both practical and tactical: we'll discuss search UI best practices, the vendor landscape, managing search projects, &lt;a href=&quot;http://www.cmswatch.com/DAM/Report/&quot;&gt;audio/video search&lt;/a&gt;, what you need to know about &lt;a href=&quot;http://www.cmswatch.com/SharePoint/Report/&quot;&gt;search in SharePoint 2007&lt;/a&gt;, and of course, we'll do some future-gazing. 
&lt;/p&gt;
&lt;p&gt;
Finally, Adriaan Bloem and I will teach a &lt;a href=&quot;http://www.damusers.com/otherevents/index.php?eventid=9&amp;pageid=29&quot;&gt;half-day intensive course&lt;/a&gt; on &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;enterprise search&lt;/a&gt; in London on September 16th. If you're looking for a deep dive into today's search technology, join Adriaan and me for a combined presentation of our recent research. We hope to see you there!</description>
         <link>http://www.cmswatch.com/Trends/1334-Join-us-in-Copenhagen-and-London-for-search,-IA-and-more....?source=RSS</link>
         <category>Web Content Management</category>
         <author>tregli@cmswatch.com (Theresa Regli)</author>
         <pubDate>Mon,  4 Aug 2008 08:14:00 -0400</pubDate>
      </item>
      <item>
         <title>Cuil could be cool</title>
         <description>&lt;p&gt;As the buzz has it, public website search engine &lt;a href=&quot;http://www.cuil.com&quot;&gt;Cuil&lt;/a&gt; is the new &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Google/&quot;&gt;Google&lt;/a&gt; challenger. &quot;Cuil&quot; is apparently pronounced &quot;cool&quot;, and &quot;an old Irish word for knowledge&quot;.&lt;/p&gt;
&lt;p&gt;The search engine was officially launched a few days ago and is enjoying its time in the spotlight. There are two reasons for that: the company was started by ex-Google employees; and it has an index that's supposed to be three times as large as Google's. Now, that's all very nice, but since CMS Watch doesn't evaluate the public search engines, but enterprise search tools (&quot;behind the firewall search&quot;), you may ask: what's the relevancy?&lt;/p&gt;
&lt;p&gt;Well, the jury is still out on Cuil's relevancy ranking -- or the freshness of its index, for that matter. One thing is certain: a larger index doesn't necessarily mean better results. The Cuil folks must have realized, though, that to be any kind of competition, your index has to be huge; it's the old numbers game that Yahoo! and Google especially used to engage in. Google was the first to quit playing that game, but somewhat &quot;coincidentally&quot; &lt;a href=&quot;http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html&quot;&gt;suddenly made a statement&lt;/a&gt; about their 1 trillion pages indexed.&lt;/p&gt;
&lt;p&gt;As for how relevant this is for enterprise search: well, Cuil doesn't play that particular game (though many search companies do both or at least used to: &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Microsoft&quot;&gt;Microsoft&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Fast%20Search%20&amp;%20Transfer&quot;&gt;FAST&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Exalead&quot;&gt;Exalead&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Vivisimo&quot;&gt;Vivisimo&lt;/a&gt;, the list goes on... and oh yes, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Google&quot;&gt;Google&lt;/a&gt;). What struck me as most interesting is that Cuil attempts to change the way people not just search, but &lt;i&gt;find&lt;/i&gt;, by using an &lt;a href=&quot;http://www.cuil.com/search?q=Alan+Pelz-Sharpe&quot;&gt;innovative new results interface&lt;/a&gt;. And that's always pretty good news... since so far, most vendors have rather unimaginatively been copying Google's design of search results, because that's what most users have grown used to on the web.&lt;/p&gt;
&lt;p&gt;Of course, again, they're not the first to innovate: notable examples are the public search engines of &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Exalead&quot;&gt;Exalead&lt;/a&gt; (&lt;a href=&quot;http://www.exalead.com&quot;&gt;exalead.com&lt;/a&gt;) and &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Vivisimo&quot;&gt;Vivisimo&lt;/a&gt; (&lt;a href=&quot;http://www.clusty.com&quot;&gt;Clusty&lt;/a&gt;). Both are quite experimental, and especially Exalead is continuously updating the interface. What you like best is rather personal, but for me, both are more useful than Cuil, where a static footer on the bottom takes up too much of my screen real estate: frames are soooo 1996 (even if they're not actual HTML frames). But Exalead and Vivisimo's public search engines are more interesting because they are not just marketing, but also ongoing research: what you see there might actually turn up in an enterprise search interface near you soon.&lt;/p&gt;
&lt;p&gt;Still, if Cuil gets people used to more variety than just plain vanilla Google, behind the firewall as well, that would be nice for anyone trying to implement search. I think many would be quite happy to have users clamor for something that's more like Cuil, rather than &quot;why can't we just have Google&quot;. It's time to innovate the interfaces beyond just Googlesque results listings and &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Endeca&quot;&gt;Endeca&lt;/a&gt;'s facets. That wouldn't just be old Irish knowledge, it would actually be pretty cool.&lt;/p&gt;</description>
         <link>http://www.cmswatch.com/Trends/1333-Cuil-could-be-cool?source=RSS</link>
         <category>Enterprise Search</category>
         <author>bloem@radagio.com (Adriaan Bloem)</author>
         <pubDate>Fri,  1 Aug 2008 15:40:00 -0400</pubDate>
      </item>
      <item>
         <title>Ongoing confusion in the land of MS search technology</title>
         <description>The SharePoint IT Pro Documentation Team recently published &lt;a href=&quot;http://blogs.technet.com/tothesharepoint/archive/2008/07/17/3090292.aspx&quot;&gt;a blog post&lt;/a&gt; on the various Microsoft &amp;quot;enterprise&amp;quot; search technologies. The post did a nice job of clarifying the role of each of Microsoft's various search tools, save &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Fast%20Search%20&amp;%20Transfer&quot;&gt;FAST&lt;/a&gt; (but more on that in a bit). Even the post's author, &lt;a href=&quot;http://blogs.technet.com/user/Profile.aspx?UserID=37167&quot;&gt;Kathy Narvaez&lt;/a&gt;, admits she has trouble distinguishing the various &amp;quot;...flavors of Microsoft enterprise search;&amp;quot;&amp;#160; hence the blog post.&lt;/p&gt;
 
&lt;p&gt;This post is significant for two reasons: first, she used the word &amp;quot;enterprise&amp;quot; when describing &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Microsoft&quot;&gt;SharePoint search&lt;/a&gt;, as well as Search Server Express and Search Server 2008 (although she also used the term &amp;quot;entry level&amp;quot;); and second, there was absolutely no mention of FAST. In some ways, the exclusion of FAST could be a result of the continuing ambiguity around how Microsoft will integrate the tool into the overall product set.  Further, this blog post is coming from the IT Pro Documentation group (think Infrastructure Team - TechNet, not MSDN subscribers) and they don't seem to be as close to the product teams as the developer-based bloggers.&lt;/p&gt;

&lt;p&gt;As our &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;&lt;i&gt;Enterprise Search Report 2008&lt;/i&gt;&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/SharePoint/Report/&quot;&gt;&lt;i&gt;SharePoint Report 2008&lt;/i&gt;&lt;/a&gt; both point out, Microsoft does not have a particularly strong native search offering, especially when we consider search across multiple repositories of various types. I don't think anyone but Microsoft considers their offerings outside of FAST truly enterprise class or scale. And that's why this blog post is a bit surprising: it strikes me as odd that they have so much trouble controlling the language they use to describe their own products.&lt;/p&gt;</description>
         <link>http://www.cmswatch.com/Trends/1332-Ongoing-confusion-in-the-land-of-MS-search-technology?source=RSS</link>
         <category>Enterprise Search</category>
         <author>shawn_shell@consejoinc.com (Shawn Shell)</author>
         <pubDate>Fri,  1 Aug 2008 10:04:00 -0400</pubDate>
      </item>
      <item>
         <title>A new (and wearable) Content Technologies Subway Map</title>
         <description>A new season brings an updated &lt;a href=&quot;http://www.cmswatch.com/vendormap/&quot;&gt;vendor map&lt;/a&gt;:&lt;/p&gt;

&lt;table width=&quot;500&quot; border=&quot;0&quot; align=&quot;center&quot;&gt;
&lt;tr&gt;&lt;td align=&quot;center&quot;&gt;&lt;a href=&quot;http://www.cmswatch.com/vendormap/&quot;&gt;&lt;img src=&quot;/images/CMS-Watch-Subway-2008-small.gif&quot; alt=&quot;CMS Watch Q3 2008 Subway Vendor Map low-rez&quot; width=&quot;500&quot; height=&quot;375&quot; border=&quot;0&quot;&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;br /&gt;
&lt;p&gt;We added a Yellow Line -- for &lt;a href=&quot;http://www.cmswatch.com/CCM/Report/&quot;&gt;XML &amp;amp; Component Content Management vendors&lt;/a&gt;, 
  and reflected some other  station changes.&lt;/p&gt;
&lt;p&gt;And now, if you like what you see, you and your wall can wear it. &lt;a href=&quot;http://www.cafepress.com/cmswatch&quot;&gt;Our new store 
  at Cafe Press&lt;/a&gt; offers t-shirts and posters of various sizes, along 
  with other CMS Watch tchotchkes. &lt;/p&gt;
&lt;p&gt;Regarding the latter, perhaps you already own your fill of mugs and mousepads, but can you ever 
  have enough &lt;a href=&quot;http://www.cafepress.com/cmswatch.285260609&quot;&gt;beer steins&lt;/a&gt;? Bring one to the next &lt;a href=&quot;http://www.cmswatch.com/About/Events/&quot;&gt;event where we're speaking&lt;/a&gt; and 
  we'll fill it up with the closest available brew. ;-)</description>
         <link>http://www.cmswatch.com/Trends/1320-A-new-(and-wearable)-Content-Technologies-Subway-Map?source=RSS</link>
         <category>Enterprise Portals</category>
         <author>tbyrne@cmswatch.com (Tony Byrne)</author>
         <pubDate>Wed, 23 Jul 2008 14:55:00 -0400</pubDate>
      </item>
      <item>
         <title>Infrastructure Updates for SharePoint</title>
         <description>Through the &lt;a href=&quot;http://blogs.msdn.com/sharepoint/&quot;&gt;SharePoint product 
team's MSDN blog&lt;/a&gt;, Microsoft announced that it had released &lt;a href=&quot;http://blogs.msdn.com/sharepoint/archive/2008/07/15/announcing-availability-of-infrastructure-updates.aspx&quot;&gt;a 
significant infrastructure update for SharePoint&lt;/a&gt; (and related technologies 
like Project Server that leverage SharePoint components). The update seems to 
primarily address three areas:&lt;/p&gt; 
&lt;ul&gt;
  &lt;p&gt;Search functionality and search-related performance (like index performance). 
  &lt;/p&gt;
  &lt;p&gt;Content Deployment bug fixes (which hopefully will correct a series of irritating 
    bugs related to deploying content from one SharePoint environment to another 
    in web content management scenarios). These include the hotfix packs Microsoft 
    released for content deployment back in May of this year. &lt;/p&gt;
  &lt;p&gt;General interface and performance improvements. In reading the three or four 
    pages in Microsoft's site that aimed to describe what was actually included, 
    it was difficult to pinpoint what these &amp;quot;improvements&amp;quot; actually mean 
    to SharePoint administrators. However, Microsoft describes them as &amp;quot;...fixes 
    and product performance updates driven by customer feedback which have resulted 
    in significant platform performance improvements...&amp;quot; Again, I was unable 
    to nail down precisely what has changed or how significant the improvements are. 
  &lt;/p&gt;
&lt;/ul&gt;
&lt;p&gt;What's interesting, at least with regard to search, is that it seems the &amp;quot;ancillary&amp;quot; 
  search products like Search Server 2008 (and its &amp;quot;free&amp;quot; sibling Search 
  Server Express 2008) are driving updates to SharePoint's search technology. 
  As mentioned in the &lt;a title=&quot;CMS Watch SharePoint Report 2008&quot; href=&quot;http://www.cmswatch.com/SharePoint/Report/&quot;&gt;&lt;em&gt;SharePoint 
  Report 2008&lt;/em&gt;&lt;/a&gt;, Microsoft has invested heavily in improving SharePoint 
  search. In fact, historically, it seemed as if SharePoint Search was the 
  parent of these independent search tools, but it now appears as if &amp;quot;the 
  student [has become] the master&amp;quot; as Darth Vader said to Obi Wan. &lt;/p&gt;
&lt;p&gt;In particular, SharePoint is getting Search Server's federated search capabilities 
  and &amp;quot;a unified search dashboard.&amp;quot; From what I saw at the last SharePoint 
  conference, both of these search products borrowed very heavily from the SharePoint 
  interface construct, but improved the visibility of certain configuration settings. 
  In particular, I liked the ease with which you could configure the federated 
  search. &lt;/p&gt;
&lt;p&gt;However, these changes call into question how this will all play out within 
  the Shared Services provider, and whether they will help administrators who are struggling to 
  figure out where to go to change search settings -- at the site, site collection, 
  Central Administration (in the Application or Operation tab) or in Shared Services. 
  While most key search settings reside in Shared Services, SharePoint has search-related 
  configuration spread over virtually every administrative interface. My hope 
  is that this &amp;quot;unified search dashboard&amp;quot; brings some order to search 
  within SharePoint.&lt;/p&gt;

&lt;p&gt;In the end, these changes (along with the FAST search integration) also add 
  more evidence to the theory that Microsoft is going to decouple search from 
  SharePoint entirely (and potentially the Office team) -- making SharePoint a 
  client technology. As I blogged about in &lt;a href=&quot;http://www.cmswatch.com/Trends/1219-Thoughts-on-SharePoint-and-FAST-Search&quot;&gt;a post on the completion of the FAST 
  acquisition&lt;/a&gt;, Microsoft seems to be leaning very heavily towards an independent 
  search product team. And just to add fuel to the conspiratorial fire, this type 
  of organizational structure might make sense if, say, Microsoft were to acquire 
  a large Internet-centric search company (although that raises the question of what 
  they'd do with all of this overlapping technology). </description>
         <link>http://www.cmswatch.com/Trends/1313-Infrastructure-Updates-for-SharePoint?source=RSS</link>
         <category>Enterprise Search</category>
         <author>shawn_shell@consejoinc.com (Shawn Shell)</author>
         <pubDate>Fri, 18 Jul 2008 00:20:00 -0400</pubDate>
      </item>
      <item>
         <title>Narrowcasting to your feed aggregator</title>
         <description>We're pleased that CMS Watch now covers &lt;a href=&quot;http://www.cmswatch.com/Reports/&quot;&gt;ten different technologies&lt;/a&gt;, but I suspect 
  that many of you take an interest in only one or two families of tools. If that's 
  you, here's a list of technology-specific RSS feeds that will send just the relevant 
  postings to your reader or aggregator.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Digital Asset Management&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/DAM&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/DAM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ECM Suites&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/ECM&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/ECM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;E-mail Archiving &amp;amp; Management&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/E-mail&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/E-mail&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Enterprise Portals&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Portal&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Portal&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Enterprise Search&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Search&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Search&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SharePoint&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/SharePoint&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/SharePoint&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Social Software&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Social&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Social&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Web Analytics&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Analytics&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/Analytics&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Web CMS / WCM&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/CMS&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/CMS&lt;/a&gt; 
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;XML &amp;amp; Component Content Management&lt;/strong&gt;: &lt;a href=&quot;http://www.cmswatch.com/RSS/cmswatch.channel.xml/CCM&quot;&gt;http://www.cmswatch.com/RSS/cmswatch.channel.xml/CCM&lt;/a&gt;</description>
         <link>http://www.cmswatch.com/Trends/1303-Narrowcasting-to-your-feed-aggregator?source=RSS</link>
         <category>Enterprise Portals</category>
         <author>tbyrne@cmswatch.com (Tony Byrne)</author>
         <pubDate>Fri, 11 Jul 2008 15:39:00 -0400</pubDate>
      </item>
      <item>
         <title>Enterprise Search Scalability: A Big Issue</title>
         <description>I was talking to a search vendor the other day who said something that really got my attention. He remarked that a customer recently came to him and asked what it would take, in terms of software, hardware, and time, to index &lt;em&gt;30 billion documents&lt;/em&gt;. Mind you, this was not some hypothetical exercise. The question came from somebody whose company actually has 30 billion documents under management.&lt;/p&gt;

&lt;p&gt;Consider the dimensions of the problem. Assuming (for purposes of argument) you could index a thousand documents per second on one machine, it would take nearly a full year (about 347 days) just to build the index for 30 billion docs. If the solution scales linearly, building (or rebuilding) the index would keep a 100-machine server farm busy for about three and a half days.&lt;/p&gt;
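
&lt;p&gt;To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The function name and the constant indexing rate are illustrative assumptions, not measurements from any product, and the calculation assumes perfectly linear scaling across machines.&lt;/p&gt;

```python
# Back-of-the-envelope indexing time for the scenario described above.
# Assumption: throughput is constant and scales perfectly linearly with
# the number of machines, which real deployments rarely achieve.
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_index(total_docs, docs_per_second_per_machine, machines=1):
    """Return the days needed to index total_docs at a constant rate."""
    total_rate = docs_per_second_per_machine * machines
    return total_docs / total_rate / SECONDS_PER_DAY


single_machine_days = days_to_index(30_000_000_000, 1_000)
farm_days = days_to_index(30_000_000_000, 1_000, machines=100)

print(f"1 machine: {single_machine_days:.0f} days")    # 1 machine: 347 days
print(f"100 machines: {farm_days:.1f} days")           # 100 machines: 3.5 days
```

&lt;p&gt;Plugging in your own document counts and measured throughput gives a quick sanity check on any sizing estimate you are handed.&lt;/p&gt;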

&lt;p&gt;That's considering the scenario in a static context only. In the real world, of course, documents are revised (some frequently, others never, most somewhere in-between). New docs enter the system. Old ones are dropped. Otherwise-unchanged docs are moved to new locations. Unless you can update your index(es) incrementally, in real time, as docs are added, deleted, modified, or moved, you have an index shelf-life problem. &lt;/p&gt;

&lt;p&gt;The traditional answer to the shelf-life problem is to rebuild the index every few days (or every night, if resources allow). At the level of ten or twenty thousand docs, a total rebuild of the index every few days isn't a huge issue. But when you get beyond something like a few million docs, performing total-rebuilds on a frequent basis quickly becomes a worst practice (if indeed it's practicable at all). At some point, you &lt;em&gt;need&lt;/em&gt; the ability to do incremental indexing.&lt;/p&gt;
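
&lt;p&gt;The difference between a total rebuild and incremental indexing can be sketched with a toy inverted index. This is an illustration only: the class and method names are hypothetical, tokenizing is a naive whitespace split, and production engines are far more elaborate. The point is simply that an add, change, or delete touches only the postings of the affected document.&lt;/p&gt;

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that can be updated one document at a time."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> ids of docs containing it
        self.doc_terms = {}                # doc id -> terms currently indexed

    def remove(self, doc_id):
        # Drop only this document's postings; the rest of the index is untouched.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def upsert(self, doc_id, text):
        # A modified document is handled as remove-then-add.
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.upsert("a.txt", "enterprise search report")
index.upsert("b.txt", "search scalability")
index.upsert("a.txt", "enterprise portal report")   # a.txt was revised
index.remove("b.txt")                                # b.txt was deleted

print(index.search("portal"))
print(index.search("search"))
```

&lt;p&gt;After the revision and the deletion, a query for "portal" finds a.txt and a query for "search" finds nothing, without any other document having been re-read.&lt;/p&gt;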

&lt;p&gt;But, someone will ask, can't you just throw more machine resources (threads, memory, cycles) at the problem? Yes and no. If you're spidering files over the wire, bandwidth exhaustion becomes an issue. If you're indexing files locally, there's an OS-imposed limit on how many files you can have open at once. There's also the question of how much file data you can hold in memory. The reason this is important is that some search systems (quite a few, actually) need to load an entire document into memory before the doc can be indexed. If you're indexing 10-megabyte PDFs, it might not matter how many threads you have available. (Note, incidentally, that most docs occupy a &lt;em&gt;lot &lt;/em&gt;more space in memory than on disk.) And anyway, the CPU can execute only so many instructions per second, no matter how many docs you can load at once. &lt;/p&gt;

&lt;p&gt;I bring all this up for a couple of reasons. First, if you're shopping for a search solution, you need to regard the various vendors' performance claims with more than a modicum of caution. No two search scenarios are the same, obviously. But more than that, the parameters that affect scalability and performance are numerous and non-obvious (and their interactions subtle), tending to moot most performance claims straight out of the gate.&lt;/p&gt;

&lt;p&gt;Takeaway No. 1: If you care about performance (and you should), &lt;em&gt;do your own testing.&lt;/em&gt; Insist on it as part of any product evaluation.&lt;/p&gt;

&lt;p&gt;Takeaway No. 2: Get your programmers involved in the evaluation process early. Some of these issues require computer-science expertise to evaluate properly.&lt;/p&gt;

&lt;p&gt;Also (very important), when shopping for a search solution, don't buy for your present needs. Shop for your &lt;em&gt;future&lt;/em&gt; needs. Your company probably has ten times more content under management today than it had just five years ago. &lt;em&gt;Five years from now, it could have ten times more than today.&lt;/em&gt; Will your search solution scale appropriately? More particularly, &lt;em&gt;how&lt;/em&gt; will it scale? Will it scale linearly? Will it hit a brick wall?&lt;/p&gt;

If I were searching for a search solution, I'd ask every vendor a few simple questions: &lt;ul&gt;&lt;li&gt;How big is your biggest customer installation and what did it take to build it? &lt;/li&gt;&lt;li&gt;Can your system do incremental indexing? How often is a full rebuild required?&lt;br /&gt; &lt;/li&gt;&lt;li&gt;Does your indexer need to read a document into memory (whole) before indexing it, or can files be stream-processed?&lt;/li&gt;&lt;li&gt;What's the largest document your system can index without either choking or stopping after a particular number of characters?
&lt;/li&gt;&lt;li&gt;How does indexing performance change as the index gets bigger? (Not just &amp;quot;does it slow down?&amp;quot; but &lt;em&gt;how&lt;/em&gt; does it slow? Linearly? Exponentially? If it's the latter, you're going to hit a brick wall.)&lt;/li&gt;&lt;li&gt;And: &lt;em&gt;Do you support 64-bit architectures?&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Those are just a few conversation-starters. For more (lots more), be sure to consult our &lt;em&gt;&lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;Enterprise Search Report 2008&lt;/a&gt;&lt;/em&gt;. (You can get a free sample of it &lt;a href=&quot;http://www.cmswatch.com/Reports/Try&quot;&gt;online here&lt;/a&gt;.) And if you end up evaluating one or more search offerings in depth, please drop us a line and let us know what you learned. We're always interested in your feedback.</description>
         <link>http://www.cmswatch.com/Trends/1299-Enterprise-Search-Scalability:-A-Big-Issue?source=RSS</link>
         <category>Enterprise Search</category>
         <author>kthomas@cmswatch.com (Kas Thomas)</author>
         <pubDate>Wed,  9 Jul 2008 13:02:00 -0400</pubDate>
      </item>
      <item>
         <title>How Fast is Attivio?</title>
         <description>The &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Fast%20Search%20&amp;%20Transfer&quot;&gt;Fast Search &amp;amp; Transfer&lt;/a&gt; news continues to keep me on my toes. The Norwegian business weekly &lt;a href=&quot;http://www.dn.no/forsiden/borsMarked/article1434702.ece&quot;&gt;Dagens N&amp;aelig;ringsliv&lt;/a&gt; seems to have come up with &lt;a href=&quot;http://www.dn.no/forsiden/borsMarked/article1434702.ece&quot;&gt;some decent evidence&lt;/a&gt; (&lt;a href=&quot;http://www.scribd.com/doc/3809691/Fasts-Stock-Market-Bluff&quot;&gt;English translation&lt;/a&gt;) of many things everybody already suspected -- and a couple of new ones.&lt;/p&gt;

&lt;p&gt;It covers the &lt;a href=&quot;http://www.cmswatch.com/Trends/878-FAST-buys-Convera%27s-RetrievalWare&quot;&gt;Convera acquisition&lt;/a&gt; and the &amp;quot;separate&amp;quot; but surprisingly coincidental deal where Convera bought several million's worth of Fast software it didn't need. Or as rival &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Autonomy&quot;&gt;Autonomy&lt;/a&gt; (which was also in the running for buying Convera) has pointed out, Fast pumped up its revenues for that quarter with part of the money it had paid for Convera and then got back in license fees. The DN article also covers a few other very suspicious deals, and some outright fraud. It's now even getting to the point where calling Fast &amp;quot;the Enron of Norway&amp;quot; &lt;a href=&quot;http://www.google.com/search?as_epq=enron+of+norway&quot;&gt;is getting long in the tooth&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While that train wreck was unfolding before my eyes in slow motion, my fellow analyst &lt;a href=&quot;http://www.cmswatch.com/Analyst/15-Regli&quot;&gt;Theresa Regli&lt;/a&gt; pinged me last February about a new enterprise search company called &lt;a href=&quot;http://www.attivio.com&quot;&gt;Attivio&lt;/a&gt;. Information Today &lt;a href=&quot;http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=40763&quot;&gt;raved about their new product AIE&lt;/a&gt;, with analysts quoted as saying things like &amp;quot;they are moving rapidly to develop tools that will eliminate many of the practical barriers to easily and efficiently deploy robust enterprise search solutions,&amp;quot; with the unique selling point of &amp;quot;data integration plus search and content processing,&amp;quot; a &amp;quot;hot niche for the next few years.&amp;quot;&lt;/p&gt;

&lt;p&gt;Since I'm always interested to find out more about robust enterprise search tools to fill hot niches for the next few years, I scrolled down to read what the Attivio CTO would explain about &lt;i&gt;how&lt;/i&gt; the product would achieve what &amp;quot;should have been solved by the integration of text search and XML into relational database managers such as Oracle.&amp;quot; As it turns out, it is based on a &amp;quot;mash-up&amp;quot; of open source &lt;a href=&quot;http://www.cmswatch.com/Portal/Vendors/Apache&quot;&gt;Apache Lucene&lt;/a&gt; and &amp;quot;licensed commercial software.&amp;quot;&lt;/p&gt;

&lt;p&gt;As described in our &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;Enterprise Search Report&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/Trends/1247&quot;&gt;on this blog&lt;/a&gt;, Lucene itself is just a Java text search API. To be able to actually gather, convert, and query content you need many more components. It is perfectly feasible to put together a working enterprise search product around the core Lucene JAR (as demonstrated by &lt;a href=&quot;http://www.cmswatch.com/Trends/1247&quot;&gt;IBM's Omnifind Yahoo! Edition&lt;/a&gt;). But in order to get there, and to have Lucene index, for instance, Office documents and PDFs, you will have to first convert those documents to text. The filters to perform that conversion can be &lt;a href=&quot;http://www.cmswatch.com/Trends/1185&quot;&gt;bought from other vendors&lt;/a&gt;, based on open source such as &lt;a href=&quot;http://sourceforge.net/projects/pdftohtml/&quot;&gt;pdftohtml&lt;/a&gt;, or you'll have to build them yourself, which is a lot of work. There aren't too many vendors building their own filters, or even just modifying open source to do so. So if you &lt;i&gt;do&lt;/i&gt; build the filters needed to use Lucene yourself, you'd probably like to mention this as an advantage, and as Attivio states, &amp;quot;we developed our own Microsoft Office, WordPerfect, and PDF connectors to improve performance and reach deeper into the files than the conventional converters.&amp;quot;&lt;/p&gt;

&lt;p&gt;Since, like most enterprise search products, Lucene isn't based on a database and couldn't even connect to such content without help, it isn't surprising Attivio had to develop a &amp;quot;unique RDBMS data loader&amp;quot; which &amp;quot;indexes the tables individually.&amp;quot; This, again, is presented as a major advantage -- remember, converting documents and integrating structured and unstructured data are &amp;quot;a hot niche.&amp;quot;&lt;/p&gt;

&lt;p&gt;I remember seeing a vendor at a conference a few years back, with banners jokingly stating its product was &amp;quot;buzzword compliant!&amp;quot; Attivio certainly seems to have that skill down. The engineering effort is marketed as a &amp;quot;technology mashup,&amp;quot; &amp;quot;breaking down silos&amp;quot; between &amp;quot;open source and commercial software.&amp;quot; &amp;quot;We have lived with the challenge of having to choose between the precision of databases and the richness of search for a long time, but no longer&amp;quot; sounds great, but I don't see &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Oracle&quot;&gt;Oracle&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Thunderstone&quot;&gt;Thunderstone&lt;/a&gt;'s RDBMS-based solutions breaking out in a sweat just yet.&lt;/p&gt;

&lt;p&gt;Maybe my over-exposure to marketing materials and flashy demos has turned me into a cynic, and Attivio's &lt;a href=&quot;http://attivio.com/ourproducts_ektid134.aspx&quot;&gt;downloadable trial version&lt;/a&gt; will have to do at least a decent job to convince me of the product's added value. Fortunately, that free download is &amp;quot;coming soon!&amp;quot; Yes, I'm sorry, I'm finding it increasingly hard to turn off that cynicism, especially when I turn back to the DN article about Fast Search &amp;amp; Transfer. Attivio was founded by &lt;a href=&quot;http://attivio.com/aboutus_ektid90.aspx&quot;&gt;former Fast employees&lt;/a&gt; and the Attivio CTO is Sid Probstein, formerly vice president of technology at FAST. More importantly, Attivio's CEO is Ali Riaz, who was COO at Fast but unexpectedly left the company in late 2006. Well, in hindsight, perhaps not so unexpectedly, though DN quotes him as saying &amp;quot;I had nothing to gain from manipulation of the accounts. I had no shares in the company. I wanted shares and quit because I didn't get any. If you want to find out what's wrong with the accounts, you need to look at those who could gain from it. And it wasn't me.&amp;quot;&lt;/p&gt;

&lt;p&gt;Dagens N&amp;aelig;ringsliv doesn't appear to agree with Riaz, however; if you want the full analysis of the what and why, I suggest you read the article. I myself find it surprising that the CEO of a technology startup backed by $6.2 million in venture capital would drive &lt;a href=&quot;http://multimedia.dn.no/archive/00144/LB_Ali_Riaz_Fast_144367m.jpg&quot;&gt;an Audi R8&lt;/a&gt;, but that doesn't mean anything (other than that I'm envious of his car). I also find it surprising a former Fast COO would be co-owner of a company reselling Fast licenses, but walking like an Enron duck and quacking like an Enron duck doesn't necessarily mean that it's really anything like Enron. And Attivio's clouding the core technology in marketing hyperboles and buzzword compliance is slightly disconcerting, but &lt;a href=&quot;http://www.cmswatch.com/Trends/1122&quot;&gt;many renowned companies engage in the same practice&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;DN quotes Riaz as saying &amp;quot;you should be much better at praising the people who have success, instead of pushing them down.&amp;quot; And I would certainly love to be proven wrong by Attivio's software; as soon as I get my hands on the trial download I requested, I will let you know if it lives up to the high expectations. As one of my teachers in school once told me, &amp;quot;I'm known for being cynical, or even sarcastic -- myself, I prefer to call it healthy skepticism and mild irony.&amp;quot;&lt;/p&gt;

&lt;p&gt;Being a cynic isn't a lot of fun -- but for now, I would advise you to be at least healthily skeptical of what Attivio has to offer.</description>
         <link>http://www.cmswatch.com/Trends/1294-How-Fast-is-Attivio?source=RSS</link>
         <category>Enterprise Search</category>
         <author>bloem@radagio.com(Adriaan Bloem)</author>
         <pubDate>Sat,  5 Jul 2008 09:55:00 -0400</pubDate>
      </item>
      <item>
         <title>Best bets: a worst practice?</title>
         <description>&lt;p&gt;As one of the authors of our &lt;i&gt;&lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;Enterprise Search Report 2008&lt;/a&gt;&lt;/i&gt;, I've been spending a lot of time lately looking at search technology and talking to folks who care deeply about the subject (i.e., vendors and their customers).&lt;/p&gt;&lt;p&gt;One thing everyone seems to agree on is that providing relevant results to the user is a very hard problem indeed. People don't want to enter a few keywords and then get 10,000 hits on documents that contain those keywords, 9,999 of which might be irrelevant. They want pointers to the one or two documents that &lt;em&gt;are &lt;/em&gt;relevant.&lt;/p&gt;&lt;p&gt;The signal-to-noise problem is so thorny that many enterprise search products include an optional feature known as &amp;quot;best bets.&amp;quot; The idea is that certain very common searches should point to particular documents (or intranet pages) that are known, or presumed, to apply. Imagine that a lawyer working for a large legal firm logs into the company portal and searches on &amp;quot;poison pill.&amp;quot; A thousand hits might come back, of which 990 are related to medications, allergic reactions, toxicity, malpractice, and so on, even though all the person was really looking for was a link to the company's &amp;quot;merger and acquisitions&amp;quot; resource page. (&amp;quot;Poison pill&amp;quot; is a term for tactics a company can use to fend off hostile takeover attempts.) The idea of &amp;quot;best bets&amp;quot; is that you rig the system to promote the company's &amp;quot;M&amp;amp;A resources&amp;quot; link to the top of the hit list whenever someone does a search on &amp;quot;poison pill.&amp;quot;&lt;/p&gt;&lt;p&gt;Sometimes &amp;quot;best bets&amp;quot; refers to presenting the user with a recommendation when, say, several repositories exist, one or more of which could be better-suited to a given search than the others. 
(&amp;quot;Would you like to search the Parts Catalog for this?&amp;quot;) This is more of a navigational scenario. That's not really what I'm talking about here. I'm talking about&amp;#xa0; the practice of biasing search results by hard-coding certain answers to certain common queries. &lt;/p&gt;&lt;p&gt;Setting up &amp;quot;best bets&amp;quot; is typically a manual process. A person in IT will use search analytics to determine the most common search queries and the most-followed links associated with them. Then those associations will be captured in a database and wired into the search software in such a way that when a user issues a query for which a best bet already exists, the best-bet link(s) will automatically be shown at the top of the results page (either as a regular hit or under a separate heading of &amp;quot;Best Bets&amp;quot;). &lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://dennisdeacon.wordpress.com/2008/06/25/search-engine-best-bets/&quot;&gt;Not everyone&lt;/a&gt; thinks the &amp;quot;best bets&amp;quot; mechanism is a good idea. The problem is that, fundamentally, it's a hack. It's arguably the worst kind of hack in that it involves serious amounts of human intervention. Someone has to create the best-bet database. (Typically there will be hundreds, if not thousands, of best-bet links.) Then the database has to be updated and kept fresh as user needs change and documents are added to or dropped from the system. &lt;/p&gt;&lt;p&gt;In point of fact, the search software should do all this for you. After all, that's its job: to return relevant results (automatically) in response to queries. Why would you sink tens (or hundreds) of thousands of dollars into an enterprise search system only to override it with a manually assembled collection of point-hacks?&lt;/p&gt;&lt;p&gt;Sure, search is a hard problem. 
But if your search system is so poor at delivering relevant results that it can't figure out what your users need without someone in IT explicitly &lt;em&gt;telling &lt;/em&gt;it the answer, maybe you should search for a new search vendor. (And for help with that, see &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;Enterprise Search Report 2008&lt;/a&gt;, a free sample of which is available &lt;a href=&quot;http://www.cmswatch.com/Reports/Try/&quot;&gt;right here&lt;/a&gt;.)&lt;/p&gt;</description>
         <link>http://www.cmswatch.com/Trends/1286-Best-bets:-a-worst-practice?source=RSS</link>
         <category>Enterprise Search</category>
         <author>kthomas@cmswatch.com(Kas Thomas)</author>
         <pubDate>Thu, 26 Jun 2008 21:50:00 -0400</pubDate>
      </item>
      <item>
         <title>The value of archiving and the limitations of e-discovery</title>
         <description>Today we read about yet &lt;a href=&quot;http://www.bloomberg.com/apps/news?pid=20601087&amp;sid=aUWyzWYWGuFY&amp;refer=home&quot;&gt;another 
  major financial scandal&lt;/a&gt; allegedly exposed through the discovery of an e-mail 
  message from a fund principal that apparently stated that their fund was going 
  to be&amp;quot;'&lt;em&gt;toast&lt;/em&gt;.&amp;quot;&lt;/p&gt;
&lt;p&gt;The first thing I thought about this was that (&lt;em&gt;if true&lt;/em&gt;) it was a fantastically 
  stupid communication to put in an e-mail exchange. Secondly, I wondered why 
  it took so long to find this mail -- surely such high-profile financial managers 
  would have their mail exchanges monitored automatically and an exchange like 
  this should have rung every major alarm bell in the firm within seconds. Of 
  course they could have been using an external system to get around that; we 
  don't know at present. But this case once more highlights the limitations of 
  e-mail monitoring (&lt;a href=&quot;http://www.cmswatch.com/Trends/1274-EAM-focused-on-the-wrong-elements?&quot;&gt;discussed 
  here the other day&lt;/a&gt;) and e-discovery, and conversely the value of content 
  archiving.&lt;/p&gt;
&lt;p&gt;E-discovery is in many regards simply glorified &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;Enterprise 
  Search technology&lt;/a&gt;, but with the added ability to apply legal holds to data. 
  Just as Enterprise Search is limited by the quality and location of the content 
  it indexes, so too are e-discovery tools. Though in the case of e-discovery 
  the limitations are often more severe: evidence may or may not be conveniently 
  located in an e-mail message, as seems to be the case at Bear Stearns. More 
  commonly evidence has to be culled from not only e-mail stores, but also from 
  instant messaging systems, document systems, ERP systems, financial and business 
  applications, external drives, and so on. The idea that e-discovery is limited 
  to mail -- as many vendors (&lt;em&gt;and worryingly many buyers&lt;/em&gt;) seem to think 
  -- is naive in the extreme. Yet this misplaced belief is based on the reality 
  that the bulk of the data you will have to search will indeed be mail. Mail 
  represents the largest form of data in any organization, typically by an order 
  of magnitude (10x) or more. &lt;/p&gt;
&lt;p&gt;But here's the rub. Most of that e-mail mountain consists of redundant data 
  or, as the technical term goes, &amp;quot;&lt;em&gt;crap&lt;/em&gt;.&amp;quot; As we discuss at 
  length in our &lt;a href=&quot;http://www.cmswatch.com/E-mail/Report/&quot;&gt;&lt;em&gt;E-mail Archiving 
  &amp;amp; Management Report&lt;/em&gt;&lt;/a&gt;, typically 80% of mail data consists of duplicates. 
  Yet any search tool has to treat each piece of data equally, slowing the process 
  down massively and shooting discovery costs through the roof. How much more 
  sensible to use an archiving method to capture, filter, and reduce that volume 
  -- and ease the burden and cost of discovery?&lt;/p&gt;
&lt;p&gt;What did we learn today from the Bear Stearns scandal? Not much, really, other 
  than that mail (&lt;em&gt;and messages&lt;/em&gt;) continue to be the key &amp;quot;gotcha&amp;quot; 
  elements of the data mountain, and that we need to monitor and manage them ever 
  more closely. Though the monitoring elements are far from mature, &lt;a href=&quot;http://www.cmswatch.com/E-mail/Report/&quot;&gt;EAM&lt;/a&gt; 
  tools today archive and filter very efficiently indeed. The need to take mail 
  and mail content seriously is now an imperative, and building a strategy, agreeing 
  methods and policies, and selecting the right tools -- however complex -- is 
  a must.</description>
         <link>http://www.cmswatch.com/Trends/1279-The-value-of-archiving-and-the-limitations-of-e-discovery?source=RSS</link>
         <category>Enterprise Search</category>
         <author>aps@cmswatch.com(Alan Pelz-Sharpe)</author>
         <pubDate>Thu, 19 Jun 2008 13:49:00 -0400</pubDate>
      </item>
      <item>
         <title>FAST clarification</title>
         <description>Apparently, when I &lt;a href=&quot;http://www.cmswatch.com/Trends/1257&quot;&gt;previously wrote&lt;/a&gt; about Carrie's &lt;a href=&quot;http://faculty.washington.edu/kgb/horror/handfromgrave.jpg&quot;&gt;hand reaching out from the grave&lt;/a&gt;, the metaphor was too subtle. So let me make it absolutely clear: I'm not hoping the FAST turmoil will &amp;quot;&lt;a href=&quot;http://www.ki4u.com/guide1.jpg&quot;&gt;just go away&lt;/a&gt;&amp;quot; so I won't be bothered by it anymore. I was pointing out that the consequences could reach further than &amp;quot;just another investor soap,&amp;quot; and that the acquisition of &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Fast%20Search%20&amp;%20Transfer&quot;&gt;FAST&lt;/a&gt; by &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Microsoft&quot;&gt;Microsoft&lt;/a&gt; won't be enough to bury it.&lt;/p&gt;
&lt;p&gt;Now that I've ruined the joke by explaining it anyway, let me add that this is no laughing matter. Of course, while it is much more fun to &lt;a href=&quot;http://larsumlaut.wordpress.com/&quot;&gt;ridicule the situation without having to be able to back it up&lt;/a&gt;, I'll have to stick to the known facts: FAST is under investigation by the police. By stating just that, I expect the readers of this blog to understand it as a serious warning.&lt;/p&gt;
&lt;p&gt;When I say I'd rather discuss the technology, again, I imply rather than explain (to do a good job of that &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;takes quite a few pages&lt;/a&gt;). So let me explicitly state some of the questions arising.&lt;/p&gt;
&lt;p&gt;Now that FAST is turning out to be more of a &lt;a href=&quot;http://en.wikipedia.org/wiki/1948_Tucker_Sedan&quot;&gt;Tucker Torpedo&lt;/a&gt; than a &lt;a href=&quot;http://en.wikipedia.org/wiki/Cadillac&quot;&gt;Cadillac&lt;/a&gt; (and to quote Wikipedia: &amp;quot;[Tucker's] Accessories Program raised funds by selling accessories before the car was even in production&amp;quot;): what is it Microsoft actually bought?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt; Customers (who were, it now seems, fraudulently overvalued, and many of whom are on the Linux platform), whom Microsoft probably won't support in the long run?&lt;/li&gt;
&lt;li&gt; Street-cred in enterprise search, which is now seriously marred by the consistently negative reports about FAST, compounded by suspicions that MS may have once again been &lt;a href=&quot;http://www.portfolio.com/news-markets/top-5/2008/05/19/Microsofts-Deal-Plans&quot;&gt;blinded by competition with Google&lt;/a&gt;?&lt;/li&gt;
&lt;li&gt; A vision on enterprise search, in the form of a &lt;a href=&quot;http://www.microsoft.com/presspass/press/2008/apr08/04-25LervikPR.mspx&quot;&gt;vice president of Enterprise Search&lt;/a&gt; possibly implicated in a police investigation?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These have been widely reported, and there doesn't seem to be a whole lot more of value that came for Microsoft's $1.2 billion than technology. Which is what prompted my widely misunderstood remark that I'd rather discuss just that piece of the puzzle. Surely, it would have been possible to acquire the technology at a friendlier price point, say &lt;a href=&quot;http://www.cmswatch.com/Trends/878&quot;&gt;$23 million&lt;/a&gt; (oh wait! they got that as a free bonus), but it raises enough questions on its own. &lt;a href=&quot;http://www.cmswatch.com/Trends/1185&quot;&gt;Which parts&lt;/a&gt; of ESP aren't third party, bought, or open source? &lt;a href=&quot;http://www.cmswatch.com/Trends/1219&quot;&gt;How would these be integrated&lt;/a&gt;, or will a separate product line emerge? What are Microsoft's long-term views on enterprise search?&lt;/p&gt;
&lt;p&gt;I've criticized Microsoft's unclear strategy and roadmap for &lt;a href=&quot;http://www.cmswatch.com/SharePoint/Report/&quot;&gt;MOSS&lt;/a&gt; in the past, and once again, Redmond seems to be covered in fog. Since search solutions are written off in years, not months, knowing what future development holds for the software is very relevant to existing and prospective customers. I'd think twice before investing in an implementation until the clouds subside.&lt;/p&gt;
&lt;p&gt;On a side note, kudos to &lt;a href=&quot;http://marklogic.blogspot.com/2008/06/blind-eyes-industry-analysts-and.html&quot;&gt;Dave Kellogg&lt;/a&gt; for being the first to actually respond to my previous post under his own name. I received a fair bit of anonymous mail on this subject, and it seems paradoxical to be criticized for not speaking out loud enough by those who choose to remain incognito themselves. Though this post is not about Dave, as much as his post is not about me, I would like to point out that my name contains three A's (not two), and I'm not a 20-something English major. Yes, CMS Watch does have &lt;a href=&quot;http://www.cmswatch.com/vendormap/&quot;&gt;maps&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/Feature/175-Search-2008&quot;&gt;charts&lt;/a&gt;, and no, we don't dismiss business problems -- we just prefer not to place too much emphasis on the intricate details of &lt;a href=&quot;http://marklogic.blogspot.com/search?q=GAAP&quot;&gt;GAAP accounting procedures&lt;/a&gt; on this blog...;-)&lt;/p&gt;</description>
         <link>http://www.cmswatch.com/Trends/1262-FAST-clarification?source=RSS</link>
         <category>Enterprise Search</category>
         <author>bloem@radagio.com(Adriaan Bloem)</author>
         <pubDate>Tue,  3 Jun 2008 09:43:00 -0400</pubDate>
      </item>
      <item>
         <title>CMS Watch Competition Winner</title>
         <description>You may remember a while back we launched our &lt;a href=&quot;http://www.cmswatch.com/Trends/1201-Readers'-challenge---name-our-new-chart!&quot;&gt;little competition&lt;/a&gt; to come up with a new name for our vendor positioning chart. We had some great (&lt;em&gt;and varied&lt;/em&gt;) responses from all over the world. And it took quite an internal debate to decide on the eventual winner, but decide we did. And the winner is...&lt;a href=&quot;http://wordofpie.wordpress.com/&quot;&gt;Laurence Hart&lt;/a&gt; who offered us the name &amp;quot;Cross Check.&amp;quot; Laurence, a bottle of champagne is yours. &lt;/p&gt;
&lt;p&gt;Over the next month we will continue working with our designer to revamp the chart, and of course to rename it -- so look out for the Cross Check in all our report updates this year. </description>
         <link>http://www.cmswatch.com/Trends/1260-CMS-Watch-Competition-Winner?source=RSS</link>
         <category>Enterprise Portals</category>
         <author>aps@cmswatch.com(Alan Pelz-Sharpe)</author>
         <pubDate>Mon,  2 Jun 2008 09:45:00 -0400</pubDate>
      </item>
      <item>
         <title>FAST and the Furious</title>
         <description>I vividly remember being terrified  as a child by the movie &lt;a href=&quot;http://www.imdb.com/title/tt0074285/&quot;&gt;Carrie&lt;/a&gt;.  The closing scene -- where Carrie's hand &lt;a href=&quot;http://faculty.washington.edu/kgb/horror/handfromgrave.jpg&quot;&gt;reaches out from the grave&lt;/a&gt; -- left a lasting impression.&lt;/p&gt;
&lt;p&gt;Now I'm a technology analyst, not a financial analyst, and certainly no Stephen 
  King. But sometimes it's hard to ignore what goes on in our industry beyond the realm of ones and zeros, and when &lt;a href=&quot;http://www.wired.com/techbiz/it/news/2008/05/portfolio_0529&quot;&gt;Wired&lt;/a&gt; 
  reports it (or, if you prefer to read about it in Norwegian, &lt;a href=&quot;http://e24.no/selskap/FAST/article2442276.ece&quot;&gt;Aftenposten&lt;/a&gt;), 
  I should at least casually mention it. The central character here, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Fast%20Search%20&amp;%20Transfer&quot;&gt;FAST 
  Search &amp;amp; Transfer&lt;/a&gt;, is not exactly dead -- just acquired, by Microsoft.&lt;/p&gt;
&lt;p&gt;After &lt;a href=&quot;http://www.cmswatch.com/Trends/995&quot;&gt;enduring many difficulties last year&lt;/a&gt;, 
  FAST's problems turned out to be more than just &lt;a href=&quot;http://marklogic.blogspot.com/2007/08/fast-search-train-wreck-whos.html&quot;&gt;accounting 
  headaches&lt;/a&gt;. So much so, in fact, that the Kredittilsynet (the Norwegian financial 
  authority) found the irregularities important enough to refer the case to &amp;Oslash;kokrim.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.okokrim.no/&quot;&gt;&amp;Oslash;kokrim&lt;/a&gt; describes itself as &amp;quot;The 
  Norwegian National Authority for Investigation and Prosecution of Economic and 
  Environmental Crime.&amp;quot; A bit of a mouthful which I can abbreviate to &amp;quot;the 
  police.&amp;quot; In other words, it is now a criminal investigation.&lt;/p&gt;
&lt;p&gt;Of course, it would be thrilling to continue this story by recounting past rumors about FAST, and what &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Microsoft&quot;&gt;Microsoft&lt;/a&gt; did or did not know about it before they started talking about an &lt;a href=&quot;http://www.cmswatch.com/Trends/1112&quot;&gt;acquisition&lt;/a&gt;. Followed by an interpretation of the meaningful silences in the conference call where the two companies jointly announced the deal. And to round it off, a discourse on how this will affect the future of enterprise search for Microsoft.&lt;/p&gt;
&lt;p&gt;But in fact, I hope we've pretty much heard the last of it and can return to simply discussing the merits and demerits of the technology. So even though I have this lingering image of the last scene of &amp;quot;Carrie&amp;quot; in the back of my mind, after this short break, we'll return to our &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;regularly scheduled programming&lt;/a&gt;.</description>
         <link>http://www.cmswatch.com/Trends/1257-FAST-and-the-Furious?source=RSS</link>
         <category>Enterprise Search</category>
         <author>bloem@radagio.com(Adriaan Bloem)</author>
         <pubDate>Fri, 30 May 2008 01:16:00 -0400</pubDate>
      </item>
      <item>
         <title>Dining at the intersection of Search and Retention</title>
         <description>Lawyers were well represented (you might say) at this year's &lt;a href=&quot;http://www.enterprisesearchsummit.com/&quot;&gt;Enterprise Search Summit&lt;/a&gt; in New York. At times, ESS felt more like an e-discovery conference with analytics and social-computing side-tracks rather than a search conference featuring a few e-discovery sessions. &lt;/p&gt;
&lt;p&gt;Based on what I saw at the Search Summit, there seems to be a renewed awareness, at ever-higher levels in the corporate responsibility chain, that in a litigious business environment &amp;quot;enterprise search&amp;quot; is not just a knowledge-management tactic or a productivity aid, but a survival imperative. You will be sued some day. (It's not a matter of &amp;quot;if,&amp;quot; but when.) During the discovery phase of the suit, you're going to provide (and also receive from the other side) bewilderingly immense amounts of data. Without good search technology, sifting through the data isn't just tedious but nightmarishly expensive.&lt;/p&gt;
&lt;p&gt;I didn't get a chance to attend any e-discovery sessions at the Search Summit. It didn't matter. At lunch, I happened to sit down next to litigation technology consultant (and ESS presenter) &lt;a href=&quot;http://www.linkedin.com/pub/3/9A9/224&quot;&gt;Jeff Flax&lt;/a&gt;. We had an illuminating chat about search and discovery in the context of records retention. &lt;/p&gt;
&lt;p&gt;Flax noted that many companies that have records retention policies aren't following them. He sees a &amp;quot;pack rat&amp;quot; syndrome: a tendency to let expired records remain in the morgue past the &amp;quot;save-till&amp;quot; date. The problem with this is that files that have been declared obsolete or marked for disposition, but have not yet been physically destroyed, are still subject to subpoena. &amp;quot;A good lawyer will ask for expired documents during discovery,&amp;quot; Flax notes.&lt;/p&gt;
&lt;p&gt;Lawyers are also demanding data in its &amp;quot;native state&amp;quot;: Not text dumps or PDFs or other derivative forms of the data, but the data as it actually exists. &amp;quot;If I'm a lawyer and I'm requesting someone's e-mails on a certain subject,&amp;quot; says Flax, &amp;quot;I don't want the e-mails as text files, I want the original e-mail archive in binary form so I can pick apart the bits and get at all the header and footer and other information in context.&amp;quot;&lt;/p&gt;
&lt;p&gt;Sometimes physical media must be handed over in discovery so that deleted files can be detected and recovered. &amp;quot;I've seen cases where browser search queries from many years back, supposedly no longer on disk, have been recovered forensically,&amp;quot; Flax told me. &amp;quot;And then certain keyword clumps are detected, and those query patterns can become admissible in court.&amp;quot;&lt;/p&gt;
&lt;p&gt;It turns out that the data in a search index (the index built by a search engine) can often be used to reconstruct a document even after the document itself has been irretrievably lost. Takeaway: A document can't be considered fully destroyed until you've destroyed its search-index data as well. (I wonder how many retention policies take this into account? Doubtless very few.) &lt;/p&gt;
&lt;p&gt;If you're concerned about e-mail retention (and if you're not, you should be), you might want to look into our latest offering: &lt;a href=&quot;http://www.cmswatch.com/E-mail/Report/&quot;&gt;&lt;em&gt;The E-mail 
  Archiving &amp;amp; Management Report 2008.&lt;/em&gt;&lt;/a&gt; You'll find that the report divides vendors (roughly) along three lines: policy-centric, archive-centric, and SaaS-based. (You can &lt;a href=&quot;http://www.cmswatch.com/Reports/Try/&quot;&gt;see a free sample here&lt;/a&gt;.)&lt;/p&gt;&lt;p&gt;My advice? Never pass up a chance to have lunch with a litigation technology expert. You'll be inundated with food for thought. &lt;/p&gt;</description>
         <link>http://www.cmswatch.com/Trends/1251-Dining-at-the-intersection-of-Search-and-Retention?source=RSS</link>
         <category>Records Management</category>
         <author>kthomas@cmswatch.com(Kas Thomas)</author>
         <pubDate>Thu, 22 May 2008 17:41:00 -0400</pubDate>
      </item>
      <item>
         <title>Enterprise search: free as in free beer?</title>
         <description>Searching information -- really, how hard can it be? So, why wouldn't you go 
  out and get a search engine that's free? Well, to stick to the analogy of 
  &amp;quot;free beer,&amp;quot; you might wake up in the morning with a headache, only 
  to find your wallet gone.&lt;/p&gt;
&lt;p&gt;Of course, I'm paraphrasing the &lt;a href=&quot;http://www.gnu.org/philosophy/free-sw.html&quot;&gt;definition 
  of &amp;quot;free software&amp;quot;&lt;/a&gt;. &lt;a href=&quot;http://en.wikipedia.org/wiki/Richard_stallman&quot;&gt;Richard 
  Stallman&lt;/a&gt;'s example is used to point out the &lt;a href=&quot;http://en.wikipedia.org/wiki/Gratis_versus_Libre&quot;&gt;ambiguity 
  of the term &amp;quot;free&amp;quot;&lt;/a&gt; in the English language. With free software, 
  &amp;quot;you should think of free as in free speech, not as in free beer.&amp;quot; 
  Nevertheless, you should be warned: both open source beer (&lt;a href=&quot;http://freebeer.org/&quot;&gt;now 
  in version 3.3&lt;/a&gt;) and free commercial beer have the potential for leaving 
  you with a bit of a hangover.&lt;/p&gt;
&lt;p&gt;If you really think enterprise search is a simple commodity -- and I will only 
  comment on that with the obligatory statement that readers of our &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;&lt;em&gt;Enterprise 
  Search Report&lt;/em&gt;&lt;/a&gt; will probably know better than that -- getting a free 
  product would be ideal to get your feet wet (albeit somewhat &lt;a href=&quot;http://www.funnyphotos.net.au/images/beer-spillage-of-the-back-of-a-truck-having-some-t1.jpg&quot;&gt;sticky&lt;/a&gt;). 
  I get invited to &lt;a href=&quot;http://en.wikipedia.org/wiki/BYOB&quot;&gt;BYOB&lt;/a&gt; enterprise 
  search parties a lot, and usually come up with Apache Lucene, IBM Omnifind Yahoo! 
  Edition, and Microsoft Search Server 2008 Express. Let's get a closer taste 
  of each.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://lucene.apache.org/&quot;&gt;Apache Lucene&lt;/a&gt;. Lucene is open source, 
  which you are free to use. The problem is, it's not a complete enterprise search 
  product -- it's a &amp;quot;text search engine API.&amp;quot; What you get is a Java 
  JAR with the core functionality of a search engine. In typical hardcore Java 
  developer understatement this is described as &amp;quot;you write the easy stuff, 
  the UI and the process of selecting and parsing your data files to pump them 
  into the search engine, yourself.&amp;quot; To developers that doesn't sound too 
  difficult -- it's a library they'd be able to use to create search functionality 
  for many applications. As they embark on that journey, however, many will find 
  out they'll have to become experts on enterprise search to get their implementation 
  to perform basic tasks any &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Google/&quot;&gt;Google&lt;/a&gt; 
  user has come to expect. Index Word documents? You'll have to convert those 
  to text first. Remove stop words or perform spell checking? You'll have to get 
  some more jars to fit that in. And that familiar &lt;a href=&quot;http://www.google.com/search?q=google+ui&quot;&gt;user 
  interface&lt;/a&gt; isn't so easy to replicate, either.&lt;/p&gt;
&lt;p&gt;Of course, there are a couple of more &amp;quot;pre-packaged,&amp;quot; Lucene-based 
  engines (such as &lt;a href=&quot;http://lucene.apache.org/nutch/&quot;&gt;Nutch&lt;/a&gt; and &lt;a href=&quot;http://lucene.apache.org/solr/&quot;&gt;Solr&lt;/a&gt;), 
  but they'll only take you so far on that long and winding road. There are some 
  excellent examples of what you can achieve with Lucene, but many more of how 
  hard it can be to get there.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.cmswatch.com/Trends/802-IBM-and-Yahoo-to-Offer-Free-Enterprise-Search-Engine&quot;&gt;IBM Omnifind Yahoo! Edition&lt;/a&gt; (or OY!E). The Google appliances have the Google brand behind them, which must have got the IBM people thinking the Yahoo! brand would be excellent marketing for their free-to-use search engine. In fact, it's neither IBM nor Yahoo's technology, but Lucene wrapped in other open source software. A few commercial bits thrown in create a product that's easy to install and run. It will actually do many of the things Lucene will make you work hard to accomplish: it comes with support for several languages and quite a few source content filters. For users, it looks like a regular web search engine; for admins, there's a nicely designed and intelligible interface. In short, it does most of the things a Google Mini appliance will do -- but for free.&lt;/p&gt;
&lt;p&gt;So what's the catch? Well, the license (by the way, &lt;a href=&quot;http://www.google.com/search?q=%22IBM+Omnifind+Yahoo%21+Edition%22+license&quot;&gt;what 
  license&lt;/a&gt;?) limits you to 500,000 documents and 5 collections. After that, 
  you can &amp;quot;upgrade&amp;quot; to other Omnifind products. But since the technology 
  across the Omnifind line-up is completely different, this is the same as starting 
  from scratch, and you'll pay for the privilege. I've been critical of &lt;a href=&quot;http://www.cmswatch.com/Trends/1122-Google-Search-Appliance:-small-step-in-technology,-giant-leap-in-marketing&quot;&gt;the 
  limitations&lt;/a&gt; of Google's appliances in the past, and sure, the 50,000 document 
  limit of the entry-level Google Mini is a lot less than OY!E's half a million. 
  But that comparison isn't really fair, considering that the Mini actually 
  comes with the hardware to run the queries on &lt;a href=&quot;http://www.googlestore.com/appliance/product.asp?catid=3&quot;&gt;for 
  a mere $2,990&lt;/a&gt;. And don't think you'll be able to run IBM's software on an 
  old abandoned test server you have available -- OY!E will need more power than 
  the single blade Google Mini or &lt;a href=&quot;http://www.cmswatch.com/Trends/389-David-to-Google's-Mini-Goliath?&quot;&gt;Thunderstone 
  Appliance&lt;/a&gt; to match the performance. Tellingly, I wasn't able to dig up an 
  example of an OY!E implementation to mention while researching the &lt;a href=&quot;http://www.cmswatch.com/Search/Report/&quot;&gt;&lt;em&gt;Enterprise 
  Search Report&lt;/em&gt;&lt;/a&gt; (if you know of one, let me know).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.cmswatch.com/Trends/1064-Microsoft's-Free-Lunch&quot;&gt;Microsoft Search Server 2008 Express&lt;/a&gt;. Microsoft's free offering is basically the same software as the non-Express version, but then there's the seemingly innocent limitation: one server only. I wouldn't want to continue the theme of this post by saying this is akin to handing out free samples of beer to get you hooked; suffice it to say that if you start to run the Express version in a production environment, there will, no doubt, come a time when a single server won't be enough anymore. When you've come to rely on the solution, you'll suddenly have to shell out for the licenses. &lt;a href=&quot;http://www.cmswatch.com/Trends/1064-Microsoft's-Free-Lunch&quot;&gt;As I've said before&lt;/a&gt;, having a free lunch isn't necessarily a bad thing; just remember that you'll probably have to pay for the beer the lunch comes with.&lt;/p&gt;
&lt;p&gt;So, this might all start sounding like advice your mother gave you: never take 
  anything from a stranger, and certainly no free alcoholic beverages. Don't forget, 
  however, that I'm Dutch, and I've certainly developed &lt;a href=&quot;http://amstellight.com&quot;&gt;a&lt;/a&gt; 
  &lt;a href=&quot;http://www.heineken.com&quot;&gt;taste&lt;/a&gt; &lt;a href=&quot;http://www.grolsch.com/&quot;&gt;for&lt;/a&gt; 
  enterprise search. Free beer sounds too good to be true, but it could certainly 
  get your party started; just remember to drink in moderation, and never, ever, 
  drink and drive.</description>
         <link>http://www.cmswatch.com/Trends/1247-Enterprise-search:-free-as-in-free-beer?source=RSS</link>
         <category>Enterprise Search</category>
         <author>bloem@radagio.com (Adriaan Bloem)</author>
         <pubDate>Fri, 16 May 2008 13:23:00 -0400</pubDate>
      </item>
      <item>
         <title>Vendor criticism of CMS Watch</title>
         <description>As you know, at CMS Watch we write critical product evaluations to help you avoid expensive procurement and deployment mistakes. We write reports that detail both the warts and merits of big vendors like &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/Documentum%20(EMC)&quot;&gt;EMC&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/Oracle&quot;&gt;Oracle&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/Xerox&quot;&gt;Xerox&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/IBM&quot;&gt;IBM&lt;/a&gt; -- through to smaller specialist vendors like &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/Hyland&quot;&gt;Hyland&lt;/a&gt;, &lt;a href=&quot;http://www.cmswatch.com/Search/Vendors/Autonomy&quot;&gt;Autonomy&lt;/a&gt; and &lt;a href=&quot;http://www.cmswatch.com/ECM/Vendors/Nuxeo&quot;&gt;Nuxeo&lt;/a&gt;. Readers of our reports often ask me &amp;quot;&lt;em&gt;what did vendor x say when they read &lt;u&gt;that&lt;/u&gt;!&lt;/em&gt;&amp;quot; The assumption, sometimes correct, is that vendors freak out on reading such criticism.&lt;/p&gt;
&lt;p&gt;In an industry where most of the &amp;quot;&lt;em&gt;independent analysts&lt;/em&gt;&amp;quot; are heavily dependent on revenues from the very firms they claim to be &amp;quot;&lt;em&gt;independent&lt;/em&gt;&amp;quot; of, it's unusual to see truly critical research get published. So it comes as a surprise to both buyers and sellers when they read such criticism. In our reports we distribute the compliments and brickbats widely -- if something is truly terrible we will tell you.&lt;/p&gt;
&lt;p&gt;But most of the time it is not a case of bad technology versus good technology. Rather, it is a case of good fit versus bad fit: a product that could become an outstanding performer in a larger legal firm may make a terrible fit in a mid-sized manufacturing and ERP-centric environment. Hence we urge you, the reader, to study all the alternatives and balance them out, rather than look at one preferred vendor in isolation.&lt;/p&gt;
&lt;p&gt;Speaking of isolation, the marketing groups of some vendors seem to operate in a kind of vacuum. I guess it's part of the job for them to drink their own Kool-Aid, but some of them seem to think it's part of their job to attack and stop &lt;em&gt;any&lt;/em&gt; criticism of their product or company. At CMS Watch we're often on the receiving end of that wrath; that stinks sometimes, but so be it. Just as it is the vendor's job to wax lyrical about the joys of their product, so too is it ours to unearth the reality. If you want to get an insight into this particular dynamic, whether you're a curious end user or a vendor AR (Analyst Relations) person, check out &lt;a href=&quot;http://www.cmswatch.com/Feature/178-Analyst-Relations&quot;&gt;the article&lt;/a&gt; I published today. </description>
         <link>http://www.cmswatch.com/Trends/1234-Vendor-criticism-of-CMS-Watch?source=RSS</link>
         <category>Enterprise Portals</category>
         <author>aps@cmswatch.com (Alan Pelz-Sharpe)</author>
         <pubDate>Fri, 16 May 2008 10:43:00 -0400</pubDate>
      </item>

   </channel>
</rss>

