Introducing Collecta!
Collecta is a real-time search engine that boasts a fantastic user interface and innovative applications that stretch its versatility beyond what one would expect of most RTS engines. Below is a brief rundown of how it works, what to use it for, and some of its fun features.
The following is a breakdown of my experience with the real-time search engine Collecta.
For the purposes of this analysis I have hijacked some criteria from the Web Search Slides. This should help me stay objective rather than rambling on about the features I enjoyed most. I'll also assign a score out of ten to each aspect discussed so I can later compare this engine to an RTS competitor.
Collecta Background
Collecta is a full-text search engine that uses a common syntax system to search blogs, articles, news, images, pages, comments, and tweets. This means that Collecta examines the full text of documents on the web for the keywords you enter, as opposed to searching through meta-information organized under a directory system.
It uses what I will call universal syntax: + for and, - for or, "search term" for exact phrases, etc. More advanced syntax like inurl: or intitle:, on the other hand, is either not recognized by the engine or ineffective in RTS. Essentially, searching with any syntax of this type returns zero results.
Although it can be used as a crude tool for exploring and monitoring changes in web impressions of different topics, it primarily serves to generate current, up-to-the-minute query results. The uses of such a focus will be touched on later.
That is the bare-bones breakdown of the Collecta engine and how it works; now I'd like to discuss how it performs.
Query
Variety and Usefulness of Special Queries
This was perhaps the most disappointing aspect of the Collecta engine. While the engine supports enough special queries for most casual users, many advanced syntax features are ineffective. As mentioned above, using inurl: or intitle: returns zero results. The cause is unclear because the Collecta page offers no information about which syntax is supported. Either these features are not implemented, or the fact that it is a real-time social media engine means these special queries narrow the results down to zero. My research suggests the former: even when I combine special queries with top-of-mind, hot-button issues, I see zero results.
Surely inurl:.com shouldn't limit our results on 'healthcare' to nothing, right? This leads me to believe syntax like this is not supported. The same seems to hold for other field-limiting operators like 'site:' or 'link:'.
It does support standard syntax for and/or and exact phrases, though, as evidenced by these queries producing materially different results when run at the same time:
Obama healthcare
Obama - healthcare
"Obama healthcare"
Curiously, though, typing the words "and" or "or" does not serve the same function as +/-; the engine just includes them in the keyword search.
These would usually be major shortcomings in a search engine, but since results in an RTS system are inherently limited, forgoing these features does the engine only a small disservice.
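To make the operator behavior concrete, here is a minimal Python sketch of how an engine might parse these basic operators. Collecta does not document its parser, so every rule here is an assumption: `+` marks a required term, `-` excludes a term (the usual convention in other engines; the post describes it as "or"), quoted text is an exact phrase, and a lone `+` or `-` followed by a space applies to the next word. Bare words, including a typed "and" or "or", fall through as plain keywords, consistent with the behavior observed above. The names `ParsedQuery` and `parse_query` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical tokenizer: a double-quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


@dataclass
class ParsedQuery:
    required: List[str] = field(default_factory=list)  # terms prefixed with +
    excluded: List[str] = field(default_factory=list)  # terms prefixed with - (assumed NOT)
    phrases: List[str] = field(default_factory=list)   # "exact phrase" matches
    keywords: List[str] = field(default_factory=list)  # everything else


def parse_query(text: str) -> ParsedQuery:
    """Split a query into required, excluded, phrase, and plain keyword parts."""
    q = ParsedQuery()
    pending = ""  # a lone "+" or "-" applies to the next word (assumption)
    for m in TOKEN.finditer(text):
        phrase, word = m.group(1), m.group(2)
        if phrase is not None:
            q.phrases.append(phrase)
            pending = ""
            continue
        if word in ("+", "-"):
            pending = word
            continue
        sign = pending
        if word[0] in "+-" and len(word) > 1:
            sign, word = word[0], word[1:]
        if sign == "+":
            q.required.append(word)
        elif sign == "-":
            q.excluded.append(word)
        else:
            # "and", "or", and unsupported operators like inurl:.com land here
            q.keywords.append(word)
        pending = ""
    return q


for query in ['Obama healthcare', 'Obama - healthcare', '"Obama healthcare"']:
    print(query, "->", parse_query(query))
```

Under this sketch an unsupported operator such as `inurl:.com` is just another keyword that a document would have to contain literally; that is one possible, unconfirmed reason field operators could come back empty.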
Automation
As far as I can tell, the engine automates ranking entirely on the basis of publication time. This is common among RTS engines, so I won't judge it too harshly, but it does seem to me to be a shortcoming. Clearly the goal of an RTS engine is to generate current results, but does the user really gain anything from knowing that one Twitter comment was made 4 minutes before a blog comment elsewhere? I'd suggest that the site rank results by traffic, or via some other indexing feature, with the result stream limited to a fixed time period. So when I search 'Obama + healthcare', for instance, the top results would be the most reputable blog posts, the most highly trafficked pictures, and the most-followed Twitter accounts addressing the issue in, say, the past hour. To me this seems much more useful.
While RTS engines differ in function and use from traditional engines, I don't think that is any reason to lower the performance standards we expect from them. Automation and special-query performance may not be as useful here, but they could certainly improve the quality of results to some extent, and for that reason I give Query a 3/10.
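The ranking change suggested above can be sketched in a few lines of Python. This is a hypothetical illustration, not Collecta's code: `Item`, its `popularity` field (standing in for traffic, shares, or follower counts), and the one-hour default window are assumptions drawn from the suggestion in the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Item:
    title: str
    published: datetime
    popularity: int  # hypothetical signal: traffic, shares, or follower count


def rank_chronological(items: List[Item]) -> List[Item]:
    """Newest first: ordering purely by publication time."""
    return sorted(items, key=lambda it: it.published, reverse=True)


def rank_popular_in_window(items: List[Item], now: datetime,
                           window: timedelta = timedelta(hours=1)) -> List[Item]:
    """Keep items published within `window` of `now`, then order by
    popularity, using recency to break ties."""
    recent = [it for it in items if now - window <= it.published <= now]
    return sorted(recent, key=lambda it: (it.popularity, it.published), reverse=True)


if __name__ == "__main__":
    now = datetime(2000, 1, 1, 12, 0)  # arbitrary sample time
    items = [
        Item("tweet", now - timedelta(minutes=1), 3),
        Item("blog post", now - timedelta(minutes=40), 900),
        Item("old article", now - timedelta(hours=3), 5000),
    ]
    print([it.title for it in rank_chronological(items)])
    print([it.title for it in rank_popular_in_window(items, now)])
```

With these sample items, chronological ordering puts the one-minute-old tweet ahead of the far more popular 40-minute-old blog post, while the windowed ranking reverses them and drops the three-hour-old article entirely.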
Search Engine
Quality of the Experience
This is where Collecta really excels. Below is a link to the homepage and a screenshot of the interface after a search.
It's extremely clean and easy to use, with little clutter on the pages to distract from the search. It is also set up in a logical, easy-to-follow three-column layout. The leftmost column is where your search begins, with keywords and drop-down filters. The middle column is the constantly refreshing stream of results, which can be scrolled up and down through time independently of the other two columns. The rightmost column is a preview of the result you have highlighted in the middle column. This is an especially appealing feature because it lets you sift through your results in depth without ever leaving the search page.
Responsiveness
Responsiveness for an RTS engine can be measured on two dimensions: speed of the initial search and speed of the stream. The first is easy to measure; it is simply the time between pressing Enter and the appearance of your results. For Collecta this varies between 5 and 10 seconds, which is well behind powerful engines like Yahoo! and Google, and even behind similar engines like Scoopler. On this dimension Collecta is lagging.
Measuring the speed of the RTS feed is tougher, because lag can reflect either a lack of new content being uploaded to the web or a slow search. When searching the 'hot now:' queries found right below the query box, I see tweets dated no more than a minute before my search, and the flow is relatively constant. There is little published work comparing the feed speeds of RTS social media engines, but in my opinion this seems at the very least sufficient. In fact, having seen some of the lag issues with RTS Twitter engines in our class experiments, I would say Collecta excels here.
Overall, Collecta sports an amazingly pleasant, aesthetically pleasing user interface, making for a fun search experience. However, its engine is a bit slow in generating the initial query results. For Search Engine I give Collecta a 9/10, since I think the quality of the experience plays a more important role than responsiveness. We are generally forgiving of slow technology if we are having a pleasant time using it.
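The two responsiveness measures described above could be recorded with a small Python harness like the one below. It is a sketch under assumptions: `run_search` is a hypothetical stand-in for whatever issues a query and returns results (no Collecta API is implied), and feed lag is taken as the gap between the observation time and the newest item's timestamp. As noted above, a large lag on a quiet topic may simply mean nothing new was posted, so the measure is most informative on hot-button queries.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple


def initial_latency(run_search: Callable[[], list]) -> Tuple[float, list]:
    """Dimension 1: seconds between issuing a query and getting results back."""
    start = time.perf_counter()
    results = run_search()
    return time.perf_counter() - start, results


def feed_lag_seconds(timestamps: List[datetime], observed_at: datetime) -> float:
    """Dimension 2: how far the newest item in the stream trails the moment
    we looked. A large value may mean a slow engine or just a quiet topic."""
    return (observed_at - max(timestamps)).total_seconds()


if __name__ == "__main__":
    # Stand-in search that just returns canned results (hypothetical).
    elapsed, results = initial_latency(lambda: ["result 1", "result 2"])
    print(f"initial search took {elapsed:.3f} s")

    observed = datetime.now(timezone.utc)
    stream = [observed - timedelta(seconds=45), observed - timedelta(minutes=5)]
    print(f"feed lag: {feed_lag_seconds(stream, observed):.0f} s")
```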
Results
Content
As mentioned, Collecta delivers results from the following categories:
* Stories: blog posts, articles
* Comments (on blog posts)
* Updates: Twitter, Jaiku, Identica
* Photos: Flickr, TwitPic, yFrog
* Videos: YouTube, Ustream
The breadth of the content covered is admirable, and the sources searched are smart. If Collecta searched for this same type of content on less reputable or less trafficked sites, I think you would run into problems: the video and image quality would vary, the articles would be less reliable, and so on. The only thing I can think of that might be usefully integrated is page-change results. News often manifests itself as changes to existing sites (think Wikipedia edits), so an engine that could monitor real-time web edits would be applicable here too.
Format of Results
The format of the results leaves a little to be desired. You'll notice there is no option to search for similar pages, the result's URL is not listed, and the preview is quite limited. The preview window in the third column addresses all of these, but it would be nice to see them in the second column as well for easy scanning.
Delivery Form
Results are delivered in the browser as a vertical stream between the search and preview columns. They are ordered by timestamp, and the source is indicated by an icon showing, for instance, whether it's a Twitter post or a blog entry. This delivery form suits an RTS engine. Collecta has gone one step further than other engines in this regard, though: it has developed a Firefox application that builds a one-click interface into the browser, generating results related to whatever is currently on your page. This is a unique stride in delivery form and something especially applicable to RTS.
Collecta is very versatile in the content it searches, unlike some other RTS engines, and its results format is adequate. It really excels on delivery form with the addition of the Firefox app, so on the Results dimension I give it an 8/10.
Searchable Information
How frequently updated?
As mentioned, Collecta's stream updates very quickly, within a minute for hot-button searches, so it delivers on that aspect. As for updates to the engine itself, I found through their blog that material changes and improvements to the site are made close to weekly. Additionally, the set of sites it indexes is constantly updated through user feedback in their forums. It is refreshing to see so much user feedback go into the site's improvement, and after looking at the credentials of the support team listed in the "About" section, I'm confident this engine will become the leader in RTS social media engines.
On this dimension I give Collecta an 8/10 for constantly looking to improve.
Subset of the Web
Target
Collecta targets only social media sources. This means the engine neglects all websites that would not be considered news delivery pages or social networking/sharing sites. This is less a criterion of search engine quality than a descriptor, so I won't comment on the suitability of this target.
Quality of Coverage
The coverage is less than impressive. While it hits the leaders in each category it searches (Stories, Comments, Updates, Photos, Videos), it leaves out quite a few highly trafficked, reputable sites in each category. Blogs, for instance, are pulled only from WordPress, not from sites like Blogspot.com or other niche leaders. Comments are derived from the blog sites searched, so the same problem exists there. Comments are also not pulled from the news sites tracked for the Stories portion, which seems like a disconnect as well. Updates are pulled from Twitter and similar sites, but Facebook is left out, which I find to be just as good a place to pick up on the buzz around prevalent events. Image sites like Photobucket are left out as well. Finally, the video search is extremely limited: Metacafe, Hulu, and other well-trafficked sources are not included.
Opacity
Collecta is quite clear about the sources it derives its results from. Next to the search filters you'll notice a breakdown of the primary sites for each category (listed above in the Content section). Collecta does add other sources to its index at users' request, but at the moment these are few and far between and constitute a very small segment of the overall results. While the results list itself doesn't show sources, the preview column clearly marks the source at the bottom.
In grading the Subset of the Web criterion, I think Quality of Coverage is of utmost importance, since it represents the engine's true purpose. Collecta's limits on this dimension are unsatisfying, and for that I give it a 3/10.
Collecta Scoreboard
| Criteria | Score |
|---|---|
| Query | 3 |
| Search Engine | 9 |
| Results | 8 |
| Searchable Info | 8 |
| Subset of Web | 3 |
| Total | 31/50 |
Scoopler Comparison
Scoopler Background
Scoopler is an RTS engine and Collecta's nearest competitor. Like Collecta, it aggregates tweets, blogs, and articles. While I found it to be a sufficiently powerful RTS engine, it returns slightly different content than Collecta and falls short of Collecta's interface. I would use it for casually browsing news because it has directory-like categories from business to lifestyle, something Collecta lacks. When you browse these categories, related tweets automatically populate the sidebar, so it's a unique combination of hard and soft news in one interface.
Query
Scoopler performs on par with Collecta as far as syntax goes. Beyond the basics (+/- for and/or), no syntax is built in; things like inurl: and intitle: do not work.
However, its automation is based on the number of 'shares' each result has, which suggests it is doing what I recommended above: ordering by popularity within a fixed time period, as opposed to just ordering by publication time.
On the basis of automation, I give Scoopler a 5/10
Search Engine
Scoopler tends to load a bit slower than Collecta, and results in the news/article bar are much older than the average result on Collecta. Part of this is because they are sorted by 'shares' rather than publication time.
Overall the interface is not as pleasing as the Collecta design, but it does score points for dividing out the different types of results (something Collecta doesn't do unless you filter them out altogether).
Scoopler scores a 7/10 here for lacking the aesthetic of Collecta's clean interface.
Results
Collecta and Scoopler diverge the most on content. Collecta aggregates comments, tweets, blogs, videos, and articles from major news sources. Scoopler, on the other hand, is more limited in some respects and more expansive in others. As far as user feedback goes, it aggregates only tweets, opting out of collecting blog and article comments, and it also ignores images.
However, its news/article search is not limited to major sources the way Collecta's is; you'll find articles from all kinds of sites related to your search. While this increases results, it may also decrease the relevance and validity of those results, which is a very normal trade-off.
The format, as mentioned, is pretty nice: ample information about each result, a short first-paragraph preview, and a viewing pane within the site for looking further. The idea of pairing soft results (tweets) with hard news is also kind of nice. You get a real action-reaction look at each news event.
There is really no clear winner in this category; it's give and take, with Scoopler giving wider news/article results and Collecta wider public opinion results. 8/10
Searchable Information
Scoopler, like Collecta, is obsessive about improvement. Browsing its blog, I see tons of new features being tested, and unlike Collecta, Scoopler has secured some seed funding. I expect major improvements to be made to this site in the coming year.
There is also a small feedback tab that follows your scroll bar, letting you click and submit a few quick words whenever you have a suggestion. All this means we should see frequent improvements to the searchable information on this site in the near future.
Scoopler edges out Collecta with a 10/10 here for showing promise to advance
Subset of the Web
Scoopler makes its sources very clear; it hides nothing. Where Scoopler really shines over Collecta is quality of coverage. It includes many topically relevant but non-major news sources, an area where Collecta falls short. It also covers a wider array of blog sites, whereas Collecta scours only WordPress.
Scoopler earns a 6/10 here for superior Quality of Coverage
Scoopler Scoreboard
| Criteria | Score |
|---|---|
| Query | 5 |
| Search Engine | 7 |
| Results | 8 |
| Searchable Info | 10 |
| Subset of Web | 6 |
| Total | 36/50 |
Looks like Scoopler edges out Collecta for the win, 36 to 31.
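As a quick check of the arithmetic, here are both scoreboards totaled in Python, using the per-criterion scores stated in each section's text (for Collecta, the text gives Search Engine 9/10 and Searchable Information 8/10).

```python
# Scores as stated in each section's text (out of 10 per criterion).
scores = {
    "Collecta": {"Query": 3, "Search Engine": 9, "Results": 8,
                 "Searchable Info": 8, "Subset of Web": 3},
    "Scoopler": {"Query": 5, "Search Engine": 7, "Results": 8,
                 "Searchable Info": 10, "Subset of Web": 6},
}

totals = {engine: sum(cats.values()) for engine, cats in scores.items()}

# Per-criterion margin: positive means Scoopler scored higher.
margins = {c: scores["Scoopler"][c] - scores["Collecta"][c]
           for c in scores["Collecta"]}

for engine, total in totals.items():
    print(f"{engine}: {total}/50")
print(margins)
```

This prints `Collecta: 31/50` and `Scoopler: 36/50`. Scoopler's five-point lead comes from Query (+2), Searchable Info (+2), and Subset of Web (+3), offset by Collecta's two-point edge on Search Engine; Results is a tie.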