Ideas for a Better Search Engine
A collection of ideas to improve search.
A search engine may be able to retrieve all pages relevant to a query, but the monumental task of filtering the results still remains. While ranking criteria like PageRank may help, such criteria can be gamed by Search Engine Optimization (SEO) companies. As a result, when searching for a popular topic, the first dozen or so pages of results tend to be populated only by companies that are either extremely popular or have performed such "optimizations". This replaces the thrill of discovering a new website with the monotony of dredging the same sites for new content. Exploring the staggering amount of content on the web can be made more manageable and equitable by giving users the ability to define their own filters more easily.
One approach is to let users maintain a blacklist of domains that they do not want to see in their results. Similar to how software such as Adblock Plus and NoScript works, this would let users eliminate results that they would otherwise skip over. While this can already be done by specifying negative search criteria with the - symbol, applying more than a few rules this way would be cumbersome and difficult to maintain. Users could also keep different sets of rules depending on what they are looking for, and could use rulesets maintained by others.
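To illustrate why the - symbol does not scale, here is a minimal sketch that turns a blacklist into exclusion operators, assuming an engine that accepts Google-style `-site:` terms. The function name `build_exclusion_query` is hypothetical.

```python
def build_exclusion_query(terms: str, blocked_domains: list[str]) -> str:
    """Append one "-site:" exclusion per blacklisted domain to the search terms.

    Assumes a Google-style query syntax; every rule lengthens the query.
    """
    exclusions = [f"-site:{domain}" for domain in blocked_domains]
    return " ".join([terms] + exclusions)


query = build_exclusion_query("album review", ["rollingstone.com", "pitchfork.com", "nme.com"])
# query == "album review -site:rollingstone.com -site:pitchfork.com -site:nme.com"
```

The query grows by one operator per rule and has to be retyped or pasted for every search, which is the maintenance burden that a stored, reusable ruleset would avoid.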
For example, if someone is looking for independent reviews of an album by authors other than magazine writers, they can set up a blacklist that filters out sites such as Rolling Stone, Pitchfork, and NME. This can be layered on top of another filter that removes sites where the album can be purchased (which may have reviews in the comments).
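Layering rulesets like this can be sketched as client-side filtering of result URLs. The sketch assumes a hypothetical plain-text ruleset format (one domain per line, `#` starts a comment) so that lists can be shared and reused; the function names are illustrative, and `amazon.com` merely stands in for any store. A domain rule also matches its subdomains, so `rollingstone.com` covers `www.rollingstone.com` but not `notrollingstone.com`.

```python
from urllib.parse import urlsplit


def parse_ruleset(text: str) -> set[str]:
    """Parse a hypothetical shared ruleset: one domain per line, '#' starts a comment."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def apply_blacklists(urls: list[str], rulesets: list[set[str]]) -> list[str]:
    """Drop every URL whose host is matched by any domain in any of the layered rulesets."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        if not any(matches(host, domain) for rules in rulesets for domain in rules):
            kept.append(url)
    return kept


# Two independent rulesets, layered for the album-review search.
music_press = parse_ruleset("""
# Magazine reviews
rollingstone.com
pitchfork.com
nme.com
""")
stores = parse_ruleset("""
# Places where the album can be bought
amazon.com
""")
```

Applying `apply_blacklists(results, [music_press, stores])` keeps only results that survive every layer, and either ruleset can be swapped out without touching the other.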
Of course, the same approach could also be used to whitelist sites.
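A whitelist can be sketched the same way, with the difference that a host must match an allowed domain to survive at all. In this minimal sketch (function names hypothetical) the two lists are combined and blacklist rules take precedence, so specific subdomains can be carved out of a broadly whitelisted domain.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def host_matches(host: str, domains: Iterable[str]) -> bool:
    """True if host equals, or is a subdomain of, any of the given domains."""
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_results(urls: list[str],
                   allow: Optional[set[str]] = None,
                   block: frozenset[str] = frozenset()) -> list[str]:
    """Keep URLs on whitelisted domains (if a whitelist is given) that are not blacklisted."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if allow is not None and not host_matches(host, allow):
            continue  # whitelist mode: anything not explicitly allowed is dropped
        if host_matches(host, block):
            continue  # blacklist rules override the whitelist
        kept.append(url)
    return kept
```

For example, allowing `wikipedia.org` while blocking `simple.wikipedia.org` keeps `en.wikipedia.org` results and drops everything else.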
Similar to the categorization of works of art, sites may be classified by genre. A rudimentary way to do this today is to use filters such as "site:*.edu" to match sites that belong to universities. This can be refined further by trying to determine which general category a site or page falls into. Applicable genres may include:
- Small Vendor Marketplace
- Discussion Forum
- News Aggregator
- Wiki and Reference
- Open Source Software Project
- File Repository
- Personal Blog
- Corporate Blog
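Before any learning is involved, a crude genre guess can be made from the URL alone, in the same spirit as "site:*.edu". The sketch below encodes the genres listed above as an enum and checks a few hand-written hints. The hints are hypothetical and brittle (a blog can live on any domain, and substring matching can misfire), which is exactly why a learned classifier is attractive.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit


class Genre(Enum):
    SMALL_VENDOR_MARKETPLACE = "Small Vendor Marketplace"
    DISCUSSION_FORUM = "Discussion Forum"
    NEWS_AGGREGATOR = "News Aggregator"
    WIKI_AND_REFERENCE = "Wiki and Reference"
    OPEN_SOURCE_PROJECT = "Open Source Software Project"
    FILE_REPOSITORY = "File Repository"
    PERSONAL_BLOG = "Personal Blog"
    CORPORATE_BLOG = "Corporate Blog"


# Hypothetical hand-written hints: (substring of host + path, genre).
# Checked in order; the first hint found in the URL wins.
URL_HINTS = [
    ("github.com", Genre.OPEN_SOURCE_PROJECT),
    ("sourceforge.net", Genre.OPEN_SOURCE_PROJECT),
    ("wiki", Genre.WIKI_AND_REFERENCE),
    ("forum", Genre.DISCUSSION_FORUM),
    ("etsy.com", Genre.SMALL_VENDOR_MARKETPLACE),
    ("news.ycombinator.com", Genre.NEWS_AGGREGATOR),
]


def guess_genre(url: str) -> Optional[Genre]:
    """Return the genre of the first matching hint, or None if no hint applies."""
    parts = urlsplit(url)
    text = (parts.hostname or "") + parts.path.lower()
    for hint, genre in URL_HINTS:
        if hint in text:
            return genre
    return None
```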
Such a categorization may be best implemented with machine learning: a set of pages is categorized by hand, and this set is then used to train an algorithm that classifies pages by genre.
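One concrete instance of this idea is a multinomial naive Bayes text classifier, sketched below under the assumption that a page's visible text is its only feature. Naive Bayes is swapped in here only because it is short to write out; the class name and the tiny hand-labeled training set are made up for illustration, and a practical system would likely need many more pages and richer features (links, markup, URL structure).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesGenreClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, pages: list[tuple[str, str]]) -> "NaiveBayesGenreClassifier":
        """Train on (page_text, genre) pairs that were labeled by hand."""
        self.genre_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, genre in pages:
            words = tokenize(text)
            self.genre_counts[genre] += 1
            self.word_counts[genre].update(words)
            self.vocab.update(words)
        self.total_pages = sum(self.genre_counts.values())
        return self

    def predict(self, text: str) -> str:
        """Return the genre with the highest log posterior for the page text."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        best_genre, best_score = None, -math.inf
        for genre, count in self.genre_counts.items():
            score = math.log(count / self.total_pages)  # log prior
            denom = sum(self.word_counts[genre].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[genre][w] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# A made-up, hand-labeled training set.
LABELED_PAGES = [
    ("reply to thread posted by member quote reply", "Discussion Forum"),
    ("join the discussion thread replies members login", "Discussion Forum"),
    ("add to cart handmade price shipping shop", "Small Vendor Marketplace"),
    ("handmade shop price buy shipping cart", "Small Vendor Marketplace"),
    ("clone the repository source code license pull request", "Open Source Software Project"),
    ("build from source issues release license contributors", "Open Source Software Project"),
]
classifier = NaiveBayesGenreClassifier().fit(LABELED_PAGES)
```

Once trained, `classifier.predict(page_text)` assigns a genre to an unseen page, which could then be used as a filter alongside domain rulesets.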
Genres would also be orthogonal to subjects found in web directories. While the subject of a website may be "Art", applicable genres may be "Blog", "Gallery", and "Store".
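Because the two classifications are independent, each site can carry one subject and any number of genres, and a search can constrain either axis alone or both together. A minimal sketch, with hypothetical `example` domains and field names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    domain: str
    subject: str              # web-directory subject, e.g. "Art"
    genres: frozenset[str]    # independent axis, e.g. {"Blog", "Gallery"}


def select(sites: list[Site], subject: Optional[str] = None,
           genre: Optional[str] = None) -> list[Site]:
    """Filter on subject and genre independently; None means 'any'."""
    return [s for s in sites
            if (subject is None or s.subject == subject)
            and (genre is None or genre in s.genres)]


SITES = [
    Site("painter.example.org", "Art", frozenset({"Blog", "Gallery"})),
    Site("prints.example.com", "Art", frozenset({"Store"})),
    Site("recipes.example.net", "Cooking", frozenset({"Blog"})),
]
```

Here `select(SITES, subject="Art", genre="Blog")` narrows to the painter's site, while `select(SITES, genre="Blog")` finds blogs across all subjects.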
first published: 2015-03-17 0041 EDT
last updated: 2015-03-17 0041 EDT