PLink - Search Blog: 2007

Thursday, July 5, 2007

Web Design Made Easy

PLink - Search Engine, Blogging, Content Management, and more.

PLink API

PLink Blogs can now be implemented on your website with three lines of code, using the new PLink API. This means portions of your website can be created and edited as the sections of a PLink document, with all of the fancy-pants editing features that go along with it, including Textile markup support, tables, images, and bibliography entries. Edit your blog on plink-search.com and the sections of your site that implement the API are updated instantly. There is no need to sift through HTML to find what you want to update, and PLink documents can be edited from any computer (with your PLink login and password). Sections of a PLink document integrate seamlessly with a new or already established site, as in this example of a PLink document imported into Marie Alighieri’s website.

Example on mariealighieri.com

Using the PLink API, your PLink Blog is imported one section at a time. The library is currently available in PHP. Java™ and Ruby versions will soon follow. An example of the PHP library follows:


  <?php
  // Assumes the PLink API for PHP library (see Download) has already been included.
  plinkInclude("username","password","Section Title");
  ?>

Where username and password are your PLink login information, and Section Title is the name of one of the sections in your PLink Blog.

You can use style sheets to modify the style of the imported section. For your convenience, images and tables are given the div class ‘figure’, and the link to PLink at the bottom of the incoming document is wrapped in the div class ‘plink’.

Enjoy the update,

Ben.

Download
PLink API For PHP
PLink API For Ruby

Notes on Ruby

To use the PLink API with your Ruby on Rails application, add the file plinkapi.rb to the lib/ directory of your application. To use the API within the application, add the following to your application controller or the controller of your choice:

require "plinkapi"


class ApplicationController < ActionController::Base

To put the content of a PLink section onto your page, add the following line to the views you want the content to appear in:

<%= plinkInclude( "username", "password", "section" ) %>

Friday, June 22, 2007

DolphinNet™ is an easy to use server and distributed processing library for Java™. Like RMI, DolphinNet™ uses objects to communicate between a client and server. DolphinNet™, however, approaches this task using object serialization over a standard TCP/IP connection.

DolphinNet™ was designed by me for a fourth-year research course and was originally intended to facilitate distributed processing tasks (document clustering, to be exact). It has since been put to use in several different ways:

A chat server/statistics collection system for a marketing research study.
The back-end for an online applet based poker game.
The client server system for the partially complete Vulgate MMORPG.
A P2P web-browser and HTML renderer.
In conjunction with Tomcat, it provides PLink’s search functionality.
- Including the website indexing, which is done by a client.

DolphinNet™ has been used by me and my peers for several years now, but I have never made it public. I’ve recently decided to make it available under the GPL, and feel that it could be a useful tool for any Java™ programmers out there who are looking to add some great OOP client/server software to their programming arsenal.

DolphinNet Main Page

PLink Textile

Several people have been asking for the ability to perform richer editing on the text in paragraph elements. Don’t worry, I’m always open to new ideas. Rather than make a proprietary PLink markup, I have implemented the markup used by Dean Allen’s Textile™. Textile is easy to use, and quite powerful. That said, I wasn’t satisfied with the Java™ Textile libraries that already existed, so I have been busy programming my own parser for its specification.

Which brings us to PLextile™, a complete Java™ library for parsing Dean Allen’s Textile, implemented by the PLink team. Unlike other existing Java™ implementations of Textile, PLextile™ supports all of its major features:

PLink Textile

iPlink for iGoogle

Just in time for iChristmas, we at PLink have thrown together a simple little widget that lets you post notes to any section of your PLink documents from your iGoogle homepage. These notes are inserted as a paragraph element, and are quite useful for roughing out your blogs/documents/manifestos from a remote location – potentially whilst looking up stuff on Google. You can of course use PLextile Textile markup in these posts.

iPlink for iGoogle

A Techy Blog for the Techy Masses

What started as a CIS project for creating technical documents has – over a span of two years – become PLink. Several months ago a corporeal voice said to me – it might have been my own – “Ben, why don’t you apply some of the research you’ve been doing, and make a search engine?” The idea (essentially) was to make a search engine that relied on voting to rank search results.

To make a long story short, during the process of making this search engine, I realized that PLink would be a good venue to dust off some old code I had been working on for creating online technical documents – this became somewhat of an obsession. I began outlining the sorts of features that I would enjoy in blogging software, with an emphasis on the ability to create technical documents online....

A Techy Blog for the Techy Masses

Saturday, May 12, 2007

PLink Online Document Creator

Those who have read my previous posts know that I've recently been working on a search engine project. I had the idea recently to re-purpose content management software I made for a university project, to provide users of PLink with a way to create online documents. The software was originally made for creating technical documents (for CIS courses), and is ideal for similar documents such as C.V.s. The documents themselves and individual sections from the documents can be saved as PDFs, making it a fairly useful tool. I'll continue to keep people posted with new updates regarding this ongoing project, located at:

http://www.plink-search.com

Ben.

Sunday, May 6, 2007

PLink - The People's Search Engine

Using some of the libraries I was playing with around the end of the semester, I've put together a DIGG-style web application.

http://www.plink-search.com

Note: Search results aren't spectacular yet, but this will change as more sites are added and more users vote.

It's meant to be used as a general-purpose search engine (think Google), rather than as a news aggregator like DIGG (the entries aren't ordered by date, or deleted after a period of time). It's hoped that a user-based voting system can help promote good search results, hopefully better than methods such as the link counting that Google uses.

Regardless of whether enough people start using PLink to give it a true test (who knows if my server can even handle that), it was a fun experience making an application like this. It brought together a lot of stuff I’ve done in the past. If you’re curious, I think I’ll go into a bit more detail – well, regardless of whether you’re interested, actually.

Search Engine 101:

Indexer:

I used the same networking libraries that I used in my previous project, Coezilla. This allows multiple clients to index websites and report them back to the central server. This is neat, but turned out to be somewhat redundant given the quality of the computers I’m running it on: the bottleneck ended up being writing to MySQL and to the disk. Still, this approach is good because it lets you move some processing away from the main server computer (which, if you're making a search engine like me, is already pretty bogged down).
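The division of labour described here can be sketched in a single process, with threads standing in for the remote indexing clients and a queue standing in for the link back to the central server. This is only an illustration in Ruby, not the Java networking library the post refers to; `index_page`, the worker count, and the sample pages are all hypothetical.

```ruby
# Threads stand in for remote indexing clients; Queues stand in for the network.
# index_page is a hypothetical placeholder for fetching and analysing one page.
def index_page(url, body)
  { url: url, word_count: body.split.size }
end

def run_indexers(pages, worker_count: 3)
  jobs = Queue.new
  results = Queue.new
  pages.each { |url, body| jobs << [url, body] }
  worker_count.times { jobs << :done } # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      while (job = jobs.pop) != :done
        results << index_page(*job) # report back to the "server"
      end
    end
  end
  workers.each(&:join)

  # The central server is the single writer: it drains results one at a time,
  # which is where a database or disk bottleneck would show up.
  stored = []
  stored << results.pop until results.empty?
  stored.sort_by { |record| record[:url] }
end

pages = {
  "http://example.com/a" => "cats and dogs",
  "http://example.com/b" => "search engines rank pages by votes"
}
run_indexers(pages).each { |record| puts "#{record[:url]}: #{record[:word_count]} words" }
```

Adding workers speeds up the per-page analysis, but every result still funnels through the one writer at the end, which mirrors the bottleneck the post mentions.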

Parser:

This was one of the harder things to get off the ground. HTML is not guaranteed to be well formed, and to apply all the transformations I wanted to the data, I needed (preferably) to get everything parsed into a Document Object Model (DOM). I ended up using the CyberNeko HTML Parser in conjunction with the Xerces XML parser. This worked great. Once the page is in a DOM, you just traverse the elements of the HTML document like a tree.
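To make the "traverse it like a tree" step concrete, here is a minimal Ruby sketch of a depth-first walk that gathers text from a parsed page. It does not use CyberNeko or Xerces; the hash-based node shape and the `element` and `collect_text` helpers are hypothetical stand-ins for whatever tree a real parser hands back.

```ruby
# Each node is a plain Hash: a tag name, optional text, and child nodes.
# This shape is hypothetical; a real HTML parser would build the tree for you.
def element(tag, text = nil, children = [])
  { tag: tag, text: text, children: children }
end

# Depth-first walk that gathers visible text and skips script/style subtrees.
def collect_text(node, skip = %w[script style])
  return [] if skip.include?(node[:tag])

  own = node[:text] ? [node[:text]] : []
  own + node[:children].flat_map { |child| collect_text(child, skip) }
end

page = element("html", nil, [
  element("head", nil, [element("title", "PLink"), element("script", "var x = 1;")]),
  element("body", nil, [element("h1", "Search"), element("p", "Vote on results")])
])

puts collect_text(page).join(" ") # prints "PLink Search Vote on results"
```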

NLP:

Natural language processing is a complex field, but I’ll give you a two-minute primer.

I didn’t want to get too complex, preferring to let voting determine search results. I started by collecting all the text data in the page I was indexing into one big chunk. Having done this, I extracted data one word at a time and applied the following transformations to it.

Stop Word Removal: Lots of words in the English language are really common and don’t provide much information about a web-page (the, at, him, etc.), so it’s alright to just remove them. You do this by checking each word against a big list of such words.
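A minimal Ruby sketch of that filter follows. The stop list here is a deliberately tiny, made-up sample and the tokenizer is a simple assumption (lowercase runs of letters and apostrophes); neither comes from the post.

```ruby
require "set"

# A deliberately tiny stop list; real lists run to hundreds of words.
STOP_WORDS = Set.new(%w[the at him a an and of to in is it])

# Lowercase the text, split it into words, and drop anything on the stop list.
def remove_stop_words(text)
  text.downcase.scan(/[a-z']+/).reject { |word| STOP_WORDS.include?(word) }
end

p remove_stop_words("The cat sat at the edge of the mat")
# => ["cat", "sat", "edge", "mat"]
```

Holding the list in a Set keeps each membership check cheap, which matters when every word of every page passes through it.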

Stemming: Lots of different words have a root in common, e.g., cat, cats, cat’s. You can apply a stemming algorithm to extract this common root, rather than treating them as different words. Eliminating as many terms as possible removes a lot of the complexity in a search engine. I used a stemmer from Snowball, Martin Porter’s framework for stemming algorithms, which includes his original Porter stemmer.
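To show the idea only, here is a toy Ruby stemmer that handles the cat, cats, cat’s example. It is emphatically not the Porter algorithm or anything from Snowball: `naive_stem` is a hypothetical helper that strips a possessive and a plural "s" and will mangle plenty of real words (it turns "bus" into "bu", for instance).

```ruby
# Toy stemmer: strips a trailing possessive, then a plural "s".
# This is NOT the Porter algorithm; it only illustrates collapsing word variants.
def naive_stem(word)
  stem = word.downcase.sub(/'s?\z/, "") # cat's -> cat, cats' -> cats
  stem.sub(/(?<=[^s])s\z/, "")          # cats -> cat, but leave "class" alone
end

p %w[cat cats cat's].map { |word| naive_stem(word) }.uniq
# => ["cat"]
```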

While applying these two steps, I collected the words into a big alphabetical list, keeping track of the frequency of each word. This list is used to determine the importance of each word to the given web-page.
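The bookkeeping in this step might look something like the following Ruby sketch, assuming the terms passed in have already been stop-filtered and stemmed. The `term_frequencies` name and the sample terms are illustrative only.

```ruby
# Count how often each term occurs and return the counts in alphabetical order.
def term_frequencies(terms)
  counts = terms.each_with_object(Hash.new(0)) { |term, tally| tally[term] += 1 }
  counts.sort.to_h
end

p term_frequencies(%w[search cat vote cat search cat])
# => {"cat"=>3, "search"=>2, "vote"=>1}
```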

Applying a Distribution:

Web-pages carry widely varying amounts of information. I applied the Poisson distribution to the frequencies in the alphabetical list of words. This helps ensure that we can compare a ten-thousand-word website to a one-hundred-word website.

Once all these steps are applied to a given web-page, it is shipped off to the central server, where it is written to disk… using an,

Inverse-Index:

If you give me a big ordered list of words and point me to the middle, I know I can eliminate either the upper or lower half of the list depending on the word I’m searching for. Repeating this halving is extremely efficient, and is used to drive searching. Rather than maintain a big list of web-pages (like a phone book), I maintain a big list of words (like a dictionary), a structure more commonly called an inverted index. If you search for a given word, I retrieve a list of all the pages that have this word in them, ordered by each site’s importance to the given word. Using this efficient data structure, millions of words can be searched almost instantly.
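Here is a small Ruby sketch of both ideas: a word-to-pages index whose page lists are sorted by score, and a binary search over the sorted vocabulary. The page names, scores, and helper names are invented for illustration, and the on-disk layout the post describes is not modelled. In Ruby a plain Hash lookup would do the job by itself; the binary search is there to mirror the halving described above.

```ruby
# Build word => [[page, score], ...] with each list sorted by score, highest first.
# The pages and scores used below are made up for illustration.
def build_inverted_index(page_scores)
  index = Hash.new { |hash, word| hash[word] = [] }
  page_scores.each do |page, scores|
    scores.each { |word, score| index[word] << [page, score] }
  end
  index.each_value { |postings| postings.sort_by! { |_page, score| -score } }
  index
end

# Binary search over the sorted vocabulary, halving the candidates at each step.
def lookup(index, sorted_words, term)
  found = sorted_words.bsearch { |word| word >= term }
  found == term ? index[term].map(&:first) : []
end

scores = {
  "a.example" => { "cat" => 0.2, "search" => 0.9 },
  "b.example" => { "cat" => 0.7 },
  "c.example" => { "search" => 0.4, "vote" => 0.5 }
}
index = build_inverted_index(scores)
words = index.keys.sort

p lookup(index, words, "cat") # => ["b.example", "a.example"]
p lookup(index, words, "dog") # => []
```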

Voting:

When you vote with PLink, you shift sites’ rankings around for given words in the lists I mentioned previously. It is hoped that, with enough people doing this, it will lead to good search results.
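One way such a vote could work is sketched below in Ruby: nudge a page's score for one word, then re-sort that word's list. The post does not say how PLink actually weighs votes, so the additive rule, the step size, and the `vote` helper are all assumptions.

```ruby
# postings: { word => [[page, score], ...] }, kept sorted by score, highest first.
# The step size and the additive scoring rule are assumptions for illustration.
def vote(postings, word, page, direction, step: 0.1)
  entry = postings.fetch(word).find { |candidate, _score| candidate == page }
  raise ArgumentError, "#{page} is not listed for #{word}" unless entry

  entry[1] += direction * step
  postings[word].sort_by! { |_page, score| -score }
  postings[word].map(&:first)
end

postings = { "cat" => [["b.example", 0.7], ["a.example", 0.65]] }
p vote(postings, "cat", "a.example", +1) # => ["a.example", "b.example"]
```

Because a vote touches only one word's list, a page can climb for "cat" without its ranking for any other word changing.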
Well, there you have it. Now go out and make your own search engine or, better yet, go sign up for a PLink account… and start voting.

http://www.plink-search.com
