$VAR1 = {
'blog' => [
bless( {
'file' => '/home/karpet/blog/projects/swish.pod',
'format' => 'pod',
'id' => 'swish.pod',
'mtime' => 1159586132,
'name' => 'swish',
'text' => '=head1 The State of Swish3
I\'ve been working on the next version of Swish-e (codename: Swish3) for about a year now.
Squirreling away hours in the late evening after the kid is asleep, with one ear on the TV
my wife is watching in the other room, and eyes on the screen in here. I\'ve been learning
C, UTF-8 and Perl\'s XS glue language. It\'s been a very stretching year.
This little corner of the blog will record my progress.
',
'url' => 'projects/swish'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/progress3.txt',
'format' => 'txt',
'id' => 'progress3.txt',
'mtime' => 1208662934,
'name' => 'progress3',
'text' => 'Swish3 Status 19 April 2008
There\'s been quite a bit of activity in the last month.
<ul>
<li>The C++ Xapian example now can search as well as index, and there are Perl
equivalents using Search::Xapian checked into svn as well. The C++ code
will read/write the swish.xml header; the Perl does not (yet).
</li>
<li>
The meta/prop id unique check now uses a hash for quick look up.
</li>
<li>
The test suite for libswish3 is totally restructured. Now using Perl\'s
Test::Harness and added a slew of new meta/prop tests. Alongside that
were additions to the NamedBuffer debugging output to print each
substring in the buffer.
</li>
<li>
Several new string-related utility functions for converting ints to strings
and back. Also a new config hash for configuration options that use a StringList
instead of a simple string.
</li>
<li>Fixed some mem leaks in the example .c programs and added more info to the
swish_lint usage() output (including reminders about the various SWISH_DEBUG*
env var values).
</li>
</ul>
There are still several parser features yet to be implemented to support the Swish-e
2.4 config options, but those will likely take a backseat to getting a working
swish3 Perl script running with SWISH::Prog and SWISH::Prog::Xapian.
',
'url' => 'projects/swish/progress3'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/api_docs/swish_intro.7.pod',
'format' => 'pod',
'id' => 'swish_intro.7.pod',
'mtime' => 1208317559,
'name' => 'swish_intro.7',
'text' => '=pod
=head1 Introduction to Swish3
Swish is the Simple Web Indexing System for Humans. Swish
is an information retrieval tool. It is B<not> a search engine, but
can be used as an integral part of creating a search engine. Swish gathers,
parses, indexes and searches document collections. A collection can be any
set of real or virtual documents: web pages, database rows, PDFs or office
files, or anything else that can be converted to text.
Swish3 is version three of Swish.
Kevin Hughes wrote the original version in 1994. In 2000, the project was
updated and released as Swish-e version 2 (the -e is for Enhanced). Swish3
is the third phase in the evolution of the project.
In this document, the name C<Swish> will refer to the entire project,
without regard to a particular version. C<Swish-e> will refer specifically
to version 2.x. C<Swish3> will refer specifically to version three.
=head2 Anatomy of a Search Tool Chain
The following description could apply to any search system or information
retrieval project, not just Swish. First we\'ll look at the various
parts of the system, then look at how they are implemented in Swish3.
Every search system implements the following chain of features:
=over
=item aggregator
An aggregator assembles documents into a collection. It can be as simple
as a filesystem tool like the Unix B<find> command or as sophisticated as a
web crawler. An aggregator selects documents based on various criteria:
content, MIME type or format, date, author, URL, or any other criteria
that you desire.
=item normalizer
A normalizer verifies that all documents the aggregator collects are in a format
that the analyzer can parse. For example, a binary file format like PDF is
converted to HTML or unmarked text. The same is true for all office file formats,
PostScript, etc.
=item analyzer
An analyzer examines the text supplied via the aggregator/normalizer steps.
The analyzer does several things, some of them optional:
=over
=item parsing
Separates text from any surrounding markup, optionally
remembering the context (tag) in which text was found.
=item case folding
Changes the text to all lowercase or all uppercase, to make comparisons
easier.
=item tokenizing
Splitting a string of text into tokens or words.
=item stemming
Using one of a variety of word-stemming algorithms, tries to discover the root
C<stem> of each word.
=item customization
Many advanced analyzers offer some level of customization to apply at some
point in the analysis, whether it be synonym matching or other linguistic
logic.
=back
=item indexer
An indexer stores basic document metadata and token (word) information
in an index for fast and efficient retrieval.
=item searcher
A searcher parses a user query using the same logic used by the analyzer
when processing the original document collection,
applies some well-defined rules for matching documents in the index,
and then returns results, typically a list or iterator of matching documents.
=back
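The case folding and tokenizing steps above can be sketched in plain C.
This is only an illustration and not how libswish3 itself tokenizes: it handles
ASCII only, whereas libswish3 works with UTF-8, and the function name
C<fold_and_tokenize> is invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tokens are written to the output buffer separated by one space. */
static const char separator[] = " ";

/*
 * Hypothetical helper (not part of libswish3).
 * Folds ASCII text to lowercase and splits it on every byte that is not
 * a letter or digit. The tokens are written to out, separated by single
 * spaces and NUL-terminated. Output that does not fit is silently dropped.
 * Returns the number of tokens found.
 */
unsigned int
fold_and_tokenize(const char *text, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)text;
    size_t used = 0;            /* bytes written to out so far */
    unsigned int ntokens = 0;
    int in_token = 0;

    assert(text != NULL && out != NULL && outsize > 0);

    for (; *p != 0; p++) {
        if (!isalnum(*p)) {     /* any other byte ends the current token */
            in_token = 0;
            continue;
        }
        if (!in_token) {        /* first byte of a new token */
            if (ntokens > 0 && used + 1 < outsize)
                out[used++] = separator[0];
            ntokens++;
            in_token = 1;
        }
        if (used + 1 < outsize)
            out[used++] = (char)tolower(*p);    /* case folding */
    }
    out[used] = 0;
    return ntokens;
}
```

Given the input C<The Quick, brown FOX!> the sketch finds four tokens. A real
analyzer would also track word positions and byte offsets, and might apply
stemming after this step.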
Now let\'s look at how Swish3 implements these five features.
=head2 A Library, Not a Command
The first thing to know about Swish3 is that, unlike previous versions of
Swish, there is not a single Swish3 implementation.
That might sound confusing at first, because it is a significant
departure from earlier versions of Swish, where there was a primary
program, written in C, which handled all five links in the search chain.
Swish3 takes a different approach.
Swish3 is primarily a C library called B<libswish3>. The library has a
well-defined list of public functions and data structures that aim
to fill a particular void in the world of information retrieval tools:
analyzing HTML and XML documents.
Swish3 takes as its starting point the B<-S prog> feature of Swish-e,
where you can define your own aggregator/normalizer program, and makes that
Swish3\'s central feature. Swish3 extends the B<-S prog> API to include
additional header values, and adds the same MIME-type-matching feature
as the popular Apache web server.
Swish3 has no native indexer or searcher features [TODO: this might change
if the 2.6 BDB backend is ported]. Nor does it have any aggregator or normalizer
features. Swish3 is primarily an analyzer.
The Swish3 distribution does come with some examples of how to write Swish3
applications, including an example program for using the popular Xapian
library. And there is a Perl implementation based on the SWISH::Prog package.
=head2 So How Does It Work?
libswish3 defines hooks or callbacks where you can override the default
behaviour of the analyzer. These hooks are intended for making it easy to
plug libswish3 into the indexing chain.
Here\'s one example. If you wanted to index a web site, you might use an
aggregator/normalizer tool like Swish-e\'s B<spider.pl>. spider.pl will print its
output on stdout.
% spider.pl your_config > spider_output
Then you could use a program like B<swish_xapian> to analyze and index the
output:
% swish_xapian -c swish.conf - < spider_output
If you look at the source for the B<swish_xapian> program, in
the libswish3 distribution, you will see that there is a B<handler> function
defined that takes the output of the libswish3 parsing function and
adds it to a Xapian index.
=head2 See Also
This document provides an overview of Swish3\'s anatomy. You might also be
interested in these docs:
=over
=item
L<Migrating from Swish-e to Swish3|swish_migration.7>
=item
L<Perl implementation of Swish3|SWISH::Prog>
=item
L<libswish3 API|libswish3.3>
=back
',
'url' => 'projects/swish/api_docs/swish_intro.7'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/api_docs/swish_migration.7.pod',
'format' => 'pod',
'id' => 'swish_migration.7.pod',
'mtime' => 1208317558,
'name' => 'swish_migration.7',
'text' => '=pod
=head1 Migrating from Swish-e to Swish3
If you haven\'t already, read the L<Introduction to Swish3|swish_intro.7>
document first.
This document is intended for users already familiar with Swish-e
version 2.x who want to migrate to using Swish3.
=head2 The Tool Chain
Swish3 is intended to be one part of a search system tool chain.
In this section we will look at how Swish-e implements each of the tool
chain features, and then compare it to Swish3.
=head3 Aggregator
Swish-e has two built-in aggregators, for filesystem and web,
indicated with the B<-S> flag to the B<swish-e> command. Swish-e also
has a third B<-S> option called B<prog>, which is short for C<program>.
The C<program> is an aggregator that you define. Swish-e ships with several
example aggregators, including a filesystem crawler called B<DirTree.pl>
and a web crawler called B<spider.pl>. There are also example aggregators
for pulling data from a database and for specific kinds of documents, like
Hypermail mail archives.
Swish3 has no built-in aggregators. Instead, Swish3 takes the B<-S prog> approach
of defining an API for external aggregators to follow.
=head3 Normalizer
Swish-e has a feature called B<FileFilter> which allows you to define an external
program to call if a document\'s name matches a particular pattern. The
file is handed to the external program and the output of the external program
is treated as the contents of the document. For example, you can specify
that all documents that end with C<.pdf> are first filtered through
the B<pdftotext> command.
Swish-e also comes with a set of Perl modules bundled together as
B<SWISH::Filter>. SWISH::Filter is used by the external aggregators like
B<DirTree.pl> and B<spider.pl>, thus making those programs both aggregators
and normalizers.
Swish3 has no built-in normalizer or feature like B<FileFilter>. Instead,
Swish3 assumes that something like SWISH::Filter will be used to standardize
documents before they are handed to Swish3.
=head2 Configuration
One of the biggest changes is the configuration file format. Swish3 uses
XML-style configuration files, and supports a subset of the configuration
options available in Swish-e.
This section documents the configuration options supported in Swish3.
=head2 See Also
=over
=item
L<Introduction to Swish3|swish_intro.7>
=item
L<Perl implementation of Swish3|SWISH::Prog>
=item
L<libswish3 API|libswish3.3>
=back
',
'url' => 'projects/swish/api_docs/swish_migration.7'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/progress2.txt',
'format' => 'txt',
'id' => 'progress2.txt',
'mtime' => 1206932628,
'name' => 'progress2',
'text' => 'Swish3 Status 30 March 2008
More progress with Swish3.
<ul>
<li>
There is now a swish_xapian.cpp C++ example for using
libswish3 with a Xapian backend. All that is complete is the indexing
portion; still TODO is the search part. Still, it is significant that
it was so easy to build a search engine.
</li>
<li>
The swish.xml header format is complete and can now read/write the header file.
Need to add that part to the swish_xapian.cpp example.
</li>
<li>
Squashed some long-standing memory leaks when using the filehandle functions.
</li>
</ul>
Little by little.
',
'url' => 'projects/swish/progress2'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/progress.txt',
'format' => 'txt',
'id' => 'progress.txt',
'mtime' => 1205637794,
'name' => 'progress',
'text' => 'Swish3 Status 2008-03-15
I\'ve finally gotten back to Swish3 development after several months away. Hard to believe
I\'ve been working on this project for something close to 3 years now.
Lately I have been focusing on the following things:
<dl>
<dt>Header file format</dt>
<dd>
Because Swish3 will have multiple IR backends, it is important that there
be a consistent index metadata file that describes the MetaNames, Properties,
and tokenizing information, just like the Swish2 header does. Just as with the config
file format, it makes sense to define the header file format as XML, since we
already have a robust XML parser for free. To make it simple, I have defined
the header file XML schema to be the same as the config file schema. In short,
you configure Swish3 by creating a header file. The "real" header file will
be more strict about explicitly naming all the expected attribute values,
numbering the MetaName/Property ids, etc. But the idea is simple: a single
XML schema.
I have written the code to read header/config file format and create a
swish_Config object. There\'s also code for merging 2 swish_Config objects together,
so that you can define a config file to override an existing header file.
Still TODO is the code for writing the header file.
The Perl bindings have been updated to reflect the new swish_Config API. This
required a great deal of reworking and thinking about the Perl API. I had to rewrite
things a few times to get a workable solution. The key Perl mantra is "objects on demand."
I.e., do not define any Perl objects that wrap C pointers and try to keep them on the XS
side. Instead, create all Perl objects "just in time" as part of the get_* method call.
This makes reference counting much simpler.
</dd>
<dt>MetaNames and PropertyNames</dt>
<dd>These now have their own C API with swish_MetaName and swish_Property structs.
These relate directly to the header file format and swish_Config. There will end up
being a separate PropertyName API for search results. I still think we\'re going to have
to port the Swish2 PropertyNames storage/retrieval code to Swish3 and have a backend-independent
index.prop file. The issue with this is going to be scaling. One other thought I\'ve had
is storing properties in a SQLite db. That route won\'t allow for presorted properties,
but does have the advantage of being much more transparent and de-buggable.
</dd>
<dt>SWISH::Prog</dt>
<dd>I have moved the SWISH::Prog svn tree to svn.swish-e.org from peknet.com. I also
moved SWISH::Filter (and likely will eventually move SWISH::API::More and its cousins).
SWISH::Prog will form the framework for the Perl implementation of Swish3. I know there
are some folks who don\'t like the idea of Swish3 being so Perl-centric. To that I can say only,
tough luck. :)
Seriously though, my perspective is that there will be multiple Swish3 implementations.
The one I am working on is in Perl using SWISH::Prog. There\'s nothing to stop someone from
implementing one in C or C++ or Java or whatever. libswish3 provides the parsing/tokenizing
piece missing from other IR projects, and it is a library for the very reason that implementing
a Swish3 program should be language-neutral. If you can link against a C library, then you can
write a Swish3 program. The header file API is well documented; the backend is supposed to be
pluggable. It\'s all about the API.
I do intend to write a swish_xapian.cpp program eventually, showing how to implement a C++
Swish3 program with Xapian. That could be the fallback program if you really don\'t want to use
Perl.
</dd>
<dt>Documentation</dt>
<dd>I\'ve started a swish_intro.7 and swish_migration.7 set of docs. swish_intro will outline the
aggregator/normalizer/analyzer/indexer/searcher philosophy and the outline of the libswish3
API. swish_migration will discuss differences in Swish2 vs Swish3 and how you can convert your
config files and move to using Swish3.
</dd>
</dl>
',
'url' => 'projects/swish/progress'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/swishprog2.pod',
'format' => 'pod',
'id' => 'swishprog2.pod',
'mtime' => 1197347386,
'name' => 'swishprog2',
'text' => '=head1 SWISH::Prog take 2
Spent the last week or 2 totally reworking SWISH::Prog. Reorganized the class layout
to mirror the aggregator/parser/indexer/searcher paradigm I described some time ago.
It has started to look a little like KinoSearch in that respect, with the addition
of the aggregators and parser (which is of course Swish-e\'s contribution to IR).
After mulling/experimenting for several days over how best to write the spider,
I have decided to use WWW::Mechanize along with WWW::Rules and write from scratch.
Then I\'ll provide backwards API compat for the Swish-e 2.4 spider.pl script
config files/callbacks/etc. This proved easier than a direct port, and allows
me to provide extensible caching/queueing/user_agent classes rather than
hardcoding everything in a single script/library. I toyed with WWW::CheckSite
but in order to make it work with the aggregator API required so many gymnastics
it finally became easier to just write the spider myself. And a good programming
exercise as well. :)
',
'url' => 'projects/swish/swishprog2'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/api_docs/libswish3.3.pod',
'format' => 'pod',
'id' => 'libswish3.3.pod',
'mtime' => 1195015232,
'name' => 'libswish3.3',
'text' => '=pod
=head1 NAME
libswish3 - Swish3 C library
=head1 SYNOPSIS
=head2 Data Structures
struct swish_StringList
{
int n;
xmlChar ** word;
};
struct swish_Config
{
int ref_cnt; /* for scripting languages */
void *stash; /* for scripting languages */
xmlHashTablePtr conf; /* the meat */
};
struct swish_ConfigValue
{
int ref_cnt;
unsigned int multi; /* indicates whether value is a string or hashref */
unsigned int equal; /* indicates whether key/value pairs are equal */
void *value; /* xmlHashTablePtr or xmlChar *str */
xmlChar *key; /* should be same as the key that points at this object */
};
struct swish_NamedBuffer
{
int ref_cnt; /* for scripting languages */
void *stash; /* for scripting languages */
xmlHashTablePtr hash; /* the meat */
};
struct swish_DocInfo
{
time_t mtime;
off_t size;
xmlChar * mime;
xmlChar * encoding;
xmlChar * uri;
unsigned int nwords;
xmlChar * ext;
xmlChar * parser;
xmlChar * update;
};
struct swish_Word
{
unsigned int position; // word position in doc
xmlChar *metaname; // immediate metaname
xmlChar *context; // metaname ancestry
xmlChar *word; // the word itself (NOTE stored as multibyte not wchar)
unsigned int start_offset; // start byte
unsigned int end_offset; // end byte
struct swish_Word *next; // pointer to next swish_Word
struct swish_Word *prev; // pointer to prev swish_Word
};
struct swish_WordList
{
swish_Word *head;
swish_Word *tail;
swish_Word *current; // for iterating
unsigned int nwords;
unsigned int ref_cnt; // for scripting languages
};
struct swish_Tag
{
xmlChar *name;
struct swish_Tag *next;
unsigned int n;
};
struct swish_TagStack
{
swish_Tag *head;
swish_Tag *temp;
unsigned int count;
char *name; // debugging aid -- name of the stack
xmlChar *flat; // all the stack item names as a string for convenience
};
struct swish_Analyzer
{
unsigned int maxwordlen; // max word length
unsigned int minwordlen; // min word length
unsigned int tokenize; // should we parse into WordList
swish_WordList* (*tokenizer) (swish_Analyzer*, xmlChar*, ...);
xmlChar* (*stemmer) (xmlChar*);
unsigned int lc; // should tokens be lowercased
void *stash; // for script bindings
void *regex; // optional regex
int ref_cnt; // for script bindings
};
struct swish_Parser
{
int ref_cnt; // for script bindings
swish_Config *config; // config object
swish_Analyzer *analyzer; // analyzer object
void (*handler)(swish_ParseData*); // handler reference
void *stash; // for script bindings
};
// TODO maybe store swish_Parser * here instead of separate config and analyzer
struct swish_ParseData
{
xmlBufferPtr buf_ptr; // tmp text (MetaName) buffer
xmlBufferPtr prop_buf; // tmp Property buffer
xmlChar *tag; // current tag name
swish_DocInfo *docinfo; // document-specific properties
swish_Config *config; // global config
unsigned int context_as_meta; // index tokens under all applicable MetaNames
unsigned int no_index; // toggle flag for special comments
unsigned int is_html; // shortcut flag for html parser
unsigned int bump_word; // boolean for moving word position/adding space
unsigned int word_pos; // word position in document
unsigned int offset; // current offset position
swish_TagStack *metastack; // stacks for tracking the tag => metaname
swish_TagStack *propstack; // stacks for tracking the tag => property
xmlParserCtxtPtr ctxt; // so we can free at end
swish_WordList *wordlist; // linked list of words
swish_NamedBuffer *properties; // buffer all properties
swish_NamedBuffer *metanames; // buffer all metanames
swish_Analyzer *analyzer; // Analyzer struct
void *stash; // for script bindings
};
=head2 Global Functions
void swish_init();
void swish_cleanup();
=head2 I/O Functions
xmlChar * swish_slurp_stdin( long flen );
xmlChar * swish_slurp_file_len( xmlChar *filename, long flen );
xmlChar * swish_slurp_file( xmlChar *filename );
=head2 Hash Functions
int swish_hash_add( xmlHashTablePtr hash, xmlChar *key, void * value );
int swish_hash_replace( xmlHashTablePtr hash, xmlChar *key, void *value );
int swish_hash_delete( xmlHashTablePtr hash, xmlChar *key );
xmlHashTablePtr swish_new_hash(int size);
=head2 Memory Functions
void swish_init_memory();
void * swish_xrealloc(void *ptr, size_t size);
void * swish_xmalloc( size_t size );
void swish_xfree( void *ptr );
void swish_mem_debug();
xmlChar * swish_xstrdup( const xmlChar * ptr );
xmlChar * swish_xstrndup( const xmlChar * ptr, int len );
=head2 Time Functions
double swish_time_elapsed(void);
double swish_time_cpu(void);
char * swish_print_time(double time);
char * swish_print_fine_time(double time);
=head2 Error Functions
void swish_set_error_handle( FILE *where );
void swish_fatal_err(char *msg,...);
void swish_warn_err(char *msg,...);
void swish_debug_msg(char *msg,...);
=head2 String Functions
void swish_verify_utf8_locale();
int swish_is_ascii( xmlChar *str );
int swish_utf8_chr_len( xmlChar *utf8 );
wchar_t * swish_locale_to_wchar(xmlChar * str);
xmlChar * swish_wchar_to_locale(wchar_t * str);
wchar_t * swish_wstr_tolower(wchar_t *s);
xmlChar * swish_str_tolower(xmlChar *s );
xmlChar * swish_str_skip_ws(xmlChar *s);
void swish_str_trim_ws(xmlChar *string);
void swish_debug_wchars( const wchar_t * widechars );
int swish_wchar_t_comp(const void *s1, const void *s2);
int swish_sort_wchar(wchar_t *s);
swish_StringList * swish_make_StringList(xmlChar * line);
swish_StringList * swish_init_StringList();
void swish_free_StringList(swish_StringList * sl);
=head2 Configuration Functions
swish_Config * swish_init_config();
swish_Config * swish_add_config( xmlChar * conf, swish_Config * config );
swish_Config * swish_parse_config( xmlChar * conf, swish_Config * config );
swish_Config * swish_parse_config_new( xmlChar * conf, swish_Config * config );
int swish_debug_config( swish_Config * config );
xmlHashTablePtr swish_subconfig_hash( swish_Config * config, xmlChar *key);
int swish_config_value_exists( swish_Config * config, xmlChar *key, xmlChar *value );
xmlChar * swish_get_config_value(swish_Config * config, xmlChar * key, xmlChar * value);
void swish_free_config(swish_Config * config);
swish_ConfigValue * swish_keys( swish_Config * config, ... );
swish_ConfigValue * swish_value( swish_Config * config, xmlChar * key, ... );
swish_ConfigValue * swish_init_ConfigValue();
void swish_free_ConfigValue( swish_ConfigValue * cv );
xmlHashTablePtr swish_mime_hash();
xmlChar * swish_get_mime_type( swish_Config * config, xmlChar * fileext );
xmlChar * swish_get_parser( swish_Config * config, xmlChar *mime );
=head2 Parser Functions
swish_Parser * swish_init_parser( swish_Config * config,
swish_Analyzer * analyzer,
void (*handler) (swish_ParseData *),
void *stash
);
void swish_free_parser( swish_Parser * parser );
int swish_parse_file( swish_Parser * parser,
xmlChar *filename,
void * stash );
int swish_parse_stdin( swish_Parser * parser,
void * stash );
int swish_parse_buffer( swish_Parser * parser,
xmlChar * buf,
void * stash );
=head2 Token Functions
void swish_init_words();
swish_WordList * swish_init_wordlist();
void swish_free_wordlist(swish_WordList * list);
swish_WordList * swish_tokenize( swish_Analyzer * analyzer, xmlChar * str, ... );
swish_WordList * swish_tokenize_utf8_string(
swish_Analyzer * analyzer,
xmlChar * str,
unsigned int offset,
unsigned int word_pos,
xmlChar * metaname,
xmlChar * context
);
swish_WordList * swish_tokenize_ascii_string(
swish_Analyzer * analyzer,
xmlChar * str,
unsigned int offset,
unsigned int word_pos,
xmlChar * metaname,
xmlChar * context
);
swish_WordList * swish_tokenize_regex(
swish_Analyzer * analyzer,
xmlChar * str,
unsigned int offset,
unsigned int word_pos,
xmlChar * metaname,
xmlChar * context
);
size_t swish_add_to_wordlist( swish_WordList * list,
xmlChar * word,
xmlChar * metaname,
xmlChar * context,
int word_pos,
int offset );
int swish_add_to_wordlist_len(
swish_WordList * list,
xmlChar * str,
int len,
xmlChar * metaname,
xmlChar * context,
int word_pos,
int offset );
void swish_debug_wordlist( swish_WordList * list );
=head2 Analyzer Functions
swish_Analyzer * swish_init_analyzer( swish_Config * config );
void swish_free_analyzer( swish_Analyzer * analyzer );
=head2 DocInfo Functions
swish_DocInfo * swish_init_docinfo();
void swish_free_docinfo( swish_DocInfo * ptr );
int swish_check_docinfo(swish_DocInfo * docinfo, swish_Config * config);
int swish_docinfo_from_filesystem( xmlChar *filename,
swish_DocInfo * i,
swish_ParseData *parse_data );
void swish_debug_docinfo( swish_DocInfo * docinfo );
xmlChar * swish_get_file_ext( xmlChar *url );
=head2 Buffer Functions
swish_NamedBuffer * swish_init_nb( swish_Config * config, xmlChar * configKey );
void swish_free_nb( swish_NamedBuffer * nb );
void swish_debug_nb( swish_NamedBuffer * nb, xmlChar * label );
void swish_add_buf_to_nb( swish_NamedBuffer *nb,
xmlChar * name,
xmlBufferPtr buf,
xmlChar * joiner,
int cleanwsp,
int autovivify);
void swish_add_str_to_nb( swish_NamedBuffer * nb,
xmlChar * name,
xmlChar * str,
unsigned int len,
xmlChar * joiner,
int cleanwsp,
int autovivify);
void swish_append_buffer( xmlBufferPtr buf, xmlChar * txt, int len );
=head1 DESCRIPTION
B<libswish3> is the core C library of B<Swish3>.
B<libswish3> uses the GNOME L<Libxml2|http://xmlsoft.org/> library to parse words and metadata
from XML, HTML and plain text files. B<libswish3> supports full UTF-8 encoding.
B<libswish3> is a parsing tool for use with information retrieval (IR) libraries.
Dynamic language bindings are available in the source distribution in the C<bindings>
directory.
=head1 APIs
The following APIs are defined:
=head1 Parsing API
B<libswish3> provides three basic input functions:
=over
=item
swish_parse_file()
=item
swish_parse_fh()
=item
swish_parse_buffer()
=back
Each of these functions takes a C<swish_Parser> struct pointer
and optional I<user_data>.
In addition:
=over
=item
The swish_parse_file() function takes a file path, which must be a valid file.
Directories and links are not supported. The assumption is that you will use
your calling code to recurse through directories and handle links.
=item
swish_parse_buffer() takes a string representing the document
headers and the full text of the document.
=item
swish_parse_fh() takes a filehandle pointer, which if set to NULL,
defaults to stdin.
=back
See the L<Headers API> section for more
information on using swish_parse_fh() and
swish_parse_buffer().
See the L<I<handler> Function> section for more information on how
to deal with the data extracted by each of the swish_parse_* functions.
=head1 Headers API
The Headers API supports and extends the Swish-e B<-S prog> feature,
which allows you to feed the indexer with output from another I<prog>ram.
The API has been extended from Swish-e\'s to allow for MIME types
and more congruence with the HTTP 1.1 specification.
See the SWISH-RUN documentation
in the Swish-e distribution for the Swish-e version 2 headers API.
This is the libswish3 implementation. See B<SWISH::Prog::Headers> for a simple
Perl-based way of generating the proper headers.
=over
=item Content-Location
B<Swish-e name:> Path-Name
The name of the document. May be any string: an ID of a record in a database,
a URL or a simple file name. The string is stored in the swish_DocInfo B<uri> struct member,
which is often used as the primary identifier of a document in an index.
This header is required.
=item Content-Length
The length in bytes of the document, starting after the blank line separating the headers
from the document itself.
The value must be exactly the length of the document, including any extra
line feeds or carriage returns at the end of the document.
Example:
Content-Location: foo.html
Content-Length: 9
The doc.\\n
^^^^^^^^ ^
12345678 9
The value is stored in the swish_DocInfo B<size> struct member.
This header is required.
=item Last-Modified
B<Swish-e name:> Last-Mtime
The last modification time of the document. The value must be an integer:
the seconds since the Epoch on your system.
If not present, the value defaults to the current time.
The value is stored in the swish_DocInfo B<mtime> struct member.
This header is not required.
=item Parser-Type
B<Swish-e name:> Document-Type
Explicitly name the parser used for the document, rather than defaulting to the MIME
type mapping based on B<Content-Type> and/or B<Content-Location>. The three parser types are:
=over
=item
XML
=item
HTML
=item
TXT
=back
The Swish-e values B<XML2>, B<XML*>, B<HTML2>, B<HTML*>, B<TXT2>, B<TXT*> are also
supported for compatibility, but they map to the three libswish3 types.
The value is stored in the swish_DocInfo B<parser> struct member.
If not present, the document parser will be automatically chosen based on the following logic:
=over
=item
If a B<Content-Type> is given, the parser mapped to that MIME type will be used. You may override
the default mappings in your configuration. See B<Configuration API>.
=item
If no B<Content-Type> is given, a MIME type will be guessed at based on the file extension of the
document\'s B<Content-Location>, and the parser mapped to that MIME type will be used.
=item
Finally, if a MIME type is not identified, the parser defined in B<SWISHP_CONFIG_DEFAULT_PARSER>
in B<libswish3.h> will be used.
=back
See also B<Content-Type> and B<Content-Location>.
This header is not required.
=item Content-Type
The MIME type of the document. The libswish3 MIME type list is based on the Apache 2.0
version. See L<http://www.iana.org/assignments/media-types/> for the official registry.
If not defined with B<Content-Type>, the MIME type will be guessed based on the
file extension in the B<Content-Location>
header. If the B<Content-Location> string does not contain a file extension (as might be the case
with a non-URL value), or the file extension has no MIME mapping, then the MIME type will default
to B<SWISHP_DEFAULT_MIME> as defined in B<libswish3.h>.
You may override the default extension-to-MIME mappings in your configuration. See B<Configuration API>.
The value is stored in the swish_DocInfo B<mime> struct member.
See also B<Content-Location> and B<Parser-Type>.
This header is not required.
=item Update-Mode
B<NOTE:> This header exists only for backwards compatibility with Swish-e\'s incremental
index feature. B<It may be deprecated in a future version of libswish3.>
=back
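As a rough sketch of the two required headers described above, the following C
function builds the text that could be handed to swish_parse_buffer(). The
function name C<format_doc_headers> is invented for this example and is not part
of libswish3, and the sketch assumes that header lines end with a plain newline
and that the document body is a NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libswish3): write the two required
 * headers, a blank line, and then the document body into out.
 * Returns the number of bytes written (not counting the final NUL),
 * or -1 if out is too small.
 * Assumes body is a NUL-terminated string and that lines end with "\n".
 */
int
format_doc_headers(const char *uri, const char *body, char *out, size_t outsize)
{
    int n;

    assert(uri != NULL && body != NULL && out != NULL);

    /* Content-Length counts every byte of the body, trailing newline included. */
    n = snprintf(out, outsize,
                 "Content-Location: %s\nContent-Length: %zu\n\n%s",
                 uri, strlen(body), body);

    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return n;
}
```

Called with C<foo.html> and the nine-byte body from the Content-Length example,
the function writes a Content-Length value of 9. The optional headers could be
added in the same way, before the blank line.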
=head1 Structures API
Writing an effective I<handler> function requires an understanding of some of the key
B<libswish3> data structures.
For more details on any of these structures, see the SYNOPSIS.
=head2 swish_Config
A configuration object. This object is required for initializing both a C<swish_Analyzer>
object and a C<swish_Parser> object.
=head2 swish_Parser
A parser object. Required for executing any of the three C<swish_parse_*> functions.
=head2 swish_ParseData
A parser data object. This object is passed around internally by the libxml2
SAX2 handlers, and is eventually the object passed to the I<handler> function pointer.
See L<The I<handler> Function>.
=head2 swish_WordList
A list of words or tokens. The object contains a linked list of swish_Word objects.
You can iterate over the contents of the WordList like this:
    swish_debug_msg("%d words in list", list->nwords);
    list->current = list->head;
    while (list->current != NULL)
    {
        swish_debug_msg(" ---------- WORD --------- ");
        swish_debug_msg("word : %s", list->current->word);
        swish_debug_msg(" meta : %s", list->current->metaname);
        swish_debug_msg(" context : %s", list->current->context);
        swish_debug_msg(" pos : %d", list->current->position);
        swish_debug_msg("soffset: %d", list->current->start_offset);
        swish_debug_msg("eoffset: %d", list->current->end_offset);
        list->current = list->current->next;
    }
=head2 swish_Word
An object representing one word or token in a document. The word\'s start and end offset,
position relative to other words, tag context and MetaName are all available in the object.
=head2 swish_DocInfo
An object describing metadata about the document itself: URI, MIME type, size, etc.
=head2 swish_Analyzer
The Analyzer object controls how the character content of a document is parsed: whether
or not a WordList is created with a tokenizer, if the words (tokens) are lowercased or
stemmed, etc.
=head1 The I<handler> Function
The I<handler> function pointer is the final link in the parsing chain. The function
pointer is set in the Parser object constructor, and is called by each of the
swish_parse_* functions after the entire document has been parsed and (optionally)
tokenized.
The I<handler> receives one argument: a swish_ParseData object containing all the metadata
and words in the document.
If all you wanted to do was print out a report about each document as it was parsed,
your I<handler> function might be as simple as:
    void
    my_handler( swish_ParseData * parse_data )
    {
        swish_debug_docinfo( parse_data->docinfo );
        swish_debug_wordlist( parse_data->wordlist );
        swish_debug_nb( parse_data->properties, "Property" );
        swish_debug_nb( parse_data->metanames, "MetaName" );
    }
B<IMPORTANT:> After the I<handler> function is called, all the structures referenced
by the swish_ParseData object are automatically freed, so if you intend to keep any of the
data for storing in an index, you will need to strdup() words, properties, docinfo, etc.
as part of your indexing code.
See the example C<swish_lint.c> file for how to create and pass in a I<handler>
function pointer to the swish_Parser constructor.
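Because the parse data is freed as soon as the I<handler> returns, a I<handler> that wants to keep the words has to copy them first. The following is a minimal sketch of that copying step, assuming a simplified word list. C<example_Word> and C<example_WordList> are hypothetical stand-ins that mirror only the C<head>, C<next>, C<word> and C<nwords> members used in the iteration example above; they are not the real libswish3 types, and C<example_dup> does the same job as strdup() in portable ISO C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for swish_Word and swish_WordList.
   Only the members needed for this sketch are declared. */
typedef struct example_Word {
    const char *word;
    struct example_Word *next;
} example_Word;

typedef struct {
    example_Word *head;
    unsigned int nwords;
} example_WordList;

/* Portable equivalent of strdup(): returns a malloc-ed copy of s, or NULL. */
static char *example_dup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Release an array returned by example_copy_words(). */
void example_free_words(char **copy, size_t n)
{
    size_t i;
    if (copy == NULL)
        return;
    for (i = 0; i < n; i++)
        free(copy[i]);
    free(copy);
}

/* Copy every word so the copies outlive the parser-owned list.
   Returns an array of list->nwords strings owned by the caller,
   or NULL if the list is empty or an allocation fails. */
char **example_copy_words(const example_WordList *list)
{
    char **copy;
    const example_Word *w;
    size_t i = 0;

    if (list == NULL || list->nwords == 0)
        return NULL;

    copy = calloc(list->nwords, sizeof(*copy));
    if (copy == NULL)
        return NULL;

    for (w = list->head; w != NULL && i < list->nwords; w = w->next) {
        copy[i] = example_dup(w->word);
        if (copy[i] == NULL) {
            example_free_words(copy, i);
            return NULL;
        }
        i++;
    }
    return copy;
}
```

A real I<handler> would apply the same idea to properties and docinfo members before returning, and would hand the copies to its indexing code, which then becomes responsible for freeing them.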
=head1 Configuration API
Configuration is different with B<libswish3> than with Swish-e. The biggest change
is that B<libswish3> configuration files are written in XML. This is done for several
reasons:
=over
=item 1
Since B<libswish3> already has a powerful XML parser built-in, it\'s much easier to
parse a configuration file written in XML than to port the Swish-e config-style parser
to B<libswish3>.
=item 2
B<libswish3> stores index header information in an XML format nearly identical
to the configuration file format. So the parser needs to understand only one XML
schema.
=item 3
You can store UTF-8 text in your configuration file and it will be parsed correctly.
=item 4
The configuration directive list is extensible. Simple key/value configuration directives
can be added without any modification to the B<libswish3> config parser. They are simply
stored in the C<swish_Config> struct hash for your own use and amusement.
B<CAUTION:> The configuration directive names documented in the L<Directives> section below
are reserved for use by B<libswish3>. Some of them have special handling considerations
(like MetaNames and PropertyNames). So the important idea to grasp with the extensible
configuration feature is "simple key/value pairs."
=back
This section describes how to build a B<libswish3> configuration file.
=head2 Configuration Example
Here\'s an example B<libswish3> configuration file:
<swish>
<FollowSymLinks>yes</FollowSymLinks>
<Meta name="foo" bias="+10" />
<Meta name="bar" bias="-5" />
<Meta name="swishtitle" bias="+50" alias="title" />
<Meta name="other">color size weight</Meta>
<Prop name="foo" type="text" ignorecase="1" />
<Prop name="bar" type="int" />
<Prop name="lastmod" type="date" />
<Prop name="bing" comparecase="1" />
<Prop name="description" verbatim="1" max="10000" alias="body" length="20" />
<Prop name="notsorted" sort="0" />
<Tokenize>1</Tokenize>
</swish>
And here\'s that same example, dissected:
<swish>
The top level tag.
<FollowSymLinks>yes</FollowSymLinks>
Equivalent to the Swish-e style:
FollowSymLinks yes
which simply informs whatever aggregator you are using that when confronted
with a symlink on the filesystem, it should be followed.
C<FollowSymLinks> is an example of a simple key/value pair (see the B<CAUTION> above).
=head3 MetaNames
Here\'s the first big difference from Swish-e. MetaNames, MetaNameAlias, and
MetaNamesRank have been combined into a single XML tag with appropriate
attributes.
<Meta name="foo" bias="+10" />
is the same thing as (in Swish-e style):
MetaNames foo
MetaNamesRank 10 foo
while:
<Meta name="swishtitle" bias="+50" alias="title" />
is equivalent to:
MetaNames swishtitle
MetaNameAlias swishtitle title
MetaNamesRank 50 swishtitle
You can see that the XML style allows for a terser, more compact expression.
You can still assign multiple aliases to a single MetaName:
<Meta name="other">color size weight</Meta>
is equivalent to:
MetaNames other
MetaNameAlias other color size weight
In addition, there are some special features intended for use with HTML documents.
<Meta name="links" html="1" alias="href" /> # same as HTMLLinksMetaName
<Meta name="images" html="1" alias="src" /> # same as ImageLinksMetaName
<Meta name="alttext" html="1" alias="alt" /> # same as IndexAltTagMetaName
<Meta name="as-text" html="1" alias="alt" /> # same as IndexAltTagMetaName
=head3 PropertyNames
PropertyNames, PropertyNamesCompareCase, PropertyNamesIgnoreCase, PropertyNamesNoStripChars,
PropertyNamesNumeric, PropertyNamesDate, PropertyNameAlias, PropertyNamesMaxLength,
PropertyNamesSortKeyLength, StoreDescription and PreSortedIndex
have all been combined into a single XML directive.
Here\'s the example from above with equivalent Swish-e directives annotated:
<Prop name="foo" type="text" ignorecase="1" />
# PropertyNamesIgnoreCase foo
<Prop name="bar" type="int" />
# PropertyNamesNumeric bar
<Prop name="lastmod" type="date" />
# PropertyNamesDate lastmod
<Prop name="bing" comparecase="1" />
# PropertyNamesCompareCase bing
<Prop name="description" verbatim="1" max="10000" alias="body" length="20" />
# PropertyNamesNoStripChars description
# PropertyNamesMaxLength 10000 description
# PropertyNameAlias description body
# PropertyNamesSortKeyLength 20 description
<Prop name="notsorted" sort="0" />
# PreSortedIndex foo bar lastmod bing description
Again, the XML format greatly simplifies the syntax. You can assign attributes
as you need, though be aware that some attributes are inherently mismatched
and might generate an error or unexpected behaviour:
<Prop name="foo" ignorecase="1" type="int" /> # wrong
<Prop name="foo" comparecase="1" type="date" /> # wrong
<Prop name="foo" verbatim="1" type="int" /> # wrong
<Prop name="foo" sort="0" length="20" /> # wrong
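The mismatches above can be expressed as a small validation routine. This is a sketch under stated assumptions, not code from libswish3: C<example_Prop> is a hypothetical flattened view of one C<Prop> tag, and the rules generalize the four "wrong" examples (case and verbatim options only make sense for text properties, and a sort key length only makes sense when sorting is enabled). Whether libswish3 itself rejects, warns about, or ignores such combinations is not specified here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one Prop tag; not a libswish3 type. */
typedef struct {
    const char *type; /* "text", "int", "date", or NULL when not given */
    int ignorecase;   /* 1 when ignorecase="1" */
    int comparecase;  /* 1 when comparecase="1" */
    int verbatim;     /* 1 when verbatim="1" */
    int sort;         /* 0 when sort="0", otherwise 1 */
    int length;       /* sort key length, 0 when not given */
} example_Prop;

/* Assumption: a missing type attribute means a text property. */
static int example_is_text(const example_Prop *p)
{
    return p->type == NULL || strcmp(p->type, "text") == 0;
}

/* Return NULL when the attributes are compatible,
   or a short description of the first conflict found. */
const char *example_prop_conflict(const example_Prop *p)
{
    if (!example_is_text(p) && (p->ignorecase || p->comparecase))
        return "case options apply only to text properties";
    if (!example_is_text(p) && p->verbatim)
        return "verbatim applies only to text properties";
    if (p->sort == 0 && p->length > 0)
        return "a sort key length has no effect when sorting is disabled";
    return NULL;
}
```

Under these assumed rules, the C<description> property from the configuration example (verbatim, with a sort key length of 20 and no explicit type) passes, while each of the four lines marked wrong above is reported as a conflict.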
=head2 Directives
The following configuration directives are currently supported.
TODO
=head1 EXAMPLES
See the C<swish_lint.c> file included in the libswish3 distribution.
=head1 FAQ
=head2 What is IR?
Information Retrieval.
=head2 How is libswish3 related to Swish-e?
libswish3 is the core parsing library for Swish-e version 3 (Swish3).
=head2 Is libswish3 a search engine?
No. libswish3 is a document parser. It might work well in or with any number of search engines,
but it is not in itself any kind of search tool.
=head2 So what does libswish3 DO exactly?
libswish3 reads text, HTML and XML files and extracts just the words and document
properties from each document. It then hands off the wordlist and properties
to a I<handler> function. Finally, it frees all the memory associated with the wordlist
and properties.
The I<handler> function can do whatever you wish, though typically a I<handler>
would iterate over the words in the wordlist and add each one to an index using
an IR library API.
=head1 BACKGROUND
libswish3 is part of the Swish-e project.
It was born out of the need for UTF-8 and incremental
indexing support and a desire to experiment with alternate indexing
libraries like Lucene, KinoSearch, Xapian and Hyperestraier.
libswish3 was developed with the idea that many quality IR libraries already exist,
but few if any provide an easy and fast way of preparing documents for indexing.
The following assumptions informed the development of libswish3.
=head2 The IR Toolchain
A decent IR toolchain requires 5 parts:
=over 4
=item aggregator
Collects documents from a filesystem, database, website or other sources.
=item filter
Normalizes documents to a standard format (plain text, or a delimited or markup
format like YAML, HTML or XML) for indexing.
=item parser
Breaks a document into a list of words, including their context and position.
=item indexer
Writes the list of words in a storage system for quick, efficient retrieval.
=item searcher
Parses queries and fetches data from the indexer\'s storage system.
=back
Of course, the division between these parts is not always clean or apparent. Parsing search
queries, for example, will necessarily involve elements of the parser and searcher
components, while the indexer and searcher are of necessity intrinsically bound.
But any complete IR system will have these five parts in some combination.
=head2 Swish-e aggregators and filters are already good
The existing Swish-e document aggregators (B<DirTree.pl> and B<spider.pl>) and filtering
system (B<SWISH::Filter>) are good. They are all written in Perl and are easily modified,
and they have ample configuration options and documentation.
=head2 Why reinvent the wheel?
Several good IR libraries exist that provide an indexer and searcher. These libraries
do UTF-8, incremental indexing, and have search syntax on par with (or better than)
Swish-e 2.x. Examples include Xapian, KinoSearch and Lucene.
While they might be a little slower
than Swish-e (at least in terms of indexing speed) they make up for that with:
=over
=item
well-documented APIs
=item
bindings in a variety of programming languages
=item
active development communities
=item
the flexibility that comes with being a library instead of a fixed program
=back
=head2 The missing link
The piece that Swish-e provides that other IR libraries lack is a fast, stable, integrated
document parser. Xapian has Omega, but it does not parse XML, nor does it recognize
ad hoc word context (metanames).
However, the Swish-e 2.x parser does not work independently of the Swish-e indexer
and searcher, nor does it support UTF-8.
One piece is missing: a parser that works with the Swish-e aggregator/filter system, supports
UTF-8, and offers flexible options for connecting with other IR libraries.
Ergo, libswish3: a document parser compatible with the existing Swish-e -S prog API
and capable of generating UTF-8 wordlists for indexing with a variety of IR libraries.
=head2 Where does libswish3 fit?
libswish3 is the core C library in Swish3.
However, libswish3 may be used without the rest of Swish3.
The assumption is that libswish3 could fit into an IR toolchain like this:
aggregator -> filter -> libswish3 -> some IR library
You could then use the native search API of the IR library.
For example, you might use the Swish-e B<spider.pl> script to spider a website, filtering
documents with B<SWISH::Filter> and then handing the output to a B<libswish3>-based
program that will parse the documents into words and store the data in a
Xapian or KinoSearch index (or both!). That model is, in fact, what Swish3 does.
Or you might use the B<SWISH::Prog> Perl module (from the CPAN) to build your own
aggregator/filter system, then hand the output to libswish3.
=head1 AUTHOR
Peter Karman (peter@peknet.com).
=head1 CREDITS
B<libswish3> is inspired by code from
Swish-e (http://www.swish-e.org),
Libxml2 (http://www.xmlsoft.org),
Apache (http://www.apache.org),
Rahul Dhesi (http://www.tug.org/tex-archive/tools/zoo/),
Angel Ortega (http://www.triptico.com/software/unicode.html),
James Henstridge (http://www.jamesh.id.au/articles/libxml-sax/libxml-sax.html),
YoLinux (http://www.yolinux.com/TUTORIALS/GnomeLibXml2.html)
and no doubt many unnamed others.
All mistakes, errors and poor programming choices are, however, those
of the author.
=head1 LICENSE
B<libswish3> is licensed under the GPL.
libswish3 is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
libswish3 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with libswish3; see the file COPYING. If
not, write to the
Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA
=head1 SEE ALSO
The project homepage: http://dev.swish-e.org/wiki/swish3
swish_lint(1), swish_isw(1), swish_words(1)
=cut',
'url' => 'projects/swish/api_docs/libswish3.3'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/xapian10.txt',
'format' => 'txt',
'id' => 'xapian10.txt',
'mtime' => 1179496848,
'name' => 'xapian10',
'text' => 'Xapian 1.0 Released
<a href="http://thread.gmane.org/gmane.comp.search.xapian.general/4448">Announced this morning.</a>
Now if only I could find time to finish the swishx Swish3 example program...
',
'url' => 'projects/swish/xapian10'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/tools.txt',
'format' => 'txt',
'id' => 'tools.txt',
'mtime' => 1178935018,
'name' => 'tools',
'text' => 'Open Source Search Tools
I was answering an email tonight from the hyperestraier list about
Xapian and Lucene and KinoSearch, and as I was googling around to find
all the email threads I remembered being a part of on the topic,
it was interesting to see intersections I hadn\'t remembered, like how
the same people (like me) keep popping up around these tools.
There are some folks who just need to implement a search engine for their
website/company/intranet. These are the sysadmin types who just need
something that works so that they can move on to the next project.
Then there are folks working in the IR field itself who are trying to
build the Next Big Search Thing, following in the Google tradition. Good luck
to them. They\'ll need it.
Then there are folks like me, who are a little OCD over things like
IR and search. I consider the developers of the projects I list above
in that camp. It\'s a good camp to be in.
Open source search tools have come a long way and there is really some
good momentum now in implementing multi-terabyte, high-volume search
projects using open source technology. I like working in IR at a time
like this. Hopeful. Almost. :)
',
'url' => 'projects/swish/tools'
}, 'PodBlog::Model::Blog::Entry' ),
bless( {
'file' => '/home/karpet/blog/projects/swish/api_docs/swish_words.1.pod',
'format' => 'pod',
'id' => 'swish_words.1.pod',
'mtime' => 1178508976,
'name' => 'swish_words.1',
'text' => '=pod
=head1 NAME
swish_words - test the libswish3 tokenizer
=head1 SEE ALSO
swish_lint(1), swish_isw(1), libswish3(3)
=cut',
'url' => 'projects/swish/api_docs/swish_words.1'
}, 'PodBlog::Model::Blog::Entry' )
],
'menu' => [
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/books',
'level' => 1,
'name' => 'books',
'url' => 'books'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/general',
'level' => 1,
'name' => 'general',
'url' => 'general'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/projects',
'level' => 1,
'name' => 'projects',
'url' => 'projects'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/_intro.txt',
'level' => 2,
'name' => '_intro',
'url' => 'projects/_intro'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/ajax.txt',
'level' => 2,
'name' => 'ajax',
'url' => 'projects/ajax'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/blas.txt',
'level' => 2,
'name' => 'blas',
'url' => 'projects/blas'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/blue.txt',
'level' => 2,
'name' => 'blue',
'url' => 'projects/blue'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/bug_or_feature.txt',
'level' => 2,
'name' => 'bug_or_feature',
'url' => 'projects/bug_or_feature'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/catalyst.txt',
'level' => 2,
'name' => 'catalyst',
'url' => 'projects/catalyst'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/chars.txt',
'level' => 2,
'name' => 'chars',
'url' => 'projects/chars'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/cpan.txt',
'level' => 2,
'name' => 'cpan',
'url' => 'projects/cpan'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/craydoc.txt',
'level' => 2,
'name' => 'craydoc',
'url' => 'projects/craydoc'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/cssprint.txt',
'level' => 2,
'name' => 'cssprint',
'url' => 'projects/cssprint'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/fp_talk1.txt',
'level' => 2,
'name' => 'fp_talk1',
'url' => 'projects/fp_talk1'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/fp_talk2.txt',
'level' => 2,
'name' => 'fp_talk2',
'url' => 'projects/fp_talk2'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/fp_talk3.txt',
'level' => 2,
'name' => 'fp_talk3',
'url' => 'projects/fp_talk3'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/fp_talks.txt',
'level' => 2,
'name' => 'fp_talks',
'url' => 'projects/fp_talks'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/frozenperl.txt',
'level' => 2,
'name' => 'frozenperl',
'url' => 'projects/frozenperl'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/hacker.txt',
'level' => 2,
'name' => 'hacker',
'url' => 'projects/hacker'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/hiliter.txt',
'level' => 2,
'name' => 'hiliter',
'url' => 'projects/hiliter'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/http_flow.txt',
'level' => 2,
'name' => 'http_flow',
'url' => 'projects/http_flow'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/ibmunicode.txt',
'level' => 2,
'name' => 'ibmunicode',
'url' => 'projects/ibmunicode'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/ideas.txt',
'level' => 2,
'name' => 'ideas',
'url' => 'projects/ideas'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/ishida.txt',
'level' => 2,
'name' => 'ishida',
'url' => 'projects/ishida'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/iterm.txt',
'level' => 2,
'name' => 'iterm',
'url' => 'projects/iterm'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/larry_pm.txt',
'level' => 2,
'name' => 'larry_pm',
'url' => 'projects/larry_pm'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/latenights.txt',
'level' => 2,
'name' => 'latenights',
'url' => 'projects/latenights'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/long_live_perl.txt',
'level' => 2,
'name' => 'long_live_perl',
'url' => 'projects/long_live_perl'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/memory.txt',
'level' => 2,
'name' => 'memory',
'url' => 'projects/memory'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/mylibrary.txt',
'level' => 2,
'name' => 'mylibrary',
'url' => 'projects/mylibrary'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/perlisalive.txt',
'level' => 2,
'name' => 'perlisalive',
'url' => 'projects/perlisalive'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/postgresql_on_osx.txt',
'level' => 2,
'name' => 'postgresql_on_osx',
'url' => 'projects/postgresql_on_osx'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/profiling_perl.txt',
'level' => 2,
'name' => 'profiling_perl',
'url' => 'projects/profiling_perl'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/rest.txt',
'level' => 2,
'name' => 'rest',
'url' => 'projects/rest'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/ror.txt',
'level' => 2,
'name' => 'ror',
'url' => 'projects/ror'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/stateofsearch.txt',
'level' => 2,
'name' => 'stateofsearch',
'url' => 'projects/stateofsearch'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/projects/swish',
'level' => 2,
'name' => 'swish',
'url' => 'projects/swish'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/projects/swish/api_docs',
'level' => 3,
'name' => 'api_docs',
'url' => 'projects/swish/api_docs'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/bindings.pod',
'level' => 3,
'name' => 'bindings',
'url' => 'projects/swish/bindings'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/cpan100606.pod',
'level' => 3,
'name' => 'cpan100606',
'url' => 'projects/swish/cpan100606'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/libswish3.pod',
'level' => 3,
'name' => 'libswish3',
'url' => 'projects/swish/libswish3'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/original_idea.txt',
'level' => 3,
'name' => 'original_idea',
'url' => 'projects/swish/original_idea'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/progress.txt',
'level' => 3,
'name' => 'progress',
'url' => 'projects/swish/progress'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/progress2.txt',
'level' => 3,
'name' => 'progress2',
'url' => 'projects/swish/progress2'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/progress3.txt',
'level' => 3,
'name' => 'progress3',
'url' => 'projects/swish/progress3'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/swish3Proposal.pod',
'level' => 3,
'name' => 'swish3Proposal',
'url' => 'projects/swish/swish3Proposal'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/swishprog.pod',
'level' => 3,
'name' => 'swishprog',
'url' => 'projects/swish/swishprog'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/swishprog2.pod',
'level' => 3,
'name' => 'swishprog2',
'url' => 'projects/swish/swishprog2'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/tokenizer.txt',
'level' => 3,
'name' => 'tokenizer',
'url' => 'projects/swish/tokenizer'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/tools.txt',
'level' => 3,
'name' => 'tools',
'url' => 'projects/swish/tools'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/utf8.notes.pod',
'level' => 3,
'name' => 'utf8.notes',
'url' => 'projects/swish/utf8.notes'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/whySwish3.pod',
'level' => 3,
'name' => 'whySwish3',
'url' => 'projects/swish/whySwish3'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swish/xapian10.txt',
'level' => 3,
'name' => 'xapian10',
'url' => 'projects/swish/xapian10'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/swished.txt',
'level' => 2,
'name' => 'swished',
'url' => 'projects/swished'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/texttools.txt',
'level' => 2,
'name' => 'texttools',
'url' => 'projects/texttools'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/wrong.txt',
'level' => 2,
'name' => 'wrong',
'url' => 'projects/wrong'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/xapian.txt',
'level' => 2,
'name' => 'xapian',
'url' => 'projects/xapian'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 0,
'file' => '/home/karpet/blog/projects/yamllint.txt',
'level' => 2,
'name' => 'yamllint',
'url' => 'projects/yamllint'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/spam',
'level' => 1,
'name' => 'spam',
'url' => 'spam'
}, 'PodBlog::Model::Menu::Entry' ),
bless( {
'dir' => 1,
'file' => '/home/karpet/blog/stpaulbartour',
'level' => 1,
'name' => 'stpaulbartour',
'url' => 'stpaulbartour'
}, 'PodBlog::Model::Menu::Entry' )
]
};