Swish3 Proposal

Thoughts on Swish-e version 3.

Assumptions
  • In order to keep Swish-e fast and portable, some key parts need to be written in a compiled language like C.

  • C developers are increasingly hard to recruit to OSS projects like Swish-e.

  • Code written in C is slower to develop and more difficult to maintain than code written in non-compiled languages like Python or Perl.

  • To encourage more code contributors and make the project useful to more people, the core C parts should be library modules with well-defined, documented APIs. This makes the code more maintainable and flexible, and allows integration with other IR libraries like Xapian.

Core C Libraries

NOTE: The following list is no longer accurate. libswish3 combines all these into one library.

  • SwishUtils (libswishu)

    Common shared functions for things like IO, string handling, times, errors, memory and hashing.

    I've started this one.

  • SwishConfig (libswishc)

    Parse config files into in-memory data structures, and read/write index config headers.

    I've started this one.

  • SwishParser (libswishp)

    Parse documents into properties and wordlist.

    I've started this one.

  • SwishIndex (libswishi)

    Store properties and wordlists.

    TODO.

  • SwishSearch (libswishs)

    Parse queries and fetch results from an index.

    Could be a reworking of the existing libswish-e to expect UTF-8 (which SwishUtils supports).
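As a rough illustration of the wordlist half of SwishParser, and of what "expecting UTF-8" can mean in practice, here is a minimal sketch in C. The names tokenize_utf8, word_handler and sum_word_bytes are hypothetical and are not part of libswish3 or libswish-e. The sketch assumes that words are runs of ASCII letters, digits and any byte >= 0x80, and it does not validate the UTF-8 it is given. It relies on a property of UTF-8: ASCII bytes never appear inside a multibyte sequence, so splitting on ASCII separators cannot cut a character in half.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: receives one word (NOT NUL-terminated), its length
 * in bytes, its 1-based position in the text, and a caller-supplied pointer. */
typedef void (*word_handler)(const char *word, size_t len,
                             unsigned pos, void *user);

/* Assumption for this sketch: a word byte is an ASCII letter or digit, or any
 * byte >= 0x80 (that is, part of a multibyte UTF-8 sequence). */
int is_word_byte(unsigned char c)
{
    return c >= 0x80
        || (c >= '0' && c <= '9')
        || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z');
}

/* Split NUL-terminated text into words, calling handler (if not NULL) once
 * per word. Returns the number of words found. */
unsigned tokenize_utf8(const char *text, word_handler handler, void *user)
{
    const unsigned char *p;
    unsigned count = 0;

    assert(text != NULL);
    p = (const unsigned char *)text;

    while (*p) {
        const unsigned char *start;

        /* Skip separators (ASCII whitespace, punctuation, control bytes). */
        while (*p && !is_word_byte(*p))
            p++;

        /* Consume one word. */
        start = p;
        while (*p && is_word_byte(*p))
            p++;

        if (p > start) {
            count++;
            if (handler)
                handler((const char *)start, (size_t)(p - start), count, user);
        }
    }
    return count;
}

/* Example handler: adds each word's byte length to a size_t total. */
void sum_word_bytes(const char *word, size_t len, unsigned pos, void *user)
{
    (void)word;
    (void)pos;
    *(size_t *)user += len;
}
```

A caller passes the text, a handler and a pointer to its own state. For example, passing sum_word_bytes with a pointer to a size_t initialised to zero accumulates the total number of bytes across all words. A real parser would also need to handle markup, properties and Unicode-aware word boundaries, none of which this sketch attempts.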