cfind

Description

cfind (Content FINDer) is a UNIX command-line tool that finds files by their content, similar to what Google Desktop does.

Usage

The interface is similar to that of locate.

  rg@rg src $ cindex
  rg@rg src $ cfind open source mathematics
  /home/rg/phd/ipa_cover.tex
  /usr/share/texmf/tex/eplain/eplain.tex
  /home/rg/sgb/gb_words.tex
  /usr/share/texmf/source/amstex/doc/amsguide.tex
  /usr/share/doc/aspell-0.50.5-r4/manual.tex
  /home/rg/sgb/assign_lisa.tex
  /usr/share/rfc/rfc-index.txt

The results are ordered by relevance. The cindex command builds the index; you need to run it only occasionally, but it takes some time. Searching, however, is very fast. Version 0.0.0 indexes only plain text and TeX files. Note that, unlike locate, cfind expects the index-building command (cindex) and the search command (cfind) to be run by the same user.

See the README file for more details.

Download

You can download cfind-0.0.0 from SourceForge.

Development

Please use email (radugrigore at gmail) to report bugs so I can take care of them quickly. I am also interested in hearing about performance on large datasets and about file types you would like to be indexed. Ideally your email should have the word cfind somewhere in the subject.

Right now cfind is at a very preliminary stage. The next planned change is support for more formats (HTML, C++, PDF). Other feature requests are tracked on SourceForge; please consider adding yours there.

If you are interested in adding support for your own format, read on. The development language is OCaml. You need to provide a fold function over the words of a file. One way to do this is to write a lexer and then use the Word_lister.Make functor to construct a module with the fold function (as well as iter and map). For example, the lexer for TeX files, which ignores commands, is:

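  (* Header: the token type produced by the lexer. *)
  {type token = IDENT of string | EOF;;}

  (* A word is a non-empty run of ASCII letters. *)
  let ID = ['a'-'z' 'A'-'Z']+

  (* Skip TeX commands and non-letters; return each word as an IDENT. *)
  rule word = parse
      '\\'(ID)           { word lexbuf }
    | ID                 { IDENT (Lexing.lexeme lexbuf) }
    | eof                { EOF }
    | _                  { word lexbuf }

  {}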
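
To illustrate what "a fold function over the words of a file" can look like, here is a minimal, self-contained sketch in plain OCaml. It is not the actual Word_lister interface: the LEXER signature, the Make_words functor and the hand-written next_token function are hypothetical stand-ins. The input is a string rather than a file, so the sketch runs without ocamllex.

```ocaml
(* Hypothetical sketch; none of these names come from the cfind sources. *)
type token = IDENT of string | EOF

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Hand-written stand-in for the ocamllex rule above: returns the next
   token found at or after [pos], plus the position just past it. *)
let rec next_token s pos =
  let n = String.length s in
  (* Advance past a run of letters starting at [i]. *)
  let rec skip_letters i =
    if i < n && is_letter s.[i] then skip_letters (i + 1) else i
  in
  if pos >= n then (EOF, pos)
  else if s.[pos] = '\\' && pos + 1 < n && is_letter s.[pos + 1] then
    (* A TeX command such as \section: skip it entirely. *)
    next_token s (skip_letters (pos + 1))
  else if is_letter s.[pos] then
    let stop = skip_letters pos in
    (IDENT (String.sub s pos (stop - pos)), stop)
  else
    (* Any other character is ignored. *)
    next_token s (pos + 1)

(* Assumed shape of what a format plugin provides. *)
module type LEXER = sig
  val next_token : string -> int -> token * int
end

(* Derives fold, iter and map over words from a lexer. *)
module Make_words (L : LEXER) = struct
  let fold f init s =
    let rec go acc pos =
      match L.next_token s pos with
      | EOF, _ -> acc
      | IDENT w, pos' -> go (f acc w) pos'
    in
    go init 0

  let iter f s = fold (fun () w -> f w) () s

  let map f s = List.rev (fold (fun acc w -> f w :: acc) [] s)
end

module Tex_words = Make_words (struct let next_token = next_token end)

let () =
  (* Print each word of a small TeX fragment in lower case. *)
  Tex_words.iter
    (fun w -> print_endline (String.lowercase_ascii w))
    "\\section{Open Source} mathematics"
```

The point of the design is that a new format only has to supply the tokenizer; fold, iter and map are derived once by the functor.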

From time to time I'll post development-related information on my blog.

