cfind (Content FINDer) is a UNIX tool that provides functionality similar
to that of Google Desktop from the command line.
The interface is similar to that of locate.
    rg@rg src $ cindex
    rg@rg src $ cfind open source mathematics
    /home/rg/phd/ipa_cover.tex
    /usr/share/texmf/tex/eplain/eplain.tex
    /home/rg/sgb/gb_words.tex
    /usr/share/texmf/source/amstex/doc/amsguide.tex
    /usr/share/doc/aspell-0.50.5-r4/manual.tex
    /home/rg/sgb/assign_lisa.tex
    /usr/share/rfc/rfc-index.txt
The results are ordered by their relevance.
The cindex command builds the index. You need to run it
only once in a while, but it will take some time. The search, however, is
very fast. Only text and TeX files are indexed by version 0.0.0.
Note that, unlike locate, cfind expects the index building command
(cindex) and the search command (cfind)
to be run by the same user.
See the README file for more details.
You can download cfind-0.0.0 from SourceForge.
Please use email (radugrigore at gmail) to report bugs
so I can take care of them quickly. I am also interested
in hearing about performance on large datasets
and about file types you would like to be indexed. Ideally, your
email should have the word cfind somewhere in the subject.
Right now cfind is in a very preliminary phase. The
next thing you will see is the addition of more formats (html, cpp,
pdf). Other feature requests are tracked through SourceForge.
Please consider adding yours there.
If you are interested in adding support for your own format, then
read on. The development language is OCaml. You need to
provide a fold function over the words of a file. One way to do it is to
write a lexer and then use the Word_lister.Make functor to
construct a module with the fold function (as well
as iter and map).
For example, the lexer for TeX files that ignores commands is:
    {
    type token = IDENT of string | EOF;;
    }

    let ID = ['a'-'z' 'A'-'Z']+

    rule word = parse
        '\\'(ID) { word lexbuf }
      | ID       { IDENT (Lexing.lexeme lexbuf) }
      | eof      { EOF }
      | _        { word lexbuf }

    {}
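To make the idea of deriving fold, iter and map from a lexer more concrete, here is a rough, self-contained sketch in plain OCaml. It is only an illustration under assumptions: the actual signature of Word_lister.Make is not shown on this page, so `LEXER`, `source`, `Make_word_lister`, `Tex_lexer` and `words_of` are hypothetical names, and the hand-written `word` function merely imitates the ocamllex rule above so that the example compiles without a separate .mll file.

```ocaml
(* Hypothetical sketch: none of these names are taken from the cfind sources. *)

type token = IDENT of string | EOF

(* A minimal input cursor standing in for Lexing.lexbuf. *)
type source = { text : string; mutable pos : int }

(* What a format-specific lexer is assumed to provide: the next token. *)
module type LEXER = sig
  val word : source -> token
end

(* Hypothetical stand-in for Word_lister.Make: builds fold, iter and map
   on top of any lexer that yields IDENT tokens until EOF. *)
module Make_word_lister (L : LEXER) = struct
  let fold f init src =
    let rec loop acc =
      match L.word src with
      | EOF -> acc
      | IDENT w -> loop (f acc w)
    in
    loop init

  let iter f src = fold (fun () w -> f w) () src

  let map f src = List.rev (fold (fun acc w -> f w :: acc) [] src)
end

(* Hand-written imitation of the TeX rule: words are runs of ASCII letters,
   a backslash followed by letters (a command) is skipped, and every other
   character is skipped as well. *)
module Tex_lexer : LEXER = struct
  let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

  (* Advance the cursor past a run of letters. *)
  let skip_letters src =
    let n = String.length src.text in
    while src.pos < n && is_letter src.text.[src.pos] do
      src.pos <- src.pos + 1
    done

  let rec word src =
    if src.pos >= String.length src.text then EOF
    else begin
      let c = src.text.[src.pos] in
      if is_letter c then begin
        let start = src.pos in
        skip_letters src;
        IDENT (String.sub src.text start (src.pos - start))
      end else begin
        (* Skip one character; after a backslash also skip the command name. *)
        src.pos <- src.pos + 1;
        if c = '\\' then skip_letters src;
        word src
      end
    end
end

module Tex_words = Make_word_lister (Tex_lexer)

(* Convenience helper: all words of a string, in order of appearance. *)
let words_of text = Tex_words.map (fun w -> w) { text; pos = 0 }
```

With these definitions, `words_of "\\section{Open source} mathematics"` evaluates to `["Open"; "source"; "mathematics"]`. Supporting another format would, in this sketch, only require a different module matching `LEXER`; the three traversal functions come from the functor unchanged.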
From time to time I'll post development-related information on my blog.