
=head1 TO DO

=head2 Name

* Come up with a good name

* YODA (from the image in the talk)?

* "Answers you seek?"

=head2 Distribution

=head3 Docker

* ulimit for crawling processes

=head3 Windows distribution

* C<.msi> package including Perl, Apache Tika and Elasticsearch

=head3 Debian distribution

* Debian package

=head2 Pages

=head3 Search page

* Simple search

* Autocomplete/recommend

=head3 Simple (HTML) Results page

* Search images

=head3 Result fragment / document rendering

Come up with a concept to render different mime types differently.

Ideally, this would avoid the hardcoding we use for C<< audio/mpeg >>
currently.
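
One possible concept is a dispatch table keyed by MIME type that falls back
from the exact type to the major type and finally to a generic renderer.
The following is only a sketch under that assumption: C<%renderer>,
C<renderer_for> and the generated HTML are hypothetical and not part of the
current code, and a real renderer would have to escape its output.

    use strict;
    use warnings;

    # Hypothetical renderers; each takes a document hashref and returns HTML
    my %renderer = (
        'audio/mpeg' => sub { qq{<audio src="$_[0]{url}" controls></audio>} },
        'image/*'    => sub { qq{<img src="$_[0]{url}">} },
        '*/*'        => sub { qq{<pre>$_[0]{content}</pre>} },
    );

    # Look up the most specific renderer: exact type, major type, catch-all
    sub renderer_for {
        my ($mime_type) = @_;
        my ($major) = split m{/}, $mime_type;
        for my $key ($mime_type, "$major/*", '*/*') {
            return $renderer{$key} if $renderer{$key};
        }
    }

    print renderer_for('image/png')->({ url => '/files/cat.png' }), "\n";

With such a table, C<< audio/mpeg >> becomes one entry among others instead
of a special case in the rendering code.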

=head3 Customization

* Auto-session

* Refinement using the last search, if the last search was recent

=head2 Plack

* L<Plack> hook/example for C</search> to tie the search application
into arbitrary websites

=head2 Dancer

* Elasticsearch plugin / configuration through YAML

=head2 Mojolicious

* Elasticsearch plugin / configuration through YAML

=head2 Search multiple indices

Having different Elasticsearch clusters available (or not) should
be recognized and the search results should be combined. For example,
a work cluster should be searched in addition to the local cluster, if the
work network is available.

This calls for using the asynchronous API not only for searching but also
for progressively enhancing the results page as new results become available.

=head2 Recognizing new versions of old documents

How can we/Elasticsearch recognize similarity between two documents?

If two documents live in the same directory, the newest one should take
precedence and fold the similar documents below it.

=head2 Java ES plugins

For now, this functionality is better written in Perl than as a Java plugin.

=head2 ES Analyzers

=head3 FS scanner

* Don't rescan/reanalyze elements that already exist in Elasticsearch

* Delete entries that don't exist in the filesystem anymore
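
Both points amount to comparing what is on disk with what is in the index.
A minimal sketch, assuming every indexed entry records the modification
time of its file: C<plan_sync> and the shape of its two arguments are
hypothetical, and fetching the list of indexed paths from Elasticsearch is
left out.

    use strict;
    use warnings;

    # Both arguments map a path to its modification time (epoch seconds).
    # $on_disk comes from the scanner, $indexed from the search index.
    sub plan_sync {
        my ($on_disk, $indexed) = @_;
        my (@index, @delete);
        for my $path (sort keys %$on_disk) {
            # New file, or changed since it was last indexed
            push @index, $path
                if !exists($indexed->{$path})
                || $on_disk->{$path} > $indexed->{$path};
        }
        for my $path (sort keys %$indexed) {
            # Indexed, but gone from the filesystem
            push @delete, $path unless exists $on_disk->{$path};
        }
        return (\@index, \@delete);
    }

    my ($index, $delete) = plan_sync(
        { 'a.txt' => 200, 'b.txt' => 100, 'new.txt' => 300 },
        { 'a.txt' => 100, 'b.txt' => 100, 'old.txt' => 50 },
    );
    print "index:  @$index\n";     # a.txt new.txt
    print "delete: @$delete\n";    # old.txt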

=head3 Video data

Which module provides interesting video metadata?

=head3 Audio data

* MP3s get imported but could use a nicer body rendering.

* Playback duration should be calculated

* Also import song lyrics. How could these be linked to their MP3s?

=head3 Playlist data

Playlists should get custom rendering (album art etc.)

Playlists should ideally also hotlink their contents

=head2 Test data

* Consider importing a Wikipedia dump

* Some other larger, mixed corpus, like L<http://eur-lex.europa.eu/>

=head2 Synonyms

Find out which one(s) we want:

L<https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html>

At first glance, we might want Simple Expansion, but Genre Expansion
also seems interesting.

We want to treat some synonyms as identical though, like 'MMSR' and its
German translation 'Geldmarktstatistik'.
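
In the terms of that guide, treating two terms as identical corresponds to a
contraction rule (C<< a => b >>), while a plain comma-separated list is a
simple expansion. The index settings below sketch both. The names
C<local_synonyms> and C<with_synonyms> are made up for illustration, and the
C<jump,hop,leap> rule is only an example. The rules are written in lowercase
because the synonym filter runs after the C<lowercase> filter.

    use strict;
    use warnings;
    use JSON::PP;

    my $settings = {
        settings => {
            analysis => {
                filter => {
                    local_synonyms => {
                        type     => 'synonym',
                        synonyms => [
                            # Contraction: both terms are indexed as "mmsr"
                            'geldmarktstatistik => mmsr',
                            # Simple expansion: each term matches all three
                            'jump,hop,leap',
                        ],
                    },
                },
                analyzer => {
                    with_synonyms => {
                        tokenizer => 'standard',
                        filter    => [ 'lowercase', 'local_synonyms' ],
                    },
                },
            },
        },
    };

    # The body to send when creating the index
    print JSON::PP->new->canonical->pretty->encode($settings);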

=head1 User Introduction

=head2 Videos

* Create screencasts using L<http://www.openshot.org/videos/> or

=head1 Code structure

=head2 Crawlers

* Create Dancer-crawler - skip the HTTP generation process
and reuse C<App::Wallflower> for crawling a Dancer website.

* Create tree-structure-importer

Both IMAP folders and file systems are basically trees, which are far easier
to crawl than the cyclic graphs formed by web pages. Abstract the crawling
of a tree into a common module.

* Turn C<index-imap> and C<index-filesystem> into modules so they
become independent of being called from an outside shell.

This also implies they become runnable directly from the web interface
without an intermediate shell.

* Add attachment import to the IMAP crawler
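
The tree-structure importer from the list above could boil down to a walker
that knows nothing about IMAP or file systems, only how to ask a node for
its children. This is a sketch of that idea, not existing code:
C<walk_tree> and its C<children> and C<visit> callbacks are hypothetical
names, and a nested hash stands in for a real mail folder hierarchy or
directory.

    use strict;
    use warnings;

    # Generic depth-first walk. The source-specific parts are callbacks:
    #   children => sub { ... }   returns the child nodes of a node
    #   visit    => sub { ... }   is called once for every node
    sub walk_tree {
        my ($root, %cb) = @_;
        my @todo = ($root);
        while (@todo) {
            my $node = shift @todo;
            $cb{visit}->($node);
            unshift @todo, $cb{children}->($node);
        }
    }

    # Stand-in data; a real crawler would list folders or directories here
    my $tree = {
        name     => 'INBOX',
        children => [
            {
                name     => 'INBOX/work',
                children => [ { name => 'INBOX/work/archive' } ],
            },
            { name => 'INBOX/private' },
        ],
    };

    my @seen;
    walk_tree(
        $tree,
        children => sub { @{ $_[0]{children} || [] } },
        visit    => sub { push @seen, $_[0]{name} },
    );
    print "$_\n" for @seen;

Because a tree has no cycles, the walker needs no bookkeeping of already
visited nodes, which is what makes it simpler than a web crawler.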

=head2 Metasearch

Implement metasearch across multiple ES instances

=cut
