* Come up with a good name
* YODA (from the image in the talk)?
* "Answers you seek?"
* ulimit for crawling processes
* .msi package including Perl, Apache Tika and Elasticsearch
* Debian package
* Simple search
* Autocomplete/recommend
* Search images
Come up with a concept to render different mime types differently.
Ideally, this would avoid the hardcoding we use for audio/mpeg currently.
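One way to avoid per-type hardcoding is a registry that maps MIME types (or whole families such as audio/*) to render functions. The following is a minimal Python sketch of that idea, not the application's Perl code; the function names, the document fields (mime_type, url, title, content) and the HTML snippets are all assumptions made for illustration.

```python
from html import escape

# Registry mapping MIME patterns to render functions (all names hypothetical).
RENDERERS = {}

def renderer(pattern):
    """Register a render function for an exact type or a family like 'audio/*'."""
    def register(func):
        RENDERERS[pattern] = func
        return func
    return register

@renderer("audio/*")
def render_audio(doc):
    return '<audio controls src="{}"></audio>'.format(escape(doc["url"], quote=True))

@renderer("image/*")
def render_image(doc):
    return '<img src="{}" alt="{}">'.format(
        escape(doc["url"], quote=True), escape(doc.get("title", ""), quote=True))

def render_default(doc):
    # Fallback for every type without a dedicated renderer.
    return "<pre>{}</pre>".format(escape(doc.get("content", "")))

def render(doc):
    """Pick the most specific renderer: exact type, then family, then default."""
    mime = doc.get("mime_type", "")
    family = mime.split("/", 1)[0] + "/*"
    func = RENDERERS.get(mime) or RENDERERS.get(family) or render_default
    return func(doc)
```

With this layout, supporting a new type means registering one more function rather than touching the dispatch logic.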
* Auto-session
* Refinement using the last search, if the last search was recent
* Plack-hook/example for /search to tie the search application into arbitrary websites
* Elasticsearch plugin / configuration through YAML
Having different Elasticsearch clusters available (or not) should be recognized and the search results should be combined. For example, a work cluster should be searched in addition to the local cluster, if the work network is available.
This calls for using the asynchronous API not only for searching but also for progressively enhancing the results page as new results become available.
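The combination of optional clusters and progressive results could look roughly like the sketch below. It is written in Python with asyncio purely as an illustration; the backends are hypothetical async callables that take a query and return a list of hits with a "score" field, and an unreachable cluster is assumed to show up as a timeout or connection error.

```python
import asyncio

async def search_all(backends, query, timeout=2.0):
    """Query every backend concurrently; yield (name, hits) as each one answers."""
    async def run(name, search):
        try:
            return name, await asyncio.wait_for(search(query), timeout)
        except (asyncio.TimeoutError, OSError):
            return name, None  # cluster not reachable: skip it silently

    tasks = [asyncio.ensure_future(run(n, s)) for n, s in backends.items()]
    for finished in asyncio.as_completed(tasks):
        name, hits = await finished
        if hits is not None:
            yield name, hits

async def collect(backends, query):
    """Merge results as they arrive; a real page would re-render after each step."""
    merged = []
    async for _name, hits in search_all(backends, query):
        merged.extend(hits)
        merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged
```

Sorting by raw score is a simplification: scores from separate clusters are not necessarily comparable, so a real merge step may need its own ranking rule.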
How can we/Elasticsearch recognize similarity between two documents?
If two documents live in the same directory, the newest one should take precedence and fold the similar documents below it.
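The folding rule can be sketched independently of how similarity is measured. The Python sketch below uses difflib's SequenceMatcher on the document text as a stand-in similarity measure; that choice, the 0.8 threshold and the document fields (path, mtime, content) are assumptions, and the open question of what Elasticsearch itself can offer here stays open.

```python
import os
from difflib import SequenceMatcher

def fold_similar(docs, threshold=0.8):
    """Group similar documents per directory; the newest document leads each group."""
    groups = []  # list of (leader, [older similar documents])
    # Newest first, so the first document that opens a group is its leader.
    for doc in sorted(docs, key=lambda d: d["mtime"], reverse=True):
        directory = os.path.dirname(doc["path"])
        for leader, folded in groups:
            if os.path.dirname(leader["path"]) != directory:
                continue
            ratio = SequenceMatcher(None, leader["content"], doc["content"]).ratio()
            if ratio >= threshold:
                folded.append(doc)
                break
        else:
            groups.append((doc, []))
    return groups
```

Each returned pair holds the document to display and the older, similar documents to fold below it.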
Currently better written in Perl
* Don't rescan/reanalyze elements that already exist in Elasticsearch
* Delete entries that don't exist in the filesystem anymore
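Both items boil down to comparing what is on disk with what is in the index. A minimal Python sketch, assuming both sides can be listed as a mapping from path to modification time (the function name and the mtime comparison are illustrative assumptions, not existing behaviour):

```python
def plan_sync(on_disk, indexed):
    """Return (paths to index, paths to delete).

    on_disk and indexed both map path -> modification time.
    """
    # Index files that are new, or (assumption) changed since they were indexed.
    to_index = sorted(
        path for path, mtime in on_disk.items()
        if path not in indexed or indexed[path] < mtime
    )
    # Remove index entries whose file no longer exists.
    to_delete = sorted(path for path in indexed if path not in on_disk)
    return to_index, to_delete
```

Unchanged files appear in neither list, so they are not rescanned.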
Which module provides interesting video metadata?
* MP3s get imported but could use a nicer body rendering.
* Playback duration should be calculated
* Also import audio lyrics - how could these be linked to their MP3s?
Playlists should get custom rendering (album art etc.)
Playlists should ideally also hotlink their contents
* Consider importing a Wikipedia dump
* Some other larger, mixed corpus, like http://eur-lex.europa.eu/
Find out which one(s) we want:
https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html
From first glance, we might want Simple Expansion, but Genre Expansion also seems interesting.
We want to treat some synonyms as identical though, like 'MMSR' and its German translation 'Geldmarktstatistik'.
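For the "identical" case, simple expansion with a comma-separated synonym rule may already be enough. Below is a hedged sketch of index settings, written as a Python dictionary that could be serialized to JSON; the filter name domain_synonyms and the analyzer name with_synonyms are made up for this example, and the rule is lowercase because the lowercase filter runs first.

```python
import json

# Sketch of index settings using a synonym token filter (names are hypothetical).
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "domain_synonyms": {
                    "type": "synonym",
                    # Comma-separated terms are treated as equivalent (simple expansion).
                    "synonyms": ["mmsr, geldmarktstatistik"],
                }
            },
            "analyzer": {
                "with_synonyms": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, so the rule above matches 'MMSR' as well.
                    "filter": ["lowercase", "domain_synonyms"],
                }
            },
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Genre expansion would instead use directed rules of the form "specific => specific, general", which is a separate decision from the equivalence shown here.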
* Create screencasts using http://www.openshot.org/videos/ or
* Create Dancer-crawler - skip the HTTP generation process and reuse App::Wallflower for crawling a Dancer website.
* Create tree-structure-importer
Both IMAP folders and file systems are basically trees (directed, acyclic graphs) and far easier to crawl than the cyclic graphs of web pages. Abstract out the crawling of a tree into a common module.
* Turn index-imap and index-filesystem into modules so they become independent of being called from an outside shell.
This also implies they become runnable directly from the web interface without an intermediate shell.
* Add attachment import to the imap crawler
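The common tree-crawling module mentioned for the tree-structure-importer could be as small as a walker that takes a callback returning the children of a node. The Python sketch below illustrates the shape of such an abstraction; walk_tree and fs_children are hypothetical names, and an IMAP variant would supply its own callback that lists subfolders and messages.

```python
import os

def walk_tree(root, children):
    """Yield every node below root, depth-first, using a caller-supplied callback."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Reverse so that children are visited in the order the callback returned them.
        stack.extend(reversed(list(children(node))))

def fs_children(path):
    """Children callback for a file system; symlinks are skipped to keep it a tree."""
    if os.path.isdir(path) and not os.path.islink(path):
        return sorted(os.path.join(path, name) for name in os.listdir(path))
    return []
```

Because the structure is a tree, the walker needs no bookkeeping of visited nodes, which is the main simplification over crawling web pages.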
Implement metasearch across multiple ES instances