!init OPT_STYLE="paper"

!define DOC_NAME           "PApp - creating applications for the WorldWideWeb"
!define DOC_TYPE           "[Talk]"
!define DOC_AUTHOR         "(c) 2000 Marc Lehmann <schmorp@schmorp.de>"
!build_title

H1: PApp - what is it?

PApp (which is simply short for "Perl APPlication") is basically a
collection of perl modules directed at seasoned perl programmers that
allows one to create {{large}} stateful applications on top of stateless
protocols (like http or wap). It is possible to implement a
simple-but-complete content management system in a few hundred lines of
papp-code.

H1: About this document/presentation

The presentation I will give at the linuxworldexpo will explain
a lot of technical details not included in this document. The
slides for the linuxworldexpo presentation will be available at
{{URL:http://www.goof.com/pcg/marc/docs.html}} shortly after the
linuxworldexpo.

H1: What was the motivation for creating PApp?

The original motivation for writing PApp came when our company (nethype
GmbH) started to implement a highly interactive website that required at
least limited forms of content management.

Creating large or even medium-sized applications for the Web using CGI
is a tedious task. Things like preserving state (a.k.a. session-tracking
and/or user-tracking) are a major headache, both for the programmer and the
security advisor.

PApp solves these and a lot of other problems we didn't originally envision
by providing a generic and easy API.

H1: Features of PApp / Advantages over other solutions

While PApp could be mistaken for yet another CGI-wrapper, this is not
the case. PApp concentrates on providing an application-centric view,
rather than a web-page-centric one. This means that there can be one web
page per file (as with CGI), but it is also possible to put multiple
pages into a single file (papp pages, called {{modules}}, are often so
short that it makes a lot of sense to group related pages together), or
to distribute them between many files. Of course, the good-old include
mechanism is still there (although not named C<include>).

H2: State management / Persistence

At your option (you want this!), PApp can manage persistent variables for
a single session or user. This means that variables almost automatically
stay persistent for the whole session. The mechanism for marking variables
is very generic: State variables (called {{state keys}} in PApp parlance)
can be marked as session-dependent (the default), user-dependent
(persistent across session borders, also called {{preference items}}),
local to a page or a group of pages, or any mixture thereof. Interesting
planned extensions include things like transactions and
transaction-dependent state keys.

An example of a user-dependent state key is the language the user
selected last. A typical session-dependent variable is the flag whether
the user has authenticated herself. A local variable could be data from a
multi-page transaction.
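
The difference between session-dependent and user-dependent state keys can
be illustrated with a minimal sketch in plain Perl. This is not PApp's API:
the two in-memory stores, the C<%user_scoped> table and the functions
C<set_state> and C<get_state> are hypothetical names chosen for this example
only, standing in for the server-side state database.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the server-side state database.
my %session_store;   # session id => { key => value }
my %user_store;      # user id    => { key => value }

# Hypothetical declaration of which state keys are user-dependent
# ("preference items"); every other key defaults to session scope.
my %user_scoped = (lang => 1);

# Store a state key in the scope it was declared for.
sub set_state {
    my ($session_id, $user_id, $key, $value) = @_;
    if ($user_scoped{$key}) {
        $user_store{$user_id}{$key} = $value;
    } else {
        $session_store{$session_id}{$key} = $value;
    }
    return $value;
}

# Look a state key up in the scope it was declared for.
sub get_state {
    my ($session_id, $user_id, $key) = @_;
    return $user_scoped{$key}
        ? $user_store{$user_id}{$key}
        : $session_store{$session_id}{$key};
}

# Session "s1": user 42 picks a language and logs in.
set_state("s1", 42, lang      => "de");
set_state("s1", 42, logged_in => 1);

# Session "s2" of the same user: the preference survives, the login flag
# does not.
my $lang      = get_state("s2", 42, "lang");
my $logged_in = get_state("s2", 42, "logged_in");
printf "lang=%s logged_in=%s\n",
    $lang, defined $logged_in ? $logged_in : "undef";
```

The point of the sketch is only that the scope is a property of the key, so
the code reading or writing a variable does not need to care where it lives.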

H2: Security

A common bug in cgi scripts is the passing of sensitive data in so-called
{{hidden fields}} of web forms, hidden from the casual user but of
course open to attacks. With PApp this is very unnatural: The state data
never leaves the server; instead, an encrypted (128 bit twofish)
{{cookie}} is used (usually encoded in the URL, not to be confused with the
cookies netscape implements). Compromising the server key gives
access to other sessions (similar to a broken caching proxy), but still
makes it impossible to change the data.
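
The general idea (state stays on the server, the client only ever sees an
opaque token) can be sketched as follows. Note the assumptions: this sketch
protects the token with an HMAC-SHA256 signature from the core module
C<Digest::SHA> rather than with the twofish encryption PApp uses, and the
names C<save_state>, C<load_state>, C<%state_db> and C<$server_key> are
hypothetical and not part of PApp.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $server_key = "demo-key-do-not-hardcode";   # hypothetical server secret
my %state_db;                                   # state id => state data
my $next_id = 1;

# Keep the state on the server and return the opaque token for the URL.
sub save_state {
    my ($state) = @_;
    my $id = $next_id++;
    $state_db{$id} = $state;
    return "$id-" . hmac_sha256_hex($id, $server_key);
}

# Return the state for a token, or nothing if the token is malformed or
# was tampered with. (A production version should compare the signatures
# in constant time.)
sub load_state {
    my ($token) = @_;
    my ($id, $mac) = $token =~ /^(\d+)-([0-9a-f]{64})$/ or return;
    return unless $mac eq hmac_sha256_hex($id, $server_key);
    return $state_db{$id};
}

my $token = save_state({ user => "alice", authorized => 1 });
my $state = load_state($token);

# A client that edits the id part of the token is rejected.
(my $forged = $token) =~ s/^\d+/999/;
my $rejected = load_state($forged);
```

Unlike a hidden form field, the token carries none of the state itself, so
there is nothing in it for the client to read or modify.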

In general, the design of PApp makes security the first priority, not only
by careful design of the network/server protocol but also by providing
easy and standardized methods for common tasks.

H2: Session/User-tracking

Session and user-tracking are done automatically by PApp. The application
can react to session starts if necessary (e.g. by redirecting the user to
a start page when the data on the accessed page is no longer available
or initializing state keys on session start). This is also possible when
new users start a new session, of course. A session is defined by PApp as
a tree rooted at the page that started the session (i.e. one requested
without, or with an invalid, session cookie).

User tracking beyond sessions is currently done using the http-cookie
mechanism. Care has been taken to do this sensibly, however: The session
data from cookies is ignored and a user without (or with disabled)
cookies will not be flooded with cookie requests more than once or twice a
day. This is another example of how PApp can easily adapt to users.

H2: User administration

PApp manages users per-server, not per-application. Applications can
use an access right system similar to the unix user/group mechanism to
administer their users, but can also implement their own system. PApp
identifies every user using a unique user id with optional attributes like
name/password/group and preferences.

H2: I18n

I18n is short for {{Internationalization}}, which, in the context of
PApp, means multiple language support. PApp does this using {{language
tagging}}, string scanning and a generic translation editor. The I18n
model of PApp is more general than the widely-known gettext model
implemented by GNU and Sun, among others. The target language is chosen
using the user's preferred language and protocol-specific data (e.g. the
C<Accept-Language>-http-header).

Every source file can use a different language (if necessary; the
language format allows finer-grained distribution of languages but this
is not yet implemented). PApp can scan for strings in papp-sources,
text/html/xml files and even database fields (e.g. you can declare a
single database table row as English, to be translated, or as mixed
language, to be scanned for language tags). Every application specifies
the destination languages it wants to support. A translation editor (an
example application "delivered" with PApp) can be used by translators to
translate as-of-yet-untranslated messages; updates can be done on the fly.

I18n is as easy as writing C<__"Translate this"> (the tagging syntax
is reminiscent of the widely used C<_("message")>-syntax of C) in your
documents or program.
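
How tagging and target-language selection fit together can be sketched in
plain Perl. This is an illustration under stated assumptions, not PApp's
implementation: the C<%catalog> table, the C<pick_language> helper and this
particular C<__> function are hypothetical, and the header parsing
deliberately ignores quality values (C<q=...>) and simply takes the first
supported language.

```perl
use strict;
use warnings;

# Hypothetical translation table: target language => { source => translation }.
my %catalog = (
    de => { "Hello" => "Hallo", "Update" => "Aktualisieren" },
);

my $lang = "en";   # target language of the current request

# Return the translation of a tagged string, falling back to the source
# text when no translation exists yet.
sub __ {
    my ($text) = @_;
    my $table = $catalog{$lang} or return $text;
    return exists $table->{$text} ? $table->{$text} : $text;
}

# Pick the first language named in an Accept-Language header that the
# application supports; default to the first supported language.
sub pick_language {
    my ($header, @supported) = @_;
    my %ok = map { $_ => 1 } @supported;
    for my $item (split /\s*,\s*/, $header) {
        my ($tag) = $item =~ /^([A-Za-z]+)/ or next;
        return lc $tag if $ok{lc $tag};
    }
    return $supported[0];
}

$lang = pick_language("de-CH, en;q=0.8", "en", "de");
my $greeting = __("Hello");     # translated
my $fallback = __("Goodbye");   # not in the table, source text is kept
print "$greeting / $fallback\n";
```

Falling back to the source string is what lets an application run before
the translators have caught up with every message.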

H2: Unicode / Multi-charset ability

Internally, PApp supports only two datatypes: {{binary}} and
{{text}}. Binary is usually used for images or similar data, while text
should be used for html (or xml or wml...) pages. Relatively recent fixes
to the unicode standard mostly pacified the objections raised by a lot of
cultural groups against this standard, so PApp encodes everything using
unicode.

The internal representation is independent of the output encoding or
even the output character set. You can opt to output C<iso-8859-1> text
or, if the user wants something else, another charset + encoding like
C<iso-2022-jp>. This selection can be based on program choice (i.e.
hardwired) or on the user's preferences or protocol data (e.g.
the C<Accept-Charset>-http-header), similar to the selection of the target
language.
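
The separation between internal text and output encoding can be illustrated
with Perl's core C<Encode> module. This is only a sketch of the principle,
not PApp code; C<render> is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode internal text (a Perl character string) for output.
# render() is a hypothetical helper, not a PApp function.
sub render {
    my ($text, $charset) = @_;
    return encode($charset, $text);
}

my $cafe  = "caf\x{e9}";          # "cafe" with an acute accent, 4 characters
my $nihon = "\x{65e5}\x{672c}";   # "Japan" written in kanji, 2 characters

my $latin1 = render($cafe,  "iso-8859-1");    # one byte per character
my $utf8   = render($cafe,  "UTF-8");         # the accented letter takes two bytes
my $jis    = render($nihon, "iso-2022-jp");   # 7-bit bytes plus escape sequences

printf "%d characters, %d bytes as iso-8859-1, %d bytes as UTF-8\n",
    length($cafe), length($latin1), length($utf8);
```

The same character string yields different byte sequences depending on the
charset chosen at output time, which is why the choice can be deferred until
the user's preferences or the request headers are known.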

H2: Speed

Speed was a major concern when designing the API. Although implemented to
a large part in perl, PApp aims at providing a speed similar to that of
bare in-server scripts (like C<Apache::Registry>-based scripts). At least
for non-trivial ("real-world") pages, PApp pages should be very similar in
performance, but of course cannot beat C<Apache::Registry>. However, features
like I18n come at basically no cost, both with respect to programming
time and to the runtime, often leading to correctly tagged applications
even when multi-language support was not originally a goal of the
application ("because it's so easy to do").

H2: Scalability

PApp applications easily scale to many servers if a single server cannot
sustain the load. The only limitation is that there must be a single
database-server managing the state keys, which usually accounts for only
a very small share of the processing power a PApp application consumes.

H2: XML

PApp support for XML is two-fold: First, PApp uses XML for its own
papp-source-format specifying basic layout of an application. The
decision to use XML was not easy, as XML can be regarded as a format
designed for machines (but still decipherable and writable by humans, if
necessary), while humans are generally better served by SGML.

However, XML is used not only for internal source files, but is fully
supported as a text format. Together with its evil brother {{PXML}},
which is basically xml (text) with embedded perl code (or vice versa), it
allows PApp to apply stylesheets (XSLT) to webpages either at compilation
time or at runtime. PApp applications can be written fully in XML, and
only the output stylesheet decides whether to actually output HTML, WML
(for mobile phones) or XML (for XML-capable-browsers, if sensible).

As an added gimmick, PApp can dynamically fetch XML or PXML data (or
code!) from other sources at runtime. Content management systems usually
want to store pages (and/or layout) inside a database. An example of
how to implement this even fits into the manpage for the
C<PApp::XML>-module.

H2: Protocol- & Layout-independence

Together with stylesheets, PApp applications can be written with only
minimal dependencies on layout or target protocol. It is possible to write
applications that only provide "modules", and only the final stylesheet
decides how they are rendered. Since PApp supports this implicitly, it
enables layout- and protocol-independent applications.

Together with the I18n model, this enables the almost complete separation
of translation/programming/design and environment. I18n comes at almost
no cost, layout separation of course requires thought on the
programmer's side, and protocol independence requires translation
stylesheets to translate internal xml representations into the target
"language".

H2: Platform-independence

Although the main target of PApp is Apache/mod_perl, an interface
based on CGI (or similar mechanisms) is possible (yet slower). PApp
applications are indifferent to the environment (i.e. it is easy to write
an application that runs both under CGI and inside apache).

H2: Database support

Serious web applications without database support are, of course,
impossible. Therefore PApp provides a lot of conveniences for
applications: Each application (and sub-application) can define a default
database connection which is persistent, cached, and checked (like all
database connections in PApp). SQL support is compatible with the underlying
DBI interface, but PApp programmers rarely have to resort to that
API. PApp automatically caches prepared SQL statements (allowing the query
optimiser to work once). Since a code-excerpt is better than a thousand
marketing words, here is an actual example:

!block perl
<:
   # run the query; each fetched row is bound to $id and $name
   my $st = sql_exec \my($id, $name),
                     "select id, name from user where name like ?",
                     $S{name};
:>
<table><tr><th>ID</th><th>Name</th></tr>
<:
   # emit one table row per result row
   while ($st->fetch) {
      ?><tr><td>$id</td><td>$name</td></tr><:
   }
:>
</table>
!endblock

This displays the id/name-pairs of all users whose name matches the
pattern in C<$S{name}>, using a nicely-formatted html-table.

PApp currently uses MySQL for internal state management (MySQL is very
fast for the task at hand). Applications are free to use any database they
want, of course.

H2: Persistent helper objects

As mentioned earlier, database connections are persistent. PApp also
provides a number of "unusual" persistent objects. For example, it is
possible to tell PApp that a given callback needs to be called after a
specific URL has been clicked, independent of the target page (i.e. the
target page has to know nothing about who called it). Another helper
object is the persistent SQL row object, which maps SQL rows into perl
hashes. The following code excerpt (using the {{editform}} library which
is part of PApp, and fully language-tagged) displays the id and name
fields of a table in a freely-editable (HTML-) form. Updates to the
database are done automatically; thus, the example is complete:

!block perl
<:
   my $row = new PApp::DataRef 'DB_row',
                               table => "user",
                               where => [id => $userid];

   # pre-set name
   $row->{name} ||= "<username>";
                               
   ef_begin;
   :><br>__"ID:"   <?ef_string \$row->{id}  ,  5:><:
   :><br>__"Name:" <?ef_string \$row->{name}, 20:><:
   ef_submit __"Update";
   ef_end;
:>
!endblock

H2: Debuggability

Since PApp is written by its main users (or, conversely, the main users
of PApp also develop it), debugging support is relatively strong. PApp
usually is able to deliver a complete backtrace of the program, including
"interesting objects", when a fatal error occurs and features a powerful
exception mechanism to gather information from all stages (low- to
high-level) of a request. It is possible to store the backtrace and error
information into a database, providing the user with a nice error message
while mailing the administrator about the incident. The information saved
by PApp allows the programmer to precisely reproduce the error situation
(as far as possible), but usually the URL suffices to uniquely identify
the precise state of the session.

Schemes where the user could add comments to such a "coredump" or even
browse the data structures interactively (given correct access rights)
should be possible, but are not implemented so far (this would be
implemented using a specialized PApp application tentatively called the
"error browser").

H2: "Web-Widgets"

The newest addition to PApp (and therefore not final in its
implementation) are reusable components also dubbed {{Web-Widgets}},
similar to reusable GUI-objects named "widgets". An example of such a
component would be a standard "forum" widget. In one of our projects we
use the same forum widget to provide "web chat", "small ads" and the
"news!" page, all with the same code but using different stylesheets to
customize the layout.

A PApp-application is, in some sense, a large state-machine. This is a
limitation of stateless protocols which is hard, but not impossible, to
circumvent (if perl only had efficient continuations). Any application can
be embedded into other applications, while retaining its own state, its
own state machine and of course its own set of variables / state
keys. Standard PApp applications are not much more than a single page with
html header/footer, authentication check and an embedded "Web-widget".
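
The state-machine view can be made concrete with a small sketch in plain
Perl. It is an illustration only and does not use PApp's module syntax: the
page table C<%pages>, its C<next>/C<render> fields and the function
C<follow_link> are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical page table: every page names the pages reachable from it
# and knows how to render itself from the session state.
my %pages = (
    start  => { next => ["list"],            render => sub { "Welcome" } },
    list   => { next => ["detail", "start"], render => sub { "Items" } },
    detail => { next => ["list"],
                render => sub { "Item $_[0]{item}" } },
);

# Follow a link: move the session to the target page if the current page
# links to it, then render the target page.
sub follow_link {
    my ($state, $target) = @_;
    my $allowed = grep { $_ eq $target } @{ $pages{ $state->{page} }{next} };
    die "no link from $state->{page} to $target\n" unless $allowed;
    $state->{page} = $target;
    return $pages{$target}{render}->($state);
}

my $state  = { page => "start", item => 7 };
my $first  = follow_link($state, "list");      # "Items"
my $second = follow_link($state, "detail");    # "Item 7"

# "detail" does not link to "start", so this transition is refused and
# the session stays on "detail".
my $ok = eval { follow_link($state, "start"); 1 };
```

Because each request is just one transition on a state hash, an embedded
application can keep a state hash and page table of its own without the
enclosing application having to know about them.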

H2: Logging

Logging is a required part of any serious business. Gathering statistical
data is a basic requirement today. PApp saves every state key to a
database for each page impression/request (usually between 300 and 900
bytes). This saved information contains everything needed to recreate a
given page except the actual program code. When session/state data is
being expired (necessary to put a sensible bound on the size of the state
database), PApp conveniently allows applications to gather statistical
data from individual "hits", using almost the same environment as at
runtime. Expiring and data gathering can, of course, be done separately if
necessary.
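
Combining expiry with data gathering can be sketched as follows. This is a
plain-Perl illustration under stated assumptions, not PApp's interface: the
C<@hits> array stands in for the state database, and C<expire_hits> and its
callback argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical state database: one record per page impression.
my @hits = (
    { id => 1, time => 1000, page => "start" },
    { id => 2, time => 2000, page => "forum" },
    { id => 3, time => 9000, page => "forum" },
);

# Remove all hits older than $cutoff, handing each one to the $gather
# callback before it is dropped. Returns the number of expired hits.
sub expire_hits {
    my ($cutoff, $gather) = @_;
    my @keep;
    for my $hit (@hits) {
        if ($hit->{time} < $cutoff) {
            $gather->($hit);
        } else {
            push @keep, $hit;
        }
    }
    my $expired = @hits - @keep;
    @hits = @keep;
    return $expired;
}

# Count page views per page while expiring everything before time 5000.
my %views;
my $expired = expire_hits(5000, sub { $views{ $_[0]{page} }++ });
```

Gathering statistics at expiry time means each hit is visited exactly once,
at the last moment its full state is still available.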

H1: Disadvantages

PApp is not actually a revolution, judged by its components. It
does, however, draw a lot of functionality and ideas into a single,
well-contained package. Nevertheless, there are quite a few reasons
{{NOT}} to use PApp, or at least not to use PApp {{YET}}.

* PApp is not yet a released module, its API is in flux (with respect to
  recent features), and not everything is working as it should yet. This
  is fortunately only a question of time.

* Only a single person is currently developing and designing PApp for
  free. This means that development might not always go in the
  direction you want, nor on the schedule you want.

* PApp requires the very latest perl - at the moment, this is perl
  5.7 + custom patches, due to the buggy unicode support in earlier
  versions. The latest released PApp module (on CPAN) does not rely on
  unicode, but is already quite outdated with respect to current features.

* There is a total lack of tutorials or introductory courses. While all
  PApp modules are documented, it is all but impossible to learn it
  using the reference documentation alone. Basically a one-day,
  one-on-one course is necessary to get you up and running - PApp is
  not trivial. The lost time is generally made up with the increase in
  productivity experienced with PApp, though, similar to learning Perl ;)

A1: References

* PApp is available as a standard module from CPAN, although the current
  version uploaded is very old.

* The PApp homepage is available at the author's homepage, under
  {{URL:http://www.goof.com/pcg/marc/papp.html}}. This page includes
  links to the current manpages. Online demos are, unfortunately, not yet
  available.

* The newest versions of PApp can be accessed using CVS as a sourceforge project,
  at {{URL:http://www.sourceforge.net}}.

* The slides for this presentation will be available at the author's
  homepage, at {{URL:http://www.goof.com/pcg/marc/docs.html}}, shortly
  after the linuxworldexpo.

A2: Apology

I'd like to apologize for any typos or other mistakes in this
document. It was written in a single session without access to a
spellchecker and so has not been debugged yet ;)