NAME

PSGI::FAQ - Frequently Asked Questions and answers

QUESTIONS

General

How do you pronounce PSGI?

We read it simply P-S-G-I.

So what is this?

PSGI is an interface between web servers and perl-based web applications akin to what CGI does for web servers and CGI scripts.

Why do we need this?

Perl has the CGI module (CGI.pm, long shipped with the Perl core), which somewhat abstracts the differences between CGI, mod_perl and FastCGI. However, most web application framework developers (e.g. Catalyst and Jifty) usually avoid using it, both to maximize performance and to access low-level APIs. So they end up writing adapters for all of those different environments, some of which may be well tested while others are not.

PSGI allows web application framework developers to only write an adapter for PSGI. End users can choose from among all the backends that support the PSGI interface.

You said PSGI is similar to CGI. How is the PSGI interface different from CGI?

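The PSGI interface is intentionally designed to be very similar to CGI so that supporting PSGI in addition to CGI would be extremely easy. The key differences between CGI and PSGI are these: a CGI script can be written in any language and runs as a separate program, whereas a PSGI application is a Perl code reference called by a Perl-capable server; CGI passes request data through environment variables and STDIN and reports errors on STDERR, whereas PSGI passes an $env hash reference that carries the psgi.input and psgi.errors streams; and a CGI script prints its headers and body to STDOUT, whereas a PSGI application returns the status code, headers and body as an array reference.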
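As a rough illustration of the CGI/PSGI contrast, here is a minimal sketch of a PSGI application that is called by hand, the way a server would call it. The %fake_env hash is a made-up, heavily abbreviated stand-in for what a real server supplies; it is nowhere near a complete PSGI environment.

```perl
use strict;
use warnings;

# A PSGI application is a code reference: it takes the environment
# hash ref and returns [ status, headers, body ] instead of printing
# headers and body to STDOUT as a CGI script would.
my $app = sub {
    my $env  = shift;
    my $path = $env->{PATH_INFO};    # a CGI script would read $ENV{PATH_INFO}
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from $path\n" ],
    ];
};

# Hypothetical, abbreviated environment for illustration only.
my %fake_env = (
    REQUEST_METHOD => 'GET',
    PATH_INFO      => '/hello',
);

my $res = $app->(\%fake_env);
my ($status, $headers, $body) = @$res;
print "$status\n", join('', @$body);
```

The same code reference, returned as the last expression of a .psgi file, is what a PSGI server would load and run.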

My framework already does CGI, FCGI and mod_perl. Why do I want to support PSGI?

There are many benefits for the web application framework to support PSGI.

I'm writing a web application. What's the benefit of PSGI for me?

If the framework you're using supports PSGI, your application can run on any existing or future PSGI implementation. You can provide a .psgi file that returns a PSGI application, and the end users of your application should then be able to configure and run it in a bunch of different ways.

But I'm writing a web application in CGI and it works well. Should I switch to PSGI?

If you're writing a web application with plain CGI.pm and without using any web framework, you're limiting your application to plain CGI environments, along with mod_perl and FastCGI with some tweaks. If you're the only developer and user of your application then that's probably fine.

One day you may want to deploy your application in a shared hosting environment for your clients, or run your server in standalone mode rather than as a CGI script, or distribute your application as open source software. Limiting your application to the CGI environment by using CGI.pm will bite you then.

To make your application PSGI aware and more future proof, you can start using one of the PSGI-compatible frameworks (either full-stack ones or micro ones), or use Plack::Request if you are against frameworks.

Even if you ignore PSGI today and write applications in plain CGI, you can always later switch to PSGI with the CGI::PSGI wrapper.

What should I do to support PSGI?

If you're a web server developer, write a PSGI implementation that calls a PSGI application. Also join the development on Plack, the PSGI toolkit and utilities, to add a server adapter for your web server.

If you're a web application framework developer, write an adapter for PSGI. Now you're freed from supporting all different server environments.

If you're a web application developer (or a web application framework user), choose a framework that supports PSGI, or ask the author to support it. :) If your application is a large-scale installable application that doesn't use any existing framework (e.g. WebGUI or Movable Type), you count as a framework developer from the PSGI point of view. So writing a PSGI adapter for your application would make more sense.

Is PSGI faster than (my framework)?

Again, PSGI is not an implementation, but there is potential for very fast PSGI implementations: a preforked standalone server that preloads everything and runs fully optimized code with XS parsers; a tiny event-based web server written in C with embedded perl that supports PSGI; or a plain old CGI.pm based backend that doesn't load any modules at all and runs pretty quickly without eating much memory under the CGI environment.

There are prefork web server implementations such as Starman and Starlet, as well as fully asynchronous event based implementations such as Twiggy, Corona or Feersum. They're pretty fast and they include adapters for Plack so you can run with the plackup utility.

Users of your framework can choose which backend is the best for their needs. You, as a web application framework developer, don't need to think about lots of different users with different needs.

Plack

What is Plack? What is the difference between PSGI and Plack?

PSGI is a specification, so there's no software or module called PSGI. End users will need to choose one of the PSGI server implementations to run PSGI applications on. Plack is a set of PSGI utilities and contains the reference PSGI server HTTP::Server::PSGI, as well as Web server adapters for CGI, FastCGI and mod_perl.

Plack also has useful APIs and helpers on top of PSGI, such as Plack::Request, which provides a nice object-oriented API on request objects; plackup, which allows you to run a PSGI application from the command line and configure it using app.psgi (a la Rack's Rackup); and Plack::Test, which allows you to test your application using a standard HTTP::Request and HTTP::Response pair through mocked HTTP or live HTTP servers. See Plack for details.

What kind of server backends would be available?

In Plack, we already support most web servers, such as Apache2 and those that support standard CGI or FastCGI, and we also try to support special web servers that can embed perl, like Perlbal or nginx. We think it would be really nice if the Apache module mod_perlite and Google AppEngine supported PSGI too, so that you could run your PSGI/Plack based perl app in the cloud.

Ruby is Rack and JavaScript is Jack. Why is it not called Pack?

Well Pack indeed is a cute name, but Perl has a built-in function pack so it's a little confusing, especially when speaking instead of writing.

What namespaces should I use to implement PSGI support?

Do not use the PSGI:: namespace to implement PSGI backends or adapters.

The PSGI namespace is reserved for PSGI specifications and reference unit tests that implementors have to pass. It should not be used by particular implementations.

If you write a plugin or an extension to support PSGI for an (imaginary) web application framework called Camper, give the code a name such as Camper::Engine::PSGI.

If you write a web server that supports the PSGI interface, then name it however you want. You can optionally support Plack::Handler's abstract interface or write an adapter for it, which is:

  my $server = Plack::Handler::FooBar->new(%opt);
  $server->run($app);

By supporting these new and run methods in your server, it becomes plackup compatible, so users can run your app via plackup. Following this API is recommended but not required; if you don't follow it, you have to provide your own PSGI app launcher.
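To make the shape of that new/run interface concrete, here is a toy sketch. My::Handler::Toy is a hypothetical class invented for this example: it serves one canned request instead of listening on a socket, and its environment hash is far from complete. A handler meant for real use would live under its own name and build a full PSGI environment per request.

```perl
use strict;
use warnings;

# Hypothetical handler for illustration only.
package My::Handler::Toy;

# new takes the server options as a flat list of key/value pairs.
sub new {
    my ($class, %opt) = @_;
    my $self = { %opt };
    return bless $self, $class;
}

# run takes the PSGI application and serves requests with it.
sub run {
    my ($self, $app) = @_;
    # Serve a single canned request instead of accepting connections.
    my $env = { REQUEST_METHOD => 'GET', PATH_INFO => '/' };
    my $res = $app->($env);
    $self->{last_response} = $res;
    return $res;
}

package main;

my $app    = sub { [ 200, [ 'Content-Type' => 'text/plain' ], [ "ok\n" ] ] };
my $server = My::Handler::Toy->new(port => 5000);
my $res    = $server->run($app);
print "status: $res->[0]\n";
```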

I have a CGI or mod_perl application that I want to run on PSGI/Plack. What should I do?

You have several choices:

CGI::PSGI

If you have a web application (or framework) that uses CGI.pm to handle query parameters, CGI::PSGI can help you migrate to PSGI. You'll need to change how you create CGI objects and how you return the response headers and body, but the rest of your code will work unchanged.

CGI::Emulate::PSGI and CGI::Compile

If you have a dead old CGI script that you want to change as little as possible (or not at all), then CGI::Emulate::PSGI and CGI::Compile can compile it and wrap it up as a PSGI application.

Compared to CGI::PSGI, this might be less efficient because of STDIN/STDOUT capturing and environment variable mangling, but should work with any CGI implementation, not just CGI.pm, and CGI::Compile does the job of compiling a CGI script into a code reference just like mod_perl's Registry does.

Plack::Request and Plack::Response

If you have an HTTP::Engine based application (framework), or want to write an app from scratch and need a better interface than CGI, or you're used to Apache::Request, then Plack::Request and Plack::Response might be what you want. They give you a nice Request/Response object API on top of the PSGI env hash and response array.

NOTE: Don't forget that whenever you have a CGI script that runs once and exits, and you turn it into a persistent process, it may have cleanup that needs to happen after every request -- variables that need to be reset, files that need to be closed or deleted, etc. PSGI can do nothing about that (you have to fix it) except give you this friendly reminder.
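The kind of leftover state the note warns about can be sketched as follows. Both applications and the simulated requests are made up for illustration; the point is only that a file-scoped variable, harmless in a run-once CGI script, keeps its contents between requests in a persistent process.

```perl
use strict;
use warnings;

# File-scoped state: it survives from one request to the next.
my @messages;

my $leaky_app = sub {
    my $env = shift;
    push @messages, "visited $env->{PATH_INFO}";
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ scalar @messages ] ];
};

my $clean_app = sub {
    my $env = shift;
    # Per-request state: created afresh on every call.
    my @local_messages = ("visited $env->{PATH_INFO}");
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ scalar @local_messages ] ];
};

# Simulate two requests against each application and collect the bodies.
my @leaky = map { $leaky_app->({ PATH_INFO => '/' })->[2][0] } 1 .. 2;
my @clean = map { $clean_app->({ PATH_INFO => '/' })->[2][0] } 1 .. 2;
print "leaky: @leaky\nclean: @clean\n";
```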

HTTP::Engine

Why PSGI/Plack instead of HTTP::Engine?

HTTP::Engine was a great experiment, but it mixed the application interface (the request_handler interface) with implementations, and the monolithic class hierarchy and role based interfaces make it really hard to write a new backend. We kept the existing HTTP::Engine and broke it into three parts: The interface specification (PSGI), Reference server implementations (Plack::Handler) and Standard APIs and Tools (Plack).

Will HTTP::Engine be dead?

It won't be dead. HTTP::Engine will stay as it is and still be useful if you want to write a micro webserver application rather than a framework.

Do I have to rewrite my HTTP::Engine application to follow PSGI interface?

No, you don't need to rewrite your existing HTTP::Engine application. It can be easily turned into a PSGI application using HTTP::Engine::Interface::PSGI.

Alternatively, you can use Plack::Request and Plack::Response, which give you APIs compatible with HTTP::Engine::Request and HTTP::Engine::Response:

  use Plack::Request;
  use Plack::Response;

  sub request_handler {
      my $req = Plack::Request->new(shift);
      my $res = Plack::Response->new;
      # ...
      return $res->finalize;
  }

And this request_handler is a PSGI application now.

API Design

Keep in mind that most design choices made in the PSGI spec are to minimize the requirements on backends so they can optimize things. Adding a fancy interface or allowing flexibility in the PSGI layers might sound catchy to end users, but it would just add things that backends have to support, which would end up getting in the way of optimizations, or introducing more bugs. Providing a fancy API that attracts web application developers is the job of your framework, not of PSGI.

Why a big env hash instead of objects with APIs?

The simplicity of the interface is the key that made WSGI and Rack successful. PSGI is a low-level interface between backends and web application framework developers. If we define an API on what type of objects should be passed and which method they need to implement, there will be so much duplicated code in the backends, some of which may be buggy.

For instance, PSGI defines $env->{SERVER_NAME} as a string. What if the PSGI spec required it to be an instance of Net::IP? Backend code would have to depend on the Net::IP module, or have to write a mock object that implements ALL of Net::IP's methods. Backends depending on specific modules or having to reinvent lots of stuff is considered harmful and that's why the interface is as minimal as possible.

Making a nice API for the end users is a job that web application frameworks (adapter developers) should do, not something PSGI needs to define.

Why is the application a code ref rather than an object with a ->call method?

Requiring an object in addition to a code ref would make EVERY backend's code a few lines more tedious, while requiring an object instead of a code ref would make application developers write another class and instantiate an object.

In other words, yes, an object with a call method could work, but again PSGI was designed to be as simple as possible. Making a code reference out of a class or object is a no-brainer, but the other way round always requires a few lines of code and possibly a new file.
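Here is a small sketch of the object-to-code-ref direction. My::App is a hypothetical class made up for this example; the wrapping itself is the one-line closure near the end.

```perl
use strict;
use warnings;

# Hypothetical application class with a call method.
package My::App;

sub new { my $class = shift; return bless {}, $class }

sub call {
    my ($self, $env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "from object\n" ] ];
}

package main;

# Object to code ref: a one-line closure is all it takes.
my $obj = My::App->new;
my $app = sub { $obj->call(@_) };

my $res = $app->({ REQUEST_METHOD => 'GET' });
print $res->[2][0];
```

Going the other way, from a bare code reference to an object with a call method, needs at least a small class like My::App above.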

Why are the headers returned as an array ref and not a hash ref?

Short: In order to support multiple headers (e.g. Set-Cookie).

Long: In Python WSGI, the response header is a list of (header_name, header_value) tuples i.e. type(response_headers) is ListType so there can be multiple entries for the same header key. In Rack and JSGI, a header value is a String consisting of lines separated by "\n".

We liked Python's specification here, and since Perl hashes don't allow multiple entries with the same key (unless it's tied), using an array reference to store [ key => value, key => value ] is the simplest solution to keep both framework adapters and backends simple. Other options, like allowing an array ref in addition to a plain scalar, make either side of the code unnecessarily tedious.
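A short sketch of why the flat array reference matters: the header values below are invented for illustration, and the loop only mimics what a server might do when writing out the response.

```perl
use strict;
use warnings;

# Flat list of key/value pairs: both Set-Cookie headers survive.
my $headers = [
    'Content-Type' => 'text/plain',
    'Set-Cookie'   => 'a=1',
    'Set-Cookie'   => 'b=2',
];

# Walk the pairs two at a time, as a server might when emitting headers.
my @lines;
for (my $i = 0; $i < @$headers; $i += 2) {
    push @lines, "$headers->[$i]: $headers->[$i + 1]";
}
print "$_\n" for @lines;

# The same data forced into a plain hash keeps only the last Set-Cookie.
my %as_hash = @$headers;
print scalar(keys %as_hash), " keys in the hash\n";
```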

I want to send Unicode content in the HTTP response. How can I do so?

PSGI mocks wire protocols like CGI, and the interface doesn't care too much about character encodings and string semantics. That means all the data in PSGI environment values, the content body etc. is sent as byte strings, and it is the application's responsibility to properly decode or encode characters so that they can be sent over HTTP.

If you have a decoded string in your application and want to send it in UTF-8 as an HTTP body, you should use the Encode module to encode it to UTF-8. Note that if you use one of the PSGI-supporting frameworks, chances are that it allows you to set Unicode text in the response body and does the encoding for you. Check the documentation of your framework to see if that's the case.

This design decision was made so it gives more flexibility to PSGI applications and frameworks, without putting complicated work into PSGI web servers and interface specification itself.
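A minimal sketch of doing the encoding yourself with the Encode module. The sample text and the charset parameter are choices made for this example; the empty environment passed at the end is only there to call the application by hand.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $app = sub {
    my $env  = shift;
    my $text = "caf\x{e9} \x{263a}\n";      # decoded (character) string
    my $body = encode('UTF-8', $text);      # byte string for the wire
    return [
        200,
        [
            'Content-Type'   => 'text/plain; charset=utf-8',
            'Content-Length' => length($body),    # counted in bytes, not characters
        ],
        [ $body ],
    ];
};

# Call the application by hand with an empty stand-in environment.
my $res = $app->({});
```

Note that Content-Length is computed after encoding, because the encoded byte string is longer than the character string.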

No iterators support in $body?

We learned that WSGI and Rack really benefit from language features of Python and Ruby, namely iterable objects in Python and iterators in Ruby.

Rack, for instance, expects the body as an object that responds to the each method and then yields the buffer, so

  body.each { |buf| request.write(buf) }

would just magically work whether body is an Array, a FileIO object or an object that implements iterators. Perl doesn't have such a beautiful thing in the language unless autobox is loaded. PSGI should not make autobox a requirement, so we only support a simple array ref or file handle.

Writing an IO::Handle-like object is pretty easy since it's only getline and close. You can also use PerlIO to write an object that behaves like a filehandle, though it might be considered a little unstable.

See also IO::Handle::Util to turn anything iterator-like into something IO::Handle-like.
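Here is a sketch of such a hand-written body object. My::CountdownBody is a hypothetical class invented for this example, and the loop at the end only approximates what a server does with the body.

```perl
use strict;
use warnings;

# Hypothetical body class: anything with getline and close will do.
package My::CountdownBody;

sub new {
    my ($class, $from) = @_;
    return bless { n => $from, closed => 0 }, $class;
}

# Return the next chunk, or undef when there is nothing left.
sub getline {
    my $self = shift;
    return undef if $self->{n} <= 0;
    my $n = $self->{n}--;
    return "$n\n";
}

sub close { $_[0]{closed} = 1 }

package main;

my $app = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], My::CountdownBody->new(3) ];
};

# Roughly what a server does with such a body.
my $body = $app->({})->[2];
my $out  = '';
while (defined(my $chunk = $body->getline)) {
    $out .= $chunk;
}
$body->close;
print $out;
```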

How should a server decide when to switch to sendfile(2) based serving?

First of all, an application SHOULD always set either an array of chunks or an IO::Handle-like object that responds to getline and close as a body. That is guaranteed to work with any server.

Optionally, if the server is written in perl or can pass a file descriptor number to C code to serve the file, then the server MAY check if the body is a real filehandle (possibly using Plack::Util's is_real_fh function), then get a file descriptor with fileno and call sendfile(2) or an equivalent zero-copy data transfer using that.

Otherwise, if the server can't send a file using the file descriptor but needs a local file path (like mod_perl or nginx), the application can return an IO::Handle-like object that also responds to the path method. This type of IO-like object can easily be created using IO::File::WithPath, IO::Handle::Util or Plack::Util's set_io_path function.

Middleware can also check whether the body has a path method and do something interesting with it, like setting X-Sendfile headers.

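To summarize: an application serving a static file should always return a real filehandle or an IO::Handle-like object, which works everywhere and can be optimized by servers that support sendfile(2). An application can additionally return an IO::Handle-like object with a path method, which also works everywhere and can be optimized in even more environments.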
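The server-side decision can be sketched roughly as follows. body_strategy and My::PathBody are hypothetical names made up for this example, the file path is invented, and the order of the checks is one possible choice: a real server would pick whichever optimizations it can actually perform.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use File::Temp qw(tempfile);

# Hypothetical helper: decide how a server could send a response body.
sub body_strategy {
    my $body = shift;
    return 'array' if ref $body eq 'ARRAY';
    # An object advertising a local file path (useful for X-Sendfile).
    return 'path' if blessed($body) && $body->can('path');
    # A real filehandle has a non-negative file descriptor number.
    my $fd = eval { fileno $body };
    return 'fd' if defined $fd && $fd >= 0;
    # Portable fallback: loop over getline, then call close.
    return 'getline';
}

# Hypothetical body class exposing a path method.
sub My::PathBody::path { $_[0]{path} }

my ($real_fh) = tempfile(UNLINK => 1);    # handle backed by a real file
my $in_memory = 'data';
open my $mem_fh, '<', \$in_memory or die "open: $!";    # no real descriptor

my @results = map { body_strategy($_) } (
    [ "chunk" ],
    bless({ path => '/hypothetical/file.txt' }, 'My::PathBody'),
    $real_fh,
    $mem_fh,
);
print "@results\n";
```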

What if I want to stream content or do a long-poll Comet?

The most straightforward way to implement server push is for your application to return an IO::Handle-like object as a content body that implements getline to return pushed content. This is guaranteed to work everywhere, but it's more like pull than push, and it's hard to do non-blocking I/O unless you use Coro.

If you want to do real server push, where your application runs in an event loop and pushes the content body to the client as it becomes ready, you should return a callback to delay the response.

  # long-poll comet like a chat application
  my $app = sub {
      my $env = shift;
      unless ($env->{'psgi.streaming'}) {
          die "This application needs psgi.streaming support";
      }
      return sub {
          my $respond = shift;
          wait_for_new_message(sub {
              my $message = shift;
              my $body = [ $message->to_json ];
              $respond->([200, ['Content-Type', 'application/json'], $body]);
          });
      };
  };

wait_for_new_message can be blocking or non-blocking: it's up to you. In most cases you'll want to run it in a non-blocking way and should use an event loop like AnyEvent. You may also check the psgi.nonblocking value to see whether that's possible, and fall back to a blocking call otherwise.

Also, to stream the content body (like streaming messages over the Flash socket or multipart XMLHttpRequest):

  my $app = sub {
      my $env = shift;
      unless ($env->{'psgi.streaming'}) {
          die "This application needs psgi.streaming support";
      }
      return sub {
          my $respond = shift;
          my $writer = $respond->([200, ['Content-Type', 'text/plain']]);
          wait_for_new_message(sub {
              my $message = shift;
              if ($message) {
                  $writer->write($message->to_json);
              } else {
                  $writer->close;
              }
          });
      };
  };

Which framework should I use to do streaming though?

We have servers that support non-blocking operation (where psgi.nonblocking is set to true), but the problem is that the framework side doesn't necessarily support an asynchronous event loop. For instance, Catalyst has a write method on the response object:

  while ($cond) {
      $c->res->write($some_stuff);
  }

This should work with all servers with psgi.streaming support even if they are blocking, and it should be fine if they're running in multiple processes (psgi.multiprocess is true).

Catalyst::Engine::PSGI also supports setting an IO::Handle-like object that supports getline, so using IO::Handle::Util you can write:

  use IO::Handle::Util qw(io_from_getline);

  # the callback returns the next chunk, or undef when done
  my $io = io_from_getline sub {
       return $data;
  };
  $c->res->body($io);

And that works fine to do streaming, but it's blocking (pull) rather than asynchronous server push, so again you should be careful not to run this application on non-blocking (and non-multiprocess) server environments.

We expect that more web frameworks focused on asynchronous and non-blocking streaming interfaces will appear, or that existing frameworks will add support for them.

Is psgi.streaming interface a requirement for the servers?

It is specified as SHOULD, so unless there is a strong reason not to implement the interface, all servers are encouraged to implement this interface.

However, if you implement a PSGI server using a Perl XS interface for the ultimate performance or integration with web servers like Apache or nginx, or implement a sandbox-like environment (like Google AppEngine or Heroku) or a distributed platform using tools like Gearman, you might not want to implement this interface.

That's fine, and in that case applications relying on the streaming interface can still use Plack::Middleware::BufferedStreaming to fall back to buffered writes on unsupported servers.
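The idea behind such a buffering fallback can be sketched in plain Perl. This is not the actual Plack middleware: buffer_streaming and My::BufferWriter are hypothetical names for this example, and the sketch assumes the application finishes writing before its callback returns, which only holds for blocking applications.

```perl
use strict;
use warnings;

# Hypothetical wrapper: turn a delayed or streaming response into a
# plain [ status, headers, body ] array reference.
sub buffer_streaming {
    my $app = shift;
    return sub {
        my $env = shift;
        my $res = $app->($env);
        return $res unless ref $res eq 'CODE';    # already a plain response

        my $final;
        $res->(sub {
            my $partial = shift;
            # Three elements: the complete response arrived at once.
            return $final = $partial if @$partial == 3;
            # Two elements: hand back a writer that collects the chunks.
            my @chunks;
            $final = [ @$partial, \@chunks ];
            return My::BufferWriter->new(\@chunks);
        });
        return $final;
    };
}

package My::BufferWriter;
sub new   { my ($class, $chunks) = @_; return bless { chunks => $chunks }, $class }
sub write { push @{ $_[0]{chunks} }, $_[1] }
sub close { }

package main;

my $streaming_app = sub {
    return sub {
        my $respond = shift;
        my $writer  = $respond->([ 200, [ 'Content-Type' => 'text/plain' ] ]);
        $writer->write("one\n");
        $writer->write("two\n");
        $writer->close;
    };
};

my $res = buffer_streaming($streaming_app)->({});
print @{ $res->[2] };
```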

Why CGI-style environment variables instead of HTTP headers as a hash?

Most existing web application frameworks already have code or a handler to run under the CGI environment. Using CGI-style hash keys instead of HTTP headers makes it trivial for the framework developers to implement an adapter to support PSGI. For instance, Catalyst::Engine::PSGI differs from Catalyst::Engine::CGI by only a few dozen lines and was written in less than an hour.
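For readers less familiar with the CGI naming convention, here is a small sketch of how HTTP request header names map to CGI-style environment keys. env_key_for_header is a hypothetical helper written for this example, not a function from PSGI or Plack.

```perl
use strict;
use warnings;

# Map an HTTP request header name to its CGI-style environment key.
sub env_key_for_header {
    my $name = shift;
    (my $key = uc $name) =~ tr/-/_/;
    # Content-Type and Content-Length carry no HTTP_ prefix in CGI.
    return $key if $key eq 'CONTENT_TYPE' || $key eq 'CONTENT_LENGTH';
    return "HTTP_$key";
}

my @keys = map { env_key_for_header($_) } qw(User-Agent Accept-Language Content-Type);
print "$_\n" for @keys;
```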

Why is PATH_INFO URI decoded?

To be compatible with CGI spec (RFC 3875) and most web servers' implementations (like Apache and lighttpd).

I understand it could be inconvenient that you can't distinguish foo%2fbar from foo/bar in the trailing path, but the CGI spec clearly says PATH_INFO should be decoded by servers, and that web servers can deny such requests containing %2f (since such requests would lose information in PATH_INFO). Leaving those reserved characters undecoded (partial decoding) would make things worse, since then you can't tell foo%2fbar from foo%252fbar and could be a security hole with double encoding or decoding.
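Both problems can be seen in a few lines. The two decoders below are simplified, hypothetical helpers written for this example (no error handling, ASCII-range input only); they are not what any particular server uses.

```perl
use strict;
use warnings;

# Minimal percent-decoder for illustration.
sub uri_decode {
    my $s = shift;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Hypothetical partial decoder that leaves an encoded slash alone.
sub partial_decode {
    my $s = shift;
    $s =~ s/%(?!2[fF])([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Full decoding: an encoded slash becomes indistinguishable from a real one.
my $full_a = uri_decode('/foo%2fbar');
my $full_b = uri_decode('/foo/bar');
print "full:    $full_a vs $full_b\n";

# Partial decoding: two different requests now look the same.
my $part_a = partial_decode('/foo%2fbar');
my $part_b = partial_decode('/foo%252fbar');
print "partial: $part_a vs $part_b\n";
```

An application that needs to tell these requests apart should look at the raw REQUEST_URI instead.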

For web application frameworks that need more control over the actual raw URI (such as Catalyst), we made the REQUEST_URI environment hash key REQUIRED. The servers should set the undecoded (unparsed) original URI (containing the query string) to this key. Note that REQUEST_URI is completely raw even if the encoded entities are URI-safe.

For comparison, WSGI (PEP-333) requires both SCRIPT_NAME and PATH_INFO to be decoded, while Rack leaves it implementation-dependent, with PATH_INFO mostly left encoded in Ruby web server implementations.

http://www.python.org/dev/peps/pep-0333/#url-reconstruction

http://groups.google.com/group/rack-devel/browse_thread/thread/ddf4622e69bea53f

SEE ALSO

WSGI's FAQ clearly answers lots of questions about how some API design decisions were made, some of which can directly apply to PSGI.

http://www.python.org/dev/peps/pep-0333/#questions-and-answers

MORE QUESTIONS?

If you have a question that is not answered here, or there are things you totally disagree with, come join the IRC channel #plack on irc.perl.org or the mailing list http://groups.google.com/group/psgi-plack. Be sure you clarify which hat you're wearing: application developer, server implementor or middleware developer. And don't criticize the spec just to criticize it: show the exact code that doesn't work or gets too messy because of spec restrictions etc. We'll ignore all nitpicks and bikeshed discussion.

AUTHOR

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

COPYRIGHT AND LICENSE

Copyright Tatsuhiko Miyagawa, 2009-2010.

This document is licensed under the Creative Commons license by-sa.
