This is pgintcl/INTERNALS, notes on internal implementation of pgintcl.
Last updated for pgintcl-3.4.0 on 2011-09-19
The project home page is: http://sourceforge.net/projects/pgintcl/
-----------------------------------------------------------------------------
INTERNAL IMPLEMENTATION NOTES:
This information is provided for maintenance, test, and debugging.
A connection handle is just a Tcl socket channel. The application using
pgin.tcl must not read from or write to this channel.
Internal procedures, result structures, and other data are stored in a
namespace called "pgtcl". The following namespace variables apply to
all connections:
    pgtcl::debug       A debug flag, default 0 (no debugging)
    pgtcl::version     pgin.tcl version string
    pgtcl::rn          Result number counter
    pgtcl::fnoids      Function OID cache; see FAST-PATH FUNCTION CALLS
    pgtcl::errnames    Constant array of error message field names
The following arrays are indexed by connection handle, and contain data
applying only to that connection:
    pgtcl::notice()    Command to execute when receiving a Notice
    pgtcl::xstate()    Transaction state
    pgtcl::notify()    Notifications; see NOTIFICATIONS
    pgtcl::notifopt()  Notification options; see NOTIFICATIONS
    pgtcl::std_str()   For pg_escape_string etc; see ESCAPING
    pgtcl::bepid()     Backend process ID (PID)
Additional namespace variables are described in the sections below.
Result structure variables are described next.
-----------------------------------------------------------------------------
RESULT STRUCTURES:
A result structure is implemented as a variable result$N in the pgtcl
namespace, where N is an integer. (The value of N is stored in pgtcl::rn
and is incremented each time a new result structure is needed.) The result
handle is passed back to the caller as $N, just the integer. The result
structure is an array which stores all the meta-information about the
result as well as the result values.
The result structure array indexes in use are:
  Variables describing the overall result:
    result(conn)              The connection handle (the socket channel)
    result(nattr)             Number of attributes (columns)
    result(ntuple)            Number of tuples (rows)
    result(status)            PostgreSQL status code, e.g. PGRES_TUPLES_OK
    result(error)             Error message if status is PGRES_FATAL_ERROR
    result(complete)          Command completion status, e.g. "SELECT 10"
    result(error,C)           Error message field C if status is
                              PGRES_FATAL_ERROR. C is one of the codes for
                              extended error message fields.
  Variables describing the attributes (columns) in the result:
    result(attrs)             A list of the name of each attribute
    result(types)             A list of the type OID for each attribute
    result(sizes)             A list of attribute byte lengths or -1 if variable
    result(modifs)            A list of the size modifier for each attribute
    result(formats)           A list of the data format for each attribute
    result(tbloids)           A list of the table OID for each attribute
  Variables describing prepared query parameters in the result:
    result(nparams)           The number of prepared statement parameters
    result(paramtypes)        List of prepared statement parameter type OIDs
  Variables storing the query result values:
    result($irow,$icol)       Data value for result
    result(null,$irow,$icol)  NULL flag for result
The pg_exec and pg_exec_prepared commands create and return a new result
structure. The pg_result command retrieves information from the result
structure, and pg_result -clear unsets (frees) it. Other commands, notably
pg_select and pg_execute, call pg_exec and access the result structure
innards directly; their result structure stays internal to the command and
the caller never sees it. Any left-over result structures associated with a
connection handle are freed when the connection handle is closed by
pg_disconnect.
The query result values are stored in result($irow,$icol) where $irow is
the tuple (row) number, between 0 and $result(ntuple)-1 inclusive, and
$icol is the attribute (column) number, between 0 and $result(nattr)-1
inclusive. If the value returned by the database is NULL, then
$result($irow,$icol) is set to an empty string, and
$result(null,$irow,$icol) is also set to an empty string for this row and
column. For non-NULL values, $result(null,$irow,$icol) is not set at all.
The "null,*,*" indexes are used only by pg_result -getNull if it is
necessary for the application to distinguish NULL from empty string - both
of which are stored as empty strings in result($irow,$icol) and return an
empty string with any of the pg_result access methods. There is no way to
distinguish NULL from empty string with pg_select, pg_execute, or
pg_exec_prepared.
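As an illustration of the layout described above, the following sketch
builds a small result array by hand and tests for NULL the way the text
describes. The namespace "demo", the array "result1", and the helper
demo::is_null are made up for this example and are not part of pgin.tcl;
the real structures live in the pgtcl namespace and are filled in by
pg_exec.

```tcl
# Hand-built stand-in for a result structure with one row and two columns.
# Row 0: column 0 is an empty string, column 1 is NULL.
namespace eval demo {
    variable result1
    array set result1 {
        nattr  2
        ntuple 1
        status PGRES_TUPLES_OK
        attrs  {name comment}
    }
    # Both values are stored as empty strings.
    set result1(0,0) ""
    set result1(0,1) ""
    # The NULL flag exists only for NULL values; its value is not significant.
    set result1(null,0,1) ""
}

# Hypothetical helper: report whether a value in the result is NULL by
# checking whether the "null,row,col" index exists.
proc demo::is_null {res irow icol} {
    upvar #0 demo::result$res result
    return [info exists result(null,$irow,$icol)]
}

puts [demo::is_null 1 0 0]
puts [demo::is_null 1 0 1]
```

The empty string in column 0 and the NULL in column 1 hold the same stored
value; only the presence of the "null,0,1" index tells them apart.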
The entire result of a query is stored before anything else happens (that
is, before pg_exec and pg_exec_prepared return, and before pg_execute and
pg_select process the first row). This is also true of libpq and pgtcl-ng
(in their synchronous mode), but Tcl can be slower.
Extended error message fields are new with PostgreSQL-7.4. Individual parts
of a received error message are stored in the result array indexed by
(error,$c) where $c is the one-letter code used in the protocol. See the
pgin.tcl documentation for "pg_result -errorField" for more information.
(As of 2.2.0, pg_result -errorField is the same as pg_result -error: both
take an optional field name or code argument to return an extended error
message field, rather than the full message.)
-----------------------------------------------------------------------------
BUFFERING
PostgreSQL protocol version 3 (PostgreSQL-7.4) uses a message-based
protocol. To read messages from the backend, pgin.tcl implements a
per-connection buffer using several Tcl variables in the pgtcl namespace.
The name of the connection handle (the socket name) is part of the variable
name, represented by $c below.
    pgtcl::buf_$c      The buffer holding a message from the backend.
    pgtcl::bufi_$c     Index of the next byte to be processed from buf_$c
    pgtcl::bufn_$c     Total number of bytes in the buffer buf_$c.
For example, if the connection handle is "sock3", the variables are
pgtcl::buf_sock3, pgtcl::bufi_sock3, and pgtcl::bufn_sock3.
A few tests determined that the fastest way to fetch data from the buffers
in Tcl was to use [string index] and [string range], although this might
not seem intuitive.
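A minimal sketch of this buffer/index/count scheme follows. It is not the
pgin.tcl code: the namespace "bufdemo" and its procs are hypothetical, and
a single set of variables stands in for the per-connection buf_$c, bufi_$c,
and bufn_$c. It only shows how [string range] and [string index] can pull
fixed-size and null-terminated items out of a message.

```tcl
# Stand-ins for pgtcl::buf_$c, pgtcl::bufi_$c and pgtcl::bufn_$c, for one
# pretend connection. All names here are hypothetical.
namespace eval bufdemo {
    variable buf ""
    variable bufi 0
    variable bufn 0
}

# Load a message body into the buffer and reset the read position.
proc bufdemo::load {data} {
    variable buf $data
    variable bufi 0
    variable bufn [string length $data]
}

# Fetch the next n bytes with [string range] and advance the index.
proc bufdemo::get_bytes {n} {
    variable buf
    variable bufi
    variable bufn
    if {$bufi + $n > $bufn} {
        error "buffer underrun"
    }
    set bytes [string range $buf $bufi [expr {$bufi + $n - 1}]]
    incr bufi $n
    return $bytes
}

# Fetch a 4-byte big-endian signed integer.
proc bufdemo::get_int32 {} {
    binary scan [get_bytes 4] I value
    return $value
}

# Fetch a null-terminated string, scanning with [string index].
proc bufdemo::get_string {} {
    variable buf
    variable bufi
    variable bufn
    set start $bufi
    while {$bufi < $bufn && [string index $buf $bufi] ne "\x00"} {
        incr bufi
    }
    set s [string range $buf $start [expr {$bufi - 1}]]
    incr bufi   ;# skip the terminator
    return $s
}

# A made-up message body: one int32 followed by one null-terminated string.
bufdemo::load [binary format Ia*x 42 "client_encoding"]
puts [bufdemo::get_int32]
puts [bufdemo::get_string]
```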
-----------------------------------------------------------------------------
PARAMETERS
The PostgreSQL backend can notify a front-end client about some parameters,
and pgin.tcl stores these in the following variable in the pgtcl namespace:
pgtcl::param_$c Array of parameter values, indexed by parameter name
where $c is the connection handle (socket name).
Access to these parameters is through the pg_parameter_status command,
a pgin.tcl extension.
-----------------------------------------------------------------------------
PROTOCOL ISSUES
This version of pgin.tcl speaks only to a Protocol Version 3 PostgreSQL
backend (7.4 or later). There is one concession made to Version 2, and
that is reading an error message. If a Version 2 error message is read,
pgin.tcl will recognize it and pretend it got a Version 3 message. This
is for use during the connection stage, to allow it to fail with a
proper message if connecting to a Version 2-only backend.
-----------------------------------------------------------------------------
NOTIFICATIONS
An array pgtcl::notify keeps track of notifications you want. The array is
indexed as pgtcl::notify(connection,name) where connection is the
connection handle (socket name) and name is the parameter used in
pg_listen. The value of an array element is the command to execute on
notification. This can be a procedure name, or a procedure name with
leading arguments. It must be a proper Tcl list.
Starting with PostgreSQL-9.0.0, a 'payload' string can be provided with the
SQL NOTIFY command. Starting with pgin.tcl-3.2.0, this payload (if not empty)
will be passed as an additional argument to the command. The command is taken
as a list, and the payload is appended as in lappend. The resulting list is
the command to execute. If there is no payload, or it is empty, or the server
is older than PostgreSQL-9.0.0, no additional argument will be passed to the
command. The command should therefore always accept an optional argument.
Starting with pgintcl-3.4.0, there is an additional array pgtcl::notifopt()
to store options for the notification. This array is indexed the same way
as pgtcl::notify(), and holds integer values. The value is 0 if there are no
options for this notification. The value is 1 if the notification listener
should get the notifying backend process ID as an argument, as indicated by
the -pid option to pg_listen. No other options are defined.
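The way the stored command, the option value, and the payload combine can
be sketched as below. This is an illustration only: the global arrays and
the procs demo_listen, demo_build_command, and on_update are hypothetical
stand-ins, and the order of the extra arguments (process ID first, then
payload) is an assumption made for the example.

```tcl
# Hypothetical stand-ins for pgtcl::notify() and pgtcl::notifopt().
array set notify   {}
array set notifopt {}

# Record a listener the way the text describes: the command is a Tcl list,
# and the option value is 1 when the listener wants the backend PID.
proc demo_listen {conn name command {want_pid 0}} {
    global notify notifopt
    set notify($conn,$name) $command
    set notifopt($conn,$name) $want_pid
}

# Build the command list to run for an incoming notification.
# Assumption: the PID, when requested, is appended before the payload.
proc demo_build_command {conn name pid payload} {
    global notify notifopt
    if {![info exists notify($conn,$name)]} {
        return ""
    }
    set cmd $notify($conn,$name)
    if {$notifopt($conn,$name)} {
        lappend cmd $pid
    }
    # An empty payload adds no argument, as with an older server.
    if {$payload ne ""} {
        lappend cmd $payload
    }
    return $cmd
}

# A listener with one leading argument and an optional payload argument.
proc on_update {table {payload ""}} {
    return "table=$table payload=<$payload>"
}

demo_listen sock3 updates [list on_update orders]
puts [eval [demo_build_command sock3 updates 1234 "row 7 changed"]]
puts [eval [demo_build_command sock3 updates 1234 ""]]
```

Because the payload is appended with lappend, a payload containing spaces
or braces still arrives as a single argument.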
-----------------------------------------------------------------------------
NOTICES
Notice and warning message handling can be customized using the
pg_notice_handler command. By default, the notice handler is
puts -nonewline stderr
and this string will be returned the first time pg_notice_handler is
called. A notice handler should be defined as a proc with one or more
arguments. Leading arguments are supplied when the handler is set with
pg_notice_handler, and the final argument is the notice or warning message.
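A handler of this shape might look like the sketch below. The proc
collect_notice and its "tag" argument are invented for the example, and the
dispatch line is an assumption about how a stored handler and a message are
combined; no database connection is involved.

```tcl
# Notices collected by the example handler.
set notices {}

# A notice handler with one leading argument (a tag) followed by the
# message, which is always the final argument.
proc collect_notice {tag message} {
    global notices
    lappend notices "$tag: [string trimright $message]"
}

# What would be given to pg_notice_handler: the proc name plus its leading
# argument, as a proper Tcl list.
set handler [list collect_notice db1]

# Assumed dispatch: the message is appended to the stored command as one
# list element, then the result is evaluated. The default handler
# "puts -nonewline stderr" fits the same pattern.
eval $handler [list "NOTICE:  something happened\n"]
puts $notices
```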
-----------------------------------------------------------------------------
LARGE OBJECTS
The large object commands are implemented using the PostgreSQL "fast-path"
function call interface (same as libpq). See the next section for more
information on fast-path.
The pg_lo_creat command takes a mode argument. According to the PostgreSQL
libpq documentation, lo_creat should take "INV_READ", "INV_WRITE", or
"INV_READ|INV_WRITE". (pgin.tcl accepts "r", "w", and "rw" as equivalent
to those respectively, but this is not compatible with pgtcl-ng.) It isn't
clear why you would ever create a large object with other than
"INV_READ|INV_WRITE".
The pg_lo_open command also takes a mode argument. According to the
PostgreSQL libpq documentation, lo_open takes the same mode values as
lo_creat. But in libpgtcl the pg_lo_open command takes "r", "w", or "rw"
for the mode, for some reason. pgin.tcl accepts either form for mode,
but to be compatible with libpgtcl you should use "r", "w", or "rw"
with pg_lo_open instead of INV_READ, INV_WRITE, or INV_READ|INV_WRITE.
-----------------------------------------------------------------------------
FAST-PATH FUNCTION CALLS
Access to the PostgreSQL "Fast-path function call" interface is available
in pgin.tcl. This was written to implement the large object commands, and
general use is discouraged. See the libpq documentation for more details on
what this interface is and how to use it.
It is expected that the Fast-path function call interface in PostgreSQL
will be deprecated in favor of using the Extended Protocol to do
separate Prepare, Bind, and Execute steps. See PREPARE/BIND/EXECUTE.
Internally, backend functions are called by their PostgreSQL OID, but
pgin.tcl handles the mapping of function name to OID for you. The
fast-path function interface in pgin.tcl uses an array pgtcl::fnoids to
cache object IDs of the PostgreSQL functions. One instance of this array
is shared among all connections, under the assumption that these OIDs are
common to all databases. (It is possible that if you have simultaneous
connections to multiple database servers running different versions of
PostgreSQL this could break.) The index to pgtcl::fnoids is the name
of the function, or the function plus argument type list, as supplied
to the pgin.tcl fast-path function call commands. The value of each
array index is the OID of the function.
PostgreSQL supports overloaded functions (same name, different number
and/or argument types). You can call overloaded functions with pgin.tcl by
specifying the argument type list after the function name. See examples
below. You must specify the argument list exactly like psql "\df" does - as
a list of correct type names, separated by a single comma and space. There
is currently no provision to distinguish functions by their return type. It
doesn't seem like there are any PostgreSQL functions which differ only by
return type.
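The caching scheme can be pictured with the following sketch. It is not the
pgin.tcl code: the array fnoids here is a global stand-in for
pgtcl::fnoids, demo_lookup_oid replaces the query a real implementation
would send to the server, and the OID values are invented.

```tcl
# Hypothetical stand-in for pgtcl::fnoids: maps a function name, or a name
# plus argument type list, to its OID.
array set fnoids {}
set lookups 0

# Pretend catalog lookup. The OIDs below are invented for the example.
proc demo_lookup_oid {fname} {
    array set catalog {
        version             9001
        {length(text)}      9002
        {like(text, text)}  9003
    }
    incr ::lookups
    if {![info exists catalog($fname)]} {
        error "no such function: $fname"
    }
    return $catalog($fname)
}

# Return the OID for a function, consulting the shared cache first.
proc demo_get_oid {fname} {
    global fnoids
    if {![info exists fnoids($fname)]} {
        set fnoids($fname) [demo_lookup_oid $fname]
    }
    return $fnoids($fname)
}

puts [demo_get_oid "like(text, text)"]
puts [demo_get_oid "like(text, text)"]
puts "catalog lookups: $lookups"
```

The cache key is the exact string supplied by the caller, which is why the
argument type list must be spelled the same way each time, down to the
comma and single space.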
Before PostgreSQL-7.4, certain errors in fast-path calls (such as supplying
the wrong number of arguments to the backend function) would cause the
back-end and front-end to lose synchronization, and the channel would be
closed. This was true about libpq as well. This has been fixed with the
new protocol in PostgreSQL-7.4.
Commands:
    pg_callfn $db "fname" result "arginfo" arg...
        Call a PostgreSQL backend function and store the result.
        Returns the size of the result in bytes.
    Parameters:
        $db is the connection handle.
        "fname" is the PostgreSQL function name. This is either a simple
            name, like "encode", or a name followed by a parenthesized
            argument type list, like "like(text, text)". The second form
            is needed to specify which of several overloaded functions you
            want to call.
        "result" is the name of a variable where the PostgreSQL backend
            function returned value is to be stored. The number of bytes
            stored in "result" is returned as the value of pg_callfn.
        "arginfo" is a list of argument descriptors. Each list element is
            one of the following:
                I   An integer32 argument is expected.
                S   A Tcl string argument is expected. The length of the
                    string is used (remember Tcl strings can contain null
                    bytes).
                n   (an integer > 0)
                    A Tcl string argument is expected, and exactly this
                    many bytes of the string argument are passed (padding
                    with null bytes if needed).
        arg... Zero or more arguments to the PostgreSQL function follow.
            The number of arguments must match the number of elements
            in the "arginfo" list. The values are passed to the backend
            function according to the corresponding descriptor in
            "arginfo".
For PostgreSQL backend functions which return a single integer32 argument,
the following simplified interface is available:
    pg_callfn_int $db "fname" "arginfo" arg...
        The db, fname, arginfo, and other arguments are the same as
        for pg_callfn. The return value from pg_callfn_int is the
        integer32 value returned by the PostgreSQL backend function.
Examples:
    Note: These examples demonstrate the commands, but in all of these
    cases you would be better off using an SQL query instead.
        set n [pg_callfn $db version result ""]
    This calls the backend function version() and stores the return
    value in $result and the result length in $n.
        pg_callfn $db encode result {S S} $str base64
    This calls the backend function encode($str, "base64") with 2
    string arguments and stores the result in $result.
        pg_callfn_int $db length(text) S "This is a test"
    This calls the backend function length("This is a test"). Because
    there are multiple functions called length(), the argument type
    list "(text)" must be given after the function name. The length
    of the string (14) is returned by the function.
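To make the "arginfo" descriptors concrete, here is a sketch of how
arguments could be turned into bytes under each descriptor. The proc
demo_encode_args is hypothetical and is not the pgin.tcl implementation. It
writes each argument as a 4-byte big-endian length followed by the value,
which is how the protocol's function call message lays out arguments, and
it assumes string arguments already hold binary data.

```tcl
# Encode fast-path arguments according to an "arginfo" list.
# Each argument becomes a 4-byte big-endian length followed by its bytes.
proc demo_encode_args {arginfo args} {
    if {[llength $arginfo] != [llength $args]} {
        error "argument count does not match arginfo"
    }
    set out ""
    foreach desc $arginfo arg $args {
        if {$desc eq "I"} {
            # 32-bit integer: length 4, then the value.
            append out [binary format II 4 $arg]
        } elseif {$desc eq "S"} {
            # String: its own length, then its bytes.
            set n [string length $arg]
            append out [binary format Ia$n $n $arg]
        } elseif {[string is integer -strict $desc] && $desc > 0} {
            # Fixed size: exactly $desc bytes, padded with null bytes.
            append out [binary format Ia$desc $desc $arg]
        } else {
            error "bad arginfo descriptor: $desc"
        }
    }
    return $out
}

# One integer, one string, and one string padded to 6 bytes.
set packed [demo_encode_args {I S 6} 7 abc xy]
binary scan $packed H* hex
puts $hex
```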
-----------------------------------------------------------------------------
PREPARE/BIND/EXECUTE
Starting with PostgreSQL-7.4, access to separate Parse, Bind, and Execute
steps is provided by the protocol. The Parse step can be replaced by an
SQL PREPARE command. pgin.tcl provides support for this extended query
protocol with pg_exec_prepared (introduced in pgin.tcl-2.0.0), and
pg_exec_params (introduced in pgin.tcl-2.1.0). There is also a variation of
pg_exec which provides a simplified interface to pg_exec_params.
The main advantage of the extended query protocol is separation of
parameters from the query text string. This avoids the need to quote and
escape parameters, and may prevent SQL injection attacks. pg_exec_prepared
also offers some performance advantages if a query can be prepared, parsed,
and stored once and then executed multiple times without re-parsing.
In addition to working with text parameters and results, the
pg_exec_prepared and pg_exec_params commands support sending unescaped
binary data to the server. (Fast-path function calls also support this.)
These commands also support returning binary data to the client. (This can
also be done with binary cursors.) Although the protocol definition and
pgin.tcl commands support mixed text and binary results, libpq requires all
result columns to be text, or all binary. Using mixed binary/text result
columns will make your application incompatible with libpq-based versions
of this interface.
pg_exec_prepared is for execution of pre-prepared SQL statements after
binding parameters. A named SQL statement must be prepared using the SQL
"PREPARE" command before using pg_exec_prepared. An advantage of
pg_exec_prepared is that the protocol-level Parse step requires the client
to translate parameter types to OIDs, but using PREPARE lets the server
determine the parameter argument types. pg_exec_prepared is modeled after
the libpq call: PQexecPrepared().
pg_exec_params does all three steps of the extended query protocol: parse,
bind, and execute. Parameter types can be specified by type OID, or parameters
can be passed as text to be interpreted by the server as it does for any
untyped literal string. To find the type OID of a PostgreSQL type '<T>',
you need to query the server like this:
    SELECT oid FROM pg_type WHERE typname='<T>'
pg_exec_params is modeled after the libpq call: PQexecParams().
A limitation of both pg_exec_prepared and pg_exec_params is lack of support
for NULLs as parameter values. There is no way to pass a NULL parameter to
the prepared statement. This is not a protocol or database limitation, but
just lack of a good idea on how to implement the command interface to
support NULLs without needlessly complicating the more common case without
NULLs.
-----------------------------------------------------------------------------
MD5 AUTHENTICATION
MD5 authentication was added at PostgreSQL-7.2. This is a
challenge/response protocol which avoids having clear-text passwords passed
over the network. To activate this, the PostgreSQL administrator puts "md5"
in the pg_hba.conf file instead of "password". pgin.tcl supports this
transparently; that is, if the backend requests MD5 authentication during
the connection, pg_connect will use this protocol. The MD5 implementation
was coded by the original author of pgin.tcl. It does not use the tcllib
implementation, which is significantly faster but much more complex.
-----------------------------------------------------------------------------
ENCODING
Character set encoding was added to pgin.tcl-3.0.0. More information can be
found in the README and REFERENCE files.
The following are converted to Unicode before being sent to PostgreSQL:
+ Query strings (pg_exec, and all higher-level commands which use it)
+ TEXT-format query parameters in pg_exec_prepared/pg_exec_params
+ All parameter arguments in pg_exec when query parameters are used
+ Prepared statement name in pg_exec_prepared
+ COPY table FROM STDIN data sent using pg_copy_write
The following are converted from Unicode when received from PostgreSQL:
+ Query result column data when TEXT-format (not when BINARY-format)
+ All Error and Notice response strings
+ Parameter names and values
+ Notification messages
+ Command completion message
+ Query result field names (column names)
+ COPY table TO STDOUT data received using pg_copy_read
Conversion of data to Unicode for sending to PostgreSQL occurs in 5 places
in the code: pg_exec and pg_exec_params query strings, pg_exec_prepared
statement name, pg_exec_prepared text format parameters, and when writing
COPY FROM data in pg_copy_write.
Conversion of Unicode data from PostgreSQL occurs in 3 places in the code:
when receiving a protocol message "string" type (which covers various
messages, parameters, and field names), when reading TEXT mode tuple data,
and when reading COPY TO data in pg_copy_read.
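The two directions of conversion can be sketched with Tcl's [encoding]
command. This is only an illustration and assumes that "Unicode" on the
wire means UTF-8; the variable names are made up, and the sample query is
never sent anywhere.

```tcl
# Outgoing: convert a Tcl string to UTF-8 bytes before it goes on the wire.
set query "SELECT 'caf\u00e9'"
set wire [encoding convertto utf-8 $query]
puts "characters: [string length $query], bytes: [string length $wire]"

# Incoming: convert TEXT-format bytes from the wire back to a Tcl string.
# The five bytes below are "caf" followed by the UTF-8 encoding of U+00E9.
set received [binary format H* 636166c3a9]
set value [encoding convertfrom utf-8 $received]
puts [string equal $value "caf\u00e9"]
```

BINARY-format values would skip the second step and be handed to the
application as received.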
There is no Unicode conversion for the connection parameters username,
database-name, or password. PostgreSQL seems to store these using the
encoding of the database cluster/template1 database, which may differ from
the encoding of the database to which the client is connected. It is
unclear how to recode these characters. At this time, it is wise to avoid
non-ASCII characters in database names, usernames, and passwords. This may
be fixed in the future.
The fast-path function call interface treats all its arguments as binary
data and does not encode or decode them. The fast-path function calls
were implemented primarily for large object support, and large object
support is not affected by Unicode encoding because it is all binary
data. It is unlikely that encoding support will be added to fast-path
function calls, since parameterized queries are the preferred replacement.
-----------------------------------------------------------------------------
ESCAPING
An array pgtcl::std_str() is used to store the per-connection setting for
the PostgreSQL setting standard_conforming_strings. This was added in
pgin.tcl-3.1.0 to support the versions of pg_escape_string, pg_quote, and
pg_escape_bytea which accept an optional $conn argument.
If the array value indexed by $conn is 1, then standard conforming strings
is on for that database connection, and the backslash (\) is not considered
special in SQL quoted string constants. In this case, pg_escape_string and
pg_quote will not double backslashes. pg_escape_bytea will omit one level
of backslashes when escaping backslash and octal values.
If the array value indexed by $conn is 0, then standard conforming strings
is off for that database connection, and the backslash (\) is special
in SQL quoted string constants. In that case, pg_escape_string and pg_quote
will double backslashes. pg_escape_bytea will use 4 backslashes for a single
backslash, and 2 backslashes in an octal value.
There is also an array index "_default_" which is used when no $conn
argument is supplied to the escape commands. Just as in libpq, the
_default_ value is set any time a Set Parameter message for
standard_conforming_strings is received over any open database connection.
If you are using a single connection, or multiple connections with the same
value for standard_conforming_strings, you will get correct escaping
behavior even without using the $conn argument when escaping strings.
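The per-connection and _default_ lookup can be sketched as follows. This
is an illustration only: the global array std_str stands in for
pgtcl::std_str(), the procs demo_set_std_str and demo_escape_string are
hypothetical, and the initial _default_ value of 0 is an assumption. Only
string escaping is shown, not pg_quote or pg_escape_bytea.

```tcl
# Hypothetical stand-in for pgtcl::std_str(). The initial _default_ value
# of 0 is an assumption for this example.
array set std_str {_default_ 0}

# Simulate receiving a standard_conforming_strings parameter report on a
# connection: record it for that connection and as the new default.
proc demo_set_std_str {conn value} {
    global std_str
    set flag [expr {$value eq "on"}]
    set std_str($conn) $flag
    set std_str(_default_) $flag
}

# Escape a string for use inside an SQL quoted string constant.
# Single quotes are always doubled; backslashes are doubled only when
# standard conforming strings is off for the chosen connection.
proc demo_escape_string {s {conn _default_}} {
    global std_str
    set s [string map {' ''} $s]
    if {!$std_str($conn)} {
        set s [string map {\\ \\\\} $s]
    }
    return $s
}

demo_set_std_str sock3 off
demo_set_std_str sock4 on
puts [demo_escape_string {O'Neil\n} sock3]
puts [demo_escape_string {O'Neil\n} sock4]
puts [demo_escape_string {O'Neil\n}]
```

In this run the last call follows sock4, because sock4 reported its setting
most recently. With connections that disagree, omitting the connection
argument gives whichever setting arrived last.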
-----------------------------------------------------------------------------