This is pgintcl/INTERNALS, notes on internal implementation of pgintcl. |
|
Last updated for pgintcl-3.4.0 on 2011-09-19 |
|
The project home page is: http://sourceforge.net/projects/pgintcl/ |
|
----------------------------------------------------------------------------- |
|
INTERNAL IMPLEMENTATION NOTES: |
|
|
|
This information is provided for maintenance, test, and debugging. |
|
|
|
A connection handle is just a Tcl socket channel. The application using |
|
pgin.tcl must not read from or write to this channel. |
|
|
|
Internal procedures, result structures, and other data are stored in a |
|
namespace called "pgtcl". The following namespace variables apply to |
|
all connections: |
|
|
|
pgtcl::debug A debug flag, default 0 (no debugging) |
|
pgtcl::version pgin.tcl version string |
|
pgtcl::rn Result number counter |
|
pgtcl::fnoids Function OID cache; see FAST-PATH FUNCTION CALLS |
|
pgtcl::errnames Constant array of error message field names |
|
|
|
The following arrays are indexed by connection handle, and contain data |
|
applying only to that connection: |
|
|
|
pgtcl::notice() Command to execute when receiving a Notice |
|
pgtcl::xstate() Transaction state |
|
pgtcl::notify() Notifications; see NOTIFICATIONS |
|
pgtcl::notifopt() Notification options; see NOTIFICATIONS |
|
pgtcl::std_str() For pg_escape_string etc; see ESCAPING |
|
pgtcl::bepid() Backend process ID (PID) |
|
|
|
Additional namespace variables are described in the sections below. |
|
Result structure variables are described next. |
|
|
|
----------------------------------------------------------------------------- |
|
RESULT STRUCTURES: |
|
|
|
A result structure is implemented as a variable result$N in the pgtcl |
|
namespace, where N is an integer. (The value of N is stored in pgtcl::rn |
|
and is incremented each time a new result structure is needed.) The result |
|
handle is passed back to the caller as $N, just the integer. The result |
|
structure is an array which stores all the meta-information about the |
|
result as well as the result values. |
|
|
|
The result structure array indexes in use are: |
|
|
|
Variables describing the overall result: |
|
result(conn) The connection handle (the socket channel) |
|
result(nattr) Number of attributes (columns) |
|
result(ntuple) Number of tuples (rows) |
|
result(status) PostgreSQL status code, e.g. PGRES_TUPLES_OK |
|
result(error) Error message if status is PGRES_FATAL_ERROR |
|
result(complete) Command completion status, e.g. "SELECT 10" |
|
result(error,C) Error message field C if status is PGRES_FATAL_ERROR. |
|
C is one of the codes for extended error message fields. |
|
|
|
Variables describing the attributes (columns) in the result: |
|
result(attrs) A list of the name of each attribute |
|
result(types) A list of the type OID for each attribute |
|
result(sizes) A list of attribute byte lengths or -1 if variable |
|
result(modifs) A list of the size modifier for each attribute |
|
result(formats) A list of the data format for each attribute |
|
result(tbloids) A list of the table OIDs for each attribute |
|
|
|
Variables describing prepared query parameters in the result: |
|
result(nparams) The number of prepared statement parameters |
|
result(paramtypes) List of prepared statement parameter type OIDs |
|
|
|
Variables storing the query result values: |
|
result($irow,$icol) Data value for result |
|
result(null,$irow,$icol) NULL flag for result |
|
|
|
The pg_exec and pg_exec_prepared commands create and return a new result |
|
structure. The pg_result command retrieves information from the result |
|
structure and also frees the result structure with the -clear option. |
|
(Other commands, notably pg_select and pg_execute, use pg_exec, so they |
|
also make a result structure, but it stays internal to the command and the |
|
caller never sees it.) The result structure innards are also directly |
|
accessed by some other routines, such as pg_select and pg_execute. Result |
|
structure arrays are unset (freed) by pg_result -clear, and any left-over |
|
result structures associated with a connection handle are freed when the |
|
connection handle is closed by pg_disconnect. |
|
|
|
The query result values are stored in result($irow,$icol) where $irow is |
|
the tuple (row) number, between 0 and $result(ntuple)-1 inclusive, and |
|
$icol is the attribute (column) number, between 0 and $result(nattr)-1 |
|
inclusive. If the value returned by the database is NULL, then |
|
$result($irow,$icol) is set to an empty string, and |
|
$result(null,$irow,$icol) is also set to an empty string for this row and |
|
column. For non-NULL values, $result(null,$irow,$icol) is not set at all. |
|
The "null,*,*" indexes are used only by pg_result -getNull if it is |
|
necessary for the application to distinguish NULL from empty string - both |
|
of which are stored as empty strings in result($irow,$icol) and return an |
|
empty string with any of the pg_result access methods. There is no way to |
|
distinguish NULL from empty string with pg_select, pg_execute, or |
|
pg_exec_prepared. |
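
The NULL convention above can be sketched in a few lines of Tcl. This is only an illustration: the namespace "demo", the array "result7", and the proc "demo::is_null" are hypothetical stand-ins built by hand, not code or data from pgin.tcl. The sketch shows that a NULL and an empty string look the same in result($irow,$icol) and differ only in whether the "null,$irow,$icol" index exists.

```tcl
# Hand-built stand-in for a result array (hypothetical; pgin.tcl creates
# its own arrays named result$N in the pgtcl namespace).
namespace eval demo {
    variable result7
    array set result7 {
        nattr 2
        ntuple 1
        attrs {id note}
        0,0 42
        0,1 {}
        null,0,1 {}
    }
}

# Report whether the value at (irow, icol) was NULL. Only the presence
# of the null,irow,icol index matters; its value is always empty.
proc demo::is_null {n irow icol} {
    upvar #0 ::demo::result$n r
    return [info exists r(null,$irow,$icol)]
}

puts [demo::is_null 7 0 0]
puts [demo::is_null 7 0 1]
```

The first call reports a non-NULL value and the second a NULL, even though the data stored at (0,1) is an empty string.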
|
|
|
The entire result of a query is stored before anything else happens (that |
|
is, before pg_exec and pg_exec_prepared return, and before pg_execute and |
|
pg_select process the first row). This is also true of libpq and pgtcl-ng |
|
(in their synchronous mode), but Tcl can be slower. |
|
|
|
Extended error message fields are new with PostgreSQL-7.4. Individual parts |
|
of a received error message are stored in the result array indexed by |
|
(error,$c) where $c is the one-letter code used in the protocol. See the |
|
pgin.tcl documentation for "pg_result -errorField" for more information. |
|
(As of 2.2.0, pg_result -errorField is the same as pg_result -error: both |
|
take an optional field name or code argument to return an extended error |
|
message field, rather than the full message.) |
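
As a rough illustration of the (error,$c) indexing, the sketch below splits the body of a protocol version 3 ErrorResponse message into array elements. In that message each field is a one-letter code followed by a null-terminated string, and a lone zero byte ends the list. The proc name "parse_error_fields" and the sample message are assumptions made for this example; the actual parsing code in pgin.tcl may be organized differently.

```tcl
# Split an ErrorResponse body into array elements indexed by (error,$code).
# Hypothetical helper, written only to illustrate the storage layout.
proc parse_error_fields {body arrayName} {
    upvar 1 $arrayName result
    foreach field [split $body \0] {
        # An empty piece marks the terminating zero byte.
        if {$field eq ""} break
        set code [string index $field 0]
        set result(error,$code) [string range $field 1 end]
    }
}

# Sample body: severity (S), SQLSTATE code (C), and message (M) fields,
# each null-terminated, followed by the final terminator.
set body [join [list SERROR C42P01 {Mrelation "nosuch" does not exist} {} {}] \0]
parse_error_fields $body res
puts $res(error,S)
puts $res(error,C)
puts $res(error,M)
```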
|
|
|
----------------------------------------------------------------------------- |
|
BUFFERING |
|
|
|
PostgreSQL protocol version 3 (PostgreSQL-7.4) uses a message-based |
|
protocol. To read messages from the backend, pgin.tcl implements a |
|
per-connection buffer using several Tcl variables in the pgtcl namespace. |
|
The name of the connection handle (the socket name) is part of the variable |
|
name, represented by $c below. |
|
|
|
pgtcl::buf_$c The buffer holding a message from the backend. |
|
pgtcl::bufi_$c Index of the next byte to be processed from buf_$c |
|
pgtcl::bufn_$c Total number of bytes in the buffer buf_$c. |
|
|
|
For example, if the connection handle is "sock3", the variables are |
|
pgtcl::buf_sock3, pgtcl::bufi_sock3, and pgtcl::bufn_sock3. |
|
|
|
A few tests determined that the fastest way to fetch data from the buffers |
|
in Tcl was to use [string index] and [string range], although this might |
|
not seem intuitive. |
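
The buffer variables can be pictured with the following sketch, which reads a ReadyForQuery message (type byte 'Z', a 32-bit length of 5, and one status byte) out of a per-connection buffer using [string range]. The "demo" namespace and the procs "fill", "get_bytes", and "get_int32" are hypothetical names chosen for this example; they mirror the variable naming described above but are not the procedures used inside pgin.tcl.

```tcl
namespace eval demo {}

# Load raw bytes into the buffer variables for connection handle $c.
proc demo::fill {c bytes} {
    set ::demo::buf_$c $bytes
    set ::demo::bufi_$c 0
    set ::demo::bufn_$c [string length $bytes]
}

# Return the next $n bytes from the buffer and advance the index.
proc demo::get_bytes {c n} {
    upvar #0 ::demo::buf_$c buf ::demo::bufi_$c bufi
    set s [string range $buf $bufi [expr {$bufi + $n - 1}]]
    incr bufi $n
    return $s
}

# Read a big-endian 32-bit integer from the buffer.
proc demo::get_int32 {c} {
    binary scan [demo::get_bytes $c 4] I v
    return $v
}

# A ReadyForQuery message: 'Z', Int32 length 5, status 'I' (idle).
demo::fill sock3 [binary format aIa Z 5 I]
set type   [demo::get_bytes sock3 1]
set len    [demo::get_int32 sock3]
set status [demo::get_bytes sock3 [expr {$len - 4}]]
puts "$type $len $status"
```

The length field counts itself but not the type byte, which is why the payload size is $len - 4.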
|
|
|
----------------------------------------------------------------------------- |
|
PARAMETERS |
|
|
|
The PostgreSQL backend can notify a front-end client about some parameters, |
|
and pgin.tcl stores these in the following variable in the pgtcl namespace: |
|
|
|
pgtcl::param_$c Array of parameter values, indexed by parameter name |
|
|
|
where $c is the connection handle (socket name). |
|
|
|
Access to these parameters is through the pg_parameter_status command, |
|
a pgin.tcl extension. |
|
|
|
----------------------------------------------------------------------------- |
|
PROTOCOL ISSUES |
|
|
|
This version of pgin.tcl speaks only to a Protocol Version 3 PostgreSQL |
|
backend (7.4 or later). There is one concession made to Version 2, and |
|
that is reading an error message. If a Version 2 error message is read, |
|
pgin.tcl will recognize it and pretend it got a Version 3 message. This |
|
is for use during the connection stage, to allow it to fail with a |
|
proper message if connecting to a Version 2-only backend. |
|
|
|
----------------------------------------------------------------------------- |
|
NOTIFICATIONS |
|
|
|
An array pgtcl::notify keeps track of notifications you want. The array is |
|
indexed as pgtcl::notify(connection,name) where connection is the |
|
connection handle (socket name) and name is the parameter used in |
|
pg_listen. The value of an array element is the command to execute on |
|
notification. This can be a procedure name, or a procedure name with |
|
leading arguments. It must be a proper Tcl list. |
|
|
|
Starting with PostgreSQL-9.0.0, a 'payload' string can be provided with the |
|
SQL NOTIFY command. Starting with pgin.tcl-3.2.0, this payload (if not empty) |
|
will be passed as an additional argument to the command. The command is taken |
|
as a list, and the payload is appended as in lappend. The resulting list is |
|
the command to execute. If there is no payload, or it is empty, or the server |
|
is older than PostgreSQL-9.0.0, no additional argument will be passed to the |
|
command. The command should therefore always accept an optional argument. |
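
The payload rule can be sketched as follows. The procs "on_event" and "build_notify_cmd" and the sample payload are hypothetical; the sketch only mimics the list handling described above and does not show how pgin.tcl dispatches notifications internally.

```tcl
# Hypothetical listener. The payload argument is optional because it is
# only passed when the NOTIFY carried a non-empty payload.
proc on_event {tag {payload ""}} {
    lappend ::seen [list $tag $payload]
}

# Build the command to execute from the stored listener command (a proper
# Tcl list) and the payload, appending the payload as lappend would.
proc build_notify_cmd {cmd payload} {
    if {$payload ne ""} {
        lappend cmd $payload
    }
    return $cmd
}

# The stored command has one leading argument that contains a space.
set stored [list on_event "order queue"]
set ::seen {}
uplevel #0 [build_notify_cmd $stored ""]
uplevel #0 [build_notify_cmd $stored "id=17 state=new"]
puts $::seen
```

Because the command is treated as a list, a payload containing spaces arrives as a single argument rather than being split into several.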
|
|
|
Starting with pgintcl-3.4.0, there is an additional array pgtcl::notifopt() |
|
to store options for the notification. This array is indexed the same way |
|
as pgtcl::notify(), and holds integer values. The value is 0 if there are no |
|
options for this notification. The value is 1 if the notification listener |
|
should get the notifying backend process ID as an argument, as indicated by |
|
the -pid option to pg_listen. No other options are defined. |
|
|
|
----------------------------------------------------------------------------- |
|
NOTICES |
|
|
|
Notice and warning message handling can be customized using the |
|
pg_notice_handler command. By default, the notice handler is |
|
puts -nonewline stderr |
|
and this string will be returned the first time pg_notice_handler is |
|
called. A notice handler should be defined as a proc with one or more |
|
arguments. Leading arguments are supplied when the handler is set with |
|
pg_notice_handler, and the final argument is the notice or warning message. |
|
|
|
----------------------------------------------------------------------------- |
|
LARGE OBJECTS |
|
|
|
The large object commands are implemented using the PostgreSQL "fast-path" |
|
function call interface (same as libpq). See the next section for more |
|
information on fast-path. |
|
|
|
The pg_lo_creat command takes a mode argument. According to the PostgreSQL |
|
libpq documentation, lo_creat should take "INV_READ", "INV_WRITE", or |
|
"INV_READ|INV_WRITE". (pgin.tcl accepts "r", "w", and "rw" as equivalent |
|
to those respectively, but this is not compatible with pgtcl-ng.) It isn't |
|
clear why you would ever create a large object with other than |
|
"INV_READ|INV_WRITE". |
|
|
|
The pg_lo_open command also takes a mode argument. According to the |
|
PostgreSQL libpq documentation, lo_open takes the same mode values as |
|
lo_creat. But in libpgtcl the pg_lo_open command takes "r", "w", or "rw" |
|
for the mode, for some reason. pgin.tcl accepts either form for mode, |
|
but to be compatible with libpgtcl you should use "r", "w", or "rw" |
|
with pg_lo_open instead of INV_READ, INV_WRITE, or INV_READ|INV_WRITE. |
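
To make the two spellings of the mode concrete, here is a minimal sketch of a mapping from either form to the flag bits. The numeric values are those defined for INV_WRITE and INV_READ in PostgreSQL's libpq-fs.h header; the proc name "lo_mode_bits" is hypothetical, and pgin.tcl may perform this translation differently.

```tcl
# Flag values as defined in PostgreSQL's libpq-fs.h.
set INV_WRITE 0x20000
set INV_READ  0x40000

# Hypothetical helper: translate either spelling of a large object mode
# into the corresponding flag bits.
proc lo_mode_bits {mode} {
    global INV_READ INV_WRITE
    switch -- $mode {
        r - INV_READ            { return [expr {$INV_READ}] }
        w - INV_WRITE           { return [expr {$INV_WRITE}] }
        rw - INV_READ|INV_WRITE { return [expr {$INV_READ | $INV_WRITE}] }
        default { error "invalid large object mode: $mode" }
    }
}

puts [format 0x%x [lo_mode_bits rw]]
puts [format 0x%x [lo_mode_bits INV_READ]]
```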
|
|
|
|
|
----------------------------------------------------------------------------- |
|
FAST-PATH FUNCTION CALLS |
|
|
|
Access to the PostgreSQL "Fast-path function call" interface is available |
|
in pgin.tcl. This was written to implement the large object commands, and |
|
general use is discouraged. See the libpq documentation for more details on |
|
what this interface is and how to use it. |
|
|
|
It is expected that the Fast-path function call interface in PostgreSQL |
|
will be deprecated in favor of using the Extended Protocol to do |
|
separate Prepare, Bind, and Execute steps. See PREPARE/BIND/EXECUTE. |
|
|
|
Internally, backend functions are called by their PostgreSQL OID, but |
|
pgin.tcl handles the mapping of function name to OID for you. The |
|
fast-path function interface in pgin.tcl uses an array pgtcl::fnoids to |
|
cache object IDs of the PostgreSQL functions. One instance of this array |
|
is shared among all connections, under the assumption that these OIDs are |
|
common to all databases. (It is possible that if you have simultaneous |
|
connections to multiple database servers running different versions of |
|
PostgreSQL this could break.) The index to pgtcl::fnoids is the name |
|
of the function, or the function plus argument type list, as supplied |
|
to the pgin.tcl fast-path function call commands. The value of each |
|
array index is the OID of the function. |
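
The caching scheme can be sketched like this. Everything in the "demo" namespace is hypothetical: "query_oid" stands in for the round trip to the server, and the OID values in it are illustrative placeholders rather than values to rely on. The point is only that the array is indexed by the function name exactly as the caller supplied it, and that the server is asked at most once per distinct name.

```tcl
namespace eval demo {
    variable fnoids
    array set fnoids {}
    variable lookups 0
}

# Stand-in for the server query that maps a function name to its OID.
# The OIDs below are illustrative placeholders.
proc demo::query_oid {fname} {
    variable lookups
    incr lookups
    array set fake {lo_open 952 length(text) 1317}
    return $fake($fname)
}

# Return the OID for a function name, consulting the shared cache first.
proc demo::get_oid {fname} {
    variable fnoids
    if {![info exists fnoids($fname)]} {
        set fnoids($fname) [demo::query_oid $fname]
    }
    return $fnoids($fname)
}

demo::get_oid lo_open
demo::get_oid lo_open
demo::get_oid length(text)
puts "lookups: $::demo::lookups, cached: [lsort [array names ::demo::fnoids]]"
```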
|
|
|
PostgreSQL supports overloaded functions (same name, different number |
|
and/or types of arguments). You can call overloaded functions with pgin.tcl by |
|
specifying the argument type list after the function name. See examples |
|
below. You must specify the argument list exactly like psql "\df" does - as |
|
a list of correct type names, separated by a single comma and space. There |
|
is currently no provision to distinguish functions by their return type. It |
|
doesn't seem like there are any PostgreSQL functions which differ only by |
|
return type. |
|
|
|
Before PostgreSQL-7.4, certain errors in fast-path calls (such as supplying |
|
the wrong number of arguments to the backend function) would cause the |
|
back-end and front-end to lose synchronization, and the channel would be |
|
closed. This was true about libpq as well. This has been fixed with the |
|
new protocol in PostgreSQL-7.4. |
|
|
|
|
|
Commands: |
|
|
|
pg_callfn $db "fname" result "arginfo" arg... |
|
|
|
Call a PostgreSQL backend function and store the result. |
|
Returns the size of the result in bytes. |
|
|
|
Parameters: |
|
|
|
$db is the connection handle. |
|
|
|
"fname" is the PostgreSQL function name. This is either a simple |
|
name, like "encode", or a name followed by a parenthesized |
|
argument type list, like "like(text, text)". The second form |
|
is needed to specify which of several overloaded functions you want |
|
to call. |
|
|
|
"result" is the name of a variable where the PostgreSQL backend |
|
function returned value is to be stored. The number of bytes |
|
stored in "result" is returned as the value of pg_callfn. |
|
|
|
"arginfo" is a list of argument descriptors. Each list element is |
|
one of the following: |
|
I An integer32 argument is expected. |
|
S A Tcl string argument is expected. The length of the |
|
string is used (remember Tcl strings can contain null bytes). |
|
n (an integer > 0) |
|
A Tcl string argument is expected, and exactly this many |
|
bytes of the string argument are passed (padding with null |
|
bytes if needed). |
|
|
|
arg... Zero or more arguments to the PostgreSQL function follow. |
|
The number of arguments must match the number of elements |
|
in the "arginfo" list. The values are passed to the backend |
|
function according to the corresponding descriptor in |
|
"arginfo". |
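
The three descriptor types can be illustrated with a small encoder. The proc "pack_arg" is a hypothetical helper written for this note; it produces a 32-bit length followed by the argument bytes, which is how an argument is laid out in the protocol's FunctionCall message, but it is not the code pgin.tcl uses to build that message.

```tcl
# Hypothetical helper: encode one argument according to its descriptor
# and return it prefixed with its 32-bit big-endian length.
proc pack_arg {desc value} {
    switch -regexp -- $desc {
        {^I$} {
            # integer32: always 4 bytes, big-endian
            set bytes [binary format I $value]
        }
        {^S$} {
            # string: pass the value as-is, using its own length
            set bytes $value
        }
        {^[1-9][0-9]*$} {
            # fixed size: exactly $desc bytes, null-padded if shorter
            set bytes [binary format a$desc $value]
        }
        default {
            error "bad argument descriptor: $desc"
        }
    }
    return [binary format I [string length $bytes]]$bytes
}

puts [string length [pack_arg I 5]]
puts [string length [pack_arg S hello]]
puts [string length [pack_arg 8 abc]]
```

Each printed length includes the 4-byte length prefix, so the fixed-size case is always 4 bytes longer than the descriptor's number.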
|
|
|
For PostgreSQL backend functions which return a single integer32 value, |
|
the following simplified interface is available: |
|
|
|
pg_callfn_int $db "fname" "arginfo" arg... |
|
|
|
The db, fname, arginfo, and other arguments are the same as |
|
for pg_callfn. The return value from pg_callfn_int is the |
|
integer32 value returned by the PostgreSQL backend function. |
|
|
|
Examples: |
|
Note: These examples demonstrate the command, but in both of these |
|
cases you would be better off using an SQL query instead. |
|
|
|
set n [pg_callfn $db version result ""] |
|
This calls the backend function version() and stores the return |
|
value in $result and the result length in $n. |
|
|
|
pg_callfn $db encode result {S S} $str base64 |
|
This calls the backend function encode($str, "base64") with 2 |
|
string arguments and stores the result in $result. |
|
|
|
pg_callfn_int $db length(text) S "This is a test" |
|
This calls the backend function length("This is a test"). Because |
|
there are multiple functions called length(), the argument type |
|
list "(text)" must be given after the function name. The length |
|
of the string (14) is returned by the function. |
|
|
|
----------------------------------------------------------------------------- |
|
PREPARE/BIND/EXECUTE |
|
|
|
Starting with PostgreSQL-7.4, access to separate Parse, Bind, and Execute |
|
steps is provided by the protocol. The Parse step can be replaced by an |
|
SQL PREPARE command. pgin.tcl provides support for this extended query |
|
protocol with pg_exec_prepared (introduced in pgin.tcl-2.0.0), and |
|
pg_exec_params (introduced in pgin.tcl-2.1.0). There is also a variation of |
|
pg_exec which provides a simplified interface to pg_exec_params. |
|
|
|
The main advantage of the extended query protocol is separation of |
|
parameters from the query text string. This avoids the need to quote and |
|
escape parameters, and may prevent SQL injection attacks. pg_exec_prepared |
|
also offers some performance advantages if a query can be prepared, parsed, |
|
and stored once and then executed multiple times without re-parsing. |
|
|
|
In addition to working with text parameters and results, the |
|
pg_exec_prepared and pg_exec_params commands support sending unescaped |
|
binary data to the server. (Fast-path function calls also support this.) |
|
These commands also support returning binary data to the client. (This can |
|
also be done with binary cursors.) Although the protocol definition and |
|
pgin.tcl commands support mixed text and binary results, libpq requires all |
|
result columns to be text, or all binary. Using mixed binary/text result |
|
columns will make your application incompatible with libpq-based versions |
|
of this interface. |
|
|
|
pg_exec_prepared is for execution of pre-prepared SQL statements after |
|
binding parameters. A named SQL statement must be prepared using the SQL |
|
"PREPARE" command before using pg_exec_prepared. An advantage of |
|
pg_exec_prepared is that the protocol-level Parse step requires the client |
|
to translate parameter types to OIDs, but using PREPARE lets the server |
|
determine the parameter argument types. pg_exec_prepared is modeled after |
|
the libpq call: PQexecPrepared(). |
|
|
|
pg_exec_params does all three steps of the extended query protocol: parse, |
|
bind, and execute. Parameter types can be specified by type OID, or parameters |
|
can be passed as text to be interpreted by the server as it does for any |
|
untyped literal string. To find the type OID of a PostgreSQL type '<T>', |
|
you need to query the server like this: |
|
SELECT oid FROM pg_type where typname='<T>' |
|
pg_exec_params is modeled after the libpq call: PQexecParams(). |
|
|
|
A limitation of both pg_exec_prepared and pg_exec_params is lack of support |
|
for NULLs as parameter values. There is no way to pass a NULL parameter to |
|
the prepared statement. This is not a protocol or database limitation, but |
|
just lack of a good idea on how to implement the command interface to |
|
support NULLs without needlessly complicating the more common case without |
|
NULLs. |
|
|
|
|
|
----------------------------------------------------------------------------- |
|
MD5 AUTHENTICATION |
|
|
|
MD5 authentication was added at PostgreSQL-7.2. This is a |
|
challenge/response protocol which avoids having clear-text passwords passed |
|
over the network. To activate this, the PostgreSQL administrator puts "md5" |
|
in the pg_hba.conf file instead of "password". Pgin.tcl supports this |
|
transparently; that is, if the backend requests MD5 authentication during |
|
the connection, pg_connect will use this protocol. The MD5 implementation |
|
was coded by the original author of pgin.tcl. It does not use the tcllib |
|
implementation, which is significantly faster but much more complex. |
|
|
|
----------------------------------------------------------------------------- |
|
ENCODING |
|
|
|
Character set encoding was added to pgin.tcl-3.0.0. More information can be |
|
found in the README and REFERENCE files. |
|
|
|
The following are converted to Unicode before being sent to PostgreSQL: |
|
|
|
+ Query strings (pg_exec, and all higher-level commands which use it) |
|
+ TEXT-format query parameters in pg_exec_prepared/pg_exec_params |
|
+ All parameter arguments in pg_exec when query parameters are used |
|
+ Prepared statement name in pg_exec_prepared |
|
+ COPY table FROM STDIN data sent using pg_copy_write |
|
|
|
The following are converted from Unicode when received from PostgreSQL: |
|
|
|
+ Query result column data when TEXT-format (not when BINARY-format) |
|
+ All Error and Notice response strings |
|
+ Parameter names and values |
|
+ Notification messages |
|
+ Command completion message |
|
+ Query result field names (column names) |
|
+ COPY table TO STDOUT data received using pg_copy_read |
|
|
|
Conversion of data to Unicode for sending to PostgreSQL occurs in 5 places |
|
in the code: pg_exec and pg_exec_params query strings, pg_exec_prepared |
|
statement name, pg_exec_prepared text format parameters, and when writing |
|
COPY FROM data in pg_copy_write. |
|
|
|
Conversion of Unicode data from PostgreSQL occurs in 3 places in the code: |
|
when receiving a protocol message "string" type (which covers various |
|
messages, parameters, and field names), when reading TEXT mode tuple data, |
|
and when reading COPY TO data in pg_copy_read. |
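
The two directions of conversion can be sketched with Tcl's built-in encoding command. This example assumes the wire encoding is UTF-8, which is how "Unicode" is interpreted here; the variable names are illustrative only. It shows why the conversion matters: the character count of a Tcl string and the byte count sent to the server differ as soon as a non-ASCII character is present.

```tcl
# A query string containing one non-ASCII character (e with acute accent).
set query "SELECT 'caf\u00e9'"

# Outbound: Tcl string to bytes for the server (UTF-8 assumed).
set wire [encoding convertto utf-8 $query]

# Inbound: bytes from the server back to a Tcl string.
set back [encoding convertfrom utf-8 $wire]

puts "characters: [string length $query]"
puts "bytes:      [string length $wire]"
puts "round trip: [expr {$back eq $query}]"
```

BINARY-format data would skip both steps and pass through unchanged.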
|
|
|
There is no Unicode conversion for the connection parameters username, |
|
database-name, or password. PostgreSQL seems to store these using the |
|
encoding of the database cluster/template1 database, which may differ from |
|
the encoding of the database to which the client is connected. It is |
|
unclear how to recode these characters. At this time, it is wise to avoid |
|
non-ASCII characters in database names, usernames, and passwords. This may |
|
be fixed in the future. |
|
|
|
The fast-path function call interface treats all its arguments as binary |
|
data and does not encode or decode them. The fast-path function calls |
|
were implemented primarily for large object support, and large object |
|
support is not affected by Unicode encoding because it is all binary |
|
data. It is unlikely that encoding support will be added to fast-path |
|
function calls, since parameterized queries are the preferred replacement. |
|
|
|
----------------------------------------------------------------------------- |
|
ESCAPING |
|
|
|
An array pgtcl::std_str() is used to store the per-connection setting for |
|
the PostgreSQL setting standard_conforming_strings. This was added in |
|
pgin.tcl-3.1.0 to support the versions of pg_escape_string, pg_quote, and |
|
pg_escape_bytea which accept an optional $conn argument. |
|
|
|
If the array value indexed by $conn is 1, then standard conforming strings |
|
is on for that database connection, and the backslash (\) is not considered |
|
special in SQL quoted string constants. In this case, pg_escape_string and |
|
pg_quote will not double backslashes. pg_escape_bytea will omit one level |
|
of backslashes when escaping backslash and octal values. |
|
|
|
If the array value indexed by $conn is 0, then standard conforming strings |
|
is off for that database connection, and the backslash (\) is special |
|
in SQL quoted string constants. In that case, pg_escape_string and pg_quote |
|
will double backslashes. pg_escape_bytea will use 4 backslashes for a single |
|
backslash, and 2 backslashes in an octal value. |
|
|
|
There is also an array index "_default_" which is used when no $conn |
|
argument is supplied to the escape commands. Just as in libpq, the |
|
_default_ value is set any time a Set Parameter message for |
|
standard_conforming_strings is received over any open database connection. |
|
If you are using a single connection, or multiple connections with the same |
|
value for standard_conforming_strings, you will get correct escaping |
|
behavior even without using the $conn argument when escaping strings. |
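
The effect of the per-connection flag on string escaping can be sketched as follows. The "demo" namespace, its std_str contents, and the proc "demo::escape_string" are hypothetical, written only to illustrate the rule for pg_escape_string; the sketch assumes the optional connection argument comes before the string, and it does not cover pg_escape_bytea.

```tcl
namespace eval demo {
    variable std_str
    # Hypothetical state: sock3 has standard_conforming_strings on,
    # sock4 has it off, and the default follows the last value seen.
    array set std_str {_default_ 0 sock3 1 sock4 0}
}

# Illustrative re-implementation of the escaping rule.
# Usage: demo::escape_string ?conn? string
proc demo::escape_string {args} {
    variable std_str
    if {[llength $args] == 2} {
        lassign $args conn s
    } else {
        set conn _default_
        set s [lindex $args 0]
    }
    # Single quotes are always doubled.
    set map [list ' '']
    # Backslashes are doubled only when they are special to the server.
    if {!$std_str($conn)} {
        lappend map \\ \\\\
    }
    return [string map $map $s]
}

puts [demo::escape_string sock3 {a\b'c}]
puts [demo::escape_string sock4 {a\b'c}]
puts [demo::escape_string {a\b'c}]
```

With these settings the first call leaves the backslash alone, while the second and third double it.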
|
|
|
|
|
-----------------------------------------------------------------------------
|
|
|