Linux.com Article DB: Excerpt from Professional Linux Programming: Chapter 4; PostgreSQL Interfacing

Cursors

Suppose our program was accessing a large database, with thousands of rows, and we executed a query that returned all those rows. Our program could suddenly need a very large amount of memory indeed, to store all those results. Since potentially this is all happening across a network as well, clearly we need a way of retrieving the data in smaller quantities, say a row at a time. There is a standard way of doing this, and it is how you would normally fetch data from an SQL database into a C program, (or indeed programs coded in many other languages). What we need is a cursor.

A cursor is a feature we have not met so far, because they are normally only applicable either when using SQL embedded in an external program, or within procedural language function stored in the database, often referred to as 'stored procedures'. Cursors are not generally used from the command line. The SQL92 standard only defines cursors for use in embedded programs, so in most database environments this is the only place you can use them, although in an extension to the SQL standard PostgreSQL does allow them from the command line as well.

A cursor is a way of scrolling through a set of results, fetching returned data in discrete blocks. To use a cursor, you declare it - with a name that has an associated SELECT statement. You then FETCH the results, usually one row at a time, though you can fetch many rows at a time.

The SQL92 standard, and many other implementations of cursors, require an additional step, an OPEN cursor command, between the DECLARE and the FETCH. The PostgreSQL libpq does not need it, the DECLARE is taken as an implicit command. When we get onto ecpg (the alternative way of embedding SQL in C code), you will see that we need to write the OPEN CURSOR command in the source code.

In pseudo code, the sequence looks something like:

BEGIN A TRANSACTION
DECLARE CURSOR mycursor FOR
    SELECT-statement
[OPEN mycursor]

DO {
   FETCH some data from mycursor
   Process the row(s) retrieved
} WHILE the FETCH command found data

CLOSE mycursor
COMMIT WORK

There are two new SQL commands here, DECLARE CURSOR and FETCH, both of which we need to look at before we can write some program code that fetches data using a cursor.

The syntax for declaring a cursor is very straightforward:

DECLARE cursor_name [BINARY] CURSOR FOR
    <SELECT-statement>

This creates, and implicitly opens in libpq, a cursor with the given name. Notice that the cursor is bound to a single SELECT statement. This cursor name is now effectively another way of referring to the SELECT statement. We only need the BINARY option when we wish to retrieve binary data stored in a column, an advanced topic that you will rarely need, so we are not considering it here. SQL92 experts will have noticed some keywords from the SQL standard are missing, notably SCROLL and FOR READ ONLY or FOR UPDATE. In PostgreSQL all cursors can scroll, so the keyword SCROLL, whilst accepted, has no effect. PostgreSQL also only supports 'read only' cursors, so we cannot use a cursor for updating the database, so the FOR clause is equally redundant, though the syntax FOR READ ONLY is accepted for compatibility with the standard.

The syntax of FETCH is very simple:

FETCH [FORWARD|BACKWARD] [number|ALL|NEXT] [IN
 cursor_name];

Normally FORWARD or BACKWARD is omitted, the default is FORWARD. We use a number, or ALL, to tell the FETCH that we wish to retrieve all, or just a fixed number, of rows. The keyword NEXT is the same as giving a number of 1. The SELECT statement specified when we declared the cursor determines the actual rows that can be fetched.

Now we have seen the principle, it's time to try it out in practice. This is sel4.c, which fetches data using a cursor:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>
PGconn *conn = NULL;
void tidyup_and_exit();
int execute_one_statement(const char *stmt_to_exec,
    PGresult **result);

int main()
{
PGresult *result;
int stmt_ok;
const char *connection_str = "host=gw1 dbname=rick";
FILE *output_stream;
PQprintOpt print_options;
conn = PQconnectdb(connection_str);
if (PQstatus(conn) == CONNECTION_BAD) {
fprintf(stderr, "Connection to %s failed, %s",
   connection_str, PQerrorMessage(conn));
tidyup_and_exit();
} else {
printf("Connected OK\n");
}
stmt_ok = execute_one_statement("BEGIN WORK",
 &result);
if (stmt_ok) {
PQclear(result);
stmt_ok = execute_one_statement("DECLARE
 age_fname_cursor CURSOR FOR
SELECT age, fname FROM children
WHERE age < '6'", &result);
if (stmt_ok) {
PQclear(result);
stmt_ok = execute_one_statement("FETCH ALL IN
 age_fname_cursor", &result);
if (stmt_ok) {
PQclear(result);
stmt_ok = execute_one_statement("COMMIT WORK",
    &result);
}
}
}
if (stmt_ok) PQclear(result);
PQfinish(conn);
return EXIT_SUCCESS;
}

int execute_one_statement(const char *stmt_to_exec,
   PGresult **res_ptr) {
int retcode = 1;
const char *str_res;
PGresult *local_result;
printf("About to execute %s\n", stmt_to_exec)
local_result = PQexec(conn, stmt_to_exec);
*res_ptr = local_result;
if (!local_result) {
printf("PQexec command failed, no error code\n");
retcode = 0;
} else {
switch (PQresultStatus(local_result)) {
case PGRES_COMMAND_OK:
str_res = PQcmdTuples(local_result);
if (strlen(str_res) > 0) {
printf("Command executed OK, %s rows affected\n",
 str_res);
} else {
printf("Command executed OK, no rows affected\n");
}
break;
case PGRES_TUPLES_OK:
printf("Select executed OK, %d rows found\n",
    PQntuples(local_result));
break;
default:
printf("Command failed with code %s, error message
   %s\n",
PQresStatus(PQresultStatus(local_result)),
PQresultErrorMessage(local_result));
PQclear(local_result);
retcode = 0;
break;
}

}
return retcode;
} /* execute_one_statement */

void tidyup_and_exit() {
if (conn != NULL) PQfinish(conn);
exit(EXIT_FAILURE);
}

The main changes from the previous version are highlighted. We have removed the printing of the output, shortly we will see a more useful way of accessing the retrieved data than simply printing it out. When we run this, we get:

Connected OK
About to execute BEGIN WORK
Command executed OK, no rows affected
About to execute DECLARE age_fname_cursor CURSOR FOR
   SELECT age, fname FROM children WHERE age < '6'
Command executed OK, no rows affected
About to execute FETCH ALL IN age_fname_cursor
Select executed OK, 3 rows found
About to execute COMMIT WORK
Command executed OK, no rows affected

It is now trivial to fetch the rows one at a time, simply by changing the ALL to a 1 in the FETCH statement, and checking that rows are actually returned. A FETCH, just like a SELECT, can succeed, but return no data.

The changed lines in sel5.c, are:

conn = PQconnectdb(connection_str);
if (PQstatus(conn) == CONNECTION_BAD) {
fprintf(stderr, "Connection to %s failed, %s",
   connection_str, tidyup_and_exit();
} else {
printf("Connected OK\n");
}
stmt_ok = execute_one_statement("BEGIN WORK",
   &result);
if (stmt_ok) {
PQclear(result);
stmt_ok = execute_one_statement("DECLARE
 age_fname_cursor CURSOR FOR SELECT age, fname FROM children
 WHERE age < '6'", &result);
stmt_ok = execute_one_statement("FETCH 1 IN
   age_fname_cursor", &result);
while(stmt_ok && PQntuples(result) > 0) {
PQclear(result)
stmt_ok = execute_one_statement("FETCH NEXT IN
   age_fname_cursor",
&result);
}
stmt_ok = execute_one_statement("COMMIT WORK",
  &result);
}
if (stmt_ok) PQclear(result);
PQfinish(conn);
return EXIT_SUCCESS;
}

The output is:

[rick@gw1 psql]$ ./sel5
Connected OK
About to execute BEGIN WORK
Command executed OK, no rows affected
About to execute DECLARE age_fname_cursor CURSOR FOR
   SELECT age, fname FROM children WHERE age < '6'
Command executed OK, no rows affected
About to execute FETCH 1 IN age_fname_cursor
Select executed OK, 1 rows found
About to execute FETCH NEXT IN age_fname_cursor
Select executed OK, 1 rows found
About to execute FETCH NEXT IN age_fname_cursor
Select executed OK, 1 rows found
About to execute FETCH NEXT IN age_fname_cursor
Select executed OK, 0 rows found
About to execute COMMIT WORK
Command executed OK, no rows affected

As you can see, it's actually very easy to retrieve our data one row at a time. The only drawback, which usually doesn't matter, is that we don't know until we have retrieved all the data how many rows there were. This is because PQntuples(result), not unreasonably for a FETCH of one row, has a value of one when a row is retrieved.

Now we have our data being retrieved in a more manageable format, we can progress to access individual parts of that information.

Getting column information

The first piece of information that it's useful to extract from the returned data, is the column information (both the column names and data types). This is quite easy to do with three functions, one to discover how many columns there are, one for the name of each column, and one for the data size of that column. Of course, you could specify by name each of the columns you want, but then, in theory, you know in advance the type of each column that will be returned.

In general, it is a good idea to specify by name each column you require. The reason for this is to prevent your code from being 'surprised' if the database has new columns added. If columns are to be deleted, then at least a 'grep' through the code will show that the names of the columns to be deleted are used in the code. Assuming the column type in code isless clear cut - it may be that determining the type at run time means your code can then automatically take account of any changes of column type. Conversely you are writing more code, which increases the risk of a bug and slightly decreases performance.

We find the number of columns in the returned result with PQnfields:

int PQnfields(PGresult *result);

We find the name of an individual column using PQfname function, and passing the field_index, where the first column is at index 0:

char *PQfname(PGresult *result, int field_index);

We can get an idea of the size of the data with PQfsize. We use the word 'idea' because it returns only the amount of space that PostgreSQL has used internally, and even then is -1 for variable length fields, such as VARCHAR.

int PQfsize(PGresult *result, int field_index);

The obvious omission in this set is the type of the column being returned. Unfortunately, the routine that appears to do this, PQftype, returns an Oid type (actually a typedef for an unsigned integer). This gives only an internal representation of the type, and is not externally documented anywhere, which makes it almost useless. For this reason we will not use it here, though hopefully in a later release PostgreSQL, or at least the libpq library, will develop a more useful routine for discovering the type being returned.

We can now use this knowledge to extend our sel5.c program into sel6.c, by retrieving the column information. It doesn't matter which row of the retrieved data we use to extract the column header information from, indeed even if the SELECT statement returned no rows, we could still access the column information.

The changes are very minor, so we just show the additions here, rather than repeat all the code.

First we add a prototype for our new function:

void show_column_info(PGresult *result);

Then we call it when data is retrieved. We allow it to be called each time data is returned, to show that this works, though of course we would not do this in production code.

if (stmt_ok) {
PQclear(result);
stmt_ok = execute_one_statement("FETCH 1 IN
  age_fname_cursor",
&result);
if (stmt_ok) show_column_info(result);
while(stmt_ok && PQntuples(result) > 0) {
show_column_info(result);
PQclear(result);
stmt_ok = execute_one_statement("FETCH NEXT IN
    age_fname_cursor",
&result);
}
stmt_ok = execute_one_statement("COMMIT WORK",
  &result);
}

Finally, here is the implementation of show_column_info:

void show_column_info(PGresult *result) {
int num_columns;
int i;
if (!result) return;
num_columns = PQnfields(result);
printf("%d columns in the result set\n",
    num_columns);
for(i = 0; i < num_columns; i++) {
printf("Field %d, Name %s, Internal size %d\n",
i,
PQfname(result, i),
PQfsize(result, i));
}
} /* show_column_info */

When we execute this, we get output like this:

About to execute FETCH NEXT IN age_fname_cursor
Select executed OK, 1 rows found
2 columns in the result set
Field 0, Name age, Internal size 4
Field 1, Name fname, Internal size -1

We have abbreviated the full output, to save space. Notice that the size of fname is reported as -1, because it is a variable size field type, a VARCHAR.

Originally Published: Wednesday, 18 July 2001	Author: Neil Matthew and Richard Stones
Published to: develop_articles/Development Articles	Page: 3/4 - [Printable]
Excerpt from Professional Linux Programming: Chapter 4; PostgreSQL Interfacing In this articles excerpt from the brilliant Professional Linux Programming (Wrox Press) the authors look at two ways of accessing a PostgreSQL database from C code. Firstly with a conventional library based method, then at how SQL can be embedded more directly into C code. The excerpt also contains major parts of a complete application. Linux.com wants to thank Wrox Press for making this valuable educational material available to the Linux.com community!

	<< Page 3 of 4 >>
Cursors Suppose our program was accessing a large database, with thousands of rows, and we executed a query that returned all those rows. Our program could suddenly need a very large amount of memory indeed, to store all those results. Since potentially this is all happening across a network as well, clearly we need a way of retrieving the data in smaller quantities, say a row at a time. There is a standard way of doing this, and it is how you would normally fetch data from an SQL database into a C program, (or indeed programs coded in many other languages). What we need is a cursor. A cursor is a feature we have not met so far, because they are normally only applicable either when using SQL embedded in an external program, or within procedural language function stored in the database, often referred to as 'stored procedures'. Cursors are not generally used from the command line. The SQL92 standard only defines cursors for use in embedded programs, so in most database environments this is the only place you can use them, although in an extension to the SQL standard PostgreSQL does allow them from the command line as well. A cursor is a way of scrolling through a set of results, fetching returned data in discrete blocks. To use a cursor, you declare it - with a name that has an associated SELECT statement. You then FETCH the results, usually one row at a time, though you can fetch many rows at a time. The SQL92 standard, and many other implementations of cursors, require an additional step, an OPEN cursor command, between the DECLARE and the FETCH. The PostgreSQL libpq does not need it, the DECLARE is taken as an implicit command. When we get onto ecpg (the alternative way of embedding SQL in C code), you will see that we need to write the OPEN CURSOR command in the source code. In pseudo code, the sequence looks something like: BEGIN A TRANSACTION DECLARE CURSOR mycursor FOR SELECT-statement [OPEN mycursor] DO { FETCH some data from mycursor Process the row(s) retrieved } WHILE the FETCH command found data CLOSE mycursor COMMIT WORK There are two new SQL commands here, DECLARE CURSOR and FETCH, both of which we need to look at before we can write some program code that fetches data using a cursor. The syntax for declaring a cursor is very straightforward: DECLARE cursor_name [BINARY] CURSOR FOR <SELECT-statement> This creates, and implicitly opens in libpq, a cursor with the given name. Notice that the cursor is bound to a single SELECT statement. This cursor name is now effectively another way of referring to the SELECT statement. We only need the BINARY option when we wish to retrieve binary data stored in a column, an advanced topic that you will rarely need, so we are not considering it here. SQL92 experts will have noticed some keywords from the SQL standard are missing, notably SCROLL and FOR READ ONLY or FOR UPDATE. In PostgreSQL all cursors can scroll, so the keyword SCROLL, whilst accepted, has no effect. PostgreSQL also only supports 'read only' cursors, so we cannot use a cursor for updating the database, so the FOR clause is equally redundant, though the syntax FOR READ ONLY is accepted for compatibility with the standard. The syntax of FETCH is very simple: FETCH [FORWARD\|BACKWARD] [number\|ALL\|NEXT] [IN cursor_name]; Normally FORWARD or BACKWARD is omitted, the default is FORWARD. We use a number, or ALL, to tell the FETCH that we wish to retrieve all, or just a fixed number, of rows. The keyword NEXT is the same as giving a number of 1. The SELECT statement specified when we declared the cursor determines the actual rows that can be fetched. Now we have seen the principle, it's time to try it out in practice. This is sel4.c, which fetches data using a cursor: #include <stdlib.h> #include <stdio.h> #include <string.h> #include <libpq-fe.h> PGconn conn = NULL; void tidyup_and_exit(); int execute_one_statement(const char stmt_to_exec, PGresult *result); int main() { PGresult result; int stmt_ok; const char connection_str = "host=gw1 dbname=rick"; FILE output_stream; PQprintOpt print_options; conn = PQconnectdb(connection_str); if (PQstatus(conn) == CONNECTION_BAD) { fprintf(stderr, "Connection to %s failed, %s", connection_str, PQerrorMessage(conn)); tidyup_and_exit(); } else { printf("Connected OK\n"); } stmt_ok = execute_one_statement("BEGIN WORK", &result); if (stmt_ok) { PQclear(result); stmt_ok = execute_one_statement("DECLARE age_fname_cursor CURSOR FOR SELECT age, fname FROM children WHERE age < '6'", &result); if (stmt_ok) { PQclear(result); stmt_ok = execute_one_statement("FETCH ALL IN age_fname_cursor", &result); if (stmt_ok) { PQclear(result); stmt_ok = execute_one_statement("COMMIT WORK", &result); } } } if (stmt_ok) PQclear(result); PQfinish(conn); return EXIT_SUCCESS; } int execute_one_statement(const char stmt_to_exec, PGresult res_ptr) { int retcode = 1; const char str_res; PGresult local_result; printf("About to execute %s\n", stmt_to_exec) local_result = PQexec(conn, stmt_to_exec); res_ptr = local_result; if (!local_result) { printf("PQexec command failed, no error code\n"); retcode = 0; } else { switch (PQresultStatus(local_result)) { case PGRES_COMMAND_OK: str_res = PQcmdTuples(local_result); if (strlen(str_res) > 0) { printf("Command executed OK, %s rows affected\n", str_res); } else { printf("Command executed OK, no rows affected\n"); } break; case PGRES_TUPLES_OK: printf("Select executed OK, %d rows found\n", PQntuples(local_result)); break; default: printf("Command failed with code %s, error message %s\n", PQresStatus(PQresultStatus(local_result)), PQresultErrorMessage(local_result)); PQclear(local_result); retcode = 0; break; } } return retcode; } /* execute_one_statement / void tidyup_and_exit() { if (conn != NULL) PQfinish(conn); exit(EXIT_FAILURE); } The main changes from the previous version are highlighted. We have removed the printing of the output, shortly we will see a more useful way of accessing the retrieved data than simply printing it out. When we run this, we get: Connected OK About to execute BEGIN WORK Command executed OK, no rows affected About to execute DECLARE age_fname_cursor CURSOR FOR SELECT age, fname FROM children WHERE age < '6' Command executed OK, no rows affected About to execute FETCH ALL IN age_fname_cursor Select executed OK, 3 rows found About to execute COMMIT WORK Command executed OK, no rows affected It is now trivial to fetch the rows one at a time, simply by changing the ALL to a 1 in the FETCH statement, and checking that rows are actually returned. A FETCH, just like a SELECT, can succeed, but return no data. The changed lines in sel5.c, are: conn = PQconnectdb(connection_str); if (PQstatus(conn) == CONNECTION_BAD) { fprintf(stderr, "Connection to %s failed, %s", connection_str, tidyup_and_exit(); } else { printf("Connected OK\n"); } stmt_ok = execute_one_statement("BEGIN WORK", &result); if (stmt_ok) { PQclear(result); stmt_ok = execute_one_statement("DECLARE age_fname_cursor CURSOR FOR SELECT age, fname FROM children WHERE age < '6'", &result); stmt_ok = execute_one_statement("FETCH 1 IN age_fname_cursor", &result); while(stmt_ok && PQntuples(result) > 0) { PQclear(result) stmt_ok = execute_one_statement("FETCH NEXT IN age_fname_cursor", &result); } stmt_ok = execute_one_statement("COMMIT WORK", &result); } if (stmt_ok) PQclear(result); PQfinish(conn); return EXIT_SUCCESS; } The output is: [rick@gw1 psql]$ ./sel5* Connected OK About to execute BEGIN WORK Command executed OK, no rows affected About to execute DECLARE age_fname_cursor CURSOR FOR SELECT age, fname FROM children WHERE age < '6' Command executed OK, no rows affected About to execute FETCH 1 IN age_fname_cursor Select executed OK, 1 rows found About to execute FETCH NEXT IN age_fname_cursor Select executed OK, 1 rows found About to execute FETCH NEXT IN age_fname_cursor Select executed OK, 1 rows found About to execute FETCH NEXT IN age_fname_cursor Select executed OK, 0 rows found About to execute COMMIT WORK Command executed OK, no rows affected As you can see, it's actually very easy to retrieve our data one row at a time. The only drawback, which usually doesn't matter, is that we don't know until we have retrieved all the data how many rows there were. This is because PQntuples(result), not unreasonably for a FETCH of one row, has a value of one when a row is retrieved. Now we have our data being retrieved in a more manageable format, we can progress to access individual parts of that information. Getting column information The first piece of information that it's useful to extract from the returned data, is the column information (both the column names and data types). This is quite easy to do with three functions, one to discover how many columns there are, one for the name of each column, and one for the data size of that column. Of course, you could specify by name each of the columns you want, but then, in theory, you know in advance the type of each column that will be returned. In general, it is a good idea to specify by name each column you require. The reason for this is to prevent your code from being 'surprised' if the database has new columns added. If columns are to be deleted, then at least a 'grep' through the code will show that the names of the columns to be deleted are used in the code. Assuming the column type in code isless clear cut - it may be that determining the type at run time means your code can then automatically take account of any changes of column type. Conversely you are writing more code, which increases the risk of a bug and slightly decreases performance. We find the number of columns in the returned result with PQnfields: int PQnfields(PGresult result); We find the name of an individual column using PQfname function, and passing the field_index, where the first column is at index 0: char PQfname(PGresult result, int field_index); We can get an idea of the size of the data with PQfsize. We use the word 'idea' because it returns only the amount of space that PostgreSQL has used internally, and even then is -1 for variable length fields, such as VARCHAR. int PQfsize(PGresult result, int field_index); The obvious omission in this set is the type of the column being returned. Unfortunately, the routine that appears to do this, PQftype, returns an Oid type (actually a typedef for an unsigned integer). This gives only an internal representation of the type, and is not externally documented anywhere, which makes it almost useless. For this reason we will not use it here, though hopefully in a later release PostgreSQL, or at least the libpq library, will develop a more useful routine for discovering the type being returned. We can now use this knowledge to extend our sel5.c program into sel6.c, by retrieving the column information. It doesn't matter which row of the retrieved data we use to extract the column header information from, indeed even if the SELECT statement returned no rows, we could still access the column information. The changes are very minor, so we just show the additions here, rather than repeat all the code. First we add a prototype for our new function: void show_column_info(PGresult result); Then we call it when data is retrieved. We allow it to be called each time data is returned, to show that this works, though of course we would not do this in production code. if (stmt_ok) { PQclear(result); stmt_ok = execute_one_statement("FETCH 1 IN age_fname_cursor", &result); if (stmt_ok) show_column_info(result); while(stmt_ok && PQntuples(result) > 0) { show_column_info(result); PQclear(result); stmt_ok = execute_one_statement("FETCH NEXT IN age_fname_cursor", &result); } stmt_ok = execute_one_statement("COMMIT WORK", &result); } Finally, here is the implementation of show_column_info: void show_column_info(PGresult result) { int num_columns; int i; if (!result) return; num_columns = PQnfields(result); printf("%d columns in the result set\n", num_columns); for(i = 0; i < num_columns; i++) { printf("Field %d, Name %s, Internal size %d\n", i, PQfname(result, i), PQfsize(result, i)); } } /* show_column_info */ When we execute this, we get output like this: About to execute FETCH NEXT IN age_fname_cursor Select executed OK, 1 rows found 2 columns in the result set Field 0, Name age, Internal size 4 Field 1, Name fname, Internal size -1 We have abbreviated the full output, to save space. Notice that the size of fname is reported as -1, because it is a variable size field type, a VARCHAR.
	<< Page 3 of 4 >>