15.19 从C语言中读取类文件对象
最后更新于:2022-04-01 15:41:33
## 问题
You want to write C extension code that consumes data from any Python file-like object(e.g., normal files, StringIO objects, etc.).
## 解决方案
To consume data on a file-like object, you need to repeatedly invoke its read() methodand take steps to properly decode the resulting data.Here is a sample C extension function that merely consumes all of the data on a file-likeobject and dumps it to standard output so you can see it:
#define CHUNK_SIZE 8192
/* Consume a “file-like” object and write bytes to stdout [*](#)/static PyObject [*](#)py_consume_file(PyObject [*](#)self, PyObject [*](#)args) {
> > > > > PyObject [*](#)obj;PyObject [*](#)read_meth;PyObject [*](#)result = NULL;PyObject [*](#)read_args;
if (!PyArg_ParseTuple(args,”O”, &obj)) {return NULL;> > }
> > /* Get the read method of the passed object [*](#)/if ((read_meth = PyObject_GetAttrString(obj, “read”)) == NULL) {
> > > return NULL;
> > }
> > /* Build the argument list to read() [*](#)/read_args = Py_BuildValue(“(i)”, CHUNK_SIZE);while (1) {
> > > > > > PyObject [*](#)data;PyObject [*](#)enc_data;char [*](#)buf;Py_ssize_t len;
> > > /* Call read() [*](#)/if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
> > > > goto final;
> > > }
> > > /* Check for EOF [*](#)/if (PySequence_Length(data) == 0) {
> > > > Py_DECREF(data);break;
> > > }
> > > /* Encode Unicode as Bytes for C [*](#)/if ((enc_data=PyUnicode_AsEncodedString(data,”utf-8”,”strict”))==NULL) {
> > > > Py_DECREF(data);goto final;
> > > }
> > > /* Extract underlying buffer data [*](#)/PyBytes_AsStringAndSize(enc_data, &buf, &len);
> > > /* Write to stdout (replace with something more useful) [*](#)/write(1, buf, len);
> > > /* Cleanup [*](#)/Py_DECREF(enc_data);Py_DECREF(data);
> > }result = Py_BuildValue(“”);
final:/* Cleanup [*](#)/Py_DECREF(read_meth);Py_DECREF(read_args);return result;
}
To test the code, try making a file-like object such as a StringIO instance and pass it in:
>>> import io
>>> f = io.StringIO('Hello\nWorld\n')
>>> import sample
>>> sample.consume_file(f)
Hello
World
>>>
## 讨论
Unlike a normal system file, a file-like object is not necessarily built around a low-levelfile descriptor. Thus, you can’t use normal C library functions to access it. Instead, youneed to use Python’s C API to manipulate the file-like object much like you would inPython.In the solution, the read() method is extracted from the passed object. An argumentlist is built and then repeatedly passed to PyObject_Call() to invoke the method. Todetect end-of-file (EOF), PySequence_Length() is used to see if the returned result haszero length.For all I/O operations, you’ll need to concern yourself with the underlying encodingand distinction between bytes and Unicode. This recipe shows how to read a file in textmode and decode the resulting text into a bytes encoding that can be used by C. If youwant to read the file in binary mode, only minor changes will be made. For example:
...
/* Call read() [*](#)/if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
> goto final;
}
/* Check for EOF [*](#)/if (PySequence_Length(data) == 0) {
> Py_DECREF(data);break;
}if (!PyBytes_Check(data)) {
> Py_DECREF(data);PyErr_SetString(PyExc_IOError, “File must be in binary mode”);goto final;
}
/* Extract underlying buffer data [*](#)/PyBytes_AsStringAndSize(data, &buf, &len);...
The trickiest part of this recipe concerns proper memory management. When workingwith PyObject * variables, careful attention needs to be given to managing referencecounts and cleaning up values when no longer needed. The various Py_DECREF() callsare doing this.The recipe is written in a general-purpose manner so that it can be adapted to other fileoperations, such as writing. For example, to write data, merely obtain the write()method of the file-like object, convert data into an appropriate Python object (bytes orUnicode), and invoke the method to have it written to the file.Finally, although file-like objects often provide other methods (e.g., readline(),read_into()), it is probably best to just stick with the basic read() and write() meth‐ods for maximal portability. Keeping things as simple as possible is often a good policyfor C extensions.