This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: Read 0 bytes, run out of memory
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
I've been working on a file format for chemical fingerprints,
influenced by the PNG file format. To make sure I'm doing it right, I
wrote a program to dump blocks from PNG files. I made a mistake and
my program gave a MemoryError. How did that happen when my test file
is only a few KB long?
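For readers who don't know the format: a PNG file is an 8-byte signature followed by a series of chunks. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC computed over the type and data. Below is a minimal sketch of a chunk dumper in Python 3. It is not the program from this post, and `dump_chunks` and `make_chunk` are hypothetical names. It rejects lengths above the PNG limit of 2**31-1 before calling read(), because a mis-parsed length field turns directly into a huge read() request. That check only rejects lengths the format forbids. A legal but bogus length still produces a large request.

```python
import io
import struct
import zlib

# The 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def dump_chunks(f):
    """Yield (chunk_type, length, crc_ok) for each chunk in a PNG stream.

    Hypothetical helper for illustration only.
    """
    if f.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError("truncated chunk header")
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", header)
        # PNG caps chunk lengths at 2**31 - 1; refuse larger values
        # before asking read() for that many bytes.
        if length > 0x7FFFFFFF:
            raise ValueError("chunk length too large: %d" % length)
        data = f.read(length)
        crc_bytes = f.read(4)
        if len(data) != length or len(crc_bytes) != 4:
            raise ValueError("truncated chunk %r" % chunk_type)
        (crc,) = struct.unpack(">I", crc_bytes)
        # The CRC covers the type and data, not the length field.
        crc_ok = (zlib.crc32(chunk_type + data) & 0xFFFFFFFF) == crc
        yield chunk_type.decode("ascii"), length, crc_ok


def make_chunk(kind, data):
    """Build one chunk (hypothetical test helper)."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


# A tiny, well-framed (but not decodable) PNG stream built in memory.
stream = io.BytesIO(PNG_SIGNATURE
                    + make_chunk(b"IHDR", b"\x00" * 13)
                    + make_chunk(b"IEND", b""))
print(list(dump_chunks(stream)))
# [('IHDR', 13, True), ('IEND', 0, True)]
```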
I tracked it down. I don't know if it's a bug. Here's something for
you all to ponder over:
BLOCKSIZE = 10*1024*1024

f = open("empty.txt", "w")
f.close()

f = open("empty.txt")
data = []
for i in range(10000):
    s = f.read(BLOCKSIZE)
    assert len(s) == 0
    data.append(s)
That's an empty file, so every read() must return an empty string. The
assert statement verifies that. But when I run it I get:
Python(18996) malloc: *** vm_allocate(size=10489856) failed (error code=3)
Python(18996) malloc: *** error: can't allocate region
Python(18996) malloc: *** set a breakpoint in szone_error to debug
Traceback (most recent call last):
  File "mem_fill.py", line 9, in <module>
    s = f.read(BLOCKSIZE)
MemoryError
The reason is in the C implementation of file.read()
(Objects/fileobject.c). The relevant line is:
v = PyString_FromStringAndSize((char *)NULL, buffersize);
That preallocates a buffer of the requested size, on the assumption
that the read will fill it. In my example code it preallocates 10MB
even though the result is 0 bytes long. Since I keep the result
around, all of the preallocated space is kept around too. Repeat that
10,000 times (about 100GB in total) and my machine quickly runs out of
memory. So will yours.
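One possible workaround, which is not from the post, is to never ask read() for more bytes than the file has left. For a regular file that nothing else is writing to, the remaining byte count is the file size minus the current position. Clamping the request to that count means the size passed to read() matches what actually comes back. The sketch below is Python 3 and uses a hypothetical helper, `read_bounded`. It assumes a regular file opened in binary mode, where `tell()` returns a byte offset. Pipes and sockets have no meaningful `st_size`, so this approach does not apply to them. The sketch does not claim anything about how a particular Python version allocates inside read(). It only avoids the oversized request.

```python
import os
import tempfile


def read_bounded(f, size):
    """Read up to `size` bytes without requesting more than the file holds.

    Hypothetical helper. Assumes `f` is a regular file opened in binary
    mode, so st_size is meaningful and tell() is a byte offset.
    """
    remaining = os.fstat(f.fileno()).st_size - f.tell()
    # Clamp the request; max() guards against a position past end of file.
    return f.read(max(0, min(size, remaining)))


BLOCKSIZE = 10 * 1024 * 1024

# The post's loop, run against an empty temporary file.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "empty.txt")
    open(path, "wb").close()
    data = []
    with open(path, "rb") as f:
        for i in range(10000):
            s = read_bounded(f, BLOCKSIZE)  # requests 0 bytes on an empty file
            assert len(s) == 0
            data.append(s)
```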
Bug in Python? Correct behavior? You decide. Feel free to make
comments if you wish.