Sometimes you know the exact
quantity, type, and lifetime of
the objects in your program. But not
always.
How many planes will an air-traffic
system need to handle? How many shapes will a CAD system use? How many nodes
will there be in a network?
To solve the general programming problem,
it’s essential that you be able to create and destroy objects at runtime.
Of course, C has always provided the dynamic memory allocation functions
malloc( )
and free( ) (along with variants of malloc( )) that
allocate storage from the heap (also called the free store) at
runtime.
However, this simply won’t work in
C++. The constructor doesn’t allow you to hand it
the address of the memory to initialize, and for good reason. If you could do
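that, you
might:
1. Forget. Then guaranteed initialization of objects in C++ wouldn’t be guaranteed.
2. Accidentally do something to the object before you initialize it, expecting the right thing to happen.
3. Hand it the wrong-sized object.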
And of
course, even if you did everything correctly, anyone who modifies your program
is prone to the same errors. Improper initialization is responsible for a large
portion of programming problems, so it’s especially important to guarantee
constructor calls for objects created on the heap.
So how does C++ guarantee proper
initialization
and cleanup, but allow you to create objects dynamically on the
heap?
The answer is by bringing dynamic object
creation into the core of the language. malloc( ) and
free( ) are library functions, and thus outside the control of the
compiler. However, if you have an operator to perform the combined act of
dynamic storage allocation and initialization and another operator to perform
the combined act of cleanup and releasing storage, the compiler can still
guarantee that constructors and destructors will be called for all
objects.
In this chapter, you’ll learn how
C++’s new and delete elegantly solve this problem by safely
creating objects on the
heap.
When a C++ object is created, two events
occur:
1. Storage is allocated for the object.
2. The constructor is called to initialize that storage.
By now you should
believe that step two always happens. C++ enforces it because
uninitialized objects are a major source of program bugs. It doesn’t
matter where or how the object is created – the constructor is always
called.
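The storage for an object can come from one of three regions: the static
storage area, allocated before the program begins and kept for its whole run;
the stack, where storage is created and released automatically as execution
enters and leaves a scope; and the heap (also called the free store), from which
memory is allocated on demand at runtime. Often these three regions are placed
in a single contiguous piece of physical memory (in an order determined by the
compiler writer). However, there are no rules. The stack may be in a special
place, and the heap may be implemented by requesting chunks of memory from the
operating system. These details are normally hidden from you as a programmer,
so all you need to think about is that the memory is there when you ask for
it.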
To allocate memory dynamically at
runtime, C provides functions in its standard library:
malloc( ) and its variants
calloc( ) and
realloc( ) to produce memory from the
heap, and
free( ) to release the memory back to the
heap. These functions are pragmatic but primitive and require understanding and
care on the part of the programmer. To create an instance of a class on the heap
using C’s dynamic memory functions, you’d have to do something like
this:
//: C13:MallocClass.cpp
// Malloc with class objects
// What you'd have to do if not for "new"
#include "../require.h"
#include <cstdlib> // malloc() & free()
#include <cstring> // memset()
#include <iostream>
using namespace std;

class Obj {
  int i, j, k;
  enum { sz = 100 };
  char buf[sz];
public:
  void initialize() { // Can't use constructor
    cout << "initializing Obj" << endl;
    i = j = k = 0;
    memset(buf, 0, sz);
  }
  void destroy() const { // Can't use destructor
    cout << "destroying Obj" << endl;
  }
};

int main() {
  Obj* obj = (Obj*)malloc(sizeof(Obj));
  require(obj != 0);
  obj->initialize();
  // ... sometime later:
  obj->destroy();
  free(obj);
} ///:~
You can see the use of
malloc( ) to create storage for the object in the
line:
Obj* obj = (Obj*)malloc(sizeof(Obj));
Here, the user must determine the size of
the object (one place for an error). malloc( ) returns a
void* because it just produces a patch of memory, not an object. Unlike C,
C++ doesn’t implicitly convert a void* to any other pointer type, so it
must be cast.
Because malloc( ) may fail to
find any memory (in which case it returns zero), you must check the returned
pointer to make sure it was successful.
But the worst problem is this
line:
obj->initialize();
If users make it this far correctly, they
must remember to initialize the object before it is used. Notice that a
constructor was not used because the constructor cannot
be called
explicitly[50]
– it’s called for you by the compiler when an object is created. The
problem here is that the user now has the option to forget to perform the
initialization before the object is used, thus reintroducing a major source of
bugs.
It also turns out that many programmers
seem to find C’s dynamic memory functions too confusing and complicated;
it’s not uncommon to find C programmers who use virtual memory
machines allocating huge arrays of variables in the
static storage area to avoid thinking about dynamic memory allocation. Because
C++ is attempting to make library use safe and effortless for the casual
programmer, C’s approach to dynamic memory is
unacceptable.
The solution in C++ is to combine all the
actions necessary to create an object into a single operator called
new. When you create an
object with new (using a
new-expression), it
allocates enough storage on the heap to hold the object and calls the
constructor for that storage. Thus, if you say
MyType *fp = new MyType(1,2);
at runtime, the equivalent of
malloc(sizeof(MyType)) is called (often, it is literally a call to
malloc( )), and the constructor for
MyType is called with the resulting address as the this
pointer, using (1,2) as the argument list. By the
time the pointer is assigned to fp, it’s a live, initialized object
– you can’t even get your hands on it before then. It’s also
automatically the proper MyType type so no cast
is necessary.
The default new checks to make
sure the memory allocation was successful before passing the address to the
constructor, so you don’t have to explicitly determine if the call was
successful. Later in the chapter you’ll find out what happens if
there’s no memory left.
You can create a new-expression using any
constructor available for the class. If the constructor has no arguments, you
write the new-expression without the constructor argument list:
MyType *fp = new MyType;
Notice how simple the process of creating
objects on the heap becomes – a single expression, with all the sizing,
conversions, and safety checks built in. It’s as easy to create an object
on the heap as it is on the
stack.
The complement to the new-expression is
the delete-expression, which first calls the
destructor and then releases the memory (often with a call to
free( )). Just as a new-expression returns a
pointer to the object, a delete-expression requires the address of an
object.
delete fp;
This destructs and then releases the
storage for the dynamically allocated MyType object created
earlier.
delete can
be called only for an object created by new. If you malloc( )
(or calloc( ) or realloc( )) an object and then
delete it, the behavior is undefined. Because most default
implementations of new and delete use malloc( ) and
free( ), you’d probably end up releasing the memory without
calling the destructor.
If the pointer you’re deleting is
zero, nothing will happen. For this reason, people often
recommend setting a pointer to zero immediately after you delete it, to prevent
deleting it twice. Deleting an object more than once is undefined behavior; it
can corrupt the heap or crash the program, sometimes far from the actual
mistake.
This example shows that initialization
takes place:
//: C13:Tree.h
#ifndef TREE_H
#define TREE_H
#include <iostream>

class Tree {
  int height;
public:
  Tree(int treeHeight) : height(treeHeight) {}
  ~Tree() { std::cout << "*"; }
  friend std::ostream&
  operator<<(std::ostream& os, const Tree* t) {
    return os << "Tree height is: "
              << t->height << std::endl;
  }
};
#endif // TREE_H ///:~
//: C13:NewAndDelete.cpp
// Simple demo of new & delete
#include "Tree.h"
using namespace std;

int main() {
  Tree* t = new Tree(40);
  cout << t;
  delete t;
} ///:~
We can prove that the constructor is
called by printing out the value of the Tree. Here, it’s done by
overloading the operator<< to use with an ostream and a
Tree*.
Note, however, that even though the function is declared as a
friend, it is defined as an inline! This is a
mere convenience – defining a friend function as an inline to a
class doesn’t change the friend status or the fact that it’s
a global function and not a class member function. Also notice that the return
value is the result of the entire output expression, which is an
ostream& (which it must be, to satisfy the return value type of the
function).
When you create automatic objects on the
stack, the size of the objects and their lifetime are built
right into the generated code, because the compiler knows the exact type,
quantity, and scope. Creating objects on the heap involves
additional overhead, both in time and in space. Here’s a typical scenario.
(You can replace malloc( ) with
calloc( ) or
realloc( ).)
1. You call malloc( ), which requests a block of memory from the pool. (This
code may actually be part of malloc( ).)
2. The pool is searched for a block of memory large enough to satisfy the
request. This is done by checking a map or directory of some sort that shows
which blocks are currently in use and which are available. It’s a quick
process, but it may take several tries, so it might not be deterministic; that
is, you can’t necessarily count on malloc( ) always taking exactly the same
amount of time.
3. Before a pointer to that block is returned, the size and location of the
block must be recorded so further calls to malloc( ) won’t use it, and so that
when you call free( ), the system knows how much memory to release.
The way all this is implemented can vary
widely. For example, there’s nothing to prevent primitives for memory
allocation being implemented in the processor. If you’re curious, you can
write test programs to try to guess the way your malloc( ) is
implemented. You can also read the library source code, if you have it (the GNU
C library sources, for example, are freely available).
Using new and delete, the
Stash example introduced previously in this book can be rewritten using
all the features discussed in the book so far. Examining the new code will also
give you a useful review of the topics.
At this point in the book, neither the
Stash nor Stack classes will
“own” the objects
they point to; that is, when the Stash or Stack object goes out of
scope, it will not call delete for all the objects it points to. This is not
possible because, in an attempt to be generic, they hold
void pointers. If you
delete a void pointer, there’s no type information and no way for the
compiler to know what destructor to call, so with typical implementations the
only thing that happens is that the memory gets released. (Strictly speaking,
standard C++ makes deleting through a void* undefined behavior.)
It’s worth emphasizing that calling delete for a void* is almost certainly
a bug in your program unless the destination of that pointer is very simple;
in particular, it should not have a destructor. Here’s an example to show
you what happens:
//: C13:BadVoidPointerDeletion.cpp
// Deleting void pointers can cause memory leaks
#include <iostream>
using namespace std;

class Object {
  void* data; // Some storage
  const int size;
  const char id;
public:
  Object(int sz, char c) : size(sz), id(c) {
    data = new char[size];
    cout << "Constructing object " << id
         << ", size = " << size << endl;
  }
  ~Object() {
    cout << "Destructing object " << id << endl;
    delete []data; // OK, just releases storage,
    // no destructor calls are necessary
  }
};

int main() {
  Object* a = new Object(40, 'a');
  delete a;
  void* b = new Object(40, 'b');
  delete b;
} ///:~
The class Object contains a
void* that is initialized to “raw” data (it doesn’t
point to objects that have destructors). In the Object destructor,
delete is called for this void* with no ill effects, since the
only thing we need to happen is for the storage to be released.
However, in main( ) you can
see that it’s very necessary that delete know what type of object
it’s working with. Here’s the output:
Constructing object a, size = 40
Destructing object a
Constructing object b, size = 40
Because delete a knows that
a points to an Object, the destructor is called and thus the
storage allocated for data is released. However, if you manipulate an
object through a void* as in the case of delete b, the only thing
that happens is that the storage for the Object is released – but
the destructor is not called so there is no release of the memory that
data points to. When this program compiles, you may get a warning at most
(some compilers say nothing at all), because the compiler assumes you know
what you’re doing. So you get a very quiet memory leak.
If you have a
memory leak in your program, search through all the
delete statements and check the type of pointer being deleted. If
it’s a void* then you’ve probably found one source of your
memory leak (C++ provides ample other opportunities for memory leaks,
however).
To make the Stash and Stack
containers flexible (able to hold any type of object), they will hold
void pointers. This means that when a pointer is returned from the
Stash or Stack object, you must cast it to the proper type before
using it; as seen above, you must also cast it to the proper type before
deleting it or you’ll get a memory leak.
The other memory leak issue has to do
with making sure that delete is actually called for each object pointer
held in the container. The container cannot “own” the pointer
because it holds it as a void* and thus cannot perform the proper
cleanup. The user must be responsible for cleaning up the objects. This produces
a serious problem if you add pointers to objects created on the stack and
objects created on the heap to the same container because a
delete-expression is unsafe for a pointer that hasn’t been allocated on
the heap. (And when you fetch a pointer back from the container, how will you
know where its object has been allocated?) Thus, you must be sure that objects
stored in the following versions of Stash and Stack are made only
on the heap, either through careful programming or by creating classes that can
only be built on the heap.
It’s also important to make sure
that the client programmer takes responsibility for cleaning up all the pointers
in the container. You’ve seen in previous examples how the Stack
class checks in its destructor that all the Link objects have been
popped. For a Stash of pointers, however, another approach is
needed.
This new version of the Stash
class, called PStash, holds pointers to objects that exist by
themselves on the heap, whereas the old Stash in earlier chapters copied
the objects by value into the Stash container. Using new and
delete, it’s easy and safe to hold pointers to objects that have
been created on the heap.
Here’s the header file for the
“pointer Stash”:
//: C13:PStash.h
// Holds pointers instead of objects
#ifndef PSTASH_H
#define PSTASH_H

class PStash {
  int quantity; // Number of storage spaces
  int next; // Next empty space
  // Pointer storage:
  void** storage;
  void inflate(int increase);
public:
  // Initializers listed in declaration order
  PStash() : quantity(0), next(0), storage(0) {}
  ~PStash();
  int add(void* element);
  void* operator[](int index) const; // Fetch
  // Remove the reference from this PStash:
  void* remove(int index);
  // Number of elements in Stash:
  int count() const { return next; }
};
#endif // PSTASH_H ///:~
The underlying data elements are fairly
similar, but now storage is an array of void pointers, and the
allocation of storage for that array is performed with
new instead of malloc( ). In the
expression
void** st = new void*[quantity + increase];
the type of object allocated is a
void*, so the expression allocates an array of void
pointers.
The destructor deletes the storage where
the void pointers are held rather than attempting to delete what they
point at (which, as previously noted, will release their storage and not call
the destructors because a void
pointer has no type
information).
The other change is the replacement of
the fetch( ) function with operator[
], which makes more sense syntactically. Again,
however, a void* is returned, so the user must remember what types are
stored in the container and cast the pointers when fetching them out (a problem
that will be repaired in future chapters).
Here are the member function
definitions:
//: C13:PStash.cpp {O}
// Pointer Stash definitions
#include "PStash.h"
#include "../require.h"
#include <iostream>
#include <cstring> // 'mem' functions
using namespace std;

int PStash::add(void* element) {
  const int inflateSize = 10;
  if(next >= quantity)
    inflate(inflateSize);
  storage[next++] = element;
  return(next - 1); // Index number
}

// No ownership:
PStash::~PStash() {
  for(int i = 0; i < next; i++)
    require(storage[i] == 0,
      "PStash not cleaned up");
  delete []storage;
}

// Operator overloading replacement for fetch
void* PStash::operator[](int index) const {
  require(index >= 0,
    "PStash::operator[] index negative");
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return storage[index];
}

void* PStash::remove(int index) {
  void* v = operator[](index);
  // "Remove" the pointer:
  if(v != 0) storage[index] = 0;
  return v;
}

void PStash::inflate(int increase) {
  const int psz = sizeof(void*);
  void** st = new void*[quantity + increase];
  memset(st, 0, (quantity + increase) * psz);
  memcpy(st, storage, quantity * psz);
  quantity += increase;
  delete []storage; // Old storage
  storage = st; // Point to new memory
} ///:~
The add( ) function is
effectively the same as before, except that a pointer is stored instead of a
copy of the whole object.
The inflate( ) code is
modified to handle the allocation of an array of void* instead of the
previous design, which was only working with raw bytes. Here, instead of using
the prior approach of copying by array indexing, the Standard C library function
memset( ) is first used to set all the new
memory to zero (this is not strictly necessary, since the PStash is
presumably managing all the memory correctly – but it usually
doesn’t hurt to throw in a bit of extra care). Then
memcpy( ) moves the existing data from the
old location to the new. Often, functions like memset( ) and
memcpy( ) have been optimized over time, so they may be faster than
the loops shown previously. But with a function like inflate( ) that
will probably not be called very often, you may not see a performance difference.
However, the fact that the function calls are more concise than the loops may
help prevent coding errors.
To put the responsibility of object
cleanup squarely on the shoulders of the client programmer, there are two ways
to access the pointers in the PStash: the operator[], which simply
returns the pointer but leaves it as a member of the container, and a second
member function remove( ), which returns the pointer but also
removes it from the container by assigning that position to zero. When the
destructor for PStash is called, it checks to make sure that all object
pointers have been removed; if not, you’re notified so you can prevent a
memory leak (more elegant solutions will be forthcoming in later
chapters).
Here’s the old test program for
Stash rewritten for the PStash:
//: C13:PStashTest.cpp
//{L} PStash
// Test of pointer Stash
#include "PStash.h"
#include "../require.h"
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
  PStash intStash;
  // 'new' works with built-in types, too. Note
  // the "pseudo-constructor" syntax:
  for(int i = 0; i < 25; i++)
    intStash.add(new int(i));
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash[" << j << "] = "
         << *(int*)intStash[j] << endl;
  // Clean up:
  for(int k = 0; k < intStash.count(); k++)
    delete intStash.remove(k);
  ifstream in ("PStashTest.cpp");
  assure(in, "PStashTest.cpp");
  PStash stringStash;
  string line;
  while(getline(in, line))
    stringStash.add(new string(line));
  // Print out the strings:
  for(int u = 0; stringStash[u]; u++)
    cout << "stringStash[" << u << "] = "
         << *(string*)stringStash[u] << endl;
  // Clean up:
  for(int v = 0; v < stringStash.count(); v++)
    delete (string*)stringStash.remove(v);
} ///:~
As before, Stashes are created and
filled with information, but this time the information is the pointers resulting
from new-expressions. In the first case, note the line:
intStash.add(new int(i));
The expression new int(i) uses the
pseudo-constructor form, so
storage for a new int object is created on the heap, and the int
is initialized to the value i.
During printing, the value returned by
PStash::operator[ ] must be cast to the proper type; this is repeated for
the rest of the PStash objects in the program. It’s an undesirable
effect of using void pointers
as the underlying representation
and will be fixed in later chapters.
The second test opens the source code
file and reads it one line at a time into another PStash. Each line is
read into a string using
getline( ), then a new string
is created from line to make an independent copy of that line. If we just
passed in the address of line each time, we’d get a whole bunch of
pointers pointing to line, which would only contain the last line that
was read from the file.
When fetching the pointers, you see the
expression:
*(string*)stringStash[u]
The pointer returned from operator[
] must be cast to a string* to give it the proper type. Then the
string* is dereferenced so the expression evaluates to an object, at
which point the compiler sees a string object to send to
cout.
The objects created on the heap must be taken out of the container with
remove( ) and deleted, or else you’ll get a message at runtime telling you
that you haven’t completely cleaned up the objects in the PStash. In the
case of the int pointers, the program skips the cast, because there’s
no destructor for an int and all we need is memory release (strictly
speaking, deleting through a void* is undefined behavior in standard C++, so
casting to int* first is the portable choice):
delete intStash.remove(k);
However, for the string pointers,
if you forget to do the cast you’ll have another (quiet) memory leak, so
the cast is essential:
delete (string*)stringStash.remove(v);
Some of these issues (but not all) can be
removed using templates (which you’ll learn about in Chapter
16).
In C++, you can create arrays of objects
on the stack or on the heap with equal ease, and (of course) the constructor is
called for each object in the array. There’s one constraint, however:
There must be a default
constructor, except for
aggregate initialization on the stack (see Chapter 6), because a constructor
with no arguments must be called for every object.
When creating arrays of objects on the
heap using new, there’s something else you must do. An example of
such an array is
MyType* fp = new MyType[100];
This allocates enough storage on the heap
for 100 MyType objects and calls the constructor for each one. Now,
however, you simply have a MyType*, which is exactly the same as
you’d get if you said
MyType* fp2 = new MyType;
to create a single object. Because you
wrote the code, you know that fp is actually the starting address of an
array, so it makes sense to select array elements using an expression like
fp[3]. But what happens when you destroy the array? The
statements
delete fp2; // OK
delete fp;  // Not the desired effect
look exactly the same, and you might expect their effect to be the same: the
destructor is called for the MyType object pointed to by the given address,
and then the storage is released. For fp2 this is fine, but for fp this means
that the other 99 destructor calls won’t be made. Worse, even the release of
the storage isn’t reliable. Using plain delete on memory obtained with new[]
is undefined behavior; many implementations store the element count in a
hidden area just before the first element, so plain delete can pass the wrong
address to the allocator and crash the program.
The solution requires you to give the
compiler the information that this is actually the starting address of an array.
This is accomplished with the following syntax:
delete []fp;
The empty brackets tell the compiler to
generate code that fetches the number of objects in the array, stored somewhere
when the array is created, and calls the destructor for that many array objects.
This is actually an improved syntax from the earlier form, which you may still
occasionally see in old code:
delete [100]fp;
which forced the programmer to include
the number of objects in the array and introduced the possibility that the
programmer would get it wrong. The additional overhead of letting the compiler
handle it was very low, and it was considered better to specify the number of
objects in one place instead of
two.
As an aside, the fp defined above
can be changed to point to anything, which doesn’t make sense for the
starting address of an array. It makes more sense to define it as a constant, so
any attempt to modify the pointer will be flagged as an error. To get this
effect, you might try
int const* q = new int[10];
or
const int* q = new int[10];
but in both cases the const will
bind to the int, that is, to what is being pointed to, rather than to
the pointer itself. Instead, you must say
int* const q = new int[10];
Now the array elements in q can be
modified, but any change to q (like q++) is illegal, as it is with
an ordinary array
identifier.
What happens when the operator
new cannot find a contiguous
block of storage large enough to hold the desired object? A special function
called the new-handler is called. Or rather, a
pointer to a function is checked, and if the pointer is nonzero, then the
function it points to is called.
The default behavior for the new-handler
is to throw an exception, a subject covered in
Volume 2. However, if you’re using heap allocation in your program,
it’s wise to at least replace the new-handler with one that prints a message
saying you’ve run out of memory and then aborts the program. That way, during
debugging, you’ll have a clue about what happened. For the final program
you’ll want to use more robust recovery.
You replace the new-handler by including
the <new> header and then calling set_new_handler( ) with the address of
the function you want installed:
//: C13:NewHandler.cpp
// Changing the new-handler
#include <iostream>
#include <cstdlib>
#include <new>
using namespace std;

// Named allocCount rather than count to avoid a
// possible clash with std::count under "using namespace std"
int allocCount = 0;

void out_of_memory() {
  cerr << "memory exhausted after " << allocCount
       << " allocations!" << endl;
  exit(1);
}

int main() {
  set_new_handler(out_of_memory);
  while(1) {
    allocCount++;
    new int[1000]; // Exhausts memory
  }
} ///:~
The new-handler function must take no
arguments and have a void return value. The while loop will keep
allocating int objects (and throwing away their return addresses) until
the free store is exhausted. At the very next call to new, no storage can
be allocated, so the new-handler will be called.
The behavior of the new-handler is tied
to operator new, so if you overload operator new (covered in the
next section) the new-handler will not be called by default. If you still want
the new-handler to be called you’ll have to write the code to do so inside
your overloaded operator new.
Of course, you can write more
sophisticated new-handlers, even one to try to reclaim memory (commonly known as
a garbage collector). This is not a job for the
novice
programmer.
When you create a
new-expression, two things occur. First, storage is
allocated using the operator new, then the constructor is called. In a
delete-expression, the destructor is called, then
storage is deallocated using the operator delete. The constructor and
destructor calls are never under your control (otherwise you might accidentally
subvert them), but you can change the storage allocation functions
operator new and operator delete.
The memory allocation
system used by new and
delete is designed for general-purpose use. In special situations,
however, it may not serve your needs. The most common reason to change the
allocator is efficiency: You might be creating and
destroying so many objects of a particular class that it has become a speed
bottleneck. C++ allows you to overload new and delete to implement
your own storage allocation scheme, so you can handle problems like
this.
Another issue is
heap fragmentation. By
allocating objects of different sizes it’s possible to break up the heap
so that you effectively run out of storage. That is, the storage might be
available, but because of fragmentation no piece is big enough to satisfy your
needs. By creating your own allocator for a particular class, you can ensure
this never happens.
In embedded and real-time systems, a
program may have to run for a very long time with restricted resources. Such a
system may also require that memory allocation always take the same amount of
time, and there’s no allowance for heap exhaustion or fragmentation. A
custom memory allocator is the solution; otherwise, programmers will avoid using
new and delete altogether in such cases and miss out on a valuable
C++ asset.
When you overload operator new and
operator delete, it’s important to remember that you’re
changing only the way raw storage is allocated. The compiler will simply
call your new instead of the default version to allocate storage, then
call the constructor for that storage. So, although the compiler allocates
storage and calls the constructor when it sees new, all you can
change when you overload new is the storage allocation portion.
(delete has a similar limitation.)
When you overload operator
new, you also replace the behavior when it runs out of memory, so you
must decide what to do in your operator new: return zero, write a loop to
call the new-handler and retry allocation, or (typically) throw a
bad_alloc exception (discussed in Volume 2, available at
www.BruceEckel.com).
Overloading new and delete
is like overloading any other operator. However, you have a choice of
overloading the global allocator or using a different allocator for a particular
class.
Overloading the global new and delete is the drastic approach, used when the
default global versions are unsatisfactory for the whole
system. If you overload the global versions, you make the defaults completely
inaccessible – you can’t even call them from inside your
redefinitions.
The overloaded new must take an
argument of size_t (the Standard C type for sizes). This argument is generated
and passed to you by the compiler and is the size of the object you’re
responsible for allocating. You must return a pointer to storage of that size
(or bigger, if you have some reason to do so). If you can’t find the memory,
you could return zero, in which case the constructor is not called; however,
standard C++ allows that only for an operator new declared not to throw (for
example, one marked noexcept). For the ordinary form, you should do something
more informative than returning zero, like calling the new-handler or throwing
a bad_alloc exception, to signal that there’s a problem.
The return value of operator new
is a void*, not a pointer to any particular type. All you’ve
done is produce memory, not a finished object – that doesn’t happen
until the constructor is called, an act the compiler guarantees and which is out
of your control.
The operator delete takes a
void* to memory that was allocated by operator new. It’s a
void* because operator delete only gets the pointer after
the destructor is called, which removes the object-ness from the piece of
storage. The return type is void.
Here’s a simple example showing how
to overload the global new and delete:
//: C13:GlobalOperatorNew.cpp
// Overload global new/delete
#include <cstdio>
#include <cstdlib>
#include <new> // bad_alloc
using namespace std;

void* operator new(size_t sz) {
  printf("operator new: %zu Bytes\n", sz);
  void* m = malloc(sz);
  if(!m) {
    puts("out of memory");
    throw bad_alloc(); // The ordinary operator new must not return zero
  }
  return m;
}

void operator delete(void* m) {
  puts("operator delete");
  free(m);
}

class S {
  int i[100];
public:
  S() { puts("S::S()"); }
  ~S() { puts("S::~S()"); }
};

int main() {
  puts("creating & destroying an int");
  int* p = new int(47);
  delete p;
  puts("creating & destroying an s");
  S* s = new S;
  delete s;
  puts("creating & destroying S[3]");
  S* sa = new S[3];
  delete []sa;
} ///:~
Here you can see the general form for
overloading new and delete. These use the Standard C library
functions malloc( ) and
free( ) for the allocators (which is
probably what the default new and delete use as well!). However,
they also print messages about what they are doing. Notice that
printf( ) and
puts( ) are used rather than
iostreams. This is because when an iostream
object is created (like the global cin,
cout, and cerr), it may call new to allocate memory, so an operator new
that writes to an iostream could end up calling itself. With
printf( ), you avoid that circularity because it
doesn’t call new to initialize itself.
In main( ), objects of
built-in types are created to prove that the overloaded new and
delete are also called in that case. Then a single object of type
S is created, followed by an array of S. For the array,
you’ll see from the number of bytes requested that extra memory is
allocated to store information (inside the array) about the number of objects it
holds. In all cases, the global overloaded versions of new and
delete are
used.
Although you don’t have to
explicitly say static, when you overload new and delete for
a class, you’re creating static member functions. As before, the
syntax is the same as overloading any other operator. When the compiler sees you
use new to create an object of your class, it chooses the member
operator new over the global version. However, the global versions of
new and delete are used for all other types of objects (unless
they have their own new and delete).
In the following example, a primitive
storage allocation system
is
created for the class Framis. A chunk of memory is set aside in the
static data area at program start-up, and that memory is used to allocate space
for objects of type Framis. To determine which blocks have been
allocated, a simple array of bytes is used, one byte for each
block:
//: C13:Framis.cpp
// Local overloaded new & delete
#include <cstddef> // size_t
#include <fstream>
#include <iostream>
#include <new>
using namespace std;
ofstream out("Framis.out");

class Framis {
  enum { sz = 10 };
  char c[sz]; // To take up space, not used
  static unsigned char pool[];
  static bool alloc_map[];
public:
  enum { psize = 100 };  // frami allowed
  Framis() { out << "Framis()\n"; }
  ~Framis() { out << "~Framis() ... "; }
  // Pre-C++17 code appended throw(bad_alloc) to this
  // declaration; C++17 removed that syntax.
  void* operator new(size_t);
  void operator delete(void*);
};
unsigned char Framis::pool[psize * sizeof(Framis)];
bool Framis::alloc_map[psize] = {false};

// Size is ignored -- assume a Framis object
void* Framis::operator new(size_t) {
  for(int i = 0; i < psize; i++)
    if(!alloc_map[i]) {
      out << "using block " << i << " ... ";
      alloc_map[i] = true; // Mark it used
      return pool + (i * sizeof(Framis));
    }
  out << "out of memory" << endl;
  throw bad_alloc();
}

void Framis::operator delete(void* m) {
  if(!m) return; // Check for null pointer
  // Assume it was created in the pool
  // Calculate which block number it is:
  size_t block = static_cast<unsigned char*>(m) - pool;
  block /= sizeof(Framis);
  out << "freeing block " << block << endl;
  // Mark it free:
  alloc_map[block] = false;
}

int main() {
  Framis* f[Framis::psize];
  try {
    for(int i = 0; i < Framis::psize; i++)
      f[i] = new Framis;
    new Framis; // Out of memory
  } catch(bad_alloc) {
    cerr << "Out of memory!" << endl;
  }
  delete f[10];
  f[10] = 0;
  // Use released memory:
  Framis* x = new Framis;
  delete x;
  for(int j = 0; j < Framis::psize; j++)
    delete f[j]; // Delete f[10] OK
} ///:~
The pool of memory for the Framis
heap is created by allocating an array of bytes large enough to hold
psize Framis objects. The allocation map is psize elements
long, so there’s one bool for every block. All the values in the
allocation map are initialized to false using the aggregate
initialization trick of setting the first element so the compiler automatically
initializes all the rest to their normal default value (which is false,
in the case of bool).
The local operator new has the
same syntax as the global one. All it does is search through the allocation map
for a false value, set that location to true to indicate it’s been allocated,
and return the address of the corresponding
memory block. If it can’t find any memory, it issues a message to the
trace file and throws a bad_alloc exception.
This is the first example of
exceptions that you’ve seen in this book. Since
detailed discussion of exceptions is delayed until Volume 2, this is a very
simple use of them. In operator new there are two artifacts of exception
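handling. First, the declaration can carry the exception specification
throw(bad_alloc) after its argument list, which
tells the compiler and the reader that this function may throw an exception of
type bad_alloc. (Such dynamic exception specifications were deprecated in C++11
and removed in C++17, so compilers reject them in C++17 mode; a function
without one may throw any exception.) Second, if there’s no more
memory, the function actually does throw the exception in the statement throw
bad_alloc( ). When an exception is thrown, the function stops executing and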
control is passed to an exception handler, which is expressed as a
catch clause.
In main( ), you see the other
part of the picture, which is the try-catch clause. The
try block is surrounded
by braces and contains all the code that may throw exceptions – in this
case, any call to new that involves Framis objects. Immediately
following the try block are one or more
catch clauses, each one
specifying the type of exception that it catches. In this case,
catch(bad_alloc) says that bad_alloc exceptions will be
caught here. This particular catch clause is only executed when a
bad_alloc exception is thrown, and execution continues after the end of
the last catch clause in the group (there’s only one here, but
there could be more).
The operator delete assumes the
Framis address was created in the pool. This is a fair assumption,
because the local operator new will be called whenever you create a
single Framis object on the heap – but not an array of them: global
new is used for arrays. So if the user allocates an array of Framis and then
accidentally destroys it without the empty brackets that indicate array
destruction, the local operator delete would be handed memory that never came
from the pool. Also, the user might be deleting a
pointer to an object created on the stack. If you think these things could
occur, you might want to add a line to make sure the address is within the pool
and on a correct boundary (you may also begin to see the potential of
overloaded new and delete for finding
memory leaks).
operator delete calculates the
block in the pool that this pointer represents, and then sets the allocation
map’s flag for that block to false to indicate the block has been
released.
In main( ), enough
Framis objects are dynamically allocated to run out of memory; this
checks the out-of-memory behavior. Then one of the objects is freed, and another
one is created to show that the released memory is reused.
Because this allocation scheme is
specific to Framis objects, it’s probably much faster than the
general-purpose memory allocation scheme used for the default new and
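delete. However, you should note that it doesn’t automatically work
if inheritance is used (inheritance is covered in Chapter
14): a derived class inherits this operator new, but its objects may be larger
than sizeof(Framis) and so won’t fit in a block.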
If you overload operator new and
delete for a class, those operators are called whenever you create an
object of that class. However, if you create an array of those class
objects, the global operator new is called to allocate enough
storage for the array all at once, and the global operator delete
is called to release that storage. You can control the allocation of arrays of
objects by overloading the special array versions of operator new[ ] and
operator delete[ ] for the class. Here’s an example that shows when
the two different versions are called:
//: C13:ArrayOperatorNew.cpp
// Operator new for arrays
#include <new> // size_t definition
#include <fstream>
using namespace std;
ofstream trace("ArrayOperatorNew.out");

class Widget {
  enum { sz = 10 };
  int i[sz];
public:
  Widget() { trace << "*"; }
  ~Widget() { trace << "~"; }
  void* operator new(size_t sz) {
    trace << "Widget::new: "
          << sz << " bytes" << endl;
    return ::new char[sz];
  }
  void operator delete(void* p) {
    trace << "Widget::delete" << endl;
    // The memory came from new char[], so release it as a
    // char array (delete on a void* is not valid C++):
    ::delete []static_cast<char*>(p);
  }
  void* operator new[](size_t sz) {
    trace << "Widget::new[]: "
          << sz << " bytes" << endl;
    return ::new char[sz];
  }
  void operator delete[](void* p) {
    trace << "Widget::delete[]" << endl;
    ::delete []static_cast<char*>(p);
  }
};

int main() {
  trace << "new Widget" << endl;
  Widget* w = new Widget;
  trace << "\ndelete Widget" << endl;
  delete w;
  trace << "\nnew Widget[25]" << endl;
  Widget* wa = new Widget[25];
  trace << "\ndelete []Widget" << endl;
  delete []wa;
} ///:~
Here, the global versions of new
and delete are called so the effect is the same as having no overloaded
versions of new and delete except that trace information is added.
Of course, you can use any memory allocation scheme you want in the overloaded
new and delete.
You can see that the syntax of array
new and delete is the same as for the individual object versions
except for the addition of the brackets. In both cases you’re handed the
size of the memory you must allocate. The size handed to the array version will
be the size of the entire array. It’s worth keeping in mind that the
only thing the overloaded operator new is required to do is hand
back a pointer to a large enough memory block. Although you may perform
initialization on that memory, normally that’s the job of the constructor
that will automatically be called for your memory by the
compiler.
The constructor and destructor simply
print out characters so you can see when they’ve been called. Here’s
what the trace file looks like for one compiler:
new Widget
Widget::new: 40 bytes
*
delete Widget
~Widget::delete

new Widget[25]
Widget::new[]: 1004 bytes
*************************
delete []Widget
~~~~~~~~~~~~~~~~~~~~~~~~~Widget::delete[]
Creating an individual object requires 40
bytes, as you might expect. (This machine uses four bytes for an int.)
The operator new is called, then the constructor (indicated by the
*). In a complementary fashion, calling delete causes the
destructor to be called, then the operator delete.
When an array of Widget objects is
created, the array version of operator new is used, as promised. But
notice that the size requested is four more bytes than expected. These extra four
bytes are where this compiler keeps information about the array, in particular, the
number of objects in the array (the amount and layout of this overhead are
implementation-specific). That way, when you say
delete []wa;
the brackets tell the compiler it’s
an array of objects, so the compiler generates code to look for the number of
objects in the array and to call the destructor that many times. You can see
that, even though the array operator new and operator delete are
only called once for the entire array chunk, the default constructor and
destructor are called for each object in the
array.
Considering that
MyType* f = new MyType;
calls new to allocate a
MyType-sized piece of storage, then invokes the MyType constructor
on that storage, what happens if the storage allocation in new fails? The
constructor is not called in
that case, so although you still don’t have a successfully created object, at least
you haven’t invoked the constructor and handed it a zero this
pointer. Here’s an example to prove it:
//: C13:NoMemory.cpp
// Constructor isn't called if new fails
#include <iostream>
#include <new> // bad_alloc definition
using namespace std;

class NoMemory {
public:
  NoMemory() {
    cout << "NoMemory::NoMemory()" << endl;
  }
  // Older code added throw(bad_alloc) after the argument
  // list; C++17 removed dynamic exception specifications.
  void* operator new(size_t) {
    cout << "NoMemory::operator new" << endl;
    throw bad_alloc(); // "Out of memory"
  }
};

int main() {
  NoMemory* nm = 0;
  try {
    nm = new NoMemory;
  } catch(bad_alloc) {
    cerr << "Out of memory exception" << endl;
  }
  cout << "nm = " << nm << endl;
} ///:~
When the program runs, it does not print
the constructor message, only the message from operator new and the
message in the exception handler. Because new never returns, the
constructor is never called so its message is not printed.
It’s important that nm be
initialized to zero because the new expression never completes, and the
pointer should be zero to make sure you don’t misuse it. However, you
should actually do more in the exception handler than just print out a message
and continue on as if the object had been successfully created. Ideally, you
will do something that will cause the program to recover from the problem, or at
the least exit after logging an error.
In earlier versions of C++ it was
standard practice to return zero from new if storage allocation failed.
That would prevent construction from occurring. However, Standard C++ requires an
ordinary operator new to report failure by throwing bad_alloc instead, and
some compilers will warn you if you try to return zero from one.
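Two less common reasons to overload operator new are to place an object at a
specific location in memory and to choose among different allocators when
calling new. Both of these
situations are solved with the same mechanism: The overloaded operator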
new can take more than one argument. As you’ve seen before, the first
argument is always the size of the object, which is secretly calculated and
passed by the compiler. But the other arguments can be anything you want –
the address you want the object placed at, a reference to a memory allocation
function or object, or anything else that is convenient for
you.
The way that you pass the extra arguments
to operator new during a call may seem slightly curious at first. You put
the argument list (without the size_t argument, which is handled
by the compiler) after the keyword new and before the class name of the
object you’re creating. For example,
X* xp = new(a) X;
will pass a as the second argument
to operator new. Of course, this can work only if such an operator
new has been declared.
Here’s an example showing how you
can place an object at a particular location:
//: C13:PlacementOperatorNew.cpp
// Placement with operator new
#include <cstddef> // size_t
#include <iostream>
using namespace std;

class X {
  int i;
public:
  X(int ii = 0) : i(ii) {
    cout << "this = " << this << endl;
  }
  ~X() {
    cout << "X::~X(): " << this << endl;
  }
  void* operator new(size_t, void* loc) {
    return loc;
  }
};

int main() {
  int l[10];
  cout << "l = " << l << endl;
  X* xp = new(l) X(47); // X at location l
  xp->X::~X(); // Explicit destructor call
  // ONLY use with placement!
} ///:~
Notice that operator new only
returns the pointer that’s passed to it. Thus, the caller decides where
the object is going to sit, and the constructor is called for that memory as
part of the new-expression.
Although this example shows only one
additional argument, there’s nothing to prevent you from adding more if
you need them for other purposes.
A dilemma occurs when you want to destroy
the object. A delete-expression can’t pass extra arguments to operator delete, so
there’s no way to say, “Use my special deallocator for this
object.” You want to call the destructor, but you don’t want the
memory to be released by the dynamic memory mechanism because it wasn’t
allocated on the heap.
The answer is a very special syntax. You
can explicitly call the destructor, as in
xp->X::~X(); // Explicit destructor call
A stern warning
is in order here. Some people see this as a way to destroy objects at some time
before the end of the scope, rather than either adjusting the scope or (more
correctly) using dynamic object creation if they want the object’s
lifetime to be determined at runtime. You will have serious problems if you call
the destructor this way for an ordinary object created on the stack because the
destructor will be called again at the end of the scope. If you call the
destructor this way for an object that was created on the heap, the destructor
will execute, but the memory won’t be released, which probably isn’t
what you want. The only reason that the destructor can be called explicitly this
way is to support the placement syntax for operator new.
There’s also a placement
operator delete that is only called if a constructor for a placement
new expression throws an exception (so that the memory is automatically
cleaned up during the exception). The placement operator delete has an
argument list that corresponds to the placement operator new that is
called before the constructor throws the exception. This topic will be explored
in the exception handling chapter in Volume
2.
It’s convenient and optimally
efficient to create automatic objects on the stack, but to solve the general
programming problem you must be able to create and destroy objects at any time
during a program’s execution, particularly to respond to information from
outside the program. Although C’s dynamic memory allocation will get
storage from the heap, it doesn’t provide the ease of use and guaranteed
construction necessary in C++. By bringing dynamic object creation into the core
of the language with new and delete, you can create objects on the
heap as easily as making them on the stack. In addition, you get a great deal of
flexibility. You can change the behavior of new and delete if they
don’t suit your needs, particularly if they aren’t efficient enough.
Also, you can modify what happens when the heap runs out of
storage.
Solutions to selected exercises
can be found in the electronic document The Thinking in C++ Annotated
Solution Guide, available for a small fee from
www.BruceEckel.com.
[50]
There is a special syntax called placement new that allows you to call a
constructor for a pre-allocated piece of memory. This is introduced later in the
chapter.