This section will give a *brief* introduction to troubleshooting the cause
of crashes on your mud, with much of the emphasis on gdb. It does not
pretend to be a complete tutorial. For that, read the gdb documentation,
or for another quick-start guide, see the one on Whiplash's page, address
in section 8.4 of this document.
There are many reasons your mud might crash, and if you plan to run your
mud seriously, you really need to learn how to find the cause of the
problem and then fix it. Common sense is a very powerful weapon in
this hunt. If, for example, your mud was running fine and then started
crashing often immediately after you added some new code, this new code
should be your first suspect. Even if the
code looks right and compiles error free, it may be overrunning an array,
trying to dereference an uninitialised pointer, or all manner of fun
stuff. It may also be interacting in unexpected ways with existing code.
This last possibility is one you won't be able to spot immediately; the
only way to evaluate it is familiarity with the general ROM code, which can
only come with time.
Crashes can also be caused by area files when the mud boots, due to
unterminated strings, wrongly placed values, or attempts to reset a
non-existent object, among other things. Generally this type of problem is
easier to find, as most often the mud detects the
problem, writes a message to the log file and then quits.
The importance of frequently monitoring your log files cannot be
overstated. If your mud crashes, the first thing you should do is look at
the end of the most recent file, or, if the mud has restarted, the
penultimate file. Remember that the log files are numbered sequentially
from 1000.log upwards. A very useful tool here is the unix 'tail'
command, which displays the last few lines of a file (ten by default). So, if after a crash you
turn to your log directory to see the files:
1000.log 1001.log 1002.log 1003.log
and you know the mud has restarted, typing
tail 1002.log
will show you the last things the mud logged before it crashed.
Look at the end of the most recent file too, just to make sure you are
looking at the correct file.
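If you find yourself doing this often, the "penultimate file" rule can be
wrapped in a small shell function. This is only a sketch: the function name
previous_log is made up for illustration, and it assumes your logs follow
the NNNN.log naming described above.

```shell
# Print the path of the log written by the previous run of the mud, i.e.
# the second-highest numbered NNNN.log file in the given directory.
# If only one log exists, that one is printed instead.
previous_log() {
    logdir=${1:-.}
    # Keep only names made of digits followed by .log, sort them
    # numerically, then take the second line from the end.
    name=$(ls "$logdir" | grep -E '^[0-9]+\.log$' | sort -n | tail -n 2 | head -n 1)
    # Fail (non-zero status) if no log file was found at all.
    [ -n "$name" ] && printf '%s/%s\n' "$logdir" "$name"
}
```

You could then type something like: tail "$(previous_log ../log)", where
../log is an assumption about where your log directory lives.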
There are a number of errors that cause the mud to terminate, and if one
of these has happened you should find the bug message in the log, which
gives a good clue as to how to fix the problem. The next section describes
some commonly seen log messages, what they mean and what to do about them.
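In a long log file the bug messages can be buried among ordinary entries.
The sketch below pulls them out with their line numbers. The function name
bug_lines is hypothetical, and the search pattern BUG is an assumption about
how your mud marks its bug messages; check one of your own logs and adjust it.

```shell
# Print, with line numbers, the lines of a log file that look like bug
# reports. The pattern 'BUG' is an assumption: change it to whatever text
# your mud actually writes in front of its bug messages.
bug_lines() {
    grep -n 'BUG' "$1"
}
```

Combined with tail, for example bug_lines 1002.log | tail, this shows the
last few bug messages logged before a crash.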
This error will be seen at boot-time, and will cause the mud to abort and
try again. The problem is that you now have too many areas for the mud to
hold in the space it has allocated for them. To fix this, you need to change
the value of MAX_STRING, found in db.c, to something bigger than it
currently is. Increasing it in steps of 500,000 or so is a
common choice.
If this is the last entry in a logfile and the mud crashed after this,
this is a good sign that the playerfile in question has been corrupted.
You should look at the file with a text editor, but most often you will
need to delete the pfile and restore from a backup. You do make regular
player file backups, don't you? :)
Again found only at boot-time, this error is indicative of errors in an
area file. You should look carefully at the file, both at the line given
by the bug message and at the preceding chunk. Since the mud tries
to interpret the file as it reads it, an incorrect value can mean that the
mud staggers on for a while, wrongly interpreting some values until
finally it gets to something it can't meaningfully resolve. Compare the
file with another 'good' file, and try to learn
generally how each section should look.
This boot-time error says that a vnum was given for a room, mob or object
that doesn't exist. This is most often found in the resets section of an
area file. Edit the file, and if the error is obvious, e.g. 1001 instead of
101, change it. If you can't work out what value should be there, you
may have to delete the reset. Note that you need to be careful here:
deleting a mob reset 'M' means you also have to delete all the 'E' and 'G'
resets following it, and likewise deleting an 'O' reset means any 'P'
resets following it will need to be removed as well. It is therefore very
much in your best interests to try to resolve the problem before you start
butchering the file with your text editor.
This error can arise when you have inadvertently run your startup script
twice. The first should boot the mud properly, but the second will fail
with this message, and will continuously try to restart the mud, quickly
resulting in lots and lots of log files. This problem gets particularly
ugly if the mud is failing to boot, as you then get both startups in an
endless cycle of failing to boot, either due to some other error or because
the other mud is trying to start up. To save your sanity, you need to kill
at least one of the startups, and then try diagnosing any other problem
from there.
On a unix system, type
ps -ux (or ps -u YOURUSERNAME on SunOS)
to get a list of your running processes. Kill the startup that has the
pid furthest from that of your rom process, if you can discern this. To
be safe, you may prefer to kill both, shut down from within the mud,
and then restart a single startup.
killall -9 startup
should achieve the desired effect here. But don't try that on Digital
Unix... it interprets 'killall' quite literally.
Another possibility here is that the port you are trying to run the mud on
is being used by a system service or another user. If this seems to be
the case, consult your system administrator.
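One way to stop the double-startup problem described above from happening
at all is to have the startup script refuse to run while another copy holds
a lock. The following is a minimal sketch, not part of the stock
distribution: the function names and the lock path are hypothetical, and it
assumes a Bourne-style shell, so if your startup script is written for a
different shell the idea will need translating.

```shell
# Try to take the lock; this succeeds only for the first caller.
# mkdir is used because creating a directory either succeeds or fails in
# one step, so two startups cannot both believe they got the lock.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

# Give the lock back so that a later startup can run.
release_lock() {
    rmdir "$1" 2>/dev/null
}

# Intended use near the top of a startup script (the path is hypothetical):
#   acquire_lock ../startup.lock || { echo "startup already running" >&2; exit 1; }
#   trap 'release_lock ../startup.lock' EXIT
```

Note that if the startup is killed with -9 the lock directory is left
behind, and has to be removed by hand before the next start.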
This error is logged when the mud sees a mob being reset with a piece of
equipment that is significantly higher in level than that of the mob.
This won't stop the mud from booting, but it is perhaps worth looking
closely at the mobs and objects in question, as potentially powerful
objects are being made easy to get by popping on weak mobs.
This error can be flagged during boot for a mob, room or obj that is found
to have a vnum that has already been assigned to another entity of that type.
You can have a room with vnum 1300, an obj with vnum 1300 and a mob also with
vnum 1300 with no problems, but two mobs with 1300 as their vnum is a
problem. The two most frequent causes of this error are as follows.
Firstly, you may simply have two areas with clashing vnums. This can
happen if you download a bunch of areas from online sites written by
different people, as there is nothing stopping them from using the same
vnums for some of their areas. To cure this problem, either remove one of
the areas or renumber it, a laborious task if you don't have some way of
automating it.
The second most common cause is accidentally including the same area twice
in your area.lst file. This can happen if you rename an area, for example.
The solution, obviously, is simply to remove the duplicate from area.lst.
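Checking area.lst for a repeated entry is easy to do by eye when the list
is short, but a one-line shell function does it reliably. This is a sketch
only: dup_areas is a made-up name, and it assumes area.lst holds one area
file name per line.

```shell
# Print each line that appears more than once in the given file.
# Blank lines are dropped first so they are not reported as duplicates;
# sort brings identical lines together and uniq -d prints one copy of
# every line that is repeated.
dup_areas() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -d
}
```

Running dup_areas area.lst from the area directory prints nothing if every
entry is listed only once.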
Many thanks to Erwin Andreasen for his contribution to the GDB related
questions.
GDB is the GNU debugger, a tool you can use to help track down code
problems. It is very powerful with many options, but is initially
daunting. Time spent getting familiar with GDB may well be the best-spent
time for a mud admin. It is usually installed on any system running
GCC, but if not it can be obtained from: http://www.gnu.org
Note that the gdb shipped with Slackware 96 (or possibly 95) was broken. If
you run this OS and gdb doesn't work, you need to get a new one. The error
that indicates this is akin to 'cannot fetch registers.. wrong format'.
In the immortal words: 'GDB is your friend.' It can be useful in very
many ways: to determine why the mud crashed, to step through code as it
runs, to track variable values, and basically to do everything any other
fully featured debugger can do. If your mud is crashing all the time, your
options are either to spend many hours looking at any code you suspect may
be wrong, or to fire up gdb and quite possibly see straight away where the
problem is. It won't do this all the time, but it is essential for those
occasions when you are stumped. That said, the best way to avoid
bugs is to write your code more carefully, so aim for a balance:
use gdb when necessary, but don't rely on it to clean up your
mess.
GDB can be used on the core file left behind by a crash. Using GDB this
way is probably the most successful of its uses for many people. If you
have a core file in your area directory, switch to that directory and type:
gdb -c core ../src/rom
GDB will start, show some copyright information, then show where the mud
was in the code when it crashed.
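If you just want the stack trace from a core file without an interactive
session, gdb can be run in batch mode. The sketch below wraps that in a
function; core_backtrace is a hypothetical name, the default paths are the
ones used above, and -batch and -ex are standard gdb command-line options.

```shell
# Print a stack trace from a core file and exit, without an interactive
# gdb session. Both arguments are optional and default to the paths used
# in the text above.
core_backtrace() {
    core=${1:-core}
    binary=${2:-../src/rom}
    # Give a clear message rather than a gdb error if there is no core.
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    # -batch makes gdb exit when done; -ex bt runs the 'bt' command.
    gdb -batch -ex bt -c "$core" "$binary"
}
```

Redirecting the output to a file, for example core_backtrace > crash.txt,
is a handy way of keeping a record of each crash.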
To use GDB on a mud that is already running, go to the area directory and
type:
gdb ../src/rom <pid>
where <pid> is the process id of the running mud.
.... the program will then run until it hits a breakpoint, it receives a
signal such as a segmentation fault, or you press control-C.
Once you are in the debugger again, the MUD is still alive: gdb has
just suspended the process and is waiting for your input. You can type
'cont' to continue, or you can examine variables.
You can also step through the code line by line: the 's' (step)
command will execute one line, entering the function if there is a call.
Using 'n' (next) will also execute one line, but if there is a function
call, the whole call is run without stepping into it.
If you are in a function, but really are interested in what happens when
you return from it, you can use "finish" to finish executing this function
and return to the calling function.
Switch to your area directory, and type
gdb ../src/rom
Then type
set args <port>
where <port> is the port number the mud runs on. Then:
run
Alternatively, typing
run <port>
has the same effect as both of these steps.
This starts the mud from within the debugger.
Note that ctrl-z will not suspend gdb in this state, rather it will pause
the mud.
A breakpoint is a marker you can put on a function, or a specific line of
code. When the mud arrives at that line or starts to execute that
function, it halts and gdb gives you a prompt, allowing you to examine
variable values, step through the code and so forth. This can be very
useful to follow through code that you want to check carefully.
To add a breakpoint to a function, type:
break <fn_name>
for example
break do_say
This will place a breakpoint at the beginning of the do_say function, and
halt the mud when it reaches that part of the code. Multiple breakpoints
can be in place at once.
Type
info break
to list all breakpoints with their numbers, and use the delete command to
remove them, for example
delete 1
will delete breakpoint number 1.
The initial display gdb gives you should be the last few entries on the
stack, which will each be at the place in the code where the next stack
frame was called. For example, if you type "vnum mob fido" in the mud,
this will call the do_vnum function, which in turn calls do_mfind, which
in turn will call get_mob_index. If the mud crashes within get_mob_index,
then your gdb display will have the line where the error occurred in this
function at the top of the stack display, the next line will be within
do_mfind and the one below that will be in do_vnum.
It is possible to look at each of these stack 'frames' in turn, examining
variable values and looking at what exactly the code was doing at that
point.
Typing 'bt' at any time will show the stack display. You can move up
and down a frame at a time using the 'up' and 'down' commands, or select a
particular frame by number with the 'frame' command. The 'where' and
'list' commands are also extremely useful.
Sometimes you can get an unreadable stack trace. This is usually caused
by a buffer overflow that has overwritten part of the stack.
If you notice that the addresses of the functions seem to have each byte
within 0x20 - 0x80, you might be able to track down where this happens.
E.g. if you have gdb claiming that the top function has address
0x666f6f74
you can print out each byte as a character:
(gdb) p /c 0x66
and find that the address seems to contain the letters "foot" - perhaps you
just added some foot-sweat code that is overflowing a buffer.
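Printing the bytes one at a time gets tedious, so here is a sketch of a
shell function that decodes a whole address in one go. The name
addr_to_ascii is made up for illustration, and it only makes sense when
every byte is a printable character, as in the case described above.

```shell
# Print the bytes of a hex address as ASCII characters, most significant
# byte first. Example: addr_to_ascii 0x666f6f74
addr_to_ascii() {
    hex=${1#0x}                       # drop a leading 0x if present
    # Pad to a whole number of bytes so the loop below always ends.
    case $(( ${#hex} % 2 )) in 1) hex=0$hex ;; esac
    out=
    while [ -n "$hex" ]; do
        rest=${hex#??}                # everything after the first two digits
        byte=${hex%"$rest"}           # the first two hex digits
        hex=$rest
        # Turn the byte into octal, then let printf expand \ooo into the
        # character with that code.
        out=$out$(printf "\\$(printf '%03o' "0x$byte")")
    done
    printf '%s\n' "$out"
}
```

On little-endian machines such as x86 the bytes sit in memory in the
opposite order to the way the address is written, so if the result looks
like a word spelt backwards, read it in reverse.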
When in a stack frame, you can see what local variables are in scope by
typing 'info locals'. This will show each variable with its value or, for
a pointer, the memory address it holds. You can use the 'print' command to
examine these in more detail. For example, to see the name of the CHAR_DATA
pointed to by ch, you would do:
print ch->name
or, for short,
p ch->name
You can dereference pointers in this way just as you would do in code, and
thus can look at all items in a list among other things.
Once you have printed a variable, you can then refer to it with the $
symbol, which can be very useful for traversing a linked list among other
things. Example:
p *list_head
p *$->next
p *$->next
This is obviously much easier than:
p *list_head->next->next
Note that using '*' dereferences a pointer, thus while
p ch
will show the address held in the pointer,
p *ch
will show the contents of the structure it points to.
Note that you can recall and edit previous commands using the cursor keys.