This section will give a *brief* introduction to troubleshooting the cause of crashes on your mud, with much of the emphasis on gdb. It does not pretend to be a complete tutorial. For that, read the gdb documentation, or for another quick-start guide, see the one on Whiplash's page (the address is in section 8.4 of this document).

5.1 My mud crashed..why?

There are many reasons your mud might crash, and if you plan to run your mud seriously, you really need to learn how to find the cause of a problem and then address it. Common sense is a very powerful weapon in this hunt. If, for example, your mud was running fine and then started crashing often immediately after you added some new code, this new code should be your first suspect. Even if the code looks right and compiles error free, it may be overrunning an array, trying to dereference a non-initialised pointer, or all manner of fun stuff. It may also be interacting in unexpected ways with existing code. This last kind of problem is one you won't be able to discern immediately; the only way to evaluate it is familiarity with the general ROM code, which can only come with time.

Crashes can also be caused by area files when the mud boots, due to unterminated strings, wrongly placed values or attempts to reset a non-existent object, amongst others. Generally this type of problem is easier to find, as the mud most often detects the problem, writes a message to the log file and then quits running.

5.2 Working with log files.

The importance of frequently monitoring your log files cannot be overstated. If your mud crashes, the first thing you should do is look at the end of the most recent file, or if the mud has restarted, the penultimate file. Remember that the log files are numbered sequentially from 1000.log upwards. A very useful tool here is the unix 'tail' command, which displays the last few lines of a file (ten by default). So, if after a crash you turn to your log directory to see the files:
1000.log 1001.log 1002.log 1003.log
and you know the mud has restarted, typing
tail 1002.log
will show you the last things the mud logged before it crashed. Look at the end of the most recent file too, just to make sure you are looking at the correct file.
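Picking the penultimate file by hand gets tedious when there are many logs. A minimal sketch, assuming the logs follow the NNNN.log naming described above; previous_log is a hypothetical helper name, not something shipped with ROM:

```shell
# previous_log DIR: print the name of the second-newest numbered log in DIR.
# Assumes logs are named 1000.log, 1001.log, ... as described above.
# If only one log exists, that one is printed.
previous_log() {
    dir=${1:-.}
    ls "$dir" | grep -E '^[0-9]+\.log$' | sort -n | tail -n 2 | head -n 1
}
```

From the log directory, something like tail -n 40 "$(previous_log .)" would then show the last 40 lines the mud logged before the restart.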

There are a number of errors that cause the mud to terminate, and if one of these has happened you should find the bug message here, along with a good clue as to how to fix the problem. The next section describes some commonly seen log messages, what they mean and what to do about them.

5.3 Common log file messages


5.3.1 'File *.are line *: MAX_STRING * exceeded'


This error will be seen at boot-time, and will cause the mud to abort and try again. The problem is that you now have too many areas for the mud to hold in the space it has allocated for them. To fix this, you need to change the value of MAX_STRING, found in db.c, to something bigger than it currently is. Increasing it in steps of 500,000 or so seems to be a common choice.

5.3.2 'Loading playername'


If this is the last entry in a logfile and the mud crashed after it, this is a good sign that the playerfile in question has been corrupted. You should look at the file with a text editor, but most often you will need to delete the pfile and restore it from a backup. You do make regular player file backups, don't you? :)
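If you have no backup routine yet, a dated copy of the player directory is enough to restore a single pfile from. A minimal sketch; backup_players is a hypothetical helper, and the directory names in the usage note are assumptions about your layout:

```shell
# backup_players SRC DEST_ROOT: copy the player directory SRC into a new
# timestamped directory under DEST_ROOT and print the path of the copy.
backup_players() {
    src=$1
    dest_root=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest_root" || return 1
    cp -R "$src" "$dest_root/player-$stamp" || return 1
    printf '%s\n' "$dest_root/player-$stamp"
}
```

For example, backup_players ../player ../backup run from the area directory, assuming your pfiles live in ../player.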

5.3.3 'fread_*: bad format'


Again found only at boot-time, this error is indicative of errors in an area file. You should look carefully at the file, both at the line given by the bug message and at the preceding chunk. Since the mud tries to interpret the file as it reads it, an incorrect value can mean that the mud staggers on for a while, wrongly interpreting some values, until finally it gets to something it can't meaningfully resolve. Compare the file with another 'good' file, and try to learn generally how each section should look.

5.3.4 '* : no such *index'


This boot-time error says that a vnum was given for a room, mob or object that doesn't exist. This is most often found in the resets section of an area file. Edit the file, and if the error is obvious, e.g. 1001 instead of 101, change it. If you can't work out what value should be there, you may have to delete the reset. Note that you need to be careful here: deleting a mob reset 'M' means you also have to delete all the 'E' and 'G' resets following it, and likewise deleting an 'O' reset means any 'P' resets following it will need to be removed also. It is therefore very much in your best interests to resolve the problem before going butchering with your text editor.
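Before editing anything, it helps to see every place the offending vnum appears. A minimal sketch, assuming your area files end in .are and sit in one directory; find_vnum is a hypothetical helper name:

```shell
# find_vnum VNUM [DIR]: list every line in DIR/*.are that contains VNUM as a
# whole word, prefixed with the file name and line number.
# /dev/null is added so grep prints the file name even for a single file.
find_vnum() {
    vnum=$1
    dir=${2:-.}
    grep -n -w -- "$vnum" "$dir"/*.are /dev/null
}
```

Because the match is on whole words, searching for 100 will not report lines that only contain 1001. A definition line such as #1001 is reported, which shows at a glance whether the vnum is defined anywhere at all.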

5.3.5 'Init socket: bind: Address already in use'


This error can arise when you have inadvertently run your startup script twice. The first should boot the mud properly, but the second will fail with this message and will continuously try to restart the mud, quickly resulting in lots and lots of log files. This problem gets particularly ugly if the mud is failing to boot, as you then get both startups in an endless cycle of failing to boot, either due to some other error or because of the other copy trying to start up. To save your sanity, you need to kill at least one of the startups, and then try diagnosing any other problem from there.
On a unix system, type
ps -ux (or ps -u YOURUSERNAME on SunOS)
to get a list of your running processes. Kill the startup that has the pid furthest from that of your rom process, if you can discern this. For clarity you may be safer killing both, then shutting down from within the mud, then restarting a single startup.
killall -9 startup
should achieve the suitable effect here. But don't try that on Digital Unix... it interprets 'killall' quite literally.

Another possibility here is that the port you are trying to run the mud on is being used by a system service or another user. If this seems to be the case, consult your system administrator.
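One way to avoid the double-startup problem in the first place is to refuse to launch a second copy. A minimal sketch, assuming you start the mud through an sh-compatible wrapper; run_once and the lock directory name are hypothetical, and this is not part of the stock startup script:

```shell
# run_once LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so a second copy started while the first is still
# running stops here and returns 1.
run_once() {
    lock=$1
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "already running (lock $lock exists)" >&2
        return 1
    fi
    "$@"
    rc=$?
    rmdir "$lock"
    return $rc
}
```

Usage might look like run_once ../startup.lock ./startup. Note that if the wrapper itself is killed with -9, the lock directory is left behind and has to be removed by hand before the next start.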

5.3.6 'ER <mob> <object>'


This error is logged when the mud sees a mob being reset with a piece of equipment that is significantly higher in level than the mob. This won't stop the mud from booting, but it is perhaps worth looking closely at the mobs and objects in question, as potentially powerful objects are being made easy to get by popping on weak mobs.

5.3.7 'Vnum x duplicated'


This error can be flagged during boot for a mob, room or obj that is found to have a vnum that has already been assigned to another entity of that type. You can have a room with vnum 1300, an obj with vnum 1300 and a mob also with vnum 1300 with no problems, but two mobs with 1300 as their vnum is a problem. There are two most frequent causes for this error.

Firstly, you may simply have two areas with clashing vnums. This can happen if you download a bunch of areas written by different people from online sites, as there is nothing stopping them from using the same vnums for some of their areas. To cure this problem, either remove one of the areas or renumber it, a laborious task if you don't have some way of automating it.

The second most common cause of this error is accidentally including the same area twice in your area.lst file. This can happen if you rename an area, for example. The solution is simply to remove the duplicate from area.lst.
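A duplicated entry is easy to miss by eye in a long area.lst. A minimal sketch, assuming one area file name per line and a closing line that begins with $; dup_areas is a hypothetical helper name:

```shell
# dup_areas FILE: print every area name that appears more than once in FILE.
# The terminating "$" line is filtered out before comparing.
dup_areas() {
    grep -v '^\$' "$1" | sort | uniq -d
}
```

Running dup_areas area.lst prints nothing when every entry is unique.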

5.4 What is GDB?

Many thanks to Erwin Andreasen for his contribution to the GDB related questions.

GDB is the GNU debugger, a tool you can use to help determine code problems. It is very powerful with many options, but is initially daunting. Time spent getting familiar with GDB may well be the most well spent time for a mud admin. It is usually installed on any system running GCC, but if not can be obtained from: http://www.gnu.org
Note that the gdb shipped with Slackware 96 (or possibly 95) was broken. If you run this OS and gdb doesn't work, you need to get a newer one. The error that indicates this is akin to 'cannot fetch registers.. wrong format'.

5.5 Do I need GDB?


In the immortal words: 'GDB is your friend.' It can be useful in very many ways: to determine why the mud crashed, to step through code as it runs, to track variable values, and basically to do all that any other fully featured debugger can do. If your mud is crashing all the time, your options are either to spend many hours looking at any code you suspect may be wrong, or to fire up gdb and quite possibly see straight away where the problem is. It won't do this all the time, but it is essential for those occasions when you are stumped. Conversely though, the best way to avoid bugs is to write your code more carefully, so aim for a compromise: use gdb when necessary, but don't rely on it to clean up your mess.

5.6 Can I use GDB on a corefile? How?


Most definitely. Using GDB this way is probably the most successful of its uses for many people. If you have a core file in your area directory, switch to that directory and type:
gdb -c core ../src/rom
GDB will start, show some copyright information, then show where the mud was in the code when it crashed.
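If you only want the stack trace, gdb can print it and exit without an interactive session by using its -batch and -ex options. A minimal sketch; core_backtrace is a hypothetical wrapper name, and the paths in the usage note are the ones used above:

```shell
# core_backtrace EXECUTABLE COREFILE: print a backtrace from a core file
# and exit. Returns 2 without calling gdb if either file is missing.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace EXECUTABLE COREFILE" >&2
        return 2
    fi
    gdb -batch -ex bt "$exe" "$core"
}
```

From the area directory this would be core_backtrace ../src/rom core, and the output can be redirected to a file to keep with the matching log.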

5.7 Can I attach GDB to a running mud? How?


Yes. Simply go to the area directory and type:
gdb ../src/rom <pid>
Where <pid> is the process id of the running mud.

.... the program will then run until it hits a breakpoint, receives a signal such as a segmentation fault, or you press control-C.

Once you are in the debugger again, the MUD has not exited: gdb has just suspended the process and is waiting for your input. You can type 'cont' to continue, or you can examine variables.

You can also step through the code line by line: the 's' (step) command executes one line, entering any function that line calls. The 'n' (next) command also executes one line, but runs any function call in it to completion without stepping into it.

If you are in a function, but really are interested in what happens when you return from it, you can use "finish" to finish executing this function and return to the calling function.

5.8 How do I start the mud from GDB?


Switch to your area directory, and type
gdb ../src/rom
Then type
set args <port>
where <port> is the port number the mud runs on. Then:
run
Alternatively,
run <port>
has the same effect as both of these steps.
This starts the mud from within the debugger.
Note that ctrl-z will not suspend gdb in this state, rather it will pause the mud.

5.9 What are breakpoints? How do I use them?


A breakpoint is a marker you can put on a function, or a specific line of code. When the mud arrives at that line or starts to execute that function, it halts and gdb gives you a prompt, allowing you to examine variable values, step through the code and so forth. This can be very useful to follow through code that you want to check carefully.
To add a breakpoint to a function, type:
break <fn_name>
for example
break do_say
This will place a breakpoint at the beginning of the do_say function, and halt the mud when it reaches that part of the code. Multiple breakpoints can be in place at once.
Type
info break
to list all breakpoints, and use the delete command to remove them, for example
delete 1
will delete breakpoint number 1, as numbered in the 'info break' listing.
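If you find yourself typing the same setup commands at the start of every session, gdb can read them from a file given with its -x option. A minimal sketch that writes such a file; write_gdb_commands and the file name rom.gdb are hypothetical, and the port in the usage note is only an example:

```shell
# write_gdb_commands PORT OUTFILE: write a gdb command file that sets the
# port argument, places a breakpoint on do_say and starts the mud.
write_gdb_commands() {
    port=$1
    out=$2
    cat > "$out" <<EOF
set args $port
break do_say
run
EOF
}
```

For example, write_gdb_commands 4000 rom.gdb followed by gdb -x rom.gdb ../src/rom from the area directory.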

5.10 OK.. the mud crashed and I've lots of info..now what? What does this GDB screen mean?


5.11 Examining the stack


The initial display gdb gives you should be the last few entries on the stack, each shown at the place in the code where the next stack frame was called. For example, if you type "vnum mob fido" in the mud, this will call the do_vnum function, which in turn calls do_mfind, which in turn will call get_mob_index. If the mud crashes within get_mob_index, then your gdb display will have the line where the error occurred in this function at the top of the stack display, the next line will be within do_mfind and the one below that will be in do_vnum.
It is possible to look at each of these stack 'frames' in turn, examining variable values and looking at what exactly the code was doing at that point.
Typing 'bt' at any time will show the stack display. You can move up and down a frame at a time using the 'up' and 'down' commands, or jump straight to a frame with 'frame' followed by its number. The 'where' and 'list' commands are also extremely useful.

Sometimes you can get some unreadable stack traces. This is usually caused by a buffer on the stack being overrun, which overwrites the return addresses stored there:
If you notice that the addresses of the functions seem to have each byte within 0x20 - 0x7e (the printable ASCII range), you might be able to track down where this happens. E.g. if you have gdb claiming that the top function has address
0x666f6f74
you can print out each byte as a character:
(gdb) p /c 0x66
and find that the address seems to contain the letters "foot" - perhaps you just added some foot-sweat code that is overflowing a buffer.
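Decoding an address one byte at a time in gdb works, but a whole trace of suspicious addresses is quicker to check outside the debugger. A minimal sketch, assuming the input is a hex address as gdb prints it; addr_to_text is a hypothetical helper name:

```shell
# addr_to_text ADDRESS: print the bytes of a hex address as text.
# Bytes outside the printable ASCII range (0x20 - 0x7e) are shown as '.'.
# A trailing odd hex digit, if any, is ignored.
addr_to_text() {
    hex=${1#0x}
    out=
    while [ "${#hex}" -ge 2 ]; do
        byte=${hex%"${hex#??}"}     # first two hex digits
        hex=${hex#??}               # drop them from the remainder
        n=$(printf '%d' "0x$byte")
        if [ "$n" -ge 32 ] && [ "$n" -le 126 ]; then
            out="$out$(printf "\\$(printf '%03o' "$n")")"
        else
            out="$out."
        fi
    done
    printf '%s\n' "$out"
}
```

With the address from the example, addr_to_text 0x666f6f74 prints foot, while an ordinary code address mostly comes out as dots.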

5.12 Examining the value of variables


When in a stack frame, you can see what local variables are in scope by typing 'info locals'. This will show the variables and their values (or memory addresses, for pointers). You can use the 'print' command to examine these in more detail. For example, to see the name of the CHAR_DATA named ch, you would do:
print ch->name
or, for short,
p ch->name
You can dereference pointers in this way just as you would do in code, and thus can look at all items in a list among other things.
Once you have printed a value, you can refer to it again with the $ symbol, which can be very useful for traversing a linked list among other things. Example:
p *list_head
p *$.next
p *$.next
This is obviously much easier than:
p *list_head->next->next
Note that using '*' dereferences a pointer, thus while
p ch
will show the address held in the pointer,
p *ch
will show the contents of the structure it points to.
Note that you can recall and edit previous commands using the cursor keys.