On the kinds and sources of lag

The word “lag” taken by itself simply means “delay between cause and effect”. In Second Life, however, lag has multiple types and causes, even though the user perceives all of them as just “lag”. Whenever you hear “X causes lag”, don’t believe it until they say which particular kind it causes and how, because otherwise they just don’t know what they are talking about. And yes, Avatar Rendering Cost, I’m talking about you, missy.

Second Life’s lag meter in the viewer describes three types of lag: network, server, and client. This is just as confusing, because there are multiple kinds and causes of server lag, as well as multiple kinds and causes of client lag. Besides, it quite simply lies at times. So let me list:

Network-originating lag

  1. Pure network lag. Back in the time of MUSHes, that was the only kind we ever got. Every time you press a key, your viewer sends a command for your avatar within SL to move. The laws of physics say that there is no way in hell to get a signal from one side of the globe to the other in less than roughly 67 milliseconds, because it simply can’t move faster than the speed of light. Since it also needs to be recognised, routed, buffered, stored and resent along the way, multiple times, from Moscow I get about 200 milliseconds before my commands reach my avatar, and 200 more to see the results if they aren’t something the viewer could predict. 200ms is very fast, but it’s enough to notice a delay in sound, for example. It can be worse or better depending on your connection, on who ate the submarine cables this morning and whether it was Cthulhu, and on innumerable other factors. Like the weather, it is largely out of anyone’s control unless they move somewhere else. There is, though, one interesting cause of network lag, and that is stuffing too much data into your connection, which in the case of Second Life includes playing media streams.
  2. Packet loss. Second Life uses UDP for most of its traffic. UDP means ‘User Datagram Protocol’, but is often jokingly read as ‘Unreliable Datagram Protocol’. That is because it offers no built-in guarantee that every piece of data that goes in will be delivered. If an intermediary machine in the long chain that connects your computer to the Second Life servers decides to chew on your packets, there’s nobody to hold it accountable and no built-in safeguard anywhere along the way to replace the chewed-up packets, unlike with TCP, the protocol you use to get webpages. Packets can get lost for a multitude of reasons, including being sent too fast, and when a packet that should have arrived doesn’t, your viewer often can’t tell and will carry on as if nothing is happening. There are numerous causes of packet loss and they are just as fickle as network lag in general, but some you can combat, mostly by making sure SL isn’t sending or receiving data too fast, and by ensuring that your endpoint network equipment is in top condition. As a side note, network lag can also come from misconfiguration of the internal network that connects the multiple components of the Second Life server grid itself.
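
For the curious, here is a back-of-envelope check on the speed-of-light floor for pure network lag. It is only a sketch: the distance (half the Earth’s circumference), the two-thirds-of-c figure for optical fibre, and the function name are my own assumptions for illustration, and the answer shifts with whatever distance you plug in.

```python
# Rough lower bound on one-way network delay between two far-apart points.
# The distance and the fibre slowdown factor below are illustrative assumptions.

SPEED_OF_LIGHT_KM_S = 299_792.458      # speed of light in vacuum
HALF_EARTH_CIRCUMFERENCE_KM = 20_015.0  # approximate antipodal great-circle distance


def one_way_delay_ms(distance_km, speed_fraction=1.0):
    """Delay in milliseconds for a signal travelling at speed_fraction * c."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * speed_fraction) * 1000.0


vacuum_ms = one_way_delay_ms(HALF_EARTH_CIRCUMFERENCE_KM)
fibre_ms = one_way_delay_ms(HALF_EARTH_CIRCUMFERENCE_KM, speed_fraction=2.0 / 3.0)

print(f"vacuum: {vacuum_ms:.1f} ms, fibre (assumed 2/3 c): {fibre_ms:.1f} ms")
```

Under these assumptions the floor comes out at roughly 67 ms in a vacuum and about 100 ms in fibre, and that is before any router along the way has touched the packet.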

Server-originating lag

  1. Physics lag. All avatars and all objects that have the ‘physical’ flag set on them have attributes like weight and a ‘physbox’, which determines just where they begin and end. The physics system determines what happens when they collide, what speeds they move at, and where they end up after smashing into each other. Second Life uses a licensed Havok 4 physics engine to do this, a third-party library seen in most, but not all, modern first-person shooter games. It has some commonly known problems: objects that penetrate each other aren’t sure where they should go, and the computational cost of working out what happens to a stack of boxes standing on each other rises steeply as you add more boxes. Eventually, as you add more and more boxes, the physics system chokes the CPU, and the accumulation of small errors in the calculations causes the otherwise perfectly built tower of boxes to fall apart. Physics in SL is highly odd. I’ve spent quite a bit of time playing with a physical catapult, and not a day passes without me coming in to see the catapult arm having fallen through the axis it is supposed to rotate on: at some moment, the physical arm moves fast enough to get completely through the stationary axis before its next position is calculated. Similarly, you can often fly through walls this way, and you can lag a sim to a standstill by dropping enough big boxes on each other. The way to deal with this kind of lag is to avoid physical objects wherever they aren’t necessary. Before Havok 4, a physical megaprim reportedly could bring a sim to its knees just by being there, but this issue has now been dealt with.
  2. Prim update lag. I have no idea how efficiently this works, except for what I have gleaned from the ‘Show Updates’ line in the Advanced menu. Normally, once a sim has rezzed, your client knows where the objects are and what state they are in. Once the object is sent to your viewer, nothing more needs to be said about it. But if its state changes, for example when the texture changes or the text hanging over the object changes, in many cases the entire state of the object (but not the texture picture itself, which is the bulk of the data traffic) needs to be sent again, which causes another burst of data on the network and eats away a little more of the sim’s CPU. This is far less of an issue with avatar attachments than it is with stationary scripted objects: my prim hair and shoes only send updates at the beginning of an animation sequence. The longer your animations, including stand animations, the fewer updates your attached prims will cause to be sent. Stationary objects that never change state, particularly locked ones, don’t send prim updates at all.
  3. Script-induced lag. Apparently, there is no way in the original LSL virtual machine to actually throttle the CPU time allocated to running a single script. Don’t ask me what they were thinking. While numerous restrictions on scripts exist, from memory limits to some rather braindead quirks in the way variable assignment works, working around them causes more CPU time to be used running the script, not less. Scripts that declare that they want to listen on too many communication channels can choke the chat subsystem, because the number of checks it needs to perform for every message sent by anything in the sim multiplies with every listener. Scripts not written to take all those intricacies into account can lag the sim into the ground just like a pile of boxes can. We can only hope that with the new Mono virtual machine this problem will eventually be gone.
  4. Asset-server lag. The asset server database is the single weakest point in the entire Second Life server system. No matter how distributed it is, every resident needs to access it all the time, often multiple times per second. Whenever you open your 30000-item inventory, you request lots of data about your items, more than you actually see, and all that data needs to be sent by something, which takes up CPU time and bandwidth.
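
The catapult arm falling through its axis, described under physics lag above, is what you get when positions are only checked once per simulation step. Here is a toy one-dimensional sketch of the effect. It is emphatically not how Havok works inside; the function names, the step length of 1/45 of a second and the wall thickness are all assumptions I picked for illustration.

```python
# Toy 1-D model of "tunnelling" in a fixed-step physics loop.
# All names and numbers are illustrative assumptions, not Havok internals.

def sampled_hit(start, speed, wall_min, wall_max, dt, steps):
    """Look for a hit only at the sampled positions, once per step."""
    x = start
    for _ in range(steps):
        x += speed * dt
        if wall_min <= x <= wall_max:
            return True
    return False


def swept_hit(start, speed, wall_min, wall_max, dt, steps):
    """Check the whole segment travelled during each step (assumes speed >= 0)."""
    x = start
    for _ in range(steps):
        new_x = x + speed * dt
        if x <= wall_max and new_x >= wall_min:
            return True
        x = new_x
    return False


DT = 1.0 / 45.0    # assumed step length in seconds
WALL = (1.0, 1.1)  # a thin obstacle, 10 cm thick

slow = sampled_hit(0.0, 2.0, *WALL, DT, 100)      # 2 m/s: small steps, lands inside the wall
fast = sampled_hit(0.0, 90.0, *WALL, DT, 100)     # 90 m/s: steps right over the wall
fast_swept = swept_hit(0.0, 90.0, *WALL, DT, 100)  # same speed, but the path is checked

print(slow, fast, fast_swept)  # True False True
```

The fast object covers about two metres per step, so it is on one side of the wall at one sample and on the other side at the next. Checking the swept path catches it, at the price of more work per step.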

Client-originating lag

  1. Memory leak swap lag. The Second Life client leaks memory and other resources, some of them far more finite than the memory we get these days. That is, it requests them from the system and then forgets to give them back when it’s done with them. Eventually the resource runs out. If the resource is memory, the system starts to frantically dump everything it doesn’t need at the moment into the disk swap space, and disk access is orders of magnitude slower than memory access. Eventually it runs out anyway and your client dies, but before it does, it will go slower and slower as it finds that bits of it ended up on disk instead of in memory where it expects to find them, and it has to wait until the system loads them back.
  2. Graphics rendering lag. Strictly speaking, it’s not lag at all, because whatever happens inside the server when you press a button still happens at the same time whether your graphics card is on its last legs or the next top-of-the-line model. But because of the way Second Life is written, you don’t even see the letters you type until the next screen update is shown, which gives the impression of lag. Each object, each prim, everything in Second Life eventually becomes a mesh of triangles that your graphics card has to draw. Your graphics card has a finite capacity for those triangles, measured in millions per second by now, and the problem is compounded by the fact that Second Life cannot perform the optimizations that normal 3D games have made an art out of. They can preprocess a 3D world and ensure that parts that cannot possibly be seen never go to the graphics card. Second Life cannot, because all objects can at some point be seen, if only because all objects can at some point be moved, so it dumps most of what you see onto the graphics card and hopes for the best. Some things, like reflections, cause certain objects to be drawn twice. Second Life tortures your poor graphics card like nothing else out there, and if you see too many things at once, the card will choke and your FPS will drop.
  3. Flexi drain. Strictly, this is a subset of graphics rendering lag, because flexi objects are not actually subject to the physics system. They are phantom, meaning that they cannot possibly collide with anything, and as far as I can see, all processing related to how they flex happens clientside: your client works out how they would move given a certain flexi setting and draws that for you. Extra flexies cause lots of interesting and time-consuming computations to happen.

As you can see, Avatar Rendering Cost, with its abstract parrots geared to estimate the number of triangles your card has to draw, is not related to lag as such, but only to the work your card has to do to draw the stuff. That is within your power to remove on the viewing side: pull your draw distance down and most of it will be gone, as well as most of the prim update lag, since you don’t get updates for prims you cannot see. The only lag ARC could actually matter to, the kind you couldn’t get rid of by removing the bells and whistles in your own renderer, is prim update lag, but ARC awards highly inflated scores to alpha textures, which do not involve more prim updates at all.
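
To make the draw distance point concrete, here is a minimal sketch of how much triangle work drops out when you shorten it. This is not the viewer’s actual culling code, which is far more involved; the `SceneObject` class, the object names and the triangle counts are all made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical stand-in for anything the viewer might have to draw."""
    name: str
    position: tuple  # (x, y, z) in metres
    triangles: int


def triangles_in_range(objects, camera, draw_distance):
    """Sum the triangles of every object within draw_distance of the camera."""
    return sum(
        obj.triangles
        for obj in objects
        if math.dist(obj.position, camera) <= draw_distance
    )


# Invented scene: three objects at 10 m, 100 m and 300 m from the camera.
scene = [
    SceneObject("prim hair", (10.0, 0.0, 0.0), 5_000),
    SceneObject("house", (100.0, 0.0, 0.0), 20_000),
    SceneObject("shopping mall", (300.0, 0.0, 0.0), 150_000),
]
camera = (0.0, 0.0, 0.0)

for distance in (64, 128, 512):
    print(distance, triangles_in_range(scene, camera, distance))
```

With these invented numbers, going from 512 m down to 64 m takes the load from 175,000 triangles to 5,000, and nobody else in the sim had to change a thing about what they wear.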

And next time I will tell you some interesting things about how object-object occlusion works and what SL actually does to reduce the load put on the card, because the issue of rendering costs turned out to be quite a bit more complex than I thought…

Update, much later:

It turns out that the bulk of a sim’s time is spent sending textures to newly arrived avatars. As a result, there is one factor that generates most of the lag: Sheer numbers. No matter how you dress, or what your ARC is, there’s simply no way around it. There is, however, one way to make things worse for everyone.

Clearing your cache.

