The last few weeks I've been implementing a scenegraph on top of my renderer. It had renderable nodes (meshes, vertices, material lists etc.), transform nodes, various visitors, serialization, and it is all basically working. The problem I find now, is that I want to get rid of the whole thing.
The reason is something that has been bothering me the whole time I've been working on it. I am building a modern game engine, free of as much historical cruft as I can arrange. The biggest challenge of modern hardware is effectively using the sheer number of processors (CPU's, GPU's, SPU's etc.) you have access to.
Classical game designs mostly are of the one big loop variety. But how do you optimize a scenegraph for a CPU that has 8 processors? Nodes are all over the place in memory so you can't traverse the data in any cache-efficient manner. Having more than one thread running over the same tree structure is fraught with peril. And it is a tree so you can't really split it into 8 pieces and process each section independently very effectively.
But I was building a scenegraph because every game engine I looked at had one. This was bothering me until I read this post. And the light went on. Why have a scenegraph at all? I'm not planning for heirarchical character animations which are the main selling point of the scenegraph, so what was I doing?
That realization made me understand a little better what he was getting at in this article. Organize your rendering data structures in ways that make sense, don't just go with what is in the textbooks.
That got me thinking about how I was going to manage multithreading. So far I was making a one big loop game. Whoops. Hmm, the renderer has a whole bunch of optimization steps that could be done in parallel :)
Once I started thinking about multithreading it got me looking at thread pools, dependency graphs, scheduling and more when I came across Intel's Smoke tech demo and this rather confusing article. (There is another article at gamasutra.com which is a bit friendlier.)
I've looked at next-gen multithreaded engines, and I hadn't seen anything like Smoke's Change Control Manager which uses a publish-subscribe mechanism to put all the cross-system communication into one small section of the frame. Once everyone is updated, each system runs independently from that point. Hmm. Interesting!
Thread pools I like since you can scale them to however many processors you have, but I need a system to figure out how to distribute that work. I don't really like the idea of using a dependency graph to schedule tasks - sure it is predictable, but it seems like something a computer should be able to do for me. That led me to the Cilk Scheduler and work stealing which seems like a really sensible way to distribute work over multiple processors.
Figuring out how to implement that got me looking at Deques (Double Ended Queues - been doing this over 20 years, never heard it called a deque) and specifically at this lock free deque (and this one.) That first one is really scary. I can appreciate the fact that they put it together and maybe at some point I'll be desperate enough for a little additional performance, but for now I'm thinking that for this application, where the queues should contain reasonable sized tasks and be manipulated relatively infrequently, that a single threaded deque and a spinlock using InterlockedExchange should give me adequate performance.
Once you start looking hard at multithreading, you figure out why the optimization guys are always harping on about the badness of global variables. Yes, this is where those singletons bite you in the ass. They need to be serialized. I had already built a couple global systems: a memory manager, a logging system, an asset manager. I can see that I am going to have to refactor the memory manager so I can do per-thread memory allocation without the overhead of locking in a few specific cases. In the general case, the memory manager, asset manager and logger will get spinlocks.
For me, not being any kind of an expert on rendering, the scenegraph debacle was a learning experience and I'll chalk it up to that. Rest assured a lot more thought will go into the features I'm working on from here on out. If you happen to be an expert in concurrency and know of better ways than the above, please do let me know.
We were unable to retrieve our session cookie from your web browser. If pressing F5 once to reload this page does not get rid of this message, please read this to learn more.
You will not be able to post until you resolve this problem.