24th September, 2007, 11:49 AM
ThunderRd
Irreverent Query Chairman
 
Join Date: June 2007
Location: NYC native in northern Thailand
Posts: 2,241

Quote:
Originally Posted by Strongwolf:
"Core successfully engaged"
Both core @ 100% usage.

How can we be sure?


OK now, relax. Good question. I'm happy to hear that you're running, but a lot can still get *fuxored* so here's how to check if it's working properly.

Now, you can do this the fast way or the safe way. I don't know if the fast way works; I'm a safe and slow guy at heart. Here's what I did, only because I did A LOT of reading about the problems with this client first.

First, I ran it in a window for several days before I tried to run it as a service. That means you have to start it manually, or start it with a shortcut in the startup folder. As a service, of course, there's no window, and it starts up by itself. For me, there was a distinct disadvantage to not having it in a window when I began and was learning (I still am): I couldn't see in real time what was happening with the client. Well, that's not completely true, I did have FAHMon, but it's inadequate at keeping you informed when clients hang. [Another topic]. So I ran it in a window, as I said before, for a few days until I was certain that it was running ok.

Here's how I satisfied myself it was ok:

There are 4 kinds of running processes for this client: mpiexec, smpd, fah, and the fah core. You will see 4 instances of the core, and 1 each of the other 3. Fah calls mpiexec and the core. Mpiexec is weird. Sometimes it doesn't shut down when you ask it to, and the next time you start fah there are 2 instances of mpiexec running. This is bad. The client will NEVER get off the ground if mpiexec is running twice. So check it every time you manually shut down, and make sure you kill the mpiexec process, as well as any fah cores that are being naughty and not terminating. (That happens on occasion, too.) Do this in Process Explorer or Task Manager or whatever you fancy. I use PE.
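If you'd rather not count by eye, the check above can be sketched in a few lines of Python. This is only a rough sketch: it assumes you already have a list of process image names from somewhere (copied out of PE, a script, whatever), and the name prefixes and the example core name are my guesses, so match them to what your own process list shows.

```python
from collections import Counter

# Expected counts as described in the post: one each of mpiexec, smpd
# and fah, plus four core instances.
# ASSUMPTION: the prefixes below are illustrative; adjust to your system.
EXPECTED = {"mpiexec": 1, "smpd": 1, "fah": 1, "fahcore": 4}


def classify(name):
    """Map a process image name to one of the four roles, or None."""
    lowered = name.lower()
    # "fahcore" is tested before "fah" because both share a prefix.
    for role in ("fahcore", "mpiexec", "smpd", "fah"):
        if lowered.startswith(role):
            return role
    return None


def count_roles(process_names):
    """Count how many processes fall into each expected role."""
    counts = Counter()
    for name in process_names:
        role = classify(name)
        if role is not None:
            counts[role] += 1
    return {role: counts.get(role, 0) for role in EXPECTED}


def problems(process_names):
    """Return one message per role whose count differs from EXPECTED."""
    counts = count_roles(process_names)
    return [
        f"{role}: expected {want}, found {counts[role]}"
        for role, want in EXPECTED.items()
        if counts[role] != want
    ]
```

An empty list from `problems(...)` means the counts match. A second mpiexec shows up as a single complaint about mpiexec, which is the case to watch for.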

The CORRECT AND DOCUMENTED way to shut down is CTRL-C. There is no other method of which I am aware. [Actually, it's the best way anytime you're running a "dos" box]. However, if after using CTRL-C, mpiexec and/or fahcore(s) are still running, use PE to kill them all.
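Here's a small Python sketch of the "what's still hanging around after CTRL-C" check. It does not kill anything; it only picks out the names you would then kill by hand in PE. The prefixes are assumptions on my part, and smpd is deliberately left off the list because it's a system process.

```python
# ASSUMPTION: these prefixes are illustrative; use the names your own
# process list shows. smpd is intentionally not included.
KILL_PREFIXES = ("mpiexec", "fahcore")


def leftovers(process_names):
    """Return the process names that should be gone after CTRL-C but are not."""
    return [
        name for name in process_names
        if name.lower().startswith(KILL_PREFIXES)
    ]
```

Feed it the process names you see after shutting down. If the result is empty, the shutdown was clean and there is nothing to kill.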

When you start fah.exe and it is working properly, you will immediately see fah.exe and mpiexec in the PE window. Smpd is a system process and was already running. Quickly behind that the cores will appear. Count them. If there aren't EXACTLY 4, something's wrong. Shut down everything and try again. I can't explain why it misbehaves sometimes but it's documented on the web, and on my 35 machines I see it not often, but regularly. Sometimes it takes several tries to get it started. This is more rare, but it does happen. So be patient. And make sure to kill the mpi process every time you shut down.

Now, if it's working, you will see the 4 core processes use cpu time, and it should total 100%. The core cpu time will fluctuate for maybe 30 seconds or a minute. In the window you will see "+working", followed closely by "ensuring status". Then it will drop to 0%. Don't be alarmed. The fluctuation is the client "synching up" the cores. Then in the window you'll see "entering M.D." After a few seconds, the cpu time will ramp up again to a total of 100%, and the window will show the percentage completed.
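The order of messages in the window can be checked the same way. The sketch below, in Python, looks for the three messages in sequence and tells you how far the startup got. The message strings are the ones quoted above as I remember them, not exact log text, so treat them as assumptions and adjust them to what your window really prints.

```python
# ASSUMPTION: milestone strings follow the wording in the post and may
# not match the client's exact output. Matching is case-insensitive.
MILESTONES = ["+working", "ensuring status", "entering m.d."]


def startup_stage(lines):
    """Return how many milestones have appeared so far, in order.

    0 means nothing seen yet; len(MILESTONES) means the client has
    reached the last milestone.
    """
    stage = 0
    for line in lines:
        if stage < len(MILESTONES) and MILESTONES[stage] in line.lower():
            stage += 1
    return stage
```

A result that stays at 2 for a long time would mean the client got through "ensuring status" but never reached "entering M.D."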

Sometimes the cores misbehave and only 3 of them will use cpu. The fourth one sits at 0%. In this case, the client will hang in the window at "entering M.D." Shut down, try again.
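That "three busy, one dead" pattern is easy to spot from the per-core cpu numbers. A minimal Python sketch, assuming you have one cpu percentage per core process; the 1% cutoff is my own arbitrary choice, not anything documented.

```python
def stalled_cores(cpu_percents, idle_below=1.0):
    """Return the indexes of cores sitting idle while others are busy.

    If every core is idle, return an empty list: that is the normal
    pause while the cores sync up, not the stuck-core failure.
    ASSUMPTION: idle_below=1.0 is an arbitrary threshold.
    """
    any_busy = any(p >= idle_below for p in cpu_percents)
    if not any_busy:
        return []
    return [i for i, p in enumerate(cpu_percents) if p < idle_below]
```

For example, readings of 25, 25, 24 and 0 flag the last core, while four zeros flag nothing.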

I have also noticed that after killing the processes it helps to wait for a minute or so before restarting them. It *seems* to be more reliable. I'm fairly sure that this is a result of Windows needing some time to free up the process handles, so it may help to wait.
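The whole "kill, wait a minute, try again" routine can be written as a loop. This Python sketch is only the skeleton: `start`, `looks_healthy` and `cleanup` are hypothetical functions you would supply yourself (how you launch fah and kill leftovers is up to you), and the 60 second wait and 5 attempts are just my habit, not documented values.

```python
import time


def launch_with_retries(start, looks_healthy, cleanup,
                        attempts=5, wait_seconds=60, sleep=time.sleep):
    """Try to launch until it looks healthy; return the attempt number.

    start, looks_healthy and cleanup are caller-supplied callables
    (hypothetical here). Returns None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        start()
        if looks_healthy():
            return attempt
        # Failed launch: kill whatever is left, then give Windows time
        # to release its handles before the next try.
        cleanup()
        if attempt < attempts:
            sleep(wait_seconds)
    return None
```

The `sleep` parameter is there so the loop can be tried out without actually waiting a minute per attempt.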

What you need to see to establish a successful launch is: the 4 processes running, with 4 instances of the core; and the combined core cpu at 100%. IF THESE THINGS DON'T HAPPEN, KILL THE PROCESSES AND TRY AGAIN. And sometimes again, and again. Have patience. Hopefully this quirkiness will be at least *partially* solved in the next release. Remember that the client was designed for quad-core chips. It just *happens* to also work on dualies.

Now, as for overclocking. Personally, I'd wait until I had a good handle on how the client behaves before I tried. I do plan to do it soon. I have been running the client for about a month now, so I don't consider myself knowledgeable enough as yet. Maybe I'm being overly cautious; only you can decide what you try. If I were you, or anyone just starting to use this beta, I'd enjoy the extra production for a while before I started to push it.

If you decide that all is ok and you feel the need to run it as a service, let me know and I'll tell you what to do. I now have all 35 running that way. I think CCPerf runs his rig as a service, too, so he could help you as well if I'm not around.

This client is still flaky and unstable at times, for no apparent reason. So it's good to visualize what SHOULD happen in the window and in Task Manager. That way you know if it's going wrong. I have some complaints about FAHMon, and the way it detects hung clients, which I plan to bring up in another forum. For large farms like this, a more powerful monitoring app would be useful. But at this time it's the only monitoring program that does *most* of what I want it to.


Last edited by ThunderRd; 26th September, 2007 at 06:16 PM. Reason: added quote