Strongwolf 23rd September, 2007 04:32 PM

Windows SMP Client, general troubleshooting
Tried to install SMP last night but failed miserably. In fact I managed to make another big mess of my rig :banghead:. Help... please!!!

ccperf721p 23rd September, 2007 04:42 PM

Two things you must have before installing the SMP client: .NET 2.0 and an administrator password. Once you have those, install the client, go to the directory you installed it to and open the dos install file, follow the onscreen instructions, then open the fah folder and input the rest of the info.

ThunderRd 23rd September, 2007 07:04 PM

Yes, do what cc says and then let us know how it goes. Are the other problems you have because of the machine or a result of installing SMP?

One more thing. After you run install.bat from the directory, enter the password, and start the client, answer NO to install as service (FOR NOW). There is an undocumented trick to get it going that way. Let's make sure that it works properly for you in a command shell window first. Then we can help you to install as service if everything goes ok.

It's bedtime for me but I'll check this thread in the AM.

Strongwolf 23rd September, 2007 11:51 PM


Strongwolf 24th September, 2007 01:17 AM

It looks like I screwed up somewhere. When I open the dos install file it says "press any key to continue". I press a key and it takes me back to the directory (each and every time I do this). I go ahead and open the fah folder and input the rest of the info.
Everything looks OK, but at the end "error starting Folding@home core" appears; then it goes over the whole thing again (like a loop).
Also, a folder named "FahCore_a1" appears in the directory.
Any clues??

ThunderRd 24th September, 2007 03:41 AM

You won't get running if MPI doesn't install. (That's what the install.bat program does)

1-Uninstall everything thru Add/Remove
2-Did you create a logon password for the Windows account? (This means that you have to type it in when you start the box. I'm not sure, but I don't think that MPI works with auto-logon)
3-Is dot net 2.0 installed?
4-Did you download the correct file? There are quite a few different versions, it's easy to FU
5-On my machines it was necessary to unload my anti-virus and my HIPS software. You may need to do the same. You can punch a hole for SMP later by making exclusions for the install directory and work sub-directory.

If the answers are yes, reinstall using the original downloaded file.

Run install.bat from the install directory. You should see some prompts that ask you to enter your windows account password followed by enter and the password again. Then there will be some lines that say "if you see this twice, MPI is running". If you DO see it twice, you're OK. Press any key to exit the dos box.
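If you would rather not eyeball the DOS box, the "seen twice" check can be sketched in Python. This is only an illustration: the function name is made up, the marker text is the paraphrase quoted above rather than the exact output of install.bat, and it assumes you have copied the window text into a string.

```python
def mpi_check_passed(captured_output, marker="if you see this twice"):
    """Return True when the marker phrase appears exactly twice (case-insensitive)."""
    return captured_output.lower().count(marker.lower()) == 2
```

Anything other than two hits (zero, one, or more) comes back False, which matches the "if you DO see it twice, you're OK" rule.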

Then run the client again, and enter the information. Remember to answer NO to install as service, as I said before.

Then report back to us. ;)

Strongwolf 24th September, 2007 05:35 AM

Errrr.... I think I got it!!! I do see that twice!!!

"Core successfully engaged"
Both cores @ 100% usage.

How can we be sure?

Strongwolf 24th September, 2007 05:44 AM

I don't dare restart and start OCing. Afraid something bad will happen. This is very exciting, though.

ccperf721p 24th September, 2007 05:46 AM

Glad to see it working. Good job. If the SMP client is going, it is at 100%; you can be sure of that.

Strongwolf 24th September, 2007 05:50 AM

Can I restart and oc? Can I run a benchmark...or two? ah? ah? ah? What do you say? Can I?... Can I???

Strongwolf 24th September, 2007 05:52 AM

Still saying "0 of 500000", though.

ccperf721p 24th September, 2007 05:57 AM

It takes between 12 and 25 minutes for each percent, maybe a little longer without OC.

Of course you can restart and OC. Get your rig all set up first, then fold.
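To put the "12 to 25 minutes per percent" figure in perspective, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and it assumes the rate stays constant over the whole work unit, which it may not.

```python
def work_unit_hours(minutes_per_percent, percent_total=100):
    """Estimated hours to finish a work unit at a steady minutes-per-percent rate."""
    return minutes_per_percent * percent_total / 60

def steps_per_percent(total_steps, percent_total=100):
    """How many steps the counter must advance before the next percent shows up."""
    return total_steps // percent_total

print(f"{work_unit_hours(12):.1f} to {work_unit_hours(25):.1f} hours per work unit")
print(f"{steps_per_percent(500000)} steps per percent")
```

So a counter that still reads 0 of 500000 a few minutes after launch is not a sign of trouble by itself; the first percent simply has not been reached yet.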

Strongwolf 24th September, 2007 06:00 AM

But we don't have it as a service. What do I do, start it again??

Strongwolf 24th September, 2007 06:08 AM

Come on... I'm dying here!!!

ccperf721p 24th September, 2007 06:12 AM

Yes, start it again with fah. I usually just send a shortcut to the desktop.

Strongwolf 24th September, 2007 06:15 AM

Oh boy, oh boy, oh boy. Hip, hip hurray, hip hip, hurray!!!!

PS Thanks a lot! I'm out to play!!!

"Core successfully engaged"
Both cores @ 100% usage.

How can we be sure?

OK now, relax. Good question. I'm happy to hear that you're running, but a lot can still get *fuxored* so here's how to check if it's working properly.

Now, you can do this the fast way or the safe way. I don't know if the fast way works; I'm a safe and slow guy at heart. Here's what I did, only because I did A LOT of reading about the problems with this client first.

First, I ran it in a window for several days before I tried to run it as a service. That means you have to start it manually, or start it with a shortcut in the startup folder. As a service, of course, there's no window, and it starts up by itself. For me there was a distinct disadvantage to not having it in a window when I began and was learning (I still am): I couldn't see in real time what was happening with the client. Well, that's not completely true. I did have FAHMon, but it's inadequate at keeping you informed when clients hang. [Another topic]. So I ran it in a window, as I said before, for a few days until I was certain that it was running ok.

Here's how I satisfied myself it was ok:

There are 4 running processes for this client: mpiexec, smpd, fah, and the fah core. You will see 4 instances of the core, and 1 each of the other 3. Fah calls mpiexec and the core. Mpi is weird. Sometimes it doesn't shut down when you ask it to, and then when you start fah there are 2 instances of mpi running. This is bad. The client will NEVER get off the ground if mpi is running twice. So check it every time you manually shut down, and make sure you kill the mpi process, as well as any fah cores that are being naughty and not terminating. (That happens on occasion, too.) Do this in Process Explorer or Task Manager or whatever you fancy. I use PE.
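The head count above can also be scripted. What follows is a minimal Python sketch, not anything that ships with the client: the function names are invented, and the image names in EXPECTED are assumptions (fah.exe and FahCore_a1.exe appear in this thread; mpiexec.exe and smpd.exe are guesses based on the process names). It parses the CSV form of the Windows tasklist command and reports anything that does not match 1, 1, 1 and 4.

```python
import csv
import io
from collections import Counter

# Assumed image names; adjust to whatever Task Manager shows on your box.
EXPECTED = {"fah.exe": 1, "mpiexec.exe": 1, "smpd.exe": 1, "fahcore_a1.exe": 4}

def names_from_tasklist_csv(text):
    """Pull the image-name column out of `tasklist /FO CSV /NH` output."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]

def process_problems(names, expected=EXPECTED):
    """Return a list of mismatches; an empty list means the counts look right."""
    counts = Counter(name.lower() for name in names)
    return [
        f"{image}: expected {want}, found {counts[image]}"
        for image, want in expected.items()
        if counts[image] != want
    ]
```

On a Windows box you could feed it the text from `subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout`. A second mpiexec would show up as "mpiexec.exe: expected 1, found 2".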

The CORRECT AND DOCUMENTED way to shut down is CTRL-C. There is no other method of which I am aware. [Actually, it's the best way anytime you're running a "dos" box]. However, if after using CTRL-C, mpiexec and/or fahcore(s) are still running, use PE to kill them all.
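For anyone who prefers the command line to PE, the cleanup after CTRL-C can be sketched like this in Python. It is an illustration under assumptions: the function name is invented and the image names mpiexec.exe and FahCore_a1.exe are assumed. It only builds the Windows taskkill commands (/F forces termination, /IM selects by image name) for stragglers that are still in the process list; it does not run them.

```python
# Assumed image names of the processes that sometimes outlive CTRL-C.
STRAGGLERS = ("mpiexec.exe", "fahcore_a1.exe")

def kill_commands(running_names, stragglers=STRAGGLERS):
    """Build one taskkill command per straggler image that is still running."""
    running = {name.lower() for name in running_names}
    return [["taskkill", "/F", "/IM", image] for image in stragglers if image in running]
```

Each returned list can be handed to `subprocess.run` on Windows. Nothing is returned for processes that already exited cleanly, so a normal shutdown produces an empty list.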

When you start fah.exe and it is working properly, you will immediately see fah.exe and mpiexec in the PE window. Smpd is a system process and was already running. Right behind them the cores will appear. Count them. If there aren't EXACTLY 4, something's wrong. Shut down everything and try again. I can't explain why it misbehaves sometimes, but it's documented on the web, and on my 35 machines I see it not often, but regularly. Sometimes it takes several tries to get it started. This is rarer, but it does happen. So be patient. And make sure to kill the mpi process every time you shut down.

Now, if it's working, you will see the 4 core processes use cpu time, and it should total 100%. The core cpu time will fluctuate for maybe 30 seconds or a minute. In the window you will see "+working", followed closely by "ensuring status". Then it will drop to 0%. Don't be alarmed. The fluctuation is the client syncing up the cores. Then in the window you'll see "entering M.D." After a few seconds, the cpu time will ramp up again to a total of 100%, and the window will show the percentage completed.

Sometimes the cores misbehave and only 3 of them will use cpu. The fourth one sits at 0%. In this case, the client will hang in the window at "entering MD". Shut down, try again.

I have also noticed that after killing the processes it helps to wait for a minute or so before restarting them. It *seems* to be more reliable. I'm fairly sure that this is a result of Windows needing some time to free up the process handles, so it may help to wait.
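The "kill, wait a minute, try again" routine is easy to express as a loop. Here is a minimal Python sketch with invented names: start, is_healthy and kill_all are hypothetical callables you would supply yourself, and the 60-second default simply mirrors the "minute or so" mentioned above.

```python
import time

def launch_with_retries(start, is_healthy, kill_all,
                        attempts=5, wait_seconds=60, sleep=time.sleep):
    """Start the client, check it, and on failure kill everything, wait, retry.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        start()
        if is_healthy():
            return attempt
        kill_all()
        if attempt < attempts:
            sleep(wait_seconds)  # give Windows time to release the old processes
    return None
```

Passing sleep in as a parameter is only there so the loop can be tried out without actually waiting a minute between attempts.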

What you need to see to establish a successful launch is: the 4 processes running, with 4 instances of the core; and the combined core cpu at 100%. IF THESE THINGS DON'T HAPPEN, KILL THE PROCESSES AND TRY AGAIN. And sometimes again, and again. Have patience. Hopefully this quirkiness will be at least *partially* solved in the next release. Remember that the client was designed for quad-core chips. It just *happens* to also work on dualies.
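Those success criteria boil down to a small check. This is a hedged Python sketch, not part of the client: the function name, the verdict strings and the 5-point tolerance are all assumptions, and the CPU readings are whatever you read off Task Manager or PE for the four core processes once the percentage counter has started moving (not during the brief drop to 0% while the cores sync).

```python
def launch_verdict(core_cpu_percent, tolerance=5.0):
    """Classify a launch from the per-core CPU readings (in percent)."""
    if len(core_cpu_percent) != 4:
        return "wrong core count"   # must be EXACTLY 4 core processes
    if any(p <= 0.0 for p in core_cpu_percent):
        return "idle core"          # the 3-working, 1-at-0% hang
    if sum(core_cpu_percent) < 100.0 - tolerance:
        return "low total"          # combined core cpu should be about 100%
    return "ok"
```

Anything other than "ok" means the same thing as above: kill the processes and try again.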

Now, as for overclocking. Personally, I'd wait until I had a good handle on how the client behaves before I tried. I do plan to do it soon. I have been running the client for about a month now, so I don't consider myself knowledgeable enough as yet. Maybe I'm being overly cautious; only you can decide what you try. If I were you, or anyone just starting to use this beta, I'd enjoy the extra production for a while before I started to push it.

If you decide that all is ok and you feel the need for running as service, let me know and I'll tell you what to do. I now have all 35 running that way. I think CCPerf runs his rig as service, too, so he could help you as well if I'm not around.

This client is still flaky and unstable at times, for no apparent reason. So it's good to visualize what SHOULD happen in the window and in Task Manager. That way you know if it's going wrong. I have some complaints about FAHMon and the way it detects hung clients, which I plan to bring up in another forum. For large farms like this a more powerful monitoring app would be useful. But at this time it's the only monitoring program that does *most* of what I want it to.

Toro 24th September, 2007 05:22 PM

I just have an f@h shortcut in the Startup folder in my Start Menu. Not having it running as a service doesn't bother me, as I am the only one who uses this system, plus it is easier to keep an eye on!

Strongwolf 24th September, 2007 05:47 PM

There are 4 instances of "FahCore_a1.exe" under Windows Task Manager. That's good, isn't it???

There are 4 instances of "FahCore_a1.exe" under Windows Task Manager. That's good, isn't it???

Yes... good.
What CPU are you using?

Be careful, and follow thunder's advice closely. The smp core is still temperamental and has a bad habit of erroring out on startup if it was not shut down properly. That could mean losing 1700 points. No one has been able to find the exact cause, but some machines are more susceptible than others.... I have 2 boxes out of 5 that always dump WUs on reboot... good thing they are usually 24/7.

