Hello! I'd appreciate some help ensuring I have the right config to use as much of my card as possible (well, any of it would be nice since it's not being used at all...) I haven't run BOINC since 2019 and am pretty much out of the loop now.
I have an RTX 3080 that shows 0% GPU utilization. The work unit was at 64% with about 12 hours of runtime and 19 hours left before I started tweaking things (I rebooted with SWAN_SYNC = 1 and tried an app_config.xml).
I was using BAM! since I have more than one PC running BOINC and more than one project, but I canceled or suspended everything else. Now GPUGrid is running alone with a restarted WU at 1.5% after 20 minutes of runtime. GPU-Z still shows 0% GPU utilization and I'm at a loss.
Wanted to add: I'm on Windows 10, the Task Manager CUDA graph also shows 0%, and I have disabled both options to suspend computing or GPU usage for any reason.
Thanks for any help! |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1335 Credit: 7,472,617,459 RAC: 13,802,230 Level
|
Since your hosts are hidden, I'm just going to guess that you are running Python on GPU tasks?
What you are seeing is normal for these tasks. They are primarily a CPU task with small bursts of GPU use.
You can stop using the SWAN_SYNC environment variable, as that was only helpful with the old tasks from several years ago.
You should get up to speed by reading the posts in the News forum, specifically this thread.
https://www.gpugrid.net/forum_thread.php?id=5233
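If you want to check for those short bursts yourself, here is a minimal sketch that polls `nvidia-smi` more often than a once-per-second graph and reports the peak. The `--query-gpu` fields and `--format` options are standard `nvidia-smi` flags; the helper names, the 60-second window and the 0.2-second interval are just illustrative assumptions. Note that `nvidia-smi` itself averages utilization over its own sample period, so very brief bursts may still read low.

```python
import subprocess
import time

# Standard nvidia-smi query: GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line such as '37, 4012' into (util_percent, mem_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def summarize(samples):
    """Return (peak utilization, fraction of samples with nonzero utilization, peak memory)."""
    utils = [u for u, _ in samples]
    busy = sum(1 for u in utils if u > 0)
    return max(utils), busy / len(utils), max(m for _, m in samples)


def sample_gpu(seconds=60, interval=0.2):
    """Poll the first GPU for `seconds`; needs nvidia-smi on the PATH (hypothetical helper)."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_sample(out.splitlines()[0]))
        time.sleep(interval)
    return samples
```

Calling `summarize(sample_gpu())` while a task is running should show whether the GPU is really idle the whole time or just busy in short spikes.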
|
|
|
|
Yes, I see my WU as a "Python apps for GPU hosts 4.03 (cuda1131)" application.
Thanks for the news thread! |
|
|
Igor Misic Send message
Joined: 12 Apr 11 Posts: 4 Credit: 1,352,519,335 RAC: 5,313,853 Level
|
Since SWAN_SYNC = 1 no longer helps with the Python app, I was hoping to run 3 tasks in parallel, since this host has plenty of memory, both system RAM and GPU VRAM. But when I added a third task through an additional BOINC instance, the GPUGRID server started rejecting my tasks; they show up with the status "Abandoned".
Did anyone figure out how to do it?
http://www.gpugrid.net/results.php?hostid=601014 |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1335 Credit: 7,472,617,459 RAC: 13,802,230 Level
|
I am assuming you have the 12GB version of the 3060. I can't tell by looking at your host, since you are running an older version of BOINC that can only report 4GB of VRAM on Nvidia cards.
But these tasks can each take as much as 4GB of GPU memory to run, and around 60GB of system memory at 3X utilization.
So on the face of that assumption, you don't have enough of both GPU VRAM and system memory to run 3X.
My teammate was able to run 3X on his 12GB 3060's with the CUDA MPS server on his 128GB Epyc hosts. Those 3060's are now A4000's so plenty of memory on the gpu now. |
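To make that arithmetic concrete, here is a rough sketch of the fit check. The 4GB of VRAM per task is the worst-case figure above; the 20GB of system memory per task is an assumption obtained by dividing the 60GB-at-3X figure evenly, which real tasks may not follow. The function name and the example hosts are illustrative only.

```python
def max_concurrent_tasks(vram_gb, ram_gb, vram_per_task_gb=4.0, ram_per_task_gb=20.0):
    """Estimate how many tasks fit, limited by whichever memory runs out first.

    Defaults are rough worst-case figures from this thread, not measured values.
    """
    by_vram = int(vram_gb // vram_per_task_gb)
    by_ram = int(ram_gb // ram_per_task_gb)
    return min(by_vram, by_ram)


# A 12GB card with 48GB of system memory: VRAM allows 3, RAM allows only 2.
print(max_concurrent_tasks(12, 48))   # 2
# The same card in a 128GB host: VRAM becomes the limit.
print(max_concurrent_tasks(12, 128))  # 3
```

On these assumptions the system memory, not the card, is usually the first thing to run out on a desktop host.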
|
|
Igor Misic Send message
Joined: 12 Apr 11 Posts: 4 Credit: 1,352,519,335 RAC: 5,313,853 Level
|
Thanks for helping. You are right, it is the 12GB version of the 3060.
I'll write up my observations; maybe they will help someone else too.
Previously, when I had only 16 GB of RAM, I observed that tasks would crash and BOINC would report errors. So I added another 32 GB (now 48 GB), and 2 tasks in parallel work fine.
Then I added 2 more tasks in parallel (a total of 4 tasks) and started to see errors again. So I figured, OK, this can't fit in both RAM and GPU VRAM.
Then I tried 3 tasks in parallel.
I was watching the GPU's VRAM, which with that set of tasks did not go over 10 GB.
I was hoping that RAM + swap could take care of the memory usage. But with all 3 tasks running in parallel, the task that started first was aborted after exactly 1 hour of runtime, and 10 minutes later the 2 that started later were aborted as well, each also after 1 hour of total runtime.
Then I started changing the configuration a bit, in case I had misconfigured something, and I noticed in the GPUGRID statistics that tasks are aborted even before BOINC gets any information about it.
So I don't know what the exact reason is. |
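For anyone unsure what "tasks in parallel" looks like in configuration terms, here is a hedged sketch that generates a BOINC `app_config.xml` for sharing one GPU between several tasks. The `<app_config>`, `<app>`, `<gpu_versions>`, `<gpu_usage>` and `<cpu_usage>` elements are part of BOINC's documented format. The app name `PythonGPU` and the CPU figure are assumptions, so check the short app name in your own client_state.xml before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task):
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks are scheduled per card.
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1 / tasks_per_gpu:.2f}"
    # CPU budget per task; purely illustrative here.
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpus_per_task:.1f}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# "PythonGPU" is an assumed app name; two tasks per GPU is the case reported as working.
print(build_app_config("PythonGPU", tasks_per_gpu=2, cpus_per_task=1.0))
```

The output goes into the project's folder under the BOINC data directory as `app_config.xml`, and the client picks it up after re-reading its config files or restarting.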
|
|
|
My teammate was able to run 3X on his 12GB 3060's with the CUDA MPS server on his 128GB Epyc hosts. Those 3060's are now A4000's so plenty of memory on the gpu now.
With some further tweaking, I'm actually now running 4x on the 3060 and 5x on the A4000s.
|
|
|
zooxit Send message
Joined: 4 Jul 21 Posts: 23 Credit: 9,387,497,892 RAC: 49,780,347 Level
|
Last time I tried running more than 2 tasks on the same GPU, it only ran 2 tasks (I understood that GPUGRID limits users to 2 tasks per GPU).
Did something change or did I misunderstand something? (must confess I didn't have time to read the newer posts yet...) |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1335 Credit: 7,472,617,459 RAC: 13,802,230 Level
|
Still the limit AFAIK. And there is a total limit of 16 tasks per host.
You can get around that by spoofing the number of cards a host has, up to that limit of 16 tasks per host. |
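As a quick worked example of how those two limits combine, assuming they are still 2 tasks per reported GPU and 16 tasks per host as stated above (the function name is illustrative):

```python
def max_tasks(reported_gpus, per_gpu_limit=2, per_host_limit=16):
    """Tasks the server would hand out for a given number of reported GPUs."""
    return min(reported_gpus * per_gpu_limit, per_host_limit)


print(max_tasks(1))   # 2  (one real card)
print(max_tasks(3))   # 6  (host reporting three cards)
print(max_tasks(10))  # 16 (capped by the per-host limit)
```

So reporting more than 8 cards would gain nothing under these limits.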
|
|
zooxit Send message
Joined: 4 Jul 21 Posts: 23 Credit: 9,387,497,892 RAC: 49,780,347 Level
|
Thanks. Didn't know, will look into that. |
|
|