Author |
Message |
|
Why is using 2 Nvidia GTX 1080 cards a problem? I only compute 1 WU even when 2 WUs are sent; Python is using more than 30% of the capacity of the 3900X processor.
If I leave 2 WUs to work, 1 of them "hangs" at 4%.
cc_config is set to:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1367 Credit: 7,967,038,047 RAC: 3,060,293 Level
Scientific publications
|
It's because these tasks are primarily cpu tasks, with small infrequent bursts of gpu activity.
The reason your tasks fail is because you are using Windows which has limitations.
Your tasks fail with this error message.
DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.
You need to increase your paging file to around 60GB and you should be able to process two tasks concurrently.
They will use almost all of your cpu.
Please read through the main thread for these tasks for the reason why.
https://www.gpugrid.net/forum_thread.php?id=5233 |
|
|
|
hi Keith
thx for your reply > I changed the page file in W11 from automatic to 60000 MB, and
at first computing on 2 GPUs went fine, but both crashed after 4% progress
32 GB RAM is available
I noticed that Windows does not allocate the 60000 MB after being instructed to do so
the allocation seems to vary
now, as before, I run 1 task and stop the second before 4% when it wants to start
all the tasks complete successfully when running 1 task |
|
|
Keith Myers
|
The 3900X host is still erroring out with not enough memory in the stderr.txt outputs.
You need to bump the pagefile up some more. Try 100000 MB. I'm assuming your drive actually has that much free space for a file of that size.
I don't know much about Windows, but maybe you need to restart Windows for the paging file change to take effect. |
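A minimal sketch of the free-space check mentioned above, assuming the page file lives on the drive you pass in. The function name and the 10000 MB safety reserve are made up for illustration; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def has_room_for_pagefile(drive: str, pagefile_mb: int, reserve_mb: int = 10_000) -> bool:
    """Return True if `drive` has enough free space for a page file of
    `pagefile_mb` megabytes plus a safety reserve (hypothetical default)."""
    free_mb = shutil.disk_usage(drive).free // (1024 * 1024)
    return free_mb >= pagefile_mb + reserve_mb
```

On the Windows host you would call it as `has_room_for_pagefile("C:\\", 100_000)`. Note that it ignores space already taken by an existing page file, so treat the answer as a rough guide only.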
|
|
|
Keith
I now have 2 page files of 100000 MB and watched 2 WUs start in Task Manager; both run Python at about 20% processor capacity until 1 WU disappears, and it stops at 4% progress
stderr.txt output is not found in W11
|
|
|
Keith Myers
|
The stderr.txt output is the result file listed on every returned task on the website. You can examine every task in your browser here.
Just click on the task detail number in the left-most column.
For example your latest errored task:
https://www.gpugrid.net/result.php?resultid=33155788
This looks like a bad task, however: it failed first because it couldn't get all its required file resources, and then it failed later, as usual, because of not enough virtual memory.
Error loading "C:\ProgramData\BOINC\slots\39\lib\site-packages\torch\lib\shm.dll" or one of its dependencies
DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.
Maybe some Windows user can help further. I am out of suggestions. When I have helped other Windows users by explaining why these tasks are troublesome for the Windows OS and offered the same suggestion to increase the pagefile size, they have been successful.
I suggest returning to the main thread I linked and reading through it and other Windows users' posts to glean some other pertinent information.
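For anyone reading the stderr output of several failed tasks, here is a small sketch that pulls the failed allocation sizes out of the text. The regular expression is built from the error line quoted above; the function name is hypothetical.

```python
import re

# Pattern taken from the error line quoted in this thread.
ALLOC_FAILURE = re.compile(r"not enough memory: you tried to allocate (\d+) bytes")


def failed_allocations(stderr_text: str) -> list[int]:
    """Return the size in bytes of every failed allocation found in the text."""
    return [int(n) for n in ALLOC_FAILURE.findall(stderr_text)]


sample = "DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes."
for size in failed_allocations(sample):
    print(f"{size} bytes = {size / 2**20:.2f} MiB")  # prints "3612672 bytes = 3.45 MiB"
```

The request that fails is only a few MiB, which is consistent with the overall virtual memory limit (RAM plus page file) already being used up, rather than one single huge allocation being refused.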
|
|
|
jjch Send message
Joined: 10 Nov 13 Posts: 101 Credit: 15,730,091,597 RAC: 1,360,243 Level
Scientific publications
|
Matthias,
First thing, simplify your troubleshooting. Only configure one Python task to run. After you get that working successfully then try adding the 2nd one. Go back to just one and see how that works. You can monitor things for one and see what the sizing looks like. If you are running other projects along with GPUgrid you should stop those and get them cleared off.
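One possible way to limit the project to a single running task is an app_config.xml file with BOINC's `project_max_concurrent` option. The sketch below only generates the file content; where to save it is an assumption on my part (the GPUGRID project folder under the BOINC data directory, typically named after the project URL), and the function name is made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks: int = 1) -> str:
    """Build app_config.xml content that caps concurrently running tasks
    for one project. Saving it to the project folder is left to the user."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


print(build_app_config(1))
```

After saving the file, BOINC Manager has to re-read its config files (or the client has to be restarted) before the limit applies. Raise the number to 2 once a single task runs cleanly.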
Second, you probably don't need 2 page files. That could actually be complicating things. Set up one page file on your primary OS disk and remove the 2nd one. Select Custom size and set the Initial and Max size. For example, with one Python task running mine is set to 24576 and 51200. You can also see how much is currently allocated, which will help you find out where your resources are limited. Mine currently says 48535 MB with one task running.
The stderr.txt files are located in the slots directory wherever your BOINC Program Data folder is. You need to find the slot folder for the GPUGRID task by viewing the Properties. Once you open that you will find the file. Take a look at that when a job is running and see what it says. When a job is running correctly it should say "Created Learner" and it will stay there for several hours until it finishes or fails.
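If you would rather not open each slot folder by hand, a rough sketch like the following walks the slots directory and reports what each stderr.txt says. It assumes the data folder layout seen in the error path earlier in this thread (`C:\ProgramData\BOINC\slots\<n>`); the function name and status labels are made up.

```python
from pathlib import Path


def slot_status(boinc_data_dir: str) -> dict[str, str]:
    """Map each slot folder name to a rough status read from its stderr.txt."""
    status = {}
    slots = Path(boinc_data_dir) / "slots"
    for stderr_file in sorted(slots.glob("*/stderr.txt")):
        text = stderr_file.read_text(errors="replace")
        if "not enough memory" in text:
            status[stderr_file.parent.name] = "out of memory"
        elif "Created Learner" in text:
            status[stderr_file.parent.name] = "running (Created Learner)"
        else:
            status[stderr_file.parent.name] = "starting or unknown"
    return status
```

On the host in question the call would be `slot_status(r"C:\ProgramData\BOINC")`. Slots belonging to other projects will also show up, usually as "starting or unknown".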
BOINC can also be a little touchy when it comes to how much disk space and memory it is allocating. This could actually be related to your problem. The default settings don't always work right for what you need. First look at the Disk tab and see what the Total disk usage looks like. Pay attention to the free, available to BOINC size. Then look at what GPUgrid is using. If you are running other projects you will need to compare the total size to what is available and make sure it is enough for everything.
You can make changes to these settings under Options > Computing preferences > Disk and memory tab. For the Disk section look and see if it is giving you enough disk space. You might only need 3-4 GB more so make an adjustment there as needed. You can lock it to a fixed size if you would like to do that too. Also, under the Memory section the "When computer is (not) in use ..." might need to be increased a bit. Make sure the Page/swap file setting is 100.00%
Final thoughts. I don't know how successful Win11 is for GPUgrid yet. There possibly could be other issues there. Recommend that you tune it up the best you can as well. Check for Windows updates, update GPU drivers, clean disk space etc. Don't run a lot of other programs at the same time you are running GPUgrid either. There could be a conflict of resources there too. GLHF |
|
|
gemini8 Send message
Joined: 3 Jul 16 Posts: 31 Credit: 2,234,529,869 RAC: 356,203 Level
Scientific publications
|
[...]
Select Custom size and set the Initial and Max size. For example with one Python task running mine is set 24576 and 51200.
[...]
Make sure the Page/swap file setting is 100.00%
I'd go for a fixed size page file.
Just set it to 51200 or whatever on initial AND max size. That way the space is always reserved, and Windows failing to add more space fast enough can't become a problem.
My page file setting is 1% on all my machines. This includes Debian, Mac OS, Ubuntu and Win7 crunchers. No problems with that so far.
I don't know too much about fragmentation on recent Windows machines. My Macs defragment themselves quite nicely, so Windows might be able to do so as well nowadays, and the next step might not be necessary anymore: if the page file is on a rotating disk, try to disable it, restart, defragment your drive, then re-enable the page file at the size you want.
____________
- - - - - - - - - -
Greetings, Jens |
|
|
|
hi Keith and jjch and gemini8
the funny mystery is, it works today for both (2) WUs, no crash
the paging file is now 81845 MB, allocated by W11 (I fixed the size but Windows ignores it)
jjch . GLHF is funny > I looked it up: good luck, have fun. Thanks for sharing
I have had fun running BOINC for years now and I need to keep up by buying new hardware (faster and with more cores)
we could / should ask some people to return to contributing to GPUGRID; I don't know why they stopped (Python?)
thanks again all ~ Matthias-Poortvliet-Netherlands
|
|
|
Keith Myers
|
I just became a pioneer with arrows in my back.
Just upgraded one host to a new AM5 platform with a 7950X cpu and DDR5-6000 memory.
Lots of stuff to figure out now. Like absolutely no sensors are available in Linux except for the gpus and NVMe stick temps. No fan speeds, no temps, no voltages are available.
Too new a platform for Ubuntu 22.04.1 LTS. |
|
|
|
my planned upgrade within a few weeks is on AM4 5950X
i'm collecting second hand hardware when i can
and Keith ~ arrows > you're not dead yet
i think AM5 is a bit over the top > energy usage / efficiency
yet i wish you GLHF
(:-)
|
|
|
Keith Myers
|
So far no difference in energy usage or temps.
Benefit of being able to run my PCIE Gen.4 cards at Gen.4 speeds now.
Benefit of having Gen. 4 M.2 speeds with a Gen. 4 device for storage now.
Benefit of running cpu tasks at 800-1000 MHz faster than previously on the 5950X.
Some projects' cpu tasks scale linearly with clock speed.
Haven't run any projects that can make use of AVX-512 SIMD instructions yet. |
|
|