Message boards : News : ACEMD updated app
Author | Message |
---|---|
As I said. We are currently compiling the Windows version. | |
ID: 59700 | Rating: 0 | rate:
might as well compile it for CUDA 11.8 to bring Ada (40-series) support. | |
ID: 59708 | Rating: 0 | rate:
Hello everyone! I'm in Shanghai, China. How can I get the GPU to run at 100% load? I've noticed that while tasks are running, the GPU stays at around 30%. | |
ID: 59720 | Rating: 0 | rate:
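"Hello everyone! I'm in Shanghai, China. How can I get the GPU to run at 100% load? I've noticed that while tasks are running, the GPU stays at around 30%." This is normal for the Python app: it uses the CPU more than the GPU, so GPU utilization is limited by the CPU. If you run two tasks at the same time, you can raise GPU utilization. But with this Python app you cannot get the GPU to 100%. | |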
ID: 59722 | Rating: 0 | rate:
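My Nvidia GPU reaches 80%. I'm also running other CPU (20%) and Intel GPU (97%) projects at the same time. After setting the power plan to best performance, the CPU reaches 50%. Intel i7, 12th gen. | |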
ID: 59725 | Rating: 0 | rate:
Looking around I see the present batch of protein ligand sims are crashing... DARNIT! process exited with code 195 (0xc3, -61)</message> anything else found? ____________ "Together we crunch To check out a hunch And wish all our credit Could just buy us lunch" Piasa Tribe - Illini Nation | |
ID: 59734 | Rating: 0 | rate:
"Looking around I see the present batch of protein ligand sims are crashing... DARNIT!" if someone can preserve the data files and slot directory before it gets uploaded and subsequently wiped from your system, it should be easy to figure out what's wrong. my guess is they didn't name that run.sh file properly (via open_name probably), or didn't add a task to extract the file in the wrapper config file (jobs.xml), or something along those lines. | |
ID: 59735 | Rating: 0 | rate:
actually I have some on my system so i took a look. | |
ID: 59736 | Rating: 0 | rate:
What app exactly? | |
ID: 59738 | Rating: 0 | rate:
"What app exactly?" the new free energy one ('ATM' moniker), using the wrapper to call the run.sh script. also it would be a good idea to add a checkbox for this app in project preferences. this app showed up with no warning and no announcement from the project, and with no way to prevent it, it seems. I'm not sure if it's marked as beta or not. | |
ID: 59739 | Rating: 0 | rate:
Yes, we should have made a beta, but this app is not related to this thread. | |
ID: 59740 | Rating: 0 | rate:
"Yes, we should have made a beta, but this app is not related to this thread." you're right, but there is no announcement thread for this app, so nowhere else appropriate in the News section to get your attention about it. | |
ID: 59741 | Rating: 0 | rate:
Soon we will announce it. This is just testing to see if it works which should have been done on a beta app. | |
ID: 59746 | Rating: 0 | rate:
interesting to see that Ada "should" run on the Ampere cubins. I know the app has an architecture compatibility check, and it may fail there even if it could otherwise work. | |
ID: 59749 | Rating: 0 | rate:
I am successfully running the current ACEMD_3 tasks on a GTX980ti, on a Quadro P5000, and on two RTX3070. | |
ID: 59758 | Rating: 0 | rate:
As a first step, you can try resetting the GPUGRID project on the failing host. | |
ID: 59759 | Rating: 0 | rate:
... that's what I am guessing, too. However, I was closely watching the RAM usage (via MemInfo) when the tasks started: at the moment the task crashed, about 2 GB were still free. Further, for the tasks running on the other hosts mentioned above, the Windows tasks manager shows a RAM usage between 60MB and 400MB per task. Maybe the CPU Intel Core2 Duo E7400 @ 2.80GHz is too old for these tasks? (However, some other GPU projects like Einstein, WCG and Primegrid are running well). | |
ID: 59760 | Rating: 0 | rate:
... it could very well be that the CPU is too old. it does not support AVX extensions for example, and if the application is built with this requirement then that could be a reason. | |
ID: 59763 | Rating: 0 | rate:
perhaps one of the GPUGRID people could tell me if this is the case? | |
ID: 59766 | Rating: 0 | rate:
Just had one and it failed after 26 seconds on my 4090 | |
ID: 59767 | Rating: 0 | rate:
"Just had one and it failed after 26 seconds on my 4090" are the Python tasks working on your 4090? or were those run on a different GPU? | |
ID: 59768 | Rating: 0 | rate:
Python run fine on my 4090, though they don't do much at all, all the work seems to be on the CPU. | |
ID: 59769 | Rating: 0 | rate:
"Python run fine on my 4090, though they don't do much at all, all the work seems to be on the CPU." Thanks. could you please report your failed task? click update on BOINC for GPUGRID to send back the result. I'd like to see the nature of the failure, to see if the architecture check is the reason for failure. | |
ID: 59770 | Rating: 0 | rate:
Done :) | |
ID: 59772 | Rating: 0 | rate:
Looks like the application does not understand the 4090 architecture. Needs to be recompiled with the gencodes that Ian pointed out. | |
ID: 59774 | Rating: 0 | rate:
maybe you can tell, (if you can run an ACEMD3 app on another host that is AVX enabled) by setting the AVX offset in the bios of a capable host and then checking to see if the processor speed corresponds while running the wrapper (with no other WU). | |
ID: 59776 | Rating: 0 | rate:
... interesting, larrywhitehead's 1060 3GB also does not seem to want to do these tasks https://www.gpugrid.net/results.php?hostid=493191 only a vague stderr message: (unknown error) - exit code 195 (0xc3)</message> Yet I only observe a little over 2GB graphics memory being utilized max so far on my hosts. | |
ID: 59777 | Rating: 0 | rate:
"Looks like the application does not understand the 4090 architecture. Needs to be recompiled with the gencodes that Ian pointed out." That’s exactly what I thought would happen. I had the same experience with some other people trying to run the Einstein CUDA BRP7 app. Didn’t work on 11.7 but did work once I compiled it for 11.8 with gencode defined for CC 8.9 | |
ID: 59778 | Rating: 0 | rate:
Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app. | |
ID: 60026 | Rating: 0 | rate:
"Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app." That’s correct. The current CUDA 11.21 app does not support Ada 4000 series. | |
ID: 60027 | Rating: 0 | rate:
"Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app." Thanks for confirming. | |
ID: 60033 | Rating: 0 | rate:
I got ACEMD 3 task for my gtx 1080ti on Windows (2oiq-ADRIA_KDMD_1k_test_3809-0-1-RND9959). | |
ID: 60109 | Rating: 0 | rate:
I see that a new acemd3 app was published yesterday for the Linux hosts in an attempt to fix the expired Acellera licensing issue. | |
ID: 60898 | Rating: 0 | rate:
"Looks like they tried to just use the Windows code and of course failed with trying to use a Windows only msvcrt Python function. It seems that" You're right. And this is currently still pending for Linux hosts: Name 0_0-CRYPTICSCOUT_pocket_discovery_c82914d2_15b4_4300_b4db_cb72998e09bf-6-7-RND0445_6 No hope for a solution in the short term, since universities usually shut down over Christmas... Merry Xmas | |
ID: 60904 | Rating: 0 | rate:
I'm waiting till after New Years before bugging Gianni again with the request to fix the acemd3 app properly. | |
ID: 60905 | Rating: 0 | rate:
"I'm waiting till after New Years before bugging Gianni again with the request to fix the acemd3 app properly." my Windows10 PCs were successfully crunching ACEMD 3 until this morning. Within the past hour, some more ACEMD 3 tasks were downloaded and failed after about 1 minute. See here: http://www.gpugrid.net/result.php?resultid=33725238 | |
ID: 60922 | Rating: 0 | rate:
I'm shocked to discover that this morning I have a acemd3 task running for 50 minutes so far. | |
ID: 60923 | Rating: 0 | rate:
But that is only one task out of about 20 so far today that is being successfully run. All the rest are ATMbeta and have failed due to bad configuration file inputs. | |
ID: 60924 | Rating: 0 | rate:
New Linux acemd3 app has an expiration date 3649 days into the future. Should not be an issue for years now. | |
ID: 60925 | Rating: 0 | rate:
"New Linux acemd3 app has an expiration date 3649 days into the future. Should not be an issue for years now." good news for the Linux crunchers. However, it would be great if they did the same for the Windows version, and until that is done, they should stop sending out Windows tasks which keep failing within a minute. | |
ID: 60926 | Rating: 0 | rate:
You need to look at a running task while it is still in its slot and capture the stderr.txt and progress files for later examination before the task errors out and clears the slot. | |
ID: 60927 | Rating: 0 | rate:
"... Your uploaded result files do not have any useful information about why the tasks are failing." yes, you are right, the task from the link I uploaded before does not show any stderr.txt, for whatever reason (I did not check this before, sorry for that). I have noticed that this is the case with all tasks from this PC, regardless of whether they succeed or fail; no idea why. However, the stderr from the other PC where ACEMD 3 tasks also failed does work, here is an example: http://www.gpugrid.net/result.php?resultid=33725327 "You should at least examine the acemd application for its license expiration as posted in my last post. Assuming the Windows application got the same license expiration, the tasks should run." Since ACEMD 3 started failing yesterday at about the same time on both of my PCs (on a third PC, unfortunately, I cannot crunch ACEMD 3 because the app does not work with Ada Lovelace yet), my guess, of course, was that this is not due to any problem with my hardware or my software, but rather due to a problem with the app itself, probably with the license. | |
ID: 60928 | Rating: 0 | rate:
The stderr.txt on Windows hosts never shows any reason for failing or succeeding. | |
ID: 60929 | Rating: 0 | rate:
Actually, Erich's https://www.gpugrid.net/result.php?resultid=33725327 does contain a useful error code: 0xC0000135 You have to be careful and search Microsoft itself for that one: the general internet chatterbox will usually say that a specific component is at fault (usually the .NET framework), which is unlikely to be relevant for research applications. You might be able to get a name for the missing component by trying to launch the application manually in a terminal window - it should populate that %hs parameter. | |
ID: 60930 | Rating: 0 | rate:
4 ACEMD tasks received at this Linux host on January 7-8th still continued failing after a few seconds. Application: ACEMD 3: molecular dynamics simulations for GPUs 2.22 (cuda1121) | |
ID: 60935 | Rating: 0 | rate:
They have "ModuleNotFoundError: No module named 'msvcrt'". | |
ID: 60936 | Rating: 0 | rate:
Following on from the reported issue in the ATM thread ("exceeded elapsed time limit" error - message 61483): App speed: <flops> 6271039115434; Task size: <rsc_fpops_est> 1000000000000000000; Correction: <duration_correction_factor> 0.010000; for an estimated run time of 1594 seconds - or 26 minutes 34 seconds, as shown in BOINC Manager. The time limit for the task is set by <rsc_fpops_bound>, which is 10 times larger than the estimate. So, 4 hours, 25 minutes, 40 seconds on this GeForce GTX 1660 Ti. I'll let you know how it gets on - or you can look it up yourself this afternoon, at task 35250069. Or not. ACEMD failed: Error loading CUDA module: CUDA_ERROR_UNSUPPORTED_PTX_VERSION (222). Back to the drawing board, while it gets on with Quantum chemistry as usual! | |
ID: 61504 | Rating: 0 | rate:
You may need to update your drivers. | |
ID: 61505 | Rating: 0 | rate:
"You may need to update your drivers." It's a possibility - but the card/driver combo is accepted to run the cuda1121 version of QC. It's only the Python beta which needs cuda1131. We'll see what happens when my other Linux machine catches a task - that does have a newer card and driver. | |
ID: 61509 | Rating: 0 | rate:
OK, that's looking more plausible. My other machine (driver 535.99) has completed tasks on the primary RTX 3060 GPU, and is now running one on the secondary GTX 1660 GPU - no problems so far. | |
ID: 61519 | Rating: 0 | rate:
I see we've been given a big new block of ACEMD tasks to chew on. | |
ID: 61526 | Rating: 0 | rate:
Yup, confirmed: | |
ID: 61527 | Rating: 0 | rate:
I guess updating the drivers solved your previous problem. | |
ID: 61528 | Rating: 0 | rate:
"I guess updating the drivers solved your previous problem." Yes, that machine is running fine now - 6 tasks completed, plus two running. It's still in the danger zone for 'exceeded elapsed time limit', but looks like it should pull through. | |
ID: 61529 | Rating: 0 | rate:
All of my WU have failed for the past 3 days | |
ID: 61570 | Rating: 0 | rate:
It may be related to the ACEMD 3 app update to v2.28 deployed on 26/06/2024. | |
ID: 61571 | Rating: 0 | rate:
I just started a new computer yesterday to run gpugrid | |
ID: 61572 | Rating: 0 | rate:
No way back. | |
ID: 61573 | Rating: 0 | rate:
Are there no work units? Is something amiss? | |
ID: 61574 | Rating: 0 | rate:
"Are there no work units? Is something amiss?" Watch the Server Status. It tells you how many work units are available: https://www.gpugrid.net/server_status.php | |
ID: 61575 | Rating: 0 | rate:
Bad batch of ACEMD 3: fails on Linux after ~20 sec, with: ERROR: read error for file "input.coor", byte number 4: number of atoms (1880162304) != (107863) expected ERROR: /home/user/mambaforge/conda-bld/acemd_1704215649797/work/src/mdsim/forcefield.cpp line 300: Cannot read BINCOORD file: input.coor Tasks 35376930, 35377052, 35377128. | |
ID: 61576 | Rating: 0 | rate:
No problems here on Linux Mint. | |
ID: 61577 | Rating: 0 | rate:
"No problems here on Linux Mint." You just haven't been sent one of the bad ones yet. https://www.gpugrid.net/result.php?resultid=35372532 https://www.gpugrid.net/result.php?resultid=35376959 Host https://www.gpugrid.net/show_host_detail.php?hostid=462662 | |
ID: 61578 | Rating: 0 | rate:
"Bad batch of ACEMD 3: fails on Linux after ~20 sec, with:" Same here: https://www.gpugrid.net/workunit.php?wuid=28923983 | |
ID: 61579 | Rating: 0 | rate:
All of the ACEMD 3 2.28 tasks I received have failed, as have those of the other people who were running them. So I'd venture that, with it being a fresh release, there are some issues with this batch. | |
ID: 61580 | Rating: 0 | rate:
Darn lag in posting! | |
ID: 61581 | Rating: 0 | rate:
mine are still failing | |
ID: 61587 | Rating: 0 | rate:
<core_client_version>8.0.2</core_client_version> | |
ID: 61590 | Rating: 0 | rate:
Hi, same here ..... | |
ID: 61596 | Rating: 0 | rate:
Good evening | |
ID: 61597 | Rating: 0 | rate:
Hi, thanks Pascal but "mia mamma usa Windows" (my mom likes/uses Windows). | |
ID: 61598 | Rating: 0 | rate:
That's a pity. I used to use Windows too and didn't want to learn Linux, but now it's the opposite. Bye. | |
ID: 61599 | Rating: 0 | rate:
Hello, it does appear that the Windows app version 2.28 is now broken. 2.27 worked. We are investigating. | |
ID: 61643 | Rating: 0 | rate:
It is now fixed, one of the files was corrupted. | |
ID: 61644 | Rating: 0 | rate:
Hi, thanks a lot. | |
ID: 61645 | Rating: 0 | rate:
Have started getting the ACEMD 3: molecular dynamics simulations for GPUs v2.30 tasks for Windows. | |
ID: 61653 | Rating: 0 | rate:
Hi, | |
ID: 61656 | Rating: 0 | rate:
I think, at this project, that the application is more likely to be having a problem with your GPU hardware than the driver version. | |
ID: 61657 | Rating: 0 | rate:
Hi, ZLUDA will only work with PTX code, code that's agnostic to CC version. the acemd3 app is not compiled with PTX code, it's compiled with discrete CC compatibility values (6.0, 7.5, 8.6, etc). ZLUDA gives you a cc of 8.8, which is not a real CC from nvidia, and as such it is not possible to compile non-PTX code for this CC. | |
ID: 61659 | Rating: 0 | rate:
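
For anyone who wants to redo the run-time arithmetic from message 61504, here is a minimal Python sketch. It simply follows that post's reasoning (estimate = `rsc_fpops_est` / `flops` × `duration_correction_factor`, time limit = 10 × estimate). The function names are made up for illustration, and whether the BOINC client really applies the correction factor to `rsc_fpops_bound` in this way is an assumption taken from the post, not something confirmed in this thread.

```python
def estimated_runtime_seconds(rsc_fpops_est, flops, dcf=1.0):
    """Estimated run time: task size divided by app speed, scaled by the
    duration correction factor (hypothetical helper, mirrors the post)."""
    return rsc_fpops_est / flops * dcf


def format_hms(seconds):
    """Render a duration in seconds as 'Hh MMm SSs' (fractions truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Values quoted in message 61504 (GeForce GTX 1660 Ti).
FLOPS = 6271039115434                  # <flops>
FPOPS_EST = 1_000_000_000_000_000_000  # <rsc_fpops_est>
DCF = 0.01                             # <duration_correction_factor>
BOUND_FACTOR = 10                      # post: bound is 10x the estimate

estimate = estimated_runtime_seconds(FPOPS_EST, FLOPS, DCF)
limit = BOUND_FACTOR * estimate        # assumption: same scaling as the estimate

if __name__ == "__main__":
    print("estimate:", format_hms(estimate))
    print("limit:   ", format_hms(limit))
```

Computed without rounding, the limit comes out a few seconds above the 4 hours, 25 minutes, 40 seconds quoted in the post, because the post multiplied the already rounded estimate by 10.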
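
The Ada / gencode discussion (messages 59749, 59774, 59778 and 61659) can be sketched in a few lines of Python. This relies on two rules from NVIDIA's CUDA documentation: a cubin built for compute capability X.y is only guaranteed to run on devices X.z with z >= y, and PTX can be JIT-compiled for any equal or higher compute capability. The `-gencode arch=compute_NN,code=sm_NN` form is the usual nvcc syntax. Everything else is hypothetical: the helper names are invented, and `strict_allow_list_check` is only a guess at the kind of check that could reject a 4090; how acemd3's real architecture check works is not shown anywhere in this thread.

```python
def gencode_flags(capabilities, include_ptx=True):
    """Build nvcc -gencode flags for a list of (major, minor) capabilities.

    One cubin (sm_NN) per capability; optionally also embed PTX for the
    highest one so newer GPUs can JIT-compile it.
    """
    flags = []
    for major, minor in capabilities:
        arch = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    if include_ptx and capabilities:
        major, minor = max(capabilities)
        arch = f"{major}{minor}"
        flags.append(f"-gencode arch=compute_{arch},code=compute_{arch}")
    return flags


def cubin_runs_on(cubin_cc, device_cc):
    """Binary compatibility: same major version, device minor >= cubin minor."""
    return cubin_cc[0] == device_cc[0] and device_cc[1] >= cubin_cc[1]


def ptx_runs_on(ptx_cc, device_cc):
    """PTX can be JIT-compiled for any equal or higher compute capability."""
    return device_cc >= ptx_cc


def strict_allow_list_check(device_cc, built_for):
    """Hypothetical app-level check: accept only capabilities built for."""
    return device_cc in built_for


if __name__ == "__main__":
    ampere, ada = (8, 6), (8, 9)
    print(cubin_runs_on(ampere, ada))                      # Ampere cubin on Ada
    print(strict_allow_list_check(ada, [(6, 0), (7, 5), ampere]))
    print("\n".join(gencode_flags([ampere, ada])))
```

Under these rules an Ampere (8.6) cubin is binary compatible with an Ada (8.9) card, so a failure on a 4090 would point at an application-level check or a missing build target rather than at the hardware, which fits the suggestion to recompile with a gencode for CC 8.9.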