
Message boards : Number crunching : The hardware enthusiast's corner (2)

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 58213 - Posted: 28 Dec 2021 | 23:08:24 UTC

The hardware enthusiast's corner (2)

It is customary at other projects' forums, when threads become too long or unwieldy, to close the original thread and create a numbered successor thread.

Taking this wise advice from Keith Myers, a new The hardware enthusiast's corner (2) thread is now open.
Previous posts remain accessible at the original thread, The hardware enthusiast's corner

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 58215 - Posted: 28 Dec 2021 | 23:18:29 UTC
Last modified: 28 Dec 2021 | 23:34:16 UTC

Assembling a computer for running BOINC is not only about building it, but also about keeping it in perfect shape for hard, reliable crunching.
For me, that is what gives "The hardware enthusiast's corner (2)" thread its purpose.

It's delightful when a self-built host ends up serving something useful!!
My preferred BOINC project is Gpugrid, but I have several other backup projects for productively filling the ACEMD3 / Python task scarcity periods, like the previous weeks.
I was recently notified by PrimeGrid that my Gpugrid Host #557889 is the discoverer of a previously unknown prime number large enough to enter the Top 5000 list in Chris Caldwell's The Largest Known Primes Database.
The number: 95635202^131072+1
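For scale: the find has the generalized Fermat form b^(2^17)+1, and its decimal length can be estimated in one line (a quick back-of-the-envelope sketch, not the official record entry):

```shell
# Decimal digits of b^n + 1 for b=95635202, n=131072: floor(n*log10(b)) + 1
awk 'BEGIN { b=95635202; n=131072; printf "%d\n", int(n*log(b)/log(10)) + 1 }'
```

That works out to a little over a million digits, comfortably inside the Top 5000 list at the time.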

Here is a recent image of the inside of the lucky host.



The chassis that houses it was recovered from scrap and given a second life... ✅
That's the kind of thing you can expect from a hardware enthusiast ;-)

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 58216 - Posted: 29 Dec 2021 | 5:28:10 UTC

Congratz on the new prime number discovery.

Looking forward to your insights and knowledge in the new thread.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 58287 - Posted: 15 Jan 2022 | 11:44:02 UTC
Last modified: 15 Jan 2022 | 12:12:32 UTC

Upgrading system RAM: it's easy

If you're able to change your car's wheel without calling a tow truck, then you're certainly able to upgrade your system's RAM.
(And even if you aren't, I'd say you're still able to upgrade your system's RAM ;-)

When the new batches of ABOU Python GPU tasks arrived, I ran into problems with my two multi-GPU hosts.
My triple GTX 1650 GPU system had 32 GB of system RAM installed, while my twin GTX 1650 GPU system had 16 GB.
Roughly, each ABOU task needs 16 GB of system RAM to expand its environment. Therefore, I had to take care that my triple GPU system got no more than two simultaneous ABOU tasks, and my twin GPU system no more than one...
So I decided to upgrade my triple GPU system from 32 to 64 GB RAM (and, as a knock-on effect, my twin GPU system from 16 to 32 GB RAM).
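The sizing rule above boils down to simple arithmetic (the 16 GB per task figure is my own rough observation):

```shell
# Roughly how many concurrent ABOU tasks fit in a given amount of system RAM
per_task_gb=16
for total_gb in 16 32 64; do
    echo "$total_gb GB RAM -> $(( total_gb / per_task_gb )) concurrent ABOU task(s)"
done
```

So 64 GB covers the three GPUs of the triple host with a little headroom to spare.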

First of all, please sit down for a few minutes to study your move. Otherwise, you may get a nasty surprise...

As an example, take a look at the Gigabyte B365M H motherboard specifications. Two of my hosts are based on it.
The Memory section states that the maximum supported configuration is 2 x DDR4 @2666 MHz DIMMs, up to a total of 32 GB of system RAM.
This system can't handle 64 GB RAM. Installing 2 x 32 GB DIMMs would leave the system unable even to start!

Here are the specifications for the motherboard of the two systems to upgrade: Gigabyte Z390 UD
It can handle 4 x DDR4 @2666 MHz DIMMs up to a total of 128 GB of system RAM, with a maximum module size of 32 GB. Good!
I opened my hardware piggy bank and purchased 2 x DDR4 @2666 MHz 32 GB DIMMs.
The starting configuration for my triple GPU system was this: 4 x DDR4 @2666 MHz 8 GB DIMMs = 32 GB total system RAM.
Upgrading is as simple as opening the computer's case, extracting the four existing DIMMs, and installing the two new ones.

Here is one tip, based on my experience:
When I install a new slot-based component (memory DIMM, graphics card, expansion card), I insert, extract, and reinsert it three times.
This is what I call letting the slot and the module contacts "become friends". It usually prevents present and future problems due to poor electrical contact.
If a module refuses to enter, try turning it 180º... Modules are mechanically keyed.
Always verify at the end that each DIMM is fully inserted into its slot, and that the lateral latches are fully closed.
When in doubt, the motherboard's manual usually has a section explaining how the memory modules are to be installed on each model.

Now the system's memory section looks like this.
As you can see, I've installed both memory modules in slots of the same (grey) color. Please take care to do it this way, so you benefit from the dual-channel memory performance enhancement.
Each motherboard's user manual also describes when multi-channel memory operation is available.
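On Linux, the new configuration can be double-checked from a terminal without opening the case again (a sketch; the dmidecode step needs root):

```shell
# Total and available RAM exactly as the kernel sees it:
grep -E '^(MemTotal|MemAvailable)' /proc/meminfo

# Per-slot module details (populated slots, sizes, speeds) need root privileges:
# sudo dmidecode -t memory | grep -E 'Locator|Size|Speed'
```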

The four DIMM modules extracted from the previous system have now been re-used to replace the existing 16 GB of RAM in my twin GPU system, upgrading it to 32 GB.
Here is its final new look.

After that, my triple GPU system has been able to successfully process three concurrent ABOU tasks without worrying about a lack of system RAM.
This is a screenshot of BOINC Manager while they were processing.
I also took an nvidia-smi screenshot.
And the following Psensor screenshot, where the typical "GPU spikes" caused by the learning agents can be appreciated:



Even at this full-load situation, 6% of system RAM still remains available.

In the same way, my twin GPU system is able to process two concurrent ABOU tasks without worrying about it.

It was worth it! 👍️

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,953,018,481
RAC: 6,402,621
Message 58456 - Posted: 7 Mar 2022 | 21:03:34 UTC
Last modified: 7 Mar 2022 | 21:03:52 UTC

Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head on them I can find nothing out of the ordinary, except that they cannot communicate with the internet.
Strangely, one computer can communicate over my LAN but cannot reach the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on, starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1629
Credit: 9,691,921,332
RAC: 9,268,125
Message 58458 - Posted: 7 Mar 2022 | 21:49:57 UTC - in response to Message 58456.

For the more common DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):

Subnet mask
Default gateway
DNS server

Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently?
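On a Linux host such as Mint, those three settings can be read quickly from a terminal, for example:

```shell
ip -4 addr show          # IP address and subnet mask (as a /NN CIDR suffix) per interface
ip route show default    # default gateway
cat /etc/resolv.conf     # DNS servers in use (may point at a local stub resolver)
```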

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,953,018,481
RAC: 6,402,621
Message 58460 - Posted: 7 Mar 2022 | 22:46:38 UTC - in response to Message 58458.
Last modified: 7 Mar 2022 | 23:06:56 UTC

For the commoner DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):
Subnet mask (255.255.255.0)
Default gateway (192.168.1.1, just checked and it still has this address)
DNS server (8.8.4.4,8.8.8.8)
Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently?
No changes to router or gateway but I did try rebooting them.
If the default gateway had changed then all computers would lose connection but 30 are still running fine.
All my computers are on a wired ethernet with the motherboard RJ-45 status lights on. I did unplug and plug the cables back in.
I ran Advanced IP Scanner looking for evidence of duplicate IP addresses even though I'd made no changes.
My wife and kids use DHCP wireless connections for their handbrains and laptops. All my BOINC computers have static IP addresses. DHCP has 192.168.1.2-99 available and I've never seen them try to use higher numbers.
Most of my gear is long in the tooth, so every month or two I scrap out a PSU or a motherboard. One or two failing wouldn't even have gotten my attention, but 8, with one that can communicate on the LAN but not the WAN, is strange.
The deaf computers are split between two unmanaged 24-port switches. I'll try turning off all switches, the router and the gateway, then powering up from the gateway to the router to the hub switch, and then the other switches. If that doesn't bring them back to life I'll just turn them off until I can attempt a fresh build. But I'd sure welcome a better suggestion.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1083
Credit: 40,330,187,595
RAC: 3,975,907
Message 58461 - Posted: 8 Mar 2022 | 0:58:02 UTC - in response to Message 58460.

check the cables too, just as a quick check. or even specific ports on the switch could go bad. try a different or known-good port.

what is the result of 'ifconfig' from a terminal? or 'ip a' if you don't have the ifconfig package installed?


Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 58463 - Posted: 8 Mar 2022 | 17:50:14 UTC - in response to Message 58460.

I connect to the Internet at home through a fiber optic line from my provider.
From time to time, I lose the whole Internet connection for no apparent reason (unlike your case, which is a partial loss).
I then switch off everything but the Ethernet switches, and boot in order: the Optical Network Terminal (ONT), then the Internet router, and finally the WiFi access points. This usually solves my problem.

Recently, I noticed a linux-firmware update. On Linux, you could check whether the non-connecting hosts share a common NIC that might have been affected by such an update.
Temporarily trying any trusted USB-Ethernet or USB-WiFi device could help diagnose something like that.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,953,018,481
RAC: 6,402,621
Message 58514 - Posted: 14 Mar 2022 | 17:29:42 UTC
Last modified: 14 Mar 2022 | 17:50:29 UTC

Solved, I hope. A couple of days ago I shut down my network and powered it up from the router to the hub switch... That seemed to fix things. I thought maybe a glitchy computer had messed up the address tables in the switches.
Today I woke to find a big list of completed WUs that would not upload. I moved cables, but the problem followed the computers and not the cables. I shut down and rebooted the network. That didn't fix it. For some reason I opened my router GUI and clicked on one of the 7 hung computers. It had turned parental control to "Deny Monday" on a few of them. I clicked "Allow Monday" and they worked. A couple showed "Allow All"; these two required that I click "Deny All", wait for it to take effect, and then click "Allow All". Then they worked. First I had tried "All Devices Allow All", but that only timed out before it finished applying.

Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.

Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch. Something like this but it's expensive:
https://www.newegg.com/asus-rt-ax88u-ca-ieee-802-11a-ieee-802-11b-ieee-802-11g-ieee-802-11n-ieee-802-11ac-ieee-802-11a/p/N82E16833320374?Item=9SIAD6H9RM4620&quicklink=true
Or maybe this refurbished one that's much cheaper: https://www.newegg.com/tp-link-archer-c5400x/p/N82E16833704584?quicklink=true
Maybe keeping the hub switch is okay, so I could use a 4-LAN-port router. Or maybe there's a WiFi router that can be attached to a non-WiFi 8-port LAN router.

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 58515 - Posted: 14 Mar 2022 | 18:12:23 UTC
Last modified: 14 Mar 2022 | 18:12:51 UTC

I have been having very slow page loads lately across the board, so I just ran my two command-line statements to refresh everything. Boom! Now all the websites load fast and normally.

sudo ip -s -s neigh flush all        # flush the kernel's ARP/neighbour cache


sudo systemd-resolve --flush-caches  # flush the systemd-resolved DNS cache

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1083
Credit: 40,330,187,595
RAC: 3,975,907
Message 58516 - Posted: 14 Mar 2022 | 20:42:07 UTC - in response to Message 58514.

an old router could certainly be an issue and cause all kinds of weird problems.

if you feel able, you could build your own router from an old PC. it doesn't require much power. pfSense is very robust and stable router software.

I run my router on pfSense with an Intel Atom 8-core processor and 8GB ECC RAM, which is more than enough to run my VPN and several packet-related services in addition to all the routing functions.

the only downside is the initial configuration, and you'll need a switch and some access points for wifi. but the plus side is you get the flexibility to choose your own wifi access points and put them wherever you want. getting whole-home wifi coverage is a lot easier with a few well-placed APs than with a single consumer-grade router that usually has bad radio/antenna properties and anemic hardware.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 58517 - Posted: 14 Mar 2022 | 22:49:50 UTC - in response to Message 58514.

For some reason I opened my router gui and clicked on one of the 7 hung computers. For some reason it had turned the parental control to Deny Monday on a few of them. I clicked Allow Monday and they worked. A couple showed Allow All. These two required that I click Deny All, wait for it to set, and then click Allow All. Then they worked.

Glad that the mystery was solved, congratulations.

Most routers / access points have a management section with a configuration backup option.
I always keep an updated backup for all of these devices, taken when everything is configured and working. I'd recommend this.
If I suspect that any parameter may have been altered, I restore the configuration from backup, and the doubt vanishes.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 59960 - Posted: 21 Feb 2023 | 21:35:44 UTC

There is always a first time

On a routine temperature screening of my working hosts, today I found a graphics card based on an Nvidia GT 1030 GPU showing 82 ºC.
A normal Psensor screenshot for that host looks like this: around 54 ºC at full load for the GT 1030 GPU.
That ASUS PH-GT1030-O2G graphics card is a low-power 30 watt model, so the abnormally high temperature had to have some explanation...
...A failing fan, for example?
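Incidentally, this kind of routine temperature screening can also be scripted instead of read from Psensor; a minimal sketch using the driver's nvidia-smi tool:

```shell
# Print name, temperature and power draw for each NVIDIA GPU, if the tool is present
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,temperature.gpu,power.draw --format=csv
else
    echo "nvidia-smi not found (no NVIDIA driver loaded)"
fi
```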

With the card on the table, it could be seen that the blades had completely detached from the fan's body.
Digging into my backup fan collection, I found a PWM fan directly compatible with the damaged one.
After a thorough heatsink cleaning and the fan replacement, the graphics card was ready to work again.

I had never come across anything like this before... but there is always a first time.

grep
Joined: 4 May 23
Posts: 3
Credit: 3,342,500
RAC: 0
Message 60367 - Posted: 6 May 2023 | 3:32:38 UTC - in response to Message 58456.
Last modified: 6 May 2023 | 3:33:23 UTC

Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head on them I can find nothing out of the ordinary except that it cannot communicate with the internet.
Strangely one computer can communicate over my LAN but cannot communicate with the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA


something like this happens to me occasionally, and it's almost always due to the network switch; if I unplug the power cable on the network switch and plug it back in, things usually go back to normal.

grep
Joined: 4 May 23
Posts: 3
Credit: 3,342,500
RAC: 0
Message 60368 - Posted: 6 May 2023 | 3:49:48 UTC - in response to Message 58514.



Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.

Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch.


I took a different approach to this, as alluded to by others in this thread.

My current setup looks like this:

- Netgate 1100 pfSense router https://shop.netgate.com/products/1100-pfsense
- some basic unmanaged network switches https://www.amazon.com/Ethernet-Splitter-Optimization-Unmanaged-TL-SG105/dp/B00A128S24/
- TP-Link AXE5400 Tri-Band WiFi 6E Router (Archer AXE75) https://www.amazon.com/gp/product/B0B3SQK74L/

The Netgate 1100 is the primary router for the apartment network. It's the first-party offering from the people who make pfSense, which I highly recommend.

The network switches extend the router's Ethernet coverage to all wired network devices.

The AXE5400 router is set to "Access Point Mode" and connected to one of the network switches; it provides the wifi coverage. I don't have a lot of space to cover, so it's plenty powerful. It's also kind of overkill; there are cheaper options available if all you need is a wifi access point. I do appreciate the 6GHz coverage for the single device I own that currently supports it, though.

If you are considering upgrades to your home network, you might consider splitting the tasks between different devices instead of just buying a single consumer-grade router. The wireless coverage can come from a separate device from the Ethernet router, which will give you more options than searching for just a new "wifi router". Also, I chose the Netgate 1100 because I wanted to go with pfSense, but did not want the hassle of piecing together my own DIY pfSense router (there are tons of great YouTube videos out there about exactly this; highly recommended).

So far this setup is working well. I only have one device with BOINC, but I do have about 25 devices on the network, most with static IPs via the Netgate's DHCP server.

Robertobit
Joined: 29 Mar 20
Posts: 3
Credit: 695,184,248
RAC: 390,818
Message 62099 - Posted: 30 Dec 2024 | 15:38:37 UTC

Hi, I would like to upgrade my GTX 1070 (8 GB, 256-bit) and I'm looking at GPU efficiency. I'm thinking of a 3050 (6 GB, 70 W TDP) or a 4060 (115 W TDP). Could you give me a suggestion? Does memory bandwidth have more impact, or are more CUDA cores better? Many thanks, have a nice day

makracz
Joined: 9 May 24
Posts: 6
Credit: 3,000,131,725
RAC: 8,104,604
Message 62100 - Posted: 30 Dec 2024 | 16:41:23 UTC - in response to Message 62099.

A 3050 6 GB would hardly be an improvement over the 1070. You can have a look at my host: http://www.gpugrid.net/show_host_detail.php?hostid=627261

The 50xx series is due for release next month, so if you're not in a hurry you can wait and see whether 40xx series prices fall.

Pascal
Joined: 15 Jul 20
Posts: 87
Credit: 2,131,553,398
RAC: 9,534,763
Message 62101 - Posted: 30 Dec 2024 | 17:59:05 UTC - in response to Message 62099.
Last modified: 30 Dec 2024 | 18:02:01 UTC

An RTX 4060 with 8 GB minimum, or the workstation GPUs such as the RTX 2000 Ada, RTX A2000 or RTX 4000 SFF Ada.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 62102 - Posted: 30 Dec 2024 | 19:05:14 UTC - in response to Message 62099.

This year I upgraded one of my GPUs to an RTX 3060 12 GB.
Its TDP is 170 W, but I have power-limited it to 140 W.
I would now choose an RTX 4060 or RTX 4060 Ti for better performance per watt:
GeForce RTX 4060 Family
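For reference, that power limit can be applied with nvidia-smi; a minimal sketch (needs root, and the value must stay within the range the card reports):

```shell
# Sketch: cap the card's board power at 140 W, if an NVIDIA GPU is present
if command -v nvidia-smi >/dev/null 2>&1; then
    sudo nvidia-smi -pm 1     # persistence mode keeps the driver (and the setting) loaded
    sudo nvidia-smi -pl 140   # power limit in watts; query the valid range with: nvidia-smi -q -d POWER
else
    echo "nvidia-smi not found, nothing to do"
fi
```

Note the limit does not survive a reboot on its own; re-apply it at startup (e.g. from a systemd unit or cron @reboot).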

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 62103 - Posted: 30 Dec 2024 | 19:44:16 UTC

New 50-series mid-to-low-end cards won't be seen in the market until 3-6 months after the release of the 5080 and 5090 at CES next month.

You'll have a bit of a wait.

But the benefit will be lower 40-series pricing over that time period.

KeithBriggs
Joined: 29 Aug 24
Posts: 37
Credit: 1,660,040,047
RAC: 16,186,461
Message 62104 - Posted: 30 Dec 2024 | 22:29:25 UTC - in response to Message 62103.

I like my 4060 Ti's bang for the buck. It's a 3-fan card, and there's no room for a second 3-fan card. If I got a 2-fan version of the same card, would BOINC recognize both? If this is too far off topic, please just PM me.

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 62105 - Posted: 31 Dec 2024 | 5:25:45 UTC - in response to Message 62104.
Last modified: 31 Dec 2024 | 5:30:07 UTC

Sure, no problem for BOINC, as long as you have configured

<use_all_gpus>1</use_all_gpus>

in the <options> section of your cc_config.xml file and re-read your config files in the Manager.

It may not even be necessary, as long as BOINC interprets both cards as 4060 Ti's.

But sometimes the same model card from different vendors doesn't get picked up as the same.

The statement in cc_config.xml ensures both will be seen.

Verify they are both seen in the coproc_info.xml file to be sure.
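For reference, a minimal cc_config.xml carrying just that option might look like this (a sketch; the file lives in the BOINC data directory):

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```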

KeithBriggs
Joined: 29 Aug 24
Posts: 37
Credit: 1,660,040,047
RAC: 16,186,461
Message 62106 - Posted: 1 Jan 2025 | 6:05:30 UTC - in response to Message 62105.

Thanks Keith. Always a wealth of information. I will wait for MSI pricing to get real again. I found a site that compares cards for gaming:
https://www.techspot.com/review/2685-nvidia-geforce-rtx-4060-ti/

That has the 3070 beating it more times than not. Maybe everyone else knew it, but it surprised me. I'm also not sure of the gaming vs. coprocessor differences. I do like that the 4060 Ti runs at 98% on less than 120 W.

Pascal
Joined: 15 Jul 20
Posts: 87
Credit: 2,131,553,398
RAC: 9,534,763
Message 62107 - Posted: 1 Jan 2025 | 11:23:34 UTC - in response to Message 62106.

Hello,
the TGP of the RTX 4060 Ti is 160 watts.

https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti-8-gb.c3890

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 62108 - Posted: 1 Jan 2025 | 18:54:38 UTC - in response to Message 62106.
Last modified: 1 Jan 2025 | 18:55:57 UTC

You can't really compare gaming benchmark table performance with how we use gpus in BOINC crunching.

Gaming is all about rasterization performance. With BOINC crunching we don't even use the rasterization portion of the gpu at all.

We only use the compute part of the gpu, and in that regard the number of CUDA cores and memory bandwidth trump all.

So that is why the old 3070 is actually a better performer compared to the 4070 Ti with respect to BOINC gpu crunching.

The clocks got higher in the 4000 series and the core counts basically stay the same within each generation's echelon, but Nvidia keeps gimping the memory bus each generation, relying on faster memory to compensate.

The rasterization performance stays the same or improves, but that doesn't help the bandwidth for moving data into and out of the gpu when it is fed by the cpu.

Some projects couldn't care less what the memory bandwidth is, because a single task loads and unloads from the gpu in one shot. But some projects, like this one and Einstein for example, constantly move a ton of data into and out of the gpu.

In general, for BOINC gpu crunching you want at minimum PCIe x4 slots in use (x8 preferred), and as wide a memory bus on the card as possible.

The cards with 512-, 384- or 256-bit memory widths perform the best. HBM memory trounces GDDR memory completely. So the professional cards with that type of memory architecture perform the best, and they also don't have the gimped FP64 performance that Nvidia forces onto all the consumer cards.

All I can say is that the 4060 Ti is a more efficient card than a 3070. Each new generation of gpu silicon is always more efficient in power usage, so that helps out with the power bill.

My $0.02 of historical observation. Take it as you may.
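The memory-width point can be made concrete with a quick calculation, using the publicly listed specs for the two cards discussed above (treat the figures as approximate):

```shell
# Theoretical bandwidth (GB/s) = data rate per pin (Gb/s) x bus width (bits) / 8
awk 'BEGIN {
    printf "RTX 3070   : %d GB/s (14 Gbps GDDR6 x 256-bit)\n", 14*256/8
    printf "RTX 4060 Ti: %d GB/s (18 Gbps GDDR6 x 128-bit)\n", 18*128/8
}'
```

So despite the faster memory chips, the newer card moves data at well under two-thirds the rate of the old one.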

KeithBriggs
Joined: 29 Aug 24
Posts: 37
Credit: 1,660,040,047
RAC: 16,186,461
Message 62109 - Posted: 2 Jan 2025 | 2:00:32 UTC - in response to Message 62108.

Thanks pascal for confirming that the 3070 does beat the 4060 and 4060 Ti. Very interesting.

KeithM makes me think about and look up things that I never knew existed. I like Dell's HBM card, the NVIDIA® RTX™ A800 (40 GB HBM2, full height, PCIe 4.0 x16), except for the price: 18,000 USD. It runs on 240 watts!

Fascinating about the memory swapping. I'm a snowbird and will likely build another system in Colorado and cart my GPU(s) with me. My current box is a resurrected 11-year-old i5 with DDR3 memory. For CO, I was considering an AMD Ryzen 5 5600X 6-core, which would need DDR4 memory. I could also spring for a Ryzen 5 7600 (with DDR5), or faster memory, or both, or something else altogether?

Roughly, what would a 40% increase in CPU and memory speed translate to for GPU coprocessing? It sounds like memory speed, after the ACTUAL GPU itself, is the 2nd priority.

Pascal
Joined: 15 Jul 20
Posts: 87
Credit: 2,131,553,398
RAC: 9,534,763
Message 62110 - Posted: 2 Jan 2025 | 5:27:53 UTC - in response to Message 62109.
Last modified: 2 Jan 2025 | 5:29:26 UTC

Hello,
I went from an i5 11400F with 32 GB of DDR4 2666 to an i9 14900 with 96 GB of DDR5 5600, and I did not see much difference in performance apart from the CPU's thread count.
The big difference in my new PC is the 3 RTX 4000 SFF Ada cards, which I think would have performed the same way with my old i5 as with my i9.
Even the motherboard change (from PCI Express 4 to PCI Express 5) did not improve performance much.
Personally, I would advise you to invest mainly in the graphics card and less in the CPU / RAM / motherboard side, without falling into the opposite extreme.
Any mid-range DDR5 PC should do the trick.

makracz
Joined: 9 May 24
Posts: 6
Credit: 3,000,131,725
RAC: 8,104,604
Message 62111 - Posted: 2 Jan 2025 | 7:03:54 UTC - in response to Message 62106.

This website collects GPU/CPU statistics from Folding@home:
https://folding.lar.systems/gpu_ppd/overall_ranks
I think it's only from users of their alternative client, and of course the FAH points system is different from what we have here, but it still gives you a sense of the performance difference between GPUs for a similar scientific application.

For CO, I was considering an AMD Ryzen 5 5600X 6-core which of course would need ddr5 memory. I could also spring for a Ryzen 5 7600 or faster memory or both something else altogether?


I would buy a 7600 (or another 7 series) just because of the AM5 socket, which AMD has promised to keep ‘alive’ for a few more years.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2363
Credit: 16,532,464,161
RAC: 3,320,296
Message 62112 - Posted: 2 Jan 2025 | 9:17:45 UTC - in response to Message 62109.

Roughly what would a 40% increase in CPU and memory speed translate to for GPU coprocessing? It sounds like memory speed after the ACTUAL GPU is 2nd priority.
Very little (depending on the application and the workunit).
They were talking about the memory on the GPU itself.
For GPU crunching, the ACTUAL GPU and its memory are the 1st priority.

Keith Myers
Joined: 13 Dec 17
Posts: 1373
Credit: 7,998,731,143
RAC: 2,831,643
Message 62113 - Posted: 2 Jan 2025 | 22:49:20 UTC
Last modified: 2 Jan 2025 | 22:52:58 UTC

With Einstein, and to some extent here, the cpu core clocks matter for bringing the compute time down. The faster the memory loads/unloads are, the faster the task runs.

At Einstein, for example, on the O3AS gpu work units a significant portion of the runtime is accrued on the cpu, when the gpu moves its 99%-completed calculation results back onto the cpu because the gpu's FP64 precision is insufficient for the science results to be validated.

So the faster the cpu core clocks, the better for that app. I can really see the difference between the same tasks run on 3080's between the slow 2 GHz clocks of the Epyc servers and the 5 GHz clocks of the Ryzen 9950X hosts.

But it always depends on how the application is written and how it handles task computation on various types and classes of hardware.

KeithBriggs
Joined: 29 Aug 24
Posts: 37
Credit: 1,660,040,047
RAC: 16,186,461
Message 62125 - Posted: 8 Jan 2025 | 21:43:50 UTC - in response to Message 62113.

Sorry for the delay. Was out of pocket. Thanks for all the help! I think I'm clear now. I hope others are aided as well. I'm looking forward to assembling a system in April in Colorado.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 584
Credit: 10,697,126,258
RAC: 15,337,675
Message 62130 - Posted: 11 Jan 2025 | 22:40:35 UTC

It is not possible to always win

In mid-November last year, one of my hosts started to restart spontaneously.
First every other day, then daily, and finally several times a day.

I first checked the electrical contacts for every computer component, with no result.

I usually have a spare PSU.
I replaced it, but the problem reproduced with the new one.
I returned the new PSU to its original package and reinstalled the previous one.

Suspecting of some RAM module intermittently failing, I ordered at local market two new 16 GB modules.
When I received them and replaced, the problem persisted.
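As an aside, the basic idea behind dedicated RAM testers (memtest86+, memtester and the like) can be sketched in a few lines of Python: write known bit patterns, read them back, and count mismatches. This is only a toy illustration over one small heap buffer, not a usable diagnostic — real testers walk all of physical memory with many more patterns.

```python
# Toy sketch of the write-pattern / read-back approach used by RAM testers.
# A healthy buffer should report zero mismatches; any nonzero count would
# indicate a bit that failed to hold its value.
def pattern_test(size_bytes: int = 1 << 20,
                 patterns=(0x00, 0xFF, 0xAA, 0x55)) -> int:
    buf = bytearray(size_bytes)
    errors = 0
    for p in patterns:
        buf[:] = bytes([p]) * size_bytes     # write the pattern everywhere
        errors += size_bytes - buf.count(p)  # read back, count mismatches
    return errors

if __name__ == "__main__":
    print("mismatches:", pattern_test())
```

The alternating 0xAA/0x55 patterns toggle every bit between neighbors, which is why testers favor them for catching stuck or coupled bits.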

The affected system is a twin-GPU host.
I tested with only one GPU, and then with only the other one, to no effect.
The computer continued to restart randomly.

I replaced the original 8-core CPU with a known-good spare 6-core CPU I keep for tests, and nothing changed.

Then I suspected something going wrong on the disk.
I purchased a new SATA SSD, cloned the original disk, and replaced both it and the SATA data cable with new ones.
The problem wasn't solved!

I know what you're thinking...
"He's too lazy to replace the motherboard"
Yes, you're right!
I usually keep that as a last resort, since it is the most laborious job.
You have to remove every component from the old motherboard, unmount it, mount the new one, redo every connection, reinstall the components, renew the CPU's thermal paste, configure the BIOS parameters...
I couldn't find a compatible motherboard on the local market, so I ordered a new one from abroad.
After a two-week pause I received it, swapped it in, installed every component, started the system for the first time, and...
The computer restarted before I even had time to configure all the BIOS parameters!!!
And then it kept restarting every few seconds.

My first thought was: "This system is haunted". But that is not very scientific
My second thought was: "Well, I've replaced EVERYTHING and the problem remains. It is not always possible to win"

But then: "Hey!!! Really EVERYTHING?"
Wait a minute...
What's the problem?... The system is spontaneously restarting = resetting
It came to me in a flash. I disconnected the RESET terminals running from the computer case to the motherboard.
And the system has not restarted a single time since then, more than two weeks now.
Conclusion: the RESET line coming from the chassis was causing the problem.
Measured statically, its impedance is as expected: a short circuit when the RESET button is pressed, and an open circuit when it is not.
It must have been picking up some kind of electrical interference strong enough to randomly trigger the motherboard's RESET input.
Amazing.

The system in question is this twin-GPU host #557889

Additionally, with some of the components recovered from this affair, and a few others, I was able to renew an old retired system into this new twin-GPU host #604216

It is not always possible to win... But this time I did!

🤗️

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2363
Credit: 16,532,464,161
RAC: 3,320,296
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 62131 - Posted: 12 Jan 2025 | 0:43:21 UTC - in response to Message 62130.
Last modified: 12 Jan 2025 | 0:44:36 UTC

I had a similar experience many years ago. In my case the microswitch was faulty.
Besides electrical faults or static electricity, deterioration and abrasion of the plastic buttons and button holes can leave a button stuck midway, causing random restarts or switch-offs.
Recently a DELL OptiPlex 3060 started acting weird. It took me a while to realize that I should not put the blame on Windows this time, so I ran a RAM test. The original RAM stick, made by SK hynix, turned out to be faulty. It had worked for 6 years. The other RAM stick (Kingston) is still working fine. I hadn't had a failed RAM module in the last 10+ years, though I manage 100+ computers.
Back in the 80286 era we shipped a PC to a cheese wholesale company. We tested the PC thoroughly before we shipped it. They asked us to put it in the chilled area of their store-house. It worked for a day, then started acting weird. We moved it back to our office for testing: it worked fine. We put it in their office: it worked fine. We put it back in the chilled store-house, and after a day it started acting weird again. We thought it must be condensation, but there wasn't any. We spent at least a week swapping parts in and out; the problem persisted. We tested the AC input power and the DC output voltages with an oscilloscope for spikes; there weren't any. We gave up, they gave up, and we put the PC in their office. It worked fine for 4 years, then they bought an upgrade. The upgrade showed similar symptoms in their chilled store-house. :) We never figured out what caused this behavior there.
