|
The hardware enthusiast's corner (2)
At other projects' forums, when a thread becomes too long or unwieldy, it is customary to close the original thread and open a numbered successor.
Following this wise advice from Keith Myers, a new The hardware enthusiast's corner (2) thread is now open.
Previous posts remain accessible in the original thread: The hardware enthusiast's corner |
|
|
|
Assembling a computer for BOINC work is not only about building it, but also about keeping it in perfect shape for hard, reliable crunching.
For me, this is what gives meaning to "The hardware enthusiast's corner (2)" thread.
It's delightful when a self-built host ends up serving a useful purpose!!
My preferred BOINC project is GPUGRID, but I have several backup projects to productively fill the ACEMD3 / Python task scarcity periods like those of previous weeks.
I was recently notified by PrimeGrid that my GPUGRID host #557889 is the discoverer of a previously unknown prime number, large enough to enter the Top 5000 list in Chris Caldwell's The Largest Known Primes Database.
The number: 95635202^131072+1
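For scale, since 131072 = 2^17, this is a generalized Fermat number, and its size can be estimated with logarithms. A quick sketch (the digit count is my own back-of-the-envelope calculation, not from PrimeGrid's announcement):

```python
import math

# The number of decimal digits of b^n + 1 is essentially
# floor(n * log10(b)) + 1 when b^n is not a power of ten.
b, n = 95635202, 131072
digits = math.floor(n * math.log10(b)) + 1
print(f"{b}^{n}+1 has about {digits:,} decimal digits")
```

That works out to roughly a million digits, which is consistent with the size of primes currently entering the Top 5000 list.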
Here, a recent image of the inside of the lucky host.
The chassis housing it was recovered from scrap and given a second life... ✅
It's something that we can expect from a hardware enthusiast ;-) |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
Congratz on the new prime number discovery.
Looking forward to your insights and knowledge in the new thread. |
|
|
|
Upgrading system RAM: it's easy
If you're able to change your car's wheel without calling a tow truck, then you're certainly able to upgrade your system's RAM.
(And even if you aren't, I'd say you're still able to upgrade your system's RAM ;-)
When new batches of ABOU Python GPU tasks arrived, I ran into problems with my two multi-GPU hosts.
My triple GTX 1650 GPU system had 32 GB of system RAM installed, while my twin GTX 1650 GPU system had 16 GB.
Roughly, each ABOU task needs 16 GB of system RAM to expand its environment. Therefore, I had to take care that my triple GPU system never got more than two simultaneous ABOU tasks, and my twin GPU system no more than one...
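The arithmetic behind those limits can be sketched in a few lines (the 16 GB per-task figure is the rough estimate above; OS overhead is ignored here):

```python
def max_concurrent_tasks(total_ram_gb, per_task_gb=16):
    """Rough upper bound on simultaneous ABOU tasks a host can hold in RAM."""
    return total_ram_gb // per_task_gb

print(max_concurrent_tasks(32))  # triple GPU host before the upgrade -> 2
print(max_concurrent_tasks(16))  # twin GPU host before the upgrade   -> 1
print(max_concurrent_tasks(64))  # after the upgrade -> 4, enough for 3 GPUs
```

In practice you would want to leave a little headroom for the OS on top of this.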
So I decided to upgrade my triple GPU system from 32 to 64 GB RAM (and, as a knock-on effect, my twin GPU system from 16 to 32 GB RAM).
First of all, please sit down for a few minutes and study your move. Otherwise, you may get a nasty surprise...
As an example, take a look at the Gigabyte B365M H motherboard specifications. Two of my hosts are based on it.
Its Memory section states that the maximum supported configuration is 2 x DDR4 @2666 MHz DIMMs, up to a total of 32 GB of system RAM.
This system can't handle 64 GB RAM. With 2 x 32 GB DIMMs installed, the system won't even start!
Here are specifications for the motherboard of the two systems to upgrade: Gigabyte Z390 UD
It can handle 4 x DDR4 @2666 MHz DIMMs up to a total of 128 GB system RAM, with maximum size of 32 GB DDR4 modules. Good!
I opened my hardware piggy bank and purchased 2 x DDR4 @2666 MHz 32 GB DIMMs.
The starting configuration of my triple GPU system was this: 4 x DDR4 @2666 MHz 8 GB DIMMs = 32 GB total system RAM.
Upgrading is as simple as opening the computer's case, extracting the four existing DIMMs, and installing the two new ones.
Here is one tip, based on my experience:
When I install a new slot-based component (memory DIMM, graphics card, expansion card), I insert, extract, and reinsert it three times.
This is what I call the slot and the module contacts "becoming friends". It usually prevents present and future problems due to poor electrical contact.
If a module refuses to enter, try turning it 180º... Modules are mechanically keyed.
Always verify at the end that each DIMM is fully inserted into its slot, and that the lateral latches are fully closed.
When in doubt, there is usually a section in the motherboard's manual explaining how the memory modules should be installed for each model.
Now the system's memory section looks like this.
As you can see, I've installed both memory modules in slots of the same (grey) color. Be sure to do it this way to take advantage of the dual-channel memory performance enhancement.
Each motherboard's user manual also describes when multi-channel memory architecture is available.
And the four DIMM modules extracted from the previous system are now re-used to replace the existing 16 GB of RAM in my twin GPU system, upgrading it to 32 GB.
Here is its final new look.
After that, my triple GPU system has been able to successfully process three concurrent ABOU tasks without worrying about a lack of system RAM.
This is a screenshot of BOINC Manager while they were processing.
I also took an nvidia-smi screenshot.
And the following Psensor screenshot, where the typical "GPU spikes" caused by the learning agents can be seen:
Even at this full load, 6% of system RAM remains available.
In the same way, my twin GPU system is able to process two concurrent ABOU tasks without worry.
It was worth it! 👍️ |
|
|
Aurum Send message
Joined: 12 Jul 17 Posts: 401 Credit: 16,953,018,481 RAC: 6,402,621 Level
Scientific publications
|
Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head on them I can find nothing out of the ordinary, except that they cannot communicate with the internet.
Strangely one computer can communicate over my LAN but cannot communicate with the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA |
|
|
|
For the more common DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):
Subnet mask
Default gateway
DNS server
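With static addressing like Aurum's, one quick sanity check is whether each host's address and the gateway actually fall inside the same subnet. A small sketch using Python's standard ipaddress module (the addresses below are just examples):

```python
import ipaddress

def same_subnet(host_ip, gateway_ip, netmask):
    """True if the host and the gateway sit on the same IP network."""
    net = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway_ip) in net

print(same_subnet("192.168.1.50", "192.168.1.1", "255.255.255.0"))  # True
print(same_subnet("192.168.2.50", "192.168.1.1", "255.255.255.0"))  # False: wrong subnet
```

If a statically configured host fails this check, it can reach the LAN segment but never the gateway, which looks exactly like "LAN works, internet doesn't".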
Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently? |
|
|
Aurum Send message
Joined: 12 Jul 17 Posts: 401 Credit: 16,953,018,481 RAC: 6,402,621 Level
Scientific publications
|
For the commoner DHCP-allocated IP addresses, you don't just get an IP address. You also need (and get):
Subnet mask (255.255.255.0)
Default gateway (192.168.1.1, just checked and it still has this address)
DNS server (8.8.4.4,8.8.8.8)
Of those three, the most likely culprit is the default gateway. Have you changed or reconfigured your router recently? No changes to router or gateway but I did try rebooting them.
If the default gateway had changed then all computers would lose connection but 30 are still running fine.
All my computers are on a wired ethernet with the motherboard RJ-45 status lights on. I did unplug and plug the cables back in.
I ran Advanced IP Scanner looking for evidence of duplicate IP addresses even though I'd made no changes.
My wife and kids use DHCP wireless connections for their handbrains and laptops. All my BOINC computers have static IP addresses. DHCP has 192.168.1.2-99 available and I've never seen them try to use higher numbers.
Most of my gear is long in the tooth so every month or two I scrap out a PSU or a motherboard. One or two failing wouldn't have even gotten my attention but 8 with one that can communicate on the LAN but not the WAN is strange.
The deaf computers are split between two unmanaged 24-port switches. I'll try turning off all switches, router and gateway and powering up from gateway, to router to hub switch, then the other switches. If that doesn't bring them back to life I'll just turn them off until I can attempt a fresh build. But I'd sure welcome a better suggestion. |
|
|
|
check the cables too, just as a quick test. even specific ports on the switch can go bad. try a different or known-good port.
what is the result of 'ifconfig' from a terminal? or 'ip a' if you don't have the ifconfig package installed
____________
|
|
|
|
I connect to the Internet at home by means of a fiber optic line from my provider.
From time to time, I lose the whole Internet connection for no apparent reason (not your case, which is a partial loss).
I then switch off everything but the Ethernet switches, and boot in order the Optical Network Terminal (ONT), then the Internet router, and finally the WiFi access points. This usually solves my problem.
Recently, I noticed a linux-firmware update. On Linux, you could check whether the non-connecting hosts share a common NIC that might have been affected by this update in some way.
Momentarily trying a trusted USB - Ethernet or USB - WiFi device could help diagnose something like that. |
|
|
Aurum Send message
Joined: 12 Jul 17 Posts: 401 Credit: 16,953,018,481 RAC: 6,402,621 Level
Scientific publications
|
Solved, I hope. A couple of days ago I shut down my network and powered it from the router to the hub switch... That seemed to fix things. I thought maybe a glitchy computer messed up the address tables in the switches.
Today I woke to find a big list of completed WUs that would not upload. Moved cables but problem followed the computers and not the cables. Shut down and rebooted network. Didn't fix it. For some reason I opened my router gui and clicked on one of the 7 hung computers. For some reason it had turned the parental control to Deny Monday on a few of them. I clicked Allow Monday and they worked. A couple showed Allow All. These two required that I click Deny All, wait for it to set, and then click Allow All. Then they worked. First I had tried All Devices Allow All but that only timed out before it completed applying it.
Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.
Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch. Something like this but it's expensive:
https://www.newegg.com/asus-rt-ax88u-ca-ieee-802-11a-ieee-802-11b-ieee-802-11g-ieee-802-11n-ieee-802-11ac-ieee-802-11a/p/N82E16833320374?Item=9SIAD6H9RM4620&quicklink=true
Or maybe this refurbished one that's much cheaper: https://www.newegg.com/tp-link-archer-c5400x/p/N82E16833704584?quicklink=true
Maybe having the hub switch is okay so I could use a 4 LAN port router. Or maybe there's a WiFi router that can be attached to a non-WiFi 8 LAN router. |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
I have been having very slow website populations lately across the board so I just ran my two command line statements to refresh everything. Boom! now all the websites populate fast and normally.
sudo ip -s -s neigh flush all
sudo systemd-resolve --flush-caches |
|
|
|
an old router could certainly be an issue and have all kinds of weird problems.
if you feel able, you could build your own router from an old PC. it doesn't require much power. pfSense is very robust and stable router software.
I run my router on pfSense using an Intel Atom 8-core processor and 8GB ECC RAM. which is more than enough to run my VPN and several packet related services in addition to all the routing functions.
the only downside is the initial configuration, and you'll need a switch and some access points for wifi. but the plus side is you get the flexibility to choose your own wifi access points and you can put them wherever you want. getting whole home wifi coverage is a lot easier with a few well placed APs than with a single consumer grade router that usually has bad radio/antenna properties and anemic hardware.
____________
|
|
|
|
For some reason I opened my router gui and clicked on one of the 7 hung computers. For some reason it had turned the parental control to Deny Monday on a few of them. I clicked Allow Monday and they worked. A couple showed Allow All. These two required that I click Deny All, wait for it to set, and then click Allow All. Then they worked.
Glad that the mystery was solved, congratulations.
Most routers / access points have a management section with a configuration backup option.
I always keep an updated backup for all of these devices, taken when everything is configured and working. I'd recommend this.
If I suspect that any parameter may have been altered, I restore the configuration from backup, and the doubt vanishes. |
|
|
|
There is always a first time
During a routine temperature screening of my working hosts, today I found a graphics card based on an Nvidia GT 1030 GPU showing 82 ºC.
A normal Psensor screenshot for that host looks like this: around 54 ºC at full load for the GT 1030 GPU.
That ASUS PH-GT1030-O2G graphics card is a low-power 30 Watt one, so the abnormally high temperature had to have some explanation...
...A jumping-jack fan, for example?
Once the card was on the table, it could be seen that the blades had completely detached from the fan's hub.
Digging into my backup fan collection, I found a PWM fan directly compatible with the damaged one.
After a thorough heatsink cleaning and the fan replacement, the graphics card is ready to work again.
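To catch a failed fan like this earlier, GPU temperatures can also be polled from a script. A sketch that parses the CSV output of nvidia-smi (the 75 ºC alert threshold is my own choice, and the query itself naturally requires an NVIDIA driver on the host):

```python
import subprocess

ALERT_C = 75  # assumed alert threshold; tune it to your cards

def parse_temps(csv_text):
    """Parse one temperature per line, e.g. '54\n82\n' -> [54, 82]."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def hot_gpus(temps, limit=ALERT_C):
    """Return the indices of GPUs at or above the alert limit."""
    return [i for i, t in enumerate(temps) if t >= limit]

def read_gpu_temps():
    """Query current GPU core temperatures via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temps(out)

# Example with the temperatures from this post:
print(hot_gpus(parse_temps("54\n82\n")))  # -> [1]: the GT 1030 with the broken fan
```

Run periodically from cron, something like this would have flagged the 82 ºC reading the same day the fan blades let go.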
I had never come across anything like this before... But there is always a first time. |
|
|
grepSend message
Joined: 4 May 23 Posts: 3 Credit: 3,342,500 RAC: 0 Level
Scientific publications
|
Woke today to find 8 Linux Mint 20.3 computers no longer able to communicate. When I put a head on them I can find nothing out of the ordinary except that it cannot communicate with the internet.
Strangely one computer can communicate over my LAN but cannot communicate with the internet.
Each has a static IP and I keep them updated.
I turned off all my switches and turned them back on starting from the hub. I've also rebooted several of them.
Any suggestions would be welcome. TIA
something like this happens to me occasionally and it's almost always due to the network switch; if I unplug the power cable on the network switch and plug it back in, things usually go back to normal. |
|
|
grepSend message
Joined: 4 May 23 Posts: 3 Credit: 3,342,500 RAC: 0 Level
Scientific publications
|
Now I'm wondering if my Charter Spectrum WiFi router model Sagemcom F@st 5260 needs to be replaced.
Thinking of buying a WiFi router with 8 LAN ports so I can get rid of the Spectrum rental and my hub switch.
I took a different approach to this, as alluded to by others in this thread
My current setup looks like this;
- Netgate 1100 pfSense router https://shop.netgate.com/products/1100-pfsense
- some basic unmanaged network switches https://www.amazon.com/Ethernet-Splitter-Optimization-Unmanaged-TL-SG105/dp/B00A128S24/
- TP-Link AXE5400 Tri-Band WiFi 6E Router (Archer AXE75) https://www.amazon.com/gp/product/B0B3SQK74L/
The Netgate 1100 is the primary router for the apartment network. It's the first-party offering from the people who make pfSense, which I highly recommend.
The network switches extend the router's Ethernet coverage to all wired network devices.
The AXE5400 router is set to "Access Point Mode" and connected to one of the network switches; it provides the WiFi coverage. I don't have a lot of space to cover, so it's plenty powerful. It's also kinda overkill; there are cheaper options available if all you need is a WiFi access point. Though I do appreciate the 6 GHz coverage for the single device I own that currently supports it.
If you are considering upgrades to your home network, then you might consider moving in the direction of splitting up the tasks between different devices, instead of just buying a single consumer grade router. The wireless coverage can come from a separate device than the Ethernet router, which will give you more options than searching for just a new "wifi router". Also, I chose the Netgate 1100 for this because I wanted to go with pfsense, but I did not want the hassle of having to piece together my own DIY pfsense router (there are tons of great YouTube videos out there about exactly this, highly recommended).
So far this setup is working well. I only have one device with BOINC, but I do have about 25 devices on the network, most with static IP's via the Netgate's DHCP server. |
|
|
|
Hi, I would like to upgrade my 1070 GPU (8 GB, 256-bit) and I'm looking at GPU efficiency. I'm thinking of a 3050 6 GB (TDP 70 W) or a 4060 (TDP 115 W). Could you give me a suggestion? Does GPU bandwidth matter more, or are more CUDA cores better? Many thanks, have a nice day
____________
|
|
|
makraczSend message
Joined: 9 May 24 Posts: 6 Credit: 3,000,131,725 RAC: 8,104,604 Level
Scientific publications
|
3050 6 GB would hardly be an improvement over 1070. You can have a look at my host: http://www.gpugrid.net/show_host_detail.php?hostid=627261
The 50xx series is due for release next month, so if you're not in a hurry you can wait and see if the 40xx series prices will fall. |
|
|
PascalSend message
Joined: 15 Jul 20 Posts: 87 Credit: 2,131,553,398 RAC: 9,534,763 Level
Scientific publications
|
An RTX 4060 GPU with 8 GB minimum, or the server GPUs such as the RTX 2000 Ada, RTX A2000, or RTX 4000 SFF Ada
____________
|
|
|
|
I updated this year one of my GPUs to an RTX 3060 12 GB.
TDP 170W, but I have power limited it to 140W.
I would now choose an RTX 4060 or RTX 4060 Ti for better performance per watt:
GeForce RTX 4060 Family |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
New 50-series mid-to-low-end cards won't be seen on the market for 3-6 months after the release of the 5080 and 5090 at CES next month.
You'll have a bit of a wait.
But the benefit will be lower 40-series pricing over that time period. |
|
|
|
I like my 4060 Ti's bang for the buck. It's a 3 fan card. There's no room for a second 3 fan card. If I got a 2 fan option of the same card, would boinc recognize both? If this is too far off topic, please just PM me. |
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
Sure, no problem for BOINC. As long as you have configured
<use_all_gpus>1</use_all_gpus>
in the <options> section of your cc_config.xml file and re-read your config files
in the Manager.
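For reference, a minimal cc_config.xml containing just that option might look like this (a sketch; any options you already use would sit alongside it):

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

The file goes in the BOINC data directory, and "Options → Read config files" in the Manager picks it up without a client restart.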
It may not even be necessary, as long as BOINC interprets both cards to be 4060 Ti's.
But sometimes the same model card from different vendors doesn't get picked up as the same.
The statement in cc_config.xml ensures both will be seen.
Verify they are both seen in the coproc_info.xml file to be sure. |
|
|
|
Thanks Keith. Always a wealth of information. I will wait for MSI pricing to get real again. I found a site that compares cards for gaming:
https://www.techspot.com/review/2685-nvidia-geforce-rtx-4060-ti/
That has the 3070 beating it more times than not. Maybe everyone else knows it, but it surprised me. I'm also not sure of the "gaming vs. coprocessing" differences. I do like that the 4060 Ti runs at 98% on less than 120 W. |
|
|
PascalSend message
Joined: 15 Jul 20 Posts: 87 Credit: 2,131,553,398 RAC: 9,534,763 Level
Scientific publications
|
Hello,
the TGP of the RTX 4060 Ti is 160 watts.
https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti-8-gb.c3890
____________
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
You can't really compare gaming benchmark table performance with how we use gpus in BOINC crunching.
Gaming is all about rasterization performance. With BOINC crunching we don't even use the rasterization portion of the gpu at all.
We only use the compute part of the gpu and in that regard, # of CUDA cores and memory bandwidth trump all.
So that is why the old 3070 is actually a better performer compared to the 4060 Ti with respect to BOINC gpu crunching.
The clocks got higher in the 4000 series, and the core counts basically stay the same in each generation's echelon, but Nvidia keeps gimping the memory bus each generation, relying on faster memory to compensate.
The rasterization performance stays the same or improves, but that doesn't help the memory bandwidth for moving data into and out of the gpu when fed by the cpu.
Some projects couldn't care less what the memory bandwidth is, because a single task loads and unloads from the gpu in one shot. But some projects, like this one and Einstein for example, move a ton of data into and out of the gpu constantly.
In general, for BOINC gpu crunching you want to have at minimum PCIE X4 slots in use, X8 preferred, and have as wide a memory architecture in the card as possible.
The cards with 512-, 384- or 256-bit memory widths perform the best. HBM memory trounces GDDR memory completely. So the professional cards with that type of memory architecture perform the best, and they also don't have the gimped FP64 performance that Nvidia forces onto all the consumer cards.
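The effect of bus width is easy to see with the standard formula: peak bandwidth in GB/s = (bus width in bits / 8) x effective memory data rate in Gbps. A sketch (the example specs are from memory and worth double-checking against TechPowerUp):

```python
def mem_bandwidth_gbps(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s from bus width and effective data rate."""
    return bus_width_bits / 8 * data_rate_gbps

print(mem_bandwidth_gbps(256, 14))  # RTX 3070: 256-bit GDDR6 @ 14 Gbps -> 448.0 GB/s
print(mem_bandwidth_gbps(128, 18))  # RTX 4060 Ti: 128-bit GDDR6 @ 18 Gbps -> 288.0 GB/s
```

The faster GDDR6 on the 4060 Ti doesn't come close to compensating for the halved bus width, which is exactly why the older 3070 can out-crunch it on bandwidth-heavy projects.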
All I can say is that the 4060 Ti is a more efficient card compared to a 3070. Each new generation of gpu silicon is always more efficient in power usage. So that helps out with the power bill.
My $0.02 of historical observation. Take it as you may. |
|
|
|
Thanks Pascal for confirming that the 3070 does beat the 4060 and 4060 Ti. Very interesting.
Keith makes me think about and look up things that I never knew existed. I like Dell's HBM card, the NVIDIA® RTX™ A800 (40 GB HBM2, full height, PCIe 4.0 x16), save for the price: 18,000 USD. And it runs on 240 Watts!
Fascinating about the memory swapping. I'm a snowbird and will likely build another system in Colorado and cart my GPU(s) with me. My current box is a resurrected 11-year-old i5 with DDR3 memory. For CO, I was considering an AMD Ryzen 5 5600X 6-core which of course would need ddr5 memory. I could also spring for a Ryzen 5 7600, or faster memory, or both, or something else altogether?
Roughly what would a 40% increase in CPU and memory speed translate to for GPU coprocessing? It sounds like memory speed after the ACTUAL GPU is 2nd priority.
|
|
|
PascalSend message
Joined: 15 Jul 20 Posts: 87 Credit: 2,131,553,398 RAC: 9,534,763 Level
Scientific publications
|
Hello,
I went from an i5 11400F with 32 GB of DDR4-2666 to an i9 14900 with 96 GB of DDR5-5600, and I did not see much difference in performance other than the number of CPU threads.
The big difference in my new PC is the 3 RTX 4000 SFF Ada cards, which I think would have performed the same way with my old i5 as with my i9.
Even the motherboard change did not improve performance much (going from PCI Express 4 to PCI Express 5).
Personally, I would advise you to invest mainly in the graphics card and less in the CPU / RAM / motherboard side, without falling into the opposite extreme.
Any mid-range DDR5 PC should do the trick.
____________
|
|
|
makraczSend message
Joined: 9 May 24 Posts: 6 Credit: 3,000,131,725 RAC: 8,104,604 Level
Scientific publications
|
This website collects GPU/CPU statistics from Folding at Home
https://folding.lar.systems/gpu_ppd/overall_ranks
I think it's only from their alternative client users, and of course the FAH points system is different from what we have here, but it still gives you a sense of the performance difference between GPUs for a similar scientific application.
For CO, I was considering an AMD Ryzen 5 5600X 6-core which of course would need ddr5 memory. I could also spring for a Ryzen 5 7600 or faster memory or both something else altogether?
I would buy a 7600 (or another 7 series) just because of the AM5 socket, which AMD has promised to keep ‘alive’ for a few more years. |
|
|
|
Roughly what would a 40% increase in CPU and memory speed translate to for GPU coprocessing? It sounds like memory speed after the ACTUAL GPU is 2nd priority.
Very little (depending on the application and the workunit).
They were talking about the memory on the GPU itself.
For GPU crunching, the ACTUAL GPU and its memory are the 1st priority.
|
|
|
Keith Myers Send message
Joined: 13 Dec 17 Posts: 1373 Credit: 7,998,731,143 RAC: 2,831,643 Level
Scientific publications
|
With Einstein, and to some extent here, the cpu core clocks matter for dropping the compute time. The faster the memory loads/unloads are, the faster the task runs.
At Einstein for example, on the O3AS gpu work units, a significant portion of runtime is accrued on the cpu when the gpu moves its 99%-completed calculation results back onto the cpu, because the gpu's FP64 precision is insufficient for the science results to be validated.
So the faster the cpu core clocks, the better for that app. I can really see the difference for the same tasks run on 3080's between the slow 2 GHz clocks of the Epyc servers and the 5 GHz clocks of the Ryzen 9950X hosts.
But it always depends on how the application is written and how it handles task computation on various types and classes of hardware. |
|
|
|
Sorry for the delay. Was out of pocket. Thanks for all the help! I think I'm clear now. I hope others are aided as well. I'm looking forward to assembling a system in April in Colorado. |
|
|
|
It is not possible to always win
In mid-November last year, one of my hosts started to restart spontaneously.
First every other day, then daily, and finally several times a day.
I first checked the electrical contacts of every computer component, with no result.
I usually have a spare PSU.
I replaced it, but the problem reproduced with the new one.
I returned the new PSU to its original package, and reinstalled the previous one.
Suspecting some RAM module of failing intermittently, I ordered two new 16 GB modules on the local market.
When I received and installed them, the problem persisted.
The affected system is a twin GPU host.
I tested with only one GPU, and then only with the other one, with no effect.
The computer continued to restart randomly.
I replaced the original 8-core CPU with a trusted spare 6-core CPU I keep for tests, and nothing changed.
Then I suspected something going wrong with the disk.
I purchased a new SATA SSD, cloned the original disk, and replaced both it and the SATA data cable with new ones.
The problem wasn't solved!
I know what you're thinking of...
"He is being lazy to replace the motherboard"
Yes, you're right!
I usually keep this as the last resource, for being the most laborious task.
You have to remove every components from old motherboard, unfix it, fix the new one, rebuild every connections, reinstall components, renew thermal paste for CPU, config BIOS parameters...
I didn't find a compatible motherboard for my setup at local market, so I ordered a new one from abroad.
After a two-weeks pause I received it, I replaced, I installed every components, I started the system for the first time, and...
Computer restarted with no time enough to configure all BIOS parameters!!!
And then, it kept restarting every few seconds.
The first I thought was: "This system is haunted". But it is not very scientific
The second I thought was: "Well I've replaced EVERYTHING and the problem is unsolved. It is not possible to win always"
But then: "Hey!!! Really EVERYTHING?"
Wait a minute...
What's the problem?... The system is spontaneously restarting = resetting
It was a flash. I disconnected RESET terminals coming from computer case to motherboard.
And the system has not restarted a single time since then, more than two weeks past.
Conclusion: RESET line coming from chassis was producing the problem.
Measuring statically its impedance, it is as expected to be: short circuit when RESET button is pressed, and open circuit when it is not.
It must have been catching some kind of electrical interference strong enough to randomly activate motherboard's RESET input.
Amazing.
The mentioned system is this twin GPU host #557889
Additionally, with some of the components recovered from this affair, and a few others, I was able to renew an old retired system into this new twin GPU host #604216
It is not possible to always win... But this time I did!
🤗️ |
|
|
|
I had a similar experience many years ago. The microswitch was faulty in my case.
Besides electrical faults or static electricity, the deterioration and abrasion of the plastic buttons and button holes can make a button stick midway, causing random restarts or switch-offs.
Recently a DELL OptiPlex 3060 started acting weird. It took me a while to realize that I should not put the blame on Windows this time, so I ran a RAM test. The original RAM stick, made by SK hynix, turned out to be faulty. It had worked for 6 years. The other RAM stick (Kingston) is still working fine. I hadn't had a failed RAM module in the last 10+ years, though I manage 100+ computers.
Back in the 80286 era we shipped a PC to a cheese wholesale company. We tested the PC thoroughly before we shipped it. They asked us to put it in the chilled area of their storehouse. It worked for a day, then started acting weird. We moved it back to our office for testing, and it worked fine. We put it in their office, and it worked fine. We put it back in their chilled storehouse, and after a day it started acting weird again. We thought it must be condensation, but there wasn't any. We spent at least a week swapping parts in and out, and the problem persisted. We tested the AC input power and the DC output voltages with an oscilloscope for spikes; there weren't any. We gave up, they gave up, and we put the PC in their office. It worked fine for 4 years, then they bought an upgrade. The upgrade showed similar symptoms in their chilled storehouse. :) We never figured out what caused this behavior there. |
|
|