Server goes offline at random

Dell Inspiron 5675

  • AMD Ryzen 5 1400 quad core processor
  • 32 GB RAM
  • 500GB NVME for OS
  • 4TB SSD for storage

Everything works great: I can log in (Tor, Firefox) over LAN and remotely.
Running Bitcoin Node, Nextcloud, Core Lightning.

Every 1 to 2 days the server either turns off or stays powered on but unresponsive (cannot connect with kiosk or headless). If I do a hard shutdown and restart, it comes back and runs like nothing happened.

Are you leaving a monitor connected full time? You may try disconnecting the monitor, and reboot the server with no monitor attached. Give that a try, and let us know if it makes a difference.

Other things to check here are:
SSH into the server, run sudo journalctl -xefa, and capture the last thing that is logged before it becomes unresponsive.

Is it actually completely shutting down?
Check that the output on your power supply matches the input on the plug of your machine.

The system is headless

It is random. Just in case, I cleaned the heat sink and applied new thermal paste. It is working so far.

Making sure it’s not overheating is a good place to start. If it does it again, power-cycle the device and pull that log George suggested. That will give you some clues as to what’s going on.

Well, it did it again…it runs about 24 hrs and then goes dead (the server appears to stay on). No ability to log on. Setting up kiosk does not work (no power to USB or HDMI).

  • I did a hard reboot with kiosk set up, and during startup the graphics card stopped working, so I was unable to use kiosk. I tried this several times with the same result. However, I can log in remotely.
  • I obviously have a bad graphics card.
  • Any recommendations on a simple card?
  • Can you open a terminal to SSH into the server via remote access? If so, how?

I really appreciate y’all’s help. Overall, this has been a lot of fun…especially since I knew nothing about servers or really operating systems before this.

Hi there!

If you set up SSH before encountering this issue, you should try connecting via SSH to see if it works. If you have not, there is not much more you can do.

Thanks again to George, Rexter, and Homer for always helping the knuckle dragger.

I have successfully set up SSH and am in the journal.

I am expecting the server to shut down in the next few hours or so.

Once the server goes dead (on its own), will the SSH log show what happened historically once I log back into SSH after restart?

The server shut completely off this time after approx 36 hours of run time.

Is there a way to see historical logs with SSH?

After restarting the server and connecting via SSH, it is only showing activity from the point I restarted.
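
On the historical-log question: a minimal sketch of how earlier boots could be checked, assuming journald on this install keeps its logs on disk. If the first command lists only one boot, the journal is probably not persistent and the messages from before the crash are gone. The function name show_previous_boot is made up for illustration; the journalctl options used are standard ones.

```shell
# Sketch: look at the journal from the boot BEFORE the current one.
# Assumption: journald keeps logs on disk; otherwise only the current boot exists.

show_previous_boot() {
  # One line per boot that journald still has: 0 is the current boot,
  # -1 is the one before it, and so on.
  sudo journalctl --list-boots --no-pager

  # Last 200 lines of the previous boot, i.e. what was logged just
  # before the server went down.
  sudo journalctl -b -1 -n 200 --no-pager
}

# Usage, after SSH'ing into the server:
#   show_previous_boot
```

If the second command complains that there is no earlier boot, that points to the journal living only in memory, and the log would have to be captured while the server is still up.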

Hi SBC!

First off, when this happens, check start.local or the server’s IP address in your browser to see if it’s in diagnostic mode. If you still get nothing, try SSH’ing in and run:

sudo journalctl -xefa

Make sure to do this before rebooting the server, since you might get more info on what’s going on.

Is there a way to preserve/record the SSH log if the system goes down so it can be recalled after a crash?
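
One possible way to keep a copy, sketched under assumptions: run the journal over SSH from a second computer and save everything it prints to a file there, so the copy survives whatever happens to the server. SERVER, LOGFILE and the function name capture_journal are placeholders for illustration, and the login shown is only a guess at a typical setup.

```shell
# Sketch: run this on a SECOND computer, not on the server itself.
# SERVER and LOGFILE are placeholders; change them for your setup.
SERVER="${SERVER:-start9@your-server.local}"
LOGFILE="${LOGFILE:-server-journal.log}"

capture_journal() {
  # ssh -t gives sudo a terminal on which to ask for the password.
  # journalctl -f keeps following new entries as they are written.
  # tee shows them on screen and appends them to LOGFILE.
  ssh -t "$SERVER" 'sudo journalctl -f' | tee -a "$LOGFILE"
}

# Usage:
#   capture_journal
```

When the server goes down the SSH session drops, and the end of the local file holds the last messages that made it across the network. A sudden power loss may still cut off the final few lines.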

I did notice that in the BIOS, deep sleep control was active. Wonder if that caused the operating system to shut down over time. I disabled it.

That sounds like a good possibility. Not sure about all the possibilities with the logs beyond just reading them. If you scroll back through your system logs, look to see if there is any evidence from the previous crashes.

I think my DIY server is snake bit…may need to start over from scratch

Now I am getting StartOS launch error:
Wifi Internal Error
wlp5s0: Error while getting interface flags: No such device

Update. Due to the instability in my system, I decided to take it apart, reformat all drives, clean everything, put it all back together, and reload StartOS from scratch.

Everything seems to be good to go.

If I have a failure at this point, I am going to throw it all in a dumpster and buy a Start9 Server. :)

Update.

System still becomes unresponsive after anywhere from 12 to 24 hours.

Since my previous statement about throwing it in the round file, I have become curious as to the cause. I will still most likely buy a Start9 system, but in the meantime I am still troubleshooting, which may help someone else in the future.

  1. Pretty obvious to me that it is an issue with hardware or a Dell Inspiron BIOS problem (like a timeout or sleep setting that does not like the StartOS processing).
  2. Cleaned the heat sink and applied new thermal paste; CPU temp stable at 40 °C (still became unresponsive).
  3. Replaced my Crucial 500GB M.2 NVMe with a WD 1TB NVMe for StartOS and reflashed (still became unresponsive)…out $65.
  4. Completely disconnected everything down to bare bones (still became unresponsive):
    a. LED lights (yes, it used to be a gaming system)
    b. WiFi card
    c. monitor, keyboard, mouse
    d. all front USB, card readers, optical drives, audio inputs, etc.
  5. Dell diagnostics are telling me my Teams 4TB SSD (my data drive, not OS drive) is not completing its self-test, so something is going on there. I ran DiskGenius from a USB boot and did a disk sector scan (4 hours) = all clear.
  6. Finally, I did notice that my 32GB (16GB x 2) RAM is DDR4 @ 3400 MHz but my motherboard can only handle DDR4 @ 2400 MHz. I do remember someone saying that even if the RAM’s MHz rating is over the system specs, the motherboard will only run it at its limit, so it should not be an issue.

Bottom line is I guess I replace the RAM. Found 32GB that meets the UDIMM specs on Amazon for only $48.

After all that and if the RAM replacement does not work, I think the only things it could be are:
1. AMD CPU glitch
2. Dell BIOS glitch
3. Dell Power source unstable
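
On the RAM speed point above, a rough sketch of how the rated speed could be compared with the speed the board is actually running the modules at. It assumes dmidecode is available on the server, which may not be the case on every install, and the function name ram_speeds is just for illustration.

```shell
# Sketch: show rated vs. actually configured speed for each memory module.
# Assumption: dmidecode is installed on the server.

ram_speeds() {
  # Type 17 is the "Memory Device" table, one entry per DIMM slot.
  # "Speed" is what the module is rated for; the "Configured ... Speed"
  # line is what the board is really running it at. Older dmidecode
  # versions label it "Configured Clock Speed".
  sudo dmidecode --type 17 |
    grep -E '^[[:space:]]*(Locator|Speed|Configured (Memory|Clock) Speed):'
}

# Usage, after SSH'ing into the server:
#   ram_speeds
```

If the configured speed is already at or below what the motherboard supports, the faster rating on the sticks is less likely to be the cause by itself, though a faulty module could still be.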

Thanks for documenting all your experimentation. If you do finally pinpoint the problem, let the community know. All the best…