OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Sun Jul 11, 2010 4:15 pm Post subject: [SOLVED] Infuriating network problem |
|
|
I've recently updated my home server from OpenSuse 10.3 to Lucid Server.
I replaced the motherboard with an Intel DQ35JO that I had lying around. I've dropped in a Core 2 Quad and 4GB RAM.
Additionally I added a 3Ware/AMCC 9650-2LP RAID controller for the disks, which runs off one of the PCI-E x1 slots.
From the "old" server I brought across an Intel Pro 1000 PT Dual Gigabit PCI-E Server Adaptor which is utilising the mobo's PCI-E x16 slot. On the old motherboard (an Asus) this worked without issue.
The problem I'm having is that the NIC seems to be going into some kind of sleep mode once other computers on the network disconnect. Booting up any of the machines directly attached to the same gigabit hub (a Netgear ProSafe GS116) reinstates the connection.
The server logs don't show any indication that the network link is dropping at any point - it just seems to be waiting for a signal from any LAN-connected device. If I try to connect using any wireless devices through the wireless router (Netgear DGN2000) I get no response.
Once it's up and running it works flawlessly - but I didn't have any of these problems under OpenSuse with the older motherboard.
If it helps - the hub is connected to the router by a single cable. All other wired network connections are made through the gigabit hub. Internet and wireless links are made through the router via the single link.
The server is used for web (public facing development server), emails (SMTP/IMAP), NFS/Samba (internal network only) and VPN.
It really is driving me mad - I'm considering replacing the motherboard to see if that has any effect so any help you guys could give would be REALLY appreciated.
T.
Edit: I forgot to mention it's using the 1.0.2-k2 driver. I've downloaded the latest e1000e driver 1.2.8 - I'll install that later and see if it makes a difference. I'll post the result on here in case anyone else has a similar problem.... _________________ If at first you don't succeed, call it v1.0
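For anyone wanting to confirm which driver and version an interface is actually bound to before and after swapping drivers, here is a minimal Python sketch. It assumes a Linux sysfs layout where /sys/class/net/<iface>/device/driver is a symlink to the bound driver, and that the module shares the driver's name and publishes a version file under /sys/module. The function name `driver_info` and the `sysfs_root` parameter are made up for illustration.

```python
import os

def driver_info(iface, sysfs_root="/sys"):
    """Return (driver_name, version) for a network interface.

    Both values are None when the interface has no backing device
    (for example the loopback interface). version alone is None when
    the module does not publish a version string.
    """
    # Assumption: the bound driver is exposed as a symlink here.
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None, None
    name = os.path.basename(os.path.realpath(link))

    # Assumption: the module name matches the driver name (true for e1000e).
    version = None
    version_path = os.path.join(sysfs_root, "module", name, "version")
    if os.path.exists(version_path):
        with open(version_path) as fh:
            version = fh.read().strip()
    return name, version
```

Calling `driver_info("eth0")` on the server should then report something like the e1000e driver name together with whatever version string the loaded module declares, which makes it easy to check that a freshly built driver really replaced the in-kernel one.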
Last edited by OnlyTheTony on Tue Jul 20, 2010 9:59 am; edited 1 time in total |
Dutch_Master LXF regular
Joined: Tue Mar 27, 2007 2:49 am Posts: 2358
|
Posted: Mon Jul 12, 2010 1:05 am Post subject: |
|
|
Things to consider:
1) static IP, no DHCP
2) longer lease times
3) new kernel
My tuppence  |
ollie Moderator

Joined: Mon Jul 25, 2005 12:26 pm Posts: 2749 Location: Bathurst NSW Australia
|
Posted: Mon Jul 12, 2010 12:14 pm Post subject: |
|
|
Check the BIOS for power settings and check the power management settings in YaST to ensure the network Wake On LAN (WOL) is turned off. This is what shuts down the ethernet connection.
Ref: http://www.lesswatts.org/tips/ethernet.php |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Mon Jul 12, 2010 1:47 pm Post subject: |
|
|
Thanks for your answers guys.
It turns out it was much more simple(?).
Intel motherboards have a BIOS-based "lights out" management system, "Intel ME", which prioritises the onboard LAN adapter - for obvious reasons. Once I disabled "Intel ME" and switched the onboard LAN off it worked a treat. The connection has been fine ever since!
Can't believe I wasted a week trying to fix that!!! _________________ If at first you don't succeed, call it v1.0 |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Tue Jul 13, 2010 1:51 pm Post subject: |
|
|
Okay.. I was wrong.
Even updating the driver to 1.1.2 (1.2.8 wouldn't build) hasn't solved the problem. Shortly after network connections are removed (either IMAP connection or NFS) the server's network connection just drops. There's nothing in the syslogs or even dmesg. Only re-establishing a connection from another desktop machine restarts it!
Dutch_Master - thanks for your input but it's already running on a static IP and I've updated the kernel to the latest ones in the repos.
The network hardware is the same as I used under opensuse 10.3 - so it's either an Ubuntu bug or a problem with the motherboard (which is nearly 2 years old so it's possible).
I'm open to any other suggestions here!! _________________ If at first you don't succeed, call it v1.0 |
wyliecoyoteuk LXF regular

Joined: Sun Apr 10, 2005 11:41 pm Posts: 3369 Location: Birmingham, UK
|
Posted: Tue Jul 13, 2010 3:07 pm Post subject: |
|
|
Could be a keepalive issue?
http://en.wikipedia.org/wiki/Keepalive _________________ The sig between the asterisks is so cool that only REALLY COOL people can even see it!
*************** ************ |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Wed Jul 14, 2010 3:57 pm Post subject: |
|
|
It could be.
I've been doing some research and there were a lot of threads about the e1000e driver closing the connection - as of yet nobody's posted a solution.
To see whether it's an OS or driver issue I've deactivated the (expensive) Intel adapter and I'm trying the onboard gigabit LAN to see if that maintains the connection overnight.
If it is a keepalive issue what would I look for and how could I get around it? _________________ If at first you don't succeed, call it v1.0 |
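On the keepalive question, a rough illustration of the mechanism: TCP keepalive makes the kernel send probes on an idle connection so that a peer which vanished without closing the socket gets detected. The Python sketch below switches it on for a single socket. The helper name `enable_keepalive` and the timing values are arbitrary choices for the example, and the TCP_KEEP* options are Linux-specific, hence the `hasattr` guard. Daemons such as an IMAP or NFS server manage their own sockets, so this only shows what the setting does rather than being a fix to drop into the server.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for one socket.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timing overrides exist on Linux; skip them elsewhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Without per-socket overrides, Linux uses the system-wide values in /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes. The default idle time there is 7200 seconds, so a dead peer can go unnoticed for two hours on sockets that enable keepalive but leave the timings alone.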
nelz Moderator

Joined: Mon Apr 04, 2005 12:52 pm Posts: 8036 Location: Warrington, UK
|
Posted: Wed Jul 14, 2010 4:16 pm Post subject: |
|
|
Why are you using an external driver and not the e1000e driver in the kernel? _________________ "Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein) |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Wed Jul 14, 2010 8:08 pm Post subject: |
|
|
Because the e1000e driver in the kernel was outdated so I installed a new version in the hope it would solve the timeout problem.
It didn't.
The thing is the problem still exists whether I use the Intel LAN adapter or the Realtek onboard - which makes me suspect it's an issue with Ubuntu rather than any of the hardware or drivers.
I'm considering ditching Ubuntu for something like CentOS to see if this removes the problem.
It's a total pain - I never had this problem under opensuse 10.3 - the only reason I "upgraded" was because the install failed and I thought I'd take the opportunity to rebuild. I wish I hadn't.
If it's any help - dmesg returns:
[37402.040022] NETDEV WATCHDOG: eth3 (r8169): transmit queue 0 timed out
and the last few lines are:
[37402.080072] r8169: eth3: link up
[37426.080071] r8169: eth3: link up
[37456.080069] r8169: eth3: link up
[37498.080070] r8169: eth3: link up
[37540.080057] r8169: eth3: link up
[37582.080064] r8169: eth3: link up
[37624.080063] r8169: eth3: link up
[37666.080065] r8169: eth3: link up
[37708.080068] r8169: eth3: link up
[37750.080066] r8169: eth3: link up
As you can see it's not reporting the link as being down - just constantly coming back up.
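To make the pattern in that output easier to see, here is a small Python sketch that pulls the timestamps out of dmesg text and reports the gap between successive "link up" messages, plus a count of watchdog timeouts. The function names are invented for this example, and the sample text is just the first few lines quoted above.

```python
import re

# Matches lines such as "[37402.080072] r8169: eth3: link up".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def link_up_intervals(dmesg_text, iface="eth3"):
    """Return the seconds between consecutive 'link up' messages for iface."""
    stamps = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and f"{iface}: link up" in m.group(2):
            stamps.append(float(m.group(1)))
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]

def watchdog_timeouts(dmesg_text):
    """Count NETDEV WATCHDOG transmit-timeout messages."""
    return sum("NETDEV WATCHDOG" in line for line in dmesg_text.splitlines())

sample = """\
[37402.040022] NETDEV WATCHDOG: eth3 (r8169): transmit queue 0 timed out
[37402.080072] r8169: eth3: link up
[37426.080071] r8169: eth3: link up
[37456.080069] r8169: eth3: link up
"""

print(link_up_intervals(sample))   # gaps in seconds between link-up events
print(watchdog_timeouts(sample))   # number of transmit timeouts seen
```

Run against the full listing, the gaps settle at roughly 42 seconds, which suggests the adapter is being reset on a regular cycle rather than the cable link physically dropping.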
My internet connection keeps dropping for some weird reason and I'm also wondering if the two are linked. I'm frustrated and confused. _________________ If at first you don't succeed, call it v1.0 |
wyliecoyoteuk LXF regular

Joined: Sun Apr 10, 2005 11:41 pm Posts: 3369 Location: Birmingham, UK
|
Posted: Wed Jul 14, 2010 8:42 pm Post subject: |
|
|
There seem to be a lot of posts saying that this is an APIC-related bug.
https://bugs.launchpad.net/ubuntu/+source/grub/+bug/574281 _________________ The sig between the asterisks is so cool that only REALLY COOL people can even see it!
*************** ************ |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Wed Jul 14, 2010 8:59 pm Post subject: |
|
|
Wylie, doing more digging I've come across that too. I've added "noapic" to the boot parameters and restarted. I've also reinstated the Intel card - I'll see how that goes....
Edit: Further research indicates that kernel 2.6.34 doesn't have this problem - so just waiting for that to hit the repos now. _________________ If at first you don't succeed, call it v1.0 |
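When testing a boot parameter such as "noapic", it is worth confirming after the reboot that the running kernel actually received it. A minimal Python sketch, assuming a Linux system where /proc/cmdline holds the kernel command line; the helper names and the sample command line in the comment are illustrative only.

```python
def has_boot_param(cmdline, param):
    """Return True if param appears as a whole word on the kernel command line."""
    # Splitting on whitespace avoids false matches such as "nolapic".
    return param in cmdline.split()

def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()

# Example with a made-up command line:
# has_boot_param("root=/dev/sda1 ro quiet noapic", "noapic") is True
```

On the server itself, `has_boot_param(read_cmdline(), "noapic")` returning False would mean the GRUB change never took effect, in which case the test says nothing about the APIC theory either way.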
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Thu Jul 15, 2010 11:38 am Post subject: |
|
|
Still doing it!
I've decided to back up everything and switch distros to CentOS over the weekend as the errors I'm getting on Ubuntu don't seem to be present on that!
Fingers crossed... _________________ If at first you don't succeed, call it v1.0 |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Sun Jul 18, 2010 11:37 am Post subject: |
|
|
After 2 x motherboards, 2 x distros and several nights of sitting around until 2am sobbing I may have found the culprit.
It was nothing to do with the server at all - it would appear to be a problem with my desktop machine. The LAN was always activated and didn't appear to be sending a disconnect signal to the server, which consequently hung waiting for a response.
I've replaced the onboard LAN with a PCIe x4 dual Marvell LAN adapter - let's see if this solves the problem.
Oh, and I went back to Ubuntu because CentOS, whilst good, was far too slow on my hardware. _________________ If at first you don't succeed, call it v1.0 |
OnlyTheTony LXF regular

Joined: Mon Jan 08, 2007 11:51 am Posts: 303
|
Posted: Tue Jul 20, 2010 10:01 am Post subject: |
|
|
*****SOLVED*****
It turns out that it was the LAN adapter on the desktop PC that was causing the problem. I've had no trouble with network connectivity since changing the onboard LAN for a PCIe card.
Thanks to everyone who offered potential solutions. _________________ If at first you don't succeed, call it v1.0 |