10baseT performance question (hard one lol)
KermitTheFrag
8th April 2002, 09:39
Nice one for you here. This is a last-resort post, as no one else I've found can answer it.
I have a customer/friend with a Sun SPARCserver 4/490 (big 4 processor mainframe jobby) and a bunch of Tektronix X-terminals (8 to be precise). This "does the job" and "didn't cost much", so it's staying as is. It also runs some obscure proprietary data collection and processing application that some bearded geek wrote a number of years ago, so a rewrite on something newer isn't practical.
My problem is that the whole thing was on a 10base2 yellow string network (thick ether). I've persuaded him to replace it with a 10baseT network (hubbed for the moment) on reliability grounds, using AUI>10baseT transceivers and a decent Allied Telesyn hub. This helped the "terminal lag issue" a bit and cut the death rate of the network, but the network collides like nothing on this earth. The next step was to replace the hub with a switch. Unfortunately the network gets WORSE (!) with the switch, with more lag and traffic (don't ask me why!).
Traffic is bootp (to load the X-terminal firmware), DHCP and X Windows... nothing else. Runs over normal IPv4.
Any ideas? This is shoving billions of really small packets around, and for the life of me I can't make it run faster! :(
btw 100baseT isn't an option, and neither are hardware upgrades.
edit: changed topic to stop people being put off lol.
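To put the "really small packets" worry into numbers, here is a rough sketch of how many frames a 10 Mbit/s Ethernet can carry per second and how much of the wire time is actual payload. The framing constants are the standard Ethernet ones; the payload sizes in the loop are illustrative assumptions, not measurements from this network, and the results are an upper bound that ignores collisions.

```python
# Back-of-envelope numbers for 10 Mbit/s Ethernet carrying small frames.
LINK_BPS = 10_000_000   # line rate of 10 Mbit/s Ethernet
PREAMBLE_SFD = 8        # bytes: preamble plus start-of-frame delimiter
HEADER_FCS = 18         # bytes: 14-byte MAC header plus 4-byte FCS
INTERFRAME_GAP = 12     # bytes of idle time between frames (9.6 us at 10 Mbit/s)
MIN_FRAME = 64          # bytes: minimum frame size, header and FCS included


def wire_bytes(payload: int) -> int:
    """Bytes of link time one frame occupies; short payloads are padded."""
    frame = max(MIN_FRAME, payload + HEADER_FCS)
    return PREAMBLE_SFD + frame + INTERFRAME_GAP


def max_frames_per_second(payload: int) -> float:
    """Upper bound on frames per second with no collisions at all."""
    return LINK_BPS / (wire_bytes(payload) * 8)


def payload_efficiency(payload: int) -> float:
    """Fraction of the link time that carries payload bytes."""
    return payload / wire_bytes(payload)


if __name__ == "__main__":
    # Payload sizes are examples only (assumed, not taken from a capture).
    for payload in (8, 46, 72, 1500):
        print(payload,
              round(max_frames_per_second(payload)),
              round(payload_efficiency(payload), 3))
```

A minimum-size frame occupies 84 byte times on the wire, which caps the link at roughly 14,880 frames per second, and anything smaller than 46 bytes of payload is padded up to that size. Lots of tiny updates therefore use the link far less efficiently than a few full-size frames.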
WhiteKnight
8th April 2002, 11:29
Erm, just out of interest, why is 100base-TX out of the question?
I mean, a 100Mb switch would be more up to the task of shifting lots of little packets.
You could just use the 10/100Mb switch with 10Mb devices?!
I mean, the internal backbone stays the same speed regardless of whether the ports are at 10 or 100, and it's gonna be at least 10x as fast as the one in a 10Mb switch.
KermitTheFrag
8th April 2002, 11:39
It was a 10/100 switch. I tried an Allied Telesyn one and a Netgear one. I'm not sure of the backplane capacity, but they are fairly common and should put up with nine 10baseT machines no problem. It was worse than the hub for some reason, which is what's getting me. It's a weird one :(
The machines only have AUI ports unfortunately, which won't take 100Mbit of data through them, and the bugger won't move away from the system (my recommendation). A P3-500 is quicker than the full-height 4/490 rack and has decent network prospects :(
afty
8th April 2002, 12:36
Is it possible that those boxen (terminals or 490s) have pretty awful TCP networking stacks?
The fact you have loads of collisions going on indicates not necessarily too much traffic, but possibly bad traffic. It could be sending forty packets where one would do... or perhaps the networking hardware in the terminals doesn't cope well with collisions? AFAIK when collisions happen on modern NICs, they wait a random length of time and then re-send the packet. Perhaps these crappy old terminals don't wait a random length of time and instead wait a fixed interval, which would result in stupid amounts of collisions even with a very low amount of data.
Is there any way of measuring the throughput of one of the terms so you can see how many packets, and what size of packet is being sent? Maybe a network sniffer?
Failing that (or, even before that ideally) I'd pressure yet more for a rewrite on a decent architecture. Sounds like the terms are doing quite simple tasks. Could you simply run a 490 emulator, if there is such a thing, and a terminal emulator on desktop PCs?
KermitTheFrag
8th April 2002, 12:56
Originally posted by afty
Is it possible that those boxen (terminals or 490s) have pretty awful TCP networking stacks?
SunOS 4.1.4, so it's the BSD networking stack (albeit BSD4.3 + extensions). The TCP/IP stack is pretty damn good :) ... the terminals I don't know about, unfortunately. They are Tektronix ones (circa 1994), so I can't comment. I know little about them apart from how they boot and the fact that they do work.
The fact you have loads of collisions going on indicates not necessarily too much traffic, but possibly bad traffic. It could be sending forty packets where one would do... or perhaps the networking hardware in the terminals doesn't cope well with collisions? AFAIK when collisions happen on modern NICs, they wait a random length of time and then re-send the packet. Perhaps these crappy old terminals don't wait a random length of time and instead wait a fixed interval, which would result in stupid amounts of collisions even with a very low amount of data.
Is there any way of measuring the throughput of one of the terms so you can see how many packets, and what size of packet is being sent? Maybe a network sniffer?
Good plan. I'll borrow my good friend's funky OS/2 + portable-ish Compaq 386 network analyser and take it round there next time I attack their network. See what's really happening. I'd do it on the server but it doesn't have any kind of packet capturing driver by the looks, and of course someone threw the damn OS manuals out and didn't install the manpage collections (great eh?).
Failing that (or, even before that ideally) I'd pressure yet more for a rewrite on a decent architecture. Sounds like the terms are doing quite simple tasks. Could you simply run a 490 emulator, if there is such a thing, and a terminal emulator on desktop PCs?
We did look at that but they are stingy bastards. To rewrite would take 3 months and require an SQL server platform and 8 client PCs (their office could be organised so their existing 386s with WordPerfect could be upgraded too), at a cost of about £30k. They could integrate the digital camera hardware in with the network though, which would save a lot of manual labour.
btw fyi etc, the system is actually for recording container details, damage reports and statistics for a shipping company. Basically they take photos of everything that goes through, then they archive the photos for 3 months, archive them on SyQuest disks and manually index them on the Unix machine. If someone comes back and sues them for damaging a container in transit, they look up the SyQuest disk the photos are on and say "look at the photos". There's no skilled labour or costing for an administrator, so MS isn't an option unfortunately, as their software doesn't look after itself :(
quality eh?
Cabe
8th April 2002, 18:21
could it be that the AUI > 10bT converters are shagged?
Perhaps its worth looking into getting new ones?
Have you considered *shudder* Token Ring ?
Originally posted by afty
The fact you have loads of collisions going on indicates not necessarily too much traffic, but possibly bad traffic. It could be sending forty packets where one would do... or perhaps the networking hardware in the terminals doesn't cope well with collisions? AFAIK when collisions happen on modern NICs, they wait a random length of time and then re-send the packet. Perhaps these crappy old terminals don't wait a random length of time and instead wait a fixed interval, which would result in stupid amounts of collisions even with a very low amount of data.
Not true. The random backoff algorithm is part of the 802.3 standard, and has been since at least 1985. It's nothing to do with "modern NICs".
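For anyone curious why the random part matters, here is a toy slotted simulation comparing the truncated binary exponential backoff that 802.3 specifies with the fixed-wait behaviour speculated about above. It is only a sketch: `fixed_delay` models a hypothetical broken station (nothing in this thread shows the Tektronix terminals behave that way), carrier sense and frame length are ignored, and the station count is arbitrary. The attempt limit of 16 and the exponent cap of 10 are the 802.3 values.

```python
import random

MAX_ATTEMPTS = 16   # 802.3 attempt limit before a frame is discarded
BACKOFF_CAP = 10    # backoff exponent stops growing after 10 collisions


def beb_delay(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff: slots to wait after the n-th collision."""
    k = min(collisions, BACKOFF_CAP)
    return rng.randrange(2 ** k)


def fixed_delay(collisions: int, rng: random.Random) -> int:
    """Hypothetical broken station: always waits the same number of slots."""
    return 3


def simulate(stations: int, delay_fn, seed: int = 1, max_slots: int = 100_000):
    """Return (delivered, dropped, collisions) for one burst of traffic.

    Every station has one frame ready at slot 0. A slot with exactly one
    sender succeeds; a slot with several senders counts as one collision.
    """
    rng = random.Random(seed)
    next_try = [0] * stations      # slot in which each station sends next
    attempts = [0] * stations      # collisions suffered so far, per station
    pending = set(range(stations))
    delivered = dropped = collisions = 0
    slot = 0
    while pending and slot < max_slots:
        senders = [s for s in pending if next_try[s] == slot]
        if len(senders) == 1:
            pending.discard(senders[0])
            delivered += 1
        elif len(senders) > 1:
            collisions += 1
            for s in senders:
                attempts[s] += 1
                if attempts[s] >= MAX_ATTEMPTS:
                    pending.discard(s)   # give up on this frame
                    dropped += 1
                else:
                    next_try[s] = slot + 1 + delay_fn(attempts[s], rng)
        slot += 1
    return delivered, dropped, collisions


if __name__ == "__main__":
    print("random backoff:", simulate(8, beb_delay))
    print("fixed wait:    ", simulate(8, fixed_delay))
```

With a fixed wait, every station that collided retries in the same slot and collides again until each one gives up. With random backoff the retries spread out and the frames get through after a handful of collisions.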
If it were me, I'd stick a packet analyser of some sort on there, and take a look. If you can't manage a few thousand for a top of the line Fluke tester, a low-end BSD box and a copy of tcpdump and ntop are a useful start (seriously -- with two NICs, and acting as a transparent bridge, you can get an instant understanding of what's happening between any two points on the network).
I'd also look carefully at the environment, and make sure that there's nothing nearby that might be causing interference in the cabling. Is it all correctly shielded? Someone hasn't run a heavy cart over a critical piece of cable that's now very marginal?
N
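As a rough idea of what to do with a capture once you have one, the sketch below tallies frames and bytes per source/destination pair and sorts by volume, which is roughly the "who is talking to whom" view a tool like ntop gives. The record format `(src, dst, nbytes)`, the host names and the sample numbers are all made up for illustration; turning real tcpdump output into such records is left out.

```python
from collections import defaultdict


def summarise(records):
    """Return rows of (pair, frames, total_bytes, mean_size), busiest pair first.

    `records` is an iterable of (src, dst, nbytes) tuples, one per frame seen.
    """
    stats = defaultdict(lambda: [0, 0])   # (src, dst) -> [frames, bytes]
    for src, dst, nbytes in records:
        entry = stats[(src, dst)]
        entry[0] += 1
        entry[1] += nbytes
    rows = [(pair, frames, total, total / frames)
            for pair, (frames, total) in stats.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Invented sample data, not taken from the network in this thread.
SAMPLE = [
    ("sun490", "xterm3", 1460),
    ("sun490", "xterm3", 1460),
    ("xterm3", "sun490", 64),
    ("sun490", "xterm5", 1460),
    ("xterm1", "sun490", 64),
]

if __name__ == "__main__":
    for pair, frames, total, mean in summarise(SAMPLE):
        print(pair, frames, total, round(mean, 1))
```

A pair that dominates the byte count, or one that sends a huge number of tiny frames, is the first place to look.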
Asterix
8th April 2002, 21:02
or how about we all go down the pub and get pissed?
bvark
9th April 2002, 00:13
Try a managed switch or a LAN tester a la Nik. You should be able to see which of the stations is actually causing the collisions - a single faulty station might be the problem. I'd suggest that if moving from 10base2 (which ran for a lot of installations for a long time) to 10baseT doesn't fix all the problems, single station faults might be a good place to look.
Exponential backoff is known to be broken in a few cheap NICs, so it might be broken in the cards attached to the X-terminals. Cisco switches (at least - I'm sure there are others) can work around this behaviour by setting variable backpressure (random collisions) towards the hosts.
You're right though - it's a hard one, and an interesting problem - keep us posted :)
KermitTheFrag
9th April 2002, 01:00
Ok (although I'm on virtual holiday *grin*) I'll post the results :) ... firstly thanks go to nik for suggesting ntop - I found the problem because of that. Spent all afternoon installing OpenBSD on a junk P60 though, thanks to the SSH keygen taking hours. FreeBSD next time ;)
After a bit of playing with ntop and some luck, it appears to be a software problem: the terminals running the indexer X-sessions were getting large chunks of data thrown at them by the 490. I assume it's small screen updates, as there's a nice counter and "progress display". Unfortunately, it thrashes the 490's network stack with 2 or more indexers open (it's not a very efficient SunOS release by the looks, or has had no tuning done on it). The thing was running flat out, high load. I resign on that - I'm not knowledgeable enough in BSD4.3 / SunOS 4 to tune it.
All down to shi77y code :( Log out the indexers and it's fine! Of course the users never bloody log out (oh the security) so this hasn't been tested before. Something like:-
for (;;) { update_client(); }
Until we resurrect the (possibly) long dead Unix guru who wrote the system, there's absolutely SFA I can do, apart from tell them to log out when they're done and keep the number of indexers running to a minimum.
It's a pile of **** and needs replacing. The only source code is on a QIC150 that's not been touched for 8 years :P
Thanks for everyone's help :)
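Since the real source is stuck on that tape, nobody here knows what the indexer actually does. Purely as an illustration of the usual remedy for a loop that pushes a progress update on every iteration, here is a sketch of a throttle that coalesces updates to a few per second. `UpdateThrottle`, its parameters and the `send` callback are hypothetical names, and Python is used only for readability; the real client would be whatever the original author wrote it in.

```python
import time


class UpdateThrottle:
    """Pass on at most `max_per_second` updates, keeping only the newest value."""

    def __init__(self, send, max_per_second=4.0, clock=time.monotonic):
        self._send = send                     # callback that pushes one update
        self._interval = 1.0 / max_per_second
        self._clock = clock                   # injectable so it can be tested
        self._last_sent = None
        self._pending = None
        self._has_pending = False

    def update(self, value):
        """Send now if enough time has passed, otherwise remember the value."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._interval:
            self._send(value)
            self._last_sent = now
            self._has_pending = False
        else:
            self._pending = value             # overwrites any older unsent value
            self._has_pending = True

    def flush(self):
        """Send the last unsent value, for example when the job finishes."""
        if self._has_pending:
            self._send(self._pending)
            self._last_sent = self._clock()
            self._has_pending = False


if __name__ == "__main__":
    throttle = UpdateThrottle(print, max_per_second=4.0)
    for count in range(100_000):
        throttle.update(count)    # the busy loop still runs flat out...
    throttle.flush()              # ...but only a few updates reach the display
```

The work loop is unchanged; only the number of updates that reach the terminal drops, and the final `flush()` makes sure the last counter value is still shown.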
Dizzie
9th April 2002, 09:52
Even though I didn't help with this problem, it provided an interesting read nonetheless :)
KermitTheFrag
9th April 2002, 09:55
lol true.
Just to add a spanner to the works, I just got a permanent job as the only NT administrator for a mid-sized company network (40 users + Exchange + etc etc). Bang goes my life :(
Cabe
9th April 2002, 19:19
Kermit, need a PFY?
/me desperately needs to get the hell outta retail.
KermitTheFrag
9th April 2002, 21:53
Dunno if I'm gonna take it yet - depends on how bad a state their network is in...