Patterned ping spikes via Pingplotter

overdrive148 · Feb 16, 2015

Hey guys. So moved into a new place, changed providers to Cox and I'm using one of their Ubee (DDW365) combined modem/routers.

While browsing and gaming I've been noticing hang-ups that seem to come at random, on a new PC build that worked just fine at the old place. I fired up PingPlotter to see what it looks like and I get this:


The ping spikes are shaped like triangles. The top and bottom graphs are about an hour apart, with one sample per second. Each spike lasts about a second or two. What do you think would cause this? AFAIK everything on my PC is patched, no one is on my wireless (according to my router dashboard), and I'm on a hard line plugged directly into the box.

And if the answer is call Cox and beg them to listen to me, what can I say that won't confuse the hell out of the support guys?

lziegenhals · Feb 17, 2015

The first thing I would do is to check each hop to see where the spikes start. By default PingPlotter displays the timeline graph only for the destination, but you can display the timeline for any hop along the way by right-clicking on the desired hop in the top list and selecting "Show this Timeline Graph". Start with the first hop (your home router / cable modem at 192.168.0.1) to be sure it's not showing the spikes. If it is, the problem is somewhere between your PC and the router, inclusive.

If it's not in the first hop, look at each intermediate hop to see where it starts. If it starts on the second hop, that would be between your house and the first Cox router. Beyond that is Cox's internal network, then some peering or transit provider, and so on down the line to Google or wherever. (By the way, I should mention one important caveat and a bit of trivia: don't worry too much about occasional spikes at intermediate hops unless they correlate with the spikes you are seeing at the destination. Routers often respond more slowly with the ICMP "time exceeded" messages that PingPlotter tracks than they do to routed traffic. That's because those types of messages are often handled by the central CPU ("process switched") instead of the dedicated routing hardware that handles the transit traffic. That's why you will often see higher RTTs at intermediate hops than you do at the destination. So just look for the hop where you start seeing spikes that correspond to the spikes you are seeing at the destination.)
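The hop-by-hop check described above can be sketched in a few lines. This is only an illustration, assuming you have exported per-hop RTT samples (in ms) that are aligned by sample number; the function names, the 50 ms spike threshold, and the 80% overlap cutoff are all made up for the example and are not anything PingPlotter itself provides.

```python
from statistics import median

def spike_indices(rtts, threshold_ms=50.0):
    """Indices of samples whose RTT exceeds the hop's median by threshold_ms.

    None entries (lost packets) are ignored.
    """
    valid = [r for r in rtts if r is not None]
    if not valid:
        return set()
    baseline = median(valid)
    return {i for i, r in enumerate(rtts)
            if r is not None and r - baseline > threshold_ms}

def first_correlated_hop(hops, threshold_ms=50.0, min_overlap=0.8):
    """Return the 1-based number of the first hop whose spikes line up with
    the destination's spikes, or None if the destination has no spikes.

    hops: list of per-hop RTT lists, hop 1 first, destination last.
    """
    dest_spikes = spike_indices(hops[-1], threshold_ms)
    if not dest_spikes:
        return None
    for number, rtts in enumerate(hops, start=1):
        shared = spike_indices(rtts, threshold_ms) & dest_spikes
        if len(shared) / len(dest_spikes) >= min_overlap:
            return number
    return None

# Hypothetical data: hop 1 has one isolated spike (sample 4) that the
# destination does not share, so it is ignored; hop 2 spikes on the same
# samples as the destination (2 and 6).
hop1 = [1, 1, 2, 1, 90, 1, 2, 1]
hop2 = [9, 10, 95, 10, 9, 11, 120, 10]
dest = [20, 21, 110, 22, 20, 21, 135, 20]
print(first_correlated_hop([hop1, hop2, dest]))
```

Comparing each hop against its own median rather than a fixed number keeps a hop that is always slow to answer from being flagged; only spikes that coincide with the destination's spikes count.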

If you get a customer service rep at Cox that knows what they are doing (and that's a big if!), having the intermediate timeline graphs will help them track down the issue, or at least convince them the problem is not with your own PC.

overdrive148 · Feb 18, 2015

Great advice, I'll take a look and see what I find based on your advice. Thank you!

If I find a problem at a hop past my router, I should just tell Cox there's a problem on their side and that I have something that may help them figure out what it is that's downstream from my PC?

lziegenhals · Feb 18, 2015

It depends on whether the customer service rep you talk to even knows what PingPlotter is! I don't have any direct experience with Cox, but the last time I had a similar issue with Time Warner Cable, the CSR knew about PingPlotter and recognized from it that the problem was not in my own network or equipment.

You might tell them that you have some output from PingPlotter that seems to indicate the problem is occurring several hops away (assuming that's what you find), and that you'd be happy to share the graphs with them if they would find it useful. You can even mention the specific host name of the hop where the problem starts, although you have to be careful about that because it's not 100% accurate due to possible asymmetrical routing (that's where the packets coming back take a different route than the packets going out; it can sometimes make PingPlotter output rather ambiguous).

If you get a good CSR, the PingPlotter output may be useful to them. They have similar tools they can use internally, but it is often useful for them to see the output from your direction, too. On the other hand, you may get a CSR that doesn't know PingPlotter from a hole in the ground, or who isn't interested in customers trying to perform their own diagnostics. I've dealt with a few of those before, though as I said, I have no experience with Cox specifically.

By the way, I assume you are seeing this on many/most sites, and not just Google as is shown in the example. It would be good to tell them that, too, and have PingPlotter output for a few of the sites just in case.

Good luck! These intermittent problems are the hardest (and the most frustrating) to track down!

overdrive148 · Feb 20, 2015

So, capturing the ping spikes exactly as they happen yields...

I took some screenshots as the ping spiking was happening. It looks like it's not on my side, but on Cox's or somewhere past that. The red line on the right side is average ping, and you can see where most of it is happening. Blue X's are current ping.

I'll probably call Cox in the next few days and tell them I'm having ping issues and hope they care enough to help.

I also pinged a different site other than Google and lo and behold...

denverpilot · Feb 20, 2015

I wouldn't say you're having ping issues. I'd say you're having intermittent performance issues. At least until you talk to a real tech.

overdrive148 · Feb 25, 2015

Talked to the techs on the phone a couple days ago. They could not understand why I was pinging or what I was trying to fix. They pinged me and said my latency was fine. I got escalated up through the support tiers and they still couldn't figure out what I was trying to tell them.

So they sent a tech out to my place about an hour ago and replaced the modem/router combo and gave me a new one (relatively, it had a different network name on it than the sticker but we figured it out).

The result? Ping spikes are still there, speed is still more than fine. At least in PingPlotter.

The other thing? There are no ping spikes via wireless anymore. Not sure how that would've changed it but now I'm thinking it could be a hardware issue or something? The tech checked the line 3 times over his visit and saw nothing, said he didn't like the Ubee router/modem combos, and gave me a Netgear.

I don't even know where to start to figure out how to deal with the hardware side of it :dunno:

Latest mobo drivers for the network adapters are installed. Cables are new and plugged in securely.

denverpilot · Feb 25, 2015

The problem is, ping spikes in and of themselves are meaningless. The original problem was odd hang-ups and slowness, right?

Many networks significantly lower the priority of ICMP, so packet loss on those ICMP packets is common when pinging through such a network.

But it doesn't mean anything necessarily about the cause of the hang ups and slowness.

What types of programs were "hanging" and is there any chance other things (including malware, hiding from view) were running on that computer at that time? Do the hangs always happen at the same time on all devices on the network? Etc...

overdrive148 · Feb 25, 2015

Yeah, odd hangups. I'd say more jitter than slowness (speed is coming in at around 62 Mbps, over the 50 I pay for).
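To put a number on "more jitter than slowness", one rough approach is to compare the average RTT with how much consecutive samples differ. The sketch below uses the mean absolute difference between neighbouring samples as a simplified stand-in for jitter; the function name and the sample values are made up for illustration.

```python
from statistics import mean

def latency_summary(rtts_ms):
    """Summarise a list of RTT samples (ms): mean, worst case, and a simple
    jitter figure (mean absolute change between consecutive samples)."""
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "mean_ms": mean(rtts_ms),
        "max_ms": max(rtts_ms),
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }

# Hypothetical samples: a steady link versus one with a single spike.
steady = latency_summary([20, 20, 20, 20])
spiky = latency_summary([20, 120, 20, 20])
print(steady)
print(spiky)
```

A speed test averages over several seconds, so a connection can score above its rated speed and still show a large jitter figure like the spiky case here.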

As for other devices, GF reported the Roku hanging on Netflix, but less often than the hanging I've had on my machine.

AFAIK it's malware free (fresh OS reinstall last week), but I'll run another check with Malwarebytes to make sure. I'm also double-checking that the drivers for everything are current, but I could just be barking up a nonexistent tree as to whether it's ping or not.

denverpilot · Feb 25, 2015

To hang long enough to kill the Roku sounds like a complete outage between the Roku and Netflix. There's a significant sized buffer there for it to eat from during latency or minor packet loss. It would have to be a fairly long outage to deplete that buffer unless she was right at the beginning of playback. And that happens to everyone...

docmirror · Feb 25, 2015

It's more than likely that Cox is running a protocol behind their multiplexer that will test everyone in the incoming mux and see whom they can slow down, and who needs their full BW. This is a common occurrence for people who are on a type of switch that uses a modified round-robin algorithm to see who gets serviced next, rather than apportioning via some kind of LRU or MRU method. Basically, it's a way of throttling without affecting your actual 'bandwidth' as provided by speedtest or one of the other speed verification methods.

While you are getting your defined bandwidth in terms of MB/s, some of that can be shaped so that you have short periods of null. Based on your latency graph, it looks like they are slowing you down by successive approximation based on what you are using it for and when you are using it. Most modern mux boxes aren't intelligent enough to discriminate between packet exchanges and prioritize; they just go round and round and round, doling out Pez to each mux participant.
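As a toy model of the round-robin idea described above (not a description of any ISP's actual equipment), assume the mux serves one participant per time slot in a fixed order. The wait a packet sees then depends only on where in the cycle it arrives. The function name and the four-participant example are invented for illustration.

```python
def round_robin_wait(arrival_slot, my_position, participants):
    """Slots a packet waits until its owner's next turn, assuming a strict
    round-robin cycle that serves position (slot % participants) each slot."""
    return (my_position - arrival_slot) % participants

# Participant 0 on a 4-way mux, with a packet arriving in each of 8 slots.
waits = [round_robin_wait(t, my_position=0, participants=4) for t in range(8)]
print(waits)  # [0, 3, 2, 1, 0, 3, 2, 1]
```

In this model every participant still gets the same share of slots over a full cycle, which is why throughput can look normal while individual packets see a repeating ramp in delay.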

Doubt it will get better until the ISP upgrades your switch at the CO to a more powerful product. L3 does it, I know (cuz I built some of it), and I'm pretty sure ATT and TWC do it. Don't know about Cox, but surely AOL does it, cuz their spit is ancient. Really nothing you can do about it, as you're sharing the mux with maybe 100-400 other folks. See if it gets worse around 6PM when all the moms and dads get home and log on.

The other thing it could be is a problem with jabbers on the line, but that's pretty unusual, and once they replaced your router, the jabbers should have ceased. You'll need Wireshark or another packet sniffer to find them; Wireshark itself is free, though some commercial sniffers are pricey.

Hey, I just looked closer and you're routing from coxnet to L3. Bummer, both of them are likely testing to see where they can shed load, and you are at the short end of the pipe. Sorry....

overdrive148 · Mar 2, 2015

Thanks for the attempts to help me out guys, really appreciated.

You guys won't believe it but I fixed the problem.

After the tech came out I started thinking it was computer side because my laptop on wireless and wired would not show the same problems under the same conditions. I had a network card come in the mail today and I popped it in and the problem did not solve itself.

I started going through the network settings and looking for settings that were out of the ordinary between the laptop and my desktop. Uninstalled, reinstalled drivers, etc etc ad nauseam. I looked for a network analyzer program and couldn't find one, but did end up taking a look in the Resource Monitor.

Under Network, in Processes with Network Activity, I found firefox/system/svchost, the usual suspects, plus one process under the name WD, which Google said was Western Digital related. I uninstalled the program from Add/Remove Programs and...

My bet is it was something trying to update or maybe even sync to the internet somewhere. There was no icon in the bottom right showing it was active or anything. Also, it turns out I was reading the PingPlotter output wrong and it was indeed happening at the first hop to my router/modem. What a bizarre problem :dunno:
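For anyone without a network analyzer handy, a rough alternative to the Resource Monitor check above is to look at which process IDs hold open connections. The sketch below parses text in the layout printed by `netstat -ano` on Windows and counts established TCP connections per PID. The function name and the sample addresses and PIDs are invented for illustration; note that it counts connections, not bytes, so it is a coarser view than Resource Monitor's.

```python
from collections import Counter

def established_by_pid(netstat_text):
    """Count ESTABLISHED TCP connections per PID in `netstat -ano` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        # Header rows and UDP rows (no state column) do not match.
        if len(fields) == 5 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            counts[int(fields[4])] += 1
    return counts

# Hypothetical capture; in practice, paste in real `netstat -ano` output.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    192.168.0.10:49701     203.0.113.5:443        ESTABLISHED     4321
  TCP    192.168.0.10:49702     203.0.113.5:443        ESTABLISHED     4321
  TCP    192.168.0.10:49710     198.51.100.7:80        ESTABLISHED     2468
  UDP    0.0.0.0:5353           *:*                                    1234
"""

for pid, connections in established_by_pid(SAMPLE).most_common():
    print(pid, connections)
```

The PIDs can then be matched to program names in Task Manager or with `tasklist`, which is where an unexpected entry would stand out.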

denverpilot · Mar 2, 2015

Could have also been malware hiding in plain sight as Western Digital software. Done any malware sweeps of the machine lately?

The part that sent us down the wrong path was that you felt all machines behaved similarly.

Another thing to check in the future is to look and see if there's a way to view bandwidth utilized and from which machine on the cable company's router.

That machine may have constantly been eating up bandwidth doing something unknown (there are ways to sniff this stuff but you have to have the network set up to do it and another machine known to be good/behaving to do the sniffing with) and adversely affecting the other machines by simply starving them for bandwidth.

Hard to say, but glad you found it.

Was disk activity high on the machine prior to stopping this thing, and got quiet(er) afterward?

overdrive148 · Mar 2, 2015

I ran a full Malwarebytes scan on it last night and it didn't pick up anything. The WD software I did remember installing after my clean OS reinstall (so I could offload backed up files from the HDD). I did feel that they were all having problems (GF reported problems on the Roku, I don't use it nearly as much and don't know how I'd diagnose it the same way). My bad for misreporting.

I only looked in the performance tab a couple times before this because I believed it was Cox's side but I did notice a higher disk activity before I hit the network tab and found the problem.

denverpilot · Mar 4, 2015

No big deal on "misreporting" ... Stuff interacts and sometimes seems related. No biggie.

Had a lady at work convinced the printer was out to lunch. Closed some of her 40 open items and voila.... Her giant PDF could print. Go figure. Ha.

docmirror · Mar 4, 2015

That's a weird one. Glad you found it and it was a local fix. Does sort of sound like the WD driver was hunting for an update. If it was always on the first hop, that's the key. Well done.
