D&C GLug - Home Page


Re: [LUG] default ubuntu mirrors


On 18/04/2020 18:20, Michael Everitt wrote:
> Yeah, I can see all the signs point to something automatic that's supposed
> to be "smart" being "dumb" somehow .. and this is where the KISS principle
> always wins out, as I'm sure you know. Each layer of complexity *will* work
> *flawlessly* in isolation, but as you add the layers up, you add potential
> for a new "edge case" to emerge at each new 'stage'. I don't need to tell
> you (if even I could) how to 'drill down' as you seem to have a good idea
> where you're going (where many of us would struggle to know where to START)
> .. so all I can do is wish you 'Good luck' on your investigations.... :p

You're dead right, of course, and the depth of complexity here is the issue (my "home" network is pretty full-on, as I use it for testing everything that goes into production). I've sniffed through every layer so far looking for the smoking gun but can't isolate the fault to any one thing. Last week everything was the same and worked flawlessly; the major changes since then are LACP, some VLAN changes and an entirely new switch. But they're all known quantities. I really want to blame that bit... but the evidence doesn't support it.

I know I've said by now that I _think_ it's about three different things, but VBox's handling of bridged VMs has historically been flaky: for years there were unofficial patches to work around bridged VMs not getting DHCP leases over a Wi-Fi host adaptor, for example. There are outstanding VBox bugs from years ago that look a lot like my issue specifically. Perhaps it's a regression (6.1.6 also came out in the last week)? The thing is, I'm picking on VBox here because it's where the fault first showed itself; there are some similar, but subtly different, issues with my other hypervisors too. Gah!

> What happens, out of interest, if you dump all the geo- and round-robin
> crap, and hard-code in some known-good stuff? Is that gonna help/hinder any
> stable configs?

I can't: that's how the top-level mirrors work. As I said, if I manually edit individual affected VMs to point at specific repos instead of the load balancers, DNS suddenly stops timing out and they work. I tore my DNS to bits looking for a caching error, but there isn't one. Thing is, I'm not doing that, for obvious reasons. Partly because it would suck as a sidestep rather than an actual fix. Mostly, however, because VBox is endemic among my clients for work stuff, much as I'd like them to use a more grown-up solution like KVM or even Xen. It's free, simple, cross-platform and "just works", plus it's the usual hypervisor that the more advanced automation/orchestration/CI tools integrate with (think Vagrant, Boxes, Jenkins, etc.).

Nine out of ten of the subcontracted devs they bring in for specific jobs rock up with Mac or Linux laptops and everything prototyped in VBox. So sucky or not, it has to work. And it has to work on a standard, "proper" managed network, which automatically means a server with bonded interfaces, VLANs and split-horizon DNS.
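For anyone wanting to reproduce the sidestep on a test VM (not as a fix), here's a rough Python sketch of the hand-edit described above: it rewrites active `deb`/`deb-src` lines in a sources.list so they point at one fixed mirror. Which URLs count as "load-balanced" is my assumption: `archive.ubuntu.com` plus its country subdomains, and apt's `mirror://mirrors.ubuntu.com/mirrors.txt` geo-selected list. The helper names and `mirror.example.net` are hypothetical placeholders.

```python
from urllib.parse import urlsplit

def is_load_balanced(token: str) -> bool:
    """True for URLs assumed to go via Ubuntu's round-robin or geo-selected entry points."""
    parts = urlsplit(token)
    host = parts.hostname or ""  # urlsplit lower-cases the hostname
    if parts.scheme in ("http", "https"):
        # archive.ubuntu.com and country mirrors such as gb.archive.ubuntu.com
        return host == "archive.ubuntu.com" or host.endswith(".archive.ubuntu.com")
    if parts.scheme == "mirror":
        # apt's mirror:// method, e.g. mirror://mirrors.ubuntu.com/mirrors.txt
        return host == "mirrors.ubuntu.com"
    return False

def pin_mirror(sources_text: str, mirror_url: str) -> str:
    """Point active deb/deb-src lines at one fixed mirror; comments and other lines are untouched."""
    out = []
    for line in sources_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("deb", "deb-src"):
            # Swap only the URL field(s); options, suite and components stay as they were
            fields = [mirror_url if is_load_balanced(f) else f for f in fields]
            line = " ".join(fields)
        out.append(line)
    trailing = "\n" if sources_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Diffing resolver timings before and after this on one VM keeps the comparison honest while the real cause is hunted down; security.ubuntu.com is deliberately left alone.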

Holy crap, I've just thought of something: if the load balancers are doing reverse lookups on me as they geolocate my VMs, then my DNS server may feed them back different results depending on how they "see" my DNS.
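To make that hypothesis concrete, here's a toy Python model of split-horizon views, loosely following BIND's behaviour where views are checked in order and the first one whose client match list contains the source address answers. Whether the mirror load balancers do reverse lookups at all is unconfirmed; the addresses come from the documentation ranges and the hostnames, `View` and `resolve` are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field

@dataclass
class View:
    """One split-horizon view: which clients it serves and the answers it gives."""
    name: str
    match_clients: list  # ipaddress network objects
    records: dict = field(default_factory=dict)  # (qname, rtype) -> list of answers

def resolve(views, client_ip, qname, rtype):
    """Answer from the first view whose match list contains the client (first match wins)."""
    addr = ipaddress.ip_address(client_ip)
    key = (qname.lower().rstrip("."), rtype.upper())
    for view in views:
        if any(addr in net for net in view.match_clients):
            return view.name, view.records.get(key, [])
    return None, []  # no view matched this client

# Hypothetical data: the PTR for one public address, answered two different ways.
PUBLIC_IP = "203.0.113.10"
PTR_NAME = ipaddress.ip_address(PUBLIC_IP).reverse_pointer  # "10.113.0.203.in-addr.arpa"

VIEWS = [
    View("internal",
         [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("192.168.0.0/16")],
         {(PTR_NAME, "PTR"): ["gw.home.lan."]}),
    View("external",
         [ipaddress.ip_network("0.0.0.0/0")],
         {(PTR_NAME, "PTR"): ["vpn.example.net."]}),
]
```

Asking for `PTR_NAME` from 192.168.1.50 lands in "internal" while the same question from an outside address lands in "external", which is exactly the kind of inconsistency worth checking for if something upstream is doing the asking.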

I know full well that at this point I'll need tcpdump and a switch port mirror or two dumping full PCAPs to get to the bottom of this. I might even make my first post to r/sysadmin at this rate; I'm not above asking for help by any means!
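Once the PCAPs exist, a first pass could be as simple as listing DNS queries that never got a reply. Below is a rough, stdlib-only Python sketch, assuming a classic microsecond libpcap file (e.g. from something like `tcpdump -i <iface> -w dns.pcap port 53`, where `<iface>` is a placeholder), Ethernet framing with at most one 802.1Q tag, and plain IPv4/UDP DNS. It does no fragment reassembly and ignores IPv6, TCP and pcapng; the function names are hypothetical.

```python
import ipaddress
import struct

ETH_VLAN = 0x8100  # 802.1Q tag, e.g. seen on a mirrored trunk port
ETH_IPV4 = 0x0800
UDP = 17
DNS_PORT = 53

def _parse_dns_udp(frame: bytes):
    """Return (src, sport, dst, dport, dns_id, is_response) for IPv4/UDP frames, else None."""
    if len(frame) < 14:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    offset = 14
    if ethertype == ETH_VLAN:  # skip a single VLAN tag; QinQ is not handled
        if len(frame) < 18:
            return None
        ethertype = struct.unpack("!H", frame[16:18])[0]
        offset = 18
    if ethertype != ETH_IPV4:
        return None
    ip = frame[offset:]
    if len(ip) < 20 or ip[9] != UDP:
        return None
    udp = ip[(ip[0] & 0x0F) * 4:]  # IHL is counted in 32-bit words
    if len(udp) < 12:  # 8-byte UDP header + DNS ID and flags
        return None
    sport, dport = struct.unpack("!HH", udp[:4])
    dns_id, flags = struct.unpack("!HH", udp[8:12])
    src = str(ipaddress.IPv4Address(ip[12:16]))
    dst = str(ipaddress.IPv4Address(ip[16:20]))
    return src, sport, dst, dport, dns_id, bool(flags & 0x8000)  # QR bit set = response

def unanswered_dns_queries(pcap: bytes):
    """List (timestamp, client, server, dns_id) for queries with no matching reply in the capture."""
    if pcap[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif pcap[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("expected a classic microsecond libpcap file (not pcapng)")
    if len(pcap) < 24 or struct.unpack(endian + "I", pcap[20:24])[0] != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")
    pending = {}
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap):
        ts_sec, ts_usec, incl_len, _ = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        parsed = _parse_dns_udp(frame)
        if parsed is None:
            continue
        src, sport, dst, dport, dns_id, is_response = parsed
        if not is_response and dport == DNS_PORT:
            pending[(src, sport, dst, dns_id)] = ts_sec + ts_usec / 1e6
        elif is_response and sport == DNS_PORT:
            # A reply matches on client address/port, server address and DNS ID
            pending.pop((dst, dport, src, dns_id), None)
    return sorted((ts, client, server, dns_id)
                  for (client, _port, server, dns_id), ts in pending.items())
```

Running it over captures from both sides of the bond/VLAN boundary would at least show where queries go missing; Wireshark's `dns.flags.response` filters do the same job interactively.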

--
The Mailing List for the Devon & Cornwall LUG
https://mailman.dcglug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq