[ Date Index ] [ Thread Index ] [ <= Previous by date / thread ] [ Next by date / thread => ]

Re: [LUG] default ubuntu mirrors

To: list@xxxxxxxxxxxxx
Subject: Re: [LUG] default ubuntu mirrors
From: Michael Everitt <michael@xxxxxxxxxx>
Date: Sat, 18 Apr 2020 18:20:56 +0100
Autocrypt: addr=michael@xxxxxxxxxx; keydata= xsFNBFngC8gBEAC8/nQZrVrr8v0kaD4OLw8UftKfPQFEMGY/rnFA81M9IvdyPP8/8u/+9AGc DEN3i/LRvW0KUBdKIngcUY/p1M/sJqBspMOBaoOLp6K53/2uxcGXw62TulQJU+7a37Jukv2r HNSyZzM6II0myConmJa8ja1HfsiVoqDrqNigBF+Sts1kqG4xg8YeyOl1Tk+LZwC+ukzzutE9 pbpIL2snu5I6a6RNi9DtbB9FZKzkbXx8TlpMXrcorNryOLQHPRw6tir5Z8kpetiJgoEpKGBX botDOWLVW+s9XnwPzAFmL03gH+3reY+LfrQWQTDphfZIp75caZQUicQHpc1NUr+8bLr3n79A FCPY3CfWriGn17aqaaXDFfeYPJIlH8UmOXI41JqR47C5eYFbocA8A4k7cGVAdKJFWLy51165 dt7qZyvUQc/olzrZOrvoiWXA8ELg7pqxxObM4kl0502IHz9kb6Lt712HvfjH5yAP8zTYpetn sCPR9aVVSQsRgluNrQFlKpVmUXbeBLjw05UBEunS6prDwXOyZdn7t03LSOlK2nBGM+gtxg8l /0Nb1saYMGGN8qtO4RLFRiRBc20kNz01cC89PKRIXYlW9dRZNH1zebIUCAg+S4hSmmV4uvaZ XRADb2G+ZZ2jj9cNTTnI+X1/a19S8XjBZ4z+9+Hty4nhoB6fawARAQABzSRNaWNoYWVsIEV2 ZXJpdHQgPG1pY2hhZWxAMmUwY2VyLm5ldD7CwYAEEwEIACoCGwMFCwkIBwMFFQoJCAsFFgID AQACHgECF4ACGQEFAl2kr4IFCQeHCrkACgkQY+fGefTUMd3b5A/+JFNwCbdWUN9Ev2xtC6Fr QsmUSJUjMOUcShEr5yhgc9BvsmaofXUcnNtupiunRCISCcIJmK5nh0DPiMO0d2eUj4xJBKxT raRGU7QS19nKa+X5gZW87fz6FLPL6gFiR94VLTetbU2Pkwh+zW2Fz9Pa3U5qu+GWsLU4JsSE 5f8nrNYIrQEUXwMAaa7pBI1XKdoTle38fn7MTOYzp67K56ol7O+3gz+/8Toa+p3veuppJt1O gRur4rFpjrdyS5oKihZ8PXLLlcgN5ZYn/UnVwTnCtk/w39RY3u3M2yDmq6iEXT9LpLnfaQWs RDj6c8l1EAv0t7yxYVMAQykgIKcrLu8cky7T8NqVhyf5IQCMDB7XWsORBHIpfMs3y1NKODkc rAPbpIkHnzw1LYEOeMb15vwW3oqoeRu4jFkxTZTCfhxfg8uwiQmqbVUWLgYeIBabReTuwLHG 6enttTcothRO/D1T70kIoIcMAVL9UK9Sr/K0aRBbT5+pmdVA6bX04cflFWlT3IZlXlxK5OB4 Dt8NcxneZdBSUdt2WDObkO6XuFSuwJIxh/mfyVRWqfw4HGcpZ0CGj7f5Zi/BTn90Ol/FTSgM MNhpXovWX61w6EQe/Ak/qP4tjsdN5qiME3m1rmX0oTkt9XGogcrdPH/7r9R2Dtqv3jAtxb8q zBF6EbW89ch7GA7OwU0EWeALyAEQANjHmiRfZmup4Kk6IMJXtnR3wM4pWKVkoLexno/QD84n TplX8qFyAbHd1DDUBKxZEWTZP1NL8stMfQM7N89IKjqORPfL3ojd1Tn/IPb4U38Xx3FU8+ic F/YIMzv/4kT71B9zSnxL6SzZ/IJDrzfXE8odJPVa49vLwaQ9x1lhyZVOcFa6RVKROERVwzqT A2gEAD5AK88V5nOgrZ0H7qZKWT3vpqZQ6BIxZsNReny6f9PWvitJve8WVNHhv7WFqiLUaUx5 2PHfdsBRzcdUXWMiCPMtpAgzcKfhOqBF/wz+VrKShs4H/3/c3Iutb9fdsKV4qs6BPQ/qrhaP b8+mMJ/DQn6jlLPnZ8GfcbUFzhwVOZWYtGr5INx2ENZQK0pFB85TuJOQLJRLuQF6hDinvTKO Kp2F/xyt8UBD8BCY4tk/EkfSAz3wCGI0iuojMOy0+LoAsFXpjR5g0rlnWQ47W1TEXTbbFrtG q4tOBUPvHAaL0B0Mq1VWFXbrms5QYqF+3TfIKNe3rntgt4U6/MSYQeHV+/231QGstS3glsmh q9eZ35AB+MF6ddbElOPNhsEZxBC+U3mA4Bx7XfhcomJt1neU+TSoDzOV6cyIZuDLy5ljIZgu Dpy3OYGY9mkOyFotluRZcUkrtFawtKuoaDQdKwrM9RgSgway1/tN4WazEQtE3mWfABEBAAHC wWUEGAEIAA8CGwwFAl2kr5QFCQeHCswACgkQY+fGefTUMd3dgxAAoHZaetOnWaxPy8q8IRq9 JMXWV36USHdJ9W1MxFAKZkXnz4UVJ5boff+PszZYdA62pQgX83wi68pJoQOQLfAHmd0ZsA0s hukdlPaB+PnWY9qdP8BNFA9vuqH9FBx9Ckv7tUGHzZL7ihlZ3mOstAVRMoOk/BxwsMGQLgKb sq/iux8N+sBtBbOTyYKco1eSVVmPh8gYxkMh6WGmOQoYAQi031GQVEN3GBSNT870c0xMOK3i JMkI/TfpgMWPiCZiiL6kUvlndoLcDOs+zFjpOKEUqphwUQApkf+KWmPcdUsh3ZQKxCwn+Kvl VMp5YW/qil4SkF8BJethblSAjlik55frOz0sz6DUs6AWQSG9vFkqIPMSsEPOkZZrqsIxVikP q8iROwcnG6ywDmuL73LNTvypPK5cZouuDxQawcUfzon4pXWzfIrkz/6i3X65vmhQtVfKtb7/ hi3wSeX5d7jiU+ZOLX1BN153pLoODcmzqb92T+t5gaKWYMaEExs+h13GGYo8wqmuqypxMosm WV0kvAsygZF42pNIVw4/JgEUfhXeSRbBiaegF4uYM8b/Gn8OoII37i7nOrAMaXfvaChmt6LD X+TUtZHGSvAq8LY/7FuvricJF9V+o4I41h7zH+7qIqvBc1nGP6mgr4c3U7H3XpRtuaHwojfn rEbAnxvLsIsb5oE=
Delivered-to: dclug@xxxxxxxxxxxxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=dcglug.org.uk; s=1586423162; h=Sender:Content-Type:Reply-To:List-Subscribe: List-Help:List-Post:List-Unsubscribe:List-Id:Subject:In-Reply-To:MIME-Version :Date:Message-ID:From:References:To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner:List-Archive; bh=zFpOkkG98ISbVBKzedtnCaSaFrC/1jQ1PDk6b0kFML0=; b=hpaOmhnlCI2xZ6rG8an2B/Ncse P7V/MUcb5XIwlKRb2DLmvgRbtZT/qKuL4Gc2DT+LIvzLlYkQhv+fX6acwTJjoGS+kWS7iuYKzlX/e VdIUIxegvwNO8XBHSA7raXeeh7v7QJ6ylSOMXRSyQCmGPCwvTPqwmr7hB9LDO3Sj1uQA=;
Openpgp: preference=signencrypt

On 18/04/20 18:12, comrade meowski wrote:
> On 18/04/2020 16:06, Michael Everitt wrote:
>
>> Hmm, but how is that affecting routing . MAC is physical layer, that should
>> theoretically make (virtually!) no difference? What problem is that causing
>> higher up the stack??
>
> I'll be honest chief, I wasted about 6 hours on this last night before -
> for the first time in years - giving up before I went crazy. Sometimes
> you just have to know when to move on to more pressing issues.
> Fortunately this is one of those things that got caught early in staging
> and so nothing in production is effected - I am going to have to fix it
> before long though, just got to smash the job queue a bit first.
>
> More eyes on is going to be the best way to figure out what's wrong and
> I've already got a few other people looking it over (not all the configs
> are mine so it's entirely possible it's nothing I've personally done) so
> I'll outline the issue as briefly as I can as replicated easily enough on
> my two main home workstations which is where I also first noticed the
> problem. I'll omit a lot of stuff for brevity.
>
> The stuff:
>
> Ubuntu 19.10 + 20.04 host machines
> Both have LACP ethernet bonds across multiple interfaces + separate
> management LAN, etc
> Enterprise switch has matching LAG group + VLANs tagged through
> Virtualbox hypervisor on both machines, lots of different VMs (all are
> bridged to the LACP bond0 but can be switched to other interfaces or
> tunnels)
> Both machines are multihomed with multiple gateways/routes/DNS available
> Also lots of egress SSH and VPN tunnels in use on both
>
> The symptoms:
>
> Fire up random VM on either. _Everything_ network related works fine in
> VMs: internet browsing, wget files, git clone, etc. What does NOT work is
> the built in operating system package management tools on _some_ distros.
> That's literally the only thing that doesn't work - DNS timeouts
> everywhere but only for some of the configured repos. Seen so far on
> Arch, Mint, Ubuntu and Debian. Slack, Fedora, RHEL and Ubuntu 20.04
> specifically and Windows are 'immune', haven't tested Macs or BSD or all
> my Linux VMs yet (too many of them just for a start).
>
> Digging:
>
> Switching between my egress methods whilst keeping the VMs in bridged
> mode has almost no discernible effect that can be presumed statistically
> significant despite this changing their VLAN ID, network/subnet, gateway,
> DNS and route. Sometimes a slightly different apt mirror times out but
> still results in general failure. _Some_ repos still work instantly every
> time - including all PPAs and random little personal repos (including my
> local ones). Flipping _any_ effected VM from bridged to NAT mode however
> instantly fixes them.
>
> Smoking gun?:
>
> On my system any traffic not specifically guided out otherwise "falls
> through" into my admin VLAN which is automatically dropped into a
> permanent Wireguard tunnel to my VPN provider. So any "naive" traffic,
> which the NAT'd VMs are included in, gets routed out through my VPN
> provider and not my local ISP connection - this comes obviously enough
> with it's own separate gateway and physically emerges elsewhere (but
> still in the UK). Any effected VM works perfectly egressing like this! At
> least I can rule out that all these mirrors are somehow down, I didn't
> think that could be possible after all.
>
> DNS Trouble:
>
> I feel the bug is here specifically, but may be wrong. Generally, every
> single network segment and VLAN is handled specifically by my own local
> DNS server. There is some complexity handling with it's upstream DNS
> which varies depending on the VLAN feeding it requests (split horizon DNS
> obviously so internal resources are available to everything but without
> necessarily leaking DNS requests through my ISP for private/work
> systems). Obviously I can change the VM client's DNS at will - and have -
> but this doesn't change anything!
>
> Sanity checking:
>
> I've remoted into some client networks and double checked similar setups
> in testing there - same results. But then they are nearly all pretty
> geographically local. Speaking of which...
>
> Tentative diagnosis:
>
> I *think* what is happening is something like this: the load balancers at
> the other end are messing me up somehow. It looks like DNS but clearly
> isn't that simple. Ubuntu/Debian/Mint all use geographically distributed
> load balancers on their big mirrors that not only pick a round robin DNS
> mirror to send back to the client but then that content is served from a
> specific box situated "close by" in network cost terms. THAT is the bit
> that somehow triggers the... fault? Bug? I'm not even quite sure how to
> class it. I'm not ruling out a configuration misstep either - it could
> even be a freakish combination of all of them.
>
> If I leave my VMs where they should be: on the production LACP bond and
> VLAN, behind my DNS server and routed through my normal gateway via my
> ISP I can make them work, it's just a pain and obviously it shouldn't
> need any adjustment whatsoever. Some VMs continue working flawlessly on
> default settings pulling from the _same damn mirrors_ (Ubuntu 20.04 is
> the stand out weird one here). On effected Ubuntu/Debian/Mint/Arch VMs if
> I manually edit their repositories away from top level mirrors with load
> balancers and choose specific servers, they immediately start working.
>
> So to be clear, there doesn't seem to be _anything_ wrong with my setup,
> even though I've changed a lot of stuff in the last week or so (the
> enterprise switches are new as are the bonds and some VLANs). I've also
> lived and breathed this stuff for years - I know how to configure all
> this stuff in my sleep. I also log _everything_ and the logs say nothing.
>
> The big fat glitch seems to be how certain open source top level mirrors
> firstly feed back a different round robin DNS to the client, guiding it
> towards say bytemark or mirrors.ac.uk. Then,_depending on the IP of that
> client, a specific server instance is chosen via geolocation. All of my
> normal VM traffic egresses out through my ISP so can be geolocated to
> down here in the South West pretty accurately. That hand off seems to be
> what is breaking in certain circumstances, for certain VMs. If I NAT my
> VM out it goes through the VPN tunnel and emerges in a London datacenter
> somewhere - and that gets perfect results. That seems to be the core of
> the problem. Geolocated by a top level load balancing FLOSS mirror to the
> South West and are a VM bridged through a LACP VLAN behind a local DNS
> instance? FAIL. Same VM geolocated by same mirrors to a London
> datacenter? PASS.
>
> Is it the package managers logic that is somehow wrong? After all, any
> effected VM can still access Google/Amazon/everything else on the entire
> internet just fine and they're most definitely geographically load
> balanced systems too! It's just the damned package managers. I'd nearly
> narrowed it down to blaming some weird Debian specific glitch (note the
> trashed VMs are all Debian flavours whilst Windows and Redhat derivative
> Linux continue without issues... until the Arch VM turned up broken as
> well). I really wanted to blame the newest stuff first - LACP
> specifically. But literally everything else is working perfectly. And the
> workaround VMs NAT'd out over the VPN that do resume normal behaviour?
> That traffic is passing over the very same LACP bond. Doh!
>
> Honestly mindblown by this one ¯\_(ツ)_/¯
>
> It's on pause for now either way - please bear in mind that I've actually
> left out vast amounts of technical detail here as well, it would take
> months to explain the whole lot fully! My intuition is that something
> like this *has* to be my fault really. There's an unforeseen glitch or
> loop or rogue cache somewhere and eventually I'll find it and curse myself.
>
> On with some easier jobs for now. Like solving world hunger or proving P=NP.
>
Yeah, I can see all the signs point to something automatic that's supposed
to be "smart" being "dumb" somehow .. and this is where the KISS principle
always wins out, as I'm sure you know. Each layer of complexity *will* work
*flawlessly* in isolation, but as you add the layers up, you add potential
for a new "edge case" to emerge at each new 'stage'. I don't need to tell
you (if even I could) how to 'drill down' as you seem to have a good idea
where you're going (where many of us would struggle to know where to START)
.. so all I can do is wish you 'Good luck' on your investigations.... :p

What happens, out of interest, if you dump all the geo- and round-robin
crap, and hard-code in some known-good stuff? Is that gonna help/hinder any
stable configs?

Attachment: signature.asc
Description: OpenPGP digital signature

-- 
The Mailing List for the Devon & Cornwall LUG
https://mailman.dcglug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq

Follow-up: Re: [LUG] default ubuntu mirrors
- From: comrade meowski

References:
- [LUG] default ubuntu mirrors
  - From: comrade meowski
- Re: [LUG] default ubuntu mirrors
  - From: comrade meowski
- Re: [LUG] default ubuntu mirrors
  - From: comrade meowski
- Re: [LUG] default ubuntu mirrors
  - From: Michael Everitt
- Re: [LUG] default ubuntu mirrors
  - From: comrade meowski

Prev by Date: Re: [LUG] default ubuntu mirrors
Next by Date: Re: [LUG] default ubuntu mirrors
Previous by thread: Re: [LUG] default ubuntu mirrors
Next by thread: Re: [LUG] default ubuntu mirrors
Index(es):
- Date
- Thread