The NSA on the Risks of Exposing Location Data

The NSA has issued an advisory on the risks of location data.

Mitigations reduce, but do not eliminate, location tracking risks in mobile devices. Most users rely on features disabled by such mitigations, making such safeguards impractical. Users should be aware of these risks and take action based on their specific situation and risk tolerance. When location exposure could be detrimental to a mission, users should prioritize mission risk and apply location tracking mitigations to the greatest extent possible. While the guidance in this document may be useful to a wide range of users, it is intended primarily for NSS/DoD system users.

The document provides a list of mitigation strategies, including turning things off:

If it is critical that location is not revealed for a particular mission, consider the following recommendations:

  • Determine a non-sensitive location where devices with wireless capabilities can be secured prior to the start of any activities. Ensure that the mission site cannot be predicted from this location.
  • Leave all devices with any wireless capabilities (including personal devices) at this non-sensitive location. Turning off the device may not be sufficient if a device has been compromised.
  • For mission transportation, use vehicles without built-in wireless communication capabilities, or turn off the capabilities, if possible.

Of course, turning off your wireless devices is itself a signal that something is going on. It’s hard to be clandestine in our always connected world.


Posted on August 6, 2020 at 12:15 PM • 60 Comments

Comments

wiredog August 6, 2020 12:20 PM

“turning off your wireless devices is itself a signal that something is going on”
Unless you’re in the habit of turning them off for several hours a day when they’re not in use.

Mike August 6, 2020 1:35 PM

Airplane mode normally.
Faraday pouch when you want to be certain.
Only enable BT, Wi-Fi and GPS specifically when needed.
Disable any unneeded capabilities on an app-by-app basis.

Weather August 6, 2020 1:50 PM

Would a traffic simulator be useful, to give the impression that you are still with the phone?

stine August 6, 2020 2:06 PM

Well, that means they can’t use General Motors vehicles with OnStar. That’s going to hurt.

echo August 6, 2020 3:38 PM

This is pretty similar to GCHQ’s advice, just a bit longer because of the detailed explanations for use with more invasive technology.

Some US websites are skirting and/or breaking GDPR in their popup permissions. I caught two doing this over the past couple of days, including mandatory tracking of location data and fingerprinting, hidden behind weasel words that try to define these as “essential” when they are not. Looking at you, Ars Technica and Ziff Davis. My human rights are not to be stolen, nor are they for sale. Also, no US-side T&Cs or EULAs or contracts or whatever have precedence over the law on my side of the Atlantic, and any breach is actionable in UK courts and likely in the EU as well.

Weather August 6, 2020 4:16 PM

@echo re Ars Technica
For me it takes ten minutes to load, then freezes when scrolling down the page or crashes the browser.
Firefox is fully updated.
I visit every now and again to see if it works again.

echo August 6, 2020 4:52 PM

@Weather

I use an ad blocker. Honestly, the internet is unusable without one. Aside from getting rid of the visual clutter, I think the threat blocker also blocks some of the frameworks and other stuff that gets loaded, so pages run quicker too. Personally, I never see web pages which justify the bloat.

I have also installed a free ramdisk which is set to be thrown away at shutdown. (Of the ones I reviewed it’s fast enough, and the 1GB limit is more than adequate.) This speeds things up a bit, and as and when I get an SSD it will save wear. On Linux (and maybe OS X?) you can fiddle with a few things on the command line and create a ramdisk at boot for free. It’s also extra handy if you’re using a bootable USB, as those can be quite slow. Firefox is relatively easy to set up for a ramdisk, as are Chrome and Opera. I have no idea about the others, but you can find the instructions online.

I think slow pings on some linked resources can cause a page load to stall, so you may have something going on there. Maybe run a memory check too.

Anders August 6, 2020 6:02 PM

The level of damage depends…

In Donbass, Russia targets Ukrainian soldiers by, among other things, their cellphones. They have EW mobile stations there that track and ID Ukrainian soldiers’ cellphones and then direct fire on them. There, your life depends on whether you can be tracked or not.

Clive Robinson August 6, 2020 7:54 PM

@ Bruce, ALL,

Of course, turning off your wireless devices is itself a signal that something is going on. It’s hard to be clandestine in our always connected world.

It’s funny, but I’ve posted a couple of times already today about these very issues…

And the two watchwords for dealing with these issues are “Awareness” and “Thoughtfulness”.

That is, you need to be aware of any potential adversary’s “Methods and Sources” and, by thinking, adapt your fieldcraft as appropriate.

Due to the use of “Technical Support” “test harnesses” added to mobile phones by phone “Service Suppliers”, which first publicly came to light with the CarrierIQ debacle (prior to the Ed Snowden trove revelations), we know that virtually every user action was sent in “plaintext” across the internet to CarrierIQ’s servers. This means anybody sitting on a router upstream of CarrierIQ could not just see the plaintext going by; they could build it into databases of “behaviours”. Whilst doing this “covertly” would be the IC’s preferred way, the fact that this is what CarrierIQ were doing made it a “Third Party Record” that anyone holding a National Security Letter could access, which means not just LEOs but all sorts of “others”.

Thus a thoughtful person would not “turn off the phone”. Whilst they could drop it into some form of Faraday shield, the sudden loss of comms would be noted in the network service provider’s records, which is even more suspicious. They might go into a known “dead spot” or “RF black hole”, but these tend to be unreliable or get filled in unexpectedly. Thus a more thoughtful person would experiment to find out what would most quickly drain their phone battery flat. One such method could be playing a long YouTube video; older phones tend to go from full charge to off in less than an hour, while newer designs can take longer. The point is that a thoughtful person would find this out well in advance, and from time to time establish the “habit”.

However, the problem with that NSA advisory for most people is that they are not “going in clean/cold”, as their places of residence and work are usually known. Which renders,

    “Determine a non-sensitive location where devices with wireless capabilities can be secured prior to the start of any activities. Ensure that the mission site cannot be predicted from this location.”

More than a little moot for anything other than off-character / specified activities planned well in advance, supported by an existing “habit” as a “gateway behaviour”.

It means you have to, in effect, establish a “legend” within your own real identity, and this is one of the hardest forms of fieldcraft because the most dangerous periods are the “transitions”. Trying to avoid them can mean permanently “living the legend”, which is what “deep cover” or “No Official Cover” (NOC) operatives do. For most people it can be stressful to the point of breakdown if care is not taken.

In the past, before we all carried tracking beacons, it was most often only agent/handler meetings where “transitions” would happen, which minimised the stress by allowing “decompression” time etc.

That is, the “case officer” / “resident” at a diplomatic mission would assume that they were under continuous surveillance for their day-to-day activities outside the mission, but not inside amongst their “read in” colleagues. Thus a case officer’s first step on going out would be to “put on” their legend and transition into “character”[1] inside the mission. The second thing to do once outside is “to lose the hounds” as naturally as possible before meeting the agent[2].

As a modern-day individual you tend to carry your own “hound” in your pocket/bag etc., and this behaviour is pushed on us by social / peer / employment pressure, so a thoughtful person needs to establish “habits” against such pressure. One trick is “having a meeting”, especially “a meeting with oneself”. That is, it’s expected that phones will be turned off or put in silent mode for a meeting. Leaving them in a locked desk drawer in silent mode would thus be quite acceptable for time periods when in “meetings”, even setting a voicemail message to that effect. So how do you get meetings when you need them? Using “lunchtime” as an excuse to go into “meeting mode” is acceptable to most, but a trend in more recent times has been to put a meeting with yourself in your calendar etc. so you can get undisturbed time for concentration. Thus a thoughtful person would see advantages in developing such habits.

But at lunchtime, sometimes leave your phone in your locked desk drawer in the office and sometimes take it with you; also get another charger and sometimes put the phone on charge when it’s in your desk drawer. Likewise, don’t always go out for lunch; the habits you develop need to be like a comfort blanket, not a noose.

Another habit supporting this is to sometimes pay cash and sometimes pay by card when you go out. But always make a note of your purchases in your diary, often but not always putting the receipts in as well, until the end of the week etc., when you more formally write up your expenditure as personal finances / expenses. Then sling most, but not all, personal receipts “in a shoe box”. Obviously expense receipts go off to finance.

The longer you have these habits, the more people will believe them if you are ever challenged in some way, but also the better organised your life will be in general so there is a plus point for them without any clandestine reason.

Those spouses with a lot more than most to lose if they get caught cheating develop such habits, as became clear when it kind of went wrong for a number of them in South Korea recently with COVID contact tracing apps. It was unexpected, there was a lot of peer pressure, and quite a few did not think it through far enough before enabling such apps on their phones.

This tells you one important fact in life,

    You have to expect the unexpected.

Which in turn means you have to know,

    How to react both safely and minimally when it happens almost by instinct.

This requires a certain mindset of thinking ahead. The reason it has to be done “minimally” is not only that it is less obvious, but that the closer to the truth it is, the harder it will be for others to challenge or find fault with.

Remember the first rule of lying is to,

    Never invent, always tell some truth, just tell “your truth” from a different perspective.

Again, even when telling the truth, say the minimum and do it in an ambiguous fashion, and wherever possible turn an answer into a question for your interlocutor. Thus preface your response to an accusation with “Why would I…” or “Of what possible benefit…” or “Do you really think they…” etc. Oh, and don’t say daft things like “I don’t remember”, because that is almost always a lie and can be shown as such, even when it’s not.

There are a whole load more things people should think about and plan for. But all I will say is that it’s not my place to advise cheating spouses what to do (unless they are cute and cheating with me 😉)

[1] Case Officers rarely if ever use their real identity for meetings with agents[2]. Even when the agent knows the Case Officer is the “commercial attaché” at the foreign country’s embassy, that in and of itself is almost always a “legend” used as an “Official Cover” for the real, more clandestine activities.

[2] Agents are, in the hierarchy of intelligence gathering, the lowest form of animal life. They are after all traitors to the country they are citizens of, and in all probability will come to an unfortunate end. The reasons they become traitors are many, but most come under one of the “Money, Ideology, Compromise, Ego” (MICE) headings. The next level up are “Contractors”, called in to do certain kinds of more specialist technical work. They are essentially “guns for hire”, have “No Official Cover” (NOC), and may even be citizens of the country being spied upon, such as criminals of certain types. As a rule, contractors are told nothing other than what relates to a specific technical task that has to be carried out at a location they might not even be aware of (i.e. they may get “black cabbed” in the back of a fully blacked-out van etc. to and from the location). Contractors are also frequently “day trippers” or “out of towners”, that is, they are not local by a long way to the location they are to work in. Whilst they may meet an agent or a NOC operative, they rarely get to see those who have Official Cover; this is so that “Arms Length Management with Official Deniability” (ALMOD) can be maintained. Often more regular Contractors will not get paid directly; they will get “Pension Advice”, that is, be told certain, shall we call it, “insider trader” information to make investments with. Or, if they have a solid legend as a small company etc., indirect purchases will be made for consultation and similar services. However, some contractors are actually “saps”, that is, business owners / travelers who get persuaded to “keep their eyes open or closed” for “patriotic reasons”[3]. Promises may be made to them about their safety, but these are mostly worthless; however, their business might get government contracts and loans and all sorts of “Business Development” funds etc. to, in effect, keep them going and the intel flowing back.
Which leaves the more interesting “baboons” climbing the IC agency tree: those with Official Cover, and those who would normally be under Official Cover but are in fact “illegals” Not under Official Cover (NOCs). These people are employed directly by the national security services (IC) of the nation acquiring the intel. Official Cover is effectively “Diplomatic Immunity”; NOC is “Pray to the gods with your ear to the ground” and is often “deep cover” established over many years. Those with Official Cover are “Case Officers” and above, using such titles and activities as “commercial attaché” etc., and to many diplomatic mission staff that is exactly what they are, often with only one senior mission officer knowing their real role, which is most definitely kept even from the Ambassador etc. Though some “idiots” get known from having failed in basic fieldcraft. One such was “third secretary” Ryan Fogle at the US Embassy in Moscow back in May 2013. Some intelligence or case officers, however, only find out they have been blown when all of their agents disappear or get executed in front of others they worked with, as examples etc. As happened in China when the CIA lost a number of its agents, the problem is to work out whether it was a human or a technical failing… The FBI assumed “human”, but then they would; others in the CIA, however, point the finger at a “technical” failure via the Internet. Either way the agents are dead and “the message sent” by the Chinese Intelligence Services, not just to their own people but to the public of foreign nations as well.

[3] In the UK this practice of running saps all got blown out of the water by the fallout of the US “Iraqgate” scandal. It was started by US Customs and clashed with a George H. W. Bush pet project. Almost completely ignored by the US MSM, Iraqgate, which gave Iraq its nuclear weapons project tools and parts, along with Project Babylon, which gave rise to the “Supergun Scandal”, resulted in a number of things in the UK, one of which was the collapse of the Matrix Churchill court case, and likewise Sheffield Forgemasters. This turned into a mess of gargantuan proportions that even Hercules would not have been able to clean up…

lydia August 6, 2020 8:44 PM

Well, that means they can’t use General Motors vehicles with OnStar. That’s going to hurt.

How so? AFAIK, pulling one fuse will disable it. Some of the minor systems (like the radio) might go off too, but it’s nothing to worry about.

Q August 6, 2020 10:26 PM

Why all the focus on only the Google/Apple spy devices?

The vehicles also expose your travel path, with tire pressure sensors transmitting all the time, and whatever other transmitters the vehicle has. But those are not the worst of the culprits. The license/number plate, combined with the ubiquitous readers for them, will leave a trace of your journey.

So what other options do you have? Walking/cycling everywhere? Sure, if you can disguise your bike, your face and your gait in some way that is different each time.

Fred August 7, 2020 5:37 AM

Claiming back a large chunk of your privacy in modern police states isn’t actually that difficult for the majority, they just have no willpower to change their habits long-term (i.e. they are sheep):

  • Cash, always cash for purchases. If they don’t accept cash, they don’t get your business.
  • Have a data harvester / personal profiler (cellphone) that never leaves your home. Use it for Pokémon videos, trashy internet browsing of limited value, and innocuous calls/messages. Let them profile away the drudgery of your modern existence.
  • Sensitive topics or personal subjects take place in person, without any tech gadgets in the vicinity to eavesdrop.
  • When outdoors and cellphone-free/smartwatch-free/any electronic gadget-free, enjoy the wonderfully wide world outside of a small screen. Take notice of how remarkably self-absorbed your fellow citizens have generally become. They are constantly furiously swiping their machines and often oblivious to their surroundings. Carrying gadgets has become a chore, and if somebody happens to try and contact you when you’re out, guess what? They’ll just have to wait until you return. Back to the future (1980s) – how refreshing!
  • After leaving your home without any data harvesters on your person, unless you are in high population locations with scores of cameras using facial recognition, they have minimal knowledge of your movements when on foot. Especially since you’re not giving them any financial transactions as a secondary tracking method.
  • Yes, they’ll get snippets of your movements with license plate scanners etc. when driving, but it is imperfect – particularly in less populated areas.
  • If you want/need it badly enough, use Master Clive’s knowledge of one-time pads to generate a set of physical IN-OUT pads with your essential communications buddies. Use WPS steganography and a preferred location to communicate, i.e. “Fred” on the Schneier forums, for instance. This is impervious to quantum computers, cracking and most other BS the lazy spooks attempt, and it is perfectly defensible (“I was just chatting about security y’all!”), since they won’t bother breaking into your house for a suspected OTP unless they think you’re ISIL or whatever.
  • The Internet is generally a lost cause for privacy – but TAILS on a USB for use from computers in random locations isn’t a bad option just to stick it to the man.
  • If you really want privacy while learning about a new subject, try – shock horror – reading a book on the subject, which you bought using cash. Large secondhand book stores are perfect for almost any subject of interest.

The steps above can protect against a lot of easy profiling & data harvesting, as well as protecting essential comms and topics of interest if you can be bothered. They’ll still harvest loads of low-grade material like your favorite recipes, family gripes, entertainment interests etc. but you’ve got to feed the hungry beast a little to keep it on a leash.

Most won’t take any of these actions because, well, they are tech-addicted muppets that refuse to be mildly inconvenienced by losing access to electronic gadgets/services 24/7.
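The pad-generation step in Fred’s list can be sketched in Python. This is a minimal illustration under stated assumptions: the five-digit groups, the page counts and the helper names (`make_pad_page`, `make_pad_set`) are hypothetical, and `secrets` draws from the operating system’s cryptographically secure generator, so strictly speaking its output is computationally secure key material rather than a true information-theoretic one-time pad (dice or another physical random source would be needed for that).

```python
import secrets

GROUP_SIZE = 5  # assumption: traditional five-digit groups


def make_pad_page(groups: int = 50) -> list[str]:
    """One pad page: `groups` groups of random decimal digits.

    secrets.randbelow uses the OS CSPRNG; a true OTP needs a physical
    random source, so treat this as an illustration only.
    """
    return [
        "".join(str(secrets.randbelow(10)) for _ in range(GROUP_SIZE))
        for _ in range(groups)
    ]


def make_pad_set(pages: int = 10) -> dict[str, list[list[str]]]:
    """A hypothetical IN/OUT set for one correspondent.

    Returns one copy of each book. The OUT book is also printed for the
    partner as their IN book (and vice versa), so the two directions of
    traffic never share key material.
    """
    return {
        "OUT": [make_pad_page() for _ in range(pages)],
        "IN": [make_pad_page() for _ in range(pages)],
    }


if __name__ == "__main__":
    pads = make_pad_set(pages=2)
    for group in pads["OUT"][0][:5]:
        print(group)
```

Each page is meant to be used once and then destroyed; reusing a page gives an attacker two messages under the same key, which breaks the one-time pad’s guarantee.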

Clive Robinson August 7, 2020 8:42 AM

@ Fred,

Sometimes it helps to leave trails etc.; a habit of being somewhat scatty / absent-minded can be useful, as it makes your behaviour hard for observers to predict, which limits what they can do about you.

Because most of the time if you behave randomly all they gain is noise.

It’s a variation of what “close protection” details tell you about using different routes and times to go to work etc, because being predictable can get you kidnapped and killed.

With regard to One Time Pads, they have one or two disadvantages, one of which I’ve mentioned before: their output is “too random”…

That is if anyone sees the actual OTP output they can identify it as being potentially a One Time Pad thus they will know what they are looking for.

Traditionally you do not use an OTP on its own but as “super encryption”; that is, you would use a lower-grade encryption for message encryption, to make the plaintext unrecognisable as such, then use the OTP for “link encryption”.

This later became “compress then encrypt”, which, whilst it saves bandwidth, does not actually make the plaintext unrecognisable… The problem with “compression” is that there are two types, those based on

1, Language letter frequency.
2, Message letter frequency.

As a general rule of thumb, “message letter frequency” gives the greatest overall compression. However, in most cases it leaves clear artifacts in the message statistics, with minimal compression at the beginning and greater compression towards the end. The problem with such artifacts is that they can be used as “little cracks to make bigger cracks”, or, like a hanging strand on the bottom of a knitted jumper, give a thread by which the entire jumper can be unravelled…
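The start-versus-end artifact of “message letter frequency” schemes can be seen with an adaptive order-0 model. The sketch below is an illustration under assumptions rather than any particular compressor: it computes the ideal code length, in bits, that each symbol would cost when symbol counts start at one (Laplace smoothing) and are updated as the message is read. The 27-symbol alphabet and the repeated test message are hypothetical choices. Early symbols cost close to log2 of the alphabet size; later ones get cheaper as the model learns the message, producing the uneven statistics described above.

```python
import math
from statistics import mean

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # assumption: letters plus space


def adaptive_costs(message: str, alphabet: str = ALPHABET) -> list[float]:
    """Ideal code length (bits) of each symbol under an adaptive order-0 model."""
    counts = {ch: 1 for ch in alphabet}  # every symbol starts with count 1
    total = len(alphabet)
    costs = []
    for ch in message:
        # Cost is computed from the counts seen so far, then the model updates.
        costs.append(-math.log2(counts[ch] / total))
        counts[ch] += 1
        total += 1
    return costs


if __name__ == "__main__":
    msg = "ATTACK AT DAWN " * 20
    costs = adaptive_costs(msg)
    quarter = len(costs) // 4
    print(f"first quarter: {mean(costs[:quarter]):.2f} bits/symbol")
    print(f"last quarter:  {mean(costs[-quarter:]):.2f} bits/symbol")
```

A real arithmetic coder driven by such a model would emit output whose density changes over the message in the same way, which is the kind of statistical “thread” an analyst can pull on.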

Thus in certain circumstances, those that work on language letter frequency have advantages. One of these is that they can decompress what has very flat statistics into something that appears to have natural language statistics.

One such compression system is attributed to Russian spies and is easy to use with pencil and paper.

In essence it converts a twenty-eight-character alphabet with natural language statistics into a stream of digits usable with a numeric One Time Pad. When used in reverse, it takes the digits out of the OTP, which have near totally flat statistics, and “decompresses” them back to a good approximation of natural language statistics. This gives the false impression that a simple, insecure substitution cipher has been used with a transposition cipher, thus misleading those attempting to break the system.

But importantly, it does not look like data “encrypted” with modern encryption algorithms. Such subterfuge has a number of advantages when it is automated systems looking to break an encryption system.
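Clive does not name the system, but the best-known pencil-and-paper scheme matching this description is the straddling checkerboard (used, for example, in the Soviet VIC cipher): eight frequent letters get single digits and twenty further symbols get two digits, giving 8 + 20 = 28 cells. The sketch below assumes that scheme. The “AT ONE SIR” top row, the blank columns 2 and 6, the use of “/” as a word space and the mod-10 (non-carrying) pad addition are illustrative choices, not a documented historical layout.

```python
import secrets

# Hypothetical layout: eight common letters on the top row, with columns
# 2 and 6 left blank so those digits can act as row prefixes.
TOP_ROW = {"A": "0", "T": "1", "O": "3", "N": "4", "E": "5",
           "S": "7", "I": "8", "R": "9"}
LOWER_ROWS = {"2": "BCDFGHJKLM", "6": "PQUVWXYZ./"}  # "/" used as word space

ENCODE = dict(TOP_ROW)
for row, letters in LOWER_ROWS.items():
    for col, ch in enumerate(letters):
        ENCODE[ch] = row + str(col)
DECODE = {code: ch for ch, code in ENCODE.items()}


def checkerboard_encode(text: str) -> str:
    """Letters -> digits: common letters cost one digit, the rest two."""
    return "".join(ENCODE[ch] for ch in text)


def checkerboard_decode(digits: str) -> str:
    """Digits -> letters; a dangling row prefix at the very end is dropped."""
    out, i = [], 0
    while i < len(digits):
        d = digits[i]
        if d in LOWER_ROWS:
            if i + 1 == len(digits):
                break
            out.append(DECODE[d + digits[i + 1]])
            i += 2
        else:
            out.append(DECODE[d])
            i += 1
    return "".join(out)


def add_mod10(digits: str, pad: str) -> str:
    """Encrypt: digit-wise addition without carry."""
    if len(pad) < len(digits):
        raise ValueError("pad shorter than message")
    return "".join(str((int(a) + int(b)) % 10) for a, b in zip(digits, pad))


def sub_mod10(digits: str, pad: str) -> str:
    """Decrypt: digit-wise subtraction without borrow."""
    if len(pad) < len(digits):
        raise ValueError("pad shorter than message")
    return "".join(str((int(a) - int(b)) % 10) for a, b in zip(digits, pad))


if __name__ == "__main__":
    plain = "ATTACK/AT/DAWN"
    digits = checkerboard_encode(plain)
    pad = "".join(str(secrets.randbelow(10)) for _ in digits)
    cipher = add_mod10(digits, pad)
    print(digits, cipher)
    assert checkerboard_decode(sub_mod10(cipher, pad)) == plain
```

For example, `checkerboard_encode("ATTACK")` returns `"01102127"`. Run over uniformly random digits, `checkerboard_decode` emits each top-row letter with probability 1/10 and each two-digit symbol with probability 1/100 per decoded symbol, so the common letters stay common, which is the “decompression” effect described above.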

But as a note of warning,

    Do not compress plaintext that is to be used with non-deterministic stream ciphers like an OTP

The reason a One Time Pad is secure is that there is no deterministic relationship between the characters in the pad. That is, each character is independent of every previous character and also equiprobable.

This means that any plaintext messages of the ciphertext length or less are “equiprobable” and cannot be differentiated from each other.

The flip side of this is “deniability”. If I’m seen sending a message and the ciphertext is recorded, and I’m using a deterministic algorithm to encrypt the message, then if you obtain the key you can, with a very high degree of probability, prove I sent the message. Which means I’m vulnerable to a failure in security at the second party I’ve sent the message to, especially if “they get turned or become a traitor”…

However, with a One Time Pad, all characters in a plaintext are equally valid. Thus if the second party suffers a security failing or becomes a traitor, the only key they can lose / hand over is the OTP used for that message in their possession. Which means that they cannot implicate you, because you can simply provide a different OTP which decrypts the ciphertext to an entirely different plaintext. In fact there are tricks you can do that will, in a non-expert’s eye, make your OTP look more correct or valid than the second party’s OTP.
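The deniability argument comes down to one line of arithmetic: for a given ciphertext C and any cover text M′ of the same length, the pad K′ = C ⊕ M′ “decrypts” C to M′. A minimal byte-wise XOR sketch, with hypothetical helper names (`otp_xor`, `forge_pad`) and example messages:

```python
import secrets


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """Encrypt or decrypt with a one-time pad (XOR is its own inverse)."""
    if len(pad) < len(data):
        raise ValueError("pad shorter than data")
    return bytes(d ^ k for d, k in zip(data, pad))


def forge_pad(ciphertext: bytes, cover_text: bytes) -> bytes:
    """Return a pad under which `ciphertext` decrypts to `cover_text`.

    Any cover text of the same length works, so the ciphertext alone
    cannot show which plaintext was really sent.
    """
    if len(cover_text) != len(ciphertext):
        raise ValueError("cover text must match ciphertext length")
    return bytes(c ^ m for c, m in zip(ciphertext, cover_text))


if __name__ == "__main__":
    real = b"MEET AT NOON"
    pad = secrets.token_bytes(len(real))
    ct = otp_xor(real, pad)
    fake_pad = forge_pad(ct, b"BUY THE MILK")
    print(otp_xor(ct, fake_pad))  # b'BUY THE MILK'
```

This only holds if the real pad was uniformly random and never reused. With a deterministic cipher the key is far shorter than the traffic it protects, so in general no alternative key exists that maps the recorded ciphertext to a plaintext of your choosing.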

Such considerations can be important, especially when you use an OTP to make “plaintext one time messages” that are sent as apparent plain text. I’ve mentioned these before, so I won’t go into them again because the explanation is lengthy, but with care such systems are fairly deniable, provided you remember a few critical rules, the most important of which is not to have any correlation between a message and any other party’s “observable actions” or your own “observable actions”.

echo August 7, 2020 4:04 PM

@Clive

Sometimes it helps to leave trails etc.; a habit of being somewhat scatty / absent-minded can be useful, as it makes your behaviour hard for observers to predict, which limits what they can do about you.

Because most of the time if you behave randomly all they gain is noise.

It’s a variation of what “close protection” details tell you about using different routes and times to go to work etc, because being predictable can get you kidnapped and killed.

Security by obscurity house of cards made of Swiss cheese brainfart loser loafer? I’m acing this.

Of course everything is really pseudo-random and operates within constraints, so you’re dealing with a stack of probabilities. Personally I also have the habit of changing my wardrobe semi-randomly. It can be a skirt suit, something Parisian or a denim skirt from week to week, or, depending on mood and what I’m doing, from gardening to shopping to a meeting, up to 2-3 changes per day. Now there are places and times I avoid because the threat level isn’t something I want to gamble with, but also places I can go where someone following me about or causing trouble would look a complete dork.

This is the thing missing in a lot of people’s deliberations. Security can be gendered. Risk analysis and lifestyle are different. This can narrow some opportunities but open up others, depending on which way you look at things. In the television show “Condor” (based on the movie “Three Days of the Condor”) this kind of thing is explored in a very irritatingly stereotypical way. All the men are making tough, action-orientated “kinetic” decisions and being technical, while the women are just seductive honeytraps and ditzy tree huggers swooning at the first sign of trouble. Well sure, but as with all things, what is important isn’t always what is said but what isn’t said. And that’s where an element of surprise can be concealed. Cue the “fear, surprise, ruthless efficiency, fanatical devotion to the Pope, nice red uniforms” sketch, or grannies with HKs in their knitting baskets, which are also somewhat overdone tropes.

But back to the topic. I’m also pretty random with my phone. I honestly cannot stand the way people fiddle with the things all the time, and the clock in my head is pretty reliable too. In fact the only reason I wear a watch is to balance out my jewellery, plus it looks expensive, which impresses some people. It was £30 from Argos, and my phones are either “pre-loved” or bargain basement. When out I’m either going somewhere or would prefer a real book or magazine to entertain myself, or indulge a habit of “people watching”.

Very Interesting August 7, 2020 4:42 PM

Turning off the device may not be sufficient if a device has been compromised.

Can anyone provide a reference for known malware/spyware that hijacks the ON/OFF switch on a phone such that the act of turning off the phone does not actually turn off the phone?

Jesse Thompson August 8, 2020 2:19 AM

@Mike, @Everyone

AIRPLANE MODE (by itself) DOES NOT PREVENT GPS TRACKING.

Airplane mode may be a fairly reliable way to prevent transmission on whatever bands might interfere with airplane operations (up to, and not beyond, what the airline cares about enough to confront a passenger over), but GPS is a receive-only technology.

That means that while your cellphone might not be broadcasting your location at that exact moment, unless you destroy the phone before disabling airplane mode again, it can very well simply record where you traveled and use its reconnected status to tattle on that entire journey.


That said, I am going to recommend what’s probably blasphemy in these comment sections. Security minded folk should know what the strongest means of preventing information from being exfiltrated or abused are, but that does not mean one is best served by always using those methods.

In a majority of circumstances, perfect/flawless security is unwieldy compared with knowing how to balance risks and how to neutralize the effectiveness of attacks on any parts of your operations that are imperfectly secure.

For example, for how many people is it a danger that their whereabouts or their itineraries are known? How important is it that the words you speak in conversation remain sacrosanct? What payloads are you trying to secure, and from whom? Most of our (direct) adversaries are not nation-states, and for zero give or take zero of us do nation-states see us as a direct adversary.

Our direct adversaries might include thieves, scammers, personal adversaries who are usually jilted or meddlesome friends/lovers/family/peers/business partners, and if we’re involved in business then potentially adversarial customers or competitors.

Via the internet, the first two of these are the largest potential threat due to their global reach (e.g., the most effective bad actor in the world has a small probability of reaching and damaging you in particular), but the skill ceiling is surprisingly low for every other one of the threat types described here. Vanishingly few of us encounter people in meat space with a combination of motivation to do us harm and greater technical capability than “I can’t find the ‘any’ key”.

So for most personal fieldcraft, I’d estimate that we need the strongest security available to defend against global-reach thieves and scammers, some degree of enterprise-grade security against adversarial customers or competitors, but little more than Schneier’s Law levels of security against the other person-to-person threats mentioned above.

And if you do know of any one personal adversary (motivated to potentially do you harm) whom you believe to be anywhere near your technical equal, then it’s best to call in reinforcements and enlist the help of one or more other folk at least at your level to devise defenses against them.

As always I do not advise Luddism as a primary security measure. One will find that all of the ills of the Earth pre-date both the Internet in particular and Electronics in general. Theft, assault, blackmail, extortion, corruption: I don’t think anyone would gainfully argue that either we or our property (either on average or in extreme cases) are in any greater personal danger from animate adversaries today than we would have been 100 years ago.

Consequently, renouncing technological availability has no protection to offer us vs how our ancestors lived.

The question is always “can I make this tech option profit me more than it might lose for me”. And focusing on “if the NSA wanted to bring me down, but could only do so by using this cell phone to track my location or spy on my conversations, then I might potentially be harmed by it” while ignoring all the potential gain the device has to offer is going to result in an immensely skewed sense of risk and reward.

Peloton August 8, 2020 9:41 AM

@ Clive Robinson

Can “proper” information about OTP be found in some sort of “definitive” guide?
Going through 15 years of blog archives to try and piece it all together is …

Clive Robinson August 8, 2020 1:58 PM

@ Peloton,

Going through 15 years of blog archives to try and piece it all together is …

Somewhat daunting/tedious?

However it’s not just individual comments but whole conversations some longer than the number of comments in other post pages.

Also, some of those conversations, whilst essentially saying the same basic thing, do so from entirely different perspectives, which gives people more insight and thus ideas for applying the knowledge in other contexts.

I guess at some point I will write it up, but I’m not sure what I’ll do with it if I do. One thing I do know is it will surely upset some people. But as history for at least a millennium shows, that is the consequence of any independent thinking.

One thing I will avoid, though, is anything other than very simple maths. To see why, think of it this way,

You have two independent random streams of characters, and you XOR them together (or add them modulo the character size). Ask yourself a question: is the result any more or less random than the two independent random streams? There is an answer to this but…

Zaphod August 8, 2020 2:29 PM

Lots – LOTS of gold in this thread. I’m surprised it hasn’t been ‘cancelled’ yet. I, for one, am keeping an offline (paper) copy.

Thank you to the usual suspect!

Z.

SpaceLifeForm August 8, 2020 4:25 PM

@ Clive

Is this a rhetorical question? Why the ‘but…’?

I’ll bite.

“You have two independent random streams of characters, and you XOR them together (or add them modulo the character size). Ask yourself a question: is the result any more or less random than the two independent random streams? There is an answer to this but…”

You can not create entropy from thin air.
The XOR will not create entropy.

At best, you have only wasted half of your entropy. At best.
But you may very well end up losing entropy, especially in the case where the two random streams are not as random as they appear. You can never gain entropy.

You would be better off using the two streams and maybe interleaving the bytes. You will not lose any entropy that way.

echo August 8, 2020 5:53 PM

@SpaceLifeForm

This kind of thing partially explains my semi-interest in cosmology. Some of the problems seem similar to cryptanalysis, plus you get pretty pictures even if you don’t understand 99% of the maths. I also think it may be a fair comparison technology-wise. This led me to question the assumption, fashionable lately, that tier-one intel agencies aren’t so far ahead today. I have no idea if cosmology is a good benchmark or not, but the comparison seemed interesting.

Clive Robinson August 8, 2020 11:13 PM

@ SpaceLifeForm,

You can not create entropy from thin air.

That’s a statement, not a proof[1], and secondly you have replaced “random” with “entropy”.

Now try a little “semantics flip” with the original question, that is replace the word “random” with “information”. So,

    “You have two independent information streams of characters, and you XOR them together. Ask yourself a question is the result any more or less information than the two independent information streams?”

After a moment’s thought you will realise that in this question the words random and information have the same meaning. That is, the inputs to the logic gate are perturbed from a steady state by a signal from a stochastic source that may or may not be statistically biased. The fact that it may or may not have meaning to an observer should not change anything.

Remember the laws of physics do not allow you to destroy information, which is why we have Fredkin and Toffoli gates[2] that are reversible.

Now remember that with a two-input XOR gate, if you carry either input forward, XORing it with the output gives you the other original input. That is, the two-input XOR, in the forward direction, is a “Controlled NOT gate”.

Thus the output from an XOR gate can be shown to carry both information inputs as some kind of difference signal, which most people can see fairly easily with a “graphical proof”.
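A minimal Python sketch of the reversibility point, assuming ideal Boolean gates (no physics): a bare two-input XOR maps four input pairs onto only two output values, so on its own it cannot be inverted, whereas the “controlled NOT” form, which carries input a forward alongside a XOR b, keeps all four states distinct and undoes itself. The function names are illustrative only.

```python
from itertools import product

def xor_gate(a: int, b: int) -> int:
    """Plain two-input XOR: two bits in, one bit out."""
    return a ^ b

def cnot(a: int, b: int) -> tuple[int, int]:
    """Controlled NOT: control bit a is carried forward, b becomes a XOR b."""
    return a, a ^ b

inputs = list(product((0, 1), repeat=2))

# The bare XOR collapses four distinct input pairs onto two outputs: not invertible.
xor_outputs = {xor_gate(a, b) for a, b in inputs}
print(len(inputs), "inputs ->", len(xor_outputs), "distinct XOR outputs")

# CNOT keeps all four states distinct, and applying it twice restores the input.
cnot_outputs = {cnot(a, b) for a, b in inputs}
print(len(inputs), "inputs ->", len(cnot_outputs), "distinct CNOT outputs")
for a, b in inputs:
    assert cnot(*cnot(a, b)) == (a, b)
```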

The implication is thus something happens to the information from the two inputs that does not make it to the output. Effectively some inverse of the difference signal that accounts for the lost energy. The simple argument is that as work has been done the missing energy is part of the heat[3] in the logic gate substrate[4]…

Which brings us back to “entropy” in the thermodynamic sense but what of the information theoretic sense?

It all starts getting a little messy…

You start to realise that there is no such thing as “random” in effect it’s the sum of all previous interactions of information sources. At which point a dark room with a bed and a cool towel for the forehead starts looking good to most people.

Which is one of the reasons we have security problems in the first place. That is, nearly all our “security proofs” are in effect superficial and based on the assumption that the underlying proofs exist and are valid.

[1] Actually some would disagree with you, and say you can create what they call entropy from “thin air”, or at least from the air movement in a restricted space… It’s how you get some of those minute timing differences in hard drives that some people call “entropy”.

[2] https://web.archive.org/web/20061017232512/http://www.digitalphilosophy.org/download_documents/ConservativeLogic.pdf

[3] The problem is that “becomes heat” is only “eventually true”, where eventually could be the better part of the life of the universe. That is, we know that the energy is actually a signal and does not become heat until much later, when it is absorbed by a thermal mass; until then it is a signal, a side channel carrying the information with it. As we know, photons can cross the universe carrying their signal for millions of years…

[4] Yes, the words “gate” and “substrate” are “overloaded” and have different meanings depending on what level of the computing stack you are at. Logic gates these days tend to be made of “Field Effect Transistors” (FETs), not bipolar junction transistors, and one of the FET’s terminals is the “gate”; the other two are known as the “drain” and the “source”, and in a simple FET model they are interchangeable. However, in most practical implementations they are not, due to some more curious aspects of physics.

MarkH August 9, 2020 4:15 AM

@Clive, SpaceLife:

I’ve never thought critically about why the word “entropy” is typically applied to the bit strings used for crypto operations like pseudo-random bit generation — whether it’s the best word, or even makes sense at all.

I’ll try to wade in a little carefully, not having deeply studied the relevant subjects.

The maximum information content of a bit string is the number of bits, period. If you know in advance that the string is structured or patterned in some way, the content is less than the bit count. Obviously, if you know that 10% of the bits are always zero, then the conveyed information is much reduced.

A way to formalize randomness is in terms of predictability. The less predictable, the more random; if the probability that each bit will be one is exactly 0.5, independent of the values of any other bits, then randomness is at maximum, and the information content is equal to the bit count.
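The predictability definition above is usually formalised with the binary entropy function h(p) = −p·log2(p) − (1−p)·log2(1−p): the Shannon entropy, in bits, of one bit that is 1 with probability p. A minimal Python sketch, assuming every bit is drawn independently with the same p (the memoryless case described above); the function name is illustrative, and the last line covers the case where 10% of bit positions are known in advance.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of one independent bit that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a bit whose value is certain carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n_bits = 1024
for p in (0.5, 0.6, 0.9, 1.0):
    h = binary_entropy(p)
    print(f"P(bit=1)={p}: {h:.3f} bits per bit, about {h * n_bits:.0f} bits in {n_bits}")

# If 10% of the positions are known in advance to be zero, at most the
# remaining 90% of positions can carry information.
print("upper bound with 10% fixed bits:", 0.9 * n_bits)
```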

I think it’s fairly intuitive that in an XOR combination, it is sufficient that one of the bit strings be random. Even if the other input is highly patterned — always zero, for example! — the output will meet the definition for perfect randomness.

That’s exactly the concept underlying the infamous “one-time pad”. If the key stream is perfectly random, so also will be the ciphertext, regardless of the properties of the plaintext.

So, imagine that I have a perfectly random string of 1024 bits (possible to create in theory by tossing a “fair coin”). By dint of its randomness, its content is also 1024 bits.

My structured 1024 bits of plaintext might convey only one or two hundred bits, for example. But the XOR output ciphertext has an information content of 1024 bits.
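A minimal Python sketch of that one-time-pad property, assuming the key is uniformly random, exactly as long as the message, and never reused. Here `secrets.token_bytes` only stands in for a true random source, which it is not guaranteed to be. The exact calculation at the end uses fractions: whatever the bias p of a plaintext bit, a key bit that is 1 with probability 1/2 makes the ciphertext bit 1 with probability exactly 1/2.

```python
import secrets
from fractions import Fraction

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings; the same operation encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"ATTACK AT DAWN"                 # highly structured plaintext
key = secrets.token_bytes(len(plaintext))     # one-time key, never to be reused
ciphertext = xor_bytes(plaintext, key)
assert xor_bytes(ciphertext, key) == plaintext  # XOR with the key again decrypts

def p_cipher_bit_one(p_plain: Fraction, p_key: Fraction) -> Fraction:
    """P(c = 1) for c = m XOR k, with m and k independent bits."""
    return p_plain * (1 - p_key) + (1 - p_plain) * p_key

half = Fraction(1, 2)
for p in (Fraction(0), Fraction(1, 10), Fraction(9, 10), Fraction(1)):
    print(p, "->", p_cipher_bit_one(p, half))  # always 1/2
```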

When neither bit string is perfectly random, the analysis is more complicated. But I think I have it correctly, that the XOR output will generally not contain less information than the “richer” of its two inputs.
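For the case where neither input is perfectly random, a small Python check, assuming the two bit sources are independent and memoryless. Writing each input's bias as ε = P(1) − 1/2, the XOR output's bias has magnitude 2·|ε1|·|ε2| (the two-input case of Matsui's “piling-up lemma”), so the output is never more biased than the less-biased input. In entropy terms, the output's entropy is at least the larger of the two inputs' entropies and, because the output is a function of the input pair, never more than their sum. The helper names and the grid of probabilities are illustrative.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of an independent bit that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def xor_one_probability(pa: float, pb: float) -> float:
    """P(A XOR B = 1) for independent bits A, B with P(A=1)=pa, P(B=1)=pb."""
    return pa * (1 - pb) + pb * (1 - pa)

grid = [i / 20 for i in range(21)]   # p = 0.00, 0.05, ..., 1.00
for pa in grid:
    for pb in grid:
        p_out = xor_one_probability(pa, pb)
        # Piling-up lemma, two inputs: output bias magnitude is 2*|ea|*|eb|.
        assert math.isclose(abs(p_out - 0.5), 2 * abs(pa - 0.5) * abs(pb - 0.5), abs_tol=1e-12)
        h_a, h_b, h_out = binary_entropy(pa), binary_entropy(pb), binary_entropy(p_out)
        assert h_out >= max(h_a, h_b) - 1e-12   # never worse than the better input
        assert h_out <= h_a + h_b + 1e-12       # never more than both inputs together

print(binary_entropy(0.8), binary_entropy(xor_one_probability(0.8, 0.8)))
```

For two inputs that are each 1 with probability 0.8, the XOR output is 1 with probability 0.32, so the entropy per bit rises from about 0.72 to about 0.90 bits: more than either input, but well short of the roughly 1.44 bits the two inputs held together.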

I think the case is put correctly above … I hope it may be enlightening or clarifying.

Clive Robinson August 9, 2020 7:51 AM

@ MarkH,

I’ve never thought critically about why the word “entropy” is typically applied to the bit strings used for crypto operations

It’s a doubly borrowed term[1]. First from “Information Theory”, which borrowed it from “statistical mechanics” (Thermodynamics).

It is derived from the Greek word for “transformation”, and it’s a fairly new idea (about a century and a half old). In thermodynamics it is considered to be a measure of “disorder”. However, in information theory it’s actually a measure of “possibility”. But it’s not really a measure of “randomness” as most people understand “random”, in part because the definition of random usually lacks rigour and is frequently circular, such as “it’s something that looks random”: one of those “you’ll know it when you see it” definitions that have a habit of usually being wrong[2]…

To use the Lego brick analogy: when you have all the bricks stuck together in one block, you in effect only have one object, so the number of possibilities you have with it is minimal. This is the coherent or organised state and has minimum entropy (and, if you’ve done it right, minimum volume). As you break the block up, the bricks are no longer organised in a coherent singular block; not only does the volume go up, the bricks have more degrees of freedom and thus more “possibilities” to be organised in. When all the bricks are separated, you have the maximum number of possibilities.

One thing to think about is a “closed but not isolated system”, like a heat-sealed glass tube with a liquid such as water in it. Provided there is sufficient internal volume, the water can be a solid like ice, a liquid, or a vapour, depending on the thermal energy crossing into or out of the closed but not isolated system. The freedom of movement is minimum when it is a solid and maximum when it is a vapour, and it can cycle through these states as often as you like. However, when solid, the individual molecules can be in any random order; from the perspective of entropy that is irrelevant.

[1] But to answer your original question… It’s John von Neumann’s fault 😉

When Claude Shannon derived his equation for signal attenuation effects in telephone lines (he did work for Bell after all), he was stuck for a name to call it. He later related his conversation with John von Neumann about the problem as,

    I thought of calling it “information”, but the word was overly used, so I decided to call it “uncertainty”. Von Neumann told me, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”

So there you have it: the most important consideration is that,

    “nobody knows what entropy really is”

[2] Humans have a lot of problems with random: we like a little, and mostly we hate a lot, but not as much as we hate its absence. Donald Knuth has a discussion about it in one of his books on TeX and fonts: regular typescript as produced by typewriters looks harsh and unpleasant when compared to slightly more random ways of presenting type along a line, which actually look better than early proportional spacing.

echo August 9, 2020 9:15 AM

@Clive

The maximum information content of a bit string is the number of bits, period. If you know in advance that the string is structured or patterned in some way, the content is less than the bit count. Obviously, if you know that 10% of the bits are always zero, then the conveyed information is much reduced.

This is both correct and not correct. The mistake being made is that the system, problem or precepts aren’t defined at the start. Compressed data and indexed data, and perhaps even compressed indexed data, will be longer than the literal string.

The mistake many amateurs not properly qualified in philosophy make is that they don’t get their precepts right at the beginning, so they begin arguing over the terms and conclusions first. Thus, one small tap and their house of cards tumbles to the ground…

“Ah but” and “we assumed”…

This kind of problem exists in physics, and also in maths if anyone wants to be pedantic, because maths is the language of physics and some (not all) hold that if an idea cannot be expressed mathematically then it cannot be true, and therefore is not physics.

There are loads of issues with hidden variables and constants that nobody can get to the bottom of, whether it’s the deeper quantum-level stuff or the speed of causality. The standard model may be an achingly close and persuasive model, but it has flaws, isn’t totally complete and has a few problems. Yes, we have problems with random because of poorly defined understanding; yes, we have problems with random because nobody is completely sure whether it is real or not; and yes, we have problems with random because people use and abuse and trip over labels. In fact, so much so that after I pointed out on this blog the misunderstandings quantum physicists had with their own subject, and with understanding and explaining it, Ethan Siegel’s essay on the topic (which I linked to at the time) was published a few weeks later.

You have similar issues being caused today by “certified professionals” and job titles in the middle of a pandemic, with virologists saying one thing and the usual know-it-all blowhards we call doctors throwing hissy fits simply because each used a different jargon word, and NIH empire building and spats over status kicked in.

“Words can convey meaning but words cannot convey understanding”.

Please don’t go around calling or implying people are stupid without checking your precepts and learn some tact. People are people not machines.

myliit August 9, 2020 11:12 AM

Besides the comments above, I found the “News Articles” in the OP interesting. From a link in the last paragraph in the first news article:

“’Shattered’: Inside the secret battle to save America’s undercover spies in the digital age

[ … ] By this time [ late 2015 ], massive amounts of digital records were being stolen — by insiders like Snowden and by adversaries like China, which also targeted private companies like Anthem, Marriott and others, in addition to spearheading two breaches into the OPM, which were revealed in 2015. The full extent of that theft, which included personal disclosure forms, clearance adjudication data and perhaps other linked intelligence community databases, has never been revealed.

“Part of the discussions we had was, post-OPM hack, we didn’t realize that digitizing government records profoundly changed the threat profile,” says a former senior national security official. The intelligence community did not fully understand how much of its own information was stored outside its own walls until personal data began being stolen by China en masse, says a former senior intelligence official.

[…]

The CIA and FBI both concluded that every person connected to these organizations’ “black side” [NOC, presumably] undercover programs had to be completely sealed off from the rest of their colleagues, say former officials. This firewall is an immensely complex undertaking in a world where electronic emissions from a single cellphone traveling, say, from CIA headquarters in Virginia to an unmarked office building nearby could blow multiple undercover operations. The FBI has also struggled with this transition. As of a few years ago, “none of this was completed yet, and none of it was even remotely being done easily,” says a former senior official.

[…]

But even publicly, some intelligence officials are lamenting the dangers posed to cover, though they disagree over whether the problem can be addressed with new programs or procedures. Many are pessimistic that tweaking existing approaches will suffice.

“We can’t protect identities anymore. Tech is going to make it almost impossible. I think we need a new paradigm,” said Eric Haseltine, the former head of the NSA’s research directorate, at a lunch event in Washington in late October, [2019, presumably] when asked about the problem.

“Our officers overseas are known,” he said. “That’s a hard pill to swallow.”“

MarkH August 9, 2020 12:45 PM

@Clive:

Thanks for the history — classic von Neumann, both ingenious and witty.

@echo:

I believe that the language you quoted is correct in terms of the discipline of information theory.

It defies common sense that the binary encoding of an encyclopedia volume contains less information than a random bit string of the same length. Even so, that’s my understanding of how information theory applies.

It’s helpful that you mentioned data compression. In the example of the encyclopedia, a lossless compression algorithm might reduce the bit string length by 90 percent. I suppose it would be intuitive for most people that both the long and short representations hold the same amount of information.

The compressed version, whether by visual examination or statistical measurement, is far less patterned, and thus far more similar to a random bit string … even though it’s not random at all, being in fact fully determined.
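The encyclopedia example can be tried at small scale with Python's standard `zlib` module; the repeated sample sentence is an illustrative stand-in for structured text, not real encyclopedia data. The compressed form is much shorter and decompresses back to exactly the original, even though it is fully determined by the input.

```python
import math
import zlib
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte-value histogram (0..8 bits per byte)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Highly structured stand-in text (illustrative only).
original = ("The quick brown fox jumps over the lazy dog. " * 200).encode("ascii")
compressed = zlib.compress(original, 9)

# Lossless: the short form holds everything needed to rebuild the long one exactly.
assert zlib.decompress(compressed) == original

# Print length and empirical bits per byte of each form for comparison.
print(f"original:   {len(original)} bytes, {bits_per_byte(original):.2f} bits/byte")
print(f"compressed: {len(compressed)} bytes, {bits_per_byte(compressed):.2f} bits/byte")
```

The byte-histogram figure is only a crude patterning measure: it sees how often each byte value occurs, not any structure spanning several bytes.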

vas pup August 9, 2020 4:47 PM

@Mr H – Thank you for the link provided.
Interesting and useful: it works as if you removed the battery, but without the trouble, and the same applies to your credit/debit/ID cards and US passport as well, because crooks have found ways to scan information from them with a scanner in close vicinity to you. I have no idea how that works with the COVID social distancing requirement (1.8 m), but it is better to be safe than sorry.

SpaceLifeForm August 9, 2020 11:46 PM

@ Clive, echo, MarkH

Entropy – A black box that in theory can output random bits. We can NOT OBSERVE state inside the black box.

Random – A string of bits generated by the Entropy black box. We can now OBSERVE state. We can even measure it for its entropy. We can not discern which bits are entropy. But, we can measure that it has some entropy.

Information – An INTERPRETATION of the random string. A given bit string may look totally random to one observer, but have meaning to another observer.

Philosophy – A framework to manage information.

Insanity – The framework can no longer manage the information.

Clive Robinson August 10, 2020 5:10 AM

@ SpaceLifeForm,

Remember you are only “observing” the output of the black box.

Thus let’s change it slightly…

I have a blue box with an AES encryptor in it, and also another box that generates Pi or e to as many places as needed. The output from the irrational generator feeds the AES algorithm, and that’s what comes out of my blue box. I now swap my blue box for your black box without you seeing, so as an observer of the box output, what difference do you see?

Thus maybe your definition,

    “Random – A string of bits generated by the Entropy black box. We can now *OBSERVE* state. We can even measure it for its entropy. We can not discern which bits are entropy. But, we can measure that it has some entropy.”

Needs a little adjustment, as you’ve left out that awkward range of deterministic signals we can see some but not all of. But also, the “tests” by which we measure are at best woefully inadequate, as they are primarily statistical or test for a specific type of simple determinism.
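Clive's blue-box thought experiment can be sketched with the Python standard library. AES is not in the standard library, so this sketch swaps in SHA-256 in counter mode as the deterministic mixer, and hard-codes the first 32 decimal digits of π as the seed; both are assumptions for illustration, not a recommended generator. A simple frequency count of its output looks like a fair coin, yet anyone who knows the construction can reproduce every bit.

```python
import hashlib

PI_DIGITS = b"31415926535897932384626433832795"  # first 32 decimal digits of pi

def blue_box(n_bytes: int, seed: bytes = PI_DIGITS) -> bytes:
    """Deterministic byte stream: SHA-256(seed || counter) for counter = 0, 1, 2, ..."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])

def ones_fraction(data: bytes) -> float:
    """Monobit statistic: fraction of 1 bits in the data."""
    return sum(bin(b).count("1") for b in data) / (8 * len(data))

stream = blue_box(4096)
print(f"fraction of 1 bits: {ones_fraction(stream):.4f}")  # close to 0.5
assert blue_box(4096) == stream  # ...but every bit is reproducible
```

Passing the frequency test says nothing about whether the source contains any “real entropy”: here it contains none beyond the public seed.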

It’s why in the past I’ve talked of any such output having the following normalised (0-1) components,

1, Bias (domain offsets).
2, Noise (deterministic sources).
3, Faux entropy (chaos).
4, Real entropy.

And that you need to strip off the first three in any TRNG you are making.
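For the first of those components, bias, one classic textbook remedy (not something named in the comment above) is John von Neumann's debiasing procedure: read the raw bits in pairs, output 0 for “01” and 1 for “10”, and discard “00” and “11”. A minimal Python sketch, assuming the raw bits are independent with a fixed bias; it does nothing about correlated noise or chaotic components, which need separate treatment. The seeded biased source is a hypothetical stand-in for real hardware.

```python
import random

def von_neumann_extract(bits: list[int]) -> list[int]:
    """Debias independent bits: pairs 01 -> 0, 10 -> 1; pairs 00 and 11 are discarded."""
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)  # "10" gives 1, "01" gives 0
    return out

rng = random.Random(2020)  # seeded stand-in for a biased physical source
raw = [1 if rng.random() < 0.8 else 0 for _ in range(200_000)]
clean = von_neumann_extract(raw)

print(f"raw ones: {sum(raw) / len(raw):.3f}, kept {len(clean)} of {len(raw)} bits")
print(f"debiased ones: {sum(clean) / len(clean):.3f}")
```

It works because, for independent bits that are 1 with probability p, the pairs “01” and “10” both occur with probability p(1−p). The price is throughput: with p = 0.8 only about 16% of the raw bits survive.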

Now, the interesting thing to remember is that if it were possible to quantify discrete samples and draw these up as a Venn diagram, you would discover most points would fall in some broad overlap.

That is it’s not possible to get rid of the first three signal components to leave the desired fourth. The best you can do is “find an average and remove that” but that still leaves a component you can not remove[1].

Now, what you might have missed is that the XOR process is effectively not just additive; it has a multiplying effect as well. It’s why we can use XOR gates as “phase detectors” in the likes of “Digital Phase Lock Loops” (D-PLL). Thus the output of the XOR contains four output frequencies: those of the two inputs and their sum and difference frequencies[2].
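The “multiplying effect” can be made concrete with standard identities (a worked sketch, not Clive's own derivation). Mapping each logic level to a sign, 0 → +1 and 1 → −1, turns XOR into ordinary multiplication; on plain 0/1 levels, XOR is the two inputs passed straight through plus a product term. For sinusoids, the product-to-sum identity shows that multiplication yields sum and difference frequencies, which is one way to read the four output frequencies mentioned above.

```latex
% Sign mapping s(x) = (-1)^x, i.e. 0 -> +1 and 1 -> -1:
(-1)^{a \oplus b} = (-1)^{a}\,(-1)^{b}, \qquad a, b \in \{0, 1\}

% On 0/1 levels: the inputs pass through, plus a product (mixing) term
a \oplus b = a + b - 2ab

% Product-to-sum identity: two tones multiplied give difference and sum tones
\cos(\omega_1 t)\cos(\omega_2 t) = \tfrac{1}{2}\left[\cos\big((\omega_1 - \omega_2)t\big) + \cos\big((\omega_1 + \omega_2)t\big)\right]
```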

Without going into the mathematics, you can show that the result of the practical addition of two random streams is that, whilst you can increase the first three signals, including “false entropy” (chaos), you always end up with less real entropy…

Another way to think about it (but it is not mathematically correct) is that in any measurement domain you would normally only “expect to get” a “normalised RMS” result, as you do with noise,

$$\mathrm{Out} = C\sqrt{A_{\mathrm{in}}^2 + B_{\mathrm{in}}^2}$$

(C is a normalisation constant for the measurement domain and the number of input signals; for amplitude and two inputs it’s 0.5.)
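A quick numerical check of the root-sum-of-squares behaviour, assuming the two noise sources are independent, zero-mean Gaussian signals (Python's `random.gauss` stands in for them, and the RMS values 0.3 and 0.4 are arbitrary): the RMS of their sum comes out close to √(A² + B²), not A + B.

```python
import math
import random

def rms(samples: list[float]) -> float:
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

rng = random.Random(1)  # seeded so the run is repeatable
n = 100_000
a = [rng.gauss(0.0, 0.3) for _ in range(n)]  # source A, RMS about 0.3
b = [rng.gauss(0.0, 0.4) for _ in range(n)]  # source B, RMS about 0.4
summed = [x + y for x, y in zip(a, b)]

print(f"RMS(A)={rms(a):.3f}  RMS(B)={rms(b):.3f}  RMS(A+B)={rms(summed):.3f}")
print(f"root-sum-of-squares prediction: {math.sqrt(0.3**2 + 0.4**2):.3f}")  # 0.500
```

With both inputs at RMS 1, the same rule gives √2 ≈ 1.414 for the sum, and a gain of C = 0.5 brings that back to about 0.707.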

[1] The process of finding an average is in effect integrating which is the same as applying a “low pass filter”. Such a filter obviously favours the low frequency components of the signal, but it also has a frequency dependent phase delay as well as a time delay. Thus you can not just subtract one signal from the other.

[2] You can prove this to yourself with a graphical proof. It’s also the reason people make bad design decisions when making TRNGs, especially when building them onto chips, and especially with “ring oscillators”, which suffer from “injection locking” as well. It’s probably the reason they always hide their TRNG behind “magic pixie dust thinking” crypto hashes and the like.

SpaceLifeForm August 10, 2020 4:33 PM

@ Clive, echo, MarkH

So, we agree. You can not create entropy out of space (thin air).

Only the cosmos can. Allegedly.

Only Schrödinger’s cat knows for sure.

‘ Without going into the mathematics you can show that the result from the practical addition of two random streams is whilst you can increase the first three signals including “false entropy”(chaos) you always end up with less real entropy… ‘

hXXps://www.americanscientist.org/article/quantum-randomness#

Allegedly.

hXXps://www.quantamagazine.org/how-randomness-can-arise-from-determinism-20191014/

hXXps://www.quantamagazine.org/real-life-schrodingers-cats-probe-the-boundary-of-the-quantum-world-20180625/

MarkH August 10, 2020 6:31 PM

@Clive, echo, SpaceLifeForm:

I don’t understand the meaning of SpaceLife’s formulation, of creating entropy out of thin air (or space).

In my limited understanding, entropy is a statistical property of a system. What does it mean, to say that such a property is created?

In thermodynamics, various physical processes may increase or decrease entropy. Do the physics books speak of creating entropy?

In information theory, “entropy” corresponds to what I called “information content” in a comment above.

To maximize the entropy of a bit string, each bit must be chosen by a non-deterministic process in which the probability of any bit being 0 is equal to the probability of it being 1, and which is memoryless (knowledge of any or all other bits in the sequence cannot be used to “predict” any bit with better than 0.5 probability of correctness).

As Clive mentioned, air can indeed be used as the basis for a true (physical) random number generator. But so can a fair coin or die.

Clive just presented some ways in which practical TRNGs deviate from this ideal, though the defects can be reduced by careful engineering including post-filtering.

In practice, you can get high entropy (perhaps 0.99 bits of entropy per bit of output). Good hash functions can “condense” the entropy, resulting in output with 1 bit per bit.
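A minimal Python sketch of hash-based conditioning, assuming a raw source credited with 0.99 bits of entropy per bit as in the example above. The sketch refuses to hash fewer raw bytes than would supply twice the output's 256 bits of estimated entropy; with 80 raw bytes that is about 634 bits. The `os.urandom` call only stands in for a physical source, and the 2:1 margin is an illustrative choice, not a figure from a standard.

```python
import hashlib
import os

ENTROPY_PER_RAW_BIT = 0.99  # assumed estimate for the raw source
OUTPUT_BITS = 256           # SHA-256 output size

def condition(raw: bytes) -> bytes:
    """Hash raw samples down to 32 bytes, refusing inputs with too little estimated entropy."""
    estimated = len(raw) * 8 * ENTROPY_PER_RAW_BIT
    if estimated < 2 * OUTPUT_BITS:  # illustrative 2:1 safety margin
        raise ValueError(f"only ~{estimated:.0f} bits of estimated entropy; need {2 * OUTPUT_BITS}")
    return hashlib.sha256(raw).digest()

raw_samples = os.urandom(80)  # stand-in for 80 bytes read from a physical TRNG
seed = condition(raw_samples)
print(seed.hex())
```

A hash cannot add entropy; it can only spread whatever entropy went in across its output, which is why the input must carry more than the output claims.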


I take the liberty of repeating part of my previous post, rephrased to use Shannon terminology:

… in an XOR combination, it is sufficient that one of the bit strings have maximum entropy. Even if the other input is highly patterned — always zero, for example! — the output will have maximum entropy.

That’s exactly the concept underlying the infamous “one-time pad”. If the key stream has maximum entropy, so will the ciphertext, regardless of the properties of the plaintext.

In contrast, Clive has since written:

… the result from the practical addition of two random streams is … you always end up with less real entropy

I note first, that this seems (at first glance) to state the opposite of the assertion I made, and second, that its truth is not (at first glance) apparent to me.

As to opposition, Clive and I probably weren’t talking about the same thing. I suppose that Clive meant streams in which the entropy is significantly less than maximum (perhaps 0.5 bits of entropy per stream bit, or even less).

Also, Clive is distinguishing “real” entropy (the inherently unpredictable) from chaos (deterministic systems with very complicated time functions of state).

Mr Robinson, if you’ll take the time to respond, am I understanding what you wrote correctly? And if you disagree with what I wrote for the case of XOR having one maximum-entropy input, where is the error in reasoning?

Mr Anon August 11, 2020 1:26 PM

@echo

Based on my own experience wear of an SSD is a non-problem. . .

I have a Samsung T3 250GB external SSD purchased June 12, 2017. It cost $99. Until a month ago it was the boot drive for a Mac Mini. (The 2012 Mac Mini internal mechanical hard drive had an estimated 1% of its useful life remaining when I switched to the SSD.)

As of this writing the T3 has 13,828 hours of power-on time. The total data written, counted in Logical Block Addresses (LBAs), equals 32.3 TB. (Note that TRIM has never been enabled because the external SSD is incompatible with TRIM.) The lifetime left indicator is currently 72%. It is currently being used to house a bootable clone of a 2018 Mac Mini internal PCI-E SSD.

If and when you upgrade to a computer with an internal SSD (which will most likely be compatible with TRIM) it should be capable of an even longer useful lifespan than the Samsung T3.

Stats compiled by DriveDX.
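The figures above allow a rough endurance estimate. This Python sketch assumes, purely for illustration, that the drive's “lifetime left” indicator falls linearly with data written and that the write rate per power-on hour stays the same; real wear indicators need not behave that way.

```python
written_tb = 32.3        # total written so far (from the comment)
life_left = 0.72         # lifetime-left indicator (72%)
power_on_hours = 13_828  # power-on time so far

life_used = 1.0 - life_left
projected_total_tb = written_tb / life_used   # linear extrapolation of wear
remaining_tb = projected_total_tb - written_tb
tb_per_hour = written_tb / power_on_hours
remaining_hours = remaining_tb / tb_per_hour

print(f"projected endurance: ~{projected_total_tb:.0f} TB written")
print(f"remaining at current rate: ~{remaining_tb:.0f} TB, "
      f"~{remaining_hours:,.0f} power-on hours (~{remaining_hours / (24 * 365):.1f} years powered on)")
```

Under those assumptions the drive would reach its rated wear after roughly 115 TB written, leaving about 83 TB, or a little over four more years of continuous power-on time at the same rate.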

Clive Robinson August 11, 2020 3:16 PM

@ SpaceLifeForm,

You can not create entropy from thin air. The XOR will not create entropy.

No, it won’t; it also diminishes the entropy.

Think of it this way: you have four buckets, two of which contain hot water; these are the inputs to the XOR gate. One is currently empty; that is the XOR gate output. And the fourth, well, that is the XOR gate substrate.

Let’s assume, for simple maths, that one input bucket is at 40°C and the other at 20°C.

It’s not hard to work out that, as the buckets contain the same volume of water, the result will fill two buckets (output and substrate) and the temperature will be,

$$\frac{B_1 + B_2}{2} = \frac{40^\circ\mathrm{C} + 20^\circ\mathrm{C}}{2} = 30^\circ\mathrm{C}$$

With one half of the quantity going to the output and the other half to the substrate.

So the “heat” can never get above the maximum of the two inputs, and half the work potential is being wasted via the substrate.

Another issue is to do with output range. As I mentioned earlier, the normalised RMS value in an analogue circuit adding two noisy voltages averages to,

$$V_o = k\sqrt{V_1^2 + V_2^2}$$

Thus for a system where the input range is the same as the output range, the maximum usable gain without compression would be a half, so the constant k is a half. If the noise signal in both inputs is at maximum (1), then the sum of the normalised squares is 2, and the square root of 2 is 1.4142…, so after a gain of 0.5 the effective average will be 70.71%… It sounds good, but in practice, if the noise voltages get to the point where their instantaneous sum is greater than one, then the output will go into gain compression, which is very nonlinear behaviour: the output “hits the rail”, which in effect means any noise in that period drops to zero, and thus any amplitude entropy value it might have had is lost.

However, “hitting the rails” brings another problem: it also causes the output impedance of an amplifier to drop close to zero, so the maximum current to the load goes up as the impedance drops, which causes effects on the power supply rails that take time to settle out when the output is no longer “hitting the rail”. Also, with high-frequency circuits where the output drives a transmission line, it causes a pulse that reflects up and down the line due to the impedance mismatch until it is eventually absorbed by either the amplifier impedance or the load impedance at the other end of the transmission line (this effect is used in the more expensive bits of fault-finding kit called “Time Domain Reflectometry” or TDR analysers).

But what, you say, might this have to do with XOR gates?

Well, the dirty little secret we try to keep from “digital engineers” is that the XOR gate, like all digital gates, is actually analogue in nature… Let’s assume that, like most logic these days, it’s a variation of Complementary Metal Oxide Semiconductor (CMOS) using N-type and P-type FETs. Further, as the XOR gate is a compound gate, it uses “NOT gate” functionality (NAND or NOR) to get the XOR functionality. At the input of each NOT gate the sensing threshold is thus ~1/2 Vdd, and the output threshold is likewise ~1/2 Vdd. That is, when the input is close to the input threshold, because the NOT gate is actually a very high gain analogue amplifier circuit, the output is going to switch at some point from zero to Vdd or from Vdd to zero volts, but you do not know exactly when, so you have a narrow transition region. However, in that narrow transition window the output can jump back and forth until the input has left the transition zone. In certain types of gate, those with feedback, this gives rise to an effect called “Metastability”, which with latches and the like causes “soft lockup”.

Thus you could use this input transition effect to generate “noise” for very brief periods of time. However whilst the points at which things happen are random they are effectively fixed thus the noise is mainly “chaotic” not “random”.

All physical deterministic processes have these “transition periods” in some way or another; the question is what the ratio is between chaotic and random. Because, given care, chaotic behaviour becomes repeatable and thus biased, or if you prefer “predictable” (as the coin toss machine demonstrated).
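A minimal Python illustration of the chaotic-but-repeatable point, using the logistic map x → 4x(1 − x) as a stand-in for a chaotic physical process (an assumption; it models no particular circuit or coin-toss machine). Two runs from the same starting value agree exactly, while a starting difference of 10⁻¹² grows until the trajectories are unrelated, so the output looks random only to someone who cannot pin down the initial state.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate the logistic map x -> r*x*(1-x), returning every state visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

run1 = logistic_orbit(0.3, 100)
run2 = logistic_orbit(0.3, 100)            # same initial state
nudged = logistic_orbit(0.3 + 1e-12, 100)  # initial state off by 1e-12

assert run1 == run2  # deterministic: identical inputs give identical outputs
gaps = [abs(a - b) for a, b in zip(run1, nudged)]
print(f"gap at step 10: {gaps[10]:.1e}, largest gap over steps 50-100: {max(gaps[50:]):.2f}")
```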

MarkH August 12, 2020 1:04 AM

@SpaceLifeForm:

If I may reword what you just posted, entropy in thermodynamics is not the same as entropy in information theory.

Supposedly, Claude Shannon thought that borrowing the term from physics made sense because (a) their measures (how much entropy) are both defined statistically, and (b) the mathematical expressions are similar.

But they are quite distinct things, and to me the borrowing of the name is a sort of poetic license.

In my earlier comment, I tried to express that I don’t understand what it means to create entropy, whether it’s one kind or the other.

SpaceLifeForm August 12, 2020 3:09 PM

@ MarkH

We are on the same page.

We (collective we), as best as we can fathom, believe we can create entropy or at least collect entropy into a black box.

Via events such as mouse movements, timing jitter, lava lamp, etc.

But, we really do not know that

“entropy in thermodynamics is not the same as entropy in information theory”

We just do not know that for fact.

In fact, we can never know for certain at the cosmological level, i.e., the physics.

We (collective we) are part and parcel of the cosmos, the physics, and any OBSERVATION or MEASUREMENT leads to a ‘disruption in the force’.

There are things that we can never know because we (collective we) are inside the cosmological machine.

We can not leave the machine and observe it from an outside vantage point.

Study Schrödinger’s wave equation.

But not too long. Because the longer you try to understand it, the more you will realize that perceived reality is strange.

Clive Robinson August 12, 2020 4:18 PM

@ SpaceLifeForm, MarkH,

Because the longer you try to *understand* it, the more you will realize that perceived reality is strange.

A hundred years or so ago, mathematicians were known to have a tendency towards madness[1], especially those who contemplated infinity…

The contemplation of a subject beyond a certain degree can be classified as some form of monomania, but does it automatically cause mental illness? I’d like to think not.

Because I’ve spent rather more time than most on the sharp end of the “big game hunt” for entropy in the real world.

You say,

We (collective we), as best as we can fathom, believe we can create entropy or at least *collect* entropy into a black box.

I personally believe that all processes, due to limitations of measurement, can exhibit chaotic behaviour at a usually quite small level, rather more frequently than we might expect. I can actually demonstrate this with simple logic circuits.

However, one thing I personally believe (though I might be wrong) is that “chaos” is not the “true randomness” we are looking for to use as seeds in cryptographic processes.

What we call “entropy” is a statistical or probabilistic measure. One problem with statistics is that they are good at finding correlations in data, but the converse is not true: they can show that a particular statistical measure does not find a correlation, but that is in no way the same as saying that “all imaginable methods will not show correlation”, which is in essence what we are looking for when we talk about “Random”.

Which brings us onto,

    “entropy in thermodynamics is not the same as entropy in information theory”

Well, the process of measuring it is the same in both cases, which is why they are both called “entropy”. Thus various people have over time assumed that they are sufficiently similar, and nobody has yet provided proof that the measure is invalid in either case. Whilst that is a poor differentiator, it confers at some level the fact that they are amenable to the same type of modeling, thus have some, albeit abstract, commonalities.

[1] There is an argument that part of the reason was the partially inbred nature of the European intelligentsia, especially in those with one particular set of Jewish genetic inheritors. How much credence you give this depends on your point of view. That is, though we know that inbreeding has definite disadvantages in European royalty and other “blood purity” cults, it’s yet to be clearly shown that there might be significant advantages…

MarkH August 12, 2020 6:30 PM

@SpaceLifeForm, Clive:

I feel much gratitude for this discussion, because it is deep, lies at the very heart of cryptography, and I want to better understand the subject matter.

Where SpaceLife wrote the word “collect,” that makes more sense to me than “create.” Entropy is a quantifiable property of a random variable … or more specifically in application to cryptography, a sequence of quantized (discrete) samples of one or more random variables.

The quantity of entropy in such a sequence depends on both the number of samples, and the distribution of quantized values for the variable. The more nearly equal the probability of each discrete value, the greater the entropy.
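To make the two dependencies above concrete, here is a minimal sketch of Shannon's formula, H = -Σ p·log2(p), assuming (my assumption, not stated above) that the samples are independent and identically distributed, so the total for a sequence is simply n times the per-sample value. The die probabilities are made-up illustrations.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of one sample drawn from a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sequence_entropy(probs, n_samples):
    """Total entropy of n independent, identically distributed samples."""
    return n_samples * shannon_entropy(probs)

fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loaded die

print(shannon_entropy(fair_die))    # log2(6), about 2.585 bits
print(shannon_entropy(loaded_die))  # smaller: outcomes are less equally likely
print(sequence_entropy(fair_die, 100))
```

If the samples are correlated, the per-sample figure overstates the total, which is exactly the difficulty with chaotic sources discussed in this thread.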


From his previous comments, I assess that Clive has looked more deeply into the challenges of “true” random number generation (that is, hardware generation based on supposedly unpredictable physical processes) than anybody else known to me. I have profited by reading some of his past comments about the thorny problems involved in this process.

Clive has drawn an interesting distinction between chaos on the one hand, and true (or real) randomness on the other … and modestly offered that he’s not certain whether the distinction is valid.

My own opinion is that although the distinction is not binary, it is probably of practical importance in hardware random number generators in which feedback is part of the physical process being measured.

In its idealized mathematical expression, a chaotic function is fully deterministic: although the function’s value may “look really random,” its future values can be computed for any time (and significantly, so can its past values).
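A concrete illustration of the paragraph above, using Arnold's cat map (my choice of example; it is not mentioned in the thread). It is a textbook chaotic map on the unit torus that is also exactly invertible, so it shows both points: future and past states are computable. This sketch uses exact rational arithmetic so that no rounding error muddies the picture, and the starting point is arbitrary.

```python
from fractions import Fraction

def cat_forward(x, y):
    """One step of Arnold's cat map on the unit torus: (x, y) -> (2x + y, x + y) mod 1."""
    return (2 * x + y) % 1, (x + y) % 1

def cat_backward(x, y):
    """Exact inverse step: matrix [[1, -1], [-1, 2]] undoes [[2, 1], [1, 1]]."""
    return (x - y) % 1, (2 * y - x) % 1

def run(x, y, steps, step_fn):
    for _ in range(steps):
        x, y = step_fn(x, y)
    return x, y

start = (Fraction(1, 7), Fraction(3, 11))

# Future states are fully determined by the starting point...
future = run(*start, 50, cat_forward)
# ...and so are past states: running the inverse map recovers the start exactly.
assert run(*future, 50, cat_backward) == start

# Sensitive dependence: shift x by one part in a billion, compare after 20 steps.
nudged = (start[0] + Fraction(1, 10**9), start[1])
a = run(*start, 20, cat_forward)
b = run(*nudged, 20, cat_forward)
print(float((b[0] - a[0]) % 1))  # prints 0.165580141
```

Because the map is linear mod 1, the offset after n steps is just the matrix power applied to the offset, so the tiny nudge grows by a factor of about 2.618 per step. "Looks random" and "is computable" coexist.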

In this sense, chaotic functions are like the linear feedback functions often used within the “random” library calls offered in programming languages: random-looking in the statistical sense, but fully deterministic.¹
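As one example of a library generator of this kind: CPython's `random` module uses the Mersenne Twister (MT19937), whose state update is linear over GF(2). Other languages use other generators, so take this as a single illustration rather than a survey. Two instances seeded identically (the seed value is arbitrary) produce the same "random" output:

```python
import random

# Two independent generator objects given the same seed.
gen_a = random.Random(2020)
gen_b = random.Random(2020)

seq_a = [gen_a.getrandbits(32) for _ in range(5)]
seq_b = [gen_b.getrandbits(32) for _ in range(5)]

print(seq_a)
# Same seed, same "random" output, every time: statistically random-looking,
# but fully deterministic.
assert seq_a == seq_b
```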

In physical realization, the picture is more mixed. Consider using (as a hardware random input) turbulence in fluid flow, which I think is a good example of chaotic behavior.

Ideally, specific motions in a turbulent flow could be predicted by a sufficiently powerful fluid dynamics simulation. I suppose, however, that collisions of individual molecules — which are presumably beyond any accessible computation — are capable over time of influencing the evolution of turbulence sufficiently that it will be detectable and measurable.

To my mind, these submicroscopic phenomena qualify as “truly random,” and mean that with any imaginable computer, predictions of future turbulence would become less and less accurate as the time scale grows.

If I’m on the right track, then measurements of turbulent flow include a mixture of the deterministically chaotic and the truly random.

My surmise, then, is that it’s legitimate to use the lava lamp as a random source. However, the measured values can be expected to be partly patterned, because chaotic behavior is predictable even though it looks completely random.

That random data are “contaminated” is no problem in itself, because any cryptographic hash function (including old broken ones like MD5) can be used to “concentrate” the entropy [with the important proviso that no bits of entropy be reused].

However, in order to do this properly, it’s necessary to know how much entropy the source has. How to measure or compute the proportion of chaotic patterning is something I have no idea how to do …


1 Such linear feedback functions have a second property that is very important in cryptography, namely that they are easy to invert. In contrast, Blum-Blum-Shub (or even the infamous Dual EC, which is believed to be secure with two modifications from the NIST standard) are fully deterministic, but believed to be computationally infeasible to invert.
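To illustrate how easy an LFSR is to "invert", here is a sketch using a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (a common maximal-length textbook example; the taps and secret state are my assumptions for illustration). Because the register shifts its state out one bit at a time, the first 16 output bits are the state, after which every later output is predictable.

```python
def lfsr16_bits(state, count):
    """Yield output bits of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    for _ in range(count):
        out = state & 1
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
        yield out

secret_state = 0xACE1  # hypothetical secret
observed = list(lfsr16_bits(secret_state, 16))

# The first 16 outputs are simply the 16 state bits, lowest bit first.
recovered = sum(bit << i for i, bit in enumerate(observed))
assert recovered == secret_state

# With the state in hand, every later output can be predicted.
future = list(lfsr16_bits(secret_state, 200))
predicted = list(lfsr16_bits(recovered, 200))
assert predicted == future
```

Even when the taps are unknown, the Berlekamp-Massey algorithm recovers an n-bit LFSR from 2n output bits. Nothing comparably cheap is known for Blum-Blum-Shub, which this sketch does not attempt to show.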

MarkH August 12, 2020 6:40 PM

PS to Clive:

I hadn’t heard of this “mathematical madness.”

It reminds me of Nietzsche’s observation that if you look long enough into the abyss, the abyss looks also back into you …

If there’s any truth to the association, causality might run the other way: it seems plausible to me that people with unusual combinations of mental weakness and strength — likely to manifest as psychological disorder — are better able to marshal the enormous and prolonged concentration required to crack such abstruse questions.

Clive Robinson August 13, 2020 11:04 AM

@ SpaceLifeForm, MarkH,

As has been mentioned, one way people judge entropy is by compressibility.

In essence, if you can find a program that will generate a chosen string and is shorter than the chosen string, then the string is obviously compressible and thus contains some measure of “redundancy”.
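A rough way to try this with an off-the-shelf compressor, sketched with Python's standard `zlib` (my choice of tool). A compressor's output size is only an upper bound on the shortest possible description: a small ratio proves redundancy, but a ratio near 1 only shows that *this* algorithm found none. The sample strings are made up.

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)

redundant = b"the quick brown fox jumps over the lazy dog. " * 200
random_bytes = os.urandom(len(redundant))

print(compressed_ratio(redundant))     # far below 1: lots of redundancy
print(compressed_ratio(random_bytes))  # about 1, typically slightly above
```

Note that the output of a good deterministic generator would also score about 1 here, even though a very short program (the generator plus its seed) reproduces it.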

Well have you ever thought about how you would find such a string generating algorithm?

That is, there are very many ways an arbitrary string could be shortened, but actually very, very, very few compression algorithms, and it takes very little to show they are far from optimum[1].

Well have you ever thought about how you might come up with an algorithm?

Well, it took a little time to find it again, but have a look at this paper; it might give food for thought:

https://arxiv.org/abs/1809.02942

[1] As an example, take a block of plaintext like the text of 1984 or some such. First run it through algorithm A and make a note of the resulting compression, then do the same with algorithm B. Now try running it through A and then the result through B, then do it the other way, that is B first then A second. I used to do this years ago using a run-length compression algorithm then a dictionary algorithm. In some cases I could compress the likes of 5-15 Mbyte of C source code files down enough to get them on two or three floppies.
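The experiment in the footnote above can be sketched with Python's standard library. This swaps in `zlib` (LZ77 plus Huffman, a dictionary-style method) and `bz2` (Burrows-Wheeler) for the original run-length and dictionary pair, and the input is a hypothetical stand-in for C source files. Which order wins depends on the data, so no result is claimed here.

```python
import bz2
import zlib

def sizes(data: bytes) -> dict:
    """Compressed sizes for single compressors and for both chaining orders."""
    z = zlib.compress(data, 9)
    b = bz2.compress(data, 9)
    return {
        "original": len(data),
        "zlib": len(z),
        "bz2": len(b),
        "zlib then bz2": len(bz2.compress(z, 9)),
        "bz2 then zlib": len(zlib.compress(b, 9)),
    }

# Hypothetical stand-in for a pile of C source files.
sample = b"".join(
    b"int func_%d(int x) { return x * %d + %d; }\n" % (i, i % 7, i % 13)
    for i in range(2000)
)

for name, size in sizes(sample).items():
    print(f"{name:>14}: {size}")

# Chained compression must be undone in reverse order.
chained = bz2.compress(zlib.compress(sample, 9), 9)
assert zlib.decompress(bz2.decompress(chained)) == sample
```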

MarkH August 13, 2020 4:20 PM

@Clive:

Amazing, that you knew/remembered/found this paper. It doesn’t look like easy reading, but seems to specifically address the question of entropy in chaotic phenomena.

Wow!

SpaceLifeForm August 14, 2020 2:51 PM

@ MarkH

“The quantity of entropy in such a sequence depends on both the number of samples, and the distribution of quantized values for the variable. The more nearly equal the probability of each discrete value, the greater the entropy.”

Maybe. See the double-slit experiment.

Why is there a pattern?

Maybe reality is just a bunch of waves.

SpaceLifeForm August 14, 2020 3:12 PM

@ Clive

I believe one can always compress a given bitstring (of sufficient bitlength) via an analysis of the given bitstring and application of suitable algorithm(s). The key is knowing the algorithm(s).

Compression via secrecy.

myliit August 15, 2020 5:35 PM

re: open Wi-Fi & IP location issues

For example, https://www.openwireless.org

Is sharing one’s home or business Wi-Fi with neither a password nor a captive portal, as (afaik) some individuals, libraries, businesses, and Wegmans have done, still worth considering? Since they have done it, it might appear that the liability issues aren’t insurmountable.

With our President a wannabe Fascist, or something like a 21st-century version of one, future ubiquitous open Wi-Fi has some appeal.

In general, I am thinking about homes and apartment complexes (both tenants and owners/managers), businesses, and the world in general.

How about leaving my iPhone’s hotspot on as open Wi-Fi, if possible, without a password, while at coffee shops, etc., if I have “unlimited” data?

IIRC, our host used to have an open Wi-Fi network. Maybe he might revisit this topic someday, or somebody knows some good sources for current cons. The pros, imo, may be easy to identify.

Main concern: liability and law enforcement issues. Anything else a host should worry about, with or without probabilities or costs, would be appreciated.

Any thoughts?

For liability, is a captive portal, password, or something like that helpful?

It would be nice if things like emergency call devices (for falls, for example) could work with previously joined Wi-Fi networks (not requiring a SIM connection; eliminating the captive portal) … or cellular over Wi-Fi, laptops, tablets, iPods, etc., …

Is there an easiest or laziest way to do this with Comcast or Verizon home ISPs, for example? Cost is an issue, of course. How big a hassle is it to identify heavy data users? And so on …

Anders August 15, 2020 7:00 PM

@myliit

Actually, I equate this with running a Tor exit node.
But the problems would be even smaller: Tor could be used
from any point in the world, your open Wi-Fi only from within
signal range.

There would be a lot of legal problems (downloading copyrighted
software/movies, hacking corporations, making bomb threats,
making death threats, downloading child porn, etc). However, people
still run Tor exit nodes from their home connections. There must
be a legal solution.

And in light of the Belarus protests, where the internet was cut off,
this kind of backup / mesh connection is actually very
important. One solution is Fallback + mesh Wi-Fi.

http://www.creativeapplications.net/objects/fallback-alternative-platform-for-real-time-news-during-internet-shutdowns/

This is actually a very interesting topic: how do you fight against
a totalitarian government's internet cutoff? What are the solutions?

ps. Real story, happened here in Estonia. One guy had an open
Wi-Fi network. Someone used it to hack a portal. Police arrived and confiscated the Wi-Fi owner's computers, and found child porn on the
hard drive. He was charged, not for hacking but for the
possession of child pornography, which is a more serious crime here.

myliit August 15, 2020 9:41 PM

@Anders, echo

“… And in light of Belarus protest where internet was cut off
this kind of backup / mesh connection is actually very
important. One solution is Fallback + mesh WIFI.

http://www.creativeapplications.net/objects/fallback-alternative-platform-for-real-time-news-during-internet-shutdowns/

This is actually very interesting topic – how you fight against
totalitarian govt internet cutoff? What are the solutions?…”

These topics are probably more important than routine sharing of open Wi-Fi connections, perhaps at considerable risk to the sharer, at least where cellular service is available. In addition, iirc, echo doesn’t like to hook up to other people’s open Wi-Fi, or something like that.

I found these websites, but am clueless about what might be good, or make sense, here in the U.S. or in other countries.

https://en.wikipedia.org/wiki/Mesh_networking

https://en.wikipedia.org/wiki/Wireless_mesh_network

https://en.wikipedia.org/wiki/Bluetooth_mesh_networking

MarkH August 16, 2020 4:56 AM

@SpaceLifeForm:

… one can always compress a given bitstring (of sufficient bitlength) via an analysis of the given bitstring and application of suitable algorithm(s)

It’s simple to prove that no single algorithm can (losslessly) compress every bit sequence.

I suppose that with a little looseness in the definition of “algorithm,” it’s easy to show that for every possible long bit sequence, an algorithm can be constructed that compresses it at least a little.

However, there might need to be an infinite variety of such algorithms to compress every possible sequence — or more practically, a super-astronomical number of distinct algorithms to compress every possible sequence of some given length (for example, one megabit).

If that were the case, the “compressed file” would need to incorporate instructions (essentially, the specific algorithm) for uncompression.

While that is technically feasible, the extra length needed to include the algorithm would probably mean that in the general case, often the “compressed file” is at least as long as the original.
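The "simple to prove" claim above is a pigeonhole count, sketched below. Distinct inputs need distinct outputs, and there are only 2^(n-k+1) - 1 bit strings of length n-k or less, so fewer than 2^(1-k) of all n-bit strings can be shortened by k or more bits by any lossless scheme with a fixed decompressor. Any bundled "instructions" count as part of the output. The 4096-bit length is an arbitrary, smaller stand-in for the megabit example; the bound depends only on k.

```python
from fractions import Fraction

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that ANY lossless compressor
    can shorten by at least k bits (1 <= k <= n).

    Pigeonhole: distinct inputs need distinct outputs, and there are only
    2**0 + 2**1 + ... + 2**(n-k) = 2**(n-k+1) - 1 strings of length <= n-k.
    """
    shorter_outputs = 2 ** (n - k + 1) - 1
    return Fraction(shorter_outputs, 2 ** n)

n = 4096  # arbitrary example length in bits
for k in (1, 8, 32):
    print(k, float(max_fraction_compressible(n, k)))
# k=1: nearly all strings could lose one bit (but not all of them);
# k=8: under 1 in 128; k=32: under 1 in 2 billion.
```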


In my understanding wave phenomena like the interference pattern in the double-slit experiment are (in principle) fully deterministic and highly regular.

In contrast, entropy in information theory is defined in terms of a function of a random variable.

So far, I haven’t visualized a connection between the two. Will you explain further?

myliit August 16, 2020 8:57 AM

@Anders

“Actually i equalize this with running the TOR exit node.“

With less risk, if I understand you correctly, one could dilute their tor traffic by running a non-exit tor node, too.

“But the problems would be even smaller – TOR could be used
from any point in the world, your open WIFI only from the
signal range.”

Open Wi-Fi can provide relatively high throughput, redundancy especially with overlapping ISPs, etc., …, all stuff you are no doubt aware of …

otoh, leakers or whistleblowers, journalists, dissidents, etc., of course might want to use tor, tor browser, etc., or Tails. My current threat model involves calling our lying President a piece of sh!t, who is surrounded by sycophants, …, and I don’t think tor will protect me from that, regardless.

Anyway with fingerprinting, sites visited, persistent cookies, or the like, changing one’s IP address may provide limited benefit against nation state actors, of course.

myliit August 16, 2020 9:20 AM

@All

Above, or below, I posted: “re: open Wi-Fi & IP location issues

For example, https://www.openwireless.org

to justify trying to use this thread, regarding location issues, to explore best practices for businesses and individuals to share their openwireless.org type wi-fi connections. [ and now mesh connections, too ].

I think this is an important topic, but unfortunately widespread sharing may never happen.

Any thoughts?

Anyway, basic concerns about Wi-Fi, or open Wi-Fi, include things found in these links:

https://underspy.com/blog/the-risks-of-using-public-wifi/

https://usa.kaspersky.com/resource-center/preemptive-safety/protecting-wireless-networks

https://www.lawtechnologytoday.org/2016/01/risks-unsecured-wifi-hotspots/

https://www.lifewire.com/is-it-safe-to-use-an-open-wireless-network-2378210

Anything missing from a user’s, or provider’s, perspective?

Anders August 16, 2020 10:15 AM

@myliit

I brought up the Tor exit node example because I see
similar problems here: people do bad things through it. Sooner
or later someone uses your open Wi-Fi for some bad purpose.
As I mentioned, in Estonia the police confiscate your
computer and all your data, which is something I really
want to avoid.

If you look at the Tor Legal FAQ, you will see that there are
similar problems in the US.

2019.www.torproject.org/eff/tor-legal-faq.html.en

They specifically advise not to run a Tor exit node from home.

Does your threat model accept home computer seizure?

Someone August 16, 2020 1:50 PM

So, regarding this public Wi-Fi thing:
why don't you make the public side auto-route through the Tor network? You have shared Wi-Fi with the public, and you don't have to be afraid of where the public
surfs, or that the logs will point to your IP if they want to do something that is considered bad.

Making a Tor router isn't difficult.

myliit August 17, 2020 3:38 AM

@Anders

“Does your threat model accepts home computer seizure?”

I’ve told people that IANAL, and that before they run an open Wi-Fi network they need to be prepared for the police to knock on their door in the middle of the night. One person responded “cool” and has run one for five years with no police visits known to me.

I also told them to tell the police that they are running open Wi-Fi and that they decline to talk further until they talk to a lawyer. eff.org or openwireless.org might also provide legal support for interesting cases, or lawyer recommendations.

Of course, federal, state and local police by now probably know what Tor and open Wi-Fi are. … But they could make one’s life miserable, with or without merit (planting evidence, for example), of course.

Anyway, potential newspaper articles about arrest and confiscation of granny and granny’s computer equipment may provide some deterrent, when the average citizen probably knows that Starbucks, McDonalds, etc., run free wi-fi, too. …

myliit August 17, 2020 4:00 AM

@All

Our host has written a bunch on open wi-fi, his open wi-fi home network, openwireless.org, etc., and it would be wonderful if he would consider another post. A quick duckduckgo search yielded:

https://www.schneier.com/blog/archives/2008/01/my_open_wireles.html
My Open Wireless Network

https://www.schneier.com/blog/archives/2008/08/terrorists_usin.html
Terrorists Using Open Wireless Networks

https://www.schneier.com/blog/archives/2006/08/stealing_free_w.html
Stealing Free Wireless

https://www.schneier.com/blog/archives/2005/03/anonymity_and_t.html
Anonymity and the Internet

https://www.schneier.com/blog/archives/2006/06/schneier_asks_t.html 2006

“Schneier Asks to Be Hacked

Maybe I shouldn’t have said this:

“I have a completely open Wi-Fi network,” Schneier told ZDNet UK. “Firstly, I don’t care if my neighbors are using my network. Secondly, I’ve protected my computers. Thirdly, it’s polite. When people come over they can use it.”

For the record, I have an ultra-secure wireless network that automatically reports all hacking attempts to unsavory men with bitey dogs.”

myliit August 17, 2020 5:52 AM

@Someone

OK. How about a back-of-the-napkin design summary with software, hardware, etc., and pros and cons or things to consider?

For example, I assume a lot of grandpa’s or granddaughter’s, etc., traffic would be blocked since it is coming from Tor.

MarkH August 21, 2020 6:10 PM

Scattered through this thread has been a side discussion of randomness and entropy, which started from Clive’s learned warnings about the use of One-Time Pad (OTP) encryption.

It has stimulated me to want to understand the concept of information theoretic entropy better, a process which has been slowed by frequent poor sleep …

As has been mentioned in various comments, the notions of entropy and randomness are baffling and resistant to “commonsense” reasoning.

Having accumulated a few spells of sharper-wits thought on the matter, I’ve arrived at some preliminary observations:

  1. Though I haven’t proved it, I’m fully confident of the truth of Clive’s statement (rephrased here) that the XOR combination of two random sequences always has less entropy than the combined entropy of the input sequences.

To borrow the mathematical term for a deterministic operation combining two functions, I expect that any convolution (not only via XOR) of two random sequences to produce a same-length sequence will have this same property, of yielding less entropy than the total of the input sequences.
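A small exact check of observation 1 for a single bit position, assuming (my assumption) two independent input bits with known biases p and q. Each XOR output bit can carry at most 1 bit of entropy while the pair of inputs can carry up to 2, so the total shrinks. The output is, however, often less biased than either input. The bias values are arbitrary examples.

```python
import math

def h(p):
    """Binary entropy in bits of a bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def xor_entropy(p, q):
    """For independent bits A ~ Bernoulli(p), B ~ Bernoulli(q), return
    (H(A) + H(B), H(A XOR B))."""
    p_xor_is_1 = p * (1 - q) + q * (1 - p)
    return h(p) + h(q), h(p_xor_is_1)

for p, q in [(0.5, 0.5), (2 / 3, 2 / 3), (0.9, 0.6)]:
    inputs, output = xor_entropy(p, q)
    print(f"p={p:.3f} q={q:.3f}: inputs {inputs:.3f} bits, XOR {output:.3f} bits")
```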

  2. Bias in a random bit generator (corresponding to a “loaded” coin or die which doesn’t have an equal distribution of results) diminishes entropy far less than I would have expected.

This is shown clearly in Figure 7 of Claude Shannon’s original paper presenting the definition of information theoretic entropy, titled “A Mathematical Theory of Communication.”

For example, if a true random bit generator¹ is twice as likely to output 1-bits as 0-bits (a truly awful bias), the entropy is still greater than 0.9 bits per bit.

Even if the bias is 8 to 1, each output bit has more than 1/2 bit of entropy!

So, while the designer of a TRBG will naturally strive to make the bias as small as practical, perfection is hardly necessary.
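The figures above come straight from Shannon's binary entropy function, H(p) = -p·log2(p) - (1-p)·log2(1-p), the curve plotted in his Figure 7. A sketch that reproduces them, assuming (as the formula does) that successive output bits are independent:

```python
import math

def binary_entropy(p_one: float) -> float:
    """Shannon entropy in bits of one output bit that is 1 with probability p_one."""
    p_zero = 1 - p_one
    return -sum(p * math.log2(p) for p in (p_one, p_zero) if p > 0)

# Bias expressed as "ones : zeros" odds, as in the examples above.
for ones, zeros in [(1, 1), (2, 1), (8, 1), (100, 1)]:
    p = ones / (ones + zeros)
    print(f"{ones}:{zeros} bias -> {binary_entropy(p):.3f} bits per output bit")
# 1:1 -> 1.000, 2:1 -> 0.918, 8:1 -> 0.503, 100:1 -> 0.080
```

Independence is the load-bearing assumption: a generator whose bits are correlated with one another can have far less entropy than this per-bit bias figure suggests.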

  3. In the cases of a pair of random bit sequences, or the output of a TRBG, if the estimated entropy per bit is less than desired, a hash function can be used as an “entropy concentrator” to yield a shorter sequence with greater entropy per bit.

[Note that in his original comment on XOR, Clive wasn’t writing about attempts to “collect” entropy, but rather a property of One-Time Pad encryption.]
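A minimal sketch of observation 3, using SHA-256 from Python's `hashlib` as the concentrator. This is an illustration, not a vetted extractor design. The 2x safety margin (raw input must be estimated to hold at least twice the entropy emitted) is my own assumed policy. `os.urandom` merely supplies placeholder bytes standing in for raw TRBG samples, which in practice must never be reused.

```python
import hashlib
import os

def concentrate(raw: bytes, entropy_per_bit: float, out_bits: int = 256) -> bytes:
    """Hash raw generator output down to out_bits bits (illustrative sketch only).

    entropy_per_bit is the designer's *estimate* of entropy per raw bit; this
    sketch refuses to run unless the raw input is estimated to hold at least
    twice as many bits of entropy as it emits (an assumed safety margin).
    """
    if out_bits % 8 or not 0 < out_bits <= 256:
        raise ValueError("out_bits must be a multiple of 8 between 8 and 256")
    estimated = len(raw) * 8 * entropy_per_bit
    if estimated < 2 * out_bits:
        raise ValueError(f"only ~{estimated:.0f} bits of entropy estimated; need {2 * out_bits}")
    return hashlib.sha256(raw).digest()[: out_bits // 8]

# Suppose the raw source has a 2:1 bias, i.e. roughly 0.918 bits of entropy per
# bit (assuming independent bits). Placeholder bytes stand in for its output.
raw_samples = os.urandom(128)
key = concentrate(raw_samples, entropy_per_bit=0.918)
print(key.hex())
```

The whole approach stands or falls on the entropy estimate passed in, which is exactly the hard part raised earlier in the thread.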

  4. Shannon’s entropy is essentially defined in terms of functions of a random variable, which are a typical example of mathematical idealism.

Excepting the output of a good TRBG, virtually all information sources have a high degree of pattern and structure.

People do speak of the entropy of patterned data (for example, the English language is estimated to have about one bit of entropy per character), but it seems to me that such assessments depart from the strict definition of entropy.

My present notion is that the assessment of entropy (or at least, some metaphor for it) in partly patterned data is a very deep question, and perhaps more art (or philosophy) than science.

If anybody can point to a good source for study of this matter, I shall be grateful!


1 The notion of biased random numbers may be disorienting to computer geeks, who are perhaps accustomed to thinking of “random” as inherently meaning “statistically uniform.”

Those are, however, independent attributes. The digits of pi (conjectured, though not proven, to be statistically uniform) or the output of a suitable LFSR (provided the sample is shorter than its period) appear statistically uniform … but are fully deterministic, which is the polar opposite of random.
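To see the LFSR half of that claim, here is a sketch that runs a 16-bit maximal-length Fibonacci LFSR (taps 16, 14, 13, 11, a common textbook choice; taps and start state are assumptions for illustration) through one full cycle and counts its output bits. A maximal-length register visits every nonzero state once per cycle, so the ones and zeros are almost exactly balanced, yet every bit is fixed by the start state.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 by one step."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)

start = 0xACE1
state, ones, period = start, 0, 0
while True:
    ones += state & 1  # the bit shifted out on this step
    state = lfsr16_step(state)
    period += 1
    if state == start:
        break

zeros = period - ones
print(period, ones, zeros)  # 65535 32768 32767
```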
