(Roughly) Daily

Posts Tagged ‘data’

“Facts are stubborn things, but statistics are pliable”*…


FL covid


Data visualizations that make no sense...




work from home

More at “WTF Visualizations.”

* Mark Twain


As we celebrate clarity, we might spare a thought for the mathematician, biologist, historian of science, literary critic, poet, and inventor Jacob Bronowski; he died on this date in 1974.  Bronowski is probably best remembered as the writer (and host) of the epochal 1973 BBC television documentary series (and accompanying book), The Ascent of Man (the title of which was a play on the title of Darwin’s second book on evolution, The Descent of Man)… the thirteen-part series, a survey of the history of science–  from rock tools to relativity– and its place in civilizations, is still an extraordinary treat.  It’s available at libraries, on DVD, or (occasionally) on streaming services.




“A better world won’t come about simply because we use data; data has its dark underside.”*…




Data isn’t the new oil, it’s the new CO2. It’s a common trope in the data/tech field to say that “data is the new oil”. The basic idea being – it’s a new resource that is being extracted, it is valuable, and is a raw product that fuels other industries. But it also implies that data is inherently valuable in and of itself and that “my data” is valuable, a resource that I really should tap into.

In reality, we are more impacted by other people’s data (with whom we are grouped) than we are by data about us. As I have written in the MIT Technology Review – “even if you deny consent to ‘your’ data being used, an organisation can use data about other people to make statistical extrapolations that affect you.” We are bound by other people’s consent. Our own consent (or lack thereof) is becoming increasingly irrelevant. We won’t solve the societal problems pervasive data surveillance is causing by rushing through online consent forms. If you see data as CO2, it becomes clearer that its impacts are societal not solely individual. My neighbour’s car emissions, the emissions from a factory on a different continent, impact me more than my own emissions or lack thereof. This isn’t to abdicate individual responsibility or harm. It’s adding a new lens that we too often miss entirely.

We should not endlessly be defending arguments along the lines that “people choose to willingly give up their freedom in exchange for free stuff online”. The argument is flawed for two reasons. First, the reason that is usually given – people have no choice but to consent in order to access the service, so consent is manufactured.  We are not exercising choice in providing data but are rather resigned to the fact that we have no choice in the matter.

The second, less well known but just as powerful, argument is that we are not only bound by other people’s data; we are bound by other people’s consent.  In an era of machine learning-driven group profiling, this effectively renders my denial of consent meaningless. Even if I withhold consent, say I refuse to use Facebook or Twitter or Amazon, the fact that everyone around me has joined means there are just as many data points about me to target and surveil. The issue is systemic, it is not one where a lone individual can make a choice and opt out of the system. We perpetuate this myth by talking about data as our own individual “oil”, ready to sell to the highest bidder. In reality I have little control over this supposed resource which acts more like an atmospheric pollutant, impacting me and others in myriads of indirect ways. There are more relations – direct and indirect – between data related to me, data about me, data inferred about me via others than I can possibly imagine, let alone control with the tools we have at our disposal today.

Because of this, we need a social, systemic approach to deal with our data emissions. An environmental approach to data rights as I’ve argued previously. But first let’s all admit that the line of inquiry defending pervasive surveillance in the name of “individual freedom” and individual consent gets us nowhere closer to understanding the threats we are facing.

Martin Tisné argues for an “environmental” approach to data rights: “Data isn’t the new oil, it’s the new CO2.”

Lest one think that we couldn’t/shouldn’t have seen this (and related issues like overdependence on algorithms, the digital divide, et al.) coming, see also Paul Baran’s prescient 1968 essay, “On the Future Computer Era,” one of the last pieces he did at RAND, before co-leading the spin-off of The Institute for the Future.

* Mike Loukides, Ethics and Data Science


As we ponder privacy, we might recall that it was on this date in 1981 that IBM released IBM model number 5150– AKA the IBM PC– the original version and progenitor of the IBM PC compatible hardware platform. Since the machine was based on open architecture, within a short time of its introduction, third-party suppliers of peripheral devices, expansion cards, and software proliferated; the influence of the IBM PC on the personal computer market was substantial in standardizing a platform for personal computers (and creating a market for Microsoft’s operating system– first PC DOS, then Windows– on which the PC platform ran).  “IBM compatible” became an important criterion for sales growth; after the 1980s, only the Apple Macintosh family kept a significant share of the microcomputer market without compatibility with the IBM personal computer.

IBM PC source


Written by LW

August 12, 2019 at 1:01 am

“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it”*…




You’ve probably heard of kilobytes, megabytes, gigabytes, or even terabytes.

These data units are common everyday amounts that the average person may run into. Units this size may be big enough to quantify the amount of data sent in an email attachment, or the data stored on a hard drive, for example.

In the coming years, however, these common units will begin to seem more quaint – that’s because the entire digital universe is expected to reach 44 zettabytes by 2020.

If this number is correct, it will mean there are 40 times more bytes than there are stars in the observable universe…
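The comparison above is simple arithmetic, and a short sketch makes the implied figures explicit. It assumes the SI definition of a zettabyte (10^21 bytes); the star count is not given in the excerpt, so the value computed here is only what the “40 times” ratio implies, not an independent estimate.

```python
# Assumption: SI zettabyte, i.e. 10**21 bytes.
ZETTABYTE = 10**21

# Figure quoted in the excerpt: 44 zettabytes by 2020.
total_bytes = 44 * ZETTABYTE

# Ratio quoted in the excerpt: 40 times more bytes than stars.
claimed_ratio = 40

# Star count implied by the two quoted figures (not an astronomical estimate).
implied_stars = total_bytes // claimed_ratio

print(f"bytes: {total_bytes:.1e}")
print(f"implied stars: {implied_stars:.1e}")
```

In other words, the claim holds if the observable universe contains on the order of 10^21 stars.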

The stuff of dreams, the stuff of nightmares: “How Much Data is Generated Each Day?”

* Dan Ariely


As we revel in really, really big numbers, we might spare a thought for Edgar Frank “Ted” Codd; he died on this date in 2003.  A distinguished computer scientist who did important work on cellular automata, he is best remembered as the father of computer databases– as the person who laid the foundation for relational databases, for storing and retrieving information in computer records.



Written by LW

April 18, 2019 at 1:01 am

“I’ll let you be in my dreams if I can be in yours”*…




From Glenn McDonald (in his capacity as Spotify’s genre taxonomist– or as he puts it, “mechanic of the spiritual compasses of erratic discovery robots that run on love”):

This is a mapping of genres to words, and words to genres, using words that are used distinctively in the titles of songs. A genre’s words are ranked by how disproportionately they appear in that genre’s songs’ titles compared to all songs. A word’s genres are ranked by the position of that word in each genre’s word list. 1525 genres and 4712 words qualify.
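The two rankings described above can be sketched in a few lines. This is a minimal illustration, not the method actually used at Every Noise at Once: the function names, the whitespace tokenizer, the pooled "all songs" baseline, and the sample titles are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def rank_words_by_genre(titles_by_genre, top_n=5):
    """For each genre, rank title words by how over-represented they are
    in that genre's titles relative to all titles pooled together."""
    genre_counts = {
        genre: Counter(w for title in titles for w in title.lower().split())
        for genre, titles in titles_by_genre.items()
    }
    overall = Counter()
    for counts in genre_counts.values():
        overall.update(counts)
    overall_total = sum(overall.values())

    ranked = {}
    for genre, counts in genre_counts.items():
        genre_total = sum(counts.values())

        def lift(word):
            # Share of the word within the genre divided by its overall share.
            return (counts[word] / genre_total) / (overall[word] / overall_total)

        # Highest lift first; ties broken alphabetically for stable output.
        ranked[genre] = sorted(counts, key=lambda w: (-lift(w), w))[:top_n]
    return ranked


def rank_genres_by_word(ranked):
    """For each word, rank genres by the word's position in each genre's list."""
    positions = defaultdict(list)
    for genre, words in ranked.items():
        for pos, word in enumerate(words):
            positions[word].append((pos, genre))
    return {word: [g for _, g in sorted(p)] for word, p in positions.items()}


# Hypothetical sample data, invented purely for illustration.
sample = {
    "metal": ["Blood Moon", "Blood Oath"],
    "folk": ["River Moon", "River Song"],
}
words_for_genre = rank_words_by_genre(sample)
genres_for_word = rank_genres_by_word(words_for_genre)
print(words_for_genre)
print(genres_for_word)
```

A word like “moon,” which appears equally in both sample genres, ends up at the bottom of each genre’s list, while words confined to one genre rise to the top.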

Visit “Genres in Their Own Words.”  And while you’re there, explore the genre map and the other nifty resources at Glenn’s site, Every Noise At Once.

* Bob Dylan


As we slip on the headphones, we might spare a thought for Sir George Henry Martin; he died on this date in 2016.  A record producer, arranger, composer, conductor, audio engineer, and musician, Martin began his career as a producer of comedy and novelty records in the early 1950s, working with Peter Sellers, Spike Milligan, and Bernard Cribbins, among others.  In 1962, while working at EMI/Parlophone, Martin was so impressed by Brian Epstein’s enthusiasm that he agreed to record the Beatles before seeing or hearing them (and despite the fact that they’d been turned down by Decca).

Martin went on to produce 23 number ones on the Billboard Hot 100 chart, 19 of which were by The Beatles.  Indeed, Paul McCartney referred to Martin as “the fifth Beatle.”  He also produced chart-topping hits for McCartney (“Say Say Say” with Michael Jackson and “Ebony and Ivory” with Stevie Wonder), Elton John (“Candle in the Wind”) and America (“Sister Golden Hair”).


George Harrison, Paul McCartney, George Martin, and John Lennon in the studio in 1966


Written by LW

March 8, 2019 at 1:01 am

“Induction for deduction, with a view to construction”*…


Mushroom cloud from the world’s first successful hydrogen bomb test, Nov. 1, 1952

At RAND in 1954, Armen A. Alchian conducted the world’s first event study to infer the fissile fuel material used in the manufacturing of the newly-developed hydrogen bomb. Successfully identifying lithium as the fissile fuel using only publicly available financial data, the paper was seen as a threat to national security and was immediately confiscated and destroyed…

How a bench researcher used publicly-available market data to unlock the secret of the H Bomb: “The Stock Market Speaks: How Dr. Alchian Learned to Build the Bomb” (pdf).

* Auguste Comte (attributed by John Arthur Thomson in a quote at the heading of the chapter “Scientific Method,” in his Introduction to Science)


As we comb the columns, we might recall that it was on this date in 1883 that the S.S. Daphne sank moments after her launching at the shipyard of Alexander Stephen and Sons in Glasgow.  The 500-ton steamer went down with 200 men on board– all of them working to finish her before the shipyard closed for the Glasgow Fair.  Only 70 were saved.



Written by LW

July 3, 2017 at 1:01 am
