Paul Graham has 49.3 times more "AI tokens", but does that actually mean what we think it does?
And do we actually understand as much as we claim?
1. The challenge of AI understanding
Understanding language remains one of the most complex challenges in artificial intelligence. While AI can simulate understanding through pattern recognition and sophisticated algorithms, the question remains whether it truly "understands" the context and meaning of the information it processes. The idea of understanding may itself be subjective, as the quote below perhaps shows.
"Chatbots cannot think like humans: They do not actually understand what they say. They can mimic human speech because the artificial intelligence that powers them has ingested a gargantuan amount of text, mostly scraped from the internet."
This is a troubling statement, because if anything, the experience with ChatGPT has shown that predicting the next word may be the crux of what human communication is about.
2. A Thought Experiment
Last week I did an experiment: I did NOT actually write the newsletter, as I normally would. Instead, I collected a bunch of content that I had come across that week and gave it all to ChatGPT to make sense of and turn into a well-polished newsletter.
Whether it did a good job, or even understood what it was doing - I don't know. I guess you can be the judge.
But in some ways the sponge method for “understanding” is nothing new. Consider a VC investor as an example: for simplicity, someone who invests in technology but actually lacks domain expertise. Over the last 3 months, this hypothetical investor talked to 100 companies doing some form of AI.
From these conversations, our investor collects the gist of the verbiage. This verbiage leads to confidence, and then to a belief in their own unique understanding of what’s going on in this space. A month passes, and our hypothetical investor hears words that seem to reflect all the verbiage from the last few months, but said in just the right form to sound super convincing.
You’re welcome - in human form, such “understanding” is super convincing. We subscribe to it. We value it. We spread it across Substack, Hacker News, and Y Combinator. And if we’re a Family Fund, we even give these people money. But if this is all packaged into AI, supposedly, none of us would feel very strongly about such an investment strategy - or would we?
3. Comparing influences on AI: the value of unique ideas
The Washington Post this week released a widget that shows the number of tokens extracted from each website for Google's C4 dataset. I did a couple of searches and discovered that Paul Graham's essays blog has 49.3 times more tokens than my own old personal blog.
However, when considering the impact of each token and the uniqueness of ideas, it is possible that the ideas from my own blog, which may be more reflective of common experiences, could be more influential than the unique and contrarian ideas from Paul Graham's blog.
Consider that Paul Graham's ideas are actually quite unique and contrarian (at least, they were when he first published 20 years ago). Let’s suppose the rest of the web does not yet reflect those ideas because there is not anyone like Paul Graham out there.
But let's say my ideas are mediocre at best, and basically a mere reflection of my experience existing in the digital world. I, like most of us, just write about whatever I am experiencing as part of my work and life - and much of that happens to be similar to what others experience.
In that way, the tokens extracted from my blog are very similar to tokens that appear, for simplicity’s sake, 493 times over across the rest of the web. All of us then go to ChatGPT and ask it to give us some response, and the ideas associated with my blog are, if we do the simple math (493 / 49.3), about 10x more influential than those of Paul Graham. Of course, I understand that it is not as simple as that, and yet it is close.
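For anyone who wants to poke at that back-of-the-envelope math, here is a minimal sketch in Python. It assumes, very crudely, that "influence" is just raw token count, which is certainly not how a language model actually weighs its training data. The 49.3 comes from the widget searches above; the 493 is the made-up "for simplicity's sake" number, and the function and constant names are mine, purely for illustration.

```python
# Toy model only: treats "influence" as proportional to raw token count.
# That is an assumption of this sketch, not how models are actually trained.

PG_TO_MINE_TOKEN_RATIO = 49.3  # Paul Graham's essays vs. my blog (from the widget)
WEB_COPIES_OF_MY_IDEAS = 493   # hypothetical: how often tokens like mine recur on the web


def relative_influence(my_tokens: float = 1.0) -> float:
    """Ratio of 'common idea' tokens to Paul Graham's tokens."""
    pg_tokens = my_tokens * PG_TO_MINE_TOKEN_RATIO
    common_tokens = my_tokens * WEB_COPIES_OF_MY_IDEAS
    return common_tokens / pg_tokens


if __name__ == "__main__":
    print(f"{relative_influence():.1f}x")
```

Note that the size of my own blog cancels out: only the ratio between the two assumed numbers matters, so changing the 493 guess changes the answer proportionally.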
Seems bizarre, but if we go back to the VC investor example, if anything it shows that faking “understanding” with a sponge method is maybe what most of us humans in fact do most of the time. And then we give ourselves too much credit for that understanding.
All of this does not so much give an answer as raise a more serious question of what we truly mean by "understanding." Is understanding perhaps something that has a unique point of reflection and cannot be easily replicated, or is it something else?
About the Author
In my former life I was a Data Scientist. I studied Computer Science and Economics for undergrad and then Statistics in graduate school, eventually ending up at MIT and, among other places, Looker, a business intelligence company that was acquired by Google Cloud.
Stay tuned for more insights into the world of data and AI.