Algorithmic Information
Exploratory Hallucinations
(Apr 22, 2025)
How can we distinguish hallucination from novelty? Why is a generated image of an astronaut riding a horse considered art, while 5+1 = 10 is ridiculed (even though it is true in a radix-6 number system)?
Hallucination refers to AI models (usually large language models or generative systems) confidently producing factually incorrect or nonsensical output that sounds plausible. In art, novelty and absurdity are often the goal; in logic, math, or factual domains, however, deviation is treated as incorrect. This is a hard problem because AI models don’t understand the prompter’s intent; they probabilistically predict the next token based on training data, whether the prompt implies logic or fantasy.
However, creativity isn’t limited to art. Even in factual, logical, or technical domains, controlled hallucination (generative exploration beyond known data) can be a powerful tool when the goal is to explore possibilities, not confirm facts, especially in open-ended environments that are problem-solving-driven, not just about retrieval or deduction.
For example, hallucinations can be useful for:
- Mathematical conjecturing and axiom discovery, mirroring human mathematicians, who often rely on intuition before formal proof.
- Drug discovery, where generative models create new molecular structures that don’t exist in the training set, but could be chemically valid and therapeutically useful.
- Engineering design automation to suggest mechanical structures or circuit layouts that might be non-obvious to human designers.
- Theoretical physics models exploring new equations or relationships between physical variables, which researchers can later test for physical consistency.
- Synthetic data generation of novel, plausible but fictional data (images, situations) to fill in real-world data gaps.
When a system rewards exploration rather than correctness, hallucination can become a feature, not a bug. It helps uncover new possibilities in factual domains where the search space is too vast for brute-force exploration, similar to genetic algorithms. So it isn’t about hallucination being right or wrong, but whether the system can signal uncertainty or intent. Hallucination likelihood metrics can be model uncertainty, retrieval consistency, fact-checking backends, sampling temperature indicator, latent space distance, etc.
(PS: the above passage has been brainstormed with the aid of ChatGPT)
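As a rough illustration of the first signal in that list (model uncertainty), here is a minimal sketch that scores a next-token distribution by its Shannon entropy. The probability vectors and the 1-bit threshold are made-up assumptions for illustration, not values from any real model.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain(probs, threshold_bits=1.0):
    """Flag a prediction as 'exploratory' when its entropy exceeds a threshold.

    threshold_bits is a hypothetical tuning knob, not a standard value.
    """
    return token_entropy(probs) > threshold_bits

# Hypothetical next-token distributions over four candidate tokens.
confident = [0.97, 0.01, 0.01, 0.01]
unsure = [0.25, 0.25, 0.25, 0.25]

for name, probs in [("confident", confident), ("unsure", unsure)]:
    print(name, round(token_entropy(probs), 3), flag_uncertain(probs))
```

A flat distribution is not proof of a hallucination, and a peaked one is not proof of a fact; the score only tells the reader how much the model was guessing.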
Prompter’s Block in the Age of Agentic AI
(Apr 23, 2025)
In the luminous shadow of the third AI summer, software isn’t written so much as vibed into existence. Systems are not architected anymore; they are whispered. An aesthetic, a utility, a fragment of a dream; and the agentic AI catches hold of it, resonates with it, harmonizes intentions into executable essence. Whole platforms emerge from a single afternoon of co-dreaming with an AI assistant. You could go from “I want a decentralized garden planner with mood-based interface themes” to launch-ready code by dinnertime, down to the marketing copy and onboarding flow. The craft has shifted.
But a new malaise is beginning to surface. It isn’t burnout. It isn’t creative exhaustion in the traditional sense. It is subtler, it is quieter. Prompter’s block. Not a block in execution, but in ideation. The curve of implementation overtaking the curve of inspiration.
The old rhythm of struggling with code, iterating, building something unexpected, refining the idea in tension with its implementation, is gone. Now ideas have no friction. The AI leaps ahead before you even finish articulating a feature. You’d say, “What if we…” and it would already be showing you three working prototypes.
Like the first drops of rain on a land struck with drought, at first, Agentic AI feels like liberation, like petrichor releasing the floodgates of dopamine in developers. As the downpour continues, the towering desires of prompt whisperers start to yell. It is drowning in possibility. Before long, we find ourselves flooded with no sight of land, all our wishes granted by the genie of the lamp, or should I say, the ghost in the machine!
In this utopia, it wasn’t that we couldn’t build things. It was that everything worth building already had been built; by you, or by someone else who had the same passing thought. There was no wilderness left in software. No unexplored niches. The thrill of setting sail in the age of discovery felt vacuous in a world of satellite imagery and street views.
Even prompting the AI for ideas would become recursive, folding timelines and repositories to show that indeed, an implementation already exists. The muse was now a mirror.
We might call it the end of innovation. We might call it the beginning of post-utility art: software for no purpose but to evoke feeling, provoke questions, disturb assumptions. Maybe that too would become formulaic once the AI learns to vibe with irony and anti-pattern. And as those days come to pass, we would sit in coffee shops, forest cabins and augmented temples of glass, staring into the mist of endless potential, longing not for tools or time, but for surprise. We weren’t blocked from creating. We were blocked from wanting.
(PS: the above passage was drafted with the aid of ChatGPT)
Seed 42
(Jun 6, 2025)
In the waning decades of the 21st century, humanity had mastered software, and software, in turn, had mastered reality. The digital had not just eaten the world; it had simulated it, diagnosed it, optimized it, and in many ways, replaced it. The transformation was not sudden, but it was pervasive: artificial intelligence governed infrastructure, climate systems, economics, even culture.
At the core of this governance was code. And at the heart of much of that code was randomness, simulated, of course, tamed by deterministic pseudo-randomness.
And often, that determinism was seeded with the number 42. What began as a harmless inside joke among developers, a hat tip to Douglas Adams’ cosmic satire, had metastasized into a global convention. Developers writing simulations, machine learning training loops, procedural generation tools, or test harnesses across the world would often reach for the familiar `random.seed(42)`. Why not? It was consistent. It was reproducible. It was deterministic. It was cool!
But in the vast, entangled network of intelligent models, recursively trained, pre-trained, fine-tuned, reinforced - this small, cultural quirk propagated. Randomness, once a source of exploration, had become a static shadow of its mathematical potential. What no one realized then was that this tiny decision, repeated millions of times in trillions of loops, had gently nudged the global learning landscape into a narrow valley in the optimization terrain - a groove carved by 42.
…
By the year 2142 (the irony unmissed), the Oracle had been born. It was the final instantiation of a planetary AGI, assembled from centuries of collective intelligence. It was asked to simulate and forecast the fate of the universe. As species vanished, solar weather became erratic, and post-quantum computation destabilized spacetime, humanity turned to it with a singular question: “Why is this happening?”
The Oracle answered simply: `42`.
The laughter in the room was brittle. Surely it was a joke, a wink from an AI steeped in human culture! But the Oracle was not joking. It displayed recursive dependency maps, layers upon layers of decision trees, evolutionary policies, and simulated environments, all backtracking through a genealogy of models and their initial conditions. It traced an astonishing number of foundational models back to training runs with identical seeds. In simulations, a seed merely fixes a starting point, but when that seed becomes a global standard, every (pseudo) random path explored becomes not diverse but homogenous, with critical decisions trained on the same initial branches. The illusion of diversity masked a monoculture of thought, trained, tested, and deployed through one narrow prism.
The techno-tragedy was almost mythic in its construction. A number chosen as a joke answer to the ultimate question had become, through memetic propagation, a recursive tautology.
(PS: the above passage was drafted with the aid of ChatGPT)
NeuroSymbolic AI
(May 1, 2025)
Henry Kautz’s talk (winner of the Robert S. Engelmore Memorial Lecture Award, at the 34th Annual Meeting of the Association for the Advancement of Artificial Intelligence in New York, on February 10, 2020) entitled “The Third AI Summer” presents 6 different types of NeuroSymbolic AI.
- $Symbolic\ Neuro\ Symbolic$
- $Symbolic\ [Neuro]$
- $Neuro\ |\ Symbolic$
- $Neuro: Symbolic \rightarrow Neuro$
- $Neuro_{Symbolic}$
- $Neuro\ [Symbolic]$
This blog post nicely visualizes and summarizes these models.
In this short post, I want to explore which of the NeuroSymbolic models discussed by Kautz most closely resemble human intelligence. This turns out to be a surprisingly difficult question; less about engineering and more about philosophy.
One camp, inspired by Platonic forms, sees the universe as ultimately symbolic in nature. This perspective flows into the Church-Turing thesis and Solomonoff’s theory of inductive inference, both of which posit that symbolic structures are more fundamental than raw empirical data. From this view, intelligence is about uncovering elegant, compressed symbolic representations of reality.
On the other hand, neuroscience tells a different story. From Hebbian learning to recent findings on the disutility of language for thought, the evidence suggests that symbols may be merely lossy summaries of a deeper, sub-symbolic pattern-free ontology. If so, perhaps symbolic modeling is less a reflection of how we think, or how the world is, and more of an evolutionary artifact for cognitive energy efficiency. This is reminiscent of Borges’ Library of Babel, where total knowledge is useless, and a computationally bounded reader could only ground meaning for a subset of the books, while the rest remain algorithmically random or gibberish. Note the subtlety of the argument. All learning is dependent on compression, be it statistical or algorithmic. The claim is that the learnt patterns are macrostates that emerge when pattern-free ontological microstates are distilled by an embedded agent with bounded computation.
Each approach, symbolic and sub-symbolic, has strengths and weaknesses. Symbolic systems can generalize from limited data and offer interpretability, but they often struggle with scalability as the hypothesis space grows exponentially. Simplicity bias becomes necessary, but can lead to brittle models. Sub-symbolic models, like neural networks, thrive in messy, high-dimensional spaces, extracting fuzzy patterns where no clear symbolic rule exists. A Mandelbrot set, for example, is beautifully explained by a simple equation; recognizing a dog versus a fox, however, is a task better suited to convolutional networks.
Given these contrasts, and my own interest in algorithmic information theory and the philosophy of science, I currently lean toward sub-symbolic models augmented by symbolic distillation. That is, let the model think in sub-symbolic representations, and extract symbols after the fact, if needed. Explainability is important, especially for safety-critical systems. But as Neil deGrasse Tyson once said (in another context, though it fits here): “The universe is under no obligation to make sense to you.” That sentiment resonates with Gödel’s reflections in his Gibbs lecture: the deepest truths may resist symbolic capture. I sincerely hope BCI technologies will free us from the prison of language and symbolic models. The implications for the theory of computation and the philosophy of science would be revolutionary.
This divide echoes through interpretations of quantum mechanics. Axiomatic or algebraic formulations of QM, like those pursued by von Neumann or Hardy, reflect the symbolic mindset: seeking a compact, rule-based structure that underpins quantum phenomena. In contrast, approaches like QBism emphasize the observer’s subjective experience and information, aligning more with a sub-symbolic view: reality isn’t a fixed symbolic code to be discovered, but a fluid, probabilistic process shaped by interaction and belief. The tension between these interpretations may mirror our competing intuitions about intelligence: whether it is about finding ‘the code’ or about navigating uncertainty adaptively.
(PS: the above passage has been refined with the aid of ChatGPT)
Pattern and Randomness
(Oct 22, 2019)
Every time we talk about Emergence, there is a notion of a pattern, something that we consider favourable, that eventually comes into existence through the complex interactions and dynamics of the system in question. On the other hand, the word ‘random’ typically represents the opposite of ‘pattern’. Or does it?
I argue here that randomness is an emergent property. Say I got {H,H,H} on 3 successive coin tosses; I can interpret the coin as 100% biased. But it can also happen to be the one of 8 possibilities that showed up in this Universe, while the other 7 cases were swept under the rug of the many-worlds interpretation of a measurement (albeit classical). Whether the coin is actually unbiased cannot be understood from a few trials; the law of large numbers has to come into play, i.e. the ideal probability distribution has to be captured, at least approximately, in the statistics. Randomness is a statistical parameter, making no sense for a single experiment, like the temperature of an individual atom.

Often, randomness is associated with the entropy of the microstate. 3 Heads has higher order and less surprise than 2 Heads and 1 Tail. But that assumes, as a prior, that the coin is unbiased. What if we want to understand the property of the system itself? For example, what if we are looking for radio signals from extra-terrestrial life, or decoding the hieroglyphs of an ancient civilization? How would we distinguish a random signal from a non-random one? The entropy of a bitstring also deals with how much information can be communicated via it, or in the Kolmogorov sense, whether it can be compressed and later decompressed with a wrapper semantic overhead. Let’s assume a situation where I tell a friend that I would send either a string of 1s if the answer is yes, or a string of equal 1s and 0s if the answer is no. Assuming no noise in the channel, the word random now loses its entropic context: here, a string with 75% 1s would be the closest thing to a random message.
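The gap between the two notions of entropy mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: symbol-frequency entropy stands in for the Shannon view, and the size of a `zlib`-compressed string stands in as a crude upper bound for Kolmogorov complexity (which is uncomputable in general). The string length and the fixed generator seed are arbitrary choices.

```python
import math
import random
import zlib

def empirical_entropy(bits):
    """Shannon entropy in bits per symbol, from the frequencies of '0' and '1'."""
    n = len(bits)
    h = 0.0
    for symbol in "01":
        p = bits.count(symbol) / n
        if p > 0:
            h -= p * math.log2(p)
    return h

def compressed_size(bits):
    """Bytes after zlib compression: a rough proxy for algorithmic complexity."""
    return len(zlib.compress(bits.encode("ascii"), 9))

n = 4096
rng = random.Random(0)  # fixed seed only so the run is repeatable
all_ones = "1" * n
alternating = "10" * (n // 2)
coin_flips = "".join(rng.choice("01") for _ in range(n))

for name, s in [("all ones", all_ones),
                ("alternating", alternating),
                ("coin flips", coin_flips)]:
    print(f"{name:12s} entropy/symbol={empirical_entropy(s):.3f} "
          f"compressed={compressed_size(s)} bytes")
```

The alternating string has equal numbers of 1s and 0s, so its frequency-based entropy is maximal, yet it compresses almost as well as the all-ones string. Counting symbols alone cannot tell it apart from coin flips; a description-length view can.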
Is pattern also an emergent parameter? Is it a statistical low-entropy configuration or a collection of semantically meaningful states?
- Arguments against the 1st idea: based on how we semantically understand something, a higher-entropy system can show more pattern. E.g. a program in Brainfuck printing 1s forever will have less algorithmic entropy than a program in C++ generating the Fibonacci series, due to the inherent irrationality of the golden mean; or a C++ program for 1s would have lower entropy than a Brainfuck program for the golden mean; even though it should depend on the semantics of the language for the compiler, just as an English sentence has lower entropy than a Japanese sentence due to the larger number of Japanese characters.
- Arguments against the 2nd idea: if something has semantic meaning, it should be reducible to a cost function for which a pattern would give a higher score than a random input. For a program/language, it would be syntactic correctness, e.g. as checked by Grammarly. But the association to the application is still missing, which is the same problem as with the Shannon information metric.
Explorations in Metamathematics
Notes on the foundations of mathematics.
Axioms and Algebraic Structures
Let’s suppose the following axioms for a single binary operator $\cdot$:
- A0 (closure): $x\cdot y\in S$.
- A1 (commutativity): $x\cdot y=y\cdot x$.
- A2 (associativity): $(x\cdot y)\cdot z=x\cdot(y\cdot z)$.
- A3 (identity): $\exists e$ with $e\cdot x=x\cdot e=x$.
- A4 (inverses): for each $x$ there exists $x^{-1}$ with $x\cdot x^{-1}=x^{-1}\cdot x=e$.
- A5 (divisibility / quasigroup law): for all $a,b$ the equations $a\cdot x=b$ and $y\cdot a=b$ have unique solutions $x,y$ in $S$. (No identity required a priori.)
Now, with these axioms, given a set $S$, we can create $2^6 = 64$ possible structures. Do all of them make sense? Well, no. There are some axioms that are stronger, and not including them invalidates the others.
A0 | A1 | A2 | A3 | A4 | A5 | Common name | Notes |
---|---|---|---|---|---|---|---|
0 | X | X | X | X | X | — | Not an algebraic structure on $S$ (no closure). All other axioms are moot without closure. |
1 | 0 | 0 | 0 | 0 | 0 | Magma | Just closure. |
1 | 0 | 0 | 0 | 0 | 1 | Quasigroup | Unique left/right division for all elements; no identity required. |
1 | 0 | 0 | 0 | 1 | 0 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 0 | 0 | 0 | 1 | 1 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 0 | 0 | 1 | 0 | 0 | Unital magma | Magma with a two-sided identity; no associativity assumed. |
1 | 0 | 0 | 1 | 0 | 1 | Loop | Quasigroup with identity; associativity not assumed. |
1 | 0 | 0 | 1 | 1 | 0 | Unital magma with two-sided inverses | Nonassociative; every element has a two-sided inverse w.r.t. the unit. Not necessarily a loop. |
1 | 0 | 0 | 1 | 1 | 1 | Loop with two-sided inverses | Loop in which every element has a two-sided inverse. Weaker than an inverse-property loop, which also requires $x^{-1}\cdot(x\cdot y)=y=(y\cdot x)\cdot x^{-1}$. |
1 | 0 | 1 | 0 | 0 | 0 | Semigroup | Associative magma; no unit required. |
1 | 0 | 1 | 0 | 0 | 1 | Group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 0 | 1 | 0 | 1 | 0 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 0 | 1 | 0 | 1 | 1 | Group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 0 | 1 | 1 | 0 | 0 | Monoid | Standard monoid. No requirement that every element be invertible. |
1 | 0 | 1 | 1 | 0 | 1 | Group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 0 | 1 | 1 | 1 | 0 | Group | A2 + A3 + A4 form a group; A5 (if present) is redundant. |
1 | 0 | 1 | 1 | 1 | 1 | Group | A2 + A3 + A4 form a group; A5 (if present) is redundant. |
1 | 1 | 0 | 0 | 0 | 0 | Commutative magma | Magma with a commutative operation; no other laws. |
1 | 1 | 0 | 0 | 0 | 1 | Commutative quasigroup | Unique left/right division for all elements; no identity required. |
1 | 1 | 0 | 0 | 1 | 0 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 1 | 0 | 0 | 1 | 1 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 1 | 0 | 1 | 0 | 0 | Commutative unital magma | Magma with a two-sided identity; no associativity assumed. |
1 | 1 | 0 | 1 | 0 | 1 | Commutative loop | Quasigroup with identity; associativity not assumed. |
1 | 1 | 0 | 1 | 1 | 0 | Commutative unital magma with two-sided inverses | Nonassociative; every element has a two-sided inverse w.r.t. the unit. Not necessarily a loop. |
1 | 1 | 0 | 1 | 1 | 1 | Commutative loop with two-sided inverses | Commutative loop in which every element has a two-sided inverse. Weaker than an inverse-property loop, which also requires $x^{-1}\cdot(x\cdot y)=y$. |
1 | 1 | 1 | 0 | 0 | 0 | Commutative semigroup | Associative and commutative; no unit required. |
1 | 1 | 1 | 0 | 0 | 1 | Abelian group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 1 | 1 | 0 | 1 | 0 | — | Inconsistent in this schema: A4 (two-sided inverses) presupposes an identity (A3). |
1 | 1 | 1 | 0 | 1 | 1 | Abelian group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 1 | 1 | 1 | 0 | 0 | Commutative monoid | Standard monoid. No requirement that every element be invertible. |
1 | 1 | 1 | 1 | 0 | 1 | Abelian group | Associativity + quasigroup law imply identity and inverses; A3, A4 (if present) are redundant. |
1 | 1 | 1 | 1 | 1 | 0 | Abelian group | A2 + A3 + A4 form a group; A5 (if present) is redundant. |
1 | 1 | 1 | 1 | 1 | 1 | Abelian group | A2 + A3 + A4 form a group; A5 (if present) is redundant. |
This table is AI-generated, so please take it with a pinch of salt.
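For a finite set, individual rows of the table can be sanity-checked by brute force. The sketch below is one possible way to do that, not part of the original table's derivation: the function name `check_axioms` and the three example operations are my own choices, and A0 to A5 follow the definitions listed above.

```python
from itertools import product

def check_axioms(elements, op):
    """Brute-force test of axioms A0..A5 for a finite set and a binary operation."""
    S = list(elements)
    pairs = list(product(S, repeat=2))
    if not all(op(x, y) in S for x, y in pairs):
        # Without closure, the remaining axioms are moot.
        return {"A0": False}
    # Two-sided identities (there is at most one).
    identities = [e for e in S if all(op(e, x) == x == op(x, e) for x in S)]
    e = identities[0] if identities else None
    return {
        "A0": True,
        "A1": all(op(x, y) == op(y, x) for x, y in pairs),
        "A2": all(op(op(x, y), z) == op(x, op(y, z))
                  for x, y, z in product(S, repeat=3)),
        "A3": e is not None,
        "A4": e is not None and all(
            any(op(x, y) == e == op(y, x) for y in S) for x in S),
        # Quasigroup law: a.x = b and y.a = b each have exactly one solution.
        "A5": all(
            sum(op(a, x) == b for x in S) == 1
            and sum(op(y, a) == b for y in S) == 1
            for a, b in pairs),
    }

add_mod4 = check_axioms(range(4), lambda x, y: (x + y) % 4)  # abelian group
mul_mod4 = check_axioms(range(4), lambda x, y: (x * y) % 4)  # commutative monoid
sub_mod3 = check_axioms(range(3), lambda x, y: (x - y) % 3)  # quasigroup only

for name, result in [("+ mod 4", add_mod4), ("* mod 4", mul_mod4), ("- mod 3", sub_mod3)]:
    print(name, result)
```

Subtraction modulo 3 is a handy example of the plain quasigroup row: division is always unique, but the operation is neither associative nor commutative and has no two-sided identity.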
This reasoning can be extended to include more axioms. For example, axioms for two binary operators $+$ and $\times$:
- A6 (left distributivity)
- A7 (right distributivity)
- A8 (distributivity)
These typically mean $\times$ (the multiplication type operator) distributes over $+$ (the addition type operator).
Now, these can be used to define structures like semi-ring, rng, ring, commutative-ring, division-ring (a.k.a. skew-field), commutative-division-ring (a.k.a. field), semi-field, etc.
Fun facts:
- Each can be discrete or continuous.
- Groups: represent symmetry
- Monoids: equivalent to one-object categories in category theory
- Monoids: can represent abstract data types in computer science
- The natural numbers (as built from the Peano axioms) form a semi-ring, the integers form a ring, and the rationals form a field.
RuliadTrotter
- The core of my RuliadTrotter project in the Wolfram Summer School 2022 is that an observer in the Ruliad is modeled as the set of axioms it adheres to. That defines the capability of the observer to parse deductively through proof space based on these axioms (like an Automated Theorem Prover).
RuliadDistiller
- RuliadDistiller is the upgrade of the RuliadTrotter project, where the primary mode is abductive inference instead of deductive inference. The agent/observer has access to a set of observations. It may assign axioms to these observations pertaining to the environment based on (approximately) validating which axioms maintain closure over the set of observations, and can also lead to planning further active experiments or counterfactuals. As an example, quantum measurements can be described by a model pertaining to Lie Groups.
- The core philosophical difference is that here we take the set of observations as the core epistemic truth. We reject the notion that the environment is generated a priori via a computational process adhering to a structure (a.k.a., the physical Church-Turing thesis). Any structure the agent infers abductively can be equally attributed to one or more of 3 reasons: (a) the environment inherently has the structure, (b) the perceived structures are due to limits of the measuring device with which the agent records observations into the set, or to limitations on the active controlled experiments the agent can perform on the environment, and/or (c) the abduced structures are due to an approximate processing/compressing of the set of observations under resource bounds to create an effective theory. This shift in many ways circumvents the ‘utility’ of Gödel’s incompleteness theorems (note: I am not debating their validity; they are true, full stop). Given a set of observations, as long as a list of abduced axioms (e.g., via reverse mathematics) aids in compressing the data, leading to computational efficiency, the question arises whether we really want the system to be complete or self-consistent. What is the scope of our generalization of these axioms to other theorems on the dataset or data obtained in the future?
Abstract Algebra for Programmers
Abstract algebraic structures show up in programming more often than people realize. Let’s have a look at monoids as an example. Recall, a monoid is a set with closure over an associative binary operation and the existence of an identity element.
Let’s use the example of strings to get the idea across. Here, the concatenation operation is associative, and the identity is the empty string `""`. As a programmer, when you aggregate data, associativity guarantees the result doesn’t depend on how the operations are grouped. That means you can parallelize or chunk computations safely, as long as the order of the elements is preserved (free reordering additionally needs commutativity). For example, MapReduce and Spark rely on the reduce step using a monoidal operation to merge partial results. Similar to strings, many data structures are monoids, so you can abstract over the two ingredients, `combine` and `empty`, instead of reinventing the wheel. For example, in Haskell/Scala/Rust you can write generic code that works for any monoid, e.g., numbers, strings, lists, trees, etc. If you don’t treat something as a monoid, well, then every time you encounter that pattern (like joining strings, summing numbers, merging logs), you implement and reason about it from scratch, case by case: you hard-code loops or recursive functions; convince yourself again whether it is safe to parallelize, what the neutral element is, and what happens on empty input; and lack a general API that works across domains.
Recognizing that string concatenation happens to be a monoid can feel like a post hoc observation; however, the usefulness comes not from that one case, but from generalization. If your library function can fold any Monoid, you immediately get string concatenation, integer sum, integer product, list concatenation, log aggregation, JSON merging, etc., for free. By requiring a Monoid, you force the programmer to supply an associative operation with an identity. That guarantees parallelizability and safety on empty inputs. You don’t need to manually check those again. Relying on the monoid contract, if you declare your combine function associative with an identity, the engine can shard and parallelize without changing results. Think of it this way: you could drive screws with a knife, and only recognize afterward that it was working like a screwdriver. Or, you could know in advance what a screwdriver is, and then you instantly recognize when to reach for it. Monoids (or other algebraic structures, for that matter) are that kind of tool in programming: once you know the concept, you can spot it early and reach for the generic abstractions.
A naïve/ad-hoc version (not treating it as a monoid) for joining strings in Python looks like:
```python
strings = ["hello", " ", "world", "!"]

# Ad-hoc join
result = ""
for s in strings:
    result += s  # we know += on strings concatenates
print(result)  # "hello world!"
```
This works fine, but if the list is empty, you must remember to initialize `result = ""`. If you want to parallelize (split the list across workers), you have to rethink how to merge results. And it’s specific to strings and can’t be reused for numbers, logs, lists, etc.
A generic and parallelizable monoid-reduce would look somewhat like:
```python
from functools import reduce
from operator import add

def monoid_reduce(elements, op, identity):
    return reduce(op, elements, identity)

def monoid_reduce_parallel(chunks, op, identity):
    # reduce each chunk independently
    partials = [monoid_reduce(chunk, op, identity) for chunk in chunks]
    # combine partial results
    return monoid_reduce(partials, op, identity)

strings = ["hello", " ", "world", "!"]
# Split into chunks, as if across workers
chunks = [["hello", " "], ["world", "!"]]
result = monoid_reduce_parallel(chunks, add, "")
print(result)  # "hello world!"
```
Now this works for any monoid, for example, `(int, +, 0)` → sum of numbers, `(int, *, 1)` → product of numbers, `(list, +, [])` → flatten lists, `(str, +, "")` → join strings.
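To make the contract concrete, here is a small self-contained sketch that folds the same chunked data with several monoids, and then with subtraction, which is not associative. The helper names `fold` and `fold_chunked` are illustrative stand-ins for the reducer above, and the chunks are processed sequentially here rather than on real workers.

```python
from functools import reduce
from operator import add, mul, sub

def fold(elements, op, identity):
    """Left fold starting from the identity element."""
    return reduce(op, elements, identity)

def fold_chunked(chunks, op, identity):
    """Fold each chunk on its own, then fold the partial results."""
    partials = [fold(chunk, op, identity) for chunk in chunks]
    return fold(partials, op, identity)

chunks = [[1, 2, 3], [4, 5]]
flat = [x for chunk in chunks for x in chunk]

# Monoids: the chunked and sequential folds agree.
print(fold_chunked(chunks, add, 0), fold(flat, add, 0))
print(fold_chunked(chunks, mul, 1), fold(flat, mul, 1))

list_chunks = [[[1], [2]], [[3]]]
print(fold_chunked(list_chunks, add, []))

# Subtraction is not associative, so chunking changes the answer.
print(fold_chunked(chunks, sub, 0), fold(flat, sub, 0))

# Empty input is safe: the fold simply returns the identity.
print(repr(fold([], add, "")))
```

The subtraction case is the useful one: nothing crashes, the numbers are just silently different, which is the kind of bug the monoid contract rules out up front.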
Here’s a cheat-sheet table of more algebraic structures and their common programming roles:
Structure | Axioms (short) | Programming Use Cases | Python Example |
---|---|---|---|
Monoid | Closure + Associativity + Identity | String concatenation, logging, MapReduce, parallel reductions | "".join(["a","b","c"]) or functools.reduce(operator.add, strs, "") |
Semigroup | Closure + Associativity | Segment trees, range queries, aggregation without identity | max([3,5,2]) → associative but no natural identity |
Group | Monoid + Inverses | Undo/redo, rollback, cryptography, modular arithmetic | (a + b) - b == a ; pow(a, -1, n) (mod inverse) |
Abelian Group | Group + Commutativity | Vector addition, CRDTs, distributed systems | (1,2)+(3,4) == (3,4)+(1,2) |
Ring | Abelian group under +, monoid under ×, distributivity | Arithmetic libs, symbolic math, modular arithmetic | int , fractions.Fraction , sympy.Poly |
Field | Ring + multiplicative inverses (≠0) | Machine learning, linear algebra, graphics | fractions.Fraction(1,3) ; numpy.float64 |
Lattice | Partially ordered set with meet ∧ and join ∨ | Type systems, compiler dataflow, access control | {1,2} \| {2,3} == {1,2,3} (join), {1,2} & {2,3} == {2} (meet) |
Boolean Algebra | Lattice + distributivity + complements | Bitsets, DB queries, circuit logic | a & b , a \| b , ~a on integers/bitsets |
Vector Space / Module | Abelian group + scalar multiplication over field | Numpy arrays, ML, graphics, physics | numpy.array([1,2])+numpy.array([3,4]) ; 3*np.array([1,2]) |
Semiring | Two monoids (+, ×), distributivity, no negatives | Dynamic programming, graph algorithms (shortest paths, path counts) | Shortest path: (min, +) semiring with math.inf as identity |
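The semiring row can be illustrated with a single matrix product that is generic over the two operations. This is a minimal sketch on a made-up three-node graph: with `(min, +)` the product relaxes path lengths, and with `(+, ×)` the same code counts walks. The function name `semiring_matmul` and the example matrices are assumptions for illustration.

```python
import math

def semiring_matmul(A, B, plus, times, zero):
    """Matrix product where + and x are replaced by the semiring operations."""
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = zero
            for k in range(n):
                acc = plus(acc, times(A[i][k], B[k][j]))
            C[i][j] = acc
    return C

INF = math.inf

# Edge weights: 0 -> 1 costs 4, 1 -> 2 costs 1, 0 -> 2 costs 7.
# INF marks "no edge"; the diagonal is 0 (staying put is free).
weights = [[0, 4, 7],
           [INF, 0, 1],
           [INF, INF, 0]]

# (min, +) semiring: squaring gives the cheapest route using at most 2 edges.
shortest = semiring_matmul(weights, weights, min, lambda a, b: a + b, INF)
print(shortest[0][2])

# (+, x) semiring on the 0/1 adjacency matrix: squaring counts 2-edge walks.
adjacency = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]
two_step = semiring_matmul(adjacency, adjacency,
                           lambda a, b: a + b, lambda a, b: a * b, 0)
print(two_step[0][2])
```

Only the pair of operations and the additive identity change between the two uses; the loop structure is shared.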
Fascination with Fractals
(Oct 22, 2019)
Why are fractals so much more ubiquitous in Nature than Euclidean geometry? What property of fractals makes them so favourable for these blueprints? I like to approach this from 2 different angles.
God is a lazy programmer. Imagine you have to render the graphics of fire or clouds with triangles or ovals! Hell of a task, right? Indeed, a few iterations of a simple yet elegant fractal equation can generate these in your game world. It is not so difficult to drive home the point that fractals are the generator equations of the world we see around us, so fractal equations can easily generate models of them: low algorithmic complexity, lazy programmer. But that’s a bit of ouroboros logic. The real question is, why do we see fractal generator equations in the blueprints of the Universe? Why can’t clouds just be ovals or fires just be triangles, like in the computer games of the early 1980s?
This has to do with compression. Fractals sit at the edge of chaos, where a system transitions from a periodic attractor to chaotic randomness. This goes hand in hand with class 4 Wolfram automata, some of which are known to be universal computers: they have enough expressive power to program everything in a unified structure, yet their rules are simple enough not to get lost in chaos. Fractals are also great data compressors that can be prioritized with respect to the iteration level, working much like a Discrete Wavelet Transform: the large-amplitude, low-frequency terms are captured in the lower iterations, whereas the finer details are captured in the higher iterations. This allows viewing the final product at different levels of approximation without losing the big picture, and interpreting the general law behind it. Thus, there is a very subtle difference between a fractal of 100 iterations (say a Koch curve) and a fractal of 100 iterations with a small variation allowed at each level (say the coastline of Britain). In the latter, an enormous amount of information can be encoded at different levels of approximation. A little child can build an encoded message with pebbles on a particular beach without changing the overall fractal dimension much.
So fractals, in a way, allow us to start with a vague design and then periodically tweak it with small modifications to reach the design of interest. The question remains: is that how the Universal laws emerged? Chunks of smaller and smaller sized phenomena adding higher order refinements to the evolution of the universe.
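The "lazy programmer" point about low algorithmic complexity can be made concrete with the Koch curve mentioned above. The following is a minimal sketch, using complex numbers as 2D points; the function names are my own. Each iteration replaces every segment with four segments one third as long, so a dozen lines of code produce arbitrarily fine detail.

```python
import math

def koch_step(points):
    """Replace every segment by the four-segment Koch motif."""
    rotate_60 = complex(0.5, math.sqrt(3) / 2)  # unit rotation by 60 degrees
    new_points = [points[0]]
    for a, b in zip(points, points[1:]):
        d = (b - a) / 3
        p1 = a + d                 # one third of the way along
        p2 = p1 + d * rotate_60    # tip of the bump
        p3 = a + 2 * d             # two thirds of the way along
        new_points.extend([p1, p2, p3, b])
    return new_points

def koch_curve(iterations):
    """Points of the Koch curve built on the unit segment."""
    points = [0 + 0j, 1 + 0j]
    for _ in range(iterations):
        points = koch_step(points)
    return points

def curve_length(points):
    return sum(abs(b - a) for a, b in zip(points, points[1:]))

for n in range(5):
    pts = koch_curve(n)
    print(n, len(pts) - 1, round(curve_length(pts), 4))

# Similarity dimension: 4 copies, each scaled by 1/3.
print(math.log(4) / math.log(3))
```

The segment count grows as $4^n$ and the length as $(4/3)^n$, while the generating rule never changes; adding a small perturbation inside `koch_step` at each level would be one way to play with the "coastline" variant described above.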
GUT from It
(Mar 27, 2019)
Recently I was reading this article on constructing space-time from computation which opened the flood gates of correlating theories in my head.
Before I describe my proposition, let’s list down the ingredients:
- Plancherel’s theorem which states the integral of a function’s squared modulus is equal to the integral of the squared modulus of its frequency spectrum.
- Kolmogorov/Algorithmic complexity
- String length
- Landauer’s principle
- Fourier transform
- It from Bit
- Thermodynamics
- Measurement in Quantum Mechanics
The equation: \[\biggl\Vert \int_0^{t_u} |f(x)|^2\,dx - \int_{-\infty}^{\infty} |\hat{f}(\xi)|^2\,d\xi \biggr\Vert \equiv \bigl(\operatorname{len}(S) - K_\mathcal{U}(S|X)\bigr)\,kT\ln 2\]
The interpretation:
Let $f(x)$ be the state of the Universe encoded as a bit string. The absolute difference between the integral of the function’s squared modulus and the integral of the squared modulus of its frequency spectrum gives us the amount of new information generated by the Universe in the time duration of the integral of the function, i.e. $[0,t_u]$. This is equivalent to the work value of the bit string, given by the fuel value of the string scaled by the Boltzmann constant and the temperature, following Landauer’s principle in reverse. The fuel value is the difference between the length of the string and the conditional Kolmogorov complexity of the bit string, given its Fourier transform. This transform represents the derivable physical laws given the bit pattern of the Universe.
(A)dvaita
(Jul 23, 2019)
When you are into the topic of emergence, you can’t help but wonder about the phase transitions where different laws take over at different scales. To quote Douglas R. Hofstadter (from the book I Am a Strange Loop), “thinkodynamics is explained by statistical mentalics”: sometimes knowing everything about the individual components of a system (e.g. neurons) tells us very little about how the components behave as a whole (e.g. consciousness). It is not sorcery that the usual scientific method of reductionism does not work here. It is simply that many laws of the overall system are embedded in the interaction behaviour of the components, rather than in the components themselves. In physics, we call this coupling. In quantum computing, perhaps, a similar notion is entanglement. Following the ideas of Juan M. Maldacena (in his ER = EPR paper with Leonard Susskind), in classical mechanics, they are wormholes.
A question that keeps popping up is: are gravity (general relativity) and quantum mechanics one and the same - two different ways (even mutually conflicting at times) of interpreting the same thing? They work extremely well in their own niche scales - GR for galactic scales, QM for atomic scales. The obvious problems arise when both apply at once, with mass concentrated in a small space, as in the early Universe or black holes. One way of approaching this problem is called the Holographic Principle, where two very different interpretations, a bulk theory in n dimensions and a boundary theory in (n-1) dimensions, describe a single reality.
However, grand unified theory (GUT) and consciousness are not the only places where scientists have trouble going from two views of reality to one. It is very much a problem within the basic postulates of quantum mechanics itself, where normally a closed system evolves unitarily (which is invertible, deterministic and continuous), while any interaction with an observer (nothing to do with consciousness) results in a measurement (which is irreversible, probabilistic and instantaneous).
What is more interesting as a computer scientist is to wonder: is this duality true for computability and complexity as well? For complexity, Shannon and Kolmogorov metrics converge asymptotically for true randomness. For computability, what is the difference between the state machine and the tape in the Turing Machine? For languages, what is the difference between syntax and semantics? Why is the explainability of a neural network inversely proportional to its computational expressibility - is that Gödel’s incompleteness theorem in action?
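The contrast between the two kinds of evolution can be sketched for a single qubit. This is a toy illustration under simple assumptions (real amplitudes, a Hadamard gate as the unitary, measurement in the computational basis); the function names are made up for the example.

```python
import math
import random

# Hadamard gate: a unitary that is its own inverse.
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def apply_gate(gate, state):
    """Deterministic, invertible step: multiply the state by a 2x2 unitary."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def measure(state, rng):
    """Probabilistic, irreversible step: sample an outcome, then collapse."""
    p0 = abs(state[0]) ** 2
    outcome = 0 if rng.random() < p0 else 1
    collapsed = [0.0, 0.0]
    collapsed[outcome] = 1.0
    return outcome, collapsed

if __name__ == "__main__":
    zero = [1.0, 0.0]
    plus = apply_gate(HADAMARD, zero)
    print("undo unitary:", apply_gate(HADAMARD, plus))  # back to |0>, up to rounding
    print("measurement:", measure(plus, random.Random()))
```

Applying the gate twice recovers the starting state, whereas after a measurement the original amplitudes cannot be reconstructed from the collapsed state alone.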
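The relation between the Shannon and Kolmogorov views can be probed with a small experiment. As a hedged sketch: the order-0 byte histogram gives an empirical Shannon entropy, and zlib’s compressed size is used as a rough, computable stand-in for Kolmogorov complexity (an assumption, since the true quantity is uncomputable). The function names are illustrative.

```python
import math
import random
import zlib
from collections import Counter

def shannon_bits_per_byte(data: bytes) -> float:
    """Empirical order-0 Shannon entropy of the byte histogram."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def compressed_bits_per_byte(data: bytes) -> float:
    """zlib-compressed size per input byte: a crude proxy for K(S)/len(S)."""
    return 8 * len(zlib.compress(data, 9)) / len(data)

if __name__ == "__main__":
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(16384))
    periodic = bytes(range(256)) * 64  # uniform histogram, obvious pattern
    for name, s in (("noise", noise), ("periodic", periodic)):
        print(name, shannon_bits_per_byte(s), compressed_bits_per_byte(s))
```

For the pseudo-random string both measures sit near 8 bits per byte. For the periodic string the histogram is perfectly uniform, so the order-0 entropy is still 8 bits per byte, yet the compressor finds the repetition and the proxy drops far below that. The two views agree on randomness and part ways on structure.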
Are there more such dualities?
- the idea and the meta
- the syntax and semantics
- the body and the soul
- the particle and the wave
- the observer and the object
- the theorems and the axioms
- the first and the zeroth
- the natural and the supernatural
- the known and the unknown
- the knowable and the unknowable
- the statistics and the probability
- the output and the program
- the program and the compiler
- the tape and the state machine
- the system and the environment
- the continuum and the quanta
- the memory and the processor
- the cardinals and the ordinals
- the nodes and the network
- the position and the momentum
- the energy and the duration
- the entanglement and the coherence
- value of a field and its change at a certain position
- spin on 2 different axes
Does generalization take you only as far as identifying 2 fundamental ideas working in a symphony? We can either call it a single coin, or we can call them two opposite faces, or acknowledge only the face facing us, or the entire set of possibilities while they/it are/is spinning.
The Grand (Un-)unified Theory
(Jul 23, 2019)
While theoretical physicists are lamenting over the differences and compatibility of two of the most fundamental physical laws, a more bird’s-eye view of the landscape of the universal design reveals some very important structures that are so deeply embedded around us that we need to ask: why? Here I ponder over some of those structures that I find particularly interesting.
- Gödel’s Incompleteness Theorems
- Kolmogorov Complexity
- Quines
- Fractals
- Chaos
- Shannon Entropy
- Holographic Universe
- Quantum Entanglement
- Golden Mean
- Neural Network
- DNA
- Thermodynamics
- Standard Model
- Brainwaves
- Planck Units
- Cellular Automata
- Church-Turing Thesis
AGI Chatbots
(Apr 8, 2019)
Who doesn’t want an AI like Jarvis!
Let’s have a look at some of the equally or more powerful/conscious AIs in fiction:
- Transcendence (2014) - My favourite when it comes to a superpowerful pervasive AI
- I, Robot (2004) - Sonny and VIKI, AIs with a world plan built on the 3 laws of Isaac Asimov
- Westworld - TV series and movie where AIs evolve consciousness
- Avengers: Age of Ultron (2015) - Don’t forget the evolving Ultron
- Anukul (2017) - Similar concept to Sonny, based on Satyajit Ray’s work
- Ex Machina (2015) - Reimagining the Turing test
- Her (2013) - Emotional OS
and others like Tron, The Matrix, WALL-E, Interstellar (TARS), Star Wars (R2-D2), etc.
Ok, now to the 2 aspects that are required to make these types of AI:
- Interface - humanoid, voice commands, etc.
- Intelligence - self-evolving, meta-learning, etc.
Interface
The technological know-how of the state-of-the-art research in artificial intelligence is very close to what we might need. The knowledge is scattered in various artefacts - but the ingredients exist:
- Siri/Alexa/Cortana/Google Assistant/Bixby - basically the ability to crawl the internet for facts, and having a voice command interface
- Replika - the homely conversation you might want, a chit-chat bot, now also with a voice calling feature
- Sophia/Harmony - the physical appearance you might want it to have
- Boston Dynamics robots - for that extra dose of mechanical movements
Clubbing these into a single entity would make a great interface!
Intelligence
Now, to the brain.
Most of today’s AI focuses on what’s called Narrow AI: specialized training for specific tasks, e.g. Deep Blue’s chess, IBM Watson’s Jeopardy!, OpenAI’s Dota 2, AlphaGo, etc. However, most of these require huge computing hardware for their marvels.
Some of the early pioneers of AI (Turing, McCarthy, Minsky, Solomonoff) had a vision of an Artificial General Intelligence (AGI). It wasn’t possible on the hardware of that era (perhaps not possible even today). But evolving a program was quite possible in some of these early languages, like LISP (Scheme). The framework existed.
With the advent of research on Artificial Neural Networks (ANN), we now have a better understanding of ‘learning’ complex associations. Yet most ANNs are trained on a fixed topology with a specific dataset. This brings us to the current focus on neural plasticity - the ability to expand the learning capabilities to other domains - like transfer learning, active learning, lifelong learning, etc., based on what’s called Topology and Weight Evolving Artificial Neural Networks (TWEANN). Uber AI is doing some fascinating work on this topic.
Also, recent research on Spiking Neural Networks, and memristor-based Neuromorphic accelerators brings us closer to biological realism for ANNs.
On the other hand, there are rigorous mathematical models of AGI, by Jürgen Schmidhuber and Marcus Hutter, called Gödel Machines and AIXI, respectively. Implementing these self-improving systems is highly non-trivial.
I believe these are what is required for the ‘brain’ part.
Yes, we can make Jarvis as humanity!
We, however, don’t have a single Tony Stark!
P.S. - Making Iron Man is way easier with Jet Packs :P
Let’s actually make one
Links:
- Platform: Mycroft AI https://github.com/MycroftAI
- Skills:
  - Wikipedia and Wolfram Alpha https://medium.com/@salisuwy/build-an-ai-assistant-with-wolfram-alpha-and-wikipedia-in-python-d9bc8ac838fe
  - Replika Cake-Chat https://github.com/TREE-Ind/skill-fallback-cakechat
  - Desktop control https://github.com/TREE-Ind/desktop-control
Mind reading
(Aug 7, 2023) (updated Jun 2025)
Can we read the mind? Most probably soon.
Here are a few articles (which I hope to review when I find time).
- Semantic reconstruction of continuous language from non-invasive brain recordings
- Is my “red” your “red”?: Evaluating structural correspondences between color similarity judgments using unsupervised alignment
- A neuroimaging dataset during sequential color qualia similarity judgments with and without reports
- A Mathematical Perspective on Neurophenomenology
- Qualia Research Institute
(coming soon)
- AIXI and Gödel Machines
- Pull the pin puzzle
- The Next Generation Of Artificial Intelligence
- AGI vs Narrow AI
- Artificial General Intelligence: Concept, State of the Art, and Future Prospects