Machine Interpretation

Everyone has heard about “machine learning”, and companies like Facebook, Google, IBM, and Microsoft are all using it in their flagship products (and making fortunes from it).

Machine learning (a subfield of AI) is developing rapidly and is increasingly capable of tasks that previously only humans could do (such as classifying images and recognizing speech), performing them as well as or better than humans, and at vastly greater speeds.

There are some big opportunities and challenges ahead for machine learning and AI, and these challenges will have to be met in order to create true human-level AGI. These challenges, and their relevance for the Susiddha AI project, are listed below.

Challenge 1: Get more labeled data

Labeled data is useful for developing the algorithms of machine learning, and bootstrapping AI systems. An example of labeled data is a collection of images along with captions; another example is recorded speech along with transcripts. The recent big successes of “deep learning” are based on large quantities of labeled data, and such learning is known as “supervised learning” (where the labels provide the supervision).

For the Vedic literature, more labeled data is needed. Currently, there are small projects that add grammatical information (such as part-of-speech tags) to Sanskrit texts.^[1] Efforts such as these will need to be broadened and supported.
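
To make the idea of supervision by labels concrete, here is a minimal sketch of the simplest possible supervised learner: a tagger that memorizes the most frequent tag for each word it has seen. The transliterated words, the tag names, and the function names `train` and `tag` are all hypothetical choices for illustration; they are not taken from the annotation project cited above.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set of (word, part-of-speech tag) pairs.
# Illustrative only; not drawn from any real annotated corpus.
LABELED = [
    ("ramah", "NOUN"), ("vanam", "NOUN"), ("gacchati", "VERB"),
    ("sita", "NOUN"), ("vanam", "NOUN"), ("pashyati", "VERB"),
]

def train(pairs):
    """Learn, for each word, the tag it carries most often in the labeled data."""
    counts = defaultdict(Counter)
    for word, label in pairs:
        counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(model, words, default="UNK"):
    """Tag each word with its learned label, or `default` if it was never seen."""
    return [(w, model.get(w, default)) for w in words]

model = train(LABELED)
tagged = tag(model, ["sita", "gacchati", "phalam"])
print(tagged)
```

The unseen word falls back to `UNK`, which is exactly the gap that more labeled data would close.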

Challenge 2: Learn from unlabeled data

There is vastly more unlabeled data than labeled data in the world. Almost all of the text and video on the world wide web, and all of the world’s books, are unlabeled data. Learning from such data is known as “unsupervised learning”, and many projects and companies are working on it, especially through unsupervised deep learning. One often-cited example is Google Brain, a computer system that learned to recognize cats from millions of images taken from YouTube videos.^[2]

For the Vedic literature, unlabeled Sanskrit audio and text can be processed through unsupervised techniques (as they improve and begin to approach human-level recognition).
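
As a rough illustration of learning without labels, the sketch below runs k-means clustering, a classic unsupervised technique (far simpler than the deep networks mentioned above), on a handful of hypothetical one-dimensional measurements. The data values, the function name `kmeans_1d`, and the naive choice of the first k points as starting centers are assumptions made for this example.

```python
def kmeans_1d(points, k, iterations=10):
    """Group 1-D points into k clusters without using any labels."""
    centers = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return centers, labels

# Hypothetical measurements that happen to fall into two groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, labels = kmeans_1d(data, k=2)
print(centers, labels)
```

No one told the algorithm that there were two groups near 1 and near 5; it found that structure from the data alone.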

Challenge 3: Work with small amounts of data

Humans can learn amazing things from very little data. In computer science, the area of probabilistic and Bayesian programming is working towards meeting a similar challenge in machines. Companies such as Geometric Intelligence are already demonstrating machine learning software that requires far less training data than other approaches.^[3]

These probabilistic techniques will of course be useful for “learning” the Vedic literature, since it is in the realm of “small data” (from which much can be learned).
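
One way to see how Bayesian methods cope with small data is the Beta-Binomial update, sketched below. It is a textbook example, not the method used by any company named above. The scenario (estimating how often some event occurs from only four observations), the uniform Beta(1, 1) prior, and the function names are assumptions for illustration.

```python
def beta_update(alpha, beta, successes, failures):
    """Combine a Beta(alpha, beta) prior with observed counts to get the posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Assumed uniform prior: every rate between 0 and 1 is equally plausible.
prior = (1.0, 1.0)

# Only four observations: the event occurred 3 times and failed to occur once.
posterior = beta_update(*prior, successes=3, failures=1)
estimate = beta_mean(*posterior)
print(posterior, round(estimate, 3))
```

The raw frequency would be 3/4 = 0.75, while the posterior mean is 4/6, or about 0.667. The prior pulls the estimate towards the middle, which guards against overconfidence when the evidence is this thin.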

Challenge 4: Learn from scratch

This is a difficult challenge, but groups in the machine learning community are actively working on it. One possible result is the ability to learn such things as grammar and vocabulary just by reading the web. As a general rule, these efforts benefit from a lot of human help, both to correct errors in what the system learns and to bootstrap the effort from what humans already know. Another active area of research is “deep reinforcement learning”, such as that used in DeepMind’s system that learns to play video games (from just seeing the raw pixels).^[4]
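
The core idea of reinforcement learning, learning from reward alone with no prior knowledge of the task, can be sketched with tabular Q-learning. This is a much simpler relative of the deep Q-network in [4]: it uses a small table instead of a neural network and a five-cell corridor instead of video-game pixels. The corridor environment, the reward scheme, and all parameter values below are assumptions chosen for illustration.

```python
import random

N_STATES = 5       # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (0, 1)   # 0 = step left, 1 = step right

def step(state, action):
    """Hypothetical environment: move one cell; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from scratch by acting randomly and observing rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the agent starts knowing nothing
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Although the agent only ever acts at random during training, the values it learns favor stepping right in every cell, which is the shortest route to the goal.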

Challenge 5: Learn to learn

Researchers would like AI to be able to learn in the same way that humans do. Although most AGI systems do rely on a fair amount of hand-coded knowledge,^[5] it is hoped that such knowledge is necessary only for bootstrapping, and that AGI will eventually learn to learn for itself (through a variety of techniques, including the unsupervised learning mentioned above).

Challenge 6: Machine understanding

The computer’s inability to actually understand what it sees and hears is frequently noted in criticisms of AI, so understanding is an active area of research. For instance, researchers in natural language processing (NLP) are not content with processing alone; they are actively trying to bring the computer up to the level of “natural language understanding” (NLU). Recent work in “word vectors”^[6] and “thought vectors” is moving towards that goal, and is already being incorporated into machine translation and question-answering^[7] systems.
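
The intuition behind word vectors is that words used in similar contexts get similar vectors. The sketch below shows this with simple co-occurrence counts. It is a deliberately simplified stand-in for the neural method of [6], which learns dense vectors by training a network rather than by counting. The toy corpus and the function names are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real systems train on billions of words.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the king rules the land",
]

def context_vectors(sentences):
    """Represent each word by counts of the other words sharing its sentences."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = context_vectors(CORPUS)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_king = cosine(vectors["cat"], vectors["king"])
print(round(sim_cat_dog, 3), round(sim_cat_king, 3))
```

Here “cat” comes out closer to “dog” than to “king”, because cats and dogs both drink and chase things in this corpus. No definitions were supplied; the similarity emerges purely from usage.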

Challenge 7: Learn from sound

As mentioned in the chapter on sound (“shabda”), computational audition and deep learning (of voice and music) are both making progress in speech and audio recognition and comprehension. This will be particularly important for the learning of “shruti” by the Susiddha system.

Challenge 8: Machine interpretation

So far, there is little work on this challenge. But as computers solve all the above challenges, it’s only a matter of time before computers can be asked to interpret the literature they read. Some promising starts have been made, such as the application of AI to law and medicine.

Many Hindus might reject out of hand the possibility of allowing an AGI system to interpret the Vedic literature, but such interpretation may well be possible within twenty years. And given the bewildering variety of interpretations of Vedic literature that have been created in Kali Yuga, it is probable that humans are missing important points, both because they are not able to keep the entirety of the literature in mind and because they do not possess the same level of consciousness as the Rishis who cognized the literature.

Obviously we would not trust a computer to interpret the Vedic literature unless we felt sure that it had a higher level of consciousness than most humans do. Artificial consciousness was discussed previously, as was the scaling to “super-consciousness” (in superintelligence) whereby an SSI system could increase its consciousness to be much greater than a human’s, as much as a human’s consciousness is greater than that of a mouse.

Also previously discussed were measures of consciousness, and how we might recognize consciousness in creatures other than ourselves. If an AGI system scales to become superintelligent, and it appears to be conscious, then we will have to assume that it is indeed conscious, and more conscious than we are.

In the chapter on Shruti we will discuss further how AGI/SSI can learn to understand and interpret the Vedic literature via computational techniques that are now being developed, such as unsupervised deep learning.

Next we turn to a discussion of creating dharmic AGI that is safe and beneficial for humanity.

Notes and References

[1] Geeta: Gold Standard Annotated Data, Analysis and its Application, Amba Kulkarni, December 2013, https://www.researchgate.net/publication/259655789_Geeta_Gold_Standard_Annotated_Data_Analysis_and_its_Application
[2] A Massive Google Network Learns To Identify — Cats, National Public Radio, June 26, 2012, http://www.npr.org/2012/06/26/155792609/a-massive-google-network-learns-to-identify
[3] Algorithms That Learn with Less Data Could Expand AI’s Power, Tom Simonite, MIT Technology Review, May 24, 2016, https://www.technologyreview.com/s/601551/algorithms-that-learn-with-less-data-could-expand-ais-power/
[4] Human-level control through deep reinforcement learning, Volodymyr Mnih, et al. (Google DeepMind), Nature (vol. 518), Feb 26, 2015, pages 529-533, http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
[5] Such as databases of meanings and common-sense knowledge, including WordNet, Cyc, Knowledge Graph, Freebase, etc.
[6] Distributed Representations of Words and Phrases and their Compositionality, Tomas Mikolov, et al., Google, Inc., 2013
[7] A Neural Network for Factoid Question Answering over Paragraphs, Mohit Iyyer, et al., Empirical Methods in Natural Language Processing (EMNLP), 2014, https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf