You can explore the related code snippets here on tech.io.
Deep Learning has introduced methods capable of extracting latent variables from an amazing variety of inputs. Raw facial image data can be quickly converted into features such as emotional expression, face orientation, or even suspect identity. Deep Learning has similarly proven capable in motor control and near-term action planning. With these technologies, and an abundance of training, we are discernibly closer to Sci-Fi level Artificial Intelligence. However, there remains a large gap between the input and output applications. Here we propose a Natural Language model capable of interfacing between the two in the most general sense. The basic outline is that natural language becomes the output of recognizers and the input of planners. The model must also take into account when each of the available models is appropriate to use.
To start, let’s imagine an app that maps the emotions on a user’s face, captured through a webcam, onto a generic face. This app could be written as a simple mapping of happy/sad represented as a single number: positive is happy, negative is sad. This is a bit restrictive because emotions are more complex. Specifically, there are considered to be four base emotional states that can be expressed through facial expression: happiness, sadness, anger, and fear.
To expand our range of emotions we could instead create a structure of floating point numbers, one for each emotion. For each emotion, a positive value would represent positive confidence, zero would represent no confidence, and a negative value would represent negative confidence.
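As a rough illustration, such a structure might look like the sketch below. The class name `EmotionConfidence`, the `strongest` helper, and the [-1.0, 1.0] value range are assumptions made for this example, not part of the original app.

```python
from dataclasses import dataclass


@dataclass
class EmotionConfidence:
    """One signed confidence value per base emotion.

    Assumption for this sketch: values lie in [-1.0, 1.0], where
    positive means "confident it is present", zero means "no
    confidence either way", and negative means negative confidence.
    """
    happiness: float = 0.0
    sadness: float = 0.0
    anger: float = 0.0
    fear: float = 0.0

    def strongest(self) -> str:
        """Return the name of the emotion with the highest confidence."""
        values = vars(self)
        return max(values, key=values.get)


# A hypothetical reading from a webcam frame.
reading = EmotionConfidence(happiness=0.8, sadness=-0.6, anger=-0.2, fear=0.0)
print(reading.strongest())
```

Defaulting every field to zero means a recognizer that says nothing about an emotion leaves it at "no confidence" rather than forcing a guess.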
This model is still lacking if we want to include other information such as face orientation. To encode it we would need another structure describing the orientation of the face. To maximize generality we should also consider the case where this information is missing or only partial. At this point we need to consider a more formal grammar. Eliding the specifics of such a grammar, let’s just continue with the assumption that it exists (any human language would suffice to encode this information, for example).
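Before reaching for a full grammar, the "missing or partial" case can be sketched with optional fields. Everything here is hypothetical: the names `FaceOrientation` and `FaceObservation`, the yaw/pitch/roll axes, and the choice of degrees are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceOrientation:
    """Head angles in degrees; None means the axis was not reported."""
    yaw: Optional[float] = None
    pitch: Optional[float] = None
    roll: Optional[float] = None

    def known_axes(self) -> list:
        """List the axes for which a value is actually available."""
        return [name for name, value in vars(self).items() if value is not None]


@dataclass
class FaceObservation:
    """Everything a recognizer reported about one face; any part may be absent."""
    emotions: Optional[dict] = None              # e.g. {"happiness": 0.8}
    orientation: Optional[FaceOrientation] = None


# A partial observation: one emotion and only the yaw angle are known.
partial = FaceObservation(
    emotions={"happiness": 0.8},
    orientation=FaceOrientation(yaw=15.0),
)
print(partial.orientation.known_axes())
```

Each new kind of information needs another hand-written structure like this, which is the pressure that pushes the model toward a more general grammar.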
The final step would be to implement inference, action, and planning. To achieve this we should reflect on what we have so far. The current model is basically just features with confidence values. To support inference and planning we need two more values associated with each feature: justification direction and justification distance.
These last two values are hard to understand without hands-on examples. The simpler of the two to explain is “justification distance”. The process of creating a feature confidence value can be either long or short. It is generally accepted in philosophy, math, and other disciplines that long proofs are more likely to be wrong, unless they were designed to be elaborate. For this reason it helps to mark confidence values with a length ranging from very short (atomic) to very long (astronomic). At first glance it might seem reasonable to just adjust the confidence values to account for this phenomenon. However, truth and justification length often vary independently, so encoding them together would cause important information to be lost.
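A small sketch can make the information loss concrete. The `Judgement` type, the use of a step count as the distance, and the per-step `decay` discount are all assumptions chosen for this example; the post does not prescribe any particular way of folding distance into confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    confidence: float  # signed confidence, assumed to lie in [-1.0, 1.0]
    distance: int      # hypothetical unit: inference steps, 1 = atomic


def collapsed(j: Judgement, decay: float = 0.5) -> float:
    """Fold distance into confidence by discounting each extra step.

    The discount rule is an assumption made only to show the problem.
    """
    return j.confidence * decay ** (j.distance - 1)


direct_but_weak = Judgement(confidence=0.25, distance=1)
strong_but_long = Judgement(confidence=1.0, distance=3)

# Both collapse to the same single number, although they describe
# very different situations.
print(collapsed(direct_but_weak), collapsed(strong_but_long))
print(direct_but_weak == strong_but_long)
```

Once the two values are merged, a planner can no longer tell a weak direct observation from a strong conclusion at the end of a long chain, which is why the two are kept as separate fields.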
The final feature-associated value is justification direction. This is simply the semantic direction of inference. If the feature should be true due to inference, then the direction is forward (from assumptions to conclusion). If the feature is observed to be true, then the direction is backward (from conclusion to assumptions).
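Putting the pieces together, a feature record might carry all three values. This is a minimal sketch under stated assumptions: the names `Feature`, `Direction`, and `infer` are hypothetical, and the rule that an inferred feature sits one step beyond its longest premise is an illustrative choice, not something the model specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"    # inferred: from assumptions to conclusion
    BACKWARD = "backward"  # observed: from conclusion to assumptions


@dataclass
class Feature:
    name: str
    confidence: float     # signed confidence value
    distance: int         # justification distance, 1 = atomic
    direction: Direction  # justification direction


def infer(name: str, premises: list, confidence: float) -> Feature:
    """Derive a feature from premises.

    Assumed rule: the result is one step longer than its longest premise.
    """
    distance = 1 + max(p.distance for p in premises)
    return Feature(name, confidence, distance, Direction.FORWARD)


# An observed feature coming straight from a recognizer.
smile = Feature("smiling", 0.9, 1, Direction.BACKWARD)

# A feature inferred from that observation.
happy = infer("happiness", [smile], 0.7)
print(happy.distance, happy.direction.value)
```

Here the observed smile is marked backward and atomic, while the inferred happiness is marked forward and one step further from the raw input.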
That is all that this model needs. I’ll be working on integrating these features into my OpenAI bots, so watch for updates. These techniques matter most for unsupervised learning, which is also the most common task available, so there is plenty to explore.
This post was originally published on medium.com