In this article, I aim to provide an in-depth explanation of scope in JavaScript and of related concepts such as scope lookups, hoisting and scope chaining.
Before we begin, it is important to understand the technical foundation of the language, most notably its JIT-compiled nature. I have already written two articles on these topics, A technical introduction to JavaScript and JavaScript runtime and code lifecycle, which you can check out. Also note that we are going to take a bottom-up approach: first we will briefly walk through a sample code life-cycle, and then use it to understand the concepts. With that said, let's get started!
Because JavaScript is JIT-compiled, code goes through two steps before it finally runs: the Compilation Step (or Creation Phase), where the source code is turned into a non-executable intermediate form called byte-code, and the Interpretation Step (or Execution Phase), where the byte-code is converted to machine code and executed.
Below we will walk through these two steps using a sample example. Both are covered in more depth in the articles mentioned above.
JavaScript Compilation: Creation Phase
When JavaScript code is in its compilation step, the engine builds an Abstract Syntax Tree (AST) from tokens. While the AST is being created, it holds empty placeholder objects for identifiers (variables) instead of the values they contain. Below is an approximate AST diagram for a sample code:
In the above diagram, proxy represents these empty objects. As we will see, the proxy objects are later replaced with the identifier's memory reference.
Along with the AST, the compilation process also creates Environment Records, also referred to simply as environments. An Environment Record is a specification type, an object-like structure containing key-value pairs. Environment records hold information about identifier bindings, along with some other metadata. Most importantly, they contain each identifier's memory reference, which is later used to fetch the value stored there. Each record also contains an outer field pointing to another environment record. This extra information is used for creating scope chains, which we will explore later in the article.
There are many types of environment records. A Lexical Environment is an environment record that represents lexically written code. For our use case we are going to use the terms environment record and lexical environment interchangeably.
As we can see from the diagram above, two types of environments are created: Global and Local. A Global environment is always created when code execution starts. Local environments are created when the compiler encounters code-blocks in the code. Code-blocks are code snippets inside curly braces ({...}).
However, not every code-block encountered by the compiler creates a lexical environment. They are created only under certain conditions, for example when the code-block is the body of a function declaration, when it contains let or const variable declarations, or when it is associated with a catch construct.
To summarise, an environment record is created for a code-block only under certain conditions. Whenever such an environment is created, the identifier information for that block is stored in it.
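As a small illustration of these conditions, the sketch below probes which identifiers are still visible outside a code-block. The function probeBlocks and all variable names in it are made up for this example and are not part of the sample code used in this article.

```javascript
// Hypothetical helper, written only to illustrate the rules above.
function probeBlocks() {
  try {
    throw new Error("boom");
  } catch (err) {
    // `err` lives in the environment created for the catch block
    var fromCatch = err.message; // `var` is stored in the function's environment
  }

  {
    let blockOnly = 1; // `let` gives this block its own environment
    var fromBlock = blockOnly + 1; // `var` again ends up in the function's environment
  }

  // typeof is used so that probing a missing name does not throw
  return {
    fromCatch: fromCatch,
    fromBlock: fromBlock,
    errVisible: typeof err !== "undefined",
    blockOnlyVisible: typeof blockOnly !== "undefined",
  };
}

console.log(probeBlocks());
// { fromCatch: 'boom', fromBlock: 2, errVisible: false, blockOnlyVisible: false }
```

The two var identifiers are readable after their blocks end, while err and blockOnly are not, which matches the idea that only the catch block and the let block received environments of their own.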
Environment Record is a specification type, meaning it represents an abstract concept in the ECMA-262 specification. It is hard to depict accurately, because vendors are free to implement it however they like as long as the observable behaviour remains unchanged. For more information on environment records, refer to this document of the specification.
Once the compiler has finished generating the AST and all the relevant environment records for the code, the next step in the compilation process is to generate byte-code using the information present in the environment records. I refer to this step as parsing here, although in general compiler terminology parsing usually names the earlier step that builds the AST, and this step is called code generation.
In parsing, the AST is traversed node by node, and byte-code is generated by filling each proxy object in the AST with the identifier's memory reference (not its value), found by consulting the corresponding environment record (a lookup). These identifier references are later used in the execution phase to resolve the actual values held by the identifiers (another lookup).
Once the byte-code is generated, the creation phase ends and the execution phase begins.
JavaScript Interpretation: Execution Phase
The byte-code is given as input to the interpreter, which is responsible for execution, i.e. converting the platform-independent byte-code to platform-dependent machine code (0s and 1s). This is where Execution Contexts come into the picture.
Similar to a lexical environment, an Execution Context is an abstract concept (again, defined by the specification) used to track the execution of a JavaScript program. We can think of an Execution Context as the "runtime form" of a lexical environment, with some extra information needed to execute the code successfully. Each execution context contains the following:
- A lexical environment
- The value of this
When a chunk of byte-code is being interpreted and executed, the corresponding execution context (i.e. a lexical environment and the this binding information) is pushed onto a stack data structure called the Execution Context Stack, also known as the call stack. The execution context on top of this stack is called the running execution context, because it represents the code being run at that moment. For each running execution context, the interpreter performs lookups for the values behind the identifier references that were added to the byte-code earlier.
In the diagram above, the blocks highlighted in orange represent how execution contexts are used to fetch the runtime values required to execute the program. The highlighted block on top of the call stack represents the running execution context.
The JS interpreter fetches each value using the identifier's memory reference. The byte-code is then interpreted to machine code and the code is finally executed.
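To make the push and pop behaviour of the Execution Context Stack more concrete, here is a toy model. It is only a sketch: real engines manage this stack internally, and the names callStack, snapshots and run are assumptions invented for this illustration.

```javascript
// Toy model only: real engines manage the execution context stack internally.
const callStack = []; // hypothetical stand-in for the Execution Context Stack
const snapshots = []; // records what the stack looked like on every push

function run(name, thisValue, body) {
  // push a context holding a (simplified) lexical environment and a this value
  callStack.push({ name: name, lexicalEnvironment: {}, thisValue: thisValue });
  snapshots.push(callStack.map(function (ctx) { return ctx.name; }).join(" > "));
  try {
    // the context on top of the stack is the running execution context
    return body(callStack[callStack.length - 1]);
  } finally {
    callStack.pop(); // the context is removed once its code has finished
  }
}

const result = run("global", undefined, function () {
  return run("foo", undefined, function (running) {
    return running.name;
  });
});

console.log(snapshots);        // [ 'global', 'global > foo' ]
console.log(result);           // 'foo'
console.log(callStack.length); // 0
```

While foo is running, its context sits on top of the global one, and the stack is empty again once both have finished.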
Now that we understand both phases, we are ready to explore JavaScript scopes and the terminology associated with them.
We are going to dissect the following sample code:
var c = 10;
function foo(a) {
  if (a === 10) {
    let b = 20;
    console.log(b, d);
  } else {
    var e = 20;
    console.log(e, c);
  }
  var d = 100;
}
foo(c);
Scopes
When we run the above code, the first step it goes through is compilation. We already know that during compilation the AST and lexical environments are generated.
Scope refers to the visibility of an identifier in a particular code-block (or, in specification terms, a lexical environment). From this perspective, scopes are nothing but a static representation of lexical environments.
We also discussed that lexical environments are created for code-blocks under certain rules. Let us discuss the two most important conditions under which the compiler generates scopes.
The compiler creates a new scope for a code-block that contains at least one let or const declaration, or when a code-block is associated with a function declaration.
A code-block with a let or const declaration gets its own scope (lexical environment), hence we say let and const are block scoped. If we use var instead, the identifier is associated with the nearest enclosing function's lexical environment, or with the global environment when there is no enclosing function. Hence var is always function scoped.
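The difference between block scope and function scope can be checked with a short sketch like the one below. The function scopeDemo and its variable names are hypothetical and exist only for this illustration.

```javascript
function scopeDemo() {
  if (true) {
    let blockScoped = "let";    // stored in the if block's environment
    var functionScoped = "var"; // stored in scopeDemo's environment
  }
  return {
    functionScoped: functionScoped,      // still visible after the block
    blockScopedType: typeof blockScoped, // typeof avoids a ReferenceError
  };
}

const demo = scopeDemo();
console.log(demo); // { functionScoped: 'var', blockScopedType: 'undefined' }
console.log(typeof functionScoped); // 'undefined', var does not leak out of the function
```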
The following diagram represents the creation of scopes for the different identifiers in our sample code above. Note that I have intentionally left the outer field empty. We will come back to this when exploring scope chaining:
- c and foo are created in the Global Lexical Environment (Global Scope), represented in blue.
- Because foo is a function, a new lexical environment is created for the code-block associated with it, represented in green. This contains a, d and e but not b.
- For the if statement, a new lexical environment is created because it is associated with a code-block containing a let declaration (block scoped). Hence, b is present inside this lexical environment instead of foo's, represented by purple. Note that this is not true for else because it doesn't contain any let or const declared variables.
That is all scopes are: a word for the visibility of an identifier in the code, or a static representation of lexical environments.
Hoisting
Before defining hoisting, let us understand the memory allocation process used by JavaScript as it plays a crucial role in the definition.
In the compilation phase, identifiers are allocated memory, but not every identifier is treated the same. Identifiers associated with function and var declarations are allocated memory and initialised. Identifiers associated with let and const are allocated memory but are not initialised with any value. They are said to be in the Temporal Dead Zone (TDZ) until their declaration is executed. Because these variables are not yet initialised, any attempt to access them before that point throws a ReferenceError.
function declarations are initialised straight away with the associated function. var identifiers are initialised with undefined. For example, in our sample code var c is declared with the value 10, but the compiler still initialises it with undefined first; during execution, when the interpreter reaches the line var c = 10, it is assigned 10.
Because of this behaviour, var variables are accessible, with a value of undefined, even before their declaration or assignment appears in the code.
console.log(msg); // undefined
var msg = 'Hello World'; // undefined is replaced with 'Hello World'
With this understanding, let us define Hoisting.
Hoisting is the phenomenon of being able to use an identifier before its declaration appears in the code, which can cause side effects in the program. Note that definitions of hoisting tend to be opinionated, but the one above is generic enough to fit most of them.
Newer constructs like let and const are hoisted but not initialised with a value, and hence cannot be accessed before they are declared.
var identifiers are hoisted and initialised with the value undefined.
Identifiers associated with function declarations are also hoisted, but instead of being initialised with undefined they are assigned the function defined by the declaration.
In English, hoisting means lifting objects with ropes and pulleys. But JavaScript doesn't actually lift or pull any variables or functions up in the code structure, contrary to what we generally read about hoisting. No declaration is 'moved' to the top of its scope, or in fact anywhere. That is just a naive (and misleading) way of explaining the concept.
Hoisting happens per scope (lexical environment): for any code-block whose lexical environment is being created, the block is first 'scanned' for declarations, and every declared identifier is allocated some memory. undefined is assigned to identifiers declared using var. Identifiers declared with let and const are placed in the TDZ instead, without any initialisation.
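The three cases can be observed side by side in a small sketch. The function hoistingDemo and the names inside it are invented for this example; the try/catch is there only so the TDZ error can be recorded instead of stopping the program.

```javascript
function hoistingDemo() {
  const report = {};

  report.fn = declaredLater();  // function declarations are usable before their line
  report.varValue = hoistedVar; // var is already initialised with undefined

  try {
    report.letValue = inTdz;    // let is still in the TDZ here
  } catch (err) {
    report.letError = err.name; // accessing it throws instead
  }

  var hoistedVar = "assigned";
  let inTdz = "initialised";
  function declaredLater() {
    return "called";
  }

  report.afterDeclarations = [hoistedVar, inTdz];
  return report;
}

const report = hoistingDemo();
console.log(report.fn);                // 'called'
console.log(report.varValue);          // undefined
console.log(report.letError);          // 'ReferenceError'
console.log(report.afterDeclarations); // [ 'assigned', 'initialised' ]
```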
Now that we understand hoisting, let us understand a hoisting scenario in our sample code. When we run the sample code:
var c = 10;
function foo(a) {
  if (a === 10) {
    let b = 20;
    console.log(b, d); // 20 undefined
  } else {
    var e = 20;
    console.log(e, c);
  }
  var d = 100;
}
foo(c);
d is hoisted and initialised with undefined in foo's lexical environment (scope). The call therefore prints 20 undefined: when d is read in console.log, it has not yet been assigned 100.
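To check this behaviour without reading console output, the sample can be adapted so that foo returns what it observed. This is a variation on the sample code rather than the original: the seen variable and the returned object are additions made for this sketch.

```javascript
var c = 10;

function foo(a) {
  var seen; // added for this sketch, not in the original sample
  if (a === 10) {
    let b = 20;
    seen = [b, d]; // d is hoisted into foo's scope but still undefined here
  } else {
    var e = 20;
    seen = [e, c];
  }
  var d = 100;
  // e is also hoisted into foo's scope, but the else block never ran
  return { seen: seen, dAtEnd: d, eAtEnd: e };
}

var outcome = foo(c);
console.log(outcome.seen);   // [ 20, undefined ]
console.log(outcome.dAtEnd); // 100
console.log(outcome.eAtEnd); // undefined
```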
Scope Lookups and Scope Chains
Whenever JavaScript code is compiled and executed, a global scope is always created. Remember that execution contexts are nothing but lexical environments (scopes) with some extra information (the this binding) and resolved identifier values instead of identifier references.
There are two types of lookups performed by JavaScript. The first happens at compile time: it looks up an identifier name to find the memory reference allocated to it. The second happens at runtime, when these memory references are resolved to the actual values they contain.
Let's try to elaborate on this with our sample code:
An LHS lookup happens in the creation phase (compile time), when we look up an identifier's memory reference by its name (c, b, d, etc.). The proxy objects in the AST are replaced in the byte-code with the memory references fetched by a successful lookup.
An RHS lookup happens in the execution phase, when we resolve an identifier's memory reference (found in the step above) to the actual value contained by that identifier (10, 20, 100, etc.).
An unsuccessful lookup results in a ReferenceError, because it means the name we are looking for has never been declared in any reachable lexical environment. The one exception is non-strict mode, where an unsuccessful LHS lookup (an assignment to an undeclared name) does not throw and instead creates a property with that name on the global object, which behaves much like a global var variable. In strict mode this assignment throws a ReferenceError as well.
// strict mode
let a = 10;
console.log(a) // 10
console.log(b) // lookup fails for `b`, ReferenceError
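The strict-mode behaviour can be observed for both kinds of lookup with a sketch like the following. The function names and the identifier neverDeclaredAnywhere are made up for this example, and the "use strict" directive is placed inside each function so the result does not depend on how the surrounding file is loaded.

```javascript
function readUndeclared() {
  "use strict";
  try {
    return neverDeclaredAnywhere; // reading an undeclared name fails
  } catch (err) {
    return err.name;
  }
}

function assignUndeclared() {
  "use strict";
  try {
    neverDeclaredAnywhere = 1; // strict mode refuses to create a global here
    return "assigned";
  } catch (err) {
    return err.name;
  }
}

console.log(readUndeclared());             // 'ReferenceError'
console.log(assignUndeclared());           // 'ReferenceError'
console.log(typeof neverDeclaredAnywhere); // 'undefined', nothing was created
```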
To understand the concept better, let's take some examples of LHS and RHS lookups from our sample code.
var c = 10 results in an LHS lookup for c. When foo(c) or console.log(e,c) is called with c as an argument (assuming the else block executes), an RHS lookup happens so that a copy of the value contained in c can be passed as the argument to foo or console.log.
Similarly, let b = 20 triggers an LHS lookup, and console.log(b,d) results in RHS lookups.
To summarise, an LHS lookup 'checks' for a memory reference by identifier name. An RHS lookup 'checks' for the value contained in a memory reference already verified by an LHS lookup.
A simpler (but naive, and sometimes confusing) way to understand these lookups is this:
LHS stands for Left Hand Side and RHS for Right Hand Side, because an LHS lookup happens for identifiers on the left-hand side of a statement (i.e. during initialisation or re-assignment), while an RHS lookup happens for identifiers on the right-hand side (when we actually use the variable).
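Using this left-hand/right-hand reading, each identifier in a few lines of code can be annotated with the kind of lookup it triggers. The function addOne and the variable bumped are hypothetical names used only for this sketch, which also shows that a function receives a copy of a primitive value.

```javascript
var c = 10;             // LHS lookup for c (it is the assignment target)

function addOne(a) {    // the parameter a receives a copy of the value passed in
  a = a + 1;            // LHS lookup for a on the left, RHS lookup for a on the right
  return a;             // RHS lookup for a
}

var bumped = addOne(c); // RHS lookups for addOne and c, LHS lookup for bumped

console.log(c, bumped); // 10 11, c is unchanged because only its value was copied
```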
Now that we have discussed lookups, let us talk about scope chains. We established that in the creation phase the JavaScript compiler creates lexical environments, which are just a static representation of scope.
While creating lexical environments, the compiler links them to each other using the outer field.
This association is based on the nesting of code-blocks in the code structure. For example, foo's lexical environment has an outer field pointing to the Global Lexical Environment. Similarly, the outer field of the if block's lexical environment points to foo's lexical environment. We will now complete our diagram by filling in the value of outer:
This association of lexical environments (or scopes) helps the JavaScript interpreter travel through different scopes, using the outer reference, when a lookup is not successful in the current scope.
For example, in our sample code, when we call foo(c) and the interpreter reaches console.log(b, d), it first tries to resolve d in the if block's scope. The lookup is unsuccessful because d is not defined there. The interpreter then looks in the outer lexical environment, using the outer field of the if block's lexical environment, which points to foo's lexical environment because the if block is nested inside foo. It finds d there and the lookup succeeds.
Hence, scope chaining refers to the chain of accessible nested scopes in a JavaScript program, a result of writing lexically scoped code. The chain is used to resolve an identifier in an outer (foreign) scope when it cannot be resolved in the current scope.
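The walk along the outer references can be sketched with a toy model of the three environments in the sample code. This is an illustration only, not how an engine stores its environments: the class EnvironmentRecord, the function resolve and the "<function>" placeholder value are all assumptions made for this example.

```javascript
// Toy model: EnvironmentRecord and resolve are illustrative names, not engine APIs.
class EnvironmentRecord {
  constructor(name, bindings, outer) {
    this.name = name;
    this.bindings = new Map(Object.entries(bindings));
    this.outer = outer; // null for the global environment
  }
}

function resolve(identifier, env) {
  const visited = [];
  // follow the outer references until the identifier is found
  for (let current = env; current !== null; current = current.outer) {
    visited.push(current.name);
    if (current.bindings.has(identifier)) {
      return {
        value: current.bindings.get(identifier),
        foundIn: current.name,
        visited: visited,
      };
    }
  }
  throw new ReferenceError(identifier + " is not defined");
}

// State of the sample code at the moment console.log(b, d) runs inside foo(c)
const globalEnv = new EnvironmentRecord("global", { c: 10, foo: "<function>" }, null);
const fooEnv = new EnvironmentRecord("foo", { a: 10, d: undefined, e: undefined }, globalEnv);
const ifEnv = new EnvironmentRecord("if", { b: 20 }, fooEnv);

console.log(resolve("b", ifEnv).foundIn); // 'if'
console.log(resolve("d", ifEnv).visited); // [ 'if', 'foo' ]
console.log(resolve("c", ifEnv).foundIn); // 'global'
```

Looking up a name that exists in none of the three records makes resolve throw a ReferenceError, mirroring an unsuccessful lookup that has run out of outer environments.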
Shadowing
Shadowing refers to the situation where the JavaScript engine resolves an identifier to a locally scoped binding that has the same name as an identifier in a foreign or outer scope. For example:
let message = 'Hello Global!';
function printMessage() {
  let message = 'Hello Local!';
  console.log(message);
}
printMessage(); // Hello Local!
The message local to the scope of the printMessage function is said to have shadowed message in the global scope.
Conclusion
Lexical environments, scopes and execution contexts express similar ideas at different stages of a program's life. Scope is not itself defined as a specification type in the ECMAScript specification; it is an informal term, generally used for the visibility of an identifier inside a code-block. Lexical environments and execution contexts, on the other hand, are concepts defined by the specification. In a very generic sense, lexical environments are the static representation of scope, while execution contexts are the runtime representation of scope.
Until next time, happy hacking :)