Introduction
In many programming languages, using the assignment operator (=
) to duplicate arrays or maps (or similar data structures) does not create a new, independent copy. Instead, it creates a new reference to the same underlying data. This can lead to unintended side effects when the original data is modified. In software terminology, we use terms "deep copy" and "shallow copy" to describe different methods of duplicating objects:
- A deep copy of an object creates a new object and recursively copies all objects found in the original. This means that the new object is a complete copy, with no shared references to the objects in the original.
- A shallow copy of an object creates a new object but inserts references into it to the objects found in the original. Therefore, the new object is a copy of the original object's structure but not of the objects that the structure references.
In this short article, I would like to show how using the wrong copying approach could lead to unintended consequences in our programs.
Simple example
NOTE: We use Golang to present it, but the same behavior could be seen in other popular languages like Javascript, Python, etc.
What happens when we assign a hash map to another one?
package main
import "fmt"
func main() {
print("SHALLOW COPY")
headers := map[string]string{
"Content-Type": "application/json",
}
fields := headers
fmt.Println("Fields:", fields)
fmt.Println("Headers:", headers)
fmt.Println("ADD AUTHORIZATION")
fields["Authorization"] = "Bearer 123"
fmt.Println("Fields:", fields)
fmt.Println("Headers:", headers)
}
We initialized a headers
map, assigned headers
to fields
, and eventually added Authorization
to fields
. Our intention was probably to copy values from headers
to fields
and made changes that affect fields
but not headers
. However, we get something like that:
SHALLOW COPY
Headers: map[Content-Type:application/json]
ADD AUTHORIZATION
Fields: map[Authorization:Bearer 123 Content-Type:application/json]
Headers: map[Authorization:Bearer 123 Content-Type:application/json]
By adding Authorization
to fields
we modified not only fields
but also headers
. The reason is that the assignment operator did shallow copy and we did not duplicate the object but only copied the reference to headers
. Consequently, headers
and fields
pointed to the same object.
package main
import "fmt"
func main() {
fmt.Println("DEEP COPY")
headers := map[string]string{
"Content-Type": "application/json",
}
fmt.Println("Headers:", headers)
fields := deepCopy(headers)
fmt.Println("ADD AUTHORIZATION")
fields["Authorization"] = "Bearer 123"
fmt.Println("Fields:", fields)
fmt.Println("Headers:", headers)
}
func deepCopy(m map[string]string) map[string]string {
newMap := make(map[string]string)
for k, v := range m {
newMap[k] = v
}
return newMap
}
We replaced the assignment operator (shallow copy) by deepCopy
function (deep copy). Now headers
and fields
just after making a copy have the same values, but are represented by two different objects.
DEEP COPY
Headers: map[Content-Type:application/json]
ADD AUTHORIZATION
Fields: map[Authorization:Bearer 123 Content-Type:application/json]
Headers: map[Content-Type:application/json]
The headers
is not bound to fields
and any modification of fields
will not affect headers
.
Concurrent access
The above example was trivial and straightforward. However, things get tricky when handling many concurrent requests in our applications. In such scenarios, errors become non-deterministic, and identifying the cause of unexpected behavior becomes much harder to debug.
package main
import (
"fmt"
"time"
"math/rand/v2"
)
func main() {
fmt.Println("CONCURRENCY")
headers := map[string]string{
"Content-Type": "application/json",
}
c := client{
headers: headers,
}
operations := 10
for i := 0; i < operations; i++ {
simulateLatency()
go c.sendRequest(fmt.Sprintf("Bearer %d", i))
}
time.Sleep(1 * time.Second)
}
type client struct {
headers map[string]string
}
func (c *client) sendRequest(key string) {
fields := c.headers
fields["Authorization"] = key
simulateLatency()
fmt.Println(fmt.Sprintf("key: %s, fields: %s", key, fields))
}
func simulateLatency() {
interval := rand.IntN(5)
time.Sleep(time.Duration(interval) * time.Millisecond)
}
In this example, there is a small time gap between setting Authorization
field and printing values (or sending a request over the network in a real system). This gap and using a shallow copy of the headers
map can lead to a situation where another concurrent operation overwrites the Authorization
field. In the end, the Authorization
field is not set to the expected value.
When we run the code, we get something like this:
CONCURRENCY
key: Bearer 0, fields: map[Authorization:Bearer 0 Content-Type:application/json]
key: Bearer 1, fields: map[Authorization:Bearer 1 Content-Type:application/json]
key: Bearer 2, fields: map[Authorization:Bearer 2 Content-Type:application/json]
key: Bearer 3, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 5, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 4, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 6, fields: map[Authorization:Bearer 6 Content-Type:application/json]
key: Bearer 7, fields: map[Authorization:Bearer 7 Content-Type:application/json]
key: Bearer 8, fields: map[Authorization:Bearer 9 Content-Type:application/json]
key: Bearer 9, fields: map[Authorization:Bearer 9 Content-Type:application/json]
As we expected, some of the requests have incorrect Authorization
field, because headers
has been shared between concurrent operations.
The solution here is simple: use a deep copy instead of a shallow one. Although using deep copy seems straightforward when dealing with concurrent access to shared resources, even experienced developers can make mistakes. It is crucial to be extremely careful when duplicating mutable structures to avoid errors.
Conclusion
Failing to understand the difference between deep and shallow copying can lead to unintended side effects, especially in concurrent or multi-threaded environments where a shared state can be inadvertently modified. It is crucial to choose the correct type of copying based on the needs of your application to ensure data integrity and prevent subtle bugs.