Overview
If you use Amazon WorkDocs as your managed, cloud-based content management or storage system and you plan to automate tasks such as integrating it with other document or content storage systems, you have probably come across the use case of uploading a file. Once the upload is automated, you can do it at scale. In a future post, I will share a detailed reference architecture for building such an integrated system.
The following sections demonstrate the various aspects of the app, starting with setting up a simple Node.js app. However, there are some prerequisites:
- an AWS IAM user with sufficient privileges. For example, in my development account I have created a user with admin privileges and no AWS Management Console access, and I rotate its access keys regularly. For more, read AWS IAM best practices
- an existing Amazon WorkDocs site
- the AWS CLI, installed and configured with named profiles
Initialize npm project
I used the following commands to initialize a new npm project:
➜ mkdir workdocs-sample && cd workdocs-sample
➜ npm init
➜ npm install aws-sdk axios form-data got
➜ touch index.js
After initialization, my folder structure looks like this:
➜ workdocs-sample ls
da-quiz-storage-result.pdf
index.js
node_modules
package-lock.json
package.json
yarn.lock
Initialize the WorkDocs client
Set up AWS credentials in index.js. For more information, read best practices to use AWS credentials in your development environment.
const AWS = require("aws-sdk");
const credentials = new AWS.SharedIniFileCredentials({ profile: "default" });
AWS.config.credentials = credentials;
In addition to that, you'll need the following declarations:
const got = require("got");
const fs = require("fs");
const FormData = require("form-data");
Finally, initialize the WorkDocs client:
const workdocs = new AWS.WorkDocs();
Steps to upload a file
To upload a file to a WorkDocs folder you need the following:
- a folder ID to upload to
  - to get the root folder ID, you need to make a call to the describeUsers API
  - if you have created new folders at the root, then you need to call describeFolderContents with the root folder ID
- call initiateDocumentVersionUpload with the folder ID, the name of the file, and optionally, a content type. It returns an Amazon S3 pre-signed upload URL, a document ID, and a version ID among other things
- use got to upload the file to the returned uploadUrl
- call updateDocumentVersion with the document ID and version ID, and set VersionStatus to ACTIVE
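The list above mentions describeFolderContents for folders created under the root, but the rest of this post only uploads to the root folder. Here is a minimal sketch of how that lookup might work. The helper name findChildFolderId is hypothetical, and I am assuming the response carries a Folders array (entries with Id and Name) plus an optional Marker for pagination. The client is passed in as a parameter so the helper works with any object that has the same method shape.

```javascript
// Hypothetical helper (not part of the original app): find the ID of a
// direct child folder by name. `workdocs` is an AWS.WorkDocs client.
const findChildFolderId = async (workdocs, parentFolderId, folderName) => {
  let marker; // pagination token from the previous page, if any
  do {
    const params = { FolderId: parentFolderId, Type: "FOLDER" };
    if (marker) params.Marker = marker;
    const page = await workdocs.describeFolderContents(params).promise();
    // Assumption: folders are returned in `Folders`, each with Id and Name.
    const match = (page.Folders || []).find((f) => f.Name === folderName);
    if (match) return match.Id;
    marker = page.Marker;
  } while (marker);
  return undefined; // no direct child folder with that name
};
```

You would call it as findChildFolderId(workdocs, rootFolderId, "reports") and pass the result as the folder ID in the later steps.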
Get the root folder ID
Every user has a root folder which can contain one or more children - nothing fancy, just the usual nested folder structure. The root folder has an ID that can be used to create folders inside it. Using the describeUsers API call, we'll get the root folder ID for the user defined by the Query parameter. You can look up OrganizationId from your Amazon WorkDocs AWS console.
const describeUsers = async () => {
  const user = await workdocs
    .describeUsers({
      OrganizationId: "d-92672xxxxx", // your WorkDocs organization Id
      Query: "sahays", // name of an existing WorkDocs user
    })
    .promise();
  return user;
};
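The Query value is a search term, so the response may contain no users or more than one. The sketch below picks the root folder ID a little more defensively than indexing into the first result. The helper name getRootFolderId is hypothetical, and I am assuming each entry in Users exposes Username and RootFolderId.

```javascript
// Hypothetical helper: pull the root folder ID for an exact username
// out of a describeUsers response.
const getRootFolderId = (describeUsersResult, username) => {
  const users = (describeUsersResult && describeUsersResult.Users) || [];
  // Assumption: each user entry has Username and RootFolderId fields.
  const user = users.find((u) => u.Username === username);
  if (!user) {
    throw new Error(`No WorkDocs user found with username "${username}"`);
  }
  return user.RootFolderId;
};
```

For example, getRootFolderId(await describeUsers(), "sahays") would either return the folder ID or fail with a clear message instead of a TypeError.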
Initialize upload
The following code uses initiateDocumentVersionUpload to initiate the process of uploading a file. The API requires a ParentFolderId to upload the file to, and a Name. It returns a documentId for the document, a versionId for the first version of the document, an uploadUrl containing the Amazon S3 pre-signed URL, and signedHeaders containing the content-type and the x-amz-server-side-encryption encryption type.
const initUpload = async ({ folderId, filename }) => {
  try {
    console.log("initUpload");
    const contentType = "application/octet-stream";
    const initResult = await workdocs
      .initiateDocumentVersionUpload({
        ParentFolderId: folderId,
        Name: filename,
        ContentType: contentType,
        ContentCreatedTimestamp: new Date(),
        ContentModifiedTimestamp: new Date(),
      })
      .promise();
    const documentId = initResult.Metadata.Id;
    const versionId = initResult.Metadata.LatestVersionMetadata.Id;
    const { UploadUrl, SignedHeaders } = initResult.UploadMetadata;
    console.log("initUpload complete");
    return {
      documentId,
      versionId,
      uploadUrl: UploadUrl,
      signedHeaders: SignedHeaders,
    };
  } catch (e) {
    console.log("failed initUpload", e);
    throw e;
  }
};
The headers look like the following:
headers: {
  'Content-Type': 'application/octet-stream',
  'x-amz-server-side-encryption': 'AES256'
}
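initUpload hardcodes application/octet-stream as the content type. If you would rather pass a type that matches the file, a small lookup by extension is one option. This is only a sketch: guessContentType and the CONTENT_TYPES map are hypothetical names, and the map is deliberately tiny.

```javascript
const path = require("path");

// Illustrative, deliberately small extension-to-MIME map; extend as needed.
const CONTENT_TYPES = {
  ".pdf": "application/pdf",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".txt": "text/plain",
};

// Falls back to application/octet-stream for unknown or missing extensions.
const guessContentType = (filename) =>
  CONTENT_TYPES[path.extname(filename).toLowerCase()] ||
  "application/octet-stream";
```

The result could replace the hardcoded contentType value in initUpload. The signed headers returned by WorkDocs should then carry the same type, so keep sending them unchanged with the upload request.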
Upload a file using got
The following code uses the got npm library to upload a local file. Note that we are using a PUT request. The file is appended to FormData using a file stream object. The headers returned by the previous initiateDocumentVersionUpload call are used to set the PUT request headers.
const uploadFile = async ({ filename, signedHeaders, uploadUrl }) => {
  try {
    if (fs.existsSync(filename)) {
      console.log("reading file stream");
      const fileStream = fs.createReadStream(filename);
      console.log("preparing form data");
      const formData = new FormData();
      formData.append(filename, fileStream);
      console.log("uploading to ", uploadUrl);
      const extendParams = {
        headers: signedHeaders,
      };
      console.log("got extendParams", extendParams);
      const client = got.extend(extendParams);
      await client.put(uploadUrl, {
        body: formData,
      });
      console.log("upload complete");
    } else {
      console.log("file doesn't exist");
      // throw an Error object (not a string) so callers get a stack trace
      throw new Error("file doesn't exist");
    }
  } catch (e) {
    console.error("failed uploadFile", e);
    throw e;
  }
};
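One caveat with the FormData approach: as far as I understand, a pre-signed S3 PUT stores the request body exactly as it is received, so the multipart boundaries and part headers added by FormData become part of the stored object. PDF viewers tend to tolerate a few extra bytes around the content, but stricter formats such as images may not. The sketch below is an alternative under that assumption: it sends only the raw file bytes with an explicit Content-Length, using Node's built-in modules. The name uploadRawFile is hypothetical, and the http branch exists only so the function can be tried against a local test server.

```javascript
const fs = require("fs");
const http = require("http");
const https = require("https");

// Hypothetical alternative to uploadFile: PUT the raw file bytes, no multipart.
const uploadRawFile = ({ filename, signedHeaders, uploadUrl }) =>
  new Promise((resolve, reject) => {
    const { size } = fs.statSync(filename); // throws (rejects) if the file is missing
    const url = new URL(uploadUrl);
    const transport = url.protocol === "http:" ? http : https;
    const req = transport.request(
      url,
      {
        method: "PUT",
        // An explicit Content-Length avoids Transfer-Encoding: chunked.
        headers: { ...signedHeaders, "Content-Length": size },
      },
      (res) => {
        res.resume(); // drain the response body
        res.on("end", () => {
          if (res.statusCode >= 200 && res.statusCode < 300) {
            resolve(res.statusCode);
          } else {
            reject(new Error(`upload failed with status ${res.statusCode}`));
          }
        });
      }
    );
    req.on("error", reject);
    fs.createReadStream(filename).on("error", reject).pipe(req);
  });
```

It takes the same arguments as uploadFile, so it could be swapped into the flow without touching the other steps.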
Update document version
This important step completes the file upload transaction by setting the VersionStatus to ACTIVE, which tells Amazon WorkDocs to mark the just uploaded file as the most recent/active version.
const updateVersion = async ({ documentId, versionId }) => {
  try {
    await workdocs
      .updateDocumentVersion({
        DocumentId: documentId,
        VersionId: versionId,
        VersionStatus: "ACTIVE",
      })
      .promise();
    console.log("document version updated");
  } catch (e) {
    console.log("failed updateVersion", e);
    throw e;
  }
};
Time for that faceoff: got vs axios
Let's take a look at the axios invocation first.
const axios = require("axios"); // installed earlier with npm install
await axios.put(uploadUrl, formData, {
  headers: signedHeaders,
});
This results in Amazon S3 rejecting the request with the following error:
<Error>
<Code>NotImplemented</Code>
<Message>A header you provided implies functionality that is not implemented</Message>
<Header>Transfer-Encoding</Header>
<RequestId>016D6B18F95E6923</RequestId><HostId>QgYnoYEQTZR4jG7wvdLfAe6lcd2Tg+/eAOeHLvtM+CamqyDxZX8p7CV4ZL+Hph7+IOUiFJkayT8=</HostId>
</Error>
The server returns a 501: not implemented response
response: {
  status: 501,
  statusText: 'Not Implemented',
  headers: {
    'x-amz-request-id': '016D6B18F95E6923',
    'x-amz-id-2': 'QgYnoYEQTZR4jG7wvdLfAe6lcd2Tg+/eAOeHLvtM+CamqyDxZX8p7CV4ZL+Hph7+IOUiFJkayT8=',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked', // extra header
    date: 'Mon, 18 May 2020 22:00:24 GMT',
    connection: 'close',
    server: 'AmazonS3'
  },...
}
Now, let's take a look at the got invocation:
const extendParams = {
  headers: signedHeaders,
};
console.log("got extendParams", extendParams);
const client = got.extend(extendParams);
await client.put(uploadUrl, {
  body: formData,
});
This results in a successful 200: OK response with the same inputs.
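A plausible explanation for the difference, going by the Transfer-Encoding header named in the S3 error: when a streamed body is sent without a Content-Length, Node falls back to chunked transfer encoding, which S3 rejects for this kind of request. My understanding is that got works out the length of a form-data body up front while the axios call above does not, but I have not verified that against every version of either library. The sketch below only demonstrates the underlying Node behavior with built-in modules and a throwaway local server. The helper name observeRequestHeaders is hypothetical.

```javascript
const http = require("http");
const { Readable } = require("stream");

// Streams `body` to a temporary local server and resolves with the
// request headers that the server actually received.
const observeRequestHeaders = (body, headers) =>
  new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      req.resume(); // drain the request body
      req.on("end", () => {
        res.setHeader("Connection", "close");
        res.end();
        server.close();
        resolve(req.headers);
      });
    });
    server.on("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const req = http.request({
        host: "127.0.0.1",
        port: server.address().port,
        method: "PUT",
        headers,
        agent: false, // one-off connection, no keep-alive pooling
      });
      req.on("error", reject);
      req.on("response", (res) => res.resume());
      Readable.from([body]).pipe(req); // send the body as a stream
    });
  });
```

Calling it with an empty headers object should show transfer-encoding: chunked on the server side, while passing a Content-Length equal to the body size should make that header disappear.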
Bring it all together
The following entry point function runs when you execute index.js with node index.js:
const start = async () => {
  try {
    const user = await describeUsers();
    const rootFolderId = user.Users[0].RootFolderId;
    const filename = "da-quiz-storage-result.pdf";
    const {
      documentId,
      versionId,
      uploadUrl,
      signedHeaders,
    } = await initUpload({ folderId: rootFolderId, filename });
    await uploadFile({ filename, signedHeaders, uploadUrl });
    await updateVersion({ documentId, versionId });
  } catch (e) {
    console.error(e);
  }
};
start();
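If the upload or the version update fails after initUpload has succeeded, the document version is left pending. WorkDocs has an abortDocumentVersionUpload call for that situation. The following is a sketch of how the flow might clean up after itself, not something the original app does. The wrapper name uploadOrAbort is hypothetical, and the step functions and client are passed in as parameters so the wrapper does not depend on module-level state.

```javascript
// Hypothetical wrapper: run the upload steps and abort the pending
// document version if the upload or the version update fails.
const uploadOrAbort = async ({
  workdocs, // AWS.WorkDocs client
  initUpload,
  uploadFile,
  updateVersion,
  folderId,
  filename,
}) => {
  const { documentId, versionId, uploadUrl, signedHeaders } = await initUpload({
    folderId,
    filename,
  });
  try {
    await uploadFile({ filename, signedHeaders, uploadUrl });
    await updateVersion({ documentId, versionId });
    return { documentId, versionId };
  } catch (e) {
    // Best-effort cleanup; the original error is still rethrown below.
    await workdocs
      .abortDocumentVersionUpload({ DocumentId: documentId, VersionId: versionId })
      .promise()
      .catch((abortError) => console.error("failed to abort upload", abortError));
    throw e;
  }
};
```

Inside start, the three step calls could then be replaced by a single call to uploadOrAbort with folderId set to rootFolderId.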
Finally
After running node index.js in your terminal, you'll see an output similar to the following:
initUpload
initUpload complete
reading file stream
preparing form data
uploading to https://gb-us-west-2-prod-doc-source.s3.us-west-2.amazonaws.com/1b45f47aa1c4d1d1c1f0978587e10f1e56ce801824ca5d5fce0565dea6f76baf/1589767973739-0d3c7a46986cfe7d0fd8beec8258628a8b6ca0e9b0f412afafcdaf9c6aa7a00e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200518T021253Z&X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-server-side-encryption&X-Amz-Expires=900&X-Amz-Credential=AKIAIM5HWZT6CVS2WHIA%2F20200518%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=025e9ed29fe7f8ab85593c51a4a09b396909de47ea1e893148df14e3435ea080
got extendParams {
  headers: {
    'Content-Type': 'application/octet-stream',
    'x-amz-server-side-encryption': 'AES256'
  }
}
upload complete
document version updated
The file da-quiz-storage-result.pdf is now uploaded as shown in this screenshot: