Extracting Attachments from EML Files

Learn how to extract attachments from EML files using the Fetch API and streams. The same method can be extended to extract other parts of an EML file.


When working with EML files, we may need to extract attachments from the file to do something with them. This article aims to guide the reader through the process of reading attachments and similar information from an EML file using the Fetch API and streams.

Understanding streams

Streams are an efficient way of handling data that is too large to be held in memory all at once. Instead of reading the entire data into memory, streams allow us to process the data piece by piece, or chunk by chunk. This makes streams very useful when working with large files or network requests. In JavaScript, streams are implemented as a set of interfaces and classes defined in the Streams API. The Streams API defines three types of streams: readable, writable, and transform streams.

The Streams API also provides built-in utilities for working with streams, such as the pipeTo() method, which pipes data from a readable stream into a writable stream, and the pipeThrough() method, which passes a readable stream through a transform stream and returns the transformed readable side. JavaScript also allows us to cancel streams if needed.
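As a rough illustration of piping, here is a minimal sketch that sends a few text chunks through a transform stream and into a writable sink. The helper names (streamFrom, upperCaseTransform, collectUpperCased) are made up for this example, and it assumes a runtime where ReadableStream, TransformStream and WritableStream are available as globals (modern browsers, Node 18+).

```javascript
// Builds a readable stream that emits the given strings one by one.
function streamFrom(chunks) {
  return new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}

// Transform stream that upper-cases each text chunk passing through it.
function upperCaseTransform() {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(chunk.toUpperCase());
    },
  });
}

// Pipes readable -> transform -> writable and resolves with the collected text.
async function collectUpperCased(chunks) {
  const collected = [];
  const sink = new WritableStream({
    write(chunk) {
      collected.push(chunk);
    },
  });
  await streamFrom(chunks).pipeThrough(upperCaseTransform()).pipeTo(sink);
  return collected.join('');
}
```

Calling collectUpperCased(['from: ', 'alice']) resolves once pipeTo() has drained the source, which is what makes it convenient for chaining further work.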

When working with the Fetch API, we can use streams to handle the response from a fetch request. Instead of loading the entire response into memory, we can use the body property of the response, which is a ReadableStream object, and then read the response data piece by piece using a ReadableStreamDefaultReader object obtained from its getReader() method.
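A minimal sketch of that reading loop could look like the following. The function name readBodyAsText is hypothetical, and the sketch assumes the body is UTF-8 text; it works on any Response object, whether it came from fetch() or was constructed by hand.

```javascript
// Reads a Response body chunk by chunk and returns the decoded text.
async function readBodyAsText(response) {
  const reader = response.body.getReader(); // ReadableStreamDefaultReader
  const decoder = new TextDecoder('utf-8');
  let text = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    text += decoder.decode(value, { stream: true });
  }

  // Flush any bytes the decoder is still holding on to
  text += decoder.decode();
  return text;
}
```

For a real request this would be used as fetch(url).then(readBodyAsText). In practice we would process each chunk inside the loop rather than accumulate everything, since accumulating gives up the memory benefit of streaming.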

Understanding EML files

An EML file represents an email message as plaintext with headers and body parts. The headers contain metadata such as the sender and recipient email addresses, subject, and date of the email. The body parts contain the actual content of the email, which can include text, HTML, and attachments.

To identify different parts of an email in an EML file, we can use regular expressions, which are a pattern-matching language used to search for and manipulate text. Regular expressions can be used to match specific patterns within the text, such as email addresses, subject lines, and attachment names.

For example, we can use the following regular expression to match the sender email address in an EML file: /^From: (.+)$/m. The subject of an email can be found with /^Subject: (.+)$/m.
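To make this concrete, here is a small sketch that pulls headers out of a made-up EML snippet. The sample message and the getHeader helper are assumptions for illustration only. The sketch does not handle folded headers (values that continue on the next line), and the header name is inserted into the pattern unescaped, so it should be a plain name such as From or Subject.

```javascript
// Hypothetical sample message, not taken from a real mailbox.
const sampleEml = [
  'From: Alice <alice@example.com>',
  'To: Bob <bob@example.com>',
  'Subject: Quarterly report',
  '',
  'Hello Bob',
].join('\r\n');

// Returns the value of the first header with the given name, or null.
function getHeader(emlText, name) {
  // Only search the top-level header block, which ends at the first blank line
  const headerBlock = emlText.split(/\r?\n\r?\n/)[0];
  const match = new RegExp(`^${name}:[ \\t]*(.*)$`, 'im').exec(headerBlock);
  return match ? match[1].trim() : null;
}

const sender = getHeader(sampleEml, 'From');
const subject = getHeader(sampleEml, 'Subject');
```

Restricting the search to the header block avoids accidentally matching a line in the body that happens to start with "From:".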

Converting the EML file to a Blob object

The first step in reading attachments from an EML file is to convert the file to a Blob object. We can do this using the Fetch API's Response.blob() method.

fetch('https://my-domain.really/email.eml')
  .then(response => response.blob())
  .then(blob => {
    // Do something with the Blob object here
  })
  .catch(console.error)

We can then use the Blob object to read the contents of the EML file.

Reading the contents from the blob

We will first convert the blob to a stream and read it chunk by chunk. Whenever a chunk contains an attachment, we extract the attachment data from it.

.then(blob => {
  const reader = blob.stream().getReader();
  const decoder = new TextDecoder('utf-8');
  let attachmentData = '';

  const read = () => {
    return reader.read().then(({ done, value }) => {
      if (done) {
        return attachmentData;
      }

      // stream: true keeps multi-byte characters intact across chunk boundaries
      const chunk = decoder.decode(value, { stream: true });
      // Check if the chunk represents an attachment
      const headerIndex = chunk.search(/Content-Disposition: attachment/);
      if (headerIndex !== -1) {
        // The attachment data follows the blank line that ends the attachment's own headers
        attachmentData += chunk.slice(headerIndex).split('\r\n\r\n')[1] ?? '';
      }

      return read();
    });
  };

  return read();
})
.then(attachmentData => {
  // Do something with the attachment data here
})

In this example, we're using the TextDecoder API to decode each chunk of data as a string, and we're using a recursive function called read() to read each chunk of data from the ReadableStream. Inside the read() function, we're checking if each chunk of data represents an attachment by looking for the Content-Disposition: attachment header. If the chunk represents an attachment, we extract the attachment data from the chunk and add it to the attachmentData variable.

Finally, we return the attachmentData variable from the read function, and then do something with the attachment data in the then() block.

The curious case of attachment size being larger than chunk size

When reading an attachment from an EML file using streams, it's possible that the attachment size may be larger than the chunk size that we're using to read the stream. In this case, we would need to read the attachment in multiple chunks and concatenate the chunks together to get the full attachment data.
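The same problem applies to the markers we search for: a header such as Content-Disposition: attachment can itself be split across two chunks. The sketch below simulates this with a hand-made pair of chunks (the split position and the decodeChunks helper are assumptions for the example) and shows that testing each chunk alone misses the marker, while testing the joined text finds it.

```javascript
// Decodes a sequence of Uint8Array chunks into one string.
function decodeChunks(chunks) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    // stream: true lets the decoder carry partial characters into the next chunk
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush any remaining bytes
}

// Simulate a stream that happens to split the header in the middle.
const bytes = new TextEncoder().encode('Content-Disposition: attachment; filename="a.txt"');
const chunks = [bytes.slice(0, 20), bytes.slice(20)];

const marker = /Content-Disposition: attachment/;
// Looking at each chunk in isolation misses the header...
const foundPerChunk = chunks.some((chunk) => marker.test(new TextDecoder().decode(chunk)));
// ...while the concatenated text contains it.
const foundInJoined = marker.test(decodeChunks(chunks));
```

A real implementation would not keep the whole file in the buffer. Keeping only the tail of the previous chunk (at least as long as the longest marker) is usually enough to catch markers that straddle a boundary.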

To determine when the attachment has ended, one can look for the end of the attachment part, which is usually indicated by a boundary string that separates the attachment part from the rest of the email message. The boundary string is specified in the Content-Type header of the email message.

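// emlText is the decoded text of the EML file read so far. The boundary lives in the
// email's own Content-Type header, not in the HTTP response headers.
const boundaryMatch = /boundary="(.+?)"/.exec(emlText);
const boundary = `--${boundaryMatch[1]}`;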
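Putting the boundary to use, the following sketch splits an already-decoded message on its boundary and returns the attachment parts. It is a simplified illustration, not a full MIME parser: the extractAttachments name and the sample message are made up, nested multipart messages are ignored, and only base64 and plain-text bodies are handled. It assumes atob() is available as a global (browsers, Node 16+).

```javascript
// Returns [{ filename, content }] for every attachment part in the message.
function extractAttachments(emlText) {
  // The boundary may or may not be quoted in the Content-Type header
  const boundaryMatch = /boundary="?([^";\r\n]+)"?/i.exec(emlText);
  if (!boundaryMatch) return [];
  const delimiter = `--${boundaryMatch[1]}`;

  return emlText
    .split(delimiter)
    .filter((part) => /Content-Disposition:\s*attachment/i.test(part))
    .map((part) => {
      // Part headers and part body are separated by the first blank line
      const separatorIndex = part.search(/\r?\n\r?\n/);
      const headers = separatorIndex === -1 ? part : part.slice(0, separatorIndex);
      const body = separatorIndex === -1 ? '' : part.slice(separatorIndex).trim();
      const nameMatch = /filename="?([^";\r\n]+)"?/i.exec(headers);
      const isBase64 = /Content-Transfer-Encoding:\s*base64/i.test(headers);
      return {
        filename: nameMatch ? nameMatch[1] : null,
        // atob returns a binary string; base64 bodies are wrapped, so strip whitespace first
        content: isBase64 ? atob(body.replace(/\s+/g, '')) : body,
      };
    });
}

// Hypothetical sample message with one text part and one base64 attachment.
const sampleMessage = [
  'From: alice@example.com',
  'Content-Type: multipart/mixed; boundary="sep42"',
  '',
  '--sep42',
  'Content-Type: text/plain',
  '',
  'Hello',
  '--sep42',
  'Content-Type: text/plain; name="note.txt"',
  'Content-Disposition: attachment; filename="note.txt"',
  'Content-Transfer-Encoding: base64',
  '',
  'aGVsbG8gd29ybGQ=',
  '--sep42--',
].join('\r\n');

const attachments = extractAttachments(sampleMessage);
```

For binary attachments such as images or PDFs, the decoded binary string would still need to be converted to bytes (for example a Uint8Array) before being saved or displayed.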

Hope this was helpful! Feel free to share your insights on the blog from the Contact Page, or by directly dropping an email to saranshgupta1995@gmail.com. If you have anything that would help improve this setup, I'd love to hear and discuss it.