DEV Community

Andriy Andruhovski for Aspose.PDF


Aspose.PDF Cloud: Converting documents using CSharp SDK (part 3)

Full sample can be downloaded from here.

In the previous part, we learned how PDF documents can be converted to other formats, but only with default settings. Now we will customize some conversion settings to get more flexible results. The number and purpose of the settings depend on the target format. Some conversions, such as PDF to XPS or PDF to PDF/A, need few or no additional settings. Therefore, we start with the simple conversions and move on to the more complex ones.

In this part, we continue to use the ASP.NET MVC 5 application, with two changes:

  • we rewrite Index.cshtml to call a different action for each format;
  • we add a new action for each format to HomeController.

The new version of Index.cshtml is shown below:

Next, we remove the unused actions from HomeController and add the new ones.

In addition, we create a model for basic options and derive from it models for specific formats.

public class OptionsModel
{
    [Display(Name = "File Name:")]
    public string Name { get; set; }

    [Display(Name = "Folder:")]
    public string Folder { get; set; }
}

Convert files from PDF to XPS, SVG, PDF/A

As stated above, a few formats need little or no additional configuration: XPS, SVG and PDF/A. The code for XPS and SVG is simple and needs no further explanation.

public ActionResult ToSvg(string filename)
{
    SaaSposeResponse response;
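    // Swap only the file extension; Replace("pdf", "zip") would also rewrite "pdf" inside the name.
    var outPath = $"{FolderName}/svg/{System.IO.Path.ChangeExtension(filename, "zip")}";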
    try
    {
        response = PdfApi.PutPdfInStorageToSvg(filename, outPath, folder: FolderName);
    }
    catch (ApiException ex)
    {
        return View("ConvertError", ex);
    }

    ViewBag.OutPath = outPath;
    return View("ConvertSuccess", response);
}

public ActionResult ToXps(string filename)
{
    SaaSposeResponse response;
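    // Swap only the file extension; Replace("pdf", "zip") would also rewrite "pdf" inside the name.
    var outPath = $"{FolderName}/xps/{System.IO.Path.ChangeExtension(filename, "zip")}";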
    try
    {
        response = PdfApi.PutPdfInStorageToXps(filename, outPath, FolderName);
    }
    catch (ApiException ex)
    {
        return View("ConvertError", ex);
    }

    ViewBag.OutPath = outPath;
    return View("ConvertSuccess", response);
}
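Both actions derive the output path from the input file name. A minimal sketch of a reusable helper for that step follows. OutputPaths and Build are hypothetical names, not part of the Aspose SDK. The sketch relies on Path.ChangeExtension, which changes only the final extension, whereas a plain string replacement of "pdf" would also rewrite "pdf" elsewhere in the name and would miss an upper-case ".PDF".

```csharp
using System;
using System.IO;

public static class OutputPaths
{
    // Hypothetical helper: builds "<folder>/<subfolder>/<name>.<extension>".
    public static string Build(string folder, string subfolder, string filename, string extension)
    {
        // Only the last extension of the file name is swapped.
        var renamed = Path.ChangeExtension(filename, extension);
        return $"{folder}/{subfolder}/{renamed}";
    }
}
```

With this helper, an action could write something like `var outPath = OutputPaths.Build(FolderName, "svg", filename, "zip");`.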

Aspose.PDF Cloud supports two PDF/A subformats, PDF/A-1a and PDF/A-1b, so we let the user choose between them.

public enum PdfaSubType
{
    [Display(Name = "PDF/A-1a – Level A (accessible) conformance")]
    PDFA1A,
    [Display(Name = "PDF/A-1b – Level B (basic) conformance")]
    PDFA1B
}
public class PdfaOptionsModel : OptionsModel
{
    public PdfaSubType PdfaSubType { get; set; }
}

For this and the other conversions, we use the standard approach: one action shows the available settings to the user (HTTP GET), and another performs the conversion (HTTP POST).

public ActionResult ToPdfa(string filename)
{
    var pdfaOptions = new PdfaOptionsModel
    {
        Name = filename,
        Folder = FolderName,
        PdfaSubType = PdfaSubType.PDFA1A
    };
    return View(pdfaOptions);
}

[HttpPost]
public ActionResult ToPdfa(PdfaOptionsModel options)
{            
    SaaSposeResponse response;
    var outPath = $"{options.Folder}/pdfa/{options.Name}";
    try
    {
        response = PdfApi.PutPdfInStorageToPdfA(
            options.Name,
            outPath,
            options.PdfaSubType.ToString(),                    
            options.Folder);
    }
    catch (ApiException ex)
    {
        return View("ConvertError", ex);
    }

    ViewBag.OutPath = outPath;
    return View("ConvertSuccess", response);
}
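The POST action passes options.PdfaSubType.ToString() to the SDK, so the enum member names double as the subtype strings sent to the service. Here is a minimal sketch of that mapping in both directions. PdfaSubTypeText and its methods are hypothetical helpers, and the enum is a trimmed copy of the one above without the Display attributes so that the sketch compiles on its own.

```csharp
using System;

// Trimmed copy of the article's enum (Display attributes omitted).
public enum PdfaSubType
{
    PDFA1A,
    PDFA1B
}

public static class PdfaSubTypeText
{
    // The member name itself is the string handed to the API call.
    public static string ToApiValue(PdfaSubType subType) => subType.ToString();

    // Parses user input back into the enum, rejecting unknown names and raw numbers.
    public static bool TryFromApiValue(string text, out PdfaSubType subType) =>
        Enum.TryParse(text, ignoreCase: true, out subType)
        && Enum.IsDefined(typeof(PdfaSubType), subType);
}
```

Because the string is derived from the member name, renaming an enum member changes what is sent to the service, so the names should be treated as part of the contract.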

Convert files from PDF to DOC

Let's consider a more complex case. This conversion has the following settings:

  • Format - the output format: Word 2003 (doc) or Word 2007 (docx);
  • Mode - the recognition mode: Textbox or Flow. Textbox mode keeps the layout closest to the original document. Flow mode is better when the output document needs further editing: its paragraphs and text lines are easy to modify, but unsupported formatting objects will look worse than in Textbox mode;
  • AddReturnToLineEnd - generates lines (true) or paragraphs (false);
  • MaxDistanceBetweenTextLines - the maximum distance between text lines, which controls paragraph creation. A larger value produces fewer paragraphs, and a smaller value produces more;
  • RelativeHorizontalProximity - the width of the space between text elements (letters, syllables) that is treated as a word boundary when words are recognized in the source PDF. It is used only when the source PDF contains rarely used fonts for which the optimal value cannot be calculated from the font;
  • RecognizeBullets - a boolean switch for list marker detection;
  • ImageResolutionX, ImageResolutionY - the image resolution settings.

Here is the DocxOptionsModel class:

public class DocxOptionsModel : OptionsModel
{
    [Display(Name = "File Format:")]
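    // "docx?" instead of "doc|docx": the attribute requires the first regex match
    // to cover the whole value, and "doc|docx" matches only "doc" inside "docx".
    [RegularExpression("docx?")]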
    public string Format { get; set; }

    [Display(Name = "Recognition Mode:")]
    public RecognitionMode Mode { get; set; }

    [Display(Name = "Add return to line end:")]
    public bool AddReturnToLineEnd { get; set; }

    [Display(Name = "Max Distance between text lines:")]
    public int? MaxDistanceBetweenTextLines { get; set; }

    [Display(Name = "Relative horizontal proximity:")]
    public double? RelativeHorizontalProximity { get; set; }

    [Display(Name = "Recognize bullets:")]
    public bool RecognizeBullets { get; set; }

    [Display(Name = "Image Resolution X:")]
    public int? ImageResolutionX { get; set; }

    [Display(Name = "Image Resolution Y:")]
    public int? ImageResolutionY { get; set; }
}
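Two details of this model are worth a short sketch. First, the RecognitionMode enum is not shown in the article. The shape below is an assumption based on the two modes described above, and the member spelling has to match whatever string the service expects, since the action sends Mode.ToString(). Second, RegularExpressionAttribute accepts a value only when the first regex match spans the whole string, so patterns built from alternatives such as doc|docx are order-sensitive. FormatPatternCheck is a hypothetical helper for trying patterns out.

```csharp
using System.ComponentModel.DataAnnotations;

// Assumed shape of the enum used by DocxOptionsModel (not shown in the article).
public enum RecognitionMode
{
    Textbox, // layout stays closest to the original document
    Flow     // output is easier to edit
}

public static class FormatPatternCheck
{
    // Runs the same check that the [RegularExpression] attribute applies to a property value.
    public static bool IsValid(string pattern, string value) =>
        new RegularExpressionAttribute(pattern).IsValid(value);
}
```

For example, `FormatPatternCheck.IsValid("docx?", "docx")` is true, while `FormatPatternCheck.IsValid("doc|docx", "docx")` is false, because the first alternative matches only the leading "doc".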

The following snippet shows the code of the ToWord action:

public ActionResult ToWord(string filename)
{
    var docOptions = new DocxOptionsModel
    {
        Name = filename,
        Folder = FolderName,
        AddReturnToLineEnd = false,
        Format = "docx",
        Mode = RecognitionMode.Flow
    };
    return View(docOptions);
}

[HttpPost]
public ActionResult ToWord(DocxOptionsModel options)
{
    SaaSposeResponse response;
    // Use the selected format ("doc" or "docx") as the output extension.
    var outName = System.IO.Path.ChangeExtension(options.Name, options.Format);
    var outPath = $"{options.Folder}/{options.Format}/{outName}";
    try
    {
        response = PdfApi.PutPdfInStorageToDoc(
            options.Name,
            outPath,
            options.AddReturnToLineEnd,
            options.Format,
            options.ImageResolutionX,
            options.ImageResolutionY,
            options.MaxDistanceBetweenTextLines,
            options.Mode.ToString(),
            options.RecognizeBullets,
            options.RelativeHorizontalProximity,
            options.Folder);
    }
    catch (ApiException ex)
    {
        return View("ConvertError", ex);
    }

    ViewBag.OutPath = outPath;
    return View("ConvertSuccess", response);
}

Convert files from PDF to TIFF

The settings for this type of conversion can be divided into three groups: general, image and page layout settings.

The general settings include:

  • PageIndex - the start page;
  • PageCount - the number of pages to convert;
  • SkipBlankPages - whether to skip blank pages.

The next group customizes the image quality:

  • Brightness - a fractional number from 0.0 to 1.0;
  • Compression - the compression algorithm (RLE, CCITT3, CCITT4 or LZW);
  • TiffColorDepth - the color depth (1, 4 or 8 bits per pixel);
  • ImageResolutionX, ImageResolutionY - the image resolution settings;
  • Width, Height - the dimensions of the output image.

The last group controls the layout of the rendered page: LeftMargin, RightMargin, TopMargin, BottomMargin and Orientation.

The code of the ToTIFF action is mostly the same as for DOC.
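The TIFF settings could be captured in a model derived from OptionsModel, in the same way as DocxOptionsModel. The sketch below is not taken from the sample project. The class name TiffOptionsModel, the two enums, the property types and the Range check on Brightness are all assumptions made for illustration, and OptionsModel is repeated in trimmed form so the sketch compiles on its own.

```csharp
using System.ComponentModel.DataAnnotations;

// Trimmed copy of the article's base model (Display attributes omitted).
public class OptionsModel
{
    public string Name { get; set; }
    public string Folder { get; set; }
}

// Assumed enums; the member names follow the values listed in the text.
public enum TiffCompression { RLE, CCITT3, CCITT4, LZW }
public enum PageOrientation { Portrait, Landscape }

// Hypothetical model for the TIFF settings page.
public class TiffOptionsModel : OptionsModel
{
    // General settings
    public int? PageIndex { get; set; }
    public int? PageCount { get; set; }
    public bool SkipBlankPages { get; set; }

    // Image settings
    [Range(0.0, 1.0)]
    public double? Brightness { get; set; }
    public TiffCompression Compression { get; set; }
    public int? TiffColorDepth { get; set; } // 1, 4 or 8 bits per pixel
    public int? ImageResolutionX { get; set; }
    public int? ImageResolutionY { get; set; }
    public int? Width { get; set; }
    public int? Height { get; set; }

    // Page layout settings
    public int? LeftMargin { get; set; }
    public int? RightMargin { get; set; }
    public int? TopMargin { get; set; }
    public int? BottomMargin { get; set; }
    public PageOrientation Orientation { get; set; }
}
```

Nullable properties let the form leave a setting empty. This mirrors the nullable properties of DocxOptionsModel, but whether the service then falls back to its own default for each TIFF setting is an assumption to verify against the API documentation.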

Conclusion

In most cases, Aspose.PDF Cloud converts documents well without additional settings, but it also lets you configure the conversion. In this article, we looked at the most popular formats, but the Aspose.PDF Cloud API can also convert PDF to EPUB, XLS, XML and some other formats.
