Convert Documents
JobMaster
automatically convert documents
JobMaster is an automatic conversion
scheduler: an automatic converter that turns files into a wide range
of formats. It supports Internet sources and can convert a file
directly from the Internet and deliver the result to an FTP server, a web
server, a directory, or a network drive. You can convert documents to ppt,
mht, pps, txt, gif, dif, jpg, bmp, csv, dbf, ans, htm, html, wps and
many more formats. Conversion jobs can be scheduled to run
automatically. It is a doc converter, xls converter, ppt converter, htm converter,
web converter and text converter in one program, and it supports auto-publishing
to or from any web server, network drive, or directory. You can also
create PDF files and transform files or documents into PDF format.
XML
has become a buzzword that's so over-used that it's difficult to
understand when it might and when it might not be appropriate. In
general, the main reason for XML's popularity is that it provides
an underlying technology that gives "portability" of information
across platforms, applications, and organizations.
Much
of the emphasis on XML has been on sending “structured” data
between companies. For example, if company A wants to send
a purchase order to company B, they both need to agree on a formatting
convention. XML provides both a language for describing
that formatting convention and a convenient way to
actually send the purchase order data.
While
there are significant benefits to having inter-operable structured
data, we believe that a use of XML that is just as important is
for the creation, storage, indexing, and publishing of documents
- what is often referred to as “unstructured content”. Unstructured
(and semi-structured) content today in corporations is kept in a
number of locations and typically makes up about 80% of a company's
overall data/information. Unlike structured data, which typically
lives in databases and is well-ordered, unstructured content lives
on individual file servers (as Microsoft Word or PDF files), in
groupware databases (like Lotus Notes), on web servers (as HTML
documents) or in other legacy systems.
This
article is about the reasons why XML is particularly well suited
for this task - the creation, storage, indexing, and publishing
of documents, and why it is cost effective to come up with a strategy
for converting a company's key unstructured assets into XML.
Why
Create/Convert Documents to XML?
Allows
Intelligent Queries of Content.
One
of the main reasons to get documents out of their existing formats
is to be able to search / index those documents in a meaningful
way.
Say,
for example, that your organization has one or more directories
full of resumes. Many resumes arrive by email or as Microsoft
Word (.DOC) files. This is not a particularly useful format
for searching or indexing. Suppose you wanted to run a query
to find “all people who worked for Lotus from 1998-2000”. It
is difficult, if not impossible, to find this information from a
group of files sitting on a file server. One approach has
been to full-text index the documents. This might help you
find all people with the word Lotus in their resume, but there is
still no intelligence behind the indexing. If the documents were
broken into meaningful XML formats (such as HR-XML), then
it would be much easier to do this type of querying, because you would
have turned your documents into a virtual database.
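As a rough illustration of the "virtual database" idea, the sketch below runs the Lotus query against a few resumes. The element and attribute names (`resume`, `job`, `employer`, `start`, `end`) are invented for this example and are not the real HR-XML schema; the sample people are made up as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup for illustration; this is NOT the real HR-XML schema.
RESUMES = """
<resumes>
  <resume>
    <name>Alice Example</name>
    <job employer="Lotus" start="1997" end="2001"/>
    <job employer="Acme" start="2001" end="2004"/>
  </resume>
  <resume>
    <name>Bob Example</name>
    <job employer="Lotus" start="2001" end="2003"/>
  </resume>
  <resume>
    <name>Carol Example</name>
    <job employer="Initech" start="1995" end="2002"/>
  </resume>
</resumes>
"""

def worked_at(root, employer, first_year, last_year):
    """Return names of people with a job at `employer` overlapping the given years."""
    matches = []
    for resume in root.iter("resume"):
        for job in resume.iter("job"):
            if (job.get("employer") == employer
                    and int(job.get("start")) <= last_year
                    and int(job.get("end")) >= first_year):
                matches.append(resume.findtext("name"))
                break  # one matching job is enough for this person
    return matches

root = ET.fromstring(RESUMES)
lotus_1998_2000 = worked_at(root, "Lotus", 1998, 2000)
print(lotus_1998_2000)
```

A full-text index would also return Bob, whose resume mentions Lotus but whose dates fall outside the range. The structured query can exclude him because employer and dates are separate, typed pieces of data.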
Similarly,
if you were a mutual fund company you might have a collection of
investment research gathered from a number of different sources,
sitting on file servers as PDF files. PDF files are particularly
difficult to work with because they are meant to be read, not
edited. However, you might want to query this body of
research to find all of those research reports which upgraded a
stock from a Buy to a Strong Buy. Again, if you were to convert
these into a meaningful XML format (such as RIXML), then you
would be able to do this type of querying against the source data
because it would be intelligently categorized.
This
“intelligent” indexing can happen even if the documents stay as individual
XML files on the file server, or it could happen by moving the XML
into an XML database store.
Perhaps
the most important reason to convert documents to XML is when those
documents need to be published. Corporations today have more
than one channel of information to their customers. These include
printed documents and manuals, electronic communications sent
by email (such as brochures), and web sites (which are in HTML format).
Most
companies don't have a coherent strategy for external publishing
- it is done in different ways throughout the company. One
group might use Word Documents which are printed directly. Another
might use a content management system for the web site. Yet
another might convert to PDF for manuals.
The
key with XML, as shown in Figure 1, is that it can be transformed
into the appropriate publishing format - Word (DOC/RTF), HTML (for
web sites), PDF (for printed documentation), DocBook (An XML standard
for storage and sharing of content), WML (for wireless devices),
and into any other format which becomes available in the future.
This saves time and money because effort doesn't have to be
repeated. With the push of a button, the XML can be transformed
(using XSLT, or XML stylesheets) into each of these formats.
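The single-source idea can be sketched in a few lines. Note the substitution: Python's standard library has no XSLT processor, so two plain functions stand in here for the stylesheets an XSLT-based pipeline would use. The `release`, `title`, and `para` element names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical press-release markup, used only for illustration.
SOURCE = """
<release>
  <title>New Product Announced</title>
  <para>The product ships in May.</para>
  <para>Pricing &amp; availability will follow.</para>
</release>
"""

def to_html(doc):
    """Render the release as an HTML fragment for the web site."""
    parts = ["<h1>%s</h1>" % escape(doc.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text) for p in doc.findall("para")]
    return "\n".join(parts)

def to_text(doc):
    """Render the same release as plain text, for example for email."""
    title = doc.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text for p in doc.findall("para")]
    return "\n".join(lines)

doc = ET.fromstring(SOURCE)
html_out = to_html(doc)
text_out = to_text(doc)
print(html_out)
print(text_out)
```

The content is written once. Adding a new output channel means adding one more rendering function (or stylesheet), not re-authoring the document.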
Another
key benefit of having content stored in XML format
is that it can be “custom-assembled”. This means that
customer A, who might be interested only in research
about two companies in the semiconductor industry and three companies
in software, can get a research report that covers only those companies,
rather than having to go through dozens of companies in each industry.
This is possible because the content can be assembled on the fly, as shown
in Figure 2.
Figure
2: Investment research transformed into XML and custom assembled
for each client
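A minimal sketch of custom assembly follows, assuming each report is stored as its own XML element tagged with the company it covers. The markup and company names are hypothetical and do not follow the real RIXML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical research markup for illustration; not the real RIXML schema.
RESEARCH = """
<research>
  <report company="ChipCo" industry="semiconductors"><summary>Upgraded to Strong Buy.</summary></report>
  <report company="WaferWorks" industry="semiconductors"><summary>Rating unchanged.</summary></report>
  <report company="SoftCorp" industry="software"><summary>Rating unchanged.</summary></report>
</research>
"""

def assemble(root, companies):
    """Build a new document containing only the reports for `companies`."""
    custom = ET.Element("research")
    for report in root.findall("report"):
        if report.get("company") in companies:
            custom.append(report)
    return custom

root = ET.fromstring(RESEARCH)
client_report = assemble(root, {"ChipCo", "SoftCorp"})
included = [r.get("company") for r in client_report]
print(included)
```

The assembled element could then be passed to whatever publishing step produces the client's preferred output format.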
Saves
Time and Money by Streamlining the Authoring Process.
Research
has shown that as much as 50% of the time spent in the authoring
process goes to formatting. With templates for
documents that are similar (which can be done using XSLT) and
an XML authoring tool, the author only has to worry about the content.
For example, most press releases look the same, as do most
product brochures. Most proposals should look the same, but
often don't. Using XML as the mechanism for authoring and
storing content can enforce consistent standards and free users
from worrying about the eventual formatting, which is
handled by the templates and by validation files (DTDs or XML schemas).
Encourages
Reuse of Documents and Fragments.
XML
allows for the storage of “document fragments”, which encourages
reuse of existing content. This means that you will be able
to find document fragments and include them in new documents much
more easily.
Distributed
Authoring and Security.
XML
is ideal for a content management system where dozens of people
need to contribute content. Existing authoring tools, such
as Word and other desktop editors, are not ideal for this type of
environment. Each section (or page, within a web site)
may have one or more people who are allowed to edit it. Storing
pages in XML format allows each one to be treated as a separate object
with separate permissions, so authors can simultaneously edit different
pages within the overall document.
Another
key benefit applies when end users are only allowed to view certain
parts of documents: assembling the final document based on the
permissions and preferences of each end user is a better way to distribute it.
Again, if all the sections are in XML, this type of end user
security becomes much easier to enforce. If all the sections
are stored in Word or PDF files, this becomes a much more difficult
task.
Syndication
of Content - Web Services.
XML
is the language of Web Services and of Syndication of Content. This
means that you can distribute your content (research reports, press
releases, product catalogs, brochures) to other web sites or companies
who may need to include your information on their site, but with
some changes. Syndication of Content is often used for aggregation
of content from different sources (for example, an industry site
might want to publish a press release that your company created).
If the information is provided in HTML, this is problematic
because each source site will have different formatting. However,
if each source company provides XML (even if they provide slightly
differing XML), the aggregation site can easily transform it into its own format.
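Aggregating slightly differing XML from several sources might look like the sketch below. The two feeds, their element names, and the sample headlines are all invented for illustration. The point is only that a small per-source mapping is enough to bring both feeds into one uniform shape.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds describing press releases with different element names.
FEED_A = ("<releases><release><headline>Alpha ships v2</headline>"
          "<date>2002-03-01</date></release></releases>")
FEED_B = ("<news><item><title>Beta opens new office</title>"
          "<published>2002-03-05</published></item></news>")

# Per-source mapping: (item element, headline element, date element).
MAPPINGS = {
    "A": ("release", "headline", "date"),
    "B": ("item", "title", "published"),
}

def normalize(xml_text, source):
    """Convert one source's feed into a list of uniform dictionaries."""
    item_tag, headline_tag, date_tag = MAPPINGS[source]
    root = ET.fromstring(xml_text)
    return [
        {
            "source": source,
            "headline": item.findtext(headline_tag),
            "date": item.findtext(date_tag),
        }
        for item in root.iter(item_tag)
    ]

# Merge both feeds and order them by date (ISO dates sort correctly as strings).
combined = sorted(normalize(FEED_A, "A") + normalize(FEED_B, "B"),
                  key=lambda entry: entry["date"])
for entry in combined:
    print(entry["date"], entry["source"], entry["headline"])
```

Doing the same with HTML sources would mean scraping each site's particular layout, which breaks whenever a site is redesigned.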
Web
services is an emerging trend where one server makes a request for
content from another server. This could be any type of content,
or could be more programmatic structured data. By converting
your documents into XML, you open up Web Services for documents,
which allows for better information sharing with customers, business
partners, and suppliers. For more on Web Services, see the
upcoming white paper, Web Services for Documents.
Portability
of content.
Many
web content management systems provide distributed authoring,
re-use of fragments, etc., but do not store their content in an
XML format. This makes it very difficult to move off of that
particular content management system. If, however, the data
is in XML (or can be easily exported into XML), then the end user
has the flexibility to migrate the content easily into another system
that supports XML rather than being tied to a particular vendor.
In
addition to all of these specific business benefits, XML is particularly
well suited technically for the storage of unstructured and semi-structured
content. This is because most documents have a tree-like
structure (title, heading 1, section 1, paragraph 1, etc.), and
XML has a tree-like structure too. There is a lot of content
that has been published in HTML format over the last five years
(millions of pages), and XML is a good format for distributing
this information between sites. That is because HTML and XML
both derive from SGML, which is a more generic language for defining
documents.
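The tree shape is easy to see by walking a small document. The sketch below uses made-up element names (`document`, `section`, `heading`, `paragraph`) and prints one indented line per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document markup for illustration.
DOC = """
<document>
  <title>Annual Report</title>
  <section>
    <heading>Results</heading>
    <paragraph>Revenue grew.</paragraph>
    <paragraph>Costs fell.</paragraph>
  </section>
</document>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing the tree structure."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the document, which is the structure a word processor file usually hides inside its formatting.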
Corporations
have a tremendous amount of information assets that exist today
as individual files in directories. This includes memos, reports,
proposals, brochures, white papers, documentation, research, intranet
sites, public web pages, etc. Because of its unstructured
nature, it has been difficult to leverage this information and to
reduce both the cost and complexity of managing this information.
XML is a powerful tool that simplifies the creation,
storage, indexing, categorization, and publishing of this content
in complex environments. By converting existing
documents and new documents into XML, organizations can achieve
significant savings of both time and money.