Data Extraction

At Data Extractions, we extract data from any number of disparate sources, standardize and normalize it, and compile it all into a single destination to meet your business needs.

Data formats we can extract include:

Structured Data

  • EDI – Electronic Data Interchange is the structured transmission of business data, typically between organizations.
  • DBF – The underlying file format of dBase. Also used by FoxPro and other xBase-family products.
  • ASCII – Plain-text character data, the most common type, viewable in most text editors.
  • EBCDIC – 8-bit character encoded data (code page) used on IBM mainframe operating systems.
  • Packed Data – Packed decimal data (aka COMP-3), common on EBCDIC mainframe systems, stores two decimal digits per byte, with the sign in the final half-byte.
  • SQL Server – Microsoft’s database management platform.
  • Oracle – Oracle’s database management platform.
  • SAP – Using ABAP custom development.
  • Report Image – Scanned or electronic formats. Typically invoices, purchase orders, and BOLs.
  • PDFs – Adobe PDFs converted into a number of formats.
  • Excel – Spreadsheets, pivot tables and any other Excel/workbook formatted data.
  • CSV – Comma-separated values files.
  • Delimited – Data with columns and rows separated by special characters.
  • Qualified – Delimited data with qualifiers (such as quotation marks) around certain fields.
  • Fixed Width/Length – Data with columns at fixed positions, padded with spaces, and rows of equal length.
  • Variable Width/Length – Data with rows of unequal length.
  • Binary – Raw, non-text data stored at the lowest level as 0s and 1s.
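
As a rough illustration of how three of the formats above (fixed width, EBCDIC and packed decimal) can appear together in one mainframe record, here is a minimal Python sketch. The record layout (a 10-byte name followed by a 3-byte COMP-3 amount with two decimal places), the function names and the sample values are all assumptions made up for this example, and code page 037 is only one of several EBCDIC code pages a real file might use.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte,
    with the sign stored in the low half-byte of the last byte."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    digits.append(raw[-1] >> 4)       # last byte holds one digit...
    sign_nibble = raw[-1] & 0x0F      # ...and the sign nibble
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):   # 0xD (and 0xB) mark a negative value
        value = -value
    return Decimal(value).scaleb(-scale)

def parse_record(record: bytes) -> dict:
    """Parse one record of the hypothetical 13-byte fixed-width layout."""
    name = record[0:10].decode("cp037").rstrip()   # EBCDIC text, space padded
    amount = unpack_comp3(record[10:13], scale=2)  # packed decimal, 2 decimals
    return {"name": name, "amount": amount}

# Made-up sample record: "ACME" padded to 10 characters, then +123.45 packed.
sample = "ACME".ljust(10).encode("cp037") + bytes([0x12, 0x34, 0x5C])
print(parse_record(sample))
```

Slicing by byte position rather than splitting on a separator is what makes this a fixed-width parse. A real layout would normally come from a COBOL copybook or a file specification.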

HTML (web page data)

The web holds an almost limitless amount of data. Why spend $300 or more on a tool that is not tailored to your exact needs? Manual content extraction from websites can be very time-consuming and is often not feasible at all if you want to extract complete content structures such as product catalogs, classifieds, financial websites or any other website that contains information you may be interested in. Let us extract your data for you and give you exactly what you need.

Data Extractions was able to get currency conversion rates from a website on a daily basis so that we always have up-to-date currency conversion rates in our database.

We can quickly and automatically deliver content from targeted websites, structured in the format you want, be it databases, spreadsheets, CSV files or XML.
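
To give a sense of what turning web page content into a structured format can look like, here is a minimal sketch that pulls the rows of an HTML table into CSV using only Python's standard library. The class and function names, the sample markup and the rates in it are invented for illustration. A production job would also need to fetch the page, cope with messier markup and respect the site's terms of use.

```python
import csv
import io
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

# Made-up sample page fragment; the rates are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR</td><td>0.92</td></tr>
  <tr><td>JPY</td><td>151.30</td></tr>
</table>
"""
print(html_table_to_csv(SAMPLE_HTML))
```

The same list of rows could just as easily be written to a database table or a spreadsheet instead of CSV.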

Unstructured Data

Unstructured data is data that does not have a predefined data model, does not fit well into relational tables, or both. It can be painstaking to extract and normalize or format. Typically we find that unstructured data is text-heavy, but it may contain data such as dates, numbers and facts as well. Common techniques for structuring text usually involve manual tagging with metadata. Let this be our problem and not yours.
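
As a small illustration of pulling dates and numbers out of text-heavy data, the sketch below uses automated pattern matching rather than the manual tagging mentioned above. The patterns assume ISO-style dates (YYYY-MM-DD) and dollar amounts, and the function name and sample note are invented for this example. Real documents usually need many more patterns and some human review.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed formats: ISO dates and dollar amounts with optional thousands commas.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
AMOUNT_RE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")

def tag_text(text: str) -> dict:
    """Return the dates and dollar amounts found in free text."""
    dates = []
    for year, month, day in DATE_RE.findall(text):
        try:
            dates.append(date(int(year), int(month), int(day)))
        except ValueError:
            pass  # looks like a date but is not a valid one, e.g. 2023-13-45
    amounts = [Decimal(a.replace(",", "")) for a in AMOUNT_RE.findall(text)]
    return {"dates": dates, "amounts": amounts}

# Made-up sample text.
note = ("Invoice received 2023-04-17 for $1,250.00; "
        "a $75 late fee applies after 2023-05-01.")
print(tag_text(note))
```

The resulting dictionary can then be loaded into a table alongside the original text, so the free-form source stays available for checking.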

We showed them the format of the source data and they gave us exactly what we needed.  Data Extractions saved us time and money.

