Welcome to the Receipt Parser Server documentation!

The Receipt Parser Server is a modular, minimal server for parsing receipts.

Upload API

The upload API is used to upload a given receipt to the Receipt Parser Server. The server returns the parsed image (if successful) or an error code.

Entrypoint

The entrypoint of the upload API is api/upload.

Parameter

Parameter        Type  Default value  Description
---------------  ----  -------------  -----------------------
legacy_parser    bool  false          Use the legacy parser
grayscale_image  bool  false          Grayscale the image
gaussian_blur    bool  false          Apply the gaussian blur
rotate_image     bool  false          Rotate image

Please note: The parameters file and access_token are always required. See the cURL example.

Return Code

Return code  Event
-----------  -------------------
200          request is valid
403          APITOKEN is invalid
415          image is invalid
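A client can translate these return codes into readable messages. The following is a minimal sketch; the names RETURN_CODES and describe_status are hypothetical helpers and not part of the server, and only the three codes and their meanings are taken from the list above.

```python
# Return codes documented for the upload API.
RETURN_CODES = {
    200: "request is valid",
    403: "APITOKEN is invalid",
    415: "image is invalid",
}


def describe_status(status_code: int) -> str:
    """Return the documented meaning of a status code, or a fallback text."""
    return RETURN_CODES.get(status_code, f"undocumented status code {status_code}")
```

Any code outside the documented set falls through to the fallback text instead of raising an error.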

Curl example

curl -X POST "https://$IP:$PORT/api/upload?access_token=$ACCESS_TOKEN" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@$IMAGE;type=image/jpeg"

with the given parameters:

Parameter     Description
------------  -----------------------
IP            The server IP
PORT          The server port
ACCESS_TOKEN  The server access token
IMAGE         The receipt image
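The same request can be sketched in Python. This is an illustration under assumptions, not the project's client: build_upload_url and upload_receipt are hypothetical names, the third-party requests package is assumed to be installed, and sending the boolean flags as the string "true" is an assumption about how the server parses query parameters.

```python
from urllib.parse import urlencode

# Optional flags from the parameter table; all default to false on the server.
OPTIONAL_FLAGS = ("legacy_parser", "grayscale_image", "gaussian_blur", "rotate_image")


def build_upload_url(ip, port, access_token, **flags):
    """Build the upload URL; only flags set to a true value are added."""
    unknown = set(flags) - set(OPTIONAL_FLAGS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    query = {"access_token": access_token}
    for name in OPTIONAL_FLAGS:
        if flags.get(name):
            query[name] = "true"  # assumed boolean encoding
    return f"https://{ip}:{port}/api/upload?{urlencode(query)}"


def upload_receipt(url, image_path, verify=True):
    """POST the image as multipart form field 'file', like the cURL call."""
    import requests  # third-party; imported lazily so URL building works without it

    with open(image_path, "rb") as handle:
        files = {"file": (image_path, handle, "image/jpeg")}
        return requests.post(
            url,
            files=files,
            headers={"accept": "application/json"},
            verify=verify,  # pass False only for self-signed test certificates
        )
```

For example, build_upload_url("127.0.0.1", 8721, "TOKEN", grayscale_image=True) yields a URL that can be passed to upload_receipt together with the path of a JPEG file.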

Training API

The training API is used to upload a given receipt to the Receipt Parser Server. The server returns the parsed image (if successful) or an error code.

Entrypoint

The entrypoint of the training API is api/training.

Return Code

Return code  Event
-----------  -------------------
200          request is valid
403          APITOKEN is invalid
415          image is invalid

Parameter

The parameters receipt and access_token are always required. See the cURL example.

Curl example

curl -X POST "https://$IP:$PORT/api/training?access_token=$ACCESS_TOKEN" -H "accept: application/json" --data "{\"company\":\"$COMPANY_NAME\",\"date\":\"$DATE\",\"total\":\"$RECEIPT_TOTAL\"}" -k

with the given parameters:

Parameter     Description
------------  -----------------------
IP            The server IP
PORT          The server port
ACCESS_TOKEN  The server access token
RECEIPT       Receipt object as JSON

The receipt object is submitted via the --data flag.
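Building the receipt object with a JSON encoder avoids quoting mistakes in the request body. The sketch below assumes the three fields shown in the cURL example (company, date, total) and passes them as strings; build_training_payload is a hypothetical helper, and the value formats in the test data are illustrative only.

```python
import json


def build_training_payload(company: str, date: str, total: str) -> str:
    """Serialize the receipt object sent in the body of the training request."""
    receipt = {"company": company, "date": date, "total": total}
    # ensure_ascii=False keeps characters such as umlauts readable in the body.
    return json.dumps(receipt, ensure_ascii=False)
```

The returned string can be used as the request body in place of the hand-written JSON passed to --data.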

Docker installation guide

The receipt-parser-server image is built automatically on Docker Hub. To install it, first pull the image from Docker Hub:

docker pull monolidth/receipt-parser:latest

Manual

You can also run the Docker image without the launcher script, e.g.:

docker run -i -t -p [YOUR-IP]:8721:8721  monolidth/receipt-parser

Developer installation guide

Clone the repository

First clone the GitHub project.

git clone https://github.com/ReceiptManager/receipt-parser-server

Install project dependencies

Install the following packages with your favorite package manager:

  • python

  • python-pip

  • libmagickwand-dev

  • tesseract-ocr-all

  • libgl1-mesa-glx

  • qrencode

apt-get install python python-pip libmagickwand-dev tesseract-ocr-all libgl1-mesa-glx qrencode

Install python dependencies

Now install all Python dependencies using pip:

pip install -r requirements.txt

Generate SSL certificates

Next, generate new SSL certificates. First, create a file called .private_key that contains a password of your choice. The password must be at least 8 characters long. You can do this using echo:

echo "favorite_password" > .private_key
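The length requirement can be checked before the file is written. This is a minimal sketch with a hypothetical helper, write_private_key; it mirrors the echo command, including the trailing newline that echo appends.

```python
from pathlib import Path

MIN_PASSWORD_LENGTH = 8  # minimum length stated in this guide


def write_private_key(password: str, path: str = ".private_key") -> Path:
    """Write the password file, rejecting passwords that are too short."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"password must have at least {MIN_PASSWORD_LENGTH} characters")
    target = Path(path)
    # echo appends a newline, so the same content is written here.
    target.write_text(password + "\n", encoding="utf-8")
    return target
```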

The password is used to generate the root certificate. Generate the certificate files using:

make generate_cert

You should now see the new certificates in the cert folder, which is located in the project root directory.

ls cert

The output looks like the following:

rootCA.key  rootCA.pem  rootCA.srl  server.crt  server.csr  server.csr.cnf  server.key  v3.ext
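The listing can also be checked programmatically. The sketch below is a hypothetical helper that reports which of the files shown above are absent; the file names are taken from the ls output, and the default directory name cert follows this guide.

```python
from pathlib import Path

# File names as they appear in the `ls cert` output above.
EXPECTED_CERT_FILES = (
    "rootCA.key", "rootCA.pem", "rootCA.srl", "server.crt",
    "server.csr", "server.csr.cnf", "server.key", "v3.ext",
)


def missing_cert_files(cert_dir: str = "cert") -> list:
    """Return the expected certificate files that do not exist in cert_dir."""
    directory = Path(cert_dir)
    return [name for name in EXPECTED_CERT_FILES if not (directory / name).is_file()]
```

An empty list means every file from the listing is present.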

Run the server

Now, you are ready to run the Receipt Parser Server.

make serve

Verify installation

If you run the Docker image, the output should look similar to:

...
INFO:     Started server process [16322]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on https://0.0.0.0:8721 (Press CTRL+C to quit)

The API token is printed on the screen. Additionally, you can scan the QR code.

Current API token: XXXXXXXX
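If the server output is captured, for example from a log file, the token line can be picked out with a regular expression. This is a sketch that assumes the line has exactly the form shown above and that the token contains no whitespace; extract_api_token is a hypothetical helper.

```python
import re
from typing import Optional

# Matches a line such as "Current API token: XXXXXXXX".
TOKEN_LINE = re.compile(r"^Current API token:\s*(\S+)\s*$", re.MULTILINE)


def extract_api_token(server_output: str) -> Optional[str]:
    """Return the token from captured server output, or None if it is absent."""
    match = TOKEN_LINE.search(server_output)
    return match.group(1) if match else None
```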

Server configuration

The following keys need to be defined.

Config key     Default value                                                            Description
-------------  -----------------------------------------------------------------------  -----------------------------------
language       de                                                                       Define the tesseract language
https          true                                                                     Enable HTTPS
receipts_path  "data/txt"                                                               Path where receipts are stored
markets        markets:                                                                 Market names
                 store name:
                   - likely name 1
                   - likely name 2
sum_keys       - summe                                                                  Keys to identify the sum
               - gesamtbetrag
               - gesamt
               - total
               - sum
               - zwischensumme
               - bar
               - te betalen
ignore_keys    - rockgeld                                                               Keys to ignore
               - rusckgeld
               - rückgeld
               - mwst
sum_format     \d+(\.\s?|,\s?|[^a-zA-Z\d])\d{2}                                         Regex to identify the receipt total
item_format    ([a-zA-Z].+)\s(-|)(\d,\d\d)\s                                            Regex to identify the receipt items
date_format    ((\d{2}\.\d{2}\.\d{2,4})|(\d{2,4}\/\d{2}\/\d{2})|(\d{2}\/\d{2}\/\d{4}))  Regex to identify the receipt date

Add new market names

You can add a new market entry below the markets key, e.g.:

Store name:
        - likely name 1
        - likely name 2

Note that the store name is the value that is returned, while the likely names are the strings searched for in the receipt. You can inspect the receipt parser output in data/txt.

In this example, the tesseract output looks like:

EWE Rene Müller 0HGCITY
org-Friedrich-Str.9

}
—
L
L
E
/
D “il s

L „é„ 31 Karlsruhe
| | 50 /
R 0/Z1 / 664 87 954
LL UID Nr. : DE326445229
B ) EUR
—— MIO MIO MATE —
| | )
_„*}„%_ PF£N3t% ?5 1108 4
| 6 Sal ‚19 .EUR X
f““j“i$“ 2 5Stk x \ 0,15 V
E E O
_; Ge R "*M—w—‘—-»»——————*_„::_;:::..:‘::.;:_:‚;r::::.::ä..b-ö-«"
; ‚ Rückgeld BAR EHE 0, 32
%———%———i S£Buer % Netto: steuer B[9£15
| HAL En 2,25 0,43 . 268
/ ““‘};l@samtbetrag 2,25 0,43 a Z
f ı TSE-Signatur: M631mP54IvkcwnNk+H7th3&meTdLüö[w
8 0bo5B71skamunHSsZC1Z4q9ds6BRoDNWg
Sa aUfagzEbyt TDVULU2ecc4rUk5/3211shY

The output looks horrible, but you might notice that the store name is Rewe while the output reads EWE Rene Müller 0HGCITY. Now add the following market to config.yml:

REWE:
 - ewe

This identifies the market name Rewe. Be careful to avoid duplicate store names: if the store name Rewe already exists, only add the likely name ewe to it.
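The effect of a likely name can be illustrated with a small lookup. This sketch uses plain case-insensitive substring matching, which is an assumption; the server's own matching may be fuzzier. find_market is a hypothetical helper, and the markets mapping has the same shape as the markets key in config.yml.

```python
from typing import Dict, List, Optional


def find_market(receipt_text: str, markets: Dict[str, List[str]]) -> Optional[str]:
    """Return the first store whose likely name occurs in the receipt text."""
    text = receipt_text.lower()
    for store_name, likely_names in markets.items():
        if any(name.lower() in text for name in likely_names):
            return store_name
    return None
```

With only the likely name rewe, the line EWE Rene Müller 0HGCITY is not recognized; once ewe is added, the same line resolves to REWE.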

For Docker users

Forward config

If you use the Docker image, you can forward the configuration file config.yml. If config.yml is in your current directory, add the following flags:

-v "$(pwd):/config" -e RECEIPT_PARSER_CONFIG_DIR="/config"

If the config file is not in your current working directory, replace $(pwd) with the path to your configuration folder.

Forward IP

Additionally, you can forward the Docker IP using:

-p $IP:8721:8721
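To show how the forwarding flags combine with the docker run command from the installation guide, the sketch below assembles the full argument list. docker_run_command and as_shell_command are hypothetical helpers; the image name, port 8721, and the flags are taken from this documentation, while the example directory and IP are placeholders.

```python
import shlex
from typing import List


def docker_run_command(config_dir: str, ip: str, port: int = 8721,
                       image: str = "monolidth/receipt-parser") -> List[str]:
    """Assemble docker run arguments with config and IP forwarding."""
    return [
        "docker", "run", "-i", "-t",
        "-v", f"{config_dir}:/config",              # mount the folder holding config.yml
        "-e", "RECEIPT_PARSER_CONFIG_DIR=/config",  # tell the server where to look
        "-p", f"{ip}:{port}:{port}",                # publish the server port on the given IP
        image,
    ]


def as_shell_command(argv: List[str]) -> str:
    """Render the argument list as one shell-safe line (Python 3.8+)."""
    return shlex.join(argv)
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues when the configuration path contains spaces.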

Example config

# Define the tesseract language
language: deu

# Enable https
https: true

# Where the receipts are stored
# Receipts should be simple text files
receipts_path: "data/txt"

# Market names roughly ordered by likelihood.
# Can contain market locations for fuzzy parsing
markets:
  Colruyt:
     - colruyt
     - Colruyt
  Delhaize:
     - Delhaize
     - delhaize
  Penny:
     - penny
     - p e n n y
     - m a r k t gmbh
  REWE:
     - rewe
  Real:
     - real
  Netto:
     - netto-online
  Kaiser's:
     - kaiser
     - kaiserswerther straße 270
  Aldi:
     - aldi
     - friedrichstr 128—133
  Lidl:
     - lidl
  Edeka:
    - edeka
  Drogerie:
     - drogerie
  Kodi:
     - kodi
  Getraenke:
    - Getraenke Tempel
  Tanken:
     - text
     - esso station
     - aral
     - total tankstelle
     - RK Tankstellen
  Migros:
     - genossenschaft migros

sum_keys:
  - summe
  - gesamtbetrag
  - gesantbetrag
  - gesamt
  - total
  - sum
  - zwischensumme
  - bar
  - te betalen
  - rockgeld
  - rusckgeld
  - rückgeld

ignore_keys:
  - mwst
  - kg x
  - stkx
  - stk


sum_format: '\d+(\.\s?|,\s?|[^a-zA-Z\d])\d{2}'

item_format: '([a-zA-Z].+)\s(-|)(\d,\d\d)\s'

date_format: '((\d{2}\.\d{2}\.\d{2,4})|(\d{2,4}\/\d{2}\/\d{2})|(\d{2}\/\d{2}\/\d{4}))'
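To get a feel for what the three patterns match, they can be tried on sample lines with Python's re module. The patterns below are copied from the example config; the sample lines in the usage note are made up for illustration, and first_match is a hypothetical helper, not the server's parsing code.

```python
import re

# Patterns copied from the example config above.
SUM_FORMAT = re.compile(r"\d+(\.\s?|,\s?|[^a-zA-Z\d])\d{2}")
ITEM_FORMAT = re.compile(r"([a-zA-Z].+)\s(-|)(\d,\d\d)\s")
DATE_FORMAT = re.compile(
    r"((\d{2}\.\d{2}\.\d{2,4})|(\d{2,4}\/\d{2}\/\d{2})|(\d{2}\/\d{2}\/\d{4}))"
)


def first_match(pattern, line):
    """Return the text of the first match in line, or None."""
    match = pattern.search(line)
    return match.group(0) if match else None
```

On the made-up line "gesamtbetrag 2,25", SUM_FORMAT finds 2,25, and on "Datum: 12.03.2021", DATE_FORMAT finds 12.03.2021. ITEM_FORMAT needs whitespace after the price, so on "mio mio mate 0,99 A" it captures the item name in group 1 and the price in group 3.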

Reverse proxy

To use a reverse proxy, you need to disable HTTPS in the receipt parser config. Change this line

# Enable https
https: true

to

# Disable https
https: false

Afterwards, use this example NGINX configuration, replacing DOMAIN with your domain and CERT PATH with your SSL certificate path.

server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        server_name [DOMAIN] [DOMAIN];

        # optional
        access_log /var/log/nginx/[DOMAIN].access.log;
        error_log /var/log/nginx/[DOMAIN].log;

        client_max_body_size 0;
        underscores_in_headers on;

        ssl_certificate [CERT PATH]; # managed by Certbot
        ssl_certificate_key [CERT PATH]; # managed by Certbot

        ssl_stapling on;
        ssl_stapling_verify on;
        include /etc/nginx/snippets/ssl.conf;


        location / {
                proxy_headers_hash_max_size 512;
                proxy_headers_hash_bucket_size 64;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                add_header Strict-Transport-Security "max-age=15768000; includeSubDomains;";
                add_header Front-End-Https on;

                # whatever the IP of your receipt parser server is
                proxy_pass http://localhost:8721;
        }
}

server {
        listen 80;
        listen [::]:80;
        server_name [DOMAIN] [DOMAIN];
        access_log /var/log/nginx/[DOMAIN].access.log;
        error_log /var/log/nginx/[DOMAIN].80.error.log;
        root /usr/share/nginx/html/[DOMAIN]/;

        location ^~ /.well-known/acme-challenge/ {
            allow all;
            default_type "text/plain";
        }
        location ^~ /.well-known/pki-validation/ {
            allow all;
            default_type "text/plain";
        }
        location / {
            return 403;
        }
}

Don’t forget to reload your NGINX server afterwards.