Upload local files#

import os
import base64
import pandas as pd
from utils.byte_genie import ByteGenie

init byte-genie#

init byte-genie in async mode (tasks will run in the background)#

bg_async = ByteGenie(
    secrets_file='secrets.json',
    task_mode='async',
    verbose=1,
)

init byte-genie in sync mode (tasks will run in the foreground)#

bg_sync = ByteGenie(
    secrets_file='secrets.json',
    task_mode='sync',
    verbose=1,
)

'async' mode is suitable for long-running tasks, so that api calls can be run in the background,
while the rest of the code can continue doing other things.
'sync' mode is suitable for short-lived tasks, where we need some output, before we can move on to anything else.

Specify the directory containing files to upload#

directory = r'/tmp/sample-files'

Get all file names in the directory#

file_names = os.listdir(directory)

Generate the fileContents and names arrays

contents = []
filenames = []
for file_name in file_names:
    file_path = os.path.join(directory, file_name)
    with open(file_path, 'rb') as file:
        content = file.read()
        data = base64.b64encode(content)
        data_with_prefix = "data:;base64," + data.decode('utf-8')
        contents.append(data_with_prefix)
        filenames.append(file_name)

upload files#

resp = bg_sync.upload_data(
    contents=contents,
    filenames=filenames,
    username=bg_sync.read_username()
)

check resp#

resp.get_status()

check your uploads#

make api call#

resp = bg_sync.show_uploads(
    username=bg_sync.read_username()
)

get response data#

df_uploads = pd.DataFrame(resp.get_data())

df_uploads.to_dict('records')
[
{'doc_name': 'userid_demo-genie_uploadfilename_demo-portfoliocsv', 'file_type': '.csv', 'filename': 'demo-portfoliocsv', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_demo_taxonomycsv', 'file_type': '.csv', 'filename': 'demo_taxonomycsv', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_deploying-a-react-app-using-aws-s3pdf', 'file_type': '.pdf', 'filename': 'deploying-a-react-app-using-aws-s3pdf', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_mandarin-oriental-sustainability-report-2020-2pdf', 'file_type': '.pdf', 'filename': 'mandarin-oriental-sustainability-report-2020-2pdf', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_oil-and-gas_taxonomycsv', 'file_type': '.csv', 'filename': 'oil-and-gas_taxonomycsv', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_real-estate_taxonomycsv', 'file_type': '.csv', 'filename': 'real-estate_taxonomycsv', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_shell-tax-contribution-report-2020pdf', 'file_type': '.pdf', 'filename': 'shell-tax-contribution-report-2020pdf', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_vodafone-tax-report-19-20pdf', 'file_type': '.pdf', 'filename': 'vodafone-tax-report-19-20pdf', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_with-highlights-comments-barclays-country-snapshot-2021pdf', 'file_type': '.pdf', 'filename': 'with-highlights-comments-barclays-country-snapshot-2021pdf', 'username': 'demo-genie'},
{'doc_name': 'userid_demo-genie_uploadfilename_capitaland-recent-documents', 'file_type': '.csv', 'filename': 'capitaland-recent-documents', 'username': 'demo-genie'}
]

Next steps#


* Now that the files are uploaded, we can move on to processing these documents;
* See document_processing/short_pdf_processing.py (.ipynb) and company_research/document_processing.py (.ipynb) for examples of document processing.