Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Working with Various File Formats in Python

Tech 1

CSV Files

Writing CSV Files

import csv
characters = [
    ['Doctor', 'No'],
    ['Rosa', 'Klebb'],
    ['Mister', 'Big'],
    ['Auric', 'Goldfinger'],
    ['Ernst', 'Blofeld']
]

with open('characters.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    writer.writerows(characters)

Reading CSV Files

import csv

with open('characters.csv', 'r') as input_file:
    reader = csv.reader(input_file)
    data = [row for row in reader]

print(data)

Using DictWriter for Named Columns

import csv

villains = [
    {'first': 'Doctor', 'last': 'No'},
    {'first': 'Rosa', 'last': 'Klebb'},
    {'first': 'Mister', 'last': 'Big'},
    {'first': 'Auric', 'last': 'Goldfinger'},
    {'first': 'Ernst', 'last': 'Blofeld'}
]

with open('villains.csv', 'w', newline='') as output_file:
    writer = csv.DictWriter(output_file, fieldnames=['first', 'last'])
    writer.writeheader()
    writer.writerows(villains)

Using DictReader for Named Columns

import csv

with open('villains.csv', 'r') as input_file:
    reader = csv.DictReader(input_file, fieldnames=['first', 'last'])
    villains = [row for row in reader]

print(villains)

XML Files

sample.xml:

<?xml version="1.0"?>
<menu>
    <breakfast hours="7-11">
        <item price="$6.00">breakfast burritos</item>
        <item price="$4.00">pancakes</item>
    </breakfast>
    <lunch hours="11-3">
        <item price="$5.00">hamburger</item>
    </lunch>
    <dinner hours="3-10">
        <item price="$8.00">spaghetti</item>
    </dinner>
</menu>

Parsing XML with ElementTree

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()
print(root.tag)

for child in root:
    print(f'tag: {child.tag}, attributes: {child.attrib}')
    for grandchild in child:
        print(f'\ttag: {grandchild.tag}, attributes: {grandchild.attrib}')

Alternative XML Libraries

xml.dom: Loads the entire XML document into memory for processing.

xml.sax: Parses XML dynamically without requiring the entire document to be loaded at once.


HTML Files

Web frameworks like Flask and Django, combined with templating engines such as Jinja2, are commonly used to render HTML content for browser display.


JSON Files

Converting Python Dictionary to JSON String

menu = {
    "breakfast": {
        "hours": "7-11",
        "items": {
            "breakfast burritos": "$6.00",
            "pancakes": "$4.00"
        }
    },
    "lunch": {
        "hours": "11-3",
        "items": {
            "hamburger": "$5.00"
        }
    },
    "dinner": {
        "hours": "3-10",
        "items": {
            "spaghetti": "$8.00"
        }
    }
}

import json
menu_json = json.dumps(menu, indent=2)
print(menu_json)

Converting JSON String to Python Dictionary

import json

json_string = '{"breakfast": {"hours": "7-11", "items": {"breakfast burritos": "$6.00", "pancakes": "$4.00"}}, "lunch": {"hours": "11-3", "items": {"hamburger": "$5.00"}}, "dinner": {"hours": "3-10", "items": {"spaghetti": "$8.00"}}}'

menu_data = json.loads(json_string)
print(menu_data)

Handling datetime Objects

import datetime
import json

now = datetime.datetime.utcnow()
json_output = json.dumps(now, default=str)
print(json_output)

YAML Files

The Python standard library does not include YAML support, so the third-party pyyaml library must be installed:

$ pip install pyyaml

sample.yml:

name:
    first: James
    last: McIntyre
dates:
    birth: 1828-05-25
    death: 1906-03-31
details:
    bearded: true
    themes: [cheese, Canada]
poems:
    - title: 'Motto'
      text: |
          Politeness, perseverance and pluck,
          To their possessor will bring good luck
    - title: 'Canadian Charms'
      text: |
          Here industry is not in vain,
          For we have bounteous crops of grain

Loading YAML Files

import yaml

with open('sample.yaml', 'r') as input_file:
    content = input_file.read()
    data = yaml.safe_load(content)
    print(data)

Tablib

Tablib handles tabular data in CSV, JSON, YAML formats, aswell as Microsoft Excel and Pandas DataFrame.

$ pip install tablib

Pandas

Pandas is a powerful library for structured data manipulation, ideal for handling various data processing tasks.

Capabilities:

  • Read and write multiple text and binary file formats
  • Support for CSV, fixed-width text, Excel, JSON, HTML, SQL, HDF5
  • Data operations: grouping, splitting, merging, indexing, slicing, sorting, selection
  • Data type conversion
  • Reshaping and reshaping operations
  • Handling missing data
  • Random value generation
  • Time series management

The read_* functions return DataFrame objects, which represent two-dimensional data (rows and columns), similar to spreadsheets or database tables. One-dimensional data is represented using Series objects.

Reading CSV with Pandas

import pandas

data = pandas.read_csv('villains.csv')
print(data)

The resulting DataFrame is well-suited for nmuerical computations with NumPy and machine learning data preparation.


INI/CFG Configuration Files

The standard library configparser module handles INI file processing.

settings.cfg:

[english]
greeting = Hello

[french]
greeting = Bonjour

[files]
home = /usr/local
bin = %(home)s/bin

Reading INI Files

import configparser

config = configparser.ConfigParser()
config.read('settings.cfg')
print(config)

Microsoft Excel Files

Popular third-party libraries for Excel manipulation include xlrd, tablib, python-excel, xlwt, and xlutils.

Reading Excel with Pandas

import pandas as pd

data = pd.read_excel(r"/home/data.xlsx", sheet_name="Sheet1", sheet_index=0, header=0)

The returned DataFrame contains the Excel data.

Writing Excel with Pandas

import pandas as pd

df = pd.DataFrame(data)
df.to_excel("/home/output.xlsx", sheet_name="Sheet1", na_rep="N/A", header=True)
Tags: Pythoncsv

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.