News

12th December 2015 by aegeuana_sjp_admin

How to export data as a CSV – Django model

I recently faced a new challenge where I had to generate large CSV files from the backend. The whole process can take a while and if done improperly can cause the server to stop responding or even crash in worst-case scenarios.
Let’s assume we have a database full of kittens, and the following queryset to retrieve them:
[code language=”python”]
kittens_qs = Kitten.objects.all() # Assume >50,000 objects inside
[/code]
In order to export a Queryset as a CSV we need to know a little bit more about the given model, such as the fields and the relations it holds, to find out more about the model and the dependencies, we can use the “_meta” property:
[code language=”python”]
model = kittens_qs.model
model_fields = model._meta.fields + model._meta.many_to_many
[/code]
Next let’s define a utility function which will generate a row for each object passed from the queryset. It is very important to use unicode as much as you can, as this will ensure we can export data which is outside of the ASCII capabilities, hence the encoding in “utf-8″. Take a closer look at how we are handling the ManyToMany field, those entries can have one or more results,  so we will be showing them as a comma separated list:
[code language=”python”]
def get_row(obj):
row = []
for field in model_fields:
if type(field) == models.ForeignKey:
val = getattr(obj, field.name)
if val:
val = val.__unicode__()
elif type(field) == models.ManyToManyField:
val = u’, ‘.join([item.__unicode__() for item in getattr(obj, field.name).all()])
elif field.choices:
val = getattr(obj, ‘get_%s_display’%field.name)()
else:
val = getattr(obj, field.name)
row.append(unicode(val).encode("utf-8"))
return row
[/code]
When generating a huge CSV file you DON’T want to generate the file and only then send it to the client, what you should do is actually “stream” the data chunk by chunk to the client once you have enough data generated. This can be accomplished by using the “StreamingHttpResponse” rather than the normal HttpResponse/render. You can read more about this in the Django documenation.
[code language=”python”]
class Echo(object):
"""An object that implements just the write method of the file-like interface."""
def write(self, value):
"""Write the value by returning it, instead of storing in a buffer."""
return value
class KittensExportToCsvView(View):
def get(self, request, *args, **kwargs):
pseudo_buffer = Echo()
writer = csv.writer(pseudo_buffer)
response = StreamingHttpResponse((writer.writerow(get_row(obj)) for obj in kittens_qs), content_type="text/csv")
response[‘Content-Disposition’] = ‘attachment; filename="somefilename.csv"’
return response
[/code]
If you run the previous code you will see that it works quite well, but we are missing a very important part in the CSV, the headers! Let’s add this functionality with another helper function named “stream”. The final code will look like this:
[code language=”python”]
from django.db import models
from django.http import StreamingHttpResponse
from django.views.generic import View
import csv
class Echo(object):
"""An object that implements just the write method of the file-like interface.
"""
def write(self, value):
"""Write the value by returning it, instead of storing in a buffer."""
return value
class ContactLogExportCsvView(View):
def get(self, request, *args, **kwargs):
kittens_qs = Kitten.objects.all() # Assume 50,000 objects inside
model = kittens_qs.model
model_fields = model._meta.fields + model._meta.many_to_many
headers = [field.name for field in model_fields] # Create CSV headers
def get_row(obj):
row = []
for field in model_fields:
if type(field) == models.ForeignKey:
val = getattr(obj, field.name)
if val:
val = val.__unicode__()
elif type(field) == models.ManyToManyField:
val = u’, ‘.join([item.__unicode__() for item in getattr(obj, field.name).all()])
elif field.choices:
val = getattr(obj, ‘get_%s_display’%field.name)()
else:
val = getattr(obj, field.name)
row.append(unicode(val).encode("utf-8"))
return row
def stream(headers, data): # Helper function to inject headers
if headers:
yield headers
for obj in data:
yield get_row(obj)
pseudo_buffer = Echo()
writer = csv.writer(pseudo_buffer)
response = StreamingHttpResponse(
(writer.writerow(row) for row in stream(headers, kittens_qs)),
content_type="text/csv")
response[‘Content-Disposition’] = ‘attachment; filename="all_kittens.csv"’
return response
[/code]
Hope this helps and happy Djangoing!

Leave a Reply

Your email address will not be published. Required fields are marked *