[ Broken pipe errors with SQLAlchemy/Postgres ]
I have a web scraping application that gathers data and stores it in the database. It generally runs OK, but a few times a day there are intermittent errors like this:
OperationalError: (OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the
request. could not send startup packet: Broken pipe
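Whatever the root cause turns out to be, a dropped connection during a periodic scrape is often survivable if the whole unit of work is retried. The "could not send startup packet" part of the message means the failure happened while a new connection was being opened, so each retry should open its own fresh session. Below is a minimal, stdlib-only sketch. `run_with_retry` and its parameters are hypothetical names. In a SQLAlchemy setup, `retry_on` would presumably be `(sqlalchemy.exc.OperationalError,)`. Retrying only masks the symptom and does not explain why the server is dropping connections.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    unit_of_work: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run unit_of_work, retrying with exponential backoff on the given errors.

    unit_of_work is assumed to open its own session, do its work, commit,
    and close, so every attempt starts from a fresh connection.
    """
    for attempt in range(1, attempts + 1):
        try:
            return unit_of_work()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

In the scraper this might be called as `run_with_retry(save_scraped_batch, retry_on=(OperationalError,))`, where `save_scraped_batch` is a hypothetical function wrapping the session code from the question.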
I believe this happens because the following block of code (where the error occurs) handles a large amount of data, and I'm not sure how to optimize it.
Edit: I'm assuming this is the cause, and I've included additional information below to support that. If there is some other cause of this error, I'd be happy just to know how to fix it :)
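If batch size really is the problem, one common way to bound both memory use and transaction size is to process the scraped objects in fixed-size chunks and commit after each chunk. Below is a minimal sketch of the chunking step using a hypothetical `chunked` helper. Python 3.12 and later ship `itertools.batched`, which does the same thing but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # pull the next `size` items; an empty list means the input is exhausted
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

A loop such as `for chunk in chunked(objs, 500):` could then open a session, merge or upsert that chunk, and commit. The chunk size of 500 is an arbitrary illustration, not a tuned value.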
Since it's a web scraper, it polls a web page regularly for data. Each poll collects between 2000 and 3000 records. Of these, 50 to 100 are new, and the rest may have updated values for some attributes.
Also worth noting: I am using an ID from the scraped data as the primary key in the database, so the program already knows the ID when it creates the object, even if that object is not in the database yet.
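Because the primary key comes from the scraped data, the new records can be separated from the ones that already exist using a plain set lookup, before involving the ORM at all. Below is a minimal sketch. It assumes each scraped record is a dict with an `id` key, and that `existing_ids` has already been fetched, for example with a single query on the primary key column. `split_new_and_existing` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Record = Dict[str, Any]


def split_new_and_existing(
    scraped: Iterable[Record], existing_ids: Set[Any]
) -> Tuple[List[Record], List[Record]]:
    """Partition scraped records by whether their primary key is already stored."""
    new: List[Record] = []
    existing: List[Record] = []
    for record in scraped:
        # the scraped "id" doubles as the database primary key
        (existing if record["id"] in existing_ids else new).append(record)
    return new, existing
```

With 2000 to 3000 records per poll, the set of IDs is small enough to keep in memory. The 50 to 100 new records can then go through a bulk insert, and the rest can go through an update path.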
Here's the code with some added comments:
# objs is a list of Models of all the information it just scraped. This
# finds the oldest timestamp in this set so I can reduce the database query.
# I don't know how many this tends to be.
earliest_scraped = min(o.timestamp for o in objs)
# create the sqlalchemy db session
session = db_setup()
# this queries the objects already in the database, reduced by the timestamp
# above so it is only objects which might overlap with what was scraped
# (>= rather than > so rows with exactly the earliest timestamp are included)
existing = session.query(Model).filter(
    Model.timestamp >= earliest_scraped
)
# this merges the list of objects with the query so it updates any existing
# items and adds any new items
existing.merge_result(objs)
# commit and close
session.commit()
session.close()
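An alternative to merging ORM objects one at a time is to let the database perform the insert-or-update in a single statement. PostgreSQL has supported `INSERT ... ON CONFLICT ... DO UPDATE` since 9.5, and SQLAlchemy exposes it through `sqlalchemy.dialects.postgresql.insert(...).on_conflict_do_update(...)`. This is a swapped-in technique, not what `merge_result()` does. As far as I know, `merge_result()` passes each object through the session's merge logic and never executes the filtered `existing` query, so any row not already in the identity map is loaded individually by primary key. The sketch below uses the standard-library `sqlite3` module, which accepts the same upsert syntax in SQLite 3.24 and later, so it runs without a Postgres server. The `model` table and its columns are hypothetical stand-ins for the question's `Model`.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical table standing in for the question's Model:
# the scraped id is the primary key, as described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model (
    id INTEGER PRIMARY KEY,
    timestamp REAL NOT NULL,
    value TEXT
)
"""

# Insert each row; if its id already exists, overwrite the other columns
# with the incoming values (the "excluded" pseudo-table holds the new row).
UPSERT = """
INSERT INTO model (id, timestamp, value)
VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
    timestamp = excluded.timestamp,
    value = excluded.value
"""


def upsert_rows(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float, str]]) -> None:
    """Insert new rows and update existing ones in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(UPSERT, rows)
```

On Postgres, the statement and the `excluded` pseudo-table work the same way. Only the placeholder style differs (`%s` with psycopg2).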
Answer 1
After a lot of tinkering, it appears this crash was happening because the system was running out of memory.
This was a result of a memory leak in the application that had nothing to do with the code above.