Structuring Your Project¶

By “structure” we mean the decisions you make concerning how your project best meets its objective. We need to consider how best to leverage Python’s features to create clean, effective code. In practical terms, “structure” means making clean code whose logic and dependencies are clear, as well as how the files and folders are organized in the filesystem.
Which functions should go into which modules? How does data flow through the project? What features and functions can be grouped together and isolated? By answering questions like these, you can begin to plan, in a broad sense, what your finished product will look like.
In this section, we take a closer look at Python’s modules and import systems as they are the central elements to enforcing structure in your project. We then discuss various perspectives on how to build code which can be extended and tested reliably.
Structure of the Repository¶
It’s Important.¶
Just as Code Style, API Design, and Automation are essential for a healthy development cycle, repository structure is a crucial part of your project’s architecture.
When a potential user or contributor lands on your repository’s page, they see a few things:
- Project Name
- Project Description
- Bunch O’ Files
Only when they scroll below the fold will the user see your project’s README.
If your repo is a massive dump of files or a nested mess of directories, they might look elsewhere before even reading your beautiful documentation.
Dress for the job you want, not the job you have.
Of course, first impressions aren’t everything. You and your colleagues will spend countless hours working with this repository, eventually becoming intimately familiar with every nook and cranny. The layout is important.
Sample Repository¶
tl;dr: This is what Kenneth Reitz recommended in 2013.
This repository is available on GitHub.
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
Let’s get into some specifics.
The Actual Module¶
Location | ./sample/ or ./sample.py |
Purpose | The code of interest |
Your module package is the core focus of the repository. It should not be tucked away:
./sample/
If your module consists of only a single file, you can place it directly in the root of your repository:
./sample.py
Your library does not belong in an ambiguous src or python subdirectory.
License¶
Location | ./LICENSE |
Purpose | Lawyering up. |
This is arguably the most important part of your repository, aside from the source code itself. The full license text and copyright claims should exist in this file.
If you aren’t sure which license you should use for your project, check out choosealicense.com.
Of course, you are also free to publish code without a license, but this would prevent many people from potentially using or contributing to your code.
Setup.py¶
Location | ./setup.py |
Purpose | Package and distribution management. |
If your module package is at the root of your repository, this should obviously be at the root as well.
Requirements File¶
Location | ./requirements.txt |
Purpose | Development dependencies. |
A pip requirements file should be placed at the root of the repository. It should specify the dependencies required to contribute to the project: testing, building, and generating documentation.
If your project has no development dependencies, or if you prefer setting up a development environment via setup.py, this file may be unnecessary.
Documentation¶
Location | ./docs/ |
Purpose | Package reference documentation. |
There is little reason for this to exist elsewhere.
Test Suite¶
For advice on writing your tests, see Testing Your Code.
Location | ./test_sample.py or ./tests |
Purpose | Package integration and unit tests. |
Starting out, a small test suite will often exist in a single file:
./test_sample.py
Once a test suite grows, you can move your tests to a directory, like so:
tests/test_basic.py
tests/test_advanced.py
Obviously, these test modules must import your packaged module to test it. You can do this a few ways:
- Expect the package to be installed in site-packages.
- Use a simple (but explicit) path modification to resolve the package properly.
I highly recommend the latter. Requiring a developer to run setup.py develop to test an actively changing codebase also requires them to have an isolated environment setup for each instance of the codebase.
To give the individual tests import context, create a tests/context.py file:
import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

import sample
Then, within the individual test modules, import the module like so:
from .context import sample
This will always work as expected, regardless of installation method.
Some people will assert that you should distribute your tests within your module itself. I disagree. It often increases complexity for your users; many test suites require additional dependencies and runtime contexts.
Makefile¶
Location | ./Makefile |
Purpose | Generic management tasks. |
If you look at most of my projects or any Pocoo project, you’ll notice a Makefile lying around. Why? These projects aren’t written in C… In short, make is an incredibly useful tool for defining generic tasks for your project.
Sample Makefile:
init:
	pip install -r requirements.txt

test:
	py.test tests

.PHONY: init test
Other generic management scripts (e.g. manage.py or fabfile.py) belong at the root of the repository as well.
Regarding Django Applications¶
I’ve noticed a new trend in Django applications since the release of Django 1.4. Many developers are structuring their repositories poorly due to the new bundled application templates.
How? Well, they go to their bare and fresh repository and run the following, as they always have:
$ django-admin.py startproject samplesite
The resulting repository structure looks like this:
README.rst
samplesite/manage.py
samplesite/samplesite/settings.py
samplesite/samplesite/wsgi.py
samplesite/samplesite/sampleapp/models.py
Don’t do this.
Repetitive paths are confusing for both your tools and your developers. Unnecessary nesting doesn’t help anybody (unless they’re nostalgic for monolithic SVN repos).
Let’s do it properly:
$ django-admin.py startproject samplesite .
Note the “.”.
The resulting structure:
README.rst
manage.py
samplesite/settings.py
samplesite/wsgi.py
samplesite/sampleapp/models.py
Structure of Code is Key¶
Thanks to the way imports and modules are handled in Python, it is relatively easy to structure a Python project. Easy, here, means that you do not have many constraints and that the module importing model is easy to grasp. Therefore, you are left with the pure architectural task of crafting the different parts of your project and their interactions.
Easy structuring of a project means it is also easy to do it poorly. Some signs of a poorly structured project include:
- Multiple and messy circular dependencies: If the classes Table and Chair in furn.py need to import Carpenter from workers.py to answer a question such as table.isdoneby(), and if conversely the class Carpenter needs to import Table and Chair to answer the question carpenter.whatdo(), then you have a circular dependency. In this case you will have to resort to fragile hacks such as using import statements inside your methods or functions.
- Hidden coupling: Each and every change in Table’s implementation breaks 20 tests in unrelated test cases because it breaks Carpenter’s code, which requires very careful surgery to adapt to the change. This means you have too many assumptions about Table in Carpenter’s code or the reverse.
- Heavy usage of global state or context: Instead of explicitly passing (height, width, type, wood) to each other, Table and Carpenter rely on global variables that can be modified and are modified on the fly by different agents. You need to scrutinize all access to these global variables in order to understand why a rectangular table became a square, and discover that remote template code is also modifying this context, messing with the table dimensions.
- Spaghetti code: Multiple pages of nested if clauses and for loops with a lot of copy-pasted procedural code and no proper segmentation are known as spaghetti code. Python’s meaningful indentation (one of its most controversial features) makes it very hard to maintain this kind of code. The good news is that you might not see too much of it.
- Ravioli code is more likely in Python: it consists of hundreds of similar little pieces of logic, often classes or objects, without proper structure. If you can never remember whether you have to use FurnitureTable, AssetTable or Table, or even TableNew for your task at hand, then you might be swimming in ravioli code.
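The global-state problem from the list above can be sketched in a few lines. This is only an illustration: the names TABLE_SPEC, build_table_implicit, and build_table are hypothetical and do not come from any real project.

```python
# Hypothetical module-level state shared by several parts of a program.
TABLE_SPEC = {"height": 75, "width": 120}


def build_table_implicit():
    """Relies on hidden global state: the result changes if anyone mutates TABLE_SPEC."""
    return (TABLE_SPEC["height"], TABLE_SPEC["width"])


def build_table(height, width):
    """Relies only on its arguments, so callers can see everything that affects it."""
    return (height, width)


before = build_table_implicit()
TABLE_SPEC["width"] = 75  # some distant code modifies the global on the fly
after = build_table_implicit()  # same call, different result

explicit = build_table(height=75, width=120)
```

The two calls to build_table_implicit() look identical at the call site yet return different tables, which is the kind of surprise that forces you to audit every access to the global. The explicit version makes the dependency visible in the signature.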
Modules¶
Python modules are one of the main abstraction layers available and probably the most natural one. Abstraction layers allow separating code into parts holding related data and functionality.
For example, a layer of a project can handle interfacing with user actions, while another would handle low-level manipulation of data. The most natural way to separate these two layers is to regroup all interfacing functionality in one file, and all low-level operations in another file. In this case, the interface file needs to import the low-level file. This is done with the import and from ... import statements.
As soon as you use import statements, you use modules. These can be either built-in modules such as os and sys, third-party modules you have installed in your environment, or your project’s internal modules.
To keep in line with the style guide, keep module names short, lowercase, and be sure to avoid using special symbols like the dot (.) or question mark (?). A file name like my.spam.py is the one you should avoid! Naming this way will interfere with the way Python looks for modules.
In the case of my.spam.py Python expects to find a spam.py file in a folder named my, which is not the case. There is an example of how the dot notation should be used in the Python docs.
If you like, you could name your module my_spam.py, but even our trusty friend the underscore should not be seen that often in module names. However, using other characters (spaces or hyphens) in module names will prevent importing (- is the subtract operator). Try to keep module names short so there is no need to separate words. And, most of all, don’t namespace with underscores; use submodules instead.
# OK
import library.plugin.foo
# not OK
import library.foo_plugin
Aside from some naming restrictions, nothing special is required for a Python file to be a module. But you need to understand the import mechanism in order to use this concept properly and avoid some issues.
Concretely, the import modu statement will look for the proper file, which is modu.py in the same directory as the caller, if it exists. If it is not found, the Python interpreter will search for modu.py in each directory listed in the “path” (sys.path) in turn, and raise an ImportError exception when it is not found.
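The lookup described above can be observed from within Python. The following is a minimal sketch rather than a description of the full import machinery; the module name is deliberately nonexistent and chosen only for illustration.

```python
import importlib
import importlib.util
import sys

# sys.path is the list of locations the interpreter searches for modules.
search_path_is_list = isinstance(sys.path, list)

# A deliberately nonexistent (hypothetical) module name.
missing_name = "module_that_should_not_exist_12345"

# find_spec returns None when no finder can locate a top-level module.
missing_spec = importlib.util.find_spec(missing_name)

try:
    importlib.import_module(missing_name)
    outcome = "imported"
except ImportError:
    outcome = "ImportError raised"
```

Printing sys.path in your own environment is a quick way to see where Python will look before it gives up.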
When modu.py is found, the Python interpreter will execute the module in an isolated scope. Any top-level statement in modu.py will be executed, including other imports if any. Function and class definitions are stored in the module’s dictionary.
Then, the module’s variables, functions, and classes will be available to the caller through the module’s namespace, a central concept in programming that is particularly helpful and powerful in Python.
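To make the execution and namespace behavior concrete, here is a small sketch that writes a throwaway module to a temporary directory and imports it. The module name demo_modu and its contents are assumptions made up for this example. The last step also relies on a fact not stated above: Python caches imported modules in sys.modules, so a repeated import does not re-run the top-level code.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module named demo_modu.py (hypothetical) to a temp directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_modu.py"), "w") as handle:
    handle.write(
        "LOADED = []\n"
        "LOADED.append('top-level code ran')\n"
        "def sqrt(x):\n"
        "    return x ** 0.5\n"
    )

# Make the directory visible to the import system, then import the module.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo_modu = importlib.import_module("demo_modu")

# The module's names live in its own namespace and are reached through it.
result = demo_modu.sqrt(4)
names_in_module = sorted(
    name for name in vars(demo_modu) if not name.startswith("__")
)

# A second import returns the cached module; the top-level code is not re-run.
again = importlib.import_module("demo_modu")
```

After the import, LOADED holds a single entry, which shows that the top-level statements ran exactly once.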
In many languages, an include file directive is used by the preprocessor to take all code found in the file and ‘copy’ it into the caller’s code. It is different in Python: the included code is isolated in a module namespace, which means that you generally don’t have to worry that the included code could have unwanted effects, e.g. override an existing function with the same name.
It is possible to simulate the more standard behavior by using a special syntax of the import statement: from modu import *. This is generally considered bad practice. Using import * makes the code harder to read and makes dependencies less compartmentalized.
Using from modu import func is a way to pinpoint the function you want to import and put it in the local namespace. While much less harmful than import * because it shows explicitly what is imported in the local namespace, its only advantage over a simpler import modu is that it will save a little typing.
Very bad
[...]
from modu import *
[...]
x = sqrt(4)  # Is sqrt part of modu? A builtin? Defined above?
Better
from modu import sqrt
[...]
x = sqrt(4)  # sqrt may be part of modu, if not redefined in between
Best
import modu
[...]
x = modu.sqrt(4)  # sqrt is visibly part of modu's namespace
As mentioned in the Code Style section, readability is one of the main features of Python. Readability means to avoid useless boilerplate text and clutter; therefore some efforts are spent trying to achieve a certain level of brevity. But terseness and obscurity are the limits where brevity should stop. Being able to tell immediately where a class or function comes from, as in the modu.func idiom, greatly improves code readability and understandability in all but the simplest single file projects.
Packages¶
Python provides a very straightforward packaging system, which is simply an extension of the module mechanism to a directory.
Any directory with an __init__.py file is considered a Python package. The different modules in the package are imported in a similar manner as plain modules, but with a special behavior for the __init__.py file, which is used to gather all package-wide definitions.
A file modu.py in the directory pack/ is imported with the statement import pack.modu. This statement will look for an __init__.py file in pack and execute all of its top-level statements. Then it will look for a file named pack/modu.py and execute all of its top-level statements. After these operations, any variable, function, or class defined in modu.py is available in the pack.modu namespace.
A commonly seen issue is adding too much code to __init__.py files. When the project complexity grows, there may be sub-packages and sub-sub-packages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all __init__.py files met while traversing the tree.
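The traversal cost described above can be made visible with a small experiment. This sketch builds a throwaway package tree in a temporary directory and records which files run when a single nested module is imported. The package names demo_outer and demo_inner are hypothetical.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Build a throwaway tree: demo_outer/demo_inner/leaf.py (hypothetical names).
root = tempfile.mkdtemp()
inner_dir = os.path.join(root, "demo_outer", "demo_inner")
os.makedirs(inner_dir)

sources = {
    os.path.join(root, "demo_outer", "__init__.py"):
        "print('demo_outer/__init__.py')\n",
    os.path.join(inner_dir, "__init__.py"):
        "print('demo_outer/demo_inner/__init__.py')\n",
    os.path.join(inner_dir, "leaf.py"):
        "print('demo_outer/demo_inner/leaf.py')\nVALUE = 42\n",
}
for path, source in sources.items():
    with open(path, "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Capture what each file prints while it is being executed by the import.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    leaf = importlib.import_module("demo_outer.demo_inner.leaf")

execution_order = captured.getvalue().splitlines()
```

Importing only the leaf module still executes both __init__.py files first, from the outermost package inward, so anything expensive placed in them is paid for on every such import.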
Leaving an __init__.py file empty is considered normal and even good practice, if the package’s modules and sub-packages do not need to share any code.
Lastly, a convenient syntax is available for importing deeply nested packages: import very.deep.module as mod. This allows you to use mod in place of the verbose repetition of very.deep.module.
Object-oriented programming¶
Python is sometimes described as an object-oriented programming language. This can be somewhat misleading and requires further clarifications.
In Python, everything is an object, and can be handled as such. This is what is meant when we say, for example, that functions are first-class objects. Functions, classes, strings, and even types are objects in Python: like any object, they have a type, they can be passed as function arguments, and they may have methods and properties. In this understanding, Python can be considered as an object-oriented language.
However, unlike Java, Python does not impose object-oriented programming as the main programming paradigm. It is perfectly viable for a Python project to not be object-oriented, i.e. to use no or very few class definitions, class inheritance, or any other mechanisms that are specific to object-oriented programming languages.
Moreover, as seen in the modules section, the way Python handles modules and namespaces gives the developer a natural way to ensure the encapsulation and separation of abstraction layers, both being the most common reasons to use object-orientation. Therefore, Python programmers have more latitude not to use object-orientation when it is not required by the business model.
There are some reasons to avoid unnecessary object-orientation. Defining custom classes is useful when we want to glue some state and some functionality together. The problem, as pointed out by the discussions about functional programming, comes from the “state” part of the equation.
In some architectures, typically web applications, multiple instances of Python processes are spawned as a response to external requests that happen simultaneously. In this case, holding some state in instantiated objects, which means keeping some static information about the world, is prone to concurrency problems or race conditions. Sometimes, between the initialization of the state of an object (usually done with the __init__() method) and the actual use of the object state through one of its methods, the world may have changed, and the retained state may be outdated. For example, a request may load an item in memory and mark it as read by a user. If another request requires the deletion of this item at the same time, the deletion may actually occur after the first process loaded the item, and then we have to mark a deleted object as read.
This and other issues led to the idea that using stateless functions is a better programming paradigm.
Another way to say the same thing is to suggest using functions and procedures with as few implicit contexts and side-effects as possible. A function’s implicit context is made up of any of the global variables or items in the persistence layer that are accessed from within the function. Side-effects are the changes that a function makes to its implicit context. If a function saves or deletes data in a global variable or in the persistence layer, it is said to have a side-effect.
Carefully isolating functions with context and side-effects from functions with logic (called pure functions) allows the following benefits:
- Pure functions are deterministic: given a fixed input, the output will always be the same.
- Pure functions are much easier to change or replace if they need to be refactored or optimized.
- Pure functions are easier to test with unit tests: There is less need for complex context setup and data cleaning afterwards.
- Pure functions are easier to manipulate, decorate, and pass around.
In summary, pure functions are more efficient building blocks than classes and objects for some architectures because they have no context or side-effects.
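As a minimal sketch of the distinction drawn above, compare a function with implicit context and a side-effect against a pure one. All names here (DISCOUNT_RATE, audit_log, and both functions) are hypothetical, and the list audit_log merely stands in for a persistence layer.

```python
# Hypothetical example: applying a discount to a price.
DISCOUNT_RATE = 0.25  # implicit context read by the impure version
audit_log = []        # stands in for a persistence layer


def discounted_price_impure(price):
    """Reads a global and appends to a shared log: implicit context plus a side-effect."""
    audit_log.append(price)
    return price * (1 - DISCOUNT_RATE)


def discounted_price(price, rate):
    """Pure: the result depends only on the arguments, and nothing else is touched."""
    return price * (1 - rate)


# The pure function can be called repeatedly with no setup and no cleanup.
pure_first = discounted_price(200, 0.25)
pure_second = discounted_price(200, 0.25)

# Each call to the impure function leaves a trace that a test would have to reset.
discounted_price_impure(200)
discounted_price_impure(200)
```

Testing discounted_price needs nothing but inputs and an expected output, whereas testing discounted_price_impure requires controlling DISCOUNT_RATE and clearing audit_log between runs.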
Obviously, object-orientation is useful and even necessary in many cases, for example when developing graphical desktop applications or games, where the things that are manipulated (windows, buttons, avatars, vehicles) have a relatively long life of their own in the computer’s memory.
Decorators¶
The Python language provides a simple yet powerful syntax called ‘decorators’. A decorator is a function or a class that wraps (or decorates) a function or a method. The ‘decorated’ function or method will replace the original ‘undecorated’ function or method. Because functions are first-class objects in Python, this can be done ‘manually’, but using the @decorator syntax is clearer and thus preferred.
def foo():
    pass  # do something

def decorator(func):
    # manipulate func
    return func

foo = decorator(foo)  # Manually decorate

@decorator
def bar():
    pass  # Do something
# bar() is decorated
This mechanism is useful for separating concerns and avoiding external unrelated logic ‘polluting’ the core logic of the function or method. A good example of a piece of functionality that is better handled with decoration is memoization or caching: you want to store the results of an expensive function in a table and use them directly instead of recomputing them when they have already been computed. This is clearly not part of the function logic.
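The memoization idea mentioned above could be sketched as follows. This is one simple way to write such a decorator, assuming hashable positional arguments only; memoize and slow_square are hypothetical names used for illustration.

```python
import functools


def memoize(func):
    """Cache results of func, keyed by its positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []  # records every time the real function body runs


@memoize
def slow_square(n):
    """Stand-in for an expensive computation."""
    calls.append(n)
    return n * n


first = slow_square(12)
second = slow_square(12)  # served from the cache; the body is not run again
```

The caching logic lives entirely in memoize, so slow_square contains only its own computation. For real projects, the standard library's functools.lru_cache offers a ready-made decorator for the same purpose.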
Context Managers¶
A context manager is a Python object that provides extra contextual information to an action. This extra information takes the form of running a callable upon initiating the context using the with statement, as well as running a callable upon completing all the code inside the with block. The most well known example of using a context manager is shown here, opening a file:
with open('file.txt') as f:
    contents = f.read()
Anyone familiar with this pattern knows that invoking open in this fashion ensures that f’s close method will be called at some point. This reduces a developer’s cognitive load and makes the code easier to read.
There are two easy ways to implement this functionality yourself: using a class or using a generator. Let’s implement the above functionality ourselves, starting with the class approach:
class CustomOpen(object):
    def __init__(self, filename):
        self.file = open(filename)

    def __enter__(self):
        return self.file

    def __exit__(self, ctx_type, ctx_value, ctx_traceback):
        self.file.close()

with CustomOpen('file') as f:
    contents = f.read()
This is just a regular Python object with two extra methods that are used by the with statement. CustomOpen is first instantiated and then its __enter__ method is called and whatever __enter__ returns is assigned to f in the as f part of the statement. When the contents of the with block have finished executing, the __exit__ method is then called.
And now the generator approach using Python’s own contextlib:
from contextlib import contextmanager

@contextmanager
def custom_open(filename):
    f = open(filename)
    try:
        yield f
    finally:
        f.close()

with custom_open('file') as f:
    contents = f.read()
This works in exactly the same way as the class example above, although it is more terse. The custom_open function executes until it reaches the yield statement. It then gives control back to the with statement, which assigns whatever was yield’ed to f in the as f portion. The finally clause ensures that close() is called whether or not there was an exception inside the with.
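The role of the finally clause can be checked with a small experiment. The sketch below uses a hypothetical tracked context manager that only records events in a list, so that the order of entry, body, and cleanup is visible both for a normal block and for one that raises.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Record when the block is entered and left (hypothetical helper)."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        # Runs on normal exit and also when the body raises.
        events.append("exit " + name)


with tracked("ok"):
    events.append("body ok")

try:
    with tracked("failing"):
        raise ValueError("boom")
except ValueError:
    events.append("handled")
```

In the failing case the exit event is recorded before the exception reaches the surrounding except block, which is the guarantee that makes context managers suitable for releasing resources.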
Since the two approaches appear the same, we should follow the Zen of Python to decide when to use which. The class approach might be better if there’s a considerable amount of logic to encapsulate. The function approach might be better for situations where we’re dealing with a simple action.
Dynamic typing¶
Python is dynamically typed, which means that variables do not have a fixed type. In fact, in Python, variables are very different from what they are in many other languages, specifically statically-typed languages. Variables are not a segment of the computer’s memory where some value is written; they are ‘tags’ or ‘names’ pointing to objects. It is therefore possible for the variable ‘a’ to be set to the value 1, then to the value ‘a string’, then to a function.
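A short sketch of the ‘names pointing to objects’ idea, using made-up variable names: one name is rebound to objects of different types, and two names are bound to the same list.

```python
# A name can be rebound to objects of different types.
a = 1
first_type = type(a).__name__
a = "a string"
second_type = type(a).__name__

# Two names can point at the same object; neither one "contains" it.
numbers = [1, 2, 3]
alias = numbers
alias.append(4)  # the change is visible through both names

same_object = alias is numbers
```

Because alias and numbers are two tags on one list, appending through either name changes what the other one sees.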
The dynamic typing of Python is often considered to be a weakness, and indeed it can lead to complexities and hard-to-debug code. Something named ‘a’ can be set to many different things, and the developer or the maintainer needs to track this name in the code to make sure it has not been set to a completely unrelated object.
Some guidelines help to avoid this issue:
- Avoid using the same variable name for different things.
Bad
a = 1
a = 'a string'
def a():
    pass  # Do something
Good
count = 1
msg = 'a string'
def func():
    pass  # Do something
Using short functions or methods helps to reduce the risk of using the same name for two unrelated things.
It is better to use different names even for things that are related, when they have a different type:
Bad
items = 'a b c d'  # This is a string...
items = items.split(' ')  # ...becoming a list
items = set(items)  # ...and then a set
There is no efficiency gain when reusing names: the assignments will have to create new objects anyway. However, when the complexity grows and each assignment is separated by other lines of code, including ‘if’ branches and loops, it becomes harder to ascertain what a given variable’s type is.
Some coding practices, like functional programming, recommend never reassigning a variable. In Java this is done with the final keyword. Python does not have a final keyword and it would be against its philosophy anyway. However, it may be a good discipline to avoid assigning to a variable more than once, and it helps in grasping the concept of mutable and immutable types.
Mutable and immutable types¶
Python has two kinds of built-in or user-defined types.
Mutable types are those that allow in-place modification of the content. Typical mutables are lists and dictionaries: All lists have mutating methods, like list.append() or list.pop(), and can be modified in place. The same goes for dictionaries.
Immutable types provide no method for changing their content. For instance, the variable x set to the integer 6 has no “increment” method. If you want to compute x + 1, you have to create another integer and give it a name.
my_list = [1, 2, 3]
my_list[0] = 4
print(my_list)  # [4, 2, 3] <- The same list has changed

x = 6
x = x + 1  # The new x is another object
One consequence of this difference in behavior is that built-in mutable types such as lists and dictionaries are not “stable”, and therefore cannot be used as dictionary keys.
Using properly mutable types for things that are mutable in nature and immutable types for things that are fixed in nature helps to clarify the intent of the code.
For example, the immutable equivalent of a list is the tuple, created with (1, 2). This tuple is a pair that cannot be changed in-place, and can be used as a key for a dictionary.
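The difference between tuples and lists as dictionary keys can be seen in a brief sketch. The distances dictionary and its contents are made up for illustration.

```python
# A tuple is hashable, so it can serve as a dictionary key.
distances = {(0, 0): 0.0, (3, 4): 5.0}
lookup = distances[(3, 4)]

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    distances[[3, 4]] = 5.0
    list_key_error = None
except TypeError as exc:
    list_key_error = type(exc).__name__
```

If you hold coordinates in a list and need them as a key, converting with tuple(coords) is the usual way around the error.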
One peculiarity of Python that can surprise beginners is that strings are immutable. This means that when constructing a string from its parts, appending each part to the string is inefficient because the entirety of the string is copied on each append. Instead, it is much more efficient to accumulate the parts in a list, which is mutable, and then glue (join) the parts together when the full string is needed. List comprehensions are usually the fastest and most idiomatic way to do this.
Bad
# create a concatenated string from 0 to 19 (e.g. "012..1819")
nums = ""
for n in range(20):
    nums += str(n)  # slow and inefficient
print(nums)
Better
# create a concatenated string from 0 to 19 (e.g. "012..1819")
nums = []
for n in range(20):
    nums.append(str(n))
print("".join(nums))  # much more efficient
Best
# create a concatenated string from 0 to 19 (e.g. "012..1819")
nums = [str(n) for n in range(20)]
print("".join(nums))
One final thing to mention about strings is that using join() is not always best. In the instances where you are creating a new string from a pre-determined number of strings, using the addition operator is actually faster. But in cases like above or in cases where you are adding to an existing string, using join() should be your preferred method.
foo = 'foo'
bar = 'bar'

foobar = foo + bar  # This is good
foo += 'ooo'  # This is bad, instead you should do:
foo = ''.join([foo, 'ooo'])
Note
You can also use the % formatting operator to concatenate a pre-determined number of strings besides str.join() and +. However, PEP 3101 discourages the usage of the % operator in favor of the str.format() method.
foo = 'foo'
bar = 'bar'

foobar = '%s%s' % (foo, bar)  # It is OK
foobar = '{0}{1}'.format(foo, bar)  # It is better
foobar = '{foo}{bar}'.format(foo=foo, bar=bar)  # It is best