Maxim Maeder · Abdeladim Fadheli · 7 min read · Updated Nov 2022 · General Python Tutorials

In this article, we will make a Python program that finds all classes used in the HTML files of a project and then collects the matching rules from the CSS files into one minified stylesheet. The program serves a specific purpose because it matches classes strictly, which means bg-black won't match bg-black:hover; the used classes have to appear in the stylesheets exactly as they are used.

This way of minimizing is useful for utility classes such as width-800px or color-grey-800 that each change only one property. Your utility classes may also follow a pattern like child-margin-2rem, which in the stylesheet is actually child-margin-2rem > *. This won't match by default, but we will make it possible to replace such patterns with the appropriate selector.

Finally, you can change the code so the minifier works better for your case, or you could even redo it on your own with the knowledge gained.

We will utilize a CSS library called cssutils that allows us to parse, read, and write CSS.
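To make strict matching concrete, here is a minimal sketch. The selectors and class names are made-up examples, and the comparison is a simplified stand-in for what the program does later with real stylesheet rules:

```python
# Selectors as they might appear in a stylesheet (made-up examples).
selectors = ['.bg-black', '.bg-black:hover', '.width-800px']

# Classes as they might appear in the HTML markup (made-up examples).
used_classes = ['bg-black', 'width-800px']

# Strict matching: a selector is kept only if it is exactly "." + class name.
matched = [s for s in selectors if any(s == '.' + c for c in used_classes)]

print(matched)  # ['.bg-black', '.width-800px']
```

Note that .bg-black:hover is dropped even though bg-black is used, because the selector text is not an exact match.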

Imports

Let's start with the modules and libraries we have to import for our little program. The most important one is cssutils, which has to be installed with pip install cssutils. We also want to import re, os, and time. We import the logging module simply to turn off logging, because cssutils logs a lot of error messages. We then clear the console with os.system("cls") (a Windows command; on Linux and macOS the equivalent is "clear") and save the program's start time to a variable.

    import cssutils
    import re
    import logging
    import os
    import time

    cssutils.log.setLevel(logging.CRITICAL)
    startTime = time.time()
    os.system('cls')
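Since cls only exists on Windows, a small helper could pick the right command per platform. This is a sketch, and clear_command() is a hypothetical name that is not part of the original program:

```python
import os

def clear_command():
    # Hypothetical helper: os.name is 'nt' on Windows, where the command is
    # 'cls'; on other systems we assume a POSIX shell that provides 'clear'.
    return 'cls' if os.name == 'nt' else 'clear'

# Uncomment to actually clear the terminal:
# os.system(clear_command())
print(clear_command())
```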

Getting the Files

First, we get lists of the files ending in .html and .css. We save these lists for later.

    htmlFiles = getFilesByExtension('.html', '.')
    cssFiles = getFilesByExtension('.css', 'style')

Let's also go over the function that searches for all these files. Keep in mind that it has to be defined before it is used. Here we use os.walk(), which receives a path and returns data about the directory itself and each of its subdirectories.

We only need the files, which are the third item of each returned tuple. We loop over these, and if they end with the specified extension, we add them to the foundFiles list. Lastly, we return this list:

    def getFilesByExtension(ext, root):
        foundFiles = []
        # os.walk() yields (dirpath, dirnames, filenames) for every directory under root
        for dirpath, directories, files in os.walk(root):
            for f in files:
                if f.endswith(ext):
                    # os.path.join(dirpath, f) is the full path to the file
                    foundFiles.append(os.path.join(dirpath, f))
        return foundFiles
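To see how the walk behaves, here is a self-contained sketch that builds a throwaway directory tree and searches it. The file names are invented for the demonstration, and get_files_by_extension() is a renamed copy of the function above so that the snippet runs on its own:

```python
import os
import tempfile

def get_files_by_extension(ext, root):
    found = []
    # Each iteration yields one directory and the file names directly inside it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(ext):
                found.append(os.path.join(dirpath, name))
    return found

with tempfile.TemporaryDirectory() as tmp:
    # Create index.html, pages/about.html and notes.txt (made-up names).
    os.makedirs(os.path.join(tmp, 'pages'))
    for rel in ('index.html', os.path.join('pages', 'about.html'), 'notes.txt'):
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write('')
    found = get_files_by_extension('.html', tmp)
    # Show paths relative to the temporary root so the output is readable.
    names = sorted(os.path.relpath(p, tmp) for p in found)

print(names)
```

Both HTML files are found, including the one in the subdirectory, while notes.txt is skipped.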

Finding all Used Classes

Next up, we want to find all the classes used in the HTML files. To do this, we first create a dictionary that stores each class name as a key, so we don't have duplicates in the end.

We then loop over all HTML files, and for each one, we read the content and use a regular expression to find all class attribute strings.

Continuing, we split each of these found strings, because classes are separated by whitespace. Lastly, we return the keys of the dictionary as a list; these are the classes.

    def findAllCSSClasses():
        usedClasses = {}
        # Find all used classes
        for htmlFile in htmlFiles:
            with open(htmlFile, 'r') as f:
                htmlContent = f.read()
            regex = r'class="(.*?)"'
            # re.DOTALL is needed so that "." also matches newlines
            matched = re.finditer(regex, htmlContent, re.MULTILINE | re.DOTALL)
            # matched is an iterator of re.Match objects
            for i in matched:
                # i.groups()[0] is the first group in the regex;
                # split() without arguments also handles newlines and repeated spaces
                for className in i.groups()[0].split():
                    usedClasses[className] = ''
        return list(usedClasses.keys())

    # The function has to be defined before it is called
    usedClasses = findAllCSSClasses()
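The extraction step can be tried on a small HTML string without touching any files. This sketch uses invented markup and splits with split() (no argument), so a class attribute that wraps onto a second line is handled as well:

```python
import re

# Made-up markup; the second class attribute spans two lines.
html = '''<div class="card bg-black">
  <p class="text-white
            bg-black">Hi</p>
</div>'''

classes = {}
# re.DOTALL lets "." match the newline inside the second attribute.
for match in re.finditer(r'class="(.*?)"', html, re.DOTALL):
    for name in match.group(1).split():
        classes[name] = ''  # dict keys remove duplicates and keep first-seen order

print(list(classes))  # ['card', 'bg-black', 'text-white']
```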

Translating used Classes

Now we translate some classes. This is useful if a class name doesn't exactly match its selector but follows a pattern; for example, all classes starting with child- have > * appended to their selector. We define each translation in a list where the first item is the regex and the second is the replacement:

    # Use translations if the class names in the markup don't exactly
    # match the CSS selector (except for the dot at the beginning).
    translations = [
        [
            '@',
            '\\@'
        ],
        [
            r"(.*?):(.*)",
            r"\g<1>\\:\g<2>:\g<1>",
        ],
        [
            r"child(.*)",
            "child\\g<1> > *",
        ],
    ]
    # In the script, translateUsedClasses() must be defined before this call
    usedClasses = translateUsedClasses(usedClasses)

In the function, we loop over each translation for each class, so every translation is potentially applied to each class name. We then apply the replacement with the re.sub() function.

    def translateUsedClasses(classList):
        for i, usedClass in enumerate(classList):
            for translation in translations:
                # If the class matches the translation's regex, replace it
                regex = translation[0]
                subst = translation[1]
                if re.search(regex, usedClass):
                    # re.sub() replaces the regex match with subst; count and flags are
                    # passed as keywords, since newer Python versions deprecate passing
                    # them positionally
                    usedClass = re.sub(regex, subst, usedClass, count=1, flags=re.MULTILINE)
                    # Store the result, so a later translation builds on this one
                    classList[i] = usedClass
        return classList
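To check what the three translations produce, here is a small sketch with invented class names. The translate() helper is hypothetical; it feeds each result into the next translation, which is an assumption about the intended behavior and makes no difference for these single-pattern inputs:

```python
import re

translations = [
    ['@', '\\@'],                                # escape "@" for use in a selector
    [r'(.*?):(.*)', r'\g<1>\\:\g<2>:\g<1>'],     # "hover:x" -> "hover\:x:hover"
    [r'child(.*)', 'child\\g<1> > *'],           # "child-x" -> "child-x > *"
]

def translate(class_name):
    # Hypothetical helper: apply every matching translation once, in order.
    for regex, subst in translations:
        if re.search(regex, class_name):
            class_name = re.sub(regex, subst, class_name, count=1)
    return class_name

for name in ['bg-black', 'child-margin-2rem', 'hover:bg-black', 'w-50@md']:
    print(name, '->', translate(name))
# bg-black -> bg-black
# child-margin-2rem -> child-margin-2rem > *
# hover:bg-black -> hover\:bg-black:hover
# w-50@md -> w-50\@md
```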

Getting Used Classes from the Stylesheets

After that, we get the style definitions from the stylesheets with cssutils. Before we loop over the found stylesheets, we define the path of the minified CSS, which in this case is min.css. We also create a variable called newCSS that will hold the new CSS content.

    output = 'min.css'
    newCSS = ''

We continue by looping over all CSS files. We parse each file with cssutils.parseFile(path) and get all the rules in the stylesheet with the custom flattenStyleSheet() function. We go over how it works later; essentially, it puts all rules nested inside media queries into the same list as the top-level rules. Then we define a list that holds every selector we encounter that is not a class. We do this because something like input should not be left out.

Then we loop over each rule and each class, and if the class name prefixed with a dot equals the selector text of the rule, we add the CSS text of the rule to the newCSS string. We need to watch out for rules that have a parent rule, which would be a media query; in that case we add the parent's CSS text instead. We do the same thing for all the rules whose selector does not start with a dot:

    for cssFile in cssFiles:
        # Parse the CSS file
        sheet = cssutils.parseFile(cssFile)
        rules = flattenStyleSheet(sheet)
        noClassSelectors = []
        for rule in rules:
            for usedClass in usedClasses:
                if '.' + usedClass == rule.selectorText:
                    # If the class is used in the HTML, add it to the new CSS
                    usedClasses.remove(usedClass)  # Remove the class from the list
                    if rule.parentRule:
                        newCSS += str(rule.parentRule.cssText)
                    else:
                        newCSS += str(rule.cssText)
                    # The list was just modified, so stop iterating over it
                    break
            if rule.selectorText[0] != '.' and rule.selectorText not in noClassSelectors:
                # If the selector doesn't start with a dot and is not already in the list,
                # add it
                noClassSelectors.append(rule.selectorText)
                if rule.parentRule:
                    newCSS += str(rule.parentRule.cssText)
                else:
                    newCSS += str(rule.cssText)

`flattenStyleSheet()` function

Let's quickly go over the flattenStyleSheet() function. It receives the sheet as a parameter and loops over each rule in that sheet. It then checks whether the rule is a style rule or a media rule, so it can add all rules to a one-dimensional list.

    def flattenStyleSheet(sheet):
        ruleList = []
        for rule in sheet.cssRules:
            if rule.typeString == 'MEDIA_RULE':
                # Keep only the style rules nested in the media query; comments and
                # other rule types have no selectorText and would break the matching loop
                ruleList += [r for r in rule.cssRules if r.typeString == 'STYLE_RULE']
            elif rule.typeString == 'STYLE_RULE':
                ruleList.append(rule)
        return ruleList
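The flattening idea can be illustrated without cssutils. In this sketch, StyleRule and MediaRule are hypothetical stand-in classes that only mimic the two attributes the function relies on (typeString and cssRules); they are not the cssutils classes themselves:

```python
from dataclasses import dataclass, field

@dataclass
class StyleRule:
    # Stand-in for a style rule: just a selector and a type tag.
    selectorText: str
    typeString: str = 'STYLE_RULE'

@dataclass
class MediaRule:
    # Stand-in for a media query that contains nested rules.
    cssRules: list = field(default_factory=list)
    typeString: str = 'MEDIA_RULE'

def flatten(rules):
    flat = []
    for rule in rules:
        if rule.typeString == 'MEDIA_RULE':
            flat += rule.cssRules      # lift nested rules to the top level
        elif rule.typeString == 'STYLE_RULE':
            flat.append(rule)
    return flat

sheet = [
    StyleRule('.a'),
    MediaRule([StyleRule('.b'), StyleRule('.c')]),
    StyleRule('input'),
]
print([r.selectorText for r in flatten(sheet)])  # ['.a', '.b', '.c', 'input']
```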

Saving New CSS

Lastly, we minify the CSS further by removing line breaks and double spaces, and we save this new CSS to the specified location:

    newCSS = newCSS.replace('\n', '')
    newCSS = newCSS.replace('  ', '')
    with open(output, 'w') as f:
        f.write(newCSS)
    print('TIME TOOK: ', time.time() - startTime)
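The effect of the two replace() calls can be seen on a short, made-up CSS string. The last line shows a detail of this approach: an odd run of spaces leaves a single space behind, because only pairs of spaces are removed.

```python
# Made-up CSS text, indented with four spaces.
css = ".a {\n    color: red\n    }\n.b {\n    margin: 0\n    }"

# Same two steps as in the program: drop newlines, then drop pairs of spaces.
minified = css.replace('\n', '').replace('  ', '')
print(minified)  # .a {color: red}.b {margin: 0}

# Three spaces: one pair is removed, one space remains.
print("a   b".replace('  ', ''))  # a b
```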

Running the Program

You must put your CSS files in the "style" folder to run the program. After that, put the HTML files in the current working directory (the same one as the .py file). Then run:

    $ python minimize.py
    TIME TOOK:  0.04069924354553223

This will print the time taken by the process, and a new min.css file will appear in the current working directory.

Conclusion

Excellent! You have successfully created a CSS minifier using Python! See how you can add more features to this program, such as a config file for further options. Also, keep in mind that this program could use some optimization, since it runs very slowly on larger projects.
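One possible direction for such an optimization, offered only as a sketch: instead of comparing every rule against every class in a nested loop, the wanted selectors could be stored in a set so each rule needs a single lookup. The rules below are plain (selector, css_text) tuples invented for the demonstration, not cssutils objects, and this simplified version skips the media-query and de-duplication handling of the real program:

```python
# Made-up inputs for the demonstration.
used_classes = ['bg-black', 'width-800px', 'unused-in-css']
rules = [
    ('.bg-black', '.bg-black { background: #000 }'),
    ('.bg-black:hover', '.bg-black:hover { background: #111 }'),
    ('.width-800px', '.width-800px { width: 800px }'),
    ('input', 'input { border: 0 }'),
]

# Build the set of exact selectors once; membership tests are then O(1) on average.
wanted = {'.' + c for c in used_classes}

# Keep rules whose selector is wanted, plus all non-class selectors such as "input".
kept = [css for selector, css in rules
        if selector in wanted or not selector.startswith('.')]

print(len(kept))  # 3
```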

You can always wrap the entire code in a function to make it more usable, readable, and extendable in your projects.

Check the full code here.

Learn also: How to Extract Script and CSS Files from Web Pages in Python

Happy coding ♥
