PLONE related

Just some tidbits regarding Plone.
I do not pretend in any way to be publishing authoritative information like that on the Plone site. I just wanted to put online the little everyday discoveries I make while working on the new Fefsi site.


  • How to index PDF documents that are part of your custom Archetype
  • Plone can index standard "File" content in PDF form. The problem arises when you want to build your very own "File-based" content. The following is a summary of my findings; it could be superseded by future releases of Archetypes (I am using release 1.2.5).
    I will mainly focus on the programming side here; the steps needed to get a working configuration will soon be added as another item of this list (they are platform-dependent anyway, so your mileage may vary). First of all, you need TextIndexNG, an alternative indexer written by Andreas Jung: it is highly customizable, multilingual, and has relevance ranking. Last but not least, it has converters for PDF, HTML, PowerPoint, Word and PostScript documents.

    NB: you will need Xpdf and, more specifically, its shell command "pdftotext", along with the supporting X libraries, to be able to fully convert a PDF document to text.
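
    This prerequisite can be checked from Python, outside Plone. What follows is only a sketch for a current Python 3 interpreter, not part of the Archetype: the helper names find_pdftotext and pdf_to_text are made up for illustration, and it relies on pdftotext writing to standard output when the output file is given as "-".

```python
import shutil
import subprocess


def find_pdftotext():
    """Return the full path of the pdftotext executable, or None if absent."""
    return shutil.which("pdftotext")


def pdf_to_text(pdf_path):
    """Convert a PDF file to plain text by shelling out to pdftotext.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    exe = find_pdftotext()
    if exe is None:
        raise RuntimeError("pdftotext not found on PATH; install Xpdf first")
    result = subprocess.run(
        [exe, pdf_path, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    return result.stdout.decode("utf-8", "replace")


if __name__ == "__main__":
    print(find_pdftotext() or "pdftotext is NOT installed")
```

    If find_pdftotext() returns None, the PDF converter has nothing to call and the index will very likely stay empty for PDF content.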

    Supposing everything is in place and working, let's go straight to the code. Here is a very simple archetype that has just a description field and a file field where we will store the PDF document:

    from Products.Archetypes.public import *
    from Products.Archetypes.Marshall import PrimaryFieldMarshaller
    from DateTime import DateTime
    from Products.CMFCore import CMFCorePermissions
    from Products.CMFCore.utils import getToolByName
    from AccessControl import ClassSecurityInfo
    from Products.PortalTransforms.utils import TransformException
    from Products.PortalTransforms import *

    # PROJECTNAME (used by registerType below) normally comes from
    # your product's config module.

    schema = BaseSchema + Schema((
        # The field type and the name 'body' are assumptions: the
        # declaration line is missing from the original listing.
        TextField('body',
            searchable = 1,
            default_output_type = 'text/html',
            # The original list is cut off after 'text/plain';
            # add further types as needed.
            allowable_content_types = ('text/plain',),
            widget = EpozWidget(description = """ The description that
            appears in the document preview. If empty the teaser is
            used with a standard layout """,
            label = "HTML Body - Extended Description",
            rows = 15),
        ),
        # The name 'documentfile' matches the accessor getDocumentfile()
        # used below; the FileField type is assumed from the FileWidget.
        FileField('documentfile',
            searchable = 1,
            primary = 1,
            widget = FileWidget(description = """ The document on your drive
            to be uploaded""",
            label = "Document to upload"),
        ),
    ))

    class DocumentItem(BaseContent):
        """Generic Document: a container that can embed PDF files."""
        schema = schema

        # This is Andreas's code. It provides a hook called
        # txng_get(): when you add a TextIndexNG index on
        # SearchableText, the indexer 'senses' that you provided
        # your content with the hook and automagically uses it.
        # I think commenting out the following statement is the only
        # way to make the whole thing work; in the original
        # code the method was declared as private.
        # Any suggestion is welcome.
        # security.declarePrivate('txng_get')
        def txng_get(self, attr=('SearchableText',)):
            """Special searchable text source for TextIndexNG2"""
            if attr[0] != 'SearchableText':
                # only a hook for searchable text
                return
            source = ''
            mimetype = 'text/plain'
            encoding = 'utf-8'
            # stage 1: get the searchable text and convert it to utf8
            sp = getToolByName(self, 'portal_properties').site_properties
            stEnc = getattr(sp, 'default_charset', 'utf-8')
            st = self.SearchableText()
            #source+=unicode(st, stEnc).encode('utf-8')
            # get the file and try to convert it to utf8 text
            ptTool = getToolByName(self, 'portal_transforms')

            # Here you have to change the accessor method to your
            # specific needs: I access the field 'documentfile',
            # change it to match your definition
            f = self.getDocumentfile()
            if f:
                mt = f.getContentType()
                try:
                    # convertTo() returns None when no transform is found
                    converted = ptTool.convertTo('text/plain', str(f), mimetype=mt)
                    if converted is not None:
                        result = converted.getData()
                    else:
                        result = ''
                except TransformException:
                    result = ''
                # append the converted text so that it actually gets indexed
                source += result
            return (source, mimetype, encoding)
        # End of Andreas's code.

    registerType(DocumentItem, PROJECTNAME)

    The code is commented, and most of it should be sufficiently understandable. As mentioned in the listing, I "borrowed" code from Andreas Jung's ATTypes in the Collective.
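
    The contract of the hook can also be tried outside Zope with stand-in objects. The sketch below is not the Archetype code itself: FakeTransformTool, FakeResult, the local TransformException and the function txng_get_sketch are all hypothetical stand-ins, written only to show the three possible outcomes (no answer for other attributes, converted text for a PDF, empty text when the conversion fails).

```python
class TransformException(Exception):
    """Stand-in for Products.PortalTransforms.utils.TransformException."""


class FakeResult(object):
    """Stand-in for the object returned by convertTo()."""

    def __init__(self, data):
        self._data = data

    def getData(self):
        return self._data


class FakeTransformTool(object):
    """Hypothetical stand-in for portal_transforms that only knows PDF."""

    def convertTo(self, target, data, mimetype=None):
        if mimetype != 'application/pdf':
            raise TransformException('no converter for %s' % mimetype)
        return FakeResult('text extracted from ' + data)


def txng_get_sketch(tool, data, content_type, attr=('SearchableText',)):
    """Mimic the hook: return (source, mimetype, encoding) or None."""
    if attr[0] != 'SearchableText':
        # the hook only answers for SearchableText
        return None
    source = ''
    if data:
        try:
            source = tool.convertTo('text/plain', data,
                                    mimetype=content_type).getData()
        except TransformException:
            # conversion failed: index nothing rather than break indexing
            source = ''
    return (source, 'text/plain', 'utf-8')


if __name__ == '__main__':
    tool = FakeTransformTool()
    print(txng_get_sketch(tool, 'report.pdf', 'application/pdf'))
    print(txng_get_sketch(tool, 'picture', 'image/png'))
    print(txng_get_sketch(tool, 'report.pdf', 'application/pdf', attr=('Title',)))
```

    Swallowing the exception means a document that cannot be converted is still catalogued, only without its file text.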
    Before registering this Archetype we need to perform the following operations on the Plone catalog through the ZMI:

    - Open the Portal Transform Tool in the ZMI of your Plone site and add a transform with the following parameters:
    ID: pdf_to_text
    Module: Products.PortalTransforms.transforms.pdf_to_text

    - Add a TextIndexNG index. The index must have the following parameters:
    Name: SearchableText
    Indexed Attributes: SearchableText
    Use Converters: enabled
    The index can include other attributes (e.g. PrincipiaSearchSource or your custom definitions); what is important is that 'SearchableText' MUST stay first.

    - Regenerate the indexes. Done.
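
    The first and the last of these steps could probably be scripted instead of clicked. This is a minimal sketch under assumptions: it takes for granted that the transform tool offers manage_addTransform(id, module) and objectIds(), and that the catalog offers refreshCatalog(clear=1), as in the releases I know. The function name setup_pdf_transform is made up, and the TextIndexNG index is deliberately left to the ZMI because its constructor parameters are not covered here.

```python
def setup_pdf_transform(portal):
    """Register the pdf_to_text transform, then rebuild the catalog.

    `portal` is the Plone site object; the tools are looked up as
    plain attributes to keep the sketch free of Zope imports.
    """
    transforms = portal.portal_transforms
    # Only add the transform once, so the function can be re-run safely.
    if 'pdf_to_text' not in transforms.objectIds():
        transforms.manage_addTransform(
            'pdf_to_text',
            'Products.PortalTransforms.transforms.pdf_to_text')
    # clear=1 empties the catalog before re-indexing every object.
    portal.portal_catalog.refreshCatalog(clear=1)
```

    Such a function would typically be called from an External Method or from the product's install script, once the index has been created.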


  • "Restart" button is missing
  • This happens to be a characteristic of Plone 2.0 RC5 and RC6, but it seems to be an early Zope 2.7 related issue.
    You always have the option to upgrade to a mainstream version, but in case you can't, here is how to make the missing button reappear:

    In the file-system of your machine open the following file:


    uncomment the line:

    cmdline = '%s/bin/zopectl start' % site_info.INSTANCE_HOME

    comment the line:

    cmdline = '%s/bin/runzope....

    Restart Plone manually (shell or Windows menu).
    In short: you make Zope start as a daemon through the zopectl script. Only this way will the button work.
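
    The difference between the two lines boils down to which script gets launched. As a small illustration only (the function name start_command and the example path are made up, and whatever arguments followed runzope in the real file are not reproduced here):

```python
def start_command(instance_home, daemon=True):
    """Build the command line used to start Zope.

    daemon=True  -> zopectl start (the Restart button needs this one)
    daemon=False -> runzope, without the arguments of the real file
    """
    if daemon:
        return '%s/bin/zopectl start' % instance_home
    return '%s/bin/runzope' % instance_home


if __name__ == '__main__':
    # '/opt/zope/instance' is a hypothetical INSTANCE_HOME
    print(start_command('/opt/zope/instance'))
```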


  • Importing Users from SiteServer 3.0 to Plone
  • I have to clean up some messy code before I publish this. Let's say it will come soon.