Useful Perl

Ever want to have a simple script/program that:

  1. takes a few arguments
  2. reads in a text file
  3. checks to see if the arguments are already in the text file
  4. if they are, die; if not, add them?

If yes, then you might find the following little snippet of Perl I hacked up for work useful:

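use strict;
use warnings;

die "Usage: $0 key value\n" unless @ARGV == 2;
my %values = ();
open my $in, '<', 'test.txt' or die "Cannot read test.txt: $!\n";
while (<$in>) {
	my @parts = split ':', $_, 2;
	next unless @parts == 2;		# skip lines without a "key: value" pair
	s/^\s+|\s+$//g for @parts;		# trim whitespace around key and value
	if ($parts[0] eq $ARGV[0]) { die "Already in List\n"; }
	$values{ $parts[0] } = $parts[1];
}
close $in;
$values{$ARGV[0]} = $ARGV[1];		# not found, so add the new pair
open my $out, '>', 'test.txt' or die "Cannot write test.txt: $!\n";
for my $x ( sort keys %values ) {
	printf $out "%11s: %s\n", $x, $values{$x};
}
close $out;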

CUDA Tests

Lately I've taken an interest in CUDA and its associated programming interfaces. To test just how powerful these video cards are at crunching large amounts of arithmetic, I wrote a naive prime finder, and thought the results needed to be shared.

The test code looks like this:

// includes, system
#include <stdlib.h>
// includes CUDA
#include <cuda_runtime.h>
#include <stdio.h>
#include <assert.h>
#include <cuda.h>
__global__ void deviceKernel(unsigned long long *in_a, unsigned long long N) {
	unsigned long long idx = blockIdx.x*blockDim.x + threadIdx.x;
	if (idx<N) {
		unsigned long long x = in_a[idx]; 
		// round the square root up so the largest candidate factor is always tested
		unsigned long long smax = __double2ull_ru(__dsqrt_ru(__ull2double_ru(x)));
		unsigned long long i;
		for (i=3;i<=smax;i+=2) {
			if ( x % i == 0 ) {
				in_a[idx] = 0;	// composite: zero the entry
				break;		// stop at the first factor
			}
		}
	}
}
int main(int argc, char **argv) {
	if (argc < 2) { fprintf(stderr, "usage: %s search_max\n", argv[0]); return 1; }
	unsigned long long *search_h, *result_h;           // pointers to host memory
	unsigned long long *search_d;                 // pointer to device memory
	unsigned long long search_max = strtoull(argv[1], NULL, 10);	    // takes a cli argument
	if (search_max < 3) { fprintf(stderr, "search_max must be at least 3\n"); return 1; }
	int warpSize = 32;
	unsigned long long i, N ;
	N = ( search_max - 1 ) / 2;		    // search all odd numbers from 3 up to search_max
	size_t size = N*sizeof(unsigned long long);
	search_h = (unsigned long long *)malloc(size);
	result_h = (unsigned long long *)malloc(size);
	cudaMalloc((void **) &search_d, size);
	// initialization of host data
	for (i=3; i<=search_max; i+=2) search_h[(i-1)/2 -1] = i;   // setup search array
	// copy data from host to device
	cudaMemcpy(search_d, search_h, sizeof(unsigned long long)*N, cudaMemcpyHostToDevice);
	int blockSize = 8*warpSize;
	int nBlocks = N/blockSize + (N%blockSize == 0?0:1);	// round up so every entry gets a thread
	deviceKernel <<< nBlocks, blockSize >>> (search_d, N);	// fire off the kernel!
	cudaDeviceSynchronize();	// wait for the kernel to finish
	cudaMemcpy(result_h, search_d, sizeof(unsigned long long)*N, cudaMemcpyDeviceToHost);
	free(search_h); free(result_h); cudaFree(search_d); 
	return 0;
}

Pretty straightforward. This code block-copies a search list to the device, and block-copies it back after all the work is done. This is pretty obviously not the best way to do things, but for testing purposes, the copy operations are not the limiting factor at larger search sizes.
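To check what comes back from the card, it helps to have a host-side reference. The following is a minimal sketch in plain C and was not part of the original test; the names is_odd_prime and count_odd_primes are made up for illustration. It applies the same rule as the kernel (trial division by odd numbers up to the square root), but uses an integer comparison instead of the rounded floating point square root.

```c
#include <assert.h>

/* Same test as the kernel: x is an odd number >= 3, and we divide by
   odd i while i*i <= x (written as i <= x / i to avoid overflow). */
static int is_odd_prime(unsigned long long x) {
    assert(x >= 3 && (x & 1ULL));
    for (unsigned long long i = 3; i <= x / i; i += 2) {
        if (x % i == 0) return 0;   /* found a factor: composite */
    }
    return 1;
}

/* Count the primes among the odd numbers 3, 5, ... up to search_max.
   The prime 2 is never counted, because only odd numbers are searched. */
static unsigned long long count_odd_primes(unsigned long long search_max) {
    unsigned long long count = 0;
    for (unsigned long long x = 3; x <= search_max; x += 2) {
        count += (unsigned long long)is_odd_prime(x);
    }
    return count;
}
```

Comparing count_odd_primes(search_max) against the number of nonzero entries left in result_h is a quick sanity check for small inputs.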

Test results

So, what we see here looks like roughly O(N log N) scaling for prime finding over the measured range. What's amazing to me is that this code takes only one second to find every prime number less than ten million!
The graph shows that the runtime overhead and copy operations are the limiting factor up to about one million prime numbers. From there, this naive implementation takes off, crunching an absolutely obscene amount of arithmetic, on "unsigned long long"s no less!

The source for the test is here: source. Sorry, I hacked up the graph manually.


Testing \LaTeX for WP plugin.

If everything is working, then the Schrödinger equation is:

-\frac{\hbar^2}{2m}\sum_i\frac{\partial^2}{\partial x_i^2}\Psi(\textbf{x},t) + V(\textbf{x},t)\Psi(\textbf{x},t) = i\hbar\frac{\partial}{\partial t}\Psi(\textbf{x},t)

I'll be curious to see if this has any effects on, or conflicts with, the PHP plugin.

C2000 Launchpad Assembly only Software PWM Tutorial

So, I've completed an assembly project to build a software PWM program for the C2000 Launchpad.
The C2000 is actually a remarkably strong platform for assembly-only programming, with some powerful addressing modes,
many registers, and many powerful instructions. The platform reminds me of the MSP430, which I had a wonderful
time hacking at with the FOSS tool-chain and Vim. Being forced to use CCS and controlSUITE is a minor annoyance
(let's face it, it's the only practical way to develop for this platform), but its integration makes up for the
majority of its shortcomings. Has anyone tried to build Cortex assembly projects with GNU tools and OpenOCD?
Thumb2 will make any human pull their hair out. Anyway, I'm happy with the platform and environment, so
without further ado, here is the code:


; main.asm
; By Daryl Metzler
	.sect	".text"		; Program data
	.global	_main
; memory howto (for the Watchdog)
;memory:		; a simple demo of direct addressing
;		MOVW	DP, #variable		; So this is how to load data
;										;directly from memory
;		MOVL 	XAR0, @variable		; The DP points to the page data is on,
										;then MOV takes offset
; turn off watchdog timer
wd_disable:				; We need to disable the WatchDog before beginning,
					; or the DSP will reset after 512 cycles
		EALLOW			;Enable EALLOW protected register access
		MOVZ DP, #7029h>>6	;Set data page for WDCR register
	    	MOV @7029h, #0068h	;Set WDDIS bit in WDCR to disable WD
		EDIS			;Disable protected register access
; interrupts
intenable:	; Allow interrupts
		CLRC VMAP	; map interrupts from boot rom to 0x0
pll_enable:		; Turn on the PLL
		MOVZ DP, #0x7021>>6	; PLLCR
		MOV @0x7021, #0xC	; Set PLLCR[DIV] = 12
		MOVZ DP, #0x7011>>6	; PLLSTS
		OR @0x7011, #0x100	; Set PLLSTS[DIVSEL] = 2
; CPU Timer 1
		OR IER, #0x1000	; Enable INT13 in IER
		MOVZ DP, #0xC0C>>6	; TIMER1TCR
		OR @0xC0C, #0x4000	; Set Interrupt enable in TIMER1TCR
		MOVZ DP, #0xC0A>>6	; TIMER1PRD
		MOV @0xC0A, #0x500	; Set TIMER1PRD period register
		MOVZ DP, #0xC0B>>6	;
		MOV @0xC0B, #0x0	; TIMER1PRDH
; GPIO <blinkys>
		MOVZ DP, #6F86h>>6	; GPA registers
		AND @6F86h, #0xFF00 	; Set GPIO1-4 for GPio
		MOVZ DP, #6FC2h>>6	; GPASET
		OR @6FC2h, #0x000F 	; Set GPIO1-4
		MOVZ DP, #0x6F8A>>6	; GPADIR
		MOV @6F8Ah, #0x000F	; Set direction
		MOVZ DP, #6FC0h>>6	; GPADAT
		AND @0x6FC0, #0xFFF0; Clear GPIO1-4 (LEDS ON)
		OR @0x6FC0, #0xFFFA 	; Set GPIO1-4
	; initialize sw_pwm
	MOV ACC, #0x0
	MOVL XAR0, #0x0
	MOVL XAR1, #0x40
	MOVL XAR2, #0x80
	MOVL XAR3, #0xC0
	MOVL XAR6, #0x0F
	MOVL XAR7, #0
	B onp0, LT
	OR AR6, #0x1
	B onp1, LT
	OR AR6, #0x2
	B onp2, LT
	OR AR6, #0x4
	B onp3, LT
	OR AR6, #0x8
onp3: nop
	CMP AR7, #200h
	B swpwmwrap, GEQ
	B	loop, UNC
	MOVL XAR7 , #0x0
	MOV AR6, #0h	; Clear GPIO1-4 (LEDS ON)
		; We're just incrementing the AR0-3 registers
		; Which pulses their respective LED
		ADD AL, #1
		CMP AL, #100h
		B modulus0, LT
		MOV AL, #0
		ADD AL, #1
		CMP AL, #100h
		B modulus1, LT
		MOV AL, #0
		ADD AL, #1
		CMP AL, #100h
		B modulus2, LT
		MOV AL, #0
		ADD AL, #1
		CMP AL, #100h
		B modulus3, LT
		MOV AL, #0
	B loop,UNC	; loop after fall off

We see here that main is using a lot of magic and most of the work done here is getting the hardware alive and configured.
The output is that the LEDs pulse and blink using PWM. Really, there is no sophisticated output in terms of what is done with
that PWM. The duty cycles increment modulo half the duty period. The C2000 Launchpad's LEDs are wired in such a way that even
at 25% duty cycle, it is difficult for the eye to pick out the "dim" LED from one at full intensity.
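Much of the "magic" in main is the repeated MOVZ DP, #addr>>6 pattern. Direct addressing on this CPU splits an address into a data page (the upper bits, loaded into DP) and a 6-bit offset inside a 64-word page, which the assembler takes from the @addr operand. The sketch below shows that arithmetic in plain C for the register addresses used in the listing; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A data-memory address split the way DP direct addressing sees it. */
typedef struct {
    uint16_t page;    /* value loaded into DP: addr >> 6 */
    uint16_t offset;  /* low 6 bits, taken from the @addr operand */
} dp_address;

static dp_address split_address(uint32_t addr) {
    assert(addr < (1UL << 22));  /* 16-bit page plus 6-bit offset = 22 bits */
    dp_address a;
    a.page = (uint16_t)(addr >> 6);
    a.offset = (uint16_t)(addr & 0x3F);
    return a;
}

/* Two registers can share one DP load only if they sit on the same page. */
static int same_page(uint32_t a, uint32_t b) {
    return (a >> 6) == (b >> 6);
}
```

For example, split_address(0x7029) gives page 0x1C0 and offset 0x29. WDCR (0x7029), PLLCR (0x7021), and PLLSTS (0x7011) all land on page 0x1C0, so reloading DP between them is harmless but not strictly needed, while the GPIO mux register at 0x6F86 and GPADAT at 0x6FC0 are on different pages and do need their own loads.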


	.sect ".text"
	.global int1irq
	.global int2irq
	.global timer1irq
		ADD AL, #0x1
		MOVZ DP, #6fC0h>>6
		MOVW @6FC0h, AR6	; Light up the LEDs by AR6 value

The interrupt routine "timer1irq" does little more than increment the PWM counter kept in AR7, and move the current LED GPIO value onto the GPADAT register.
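The listing above does not show the compare step between the counter and the four duty values, so the following is only a sketch of how a software PWM of this shape typically works, written in plain C rather than assembly. The period of 0x200 ticks comes from the CMP AR7, #200h line, the starting duty values from the XAR0 to XAR3 initialisers, and the active-low LEDs from the "Clear ... (LEDS ON)" comments; the function names and the exact compare direction are my assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PWM_PERIOD   0x200u  /* counter wraps here, as in CMP AR7, #200h */
#define PWM_CHANNELS 4

/* Return the GPIO bits for one timer tick. The LEDs are active low, so a
   channel's bit stays clear (LED on) until the counter reaches its duty value. */
static uint16_t pwm_mask(uint16_t counter, const uint16_t duty[PWM_CHANNELS]) {
    assert(counter < PWM_PERIOD);
    uint16_t mask = 0;
    for (int ch = 0; ch < PWM_CHANNELS; ch++) {
        if (counter >= duty[ch]) {
            mask |= (uint16_t)(1u << ch);  /* duty expired: switch this LED off */
        }
    }
    return mask;
}

/* Advance each duty value by one, wrapping at half the period (0x100). */
static void step_duties(uint16_t duty[PWM_CHANNELS]) {
    for (int ch = 0; ch < PWM_CHANNELS; ch++) {
        duty[ch] = (uint16_t)((duty[ch] + 1u) % (PWM_PERIOD / 2u));
    }
}
```

With the starting duty values {0x00, 0x40, 0x80, 0xC0}, a counter value of 0x50 yields a mask of 0x3: the first two channels have already passed their duty value and are off, while the last two are still lit.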

Feel free to check out the directory to grab the code for yourself.

I hope someone finds this helpful, and that this launchpad gains a little more attention and steam with the community.

C2000 Launchpad on Fedora

I have recently received my C2000 Launchpad (from Mouser) and, after much headache, finally have things up and working. Since the process was a bit of a hassle, I figured I would do a post as a kind of tutorial for bootstrapping the development process in Fedora 17 (as you may know, my distro of choice). Hopefully this will fill a niche and get Fedora users from an unopened box all the way to a custom Blinky application.

Firstly you should hit the following links:

TI's Pages:
TI's C2000 Launchpad page
TI's Wiki

Grab the following Documentation:
C2000 Launchpad Product Brief
The C2000 Launchpad User Guide

And the following Software:
Code Composer Studio (get the current version for Linux)

At this point you should be up to speed on what you have. The Example that comes pre-programmed is pretty interesting. In order to get serial working (if you roll a custom linux kernel like me), make sure that you have the following USB driver built into your kernel:

USB FTDI Single Port Serial Driver

as well as the necessary composite device support. Without these, I wasn't getting /dev/ttyUSB0 at first.

CCS Install

Installing CCS is hopefully straightforward for you: grab the gzip, extract it, and run the included binary. When installing, I put my copy in /opt/ti. For convenience, I will refer to all paths as if you installed yours at /opt/ti, so I hope there's no confusion there.

During installation, choose a custom install, then make sure to include the C2000 libraries and the XDS100 emulator support at a minimum. This should save you some precious space on /.

controlSUITE Install

Next, install controlSUITE with Wine into the same parent directory as CCS (again /opt/ti). controlSUITE isn't going to play properly on Linux, which is okay as we only really need the headers and drivers, more on which later.

First Impressions

Begin by firing up CCS and selecting a License (I simply chose Evaluate).
Next, you'll find yourself at the Welcome screen. Select Resource Explorer and at the bottom of the screen you should find

"Configure Resource Explorer to discover examples, documentation and generates a resource package"

Add the top level controlSUITE directory here (/opt/ti/controlSUITE). Now search your new package for launchpad to find Example_F2802xLaunchPadDemo, literally the only Launchpad specific example code in controlSUITE.

The Example

Now for some interesting stuff. Try building this project by clicking the little hammer "build" icon. On my first attempts, many hiccups and errors came up. First, set up the target configuration. Click File->New->Target Configuration File. Click through the default on the dialog box that pops up. In the next screen select XDS100v2 USB Emulator and, using a search for 28027, find the TMS320F28027 device. Save this config, and test the connection. Hopefully you get a successful test. If not, check lsusb, lsmod, and your kernel config for the right drivers, to ensure that everything is in line with hardware support.

Next, let's fix a pesky bug. Browse in your project editor to:


In this file, edit line 117 to change that last period into a comma, a bug I found out about in this post.

Linker path

Next, there is a problem with the include path for the linker. If an error comes up complaining about IQmath.lib, then add the include path manually. Right click the project in the explorer and select "Show Build Settings". Under

Build->C2000 Linker->File Search Path

remove the bare entry for IQmath.lib, and add a new entry with the full path: /opt/ti/controlSUITE/development_kits/C2000_LaunchPad/f2802x_common/lib/IQmath.lib

Your build and debug should now succeed!

Now for some Custom Blinking

Rather than reinvent the wheel, I forward you here. Pay special attention to the Update at the bottom in order to pull in the headers and driverlib.lib.


AVX Extensions

Testing out code inclusion with the Syntax highlighter backend.

AVX extensions are a newer set of vector instructions that use 256-bit wide registers, similar to the vector processing capabilities of SSE, which used 128-bit registers. The original AVX covers floating point at the full 256-bit width; 256-bit integer operations arrived with AVX2.

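	.file	"main.c"
	.data
	.align 	32
intset1:				# eight packed single-precision inputs
	.float 1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8
	.bss
	.align 	32
intans:					# room for the eight results
	.rept	8	
	.float 0.0
	.endr
	.text
	.align 4
	.globl	main
	.type	main, @function
main:
	pushq	%rbp
	movq	%rsp, %rbp
	vmovdqa intset1, %ymm0
	vmovdqa intset1, %ymm1
	vmulps %ymm0, %ymm1, %ymm2	# ymm2 = ymm0 * ymm1, lane by lane
	vmovdqa %ymm2, intans
	movl	$0, %eax
	popq	%rbp
	ret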
I fixed this code up, so for whoever might have been following this, I'm now seeing the correct output when I run it in the debugger.

Some interesting stuff here: the code is backwards compatible with SSE if your kernel doesn't support AVX (tested on the web server).
We move the string of eight (or four) floats into the vector registers ymm0 and ymm1 (or xmm0 and xmm1 for SSE) and vector multiply them into ymm2 (or xmm2). The result gets saved into bss. Really, there's no way to "see" this code work without the debugger, but the power of doing eight floating point multiplies in a single instruction is amazing.
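For anyone who wants to check the numbers in the debugger against something, here is a minimal scalar sketch in plain C of what the vector multiply computes, one lane at a time. It is not the code from the post; the name mul_lanes and the LANES constant are made up for illustration. On an AVX machine the same operation is available from C as the _mm256_mul_ps intrinsic in immintrin.h, which I left out here so the sketch builds without special compiler flags.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* eight 32-bit floats fill one 256-bit ymm register */

/* Scalar equivalent of vmulps: multiply two arrays lane by lane. */
static void mul_lanes(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}
```

Because the listing loads intset1 into both ymm0 and ymm1, the matching call passes the same array twice, as in mul_lanes(in, in, out, LANES), and each result is simply the square of its input.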


Cool new stuff, figured out how to connect and post via my Smarty-Phone™...

Also been studying up on PHP to do some tweaks and tricks later with the menus and sidebars...

For those not familiar with WordPress as a CMS, a nifty feature is Custom Fields, which allow some neat extensibility and categorization tricks, like series of articles, subtitles, embedded pictures, etc. That's another thing to play with soon...